What is Data partitioning?
It is a database procedure of partitioning that involves breaking up a very large table into a number of smaller sections. Queries that access only a tiny portion of the data can run faster since there is fewer data to scan when huge tables are divided into smaller individual tables. When the amount of data is large and a single system cannot handle it, partitioning is used.
Now let us discuss different methods of achieving partitioning that is as follows:
Partitioning Methods
There are 3 types of partitioning that are listed below and later discussed as follows:
- Horizontal Partitioning
- Vertical Partitioning
- Directory based Partitioning
Let us discuss them in detail for better understanding as follows:
1. Horizontal Partitioning: Without the need to make separate tables for each portion, horizontal partitioning divides big tables into smaller, more manageable pieces. A partitioned table’s data is physically kept in row groups known as partitions. It is possible to access and save each partition independently. In horizontal partitioning, each shard has the same schema as the parent database.
Application: Zipcode
Note: It is also known as shading or sometimes referred as range-based partitioning.
2. Vertical Partitioning: Tables with fewer columns are created using the vertical partitioning technique, and the remaining columns are stored in new tables. Data is presented in a vertical format.
The main purpose of vertical table partitioning is to increase SQL Server speed, particularly when a query needs to fetch all columns from a database with a lot of text or BLOB columns.
Application: Large Reports(be it of any domain)
3. Directory-based Partitioning: A search function that is aware of the partitioning structure and decouples it from the database access code. It enables modifying the partitioning scheme or adding new database servers without impacting the application. It results in a horizontally scalable application that is loosely connected. Since key-based partitioning requires the use of a hash function that cannot be often updated.
- Directory-based partitioning is more adaptable than key-based or range-based partitioning.
- Range-based partitioning establishes range values that cannot be changed.
However, since directory-based partitioning is a more dynamic method and is therefore more flexible, we can use any technique to assign data to the shards.
Now geeks you must be wondering what is the criteria behind above discussed methods of partitioning. So let us do discuss them now to get grasp understanding over concept of partitioning that is as follows:
Partition Criteria:
- Key or Hash-based Robin Partitioning: To determine the partition number, we apply the hash function to the entry’s key attribute.
- List Robin Partitioning: The column that corresponds to one of the sets of discrete values is used to choose which partition to use. A set of appropriate values is assigned to the specific partition.
- Round Robin Partitioning: The ith tuple is assigned to partition number i%n if there are n partitions. This implies that (i%n) nodes would receive the ith data. Sequential assignments are made to the data. The distribution of data is guaranteed by this partitioning criterion.
- Consistent hashing: This form of division is novel. The hash-based partitioning had the drawback of requiring a change in the hash function when adding new servers. A server outage and data redistribution would result from changing the hash function.
Advantages of Partitioning:
- Performance Optimization
- Availability
- Load Balancing
- Scalability
- More manageable
Disadvantages of Partitioning:
- The complexity of the software must be maintained, including the logic for routing inquiries and aggregating compute results.
- Additional hardware management: more DevOps work.
- Additional overhead when redundancy.
Complete Reference to Databases in Designing Systems – Learn System Design
Previous Parts of this System Design Tutorial