What is Partition Key In Azure Cosmos DB

The partition key in Azure Cosmos DB determines how data is distributed across the database’s physical partitions. It plays a central role in how data is stored and accessed, and queries that filter on it are more efficient. In this article we will discuss the partition key in Azure Cosmos DB, how it is created, and how it is used in data storage and retrieval.

What is Azure Cosmos DB?

Azure Cosmos DB is a globally distributed, multi-model database service from Microsoft Azure that provides high availability and low-latency access for modern applications. It is positioned as a database for AI applications, with features such as millisecond response times, automatic and instant scalability, and performance guarantees at any scale. Azure Cosmos DB is best suited for solutions that handle massive amounts of data and need high availability.

What is the Partition key in Azure Cosmos DB?

In Azure Cosmos DB, the partition key is a property chosen by the user when creating a container. The user defines the partition key based on one of the item’s properties, and each item in the container has a value for it. Cosmos DB uses the partition key to group items with the same value together and to scale horizontally by distributing those groups across multiple partitions.

Primary terminologies related to “Partition Key In Azure Cosmos DB”

Below are the key terminologies related to Partition Key in Azure Cosmos DB:

Partition Key

The partition key is a property defined by the user for a Cosmos DB container, and every document stored in the container carries a value for it. It is composed of the partition key path and the partition key value. For example, “/userid” is the path, and “johndoe”, the value of the ‘userid’ property in a document, is the partition key value.
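The relationship between path and value can be illustrated with a minimal sketch. The helper name partition_key_value and the sample document are hypothetical and only mimic how a path such as “/userid” points at a property inside a document; this is not Cosmos DB's own code.

```python
def partition_key_value(item: dict, path: str):
    """Return the value found at a partition key path such as "/userid"."""
    value = item
    # A nested path like "/address/city" is walked one segment at a time.
    for segment in path.strip("/").split("/"):
        value = value[segment]
    return value


# Hypothetical document used only for illustration.
item = {"id": "1", "userid": "johndoe", "address": {"city": "Paris"}}

print(partition_key_value(item, "/userid"))        # johndoe
print(partition_key_value(item, "/address/city"))  # Paris
```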

Logical Partitions

The data stored within a container is divided into logical groups based on partition key values. Logical partitions allow Cosmos DB to distribute data across multiple servers. All documents with the same partition key value are stored in the same logical partition.
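As a rough in-memory illustration of this grouping (a sketch, not how the service is implemented), the documents below are bucketed by their partition key value. The function name and the sample documents are assumptions made for the example.

```python
from collections import defaultdict


def group_into_logical_partitions(items, key_property):
    """Bucket documents by partition key value, one bucket per logical partition."""
    partitions = defaultdict(list)
    for item in items:
        partitions[item[key_property]].append(item)
    return dict(partitions)


# Hypothetical documents partitioned on the "userid" property.
docs = [
    {"id": "1", "userid": "johndoe"},
    {"id": "2", "userid": "janedoe"},
    {"id": "3", "userid": "johndoe"},
]

logical = group_into_logical_partitions(docs, "userid")
print(sorted(logical))          # ['janedoe', 'johndoe']
print(len(logical["johndoe"]))  # 2
```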

Physical Partitions

Physical partitions are the underlying storage in Cosmos DB; each one holds one or more logical partitions for optimal storage and performance. Physical partitions are fully managed by Azure and are not controlled by users.
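The idea of placing logical partitions onto physical partitions by hashing the partition key value can be sketched as follows. This is only an assumption-laden illustration: the actual hash function and the way hash ranges are assigned are internal to the service, and SHA-256 with a modulo is used here purely as a stand-in.

```python
import hashlib


def physical_partition_for(key_value: str, physical_partition_count: int) -> int:
    """Map a partition key value to a physical partition index (illustrative only)."""
    digest = hashlib.sha256(key_value.encode("utf-8")).digest()
    # The same key value always hashes to the same index, so every document
    # of one logical partition lands on the same physical partition.
    return int.from_bytes(digest[:8], "big") % physical_partition_count


for user in ["johndoe", "janedoe", "johndoe"]:
    print(user, "->", physical_partition_for(user, 4))
```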

Throughput

Throughput refers to the amount of resources allocated for handling operations such as reads, writes, deletes, and queries in a database, measured in Request Units per second (RU/s). Throughput in Azure Cosmos DB can be provisioned per container or shared among multiple containers of a database.

Request Units (RUs)

Request Units can be thought of as the currency for Cosmos DB resources. Every Cosmos DB operation, such as a read, write, delete, or query, consumes a varying number of RUs depending on factors like data size and query complexity.
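To see how throughput and RUs relate, here is a small back-of-the-envelope sketch. A point read of a 1 KB item costs 1 RU; the write cost of 6 RU and the workload numbers are placeholder assumptions, since real costs depend on item size, indexing, and query shape.

```python
def required_throughput(workload):
    """Sum RU/s over (operations per second, RU per operation) pairs."""
    return sum(ops_per_second * ru_per_op for ops_per_second, ru_per_op in workload)


workload = [
    (500, 1.0),  # point reads of 1 KB items at 1 RU each
    (100, 6.0),  # writes; 6 RU per write is an assumed placeholder
]

print(required_throughput(workload))  # 1100.0
```

With these assumed numbers the container would need at least 1,100 RU/s of provisioned throughput to serve the workload without being rate limited.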

Choosing the right Partition Key

Choosing a good partition key is crucial for query performance and cost: it leads to even data distribution and avoids overloaded partitions. Analyse the data and the query patterns to find a partition key that aligns with the queries, so that most lookups can be served from a single partition instead of many. So, pick a property that appears frequently as a filter and that has many distinct values.

Examples of good partition keys include

Location (city, state etc)

Customer Id or User Id

Product Id

Team Name or Team Id

Date, etc…
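One way to compare candidate partition keys is to check how evenly the existing data would spread across their values. The sketch below is a simple assumption-based check, not an official tool; the sample orders and the property names country and customerId are hypothetical.

```python
from collections import Counter


def largest_partition_share(items, key_property):
    """Fraction of all items that would fall into the largest logical partition."""
    counts = Counter(item[key_property] for item in items)
    return max(counts.values()) / len(items)


# Hypothetical sample data.
orders = [
    {"id": "1", "country": "US", "customerId": "c1"},
    {"id": "2", "country": "US", "customerId": "c2"},
    {"id": "3", "country": "US", "customerId": "c3"},
    {"id": "4", "country": "FR", "customerId": "c4"},
]

print(largest_partition_share(orders, "country"))     # 0.75
print(largest_partition_share(orders, "customerId"))  # 0.25
```

A share close to 1.0 means most data would sit in one logical partition, while a low share suggests a more even spread. Storage spread is only part of the picture, so the query patterns still need to be weighed alongside it.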

Example

To explain the partition key with an example, let us say we have a database of hotels in many different cities. Here the city name is the key data by which the hotels are organized and searched. So the partition key property is CityName, the partition key path is /CityName, and the partition key value of each hotel is the name of its city.

Below are the steps for creating the partition key in Azure Cosmos DB.

Step 1: Go to the Azure Cosmos DB for NoSQL account and select the New Container option.

Step 2: The New Container option displays a popup window for creating the container with its partition key.

Select the database name (create a new database if one is not available).

Step 3: Enter the container id, for example ‘Hotels’. Enter the partition key path, for example ‘/CityName’.

Press ‘OK’ to create the new container with the partition key.
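Steps 1 to 3 can also be done in code instead of the portal. The following is a minimal sketch assuming the azure-cosmos Python SDK is installed; the database name HotelsDb is a hypothetical placeholder, and the endpoint and key must come from your own account.

```python
DATABASE_ID = "HotelsDb"          # hypothetical database name
CONTAINER_ID = "Hotels"           # container id from Step 3
PARTITION_KEY_PATH = "/CityName"  # partition key path from Step 3


def create_hotels_container(endpoint: str, key: str):
    """Create the database and container if they do not exist yet."""
    # Imported here so this module still loads where azure-cosmos is not installed.
    from azure.cosmos import CosmosClient, PartitionKey

    client = CosmosClient(endpoint, credential=key)
    database = client.create_database_if_not_exists(id=DATABASE_ID)
    return database.create_container_if_not_exists(
        id=CONTAINER_ID,
        partition_key=PartitionKey(path=PARTITION_KEY_PATH),
    )
```

Calling create_hotels_container with the account endpoint and key returns a container object that can be used to add and query items.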

Step 4: Add some meaningful and relevant items, each with a value for the partition key property.

Each item stores its partition key value in the CityName property.

Step 5: Run a query that filters on the partition key to fetch the data for one city.
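Steps 4 and 5 could look roughly like this in code. The sketch assumes a container object from the azure-cosmos Python SDK, whose upsert_item and query_items methods are used here; the hotel items themselves are hypothetical sample data.

```python
# Hypothetical sample items; every item carries a CityName value.
HOTELS = [
    {"id": "1", "HotelName": "Hotel A", "CityName": "Paris"},
    {"id": "2", "HotelName": "Hotel B", "CityName": "Paris"},
    {"id": "3", "HotelName": "Hotel C", "CityName": "Rome"},
]


def load_hotels(container, hotels):
    """Step 4: insert (or replace) each hotel item in the container."""
    for hotel in hotels:
        container.upsert_item(hotel)


def hotels_in_city(container, city):
    """Step 5: query one city, scoped to that partition key value."""
    return list(
        container.query_items(
            query="SELECT * FROM c WHERE c.CityName = @city",
            parameters=[{"name": "@city", "value": city}],
            partition_key=city,
        )
    )
```

Passing partition_key lets the query be routed to the single logical partition that holds the requested city.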

Benefits of Partition Key

The partition key is a core concept in Azure Cosmos DB and provides several benefits, including scalability, performance optimization, and query efficiency.

Scalability

Azure Cosmos DB partitions data horizontally and distributes it across physical partitions based on the partition key value. By scaling horizontally, a Cosmos DB database can handle large volumes of data and high throughput.

Performance

A partition key that distributes data evenly across multiple partitions allows read and write operations to run in parallel and prevents any single partition from becoming a bottleneck. So the right partition key is crucial for consistent and optimal performance even when data volume and access patterns change.
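The bottleneck effect can be shown with a little arithmetic. Provisioned throughput is divided evenly across physical partitions, so a hot partition can run out of its share even when the container as a whole has capacity to spare. The load figures below are hypothetical.

```python
def per_partition_budget(provisioned_rus: float, physical_partitions: int) -> float:
    """RU/s available to each physical partition under an even split."""
    return provisioned_rus / physical_partitions


def throttled_partitions(load_by_partition, provisioned_rus):
    """Indexes of partitions whose load exceeds their share of the throughput."""
    budget = per_partition_budget(provisioned_rus, len(load_by_partition))
    return [i for i, load in enumerate(load_by_partition) if load > budget]


# Both workloads total 5,500 RU/s against 10,000 RU/s provisioned.
even_load = [1100, 1100, 1100, 1100, 1100]
skewed_load = [3500, 500, 500, 500, 500]

print(throttled_partitions(even_load, 10_000))    # []
print(throttled_partitions(skewed_load, 10_000))  # [0]
```

Each of the five partitions gets 2,000 RU/s, so the skewed workload is rate limited on its first partition although total demand is well below what was provisioned.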

Query Efficiency

In Cosmos DB, a query that includes the partition key in its filter can execute very efficiently because it targets a single partition. So the partition key helps retrieve data efficiently and with minimal latency. Queries without a partition key filter may perform a cross-partition scan, which is less efficient and has higher latency on large datasets.
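The difference between a single-partition query and a cross-partition scan can be simulated in memory. This is only a toy model with hypothetical hotel data and a hypothetical Rating property; it counts how many partitions each query has to touch.

```python
# Toy model: one list of items per partition key value (CityName).
partitions = {
    "Paris": [
        {"id": "1", "CityName": "Paris", "Rating": 5},
        {"id": "2", "CityName": "Paris", "Rating": 3},
    ],
    "Rome": [{"id": "3", "CityName": "Rome", "Rating": 4}],
    "Tokyo": [{"id": "4", "CityName": "Tokyo", "Rating": 5}],
}


def query_by_city(partitions, city):
    """Filter on the partition key: only one partition is read."""
    return partitions.get(city, []), 1


def query_by_min_rating(partitions, min_rating):
    """No partition key in the filter: every partition is scanned."""
    results = [
        item
        for items in partitions.values()
        for item in items
        if item["Rating"] >= min_rating
    ]
    return results, len(partitions)


print(query_by_city(partitions, "Paris")[1])    # 1
print(query_by_min_rating(partitions, 5)[1])    # 3
```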

Changing the Partition Key

Before creating a partition key, all factors should be analysed so that it is the right one and best suited for grouping data and for data lookups. Once a partition key is set when a container is created, it cannot be changed. The only option to change the partition key is to migrate the data from one container to another, which redistributes the data based on the new partition key.
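A migration of this kind could be sketched as a simple copy loop. The function below assumes source and target behave like azure-cosmos container objects (read_all_items and upsert_item), that the target container was already created with the new partition key, and that new_key_property is the property behind that new key. For large containers a bulk or managed copy approach would likely be preferable.

```python
def migrate(source, target, new_key_property):
    """Copy every item from source to target and return the number copied."""
    copied = 0
    for item in source.read_all_items():
        # Every item needs a value for the new partition key property.
        if new_key_property not in item:
            raise ValueError(f"item {item.get('id')} has no {new_key_property}")
        target.upsert_item(item)
        copied += 1
    return copied
```

After the copy is verified, the application can be pointed at the new container and the old one removed.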

Conclusion

In this article we have discussed the partition key in Azure Cosmos DB, its purpose, and its key benefits. We also looked into how the partition key works and the best practices for choosing the right partition key for query performance and data retrieval speed. Finally, we explained the steps to create the partition key with an example.

Partition Key In Azure Cosmos DB – FAQs

In Azure Cosmos DB, is a partition key always required?

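Containers created through the Azure portal and the current SDKs require a partition key. Only legacy fixed-size containers, which were capped at 10 GB of data, could be created without one, and a partition key is highly recommended in any case for performance and scalability.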

How do I choose a Partition Key?

The partition key should be a property that is present in every document, appears frequently in query filters, and has values that spread data evenly across partitions, so that high performance and scalability are maintained.

Can I change the Partition Key?

The partition key is set when a new container is created and cannot be changed afterwards. If the partition key needs to be changed, a new container has to be created with the desired partition key and all data needs to be migrated to it.

Is there a limit on the size of a logical partition?

The size of a logical partition in Azure Cosmos DB is limited to 20 GB. The data stored for a single partition key value cannot exceed this size, so choose a key that keeps each logical partition well below the limit.
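A rough way to watch for this limit is to estimate the size of each logical partition from the documents themselves. The sketch below uses the JSON-encoded length as an approximation and treats 20 GB as 20 * 1024**3 bytes; both are assumptions, and the service's own storage accounting will differ.

```python
import json
from collections import defaultdict

LOGICAL_PARTITION_LIMIT_BYTES = 20 * 1024**3  # assumed byte value of the 20 GB limit


def logical_partition_sizes(items, key_property):
    """Approximate bytes stored per partition key value."""
    sizes = defaultdict(int)
    for item in items:
        sizes[item[key_property]] += len(json.dumps(item).encode("utf-8"))
    return dict(sizes)


def partitions_near_limit(sizes, fraction_of_limit=0.8):
    """Partition key values whose estimated size reaches the given fraction of the limit."""
    threshold = fraction_of_limit * LOGICAL_PARTITION_LIMIT_BYTES
    return [key for key, size in sizes.items() if size >= threshold]


# Hypothetical sample items.
sample = [
    {"id": "1", "CityName": "Paris"},
    {"id": "2", "CityName": "Paris"},
    {"id": "3", "CityName": "Rome"},
]
sizes = logical_partition_sizes(sample, "CityName")
print(partitions_near_limit(sizes))  # []
```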

How does partition key selection affect query performance?

The partition key has a direct effect on query performance: queries with the partition key in the filter perform better because they are routed directly to the appropriate partition, which also improves overall throughput.