DynamoDB: Data partitioning is the secret sauce

Deepti Mittal
6 min readMay 23, 2023

DynamoDB is known for its exceptional performance and single-digit-millisecond latency. It achieves this through data partitioning, which narrows every lookup down to a small subset of the data.

3 main ingredients of data partitioning in DynamoDB

A few things make data partitioning work:

Partition key: An attribute whose value decides which partition an item lands on. This is the main ingredient behind the speed DynamoDB promises.

Request router: Every incoming request, whether a read or a write, is first handled by the request router. The router passes the partition key value to a hash function (the algorithm is undisclosed for DynamoDB) to determine the partition number, then forwards the request to that partition.

The request router works in O(1) time, so it adds very little latency to a request. In many distributed databases request routing happens on the application side, which means the application needs metadata about the entire cluster and must know the cluster topology and how the data is partitioned.

When the number of partitions grows very large, the request router keeps a separate map of frequently accessed partitions for faster lookups.

In DynamoDB all this information is hidden from the application; there is no way to know how many machines or partitions are being used behind the scenes.

Hash function: A function that knows how many partitions exist. It takes the partition key as input and produces the number of the partition that will serve the request.
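DynamoDB's actual hash function is undisclosed, but the routing idea can be sketched in a few lines. This toy version uses MD5 as a stand-in hash and a fixed partition count, both purely illustrative:

```python
import hashlib

def route(partition_key: str, num_partitions: int) -> int:
    """Map a partition key to a partition number via a hash.

    DynamoDB's real hash function is undisclosed; MD5 modulo the
    partition count is just an illustrative stand-in.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

# The same key always routes to the same partition: an O(1) lookup,
# with no need for the client to know the cluster topology.
assert route("UID51", 4) == route("UID51", 4)
print(route("UID51", 4))
```

The important property is determinism: given the same key and the same partition count, the router always lands on the same partition.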

Request router in action

Each partition hosts a specific range of partition keys; don't assume that every partition hosts only a single partition key.

Keeping the data in sorted order

DynamoDB supports another kind of key, the sort key. In DynamoDB the primary key can be defined in two ways:

Non-composite: Only the partition key, provided the partition key is unique across items.

Composite key: The combination of partition key and sort key, which together are unique and serve as the primary key.

The sort key keeps items within a partition in sorted order, which makes access faster when you need a range of data for a given partition key.

This is mainly used to support more access patterns on a single table. If you would like more detail on sort keys, here is a good blog by my friend: https://jssajaykumar.medium.com/dynamodb-sort-key-3ee999fc4acc
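The "sorted within a partition" idea can be modeled in memory: items sharing a partition key sit in a list kept sorted by sort key, so a range read is just a slice. The table name, keys, and items below are made up for illustration:

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Toy model of an item collection: items that share a partition key
# are kept sorted by sort key, so range reads are cheap slices.
table = defaultdict(list)  # partition_key -> sorted list of (sort_key, item)

def put(pk, sk, item):
    keys = [s for s, _ in table[pk]]
    table[pk].insert(bisect_left(keys, sk), (sk, item))

def query_range(pk, lo, hi):
    """Return items for one partition key whose sort key is in [lo, hi]."""
    rows = table[pk]
    keys = [s for s, _ in rows]
    return [item for _, item in rows[bisect_left(keys, lo):bisect_right(keys, hi)]]

put("UID51", "2023-01-05", {"order": 1})
put("UID51", "2023-02-10", {"order": 2})
put("UID51", "2023-03-15", {"order": 3})
print(query_range("UID51", "2023-01-01", "2023-02-28"))  # orders 1 and 2
```

This mirrors the shape of a DynamoDB Query with a `BETWEEN` condition on the sort key: one partition key, a contiguous range of sort keys, no table scan.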

Limitation on partition size

A single partition in DynamoDB cannot exceed 10 GB, and the maximum item size is 400 KB, which means a single partition can hold at least (10 GB / 400 KB) 25,000 items.
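The arithmetic behind that lower bound, using decimal units (the worst case, where every item sits at the 400 KB cap):

```python
PARTITION_MAX_BYTES = 10 * 1000**3   # 10 GB partition limit
ITEM_MAX_BYTES = 400 * 1000          # 400 KB item size limit

# Worst case: every item is at the 400 KB cap, so this is a floor,
# not a typical count. Smaller items mean far more per partition.
min_items_per_partition = PARTITION_MAX_BYTES // ITEM_MAX_BYTES
print(min_items_per_partition)  # 25000
```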

Initially DynamoDB does not create many partitions and will keep multiple item collections in a single partition. As the data grows and a partition exceeds 10 GB, DynamoDB splits the partition in two, moving half of the data to the new partition. This is a background process and does not impact latency.
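A partition split can be sketched as dividing the keyspace in half. The function and sample keys below are a simplification for illustration, not DynamoDB's actual split logic:

```python
def split_partition(items):
    """Toy version of a partition split: order the keys and move the
    upper half of the keyspace to a new partition."""
    ordered = sorted(items)
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]

# A partition holding six partition key values splits into two of three.
old = ["UID10", "UID25", "UID51", "UID77", "UID90", "UID93"]
left, right = split_partition(old)
print(left, right)
```

After a split, the request router's hash-to-partition mapping is updated server-side, which is why the application never has to know the split happened.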

Because DynamoDB keeps the request router on the server side and hides partition metadata from applications, partition splits happen seamlessly, with no downtime and no changes on the application side.

If all partition key values hold roughly equal numbers of items, splitting is easy: half of the partition key values move to the new partition. That is the happy case, but in reality a few partition key values may hold far more data than the rest. Consider a scenario where the user ID is the partition key.

In the example below, one user ID (UID51) is unbalanced, with many more items than the others.

Partitioning the data

In this situation, if partition 3 grows beyond 10 GB, DynamoDB will split it into more partitions, and it can use the sort key to divide the data further when a single partition key is causing the imbalance.

Partitioning data further for balance

Adaptive capacity: Handling hot partitions for read and write

With provisioned capacity, DynamoDB divides the table's RCUs and WCUs equally across partitions. For example, if a table is provisioned with 400 RCUs and has 4 partitions, each partition is assigned 100 RCUs.

If traffic stays evenly sustained across partitions, this works really well. In reality, some item collections see far more reads and writes than others, turning their partition into a hot partition.

Let's say it's partition 3 in our case. Partition 3 needs more RCUs and WCUs than the other partitions; otherwise any traffic above 100 RCUs/sec will start getting throttling exceptions, hurting performance for the entire application.

To solve this problem DynamoDB introduced adaptive capacity, which increases a partition's RCUs during its peak periods.

DynamoDB uses an adaptive capacity multiplier to increase a partition's RCUs.

If the adaptive capacity multiplier is 1.5, the increased allocation is 100 × 1.5 = 150 RCUs/sec.
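Putting the two calculations together as a sketch (the 400-RCU table, 4 partitions, and 1.5 multiplier are illustrative numbers, not values DynamoDB exposes):

```python
table_rcus = 400          # illustrative provisioned capacity for the table
partitions = 4

# Provisioned capacity is split evenly across partitions.
base = table_rcus // partitions      # 100 RCUs per partition

# Adaptive capacity boosts a hot partition by a multiplier...
multiplier = 1.5                     # illustrative, not a published value
boosted = base * multiplier          # 150.0 RCUs/sec for the hot partition

# ...but never beyond the table's total provisioned capacity.
boosted = min(boosted, table_rcus)
print(base, boosted)
```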

Adaptive capacity adjustment

Adaptive capacity adjustment is instant in DynamoDB and is enabled by default for every table and global secondary index. Adaptive capacity is not unlimited, though: it is capped by the table's total provisioned capacity, and beyond that the application will receive throttling exceptions.

Though I talked about RCUs in the examples, adaptive capacity applies to write operations as well.

Isolate frequently accessed items

If your application drives disproportionately high traffic to one or more items, adaptive capacity rebalances your partitions so that frequently accessed items don't reside on the same partition. This isolation reduces the likelihood of throttling caused by your workload exceeding the throughput quota of a single partition. You can also break an item collection into segments by sort key, as long as traffic to the collection isn't driven by a monotonically increasing or decreasing sort key.

If your application drives consistently high traffic to a single item, adaptive capacity might rebalance your data so that a partition contains only that single, frequently accessed item. In this case, DynamoDB can deliver throughput up to the partition maximum of 3,000 RCUs and 1,000 WCUs to that single item’s primary key. Adaptive capacity will not split item collections across multiple partitions of the table when there is a local secondary index on the table.

DynamoDB does the heavy lifting needed for consistent performance, and it has made many improvements over the years by learning from usage patterns.

It's a great database to evaluate when your data can be distributed evenly across partitions and does not need to be read in bulk.

We have created a series of DynamoDB blogs; do check them out to understand more DynamoDB concepts:

dynamodb-all-you-need-to-know-about-it

provisioned-vs-on-demand-capacity

how-different-indexes-are-here

demystifying-dynamodb-partition-keys

sort-key

unlocking-efficient-data-retrieval-in-dynamodb

pagination-in-dynamodb-efficient-data-reading

how-simple-and-complex-is-write-operation

References:

Adaptive capacity: https://aws.amazon.com/blogs/database/how-amazon-dynamodb-adaptive-capacity-accommodates-uneven-data-access-patterns-or-why-what-you-know-about-dynamodb-might-be-outdated/

Partitions:

https://www.cloudbees.com/blog/partitioning-behavior-of-dynamodb

https://stackoverflow.com/questions/40272600/is-there-a-dynamodb-max-partition-size-of-10gb-for-a-single-partition-key-value

https://shinesolutions.com/2016/06/27/a-deep-dive-into-dynamodb-partitions/


Deepti Mittal

I have been working as a software engineer in Bangalore for over a decade. I love to solve core technical problems and then blog about them.