DynamoDB introduction

DynamoDB: All you need to know about it

Deepti Mittal
8 min readMay 13, 2023

DynamoDB is a key-value database offering from AWS. It is fully hosted and managed by AWS, which means as a user you need to take care of only three things while using DynamoDB:

  • Data modelling/schema creation
  • Writing the data
  • Reading the data

DynamoDB offers low, predictable latencies at any scale. Customers can typically retrieve average-sized items in single-digit milliseconds. It stores data on Solid State Drives (SSDs) and replicates it synchronously across multiple AWS Availability Zones in an AWS Region to provide high availability and data durability. DynamoDB is so easy to set up that you can be up and running in a few minutes.

In this blog I intend to cover the basics one should be aware of when considering DynamoDB.

Support for large scale data

DynamoDB guarantees predictable latencies even when data grows from 1 GB to 10 TB. The main reason is that DynamoDB stores data in partitions, which lets it work with a small portion of the database instead of the entire dataset for any CRUD operation.

When creating a table in DynamoDB it is important to define two keys:

  • Partition key: This decides which partition the current item is written to and read from. It is important to choose a partition key with high cardinality so that items are spread evenly across partitions and latency stays consistent. The diagram below shows how partitioning works in DynamoDB.
DynamoDB data partitioning
  • Sort key: This is optional if the partition key alone can serve as the table's primary key. As the name suggests, it keeps items in sorted order within a given partition, which enables faster access. It is recommended to define a sort key if there are access patterns where only a range of items needs to be read from a partition.

To achieve good performance with DynamoDB it is important to know your access patterns and choose the partition key and sort key to support them.
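As a sketch, here is what a table definition with a composite key might look like in DynamoDB's low-level JSON format. The table name, attribute names, and billing mode are illustrative placeholders, not details from this post:

```python
# Hypothetical table "Orders", keyed by customer_id (partition key)
# and order_date (sort key). All names here are placeholders.
create_table_request = {
    "TableName": "Orders",
    "AttributeDefinitions": [
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "order_date", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "customer_id", "KeyType": "HASH"},   # partition key
        {"AttributeName": "order_date", "KeyType": "RANGE"},   # sort key
    ],
    "BillingMode": "PAY_PER_REQUEST",
}
# With boto3 this could be passed as
# boto3.client("dynamodb").create_table(**create_table_request)
```

With this schema, all orders for one customer land in the same partition, sorted by date, which matches the "read a range within one partition" access pattern described above.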

Indexes

The concept of indexes in DynamoDB is very different from any other database I have explored so far. An index in DynamoDB maintains its own copy of data from the main table to support a different access pattern.

DynamoDB has two kinds of indexes:

  • Local secondary index (LSI): An index which must be created during table creation and cannot be added, modified, or deleted later. It is called local because it uses the same partition key as the main table, with a different sort key.
  • Global secondary index (GSI): A GSI can be defined during table creation and can also be added, modified, or deleted at any point in time. Like an LSI, it maintains its own copy of the data. It can have a different partition key than the main table, hence the name global.
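Since a GSI can be added after table creation, it is defined through a table update. A minimal sketch of such a request, assuming a hypothetical Orders table and a product_id attribute (all names are illustrative):

```python
# Hypothetical GSI on an existing "Orders" table, keyed by product_id
# so items can be looked up by product instead of by customer.
gsi_update_request = {
    "TableName": "Orders",
    "AttributeDefinitions": [
        {"AttributeName": "product_id", "AttributeType": "S"},
    ],
    "GlobalSecondaryIndexUpdates": [
        {
            "Create": {
                "IndexName": "product-index",
                "KeySchema": [
                    {"AttributeName": "product_id", "KeyType": "HASH"},
                ],
                # The index keeps its own copy of the data; projecting ALL
                # copies every attribute of the item into the index.
                "Projection": {"ProjectionType": "ALL"},
            }
        }
    ],
}
# With boto3: boto3.client("dynamodb").update_table(**gsi_update_request)
```

Note the projection choice: copying every attribute (ALL) makes reads from the index self-sufficient but doubles the write cost of projected attributes, which matters later when we discuss WCUs.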

More details about indexes can be found in my another blog at : https://medium.com/@deeptimittalblogger/dynamodb-how-different-indexes-are-here-9b394c1ba4f8

Working with data

DynamoDB supports both low-level APIs and PartiQL, a SQL-compatible query language, to work with data.

Write item: DynamoDB supports two kinds of write APIs, single item and batch. There are three ways to use them:

  • REST API: The PutItem API writes a single item and BatchWriteItem writes multiple items at the same time.
  • AWS SDK: DynamoDB has SDK support for almost every popular programming language.
  • AWS CLI: Commands like aws dynamodb put-item insert data from the CLI.

Latency is the same with any of these approaches, and each takes JSON as the request body.

Here is a sample request body for PutItem; the batch variant accepts an array of such items.
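A minimal sketch of such a body, with a hypothetical table and attribute names (not from the original post):

```python
# Hypothetical PutItem request body. DynamoDB's low-level JSON format
# wraps every value in a type descriptor ("S" = string, "N" = number).
put_item_request = {
    "TableName": "Orders",
    "Item": {
        "customer_id": {"S": "cust-123"},   # partition key
        "order_date": {"S": "2023-05-13"},  # sort key
        "total": {"N": "49.99"},            # numbers are sent as strings
    },
}
# From the CLI this same JSON could be supplied as:
#   aws dynamodb put-item --cli-input-json file://put-item.json
```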

In recent years DynamoDB also added support for transactions while writing data.

TransactWriteItems is a synchronous write operation that groups up to 100 action requests. It can work across tables but not across accounts or Regions. One word of caution: since DynamoDB advocates single-table design, it is better to keep transactional writes within a single table where possible.
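To make the shape of such a transaction concrete, here is a sketch that atomically writes an order and decrements a stock counter. The table, keys, and attributes are hypothetical placeholders:

```python
# Hypothetical TransactWriteItems request: either both actions succeed
# or neither does. All names below are illustrative.
transact_request = {
    "TransactItems": [
        {
            "Put": {
                "TableName": "Orders",
                "Item": {
                    "customer_id": {"S": "cust-123"},
                    "order_date": {"S": "2023-05-13"},
                },
            }
        },
        {
            "Update": {
                "TableName": "Orders",
                "Key": {
                    "customer_id": {"S": "stock#widget"},
                    "order_date": {"S": "COUNTER"},
                },
                "UpdateExpression": "SET stock = stock - :one",
                "ExpressionAttributeValues": {":one": {"N": "1"}},
            }
        },
    ]
}
# With boto3: boto3.client("dynamodb").transact_write_items(**transact_request)
```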

Read item:

  • ExecuteStatement retrieves one or multiple items from a table, using PartiQL.
  • BatchExecuteStatement retrieves multiple items from different tables in a single operation, using PartiQL.
  • GetItem retrieves a single item from a table. This is the most efficient way to read a single item because it provides direct access to the physical location of the item.
  • DynamoDB also provides the BatchGetItem operation, allowing you to perform up to 100 GetItem calls in a single operation.
  • Query retrieves all of the items that have a specific partition key. Within those items, you can apply a condition to the sort key and retrieve only a subset of the data. Query provides quick, efficient access to the partitions where the data is stored.
  • Scan retrieves all of the items in the specified table. This operation should not be used with large tables because it can consume large amounts of system resources.

All of these can be used through the REST API, AWS SDK, and AWS CLI.

Provisioned and on demand tables

There are two ways to set up DynamoDB tables.

Provisioned: To define a table as provisioned, one must specify its provisioned throughput capacity: the amount of read and write activity the table can support. DynamoDB uses this information to reserve sufficient system resources to meet your throughput requirements. A provisioned table is useful if the application load is already known and is nearly constant, apart from short peak periods that can be handled through auto scaling. Keep in mind that you are charged for provisioned capacity even when you are not using it.

On demand: If someone is just starting with DynamoDB or does not know the expected load, on demand is the preferred way to set up tables: AWS takes care of scaling the table with the load, and you pay only for the WCUs and RCUs actually consumed.
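The difference between the two modes shows up as a single setting on the table definition. A sketch contrasting them (capacity numbers are placeholders):

```python
# Hypothetical table settings for the two billing modes.
provisioned = {
    "BillingMode": "PROVISIONED",
    # Fixed capacity is reserved; you pay for it whether used or not.
    "ProvisionedThroughput": {
        "ReadCapacityUnits": 10,
        "WriteCapacityUnits": 10,
    },
}
on_demand = {
    # No capacity to declare; you pay per request actually served.
    "BillingMode": "PAY_PER_REQUEST",
}
```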

Complexity with cost

DynamoDB’s cost calculation is also very different from other AWS services. Three factors influence cost:

Storage cost: The cost associated with storing data. This is a similar concept to other databases, but the rate varies by Region. For details, check the AWS pricing calculator.

Write capacity unit (WCU): A write capacity unit represents one write per second for an item up to 1 KB in size. For example, a table with 10 write capacity units can perform 10 writes per second for items up to 1 KB.

Item sizes for writes are rounded up to the next 1 KB multiple.

If a table has indexes defined, each write consumes additional WCUs per index, so having more indexes has a direct impact on cost.

Read capacity unit (RCU): A read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size.

For example, suppose that you create a table with 10 provisioned read capacity units. This allows you to perform 10 strongly consistent reads per second, or 20 eventually consistent reads per second, for items up to 4 KB.

As another example, a 10 KB item consumes 3 RCUs for a strongly consistent read and 2 RCUs for an eventually consistent read.
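The rounding rules above can be sketched as two small helper functions (my own illustration, not AWS code):

```python
import math

# Capacity-unit arithmetic as described above: writes are billed in
# 1 KB units, reads in 4 KB units, and an eventually consistent read
# costs half of a strongly consistent one, rounded up.
def wcu_for_item(size_kb: float) -> int:
    return math.ceil(size_kb / 1)

def rcu_for_item(size_kb: float, strongly_consistent: bool = True) -> int:
    units = math.ceil(size_kb / 4)
    return units if strongly_consistent else math.ceil(units / 2)

print(rcu_for_item(10, strongly_consistent=True))   # 3 RCUs for a 10 KB item
print(rcu_for_item(10, strongly_consistent=False))  # 2 RCUs
print(wcu_for_item(1.5))                            # 2 WCUs (rounded up to 2 KB)
```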

While reading data, AWS charges for the average RCUs consumed per second.

DynamoDB exposes metrics on the average WCUs and RCUs consumed over time, which helps in understanding load and deciding capacity for a provisioned table.

Limitations

Although DynamoDB is a very capable database, it has certain limitations, which in turn help it provide consistent performance even if load increases 100-fold.

  • A single item cannot exceed 400 KB.
  • Scan and Query cannot return more than 1 MB of data per page. So if 5 MB of data needs to be retrieved, it will be fetched in 5 pages, and each page request is sequential, which impacts the overall performance of the application.
  • A single partition for the main table and its LSIs cannot exceed 10 GB; if it grows beyond that, DynamoDB will try to create another partition using the sort key to split the data further.
  • One table can have up to 5 LSIs, and this is a hard limit.
  • One table can have up to 20 GSIs; this is a soft limit that can be raised by asking AWS support.
  • TransactWriteItems and TransactGetItems are limited to 100 items per request.
  • BatchWriteItem is limited to 25 items per request.
  • BatchGetItem is limited to 100 items per request.
  • A single partition has a limit of 3000 RCUs and 1000 WCUs. In the case of a hot partition, the partition will use adaptive capacity for some period of time before raising a throttling exception.

Usually these limits push you toward better data models that make full use of the capabilities DynamoDB provides.
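The 1 MB page limit means reads of larger result sets must loop, feeding each response's LastEvaluatedKey back in as the next request's ExclusiveStartKey. A minimal sketch of that loop, using a stand-in function instead of a real boto3 client so the shape is clear:

```python
# Pagination loop over DynamoDB's 1 MB pages. `scan_page` stands in for
# a real call such as boto3.client("dynamodb").scan(...); each response
# may carry a LastEvaluatedKey that becomes the next ExclusiveStartKey.
def scan_all(scan_page):
    items, start_key = [], None
    while True:
        kwargs = {"ExclusiveStartKey": start_key} if start_key else {}
        page = scan_page(**kwargs)
        items.extend(page["Items"])
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:  # no more pages
            return items

# Fake two-page response to demonstrate the loop shape.
pages = {
    None: {"Items": [1, 2], "LastEvaluatedKey": "k1"},
    "k1": {"Items": [3]},
}
def fake_scan(ExclusiveStartKey=None):
    return pages[ExclusiveStartKey]

print(scan_all(fake_scan))  # [1, 2, 3]
```

Since every page request waits on the previous one, retrieving many pages this way adds latency, which is exactly why large sequential reads hurt application performance.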

If you want more details about limitations, this blog is very good: https://dynobase.dev/dynamodb-limits/

When to consider DynamoDB

Having learned about DynamoDB and its limitations in detail, my opinion is to use DynamoDB in the following scenarios:

  • When data volume will keep increasing, to take advantage of consistent performance.
  • When data access patterns are already known, to come up with correct data models for the main table, LSIs, and GSIs.
  • When data can be divided into small partitions using high-cardinality keys.
  • When only small amounts of data need to be retrieved with every read request.
  • When data can be modelled into a single table, as DynamoDB does not offer join capabilities across tables.

Some of the industries where DynamoDB is popular:

  • Gaming industry: Requires data to be accessed in sorted order and huge amounts of data to be processed.
  • Entertainment industry: Stores huge amounts of data for subscribers, tracking their likes and clicks to perform analytics later.
  • Transportation industry: Huge volumes of data are generated to track drivers and customers in real time. Using DynamoDB, apps are able to provide near-real-time updates on location changes.
  • Retail industry: An industry where even a few seconds of downtime or slowness may result in millions of dollars in lost revenue uses DynamoDB to store customer information, order details, product catalogues, and much more.

I want to conclude the blog with one last thing: we often hear that DynamoDB is very costly for storing and maintaining data, but the value and speed it brings, plus the time and cost saved by not needing a team to run and maintain the database, definitely justify the cost of this offering.

This blog has introduced some key DynamoDB concepts, and I will be covering each of them in detail in my following blogs. If you would like me to cover any DynamoDB concept in detail, do post in the comments and I will share the knowledge I have.

Also let us know in the comments if you have used DynamoDB for any use case or are considering using it.

Check out the other blogs in this series to understand these concepts in detail:

  • data-partitioning-is-the-secret-sauce
  • provisioned-vs-on-demand-capacity
  • how-different-indexes-are-here
  • demystifying-dynamodb-partition-keys
  • sort-key
  • unlocking-efficient-data-retrieval-in-dynamodb
  • pagination-in-dynamodb-efficient-data-reading
  • how-simple-and-complex-is-write-operation


Deepti Mittal

I have been working as a software engineer in Bangalore for over a decade. I love to solve core technical problems and then blog about them.