How to choose the right NoSQL database service in Azure — Azure CosmosDB or Azure Table Storage?

In this article, I am going to guide you through an example use-case on how you would go about choosing the right Azure cloud service to host a NoSQL database for a fictitious fashion retailer, FashionRuS.com

We will consider two options:

  1. Azure Table Storage
  2. Azure CosmosDB

To make this comparison more meaningful, let me start with a completely fictitious use case. In no way this use-case should resemble reality. At least that is my hope.

My fictitious customer is FashionRus.com

Fashionrus before hiring us, hired ExpensiveX consultancy to create an MVP fashion eCommerce website. ExpensiveX advised with some slick powerpoints said that instead of just getting a database, they can just host all the products from Fashionrus in version control using Git, in JSON format. After all, there are only a few megabytes of data, or so they thought.

Fashionrus with the help of ExpensiveX consultancy implemented the file-based solution to host their products API. The ProductsService API delivers product data to their Ecomerce site. The source of product data is from SAP, and the data is delivered as JSON files.

This file-based solution worked well until FashionRus delivered the whole dataset of products(1000) and since they sell in 30 countries, the actual number of products delivered is 30,000. It soon became obvious that there are too many products to serve from memory.

In the meantime, Fashionrus almost ran out of budget in the project and they no longer can afford for ExpensiveX to fix the problem, so they hired us.

And here we are. We have identified that it is not feasible to use the file system to hold so much data, and we explain to the client that they should go with a NoSQL database. 

The reason for moving to a database would deliver the following benefits:

  • near-real-time synchronization of data across all replicas of the Product Service
  • Reduce memory consumption
  • Improve performance

The ProductService already accesses all the data in the JSON format, therefore we should pick a NoSQL database.

Because FashionRus uses Microsoft Azure to host their E-commerce website, we have shortlisted the following options:

  1. Azure Table Storage
  2. Azure Cosmos DB

Let’s explore these options.

Azure Table Storage

Azure Table Storage is a NoSQL database provided by Azure that can store petabytes of structured or semi-structured data. 

Using a single Azure storage account, it’s possible to create as many tables and as many entities as needed. In terms of throughput, Azure Table Storage is able to handle up to 20,000 operations/second per account, or 2,000 operations/second per table. This is more than enough for most use-cases. 

Depending on the Azure Table Storage configuration, data can be replicated within a region, with the option of a secondary read-only region. 

For instance, if we choose to use Azure Table Storage with a zone redundant setup, we can make the data available within all the availability zones in a single region:

It is important to know that Azure Table Storage doesn’t provide a managed backup functionality. Yes, it allows you to replicate the data across data centers with regional storage, making it less likely to have data loss in a disaster scenario. But that doesn’t protect you against an application that corrupts your data or worse a malicious deletion of all the data due to some kind of intrusion. 

Azure Table and Entity Design

When using Azure Table Storage, special consideration should be given in designing the tables and entity structure, with a particular focus on the PartitionKey and the RowKey

It is important to think ahead, on what will be the most likely data access queries that need to be supported, what data should be lumped together, and based on that, design the table and entity structure.

An Example Use Case for Azure Table Storage

Going back to our use case, our Fashion retail company 
A Fashion retail company needs to store product data in Azure table storage for several countries. This data will be accessed by an API that delivers product information by product id and by country name.

A simple design is to create a Table Storage Account with the following structure:

And the Products Entity should look similar to this:

The advantage of this design is that we can keep all product data related to a country clustered together, and this makes it more efficient for doing product lookups within the same country, without having to search on the whole data set.
Also, this design is optimized for searching on a product id as this is the row key and is indexed.

Unfortunately, here lies the biggest limitation of Azure Table Storage. If we need to search on any property that is not a partition key or a row key, then the lookup will be very inefficient because Azure Table Storage needs to perform a full table scan on those properties. 
This is why it is really important to know upfront how we are going to access the data before we decide on a data structure.

Azure Cosmos DB

Azure Cosmos DB is a globally distributed, NoSQL, multi-model database service. A Cosmos database can be globally distributed from 1 to more than 30 regions.

Azure Cosmos DB — Its schema-free

Unlike Table Storage, there is no need to create a schema and no need to decide on which fields to index. In Cosmos DB every single field in a database is automatically indexed. Nice!

Azure Cosmos DB — Support for Multiple APIs

Another great feature of Azure Cosmos DB is that it supports multiple flavors of database models and multiple APIs. 

APIs available to use with Cosmos DB

As the Core option for CosmosDB, data is stored as a JSON Document but can be queried with SQL-like syntax. 

If you prefer MongoDB then there is the option of storing the data as Documents but instead, query it using MongoDB API. This also means that you can connect to CosmosDB using a MongoDB client. This is great for situations where you might want to allow developers to develop locally without really letting them connect to Azure.

CosmosDB also supports storing data using Columns. A column-family database organizes data into rows and groups of columns, known as column-families. These are very similar to relational databases but the big power is the ability to store sparse data in a denormalized way.

A column-family database organizes data into rows and columns. In its simplest form, a column-family database can appear very similar to a relational database, at least conceptually. The real power of a column-family database lies in its denormalized approach to structuring sparse data.

CosmosDB also supports Cassandra API, so you can store all the data in CosmosDB but still be able to use all the Cassandra open source tooling. 

Finally, CosmosDB also supports storing data as key-value pairs, while still supporting the Azure Table Storage API.

It is important to know, that you can only pick the database model and API at the creation of the CosmosDB database, and it can’t be changed after creation. Therefore pick your model carefully.

Why CosmosDB for Azure Table Storage?

And why would you want to use ComosDB to store key-value pairs, rather than just using Azure Table Storage, which is much cheaper?

There are several reasons for this. You might have an existing application running on Azure Table Storage, but you have found that the latency is not within the limits required. CosmosDB gives you guarantees in latency(<15ms), while with Azure Storage there are no upper bounds. Yes, Azure Table Storage is already pretty fast!

Or, you might want to globally distribute your database across multiple Azure regions, something that you can easily do with CosmosDB. 

Also, you may have new requirements to implement more complex search queries on multiple fields in your existing data. Azure Table Storage only allows doing fast queries on the partition key and the row key. Any extra fields require full table scans. With CosmosDB all the fields are indexed by default, and there is no need to define any field, other than the partition key.

Comparison between Azure Table Storage and Cosmos DB Pricing

It is a wonderful thing to have a globally distributed database, over the whole world and sometime in the future, the moon, latency less than 15ms and as much throughput as can be imagined, and no need to worry about which fields to index. Also, the fact we can use ComosDB with five different database APIs options is fantastic.
It is all wonderful until you have to pay for it, and with CosmosDB one of the big concerns, everyone has are costs. Are the concerns warranted?

Let’s do a quick comparison of costs between Azure Table Storage and Azure Cosmos DB. 

It is important to understand that both database services are serverless, so what we are going to pay for is resources utilization like storage, number of operations, data transfer, etc. 

Basic Price comparison between Azure Table Storage and Azure CosmosDB

The way costs are calculated for Azure Table Storage is different from Azure CosmosDB. 
In Azure Table Storage we pay for storage and number of operations for each type e.g. (Write Operations, Read Operations, Scan Operations, etc).

On the other hand, Azure CosmosDB also charges us for storage, but at a higher rate, as we are paying for premium SSD storage. But the way to calculate charges for usage is completely different. 

Instead of paying for the number of operations used, i.e. Pay as You Go, in CosmosDB we are charged for throughput allocated. If we use cars as an analogy, with CosmosDB you are charged for the size of your engine(e.g. 1950cc ), regardless of whether you use your car or not. There is no obligation of using the full capability of the engine, but if you try to go any more than that, you can’t, without paying for a bigger engine.

How does Azure CosmosDB size throughput?

This begs the question, how does then CosmosDB measure throughput? 

This is done by using the concept of Request Units. Requests units are an abstract measurement of resources consumed when doing a database query.
For a small query, the number of request units will be small, for a complex query, the number of Request Units will be higher. The type of operation will also affect the amount of RUs used. And the more results returned in a query, the more request units are used.
For all intents and purposes, request units are like a cryptocurrency that you spend when doing a query. 
Returning to the concept of throughput, in Azure CosmosDB throughput is allocated as a capacity measured in RUs per second. As a minimum, a CosmosDB database has to be provisioned with a throughput of 400 RUs per second. 

Using the CosmosDB capacity calculator we can estimate the minimum cost for an Azure CosmosDB database with a single region, with the least capacity possible.

https://cosmos.azure.com/capacitycalculator/

We get the following estimate:

The costs escalate quickly, once we require more database performance and global distribution.


Posted

in

,

by