Database Decision Flow For Data Access Patterns

Hülya Pamukçu Crowell
4 min readOct 11, 2023

Our previous article discussed a high-level approach for continuously optimizing systems based on data access patterns. We emphasized the importance of selecting a suitable data model and technology that responds well to the needs of both data producers and consumers. In this article, we will take a closer look at how to choose the right technology based on the data shape being produced and the query patterns of the consumers. We will provide a decision-making framework to help guide these choices.

Approach

While choosing the right technology and storage system depends on various factors, we are scoping the decision flow to data shape being written for write access patterns and query characteristics for read access patterns. The requirements, such as transactional needs, consistency, and cost, are not considered for this iteration.

As we look into different parts of the system writing and using the data, the best way is to start thinking about them separately and converge on a solution that will satisfy overall requirements with some trade-offs. If we initially focus on only read patterns and optimize for consumers, we might have degraded application performance for the writes, impacting end-user experience. If we only optimize the producer end and do not have the proper analytics representation, we might end up with slow, costly queries.

By following this approach, if we derive multiple database choices that satisfy the needs of both producers and consumers, we can then assess whether we need to make any compromises based on other decision factors and metrics. This could involve fully optimizing a subset of the paths, placing adapters for producers, and creating pipelines to transfer data for consumption. For instance, if the primary application uses a relational model, we may use an RDBMS for analytics.

Terminology

We are listing the database-type definitions used in the context of this document for clarity.

  • Relational: Data is stored in tables of rows and columns, and schema is enforced at write.
  • Key value: The key is a unique identifier for the data, and the value can be as simple as a scalar value or complex data structure.
  • Wide column: Allows data stored in related columns in a column family.
  • Columnar: Data stored in columns rather than rows to efficiently retrieve large amounts of rows over a subset of columns.
  • Document: Store unstructured data in JSON/XML format.

Database Decision Flow for Producer Access Patterns

The following diagram shows the decision flow based on the written data shape. Note that the current focus here is the needs of the data generator and what is best suited for the data model and user flow of the corresponding application without considering the downstream consumers, which might be the different parts of the same application. For example, a social app creates posts as single records associated with a user but later reads as a fanned entity while loading the news feed page of the same app.

Another interesting point is how data “accumulates” for an entity for these choices. In the relation model, a record has all the required fields at creation. It can be updated potentially infrequently and fill in the optional information—for example, employee information in a company database. In time series, we record the information for a specific time and do not change that again while adding more entries as new information arrives, for example, sensor data. When it comes to the wide column or multi-dimensional key-value stores, these accumulate data from different writes via other sources or times. For example, user static preferences are written by an application, while derived preferences are written after async processing by a separate service. They are “collected” in the same row by the user ID row key at different columns grouped by other column families to be retrieved together.

Database Decision Flow for Consumer Access Patterns

The following diagram shows the decision flow based on query patterns. Requirements and characteristics of the consumers are also considered in arriving at a decision. Similarly, the recommended option is to use a purpose-built store specifically designed for the intended use case unless other factors make it unsuitable. Not choosing a purpose-built store may require additional considerations to optimize its performance. For instance, if a wide column store is chosen instead of a time series database due to cost, a data model design where keys are time-prefixed, and columns represent different measures can help retrieval times. Also, how to store older data more efficiently and provide different time granularity, which are features available in a time series database, need to be added.

Recap

In this article, we dived deeper into an aspect of optimization for data access patterns and presented decision flows in choosing database technologies based on a subset of characteristics. We hope you found this article helpful in defining your strategy for optimum data access patterns and guiding the decision process.

Photo by author @qulia

--

--