Continuous Optimization for Data Access Patterns
While staying close to the user experience and the business
Delivering and maintaining optimal access to data requires upfront investment along with ongoing monitoring and iteration. This includes choosing a suitable data model and technologies and making trade-offs as requirements or access patterns change. An evaluation framework based on business and user experience metrics, together with a methodology to iterate with, can simplify and solidify this process. This article discusses a phased approach to achieving continuous optimization for different access patterns.
Approach
Our approach to crafting and maintaining optimal data access at scale has three core phases. First, we build a strong foundation focused on the user experience and the business goals, with clear end-to-end metrics. In the second phase, we design the data model and make technology choices. Finally, we monitor and iterate on these preliminary steps to maintain an optimal data access path.
Let’s also set a few guiding principles. To ensure long-term and consistent optimization with high ROI, our procedure should:
- Take into account the overall business value and measurable user and system metrics.
- Enable quick iterations while maintaining optimal delivery.
- Adapt to changes independently of the implementation choices made.
Phase 1: The Foundation
The foundation is the key to the success of the initial outcome and any iteration. It is built on a deep understanding of user needs and business requirements. The output of this phase is a set of end-to-end metrics that are technology- and implementation-agnostic, so different options can be compared.
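As an illustration, the sketch below shows one way such technology-agnostic, end-to-end targets could be captured so they can be compared across implementation options. The journey names, attributes, and threshold values are hypothetical assumptions, not prescriptions from this methodology.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class JourneyMetric:
    """A technology-agnostic, end-to-end target for one critical user journey."""
    journey: str            # e.g. "load order history page"
    p95_latency_ms: float   # end-to-end latency budget at the 95th percentile
    availability_pct: float # availability goal for the journey
    max_staleness_s: float  # how stale the data may be before the experience degrades

# Hypothetical targets; real values come from user research and business goals.
CRITICAL_JOURNEYS = [
    JourneyMetric("load order history", p95_latency_ms=300, availability_pct=99.9, max_staleness_s=60),
    JourneyMetric("place order", p95_latency_ms=500, availability_pct=99.95, max_staleness_s=0),
]
```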
Identify critical user journeys
User metrics are one of our key drivers. Yet, the optimization efforts will be bounded by other aspects like cost. Therefore, it is essential to identify the journeys we care about the most. This early prioritization helps scope and focus the work and provides the fundamental input for future trade-offs.
Once we identify a few critical paths, we continue analyzing deeper in preparation for the next phase. Some questions and points that might be considered are as follows:
- What are the characteristics of data and query needs?
- Is data ingested as individual items, point-in-time values, or in batches?
- Do frequent queries rely on other data? If so, denormalization can be considered.
- Are there multiple critical queries with different keys? Secondary indexes might be needed, so we should look for technologies that support them natively (see the sketch after this list).
- Is the load read-heavy, write-heavy, or both?
- Do we need to retrieve items in sorted order?
- Can items be colocated for faster queries?
- Do we need to plan for access outside the application, e.g., for analytics use cases?
- Do we need a data access layer (DAL)? This is an important decision: while a DAL can provide higher flexibility to replace the technology later, it also adds latency and maintenance costs and comes with scalability concerns. If our technology choice provides a flexible enough API and we do not forecast replacing the technology, skipping the DAL is a trade-off we can make. Alternatively, this upfront investment can enable us to replace parts of the system without affecting the clients.
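As a concrete illustration of the secondary-index question above, here is a minimal sketch using DynamoDB via boto3: the primary key serves one critical query (orders for a customer, sorted by date), and a global secondary index serves another (orders by status). The table, attribute, and index names are hypothetical assumptions for illustration only.

```python
import boto3  # assumes AWS credentials and a default region are configured

dynamodb = boto3.client("dynamodb")

# Hypothetical "orders" table: the primary key answers "orders for a customer",
# while a global secondary index answers "orders by status and date" natively.
dynamodb.create_table(
    TableName="orders",
    AttributeDefinitions=[
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "order_date", "AttributeType": "S"},
        {"AttributeName": "status", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "customer_id", "KeyType": "HASH"},
        {"AttributeName": "order_date", "KeyType": "RANGE"},  # sorted retrieval per customer
    ],
    GlobalSecondaryIndexes=[
        {
            "IndexName": "status-date-index",
            "KeySchema": [
                {"AttributeName": "status", "KeyType": "HASH"},
                {"AttributeName": "order_date", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    BillingMode="PAY_PER_REQUEST",
)
```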
For a practical example of why it is important to understand query patterns and optimize upfront to avoid costly inefficiencies, see the article on Shopify’s engineering blog: Reducing BigQuery Costs: How We Fixed A $1 Million Query.
Enumerate business requirements
While user metrics are a driving factor, there are usually business requirements that we need to include in our optimization efforts. This information must also be gathered early to avoid rework or other costly implications. Additional core metrics might be identified during this step. Some questions and points that can be considered are as follows:
- What are the consistency needs? Is eventual consistency sufficient, or is strong consistency required?
- Do we need to support transactions? (A minimal sketch illustrating consistency and transaction choices follows this list.)
- Think about the trade-offs of the optimization choices against other business drivers like cost; there might be a few use cases where a full scan is needed, which can be acceptable if they are infrequent and do not impact users directly.
- Consider complexity and TCO (total cost of ownership). For example, building a system from scratch to meet the requirements might look attractive for the higher flexibility it offers. However, it could be a costlier choice in the long term than a managed/cloud solution.
- Consider compliance requirements on what can be in the model and what can be stored; for example, is PII allowed?
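As a minimal sketch of the consistency and transaction questions above, the snippet below again assumes DynamoDB and boto3 with hypothetical table and key names; other stores expose analogous knobs, and this is illustrative rather than a recommended implementation.

```python
import boto3  # assumes AWS credentials and a default region are configured

dynamodb = boto3.resource("dynamodb")
orders = dynamodb.Table("orders")

# Strongly consistent read: pay the extra capacity and latency only where the journey needs it.
item = orders.get_item(
    Key={"customer_id": "c-123", "order_date": "2024-05-01"},
    ConsistentRead=True,
)

# Transaction: write the order and decrement inventory atomically, or not at all.
# The "inventory" table with a single "sku" key is a hypothetical example.
client = boto3.client("dynamodb")
client.transact_write_items(
    TransactItems=[
        {"Put": {"TableName": "orders",
                 "Item": {"customer_id": {"S": "c-123"}, "order_date": {"S": "2024-05-01"}}}},
        {"Update": {"TableName": "inventory",
                    "Key": {"sku": {"S": "sku-1"}},
                    "UpdateExpression": "SET stock = stock - :one",
                    "ConditionExpression": "stock >= :one",
                    "ExpressionAttributeValues": {":one": {"N": "1"}}}},
    ]
)
```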
Phase 2: The Data Model and Technology
The foundation built above provides the understanding of the requirements needed to define the data model and decide on technologies. As this phase proceeds, we may need to refine the analysis from the previous phase. While we cannot reach a comprehensive list right away, the iterations in the remaining phases will help us continue building on top of what we started with. By the end of this phase, we will have a data model and technology choices that we can use to collect metrics and evaluate our assumptions.
Define the data model
We can define the data model entity and relations based on the critical user journeys we identified in the previous phase. A few areas to pay attention to are as follows:
- Start with the entities necessary for the critical paths: identify the key attributes for each entity required at creation time; the rest are part of the “property bag” or imply a relation to other entities (see the sketch after this list).
- Focus on any entity relation that needs to be known at creation time, e.g., one-to-one or one-to-many relations. It is essential to get these right in the early iterations, as it will be costly to introduce them later as constraints on existing items. On the other hand, relations like many-to-many can easily be added at any point.
- Think about the queries identified in the previous phase. Are all queries answerable through these relations? For example, is there a connection between entities to support a master-detail pattern that the UI might need?
- When defining the data model, consider both data producers and consumers. If we only look at the write path, we might end up with a model that is slow for queries, or vice versa.
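A hedged sketch of how these points could translate into entities is below. The entities, attributes, and the choice of a flexible “property bag” are illustrative assumptions rather than a prescribed model.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Customer:
    customer_id: str  # key attribute, required at creation time
    properties: dict[str, Any] = field(default_factory=dict)  # "property bag" for everything else

@dataclass
class Order:
    order_id: str
    customer_id: str  # one-to-many relation fixed at creation time
    order_date: str
    properties: dict[str, Any] = field(default_factory=dict)

# A many-to-many relation (e.g. orders <-> promotions) can live in a separate
# association entity and be introduced later without touching existing items.
@dataclass
class OrderPromotion:
    order_id: str
    promotion_id: str
```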
Choose the technology
Our deep dive into access patterns and business understanding will help us choose the technology. The choice here is based on what we know and might not be the final one. A few specific considerations on making a selection or revisiting the previous choices are as follows:
- It should be a purpose-built store best suited for the use case (KeyValue, Timeseries, Graph, Relational, Wide Column, Ledger, etc.).
- Prefer technologies that make it easy to evolve the schema and add support for new access patterns.
- It should provide comprehensive metrics for monitoring as well as troubleshooting.
- It should meet other business requirements like compliance and security.
- Take advantage of partitioning and sorting features.
- Consider caching strategies and implement caching carefully, as we need to think about data freshness. For example, if access does not follow a consistent pattern and we have frequent cache misses, the extra hop might hurt the end-to-end metrics rather than improve them (see the sketch after this list).
- Opt for simpler architecture and fewer components for easy management and faster evolution.
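To make the caching trade-off above concrete, here is a minimal read-through cache sketch with an explicit freshness window. The TTL value and the placeholder loader are assumptions standing in for whatever the chosen store and access pattern require.

```python
import time
from typing import Any, Callable

class ReadThroughCache:
    """Minimal read-through cache with a freshness (TTL) bound per entry."""

    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float):
        self._loader = loader    # fetches from the underlying store on a miss
        self._ttl = ttl_seconds  # how stale an entry may be before we reload
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        cached = self._entries.get(key)
        if cached is not None and time.monotonic() - cached[0] < self._ttl:
            return cached[1]           # fresh hit: no extra hop to the store
        value = self._loader(key)      # miss or stale entry: pay the extra hop once
        self._entries[key] = (time.monotonic(), value)
        return value

# Usage with a placeholder loader; frequent misses would add latency instead of removing it.
cache = ReadThroughCache(loader=lambda key: {"id": key}, ttl_seconds=30)
order = cache.get("order-1")
```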
Phase 3: Monitor, Tune, and Iterate
In this phase, we evaluate the alternatives, tune them as needed, and continuously monitor and iterate. The value of the foundational step becomes more apparent here. We have technology-agnostic core metrics tied to the user experience and the business goals that we can rely on. This enables us to assess options confidently while leveraging the technology-specific metrics each store provides to refine and make subsequent decisions.
In the validation phase:
- Generate realistic access patterns and data, and simulate the actual use cases as closely as possible while keeping your dev, staging, and prod environments isolated.
- Test end-to-end with each integration point in mind, creating both sustained and spiky loads.
- Monitor not only technology metrics but also experience and business metrics during pre-prod and prod.
- Add safeguards in your development cycle to detect early and prevent regressions.
- Use percentiles for insights while noting outliers as future problem areas, e.g., to detect and mitigate gray failures (see the snippet after this list).
- For issues, look for hot partitions and keys and throttled requests.
- Identify what was missed in the first two phases, apply improvements, and iterate.
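As a small illustration of the percentile point above, the snippet below computes p50/p95/p99 from a list of measured end-to-end latencies using only the Python standard library; the sample data is made up.

```python
import statistics

# Hypothetical end-to-end latencies (ms) collected during a load test.
latencies_ms = [120, 130, 125, 140, 135, 128, 900, 132, 127, 138, 131, 126]

cuts = statistics.quantiles(latencies_ms, n=100)  # 99 cut points: cuts[k-1] ~ p(k)
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
print(f"p50={p50:.0f}ms p95={p95:.0f}ms p99={p99:.0f}ms")

# Outliers like the 900 ms sample barely move p50 but show up in the tail,
# hinting at gray failures (e.g. a hot partition or key) worth investigating.
```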
Recap
This article provided a methodology for achieving optimal outcomes for data access patterns on an ongoing basis. We hope it helps with your data access optimization journey and with refining your strategy.