Modeling Data Mesh Catalog

Hülya Pamukçu Crowell
Jan 30, 2022

Our previous article outlined the key elements of a meta-architecture for Data Mesh. We described the marketplace as an essential layer in a distributed-ownership ecosystem, the layer that delivers the mesh experience. It is a hub for discovering and exchanging secure, compliant data. At the core of the marketplace is the catalog of data products, their metadata, and the actions that users can take, subject to constraints and policies. This article presents a Data Mesh catalog implementation that uses an Object-Relational Mapping (ORM) framework to model the catalog's entities, relations, and rules.

To reflect the Data Mesh principles in our catalog, let us first outline a few requirements and considerations. Keep in mind that this is only the bare minimum for building a full-fledged catalog tailored to enterprise needs.

The catalog and its APIs should allow users to:

  • create domains containing data products with output datasets
  • define input relations from datasets to data products
  • create roles in the domains and assign them read/write access to datasets
  • create tags and attach them to data products for searching

The catalog should also enforce the following rules:

  • A role should not have write access to a dataset in a different domain.
  • A search for data products should return only those with a quality score higher than a certain threshold.

We will use entgo.io, an open-source entity framework for Go that supports CRUD operations and lets us attach rules and constraints to mutation and query operations.

Modeling entities and relations

  • DataProduct: Belongs to a domain, consumes and produces datasets, and is addressable by a unique, fully qualified name.
  • Dataset: Produced by a single data product; can be read from and written to by the roles attached to it.
  • Role: Belongs to a domain.
  • Tag: Attached to data products to make them discoverable.
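To make the relations above concrete, here is a framework-free sketch in plain Go. It is not the Entgo schema used for the catalog; the struct layout, the `QualifiedName` format (`<domain>/<product>`), and the `AddOutput` helper are assumptions made for illustration only.

```go
package main

import "fmt"

// Domain owns data products and roles.
type Domain struct {
	Name string
}

// Dataset is produced by exactly one data product.
type Dataset struct {
	Name     string
	Producer *DataProduct
}

// DataProduct belongs to a domain, consumes input datasets,
// produces output datasets, and can carry tags for discovery.
type DataProduct struct {
	Name    string
	Domain  *Domain
	Inputs  []*Dataset
	Outputs []*Dataset
	Tags    []string
}

// Role belongs to a domain and holds read/write grants on datasets.
type Role struct {
	Name   string
	Domain *Domain
	Reads  []*Dataset
	Writes []*Dataset
}

// QualifiedName builds a unique address for the product.
// The "<domain>/<product>" format is a hypothetical choice.
func (p *DataProduct) QualifiedName() string {
	return p.Domain.Name + "/" + p.Name
}

// AddOutput registers d as an output of p and enforces the
// "produced by a single data product" relation.
func (p *DataProduct) AddOutput(d *Dataset) error {
	if d.Producer == p {
		return nil // already registered; repeat calls are no-ops
	}
	if d.Producer != nil {
		return fmt.Errorf("dataset %q is already produced by %q",
			d.Name, d.Producer.QualifiedName())
	}
	d.Producer = p
	p.Outputs = append(p.Outputs, d)
	return nil
}

func main() {
	sales := &Domain{Name: "sales"}
	orders := &DataProduct{Name: "orders", Domain: sales}
	raw := &Dataset{Name: "orders_raw"}
	if err := orders.AddOutput(raw); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(orders.QualifiedName(), "produces", raw.Name)
}
```

In an entity framework these structs would become schema definitions with edges, and the single-producer check would typically be expressed as a unique edge rather than hand-written code.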

Catalog API

Entgo provides a powerful API to interact with entities. We only need to add a thin mutation and query layer to make the calls idempotent and define the basic operations.
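The idea of an idempotent layer can be sketched without the framework. The example below uses a hypothetical in-memory `Catalog` type as a stand-in for the Entgo-backed store; `EnsureDomain`, `AttachTag`, and `Search` are illustrative names, not part of Entgo or of the catalog code.

```go
package main

import (
	"fmt"
	"sort"
)

// Catalog is a hypothetical in-memory stand-in for the real store.
type Catalog struct {
	domains map[string]bool
	tags    map[string]map[string]bool // product name -> set of tags
}

// NewCatalog returns an empty catalog.
func NewCatalog() *Catalog {
	return &Catalog{
		domains: make(map[string]bool),
		tags:    make(map[string]map[string]bool),
	}
}

// EnsureDomain creates the domain only if it does not exist yet.
// It reports whether a new domain was created.
func (c *Catalog) EnsureDomain(name string) bool {
	if c.domains[name] {
		return false
	}
	c.domains[name] = true
	return true
}

// AttachTag attaches tag to product; repeating the call changes nothing.
// It reports whether the tag was newly attached.
func (c *Catalog) AttachTag(product, tag string) bool {
	if c.tags[product] == nil {
		c.tags[product] = make(map[string]bool)
	}
	if c.tags[product][tag] {
		return false
	}
	c.tags[product][tag] = true
	return true
}

// Search returns the names of products carrying tag, sorted for stable output.
func (c *Catalog) Search(tag string) []string {
	var out []string
	for product, tags := range c.tags {
		if tags[tag] {
			out = append(out, product)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	c := NewCatalog()
	fmt.Println(c.EnsureDomain("sales")) // first call creates the domain
	fmt.Println(c.EnsureDomain("sales")) // second call is a no-op
	c.AttachTag("sales/orders", "pii")
	c.AttachTag("sales/orders", "pii")
	fmt.Println(c.Search("pii"))
}
```

With a database-backed store, the same "look up, then create if missing" step would need to run inside a transaction or rely on a unique constraint to stay safe under concurrent callers.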

A Sample Mesh

Using these APIs, we create a sample mesh. The following diagram was generated by pulling entities and their relationships and converting them to Graphviz DOT notation.
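The conversion step can be approximated as follows. This is a minimal sketch, not the code used to produce the diagram: the `Edge` type and `ToDot` function are hypothetical, and the quoting relies on Go's `%q`, which is adequate for simple entity names only.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Edge is a directed, labeled relation between two catalog entities.
type Edge struct {
	From, To, Label string
}

// ToDot renders the edges as a Graphviz digraph.
// Edges are sorted by (From, To) so the output is deterministic.
func ToDot(name string, edges []Edge) string {
	sorted := append([]Edge(nil), edges...) // copy; do not reorder the caller's slice
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].From != sorted[j].From {
			return sorted[i].From < sorted[j].From
		}
		return sorted[i].To < sorted[j].To
	})

	var b strings.Builder
	fmt.Fprintf(&b, "digraph %q {\n", name)
	for _, e := range sorted {
		fmt.Fprintf(&b, "  %q -> %q [label=%q];\n", e.From, e.To, e.Label)
	}
	b.WriteString("}\n")
	return b.String()
}

func main() {
	edges := []Edge{
		{From: "sales/orders", To: "orders_clean", Label: "produces"},
		{From: "orders_raw", To: "sales/orders", Label: "input"},
		{From: "sales", To: "sales/orders", Label: "contains"},
	}
	fmt.Print(ToDot("mesh", edges))
}
```

The resulting text can be passed to the Graphviz `dot` command to render an image.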

Modeling constraints

We use Entgo’s mutation and query policies to implement the two rules listed in the requirements.

  • DenyIfCrossDomainWriteRule is the custom MutationRule that intercepts Role mutations and ensures that a Role can be granted write access to a Dataset only when both are in the same domain.
  • Users should only retrieve trustworthy data products. FilterSupressedDataProductRule is the custom QueryRule that adds a filter to searches so that users retrieve only DataProduct entities with a quality score higher than a certain threshold.

Recap

Catalog APIs are a unified, compliant, federated way of managing data products in Data Mesh. We hope you found this article helpful to get started with modeling the entities, relationships, and rules that reflect the core principles of mesh.
