Data Mesh Catalog with React, Relay, and GraphQL

Hülya Pamukçu Crowell
Jul 15, 2022


Our previous articles presented a high-level architecture for Data Mesh and an approach to modeling the catalog with an ORM. We built a catalog backend with the core entities, their relations, and policies, to serve as the central mechanism for the mesh's tools and personas. This article focuses on the Data Catalog user experience for data product and domain owners, which is critical for helping them make informed decisions. We also discuss the rationale behind the technology choices for the interface.

In light of the Data Mesh principles, the data catalog should provide the following core functionality for data product and domain owners:

  • Define and create new data products.
  • View high-level stats for the domains and the health of their data products.
  • Browse an inventory of data products to discover and collaborate with other owners.
  • Search for similar or related products (e.g., via tags or datasets).
  • Search for unhealthy data products.

Technology choices

For the catalog backend, we previously chose entgo.io/Ent to represent the core entities and policies. We need flexible queries, such as finding entities related to other entities like tags or datasets and filtering them by various criteria, so GraphQL is the natural choice for our purposes. GraphQL also lets us plug in other sources behind the same API if we later need to surface them in the catalog. Conveniently, Ent can generate GraphQL server code from the entity schema we created earlier, so a fully functioning server takes little effort beyond authoring the resolvers. Schema changes and new entities are also straightforward to add.

On the client side, we use Relay to execute and optimize fetch requests to the GraphQL server. Relay handles the complexities of caching and decouples a component's data requirements from how that data is retrieved. It also provides pagination APIs for cursor-based connections, which we leverage for the inventory component of the catalog. For the UI components, we chose React together with the Northstar design system for a consistent experience.

Implementation

  • Catalog GraphQL Server: Before running the code generation, we add annotations to the schema to enable Relay connections and the query and mutation types. We also annotate the fields that can be used for ordering.

Below is the modified schema for DataProduct (earlier version).
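To give a sense of what these annotations produce: with entgql, `entgql.RelayConnection()` exposes the entity through a Relay-style connection type (edges with cursors plus page info), and `entgql.OrderField(...)` on a field makes it selectable in the generated order input. The TypeScript sketch below mirrors that response shape on the client. The DataProduct fields and the JSON cursor format are illustrative assumptions. Ent defines its own cursor encoding, so clients should treat cursors as opaque.

```typescript
// Client-side mirror of a Relay-style connection as entgql generates it.
// The DataProduct fields are illustrative assumptions.
interface DataProduct {
  id: string;
  name: string;
  tags: string[];
  qualityScore: number;
}

interface PageInfo {
  hasNextPage: boolean;
  hasPreviousPage: boolean;
  startCursor: string | null;
  endCursor: string | null;
}

interface Edge<T> {
  node: T;
  cursor: string; // opaque position of this node in the ordered result
}

interface Connection<T> {
  edges: Edge<T>[];
  pageInfo: PageInfo;
  totalCount: number;
}

// Ordering input produced by OrderField annotations (enum values assumed).
interface DataProductOrder {
  field: "NAME" | "QUALITY_SCORE";
  direction: "ASC" | "DESC";
}

// Hypothetical cursor format: sort value plus id as a tie-breaker.
// Ent uses its own encoding, so real clients never parse cursors.
function encodeCursor(id: string, sortValue: string | number): string {
  return JSON.stringify([sortValue, id]);
}

function decodeCursor(cursor: string): { id: string; sortValue: string | number } {
  const [sortValue, id] = JSON.parse(cursor) as [string | number, string];
  return { id, sortValue };
}

// Wraps an already ordered page of nodes into a connection.
function toConnection(
  nodes: DataProduct[],
  order: DataProductOrder,
  hasNextPage: boolean,
  totalCount: number,
): Connection<DataProduct> {
  const edges = nodes.map((node) => ({
    node,
    cursor: encodeCursor(node.id, order.field === "NAME" ? node.name : node.qualityScore),
  }));
  return {
    edges,
    pageInfo: {
      hasNextPage,
      hasPreviousPage: false, // permitted by the Relay spec when paginating forward
      startCursor: edges.length > 0 ? edges[0].cursor : null,
      endCursor: edges.length > 0 ? edges[edges.length - 1].cursor : null,
    },
    totalCount,
  };
}
```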

The entgql extension generates the server types and methods. To pass filter arguments to the queries later (e.g., to fetch only healthy data products), we enable the filter inputs by customizing the codegen config. Finally, we author the resolver functions with the generated types to respond to client requests. One example of a resolver is below; note the cursor, filter, and order arguments used for pagination.
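In the entgql tutorials, such a resolver mostly delegates to the generated `Paginate` method, passing the order and filter options. To show what that call does conceptually, here is a minimal in-memory resolver in TypeScript: filter, then order with an id tie-breaker, then slice one page after the cursor. The filter field names (`nameContains`, `hasTag`, `qualityScoreGTE`) are hypothetical and only loosely modeled on entgql's generated where inputs. The cursor here is simply the node id.

```typescript
interface DataProduct {
  id: string;
  name: string;
  tags: string[];
  qualityScore: number;
}
interface DataProductEdge { node: DataProduct; cursor: string; }
interface PageInfo { hasNextPage: boolean; endCursor: string | null; }
interface DataProductConnection { edges: DataProductEdge[]; pageInfo: PageInfo; totalCount: number; }

// Hypothetical filter and order inputs, loosely modeled on entgql's generated ones.
interface DataProductWhere {
  nameContains?: string;
  hasTag?: string;
  qualityScoreGTE?: number;
}
interface DataProductOrder {
  field: "NAME" | "QUALITY_SCORE";
  direction: "ASC" | "DESC";
}

function compareField(a: DataProduct, b: DataProduct, field: DataProductOrder["field"]): number {
  if (field === "QUALITY_SCORE") return a.qualityScore - b.qualityScore;
  return a.name < b.name ? -1 : a.name > b.name ? 1 : 0;
}

// In-memory stand-in for a dataProducts(first, after, where, orderBy) resolver.
function resolveDataProducts(
  all: DataProduct[],
  args: { first: number; after?: string; where?: DataProductWhere; orderBy?: DataProductOrder },
): DataProductConnection {
  const where: DataProductWhere = args.where || {};
  const order: DataProductOrder = args.orderBy || { field: "NAME", direction: "ASC" };
  const sign = order.direction === "ASC" ? 1 : -1;

  // 1. Filter on the server, so the client never filters locally.
  const matching = all.filter(
    (p) =>
      (where.nameContains === undefined || p.name.indexOf(where.nameContains) >= 0) &&
      (where.hasTag === undefined || p.tags.indexOf(where.hasTag) >= 0) &&
      (where.qualityScoreGTE === undefined || p.qualityScore >= where.qualityScoreGTE),
  );

  // 2. Order by the requested field; ties fall back to id so positions stay stable.
  matching.sort(
    (a, b) => sign * compareField(a, b, order.field) || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0),
  );

  // 3. Slice one page after the cursor (here the cursor is just the node id).
  let start = 0;
  if (args.after !== undefined) {
    const cursor = args.after;
    const index = matching.findIndex((p) => p.id === cursor);
    if (index < 0) throw new Error("unknown cursor: " + cursor);
    start = index + 1;
  }
  const edges = matching.slice(start, start + args.first).map((node) => ({ node, cursor: node.id }));
  return {
    edges,
    pageInfo: {
      hasNextPage: start + args.first < matching.length,
      endCursor: edges.length > 0 ? edges[edges.length - 1].cursor : null,
    },
    totalCount: matching.length,
  };
}
```

A production implementation would encode the sort value in the cursor so it can seek directly instead of scanning the ordered list, and would also support `before`/`last` for backward pagination.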

We can now run the server, populate it with test data, and execute queries using GraphiQL.

Test query with pagination, filter, and order
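A test query of this kind might look like the following sketch, which also builds the JSON body of a GraphQL-over-HTTP POST request. The query and type names (`dataProducts`, `DataProductWhereInput`, `DataProductOrder`, `Cursor`) follow entgql's usual naming conventions, but they are assumptions about this catalog. The where-input fields and the `QUALITY_SCORE` order value are assumptions as well.

```typescript
// Query of the kind we run in GraphiQL. Type and field names follow entgql's
// naming conventions but are assumptions about this catalog's schema.
const DATA_PRODUCTS_QUERY = `
  query DataProducts(
    $first: Int
    $after: Cursor
    $where: DataProductWhereInput
    $orderBy: DataProductOrder
  ) {
    dataProducts(first: $first, after: $after, where: $where, orderBy: $orderBy) {
      totalCount
      pageInfo { hasNextPage endCursor }
      edges { cursor node { id name qualityScore } }
    }
  }
`;

interface GraphQLRequest {
  query: string;
  variables: Record<string, unknown>;
}

// JSON body of a GraphQL-over-HTTP POST request (query plus variables).
function buildRequestBody(variables: Record<string, unknown>): string {
  const request: GraphQLRequest = { query: DATA_PRODUCTS_QUERY, variables };
  return JSON.stringify(request);
}

// Filter by name and tag, order by quality score, five items per page.
const exampleBody = buildRequestBody({
  first: 5,
  where: { nameContains: "order", hasTagsWith: [{ name: "sales" }] },
  orderBy: { field: "QUALITY_SCORE", direction: "DESC" },
});
```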
  • Catalog UI: The catalog provides a high-level overview and a product inventory with search functionality. Owners can also create new data products from the UI.

Note that some mutations, such as setting quality scores, should not be exposed in the catalog UI. They should be allowed only for the specific tool accounts that run and audit these checks.
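This restriction should be enforced by the server rather than merely hidden in the UI. In our Ent backend, a privacy policy rule is a natural home for it. The TypeScript sketch below only illustrates the check itself. It uses a hypothetical `Viewer` with an `owner` or `tool` kind, and a list of tool-only fields that currently holds just the quality score.

```typescript
// Who is calling the catalog API (hypothetical viewer model).
type ViewerKind = "owner" | "tool";
interface Viewer {
  id: string;
  kind: ViewerKind;
}

// Fields a mutation may try to set (names are illustrative).
interface DataProductPatch {
  name?: string;
  description?: string;
  qualityScore?: number;
}

// Fields that only audited tool accounts may write.
const TOOL_ONLY_FIELDS: Array<keyof DataProductPatch> = ["qualityScore"];

// Returns the fields this viewer is not allowed to set; empty means allowed.
function forbiddenFields(viewer: Viewer, patch: DataProductPatch): string[] {
  if (viewer.kind === "tool") return [];
  return TOOL_ONLY_FIELDS.filter((field) => patch[field] !== undefined);
}

// Rejects the mutation before it reaches the database.
function authorizeMutation(viewer: Viewer, patch: DataProductPatch): void {
  const denied = forbiddenFields(viewer, patch);
  if (denied.length > 0) {
    throw new Error(`viewer ${viewer.id} may not set: ${denied.join(", ")}`);
  }
}
```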

We use Relay as the client framework. It provides pagination and query functionality for GraphQL by handling the complexities of fetching and caching. It keeps entities in a normalized store and fetches only when the requested data is not already there. Components use fragments to declare the data they need, and Relay decides which aggregated queries to run. With the render-as-you-fetch pattern, components render as their data becomes available.
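The idea behind Relay's normalized store and its store-or-network fetch policy is shown in the toy model below. Records are merged by id no matter which query returned them, and a read goes to the network only when a requested field is missing. This is a simplified sketch with hypothetical names, not Relay's implementation. Relay additionally handles garbage collection, invalidation, and subscriptions.

```typescript
// Fields of one record, e.g. { name: "orders", qualityScore: 0.9 }.
type StoreRecord = Record<string, unknown>;

// Toy normalized store: one record per node id, shared by every query.
// A simplified model for illustration, not Relay's implementation.
class NormalizedStore {
  private records = new Map<string, StoreRecord>();

  // Merge fields from any response into the single record for that id.
  publish(id: string, fields: StoreRecord): void {
    const existing = this.records.get(id) || {};
    this.records.set(id, { ...existing, ...fields });
  }

  // Return the selected fields if all are present, otherwise null (a cache miss).
  read(id: string, selection: string[]): StoreRecord | null {
    const record = this.records.get(id);
    if (record === undefined) return null;
    const result: StoreRecord = {};
    for (const field of selection) {
      if (!(field in record)) return null;
      result[field] = record[field];
    }
    return result;
  }
}

// Store-or-network: serve from the store when possible; otherwise fetch,
// publish the response into the store, and read it back.
function storeOrNetwork(
  store: NormalizedStore,
  id: string,
  selection: string[],
  fetchNode: (id: string, selection: string[]) => StoreRecord,
): StoreRecord {
  const cached = store.read(id, selection);
  if (cached !== null) return cached;
  store.publish(id, fetchNode(id, selection));
  const fresh = store.read(id, selection);
  if (fresh === null) throw new Error("server response is missing selected fields");
  return fresh;
}
```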

Below are the summary components, all rendered from the data returned by the same domain-stats query.

Data Mesh Catalog Summary Components

Let's try a more complex query in the inventory component: filters on the Name and Tag fields, ordering on the QualityScore field, and pagination with a configurable page size. The image below annotates the parts of the query and their corresponding controls.
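The mapping from the inventory controls to the query variables can be sketched as a pure function. The control shape and the variable names (`nameContains`, `hasTagsWith`, `QUALITY_SCORE`) are assumptions modeled on entgql's conventions. The point is that empty inputs are dropped, so the server applies no predicate for them, and that all filtering and ordering happen on the server.

```typescript
// State of the inventory controls (hypothetical shape).
interface InventoryControls {
  nameFilter: string;
  tagFilter: string;
  sortByQuality: "ASC" | "DESC" | null;
  pageSize: number;
}

// Variables for the paginated dataProducts query (names assumed, entgql-style).
interface InventoryVariables {
  first: number;
  where?: { nameContains?: string; hasTagsWith?: Array<{ name: string }> };
  orderBy?: { field: "QUALITY_SCORE"; direction: "ASC" | "DESC" };
}

function toVariables(controls: InventoryControls): InventoryVariables {
  const vars: InventoryVariables = { first: controls.pageSize };
  const where: NonNullable<InventoryVariables["where"]> = {};
  const nameFilter = controls.nameFilter.trim();
  const tagFilter = controls.tagFilter.trim();

  // Empty inputs are omitted so the server applies no predicate for them.
  if (nameFilter !== "") where.nameContains = nameFilter;
  if (tagFilter !== "") where.hasTagsWith = [{ name: tagFilter }];
  if (Object.keys(where).length > 0) vars.where = where;

  // Ordering is requested from the server, never applied client-side.
  if (controls.sortByQuality !== null) {
    vars.orderBy = { field: "QUALITY_SCORE", direction: controls.sortByQuality };
  }
  return vars;
}
```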

Data Products Inventory Table Component

Below is the request Relay sends to the server for the first page.

Note the after cursor in the request for the second page.
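In the Relay connection model, the only difference between the two requests is the `after` variable. The second request keeps the same filters, order, and page size, and passes the `endCursor` from the first response's `pageInfo`. A minimal sketch of that rule, with hypothetical names:

```typescript
interface PageInfo {
  hasNextPage: boolean;
  endCursor: string | null;
}

// Variables for the next page: keep filters, order and page size from the
// previous request and continue after its endCursor. Null means no more pages.
function nextPageVariables<V extends { first: number }>(
  previous: V,
  pageInfo: PageInfo,
): (V & { after: string }) | null {
  if (!pageInfo.hasNextPage || pageInfo.endCursor === null) return null;
  return { ...previous, after: pageInfo.endCursor };
}
```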

To make this work, we define a fragment. The Relay compiler then generates the GraphQL documents and runtime artifacts used to talk to the server.

Note that we annotate the fragment with @connection to indicate that we want to paginate over dataProducts, which adheres to the connection spec. With Relay and GraphQL, we need no client-side filtering, ordering, or hand-written pagination logic.
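Conceptually, `@connection` tells Relay to keep one growing edge list per connection key and set of filter arguments, appending each newly fetched page instead of replacing the previous one. The toy store below models that behaviour in TypeScript. It is not Relay's implementation, and the key and types are hypothetical. Changing the filters (for example, selecting a different tag) yields a separate list, while pagination arguments such as `first` and `after` do not.

```typescript
interface HasId {
  id: string;
}
interface Edge<N extends HasId> {
  cursor: string;
  node: N;
}
interface PageInfo {
  hasNextPage: boolean;
  endCursor: string | null;
}
interface ConnectionPage<N extends HasId> {
  edges: Edge<N>[];
  pageInfo: PageInfo;
}

// Toy model of what @connection implies: one merged edge list per
// (connection key, filter arguments). Not Relay's real store code.
class ConnectionStore<N extends HasId> {
  private lists = new Map<string, ConnectionPage<N>>();

  // Filters such as where/orderBy identify the list; first/after do not.
  private static identity(key: string, filters: unknown): string {
    return key + ":" + JSON.stringify(filters);
  }

  // Append a newly fetched page, skipping nodes that are already present.
  appendPage(key: string, filters: unknown, page: ConnectionPage<N>): ConnectionPage<N> {
    const id = ConnectionStore.identity(key, filters);
    const existing = this.lists.get(id);
    const previousEdges: Edge<N>[] = existing ? existing.edges : [];
    const seen = new Set(previousEdges.map((edge) => edge.node.id));
    const merged: ConnectionPage<N> = {
      edges: previousEdges.concat(page.edges.filter((edge) => !seen.has(edge.node.id))),
      pageInfo: page.pageInfo, // the latest page decides whether more exist
    };
    this.lists.set(id, merged);
    return merged;
  }
}
```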

The component can now use the generated types along with the usePaginationFragment hook to subscribe to data updates and re-render. It can then update its state and either load the next page (with loadNext) or refetch with different inputs.
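The real component calls `usePaginationFragment`, which returns the fragment data together with `hasNext`, `loadNext(count)`, and `refetch(variables)`, among others. That hook cannot run outside a React and Relay environment, so the class below is only a synchronous model of the state it manages, with a hypothetical `fetchPage` standing in for the network. `loadNext` continues from the last cursor, and `refetch` with new filters starts again from the first page.

```typescript
interface PageInfo {
  hasNextPage: boolean;
  endCursor: string | null;
}
interface Page<T> {
  items: T[];
  pageInfo: PageInfo;
}

// Stand-in for the network: fetches one page for the given variables.
type FetchPage<T, V> = (vars: V & { first: number; after?: string }) => Page<T>;

// Synchronous model of the state usePaginationFragment manages for a component
// (data, hasNext, loadNext, refetch). Illustrative only, not Relay's code.
class PaginatedList<T, V extends object> {
  items: T[] = [];
  private pageInfo: PageInfo = { hasNextPage: true, endCursor: null };
  private fetchPage: FetchPage<T, V>;
  private vars: V;

  constructor(fetchPage: FetchPage<T, V>, vars: V) {
    this.fetchPage = fetchPage;
    this.vars = vars;
  }

  get hasNext(): boolean {
    return this.pageInfo.hasNextPage;
  }

  // Load `count` more items, continuing after the last cursor we received.
  loadNext(count: number): void {
    if (!this.hasNext) return;
    const after = this.pageInfo.endCursor === null ? undefined : this.pageInfo.endCursor;
    const page = this.fetchPage({ ...this.vars, first: count, after });
    this.items = this.items.concat(page.items);
    this.pageInfo = page.pageInfo;
  }

  // New filters or order: discard loaded items and start from the first page.
  refetch(vars: V, count: number): void {
    this.vars = vars;
    this.items = [];
    this.pageInfo = { hasNextPage: true, endCursor: null };
    this.loadNext(count);
  }
}
```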

Lastly, the form component for creating new data products uses the useMutation hook.
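`useMutation` returns a commit function and an in-flight flag. The form calls the commit function with the mutation variables and handles `onCompleted` and `onError`. Most of the form's own work is validating its values and building the mutation input, as sketched below. The `CreateDataProductInput` fields (`domainID`, `tagIDs`) follow entgql's usual naming for edge IDs, but they are assumptions here. The input deliberately has no quality score field, since scores are set only by tool accounts.

```typescript
// Raw values from the "new data product" form (hypothetical field set).
interface NewDataProductForm {
  name: string;
  description: string;
  domainId: string;
  tagIds: string[];
}

// Assumed shape of the create mutation input (entgql-style edge IDs).
// There is intentionally no qualityScore: only tool accounts set scores.
interface CreateDataProductInput {
  name: string;
  description?: string;
  domainID: string;
  tagIDs: string[];
}

type FormResult =
  | { ok: true; input: CreateDataProductInput }
  | { ok: false; errors: string[] };

// Validate the form and build the input passed as commitMutation({ variables: { input } }).
function toCreateInput(form: NewDataProductForm): FormResult {
  const errors: string[] = [];
  const name = form.name.trim();
  if (name === "") errors.push("name is required");
  if (form.domainId === "") errors.push("domain is required");
  if (errors.length > 0) return { ok: false, errors };

  const input: CreateDataProductInput = {
    name,
    domainID: form.domainId,
    tagIDs: Array.from(new Set(form.tagIds)), // drop duplicate selections
  };
  const description = form.description.trim();
  if (description !== "") input.description = description;
  return { ok: true, input };
}
```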

Recap

This article discussed building the catalog experience for data product and domain owners using React, Relay, and GraphQL. We hope you find it helpful as we continue building the Data Mesh experience.
