Parallel DAG Runner with Scala Futures and Promises

Hülya Pamukçu Crowell
3 min readFeb 28, 2021

This article explains a DAG (Directed Acyclic Graph) runner implementation using Scala Futures and Promises. The runner can execute actions in parallel and is not limited by dependency depth.

DAG has many applications, from data pipelines to CD pipelines. DAGs are how dependencies can be represented, ensuring no cyclic references. These are common requirements at every software stack, and in fact, there are countless frameworks and implementation of DAG runners from a single server to distributed executions. For example, a build system needs to look at the project dependencies and define an order, i.e., topological sorting. The build system would also need to parallelize as many build tasks as possible to shorten the overall build time.

The following is a sample DAG with parallel executions of tasks ordered left to right based on the dependencies. Lower order tasks will complete before dependents ones can start. We expect the actions on this diagram to run in groups of (A1, A2, A3), (B1, B2, B3), (C1), (D1), (E1)

Cycle Detection

In this implementation, cycle detection happens while building the DAG. Therefore, while running the DAG, it can be assumed that there are no cycles. At each addition of edge [u, v], we check if there is a path from [v, u] using BFS (Bread First Search). We are using BFS instead of DFS (Depth First Search) not to hit recursion limits. We also added an optimization using UnionFind data structures. UnionFind keeps track of connected components, and BFS is not needed if u and v do not belong to the same connected component as there cannot be a cycle by adding [u, v]. For example, if the last edge we are adding is [E1, C1], we need to check if there is a path C1E1 since E1 and C1 are in the same connected component. On the other hand, if the last edge we are adding is [E1, A2], we can skip the check.

Topological Sorting

Topological sorting of DAG produces a sequence of tasks based on dependencies defined in directed edges of DAG. We will use this as the basis of visiting nodes in DAG and scheduling tasks. Also, we want parallel execution where we can. For example, after A1 completes, B1, B2, and B3 can be scheduled and run in parallel.

Therefore, we used Scala futures to make these calls async and offload traversal to a separate thread to avoid hitting the recursion limits.

To synchronize the calls and avoid duplicate scheduling of the actions, we use a concurrent map and assign a Promise object to each Node using a synchronized method on the map.

Visualization

To visualize the execution, each action reports its [start, end] times as the future value; at the end, we collect the results and print them. Visualizer trait keeps track of each slot with [start, end] times. ConsoleVisualizer implementation sorts by start time in the sample execution and marks execution segments with “*” and prints.

Sample Run

  • Creating DAG for the above diagram
  • Defining actions: Injects random sleep to simulate actual use cases. “time” block to track [start, end]. Completes Promise with the result.
  • Visualizing results

The execution output matches the expected grouping we showed earlier:

Recap

In this implementation, we illustrated:

  • Representing dependencies with generalized DAG to support any type as key and any custom action.
  • Running actions in parallel
  • Cycle detection while creating the DAG
  • An “unlimited” dependency depth
  • A mechanism to visualize the execution graph

--

--