The data structure

5.3. The data structure

We now present a data structure that is closely related to equality saturation but supports arbitrary graph rewriting. It is modelled on the graph unfolding construction as presented in Baldan, 2008Paolo Baldan, Andrea Corradini and Barbara König. 2008. Unfolding Graph Transformation Systems: Theory and Applications to Verification.

Rather than maintaining equivalence relations between terms, as done in term graphs, we maintain equivalence relations between graph vertices. Our data structure stores the set of all applied rewrites – the main subject of this section is to show how all operations of interest on this data structure can be implemented efficiently.

The persistent graph rewriting data structure is given by a set D\mathcal{D} of events δ=(GR,V,μ)D\delta = (G_R, V^-, \mu) \in \mathcal{D}, with

  • vertex deletion set VV(Dδ)V^- \subseteq V(\mathcal{D} \smallsetminus \delta) and
  • glueing relation μ:VV(GR)\mu: V^- \rightharpoonup V(G_R).

We have extended the V()V(\cdot) notation to D\mathcal{D} by defining it as the union of all vertex sets of replacement graphs in D\mathcal{D}. We will similarly use V(δ)V(\delta) to denote the set of vertices in the replacement graph of a rewrite δ\delta.

Events resemble rewrites as defined in Definition . but differ in that they do not apply to a single graph GG, i.e. there is no graph such that VV(G)V^- \subseteq V(G). Instead,

V  δDV(δ)=V(D).V^-\ \subseteq\ \bigsqcup_{\delta \in \mathcal{D}} V(\delta) = V(\mathcal{D}).

We will see below how a graph GG can be constructed such that an event δD\delta \in \mathcal{D} is indeed a valid rewrite on GG.

Using the disjointness of the union in (1), for all vV(D)v \in V(\mathcal{D}), there is a unique δD\delta \in \mathcal{D} such that vV(δ)v \in V(\delta) that we call the owner of vv. The parents (or directed causes) P(δ)P(\delta) of an event δ\delta are then the owners of the vertices in the deletion set VV^- of δ\delta:

P(δ)={δpDVV(δp)}.P(\delta) = \left\{ \delta_p \in \mathcal{D} \mid V^- \cap V(\delta_p) \neq \varnothing \right\}.

Inversely, we define the children of δ\delta as the set of event whose parents include δ\delta

P1(δ)={δcDδP(δc)}.P^{-1}(\delta) = \left\{\delta_c \in \mathcal{D} \mid \delta \in P(\delta_c)\right\}.

The following figure shows an example of a data structure D\mathcal{D} on undirected graphs.

Events on an undirected graph with their history. Coloured directed edges represent the parent-child relationship. The area that they rewrite in the parent event is represented by dashed regions of the same colour. The map between graphs is given by the vertex IDs.

Events on an undirected graph with their history. Coloured directed edges represent the parent-child relationship. The area that they rewrite in the parent event is represented by dashed regions of the same colour. The map between graphs is given by the vertex IDs.

Merges, confluent persistence and event creation #

A rewrite r=(GR,V,μ)r = (G_R, V^-, \mu) that applies to a replacement graph of an event δp\delta_p, i.e. VV(δp)V^- \subseteq V(\delta_p) immediately defines a valid event δr:=r\delta_r := r. In that case, δr\delta_r has a unique parent

P(δr)={δp}.P(\delta_r) = \{ \,\delta_p\, \}.

Creating an event δr\delta_r from a rewrite rr is the simplest type of data mutation that can be recorded in D\mathcal{D}. For D\mathcal{D} to be a confluently persistent data structure, it must also be allowed to merge mulitple data mutations together. Rather than handling merges of versions of the data structure explicitly, an event δD\delta \in \mathcal{D} can define graph mutation operations that apply on collections of events PDP \subseteq \mathcal{D} – the resulting mutation is equivalent to explicitly creating a merged version of the versions in PP, followed by the desired rewrite. In this case, the parents of δ\delta are precisely the set P=P(δ)P = P(\delta).

In other words, the parent-child relationships of D\mathcal{D} is precisely the event history of D\mathcal{D}: a directed graph with vertex set D\mathcal{D} and edges δ1δ2\delta_1 \to \delta_2 if δ1P(δ2)\delta_1 \in P(\delta_2). For D\mathcal{D} to define a valid confluently persistent data structure, we need to

  1. Ensure that the event history is acyclic, and
  2. Define conditions that guarantee that events correspond to valid data mutations.

We hit both birds with one stone by restricting how D\mathcal{D} can be constructed and modified in such a way that acyclicity is guaranteed. Specifically, we introduce two procedures:

  • CreateEmpty constructs an empty D=\mathcal{D} = \varnothing, and
  • AddEvent, adds an event δ\delta to D\mathcal{D}.

The first is straightforward – and importantly, the only way to construct an instance D\mathcal{D}. AddEvent, on the other hand, enforces two conditions that δ\delta must satisfy to be added to a set D\mathcal{D}:

  • P(δ)DP(\delta) \subseteq \mathcal{D}, and
  • all parents of δ\delta must be compatible.

We defer the discussion on the second condition, enforced by the AreCompatible procedure, to its dedicated section below. The restriction P(δ)DP(\delta) \subseteq \mathcal{D} defines a partial order on events by guaranteeing that an event δ\delta can only be defined and added to D\mathcal{D} after all its parents P(δ)P(\delta) have been added.

We say that D\mathcal{D} is valid if it can be constructed from a single call to CreateEmpty, followed by a sequence of calls to AddEvent. This is equivalent to requiring that

  1. the parent-child relationship is acylic and
  2. the parents of every event satisfy AreCompatible.

For the remainder of this chapter, we will always assume that D\mathcal{D} is valid, and thus the event history of D\mathcal{D} is always well-defined and acyclic.

def CreateEmpty() -> Set[Event]:
    return set()

def AddEvent(
    events: Set[Event],
    replacement_graph: Graph
    deletion_set: Set[V],
    glueing_relation: EquivalenceRelation[V]
) -> Set[Event]:
    new_event = (
        replacement_graph,
        deletion_set,
        glueing_relation
    )
    parents = parents(new_event)
    assert(issubset(parents, events))
    assert(AreCompatible(parents))

    events = union(events, {new_event})

Compatible events #

Assuming the parent-child relationship is acylic, we can define the ancestors (or causes) δ\lfloor\delta\rfloor of an event δ\delta recursively

δ={δ}  δpP(δ)δp.\lfloor\delta\rfloor = \{\,\delta\,\}\ \cup\ \bigcup_{\delta_p \in P(\delta)} \lfloor\delta_p\rfloor.

Events DDD \subseteq \mathcal{D} are compatible (or a configuration) if all vertex deletion sets VV^- for all ancestors of δD\delta \in D are disjoint. That is, writing

D=δDδ,\lfloor D\rfloor = \bigcup_{\delta \in D} \lfloor \delta\rfloor,

we require that all sets {V(GR,V,μ)D}\{V^- \mid (G_R, V^-, \mu) \in \lfloor D \rfloor \} are disjoint. In the example above, events δ5\delta_5 and δ6\delta_6 are compatible, wheresa δ5\delta_5 and δ4\delta_4 are not. As pseudocode, this is implemented by the following procedure.

def AreCompatible(events: Set[Event]) -> bool:
    all_ancestors = union([ancestors(d) for d in events])
    deleted_vertices = set()
    for d in all_ancestors:
        for v in deletion_set(d):
            if v in deleted_vertices:
                return False
            deleted_vertices.add(v)
    return True

Note that this definition of event compatibility is a strictly stronger version of parallel independence as is typically defined in DPO rewriting Corrad., 2018Andrea Corradini, Dominique Duval, Michael Löwe, Leila Ribeiro, Rodrigo Machado, Andrei Costa, Guilherme Grochau Azzi, Jonas Santos Bezerra and Leonardo Marques Rodrigues. 2018. On the Essence of Parallel Independence for the Double-Pushout and Sesqui-Pushout Approaches. In Graph Transformation, Specifications, and Nets, Cham. Springer International Publishing, 1--18. doi: 10.1007/978-3-319-75396-6_1. It does not allow for events δ1\delta_1 and δ2\delta_2 such that a vertex vv is both

  • in the read only context of δ1\delta_1, i.e. vV1dom(μ1)v \in V_1^- \cap dom(\mu_1), and thus present both before and after the application of δ1\delta_1,
  • in the deletion set of δ2\delta_2, i.e. vV2v \in V^-_2.

This excludes asymmetric conflicts as discussed in e.g. Baldan, 2008Paolo Baldan, Andrea Corradini and Barbara König. 2008. Unfolding Graph Transformation Systems: Theory and Applications to Verification, which arise in the more generali definition. This restriction simplifies our considerations as makes the event history of any event unique.

The runtime is O(DAlogDA)O(D_A \log D_A), where DAD_A is the sum of the sizes of all vertex deletion sets of events in D\lfloor D \rfloor

δDVδ.\sum_{\delta \in \lfloor D \rfloor} |V^-_\delta|.

The log\log factor can typically be removed if the vertices vv span a contiguous integer range or by using a hash function. Alternatively, the log\log factor can also be reduced by using separate sets to track deleted vertices of each event.

When talking about compatible sets of events DDD \subseteq \mathcal{D}, it simplifies considerations to always choose DD such that the ancestors of δD\delta \in D are also in DD, i.e. D=D.D = \lfloor D \rfloor. We introduce the notation

Γ(D)={DDD and D is compatible}P(D)\Gamma(\mathcal{D}) = \{ \lfloor D \rfloor \mid D \subseteq \mathcal{D} \text{ and } D \text{ is compatible}\} \subseteq \mathcal{P}(\mathcal{D})

for the set of all compatible sets of rewrites of the form D\lfloor D \rfloor.

Events are rewrites on the flattened history #

We have so far explored how events can be added to D\mathcal{D}, as well as when they are compatible. However, until we have established that adding events to D\mathcal{D} is in some sense equivalent to applying rewrites on a graph, it is hard to see how the data structure D\mathcal{D} would be useable for graph rewriting. This is precisely our next point.

In a valid non-empty D\mathcal{D}, events δD\delta \in \mathcal{D} form a directed acyclic graph and therefore there must always be (at least) one “root” event δ1D\delta_1 \in \mathcal{D} with no parents P(δ1)=P(\delta_1) = \varnothing. δ1\delta_1 is thus a valid rewrite that can be applied to any graph.

For the applications of D\mathcal{D} that we consider, it will always be sufficient to have a unique root event δ1\delta_1. Viewing δ1\delta_1 as a rewrite that applies to the empty graph G0=G1G_0 = \varnothing \to G_1, we can understand it as injecting the input graph G1G_1 into D\mathcal{D}.

Non-root events in D\mathcal{D} on the other hand typically correspond to valid (semantics preserving) rewrites in the GTS under consideration.

Consider a set of compatible events DΓ(D)D \in \Gamma(\mathcal{D}). Define a topological ordering δ1,,δk\delta_1, \ldots, \delta_k of the events in DD, i.e. if δjP(δi)\delta_j \in P(\delta_i) then i<ji < j.

Proposition 5.1Events as valid rewrites

There are graphs G0,,GkG_0, \ldots, G_k such that for all 1ik1 \leqslant i \leqslant k, the event δi\delta_i defines a valid rewrite rir_i on Gi1G_{i-1} and Gi=ri(Gi1)G_i = r_i(G_{i-1}).

Define the empty graph G0=G_0 = \varnothing. The event δ1\delta_1 has no parent and thus must have an empty vertex deletion set and glueing relation. It is thus a valid rewrite r1r_1 on G0G_0. Define G1=r1(G0)G_1 = r_1(G_0).

We can similarly define Gi=ri(Gi1)G_i = r_i(G_{i-1}) inductively for graphs G2,,GkG_2, \ldots, G_k if we show for 2ik2 \leqslant i \leqslant k that the ii-th event δi\delta_i defines a valid rewrite rir_i on Gi1G_{i-1}. The set of vertices in Gi1G_{i-1} is the union of all vertices in the replacement graph of δ1,,δi1\delta_1, \ldots, \delta_{i-1} minus their vertex deletion sets

V(Gi1)=(1j<iV(δj))(1j<iVj),V(G_{i-1}) = \left(\bigsqcup_{1 \leqslant j < i} V(\delta_j)\right) \setminus \left(\bigsqcup_{1 \leqslant j < i} V^-_j\right),

where VjV^-_j is the vertex deletion set of δj\delta_j.

Now, by definition of the event δi\delta_i,

Vi1j<iV(δj).V^-_i \subseteq \bigsqcup_{1 \leqslant j < i} V(\delta_j).

On the other hand, because of the compatibility of all events in DD, we know that ViVj=V^-_i \cap V^-_j =\varnothing for all 1j<i1 \leqslant j < i. It thus follows ViV(Gi1)V^-_i \subseteq V(G_{i-1}). Hence δi\delta_i is indeed a valid rewrite of Gi1G_{i-1}, and thus rir_i and GiG_i are well-defined.

This construction is illustrated in the following figure for the compatible set δ5δ6\lfloor \delta_5 \rfloor \cup \lfloor \delta_6 \rfloor of the previous example.

Applying events as rewrites in topological order. The result is a sequence of valid graph rewrites that start from the graph of δ1\delta_1δ1​.

Applying events as rewrites in topological order. The result is a sequence of valid graph rewrites that start from the graph of δ1\delta_1.

We now show that the graph GkG_k is determined uniquely by DΓ(D)D \in \Gamma(\mathcal{D}) and provide an explicit procedure to construct it.

Proposition 5.2Flat graph extraction

The graph GkG_k obtained by applying the set of compatible rewrites DΓ(D)D \in \Gamma(\mathcal{D}) in topological order on the empty graph is independent of the topological ordering chosen.

Given the set of rewrites DDD \subseteq \mathcal{D}, the procedure FlattenHistory returns GkG_k in time

O(m+n)O(m + n)

where nn and mm are the total number of vertices and edges across all replacement graphs in DD.

Let us start with the definition of FlattenHistory:

 1def FlattenHistory(events: Set[Event]) -> Graph:
 2    all_ancestors = union([ancestors(d) for d in events])
 3    graph = Graph()
 4    for a in toposort(all_ancestors):
 5        add_graph(graph, replacement_graph(a))
 6        for (del_v, repl_v) in glueing_relation(a):
 7            move_edges(graph, repl_v, del_v)
 8        for v in deletion_set(a):
 9            remove_vertex(graph, v)
10    return graph

toposort is a function that returns a topological ordering of the rewrites in DD according to the parent-child rewrite relation, add_graph inserts the graph passed as second argument into the graph passed as first argument, remove_vertex removes the vertex along with all incident edges from the graph and move_edges moves all edges of the second vertex to the first vertex.

Correctness of FlattenHistory.  It is easy to see that if the graph GkG_k that is obtained from applying the rewrites in order is independent of the choice of the toplogical ordering, then FlattenHistory is a correct implementation of the procedure, as it applies one rewrite at a time, in topological order.

Rewrite order invariance.  Consider two rewrites δ1,δ2D\delta_1, \delta_2 \in D such that neither is an ancestor of the other. Let

Dpre=δ1δ2DD_{pre} = \lfloor \delta_1 \rfloor \cup \lfloor \delta_2 \rfloor \subseteq D

and proceed by induction over DpreD_{pre}: assume the graph GpreG_{pre} obtained by applying the rewrites in DpreD_{pre} is invariant on the choice of the topological ordering of DpreD_{pre}. Clearly this is true for Dpre=D_{pre} = \varnothing. All that remains to be shown is that GpostG_{post} obtained by applying first δ1\delta_1 then δ2\delta_2 on GpreG_{pre} is equal to GpostG_{post}', obtained by applying the same rewrites in the reverse order on GpreG_{pre}.

The vertex sets V1V^-_1 and V2V^-_2 of δ1\delta_1 and δ2\delta_2 must be disjoint because δ1,δ2D\delta_1, \delta_2 \in D and hence are compatible. Furthermore, the replacement graphs (by definition of the rewrites) and the glueing relations of δ1\delta_1 and δ2\delta_2 (by rewrite compatibility) cannot contain vertices in V1V2V^-_1 \sqcup V^-_2. It follows that the order in which vertices of V1V2V^-_1 \sqcup V^-_2 are removed from GpreG_{pre} does not affect the graph GpostG_{post}. Furthermore, vertex merging is a commutative operation, and so is disjoint graph addition. It follows Gpost=GpostG_{post} = G_{post}' and hence the result.

Runtime.  In total nn vertices and mm edges will be added to graph by add_graph on line 5. As a result, at most nn vertices can ever be deleted by line 9. Finally, while a naive implementation of move_edges of line 7 might result in the same edge being moved many times, all edge moves can be cached and only executed once at the end: notice that every time edges are moved away from a vertex, that vertex is subsequently removed from the graph. Instead of removing the vertex, keep it “hidden”, with a link to the vertex that the edges should be moved to. Once all graph operations are completed, traverse all hidden vertices and follow the links to the vertices that the edges should be moved to. This can be done in O(n)O(n) time. Then move all edges to the correct vertex, in time O(m)O(m), and delete the hidden vertices.

Now instead of exploring the space of all graphs G\mathcal{G} reachable by repeatedly applying rewrites, we can explore the rewrite space by adding events to D\mathcal{D}. Write flat(D)flat(D) for the graph returned by FlattenHistory on set DD. If G\mathcal{G}' is the set of all graphs returned by FlattenHistory on compatible events

G={ flat(D)DΓ(D)},\mathcal{G}' = \{\ flat(D) \mid D \in \Gamma(\mathcal{D})\},

then Proposition . and Proposition . combined guarantee that GG\mathcal{G}' \subseteq \mathcal{G}. To conclude, we show that indeed any graph in G\mathcal{G} is in G\mathcal{G}', and hence G=G\mathcal{G} = \mathcal{G}'.

Proposition 5.3Rewrites as valid events

Let DΓ(D)D \in \Gamma(\mathcal{D}) be a set of compatible events and G=flat(D)G = flat(D). Any rewrite rr that can be applied on GG defines an on GG defines an event δ=r\delta = r that can be added to D\mathcal{D}.

We recall that a rewrite r=(GR,V,μ)r = (G_R, V^-, \mu) defines an event δ=(GR,V,μ)\delta = (G_R, V^-, \mu) that can be added to D\mathcal{D} if

  • P(δ)DP(\delta) \subseteq \mathcal{D}, and
  • all rewrites in P(δ)P(\delta) are compatible.

By the rewrite definition, VV(G)V^- \subseteq V(G). It follows in particular that

VδDV(δ),V^- \subseteq \bigcup_{\delta' \in D} V(\delta'),

and thus VV(D)V^- \subseteq V(\mathcal{D}), as well as P(δ)DDP(\delta) \subseteq D \subseteq \mathcal{D}. This proves both conditions.

Starting from the empty graph D=\mathcal{D} = \varnothing, we can create a root event δ0=(G,,)\delta_0 = (G, \varnothing, \varnothing) with an empty vertex deletion set and glueing relation and add it to D.\mathcal{D}.

Clearly, flat({δ0})=Gflat(\{\delta_0\}) =G. We then apply Proposition . repeatedly. If we have a sequence r1,,rkr_1, \ldots, r_k of valid rewrites that can be applied on GG, then the sequence of events δ1=r1,,δk=rk\delta_1 = r_1, \ldots, \delta_k = r_k that it defines can also be added to D\mathcal{D} in this order. As we have further seen in Proposition . and Proposition ., the graph GkG_k that is obtained as a result of the rewrites is the same graph returned by FlattenHistory called on D=DD = \mathcal{D}.

In other words, we conclude that exploring the rewrite space on GG is fully equivalent to exploring the space of valid events starting from D={δ0}\mathcal{D} = \{ \delta_0 \}.