5.3. The data structure
We now present a data structure that is closely related to equality saturation but supports arbitrary graph rewriting. It is modelled on the graph unfolding construction as presented in Baldan, 2008. 2008. Unfolding Graph Transformation Systems: Theory and Applications to Verification.
Rather than maintaining equivalence relations between terms, as done in term graphs, we maintain equivalence relations between graph vertices. Our data structure stores the set of all applied rewrites – the main subject of this section is to show how all operations of interest on this data structure can be implemented efficiently.
The persistent graph rewriting data structure is given by a set of events , with
- vertex deletion set and
- glueing relation .
We have extended the notation to by defining it as the union of all vertex sets of replacement graphs in . We will similarly use to denote the set of vertices in the replacement graph of a rewrite .
Events resemble rewrites as defined in Definition . but differ in that they do not apply to a single graph , i.e. there is no graph such that . Instead,
We will see below how a graph can be constructed such that an event is indeed a valid rewrite on .
Using the disjointness of the union in (1), for all , there is a unique such that that we call the owner of . The parents (or directed causes) of an event are then the owners of the vertices in the deletion set of :
Inversely, we define the children of as the set of event whose parents include
The following figure shows an example of a data structure on undirected graphs.
Events on an undirected graph with their history. Coloured directed edges represent the parent-child relationship. The area that they rewrite in the parent event is represented by dashed regions of the same colour. The map between graphs is given by the vertex IDs.
Merges, confluent persistence and event creation #
A rewrite that applies to a replacement graph of an event , i.e. immediately defines a valid event . In that case, has a unique parent
Creating an event from a rewrite is the simplest type of data mutation that can be recorded in . For to be a confluently persistent data structure, it must also be allowed to merge mulitple data mutations together. Rather than handling merges of versions of the data structure explicitly, an event can define graph mutation operations that apply on collections of events – the resulting mutation is equivalent to explicitly creating a merged version of the versions in , followed by the desired rewrite. In this case, the parents of are precisely the set .
In other words, the parent-child relationships of is precisely the event history of : a directed graph with vertex set and edges if . For to define a valid confluently persistent data structure, we need to
- Ensure that the event history is acyclic, and
- Define conditions that guarantee that events correspond to valid data mutations.
We hit both birds with one stone by restricting how can be constructed and modified in such a way that acyclicity is guaranteed. Specifically, we introduce two procedures:
CreateEmptyconstructs an empty , andAddEvent, adds an event to .
The first is straightforward – and importantly, the only way to construct an
instance . AddEvent, on the other hand, enforces two conditions
that must satisfy to be added to a set :
- , and
- all parents of must be compatible.
We defer the discussion on the second condition, enforced by the AreCompatible
procedure, to its dedicated section below. The restriction
defines a partial order on events by
guaranteeing that an event can only be defined and added to
after all its parents have been added.
We say that is valid if it can be constructed from a single call
to CreateEmpty, followed by a sequence of calls to AddEvent. This is
equivalent to requiring that
- the parent-child relationship is acylic and
- the parents of every event satisfy
AreCompatible.
For the remainder of this chapter, we will always assume that is valid, and thus the event history of is always well-defined and acyclic.
def CreateEmpty() -> Set[Event]:
return set()
def AddEvent(
events: Set[Event],
replacement_graph: Graph
deletion_set: Set[V],
glueing_relation: EquivalenceRelation[V]
) -> Set[Event]:
new_event = (
replacement_graph,
deletion_set,
glueing_relation
)
parents = parents(new_event)
assert(issubset(parents, events))
assert(AreCompatible(parents))
events = union(events, {new_event})
Compatible events #
Assuming the parent-child relationship is acylic, we can define the ancestors (or causes) of an event recursively
Events are compatible (or a configuration) if all vertex deletion sets for all ancestors of are disjoint. That is, writing
we require that all sets are disjoint. In the example above, events and are compatible, wheresa and are not. As pseudocode, this is implemented by the following procedure.
def AreCompatible(events: Set[Event]) -> bool:
all_ancestors = union([ancestors(d) for d in events])
deleted_vertices = set()
for d in all_ancestors:
for v in deletion_set(d):
if v in deleted_vertices:
return False
deleted_vertices.add(v)
return True
Note that this definition of event compatibility is a strictly stronger version of parallel independence as is typically defined in DPO rewriting Corrad., 2018. 2018. On the Essence of Parallel Independence for the Double-Pushout and Sesqui-Pushout Approaches. In Graph Transformation, Specifications, and Nets, Cham. Springer International Publishing, 1--18. doi: 10.1007/978-3-319-75396-6_1. It does not allow for events and such that a vertex is both
- in the read only context of , i.e. , and thus present both before and after the application of ,
- in the deletion set of , i.e. .
This excludes asymmetric conflicts as discussed in e.g. Baldan, 2008. 2008. Unfolding Graph Transformation Systems: Theory and Applications to Verification, which arise in the more generali definition. This restriction simplifies our considerations as makes the event history of any event unique.
The runtime is , where is the sum of the sizes of all vertex deletion sets of events in
The factor can typically be removed if the vertices span a contiguous integer range or by using a hash function. Alternatively, the factor can also be reduced by using separate sets to track deleted vertices of each event.
When talking about compatible sets of events , it simplifies considerations to always choose such that the ancestors of are also in , i.e. We introduce the notation
for the set of all compatible sets of rewrites of the form .
Events are rewrites on the flattened history #
We have so far explored how events can be added to , as well as when they are compatible. However, until we have established that adding events to is in some sense equivalent to applying rewrites on a graph, it is hard to see how the data structure would be useable for graph rewriting. This is precisely our next point.
In a valid non-empty , events form a directed acyclic graph and therefore there must always be (at least) one “root” event with no parents . is thus a valid rewrite that can be applied to any graph.
For the applications of that we consider, it will always be sufficient to have a unique root event . Viewing as a rewrite that applies to the empty graph , we can understand it as injecting the input graph into .
Non-root events in on the other hand typically correspond to valid (semantics preserving) rewrites in the GTS under consideration.
Consider a set of compatible events . Define a topological ordering of the events in , i.e. if then .
There are graphs such that for all , the event defines a valid rewrite on and .
Define the empty graph . The event has no parent and thus must have an empty vertex deletion set and glueing relation. It is thus a valid rewrite on . Define .
We can similarly define inductively for graphs if we show for that the -th event defines a valid rewrite on . The set of vertices in is the union of all vertices in the replacement graph of minus their vertex deletion sets
where is the vertex deletion set of .
Now, by definition of the event ,
On the other hand, because of the compatibility of all events in , we know that for all . It thus follows . Hence is indeed a valid rewrite of , and thus and are well-defined.
This construction is illustrated in the following figure for the compatible set of the previous example.
Applying events as rewrites in topological order. The result is a sequence of valid graph rewrites that start from the graph of .
We now show that the graph is determined uniquely by and provide an explicit procedure to construct it.
The graph obtained by applying the set of compatible rewrites in topological order on the empty graph is independent of the topological ordering chosen.
Given the set of rewrites , the procedure
FlattenHistory returns in time
where and are the total number of vertices and edges across all replacement graphs in .
Let us start with the definition of FlattenHistory:
1def FlattenHistory(events: Set[Event]) -> Graph:
2 all_ancestors = union([ancestors(d) for d in events])
3 graph = Graph()
4 for a in toposort(all_ancestors):
5 add_graph(graph, replacement_graph(a))
6 for (del_v, repl_v) in glueing_relation(a):
7 move_edges(graph, repl_v, del_v)
8 for v in deletion_set(a):
9 remove_vertex(graph, v)
10 return graph
toposort is a function that returns a topological ordering of the rewrites in
according to the parent-child rewrite relation, add_graph inserts the
graph passed as second argument into the graph passed as first argument,
remove_vertex removes the vertex along with all incident edges from the graph
and move_edges moves all edges of the second vertex to the first vertex.
Correctness of FlattenHistory. It is easy to see that if the graph
that is obtained from applying the rewrites in order is independent of the
choice of the toplogical ordering, then FlattenHistory is a correct
implementation of the procedure, as it applies one rewrite at a time, in
topological order.
Rewrite order invariance. Consider two rewrites such that neither is an ancestor of the other. Let
and proceed by induction over : assume the graph obtained by applying the rewrites in is invariant on the choice of the topological ordering of . Clearly this is true for . All that remains to be shown is that obtained by applying first then on is equal to , obtained by applying the same rewrites in the reverse order on .
The vertex sets and of and must be disjoint because and hence are compatible. Furthermore, the replacement graphs (by definition of the rewrites) and the glueing relations of and (by rewrite compatibility) cannot contain vertices in . It follows that the order in which vertices of are removed from does not affect the graph . Furthermore, vertex merging is a commutative operation, and so is disjoint graph addition. It follows and hence the result.
Runtime. In total vertices and edges will be added to graph
by add_graph on line 5. As a result, at most vertices can ever be deleted
by line 9. Finally, while a naive implementation of move_edges of line 7 might
result in the same edge being moved many times, all edge moves can be cached and
only executed once at the end: notice that every time edges are moved away from
a vertex, that vertex is subsequently removed from the graph. Instead of
removing the vertex, keep it “hidden”, with a link to the vertex that the edges
should be moved to. Once all graph operations are completed, traverse all hidden
vertices and follow the links to the vertices that the edges should be moved to.
This can be done in time. Then move all edges to the correct vertex, in
time , and delete the hidden vertices.
Now instead of exploring the space of all graphs reachable by
repeatedly applying rewrites, we can explore the rewrite space by adding events
to . Write for the graph returned by FlattenHistory on
set . If is the set of all graphs returned by FlattenHistory
on compatible events
then Proposition . and Proposition . combined guarantee that . To conclude, we show that indeed any graph in is in , and hence .
Let be a set of compatible events and . Any rewrite that can be applied on defines an on defines an event that can be added to .
We recall that a rewrite defines an event that can be added to if
- , and
- all rewrites in are compatible.
By the rewrite definition, . It follows in particular that
and thus , as well as . This proves both conditions.
Starting from the empty graph , we can create a root event with an empty vertex deletion set and glueing relation and add it to
Clearly, . We then apply
Proposition . repeatedly. If we have a sequence
of valid rewrites that can be applied on , then the
sequence of events that it defines can
also be added to in this order. As we have further seen in
Proposition . and
Proposition ., the graph that is obtained as a
result of the rewrites is the same graph returned by FlattenHistory called on
.
In other words, we conclude that exploring the rewrite space on is fully equivalent to exploring the space of valid events starting from .