Fully and Confluently Persistent Graph Rewriting

Chapter 5

Fully and Confluently Persistent Graph Rewriting

This chapter leverages another construction from graph rewriting theory that finds a direct application in quantum compilation: the unfolding of graph transformation systems Baldan, 1999Paolo Baldan, Andrea Corradini and Ugo Montanari. 1999. Unfolding and Event Structure Semantics for Graph Grammars. In Foundations of Software Science and Computation Structures, Berlin, Heidelberg. Springer Berlin Heidelberg, 73--89. doi: 10.1007/3-540-49019-1_6 Winskel, 1987Glynn Winskel. 1987. Event structures. Whereas most applications to date have focused on using unfolding for model verification (see section 5.1 for a review), we will use the same techniques to instead speed up optimisation problems over the space of reachable graphs in a GTS.

In the unfolding, graph rewrites are expressed as persistent modifications of the graph. Mutable data structures are typically ephemeral: modifying the data structure overwrites information and invalidates any references to the old data. In contrast, a persistent data structure applies changes to the data so that both the old and new versions remain accessible – a famous example of this are version control systems such as git.

A data structure is fully persistent if modifications can be applied not only to the latest version but also to previous versions of the data structure. In that case, a version of the data may be used to create several new versions. Instead of a linear edit history of all mutations, the result is an edit history tree, with possibly many “most recent” versions – leaves in the edit history.

Finally, a fully persistent data structure is also confluently persistent if different versions of the data in the edit history can be joined together. As a result, the edit history forms a directed acyclic graph (DAG) of versions of the data, linked by data mutation and joining operations. Adopting terminology from git, we call a join of two or more versions a merge of multiple versions.

In this chapter, we will consider all graphs to be hypergraphs (V,E)(V, E) with vertex set VV and hyperedge set EVE \subseteq V^\ast. All results can easily be adapted to accommodate graph attributes, weights, and types as required by applications. This means the data structure and algorithms we present apply directly to minIR graphs and, more broadly, to most instances of graph rewriting.

The central object of study in this chapter is the graph rewrite. We restate a simplified version of Definition . here. We opt for convenience for a rewrite definition that omits the edge deletion set EE^- of Definition .. This is not a restriction of the general case, as can be seen by adding a “dummy” vertex vev_e for each edge ee in a graph: a rewrite that removes an edge eEe \in E^- can equivalently be expressed by the rewrite that removes the dummy vertex veVv_e \in V^-1.

As in previous chapters, \sqcup denotes disjoint union, f:ABf: A \rightharpoonup B denotes a partial function and domdom denotes the domain of definition of a (partial) function.

Definition 5.4Graph rewrite

A rewrite rr on a graph G=(V,E)G = (V, E) is given by a tuple r=(GR,V,μ)r = (G_R, V^-, \mu), with

  • GR=(VR,ER)G_R = (V_R, E_R) is a graph called the replacement graph,
  • VVV^- \subseteq V is the vertex deletion set, and
  • μ:VVR\mu: V^- \rightharpoonup V_R is the glueing relation, a partial function that maps a subset of the deleted vertices of GG to vertices in the replacement graph.

Define the context subgraph GCG_C as the subgraph induced by the vertices

VC=(VV)  dom(μ).V_C = (V \smallsetminus V^-) \ \cup\ dom(\mu).

The rewritten graph resulting from applying rr to GG is the glueing

r(G)=(GCGR)/μ.r(G) = (G_C \sqcup G_R) / \sim_\mu.

obtained from the union of GCG_C and GRG_R by merging all vertices within the same class in the equivalence relation μ\sim_\mu that is the closure of μ\mu. We refer to section 3.5 for more details and an illustration of glueings and rewrites.

In this chapter, we will consider sequences of multiple rewrites. We will use the notation V(G)V(G) and E(G)E(G) to designate the vertices, respectively the edges, of a graph GG. It is further assumed that the vertices V(G)V(G) and V(G)V(G') for GGG \neq G' are always disjoint, a fact that we underline by always writing unions of graphs and vertices with \sqcup.

We make use of the fact that for every rewrite r=(GR,V,μ)r = (G_R, V^-, \mu), the equivalence classes α\alpha of μ\sim_\mu are of the form

α={m}{vVμ(v)=m},\alpha = \{ m \} \sqcup \{ v \in V^- \mid \mu(v) = m \},

for some mV(GR)m \in V(G_R). For every set of merged vertices αV(G)\alpha \subseteq V(G) in r(G)r(G), there is thus a unique vertex not in VV^-:

αV={m}.\alpha \smallsetminus V^- = \{ m \}.

We choose to always identify the merged vertex in r(G)r(G) with mm. Using this convention, the set of vertices of r(G)r(G) is simply

V(r(G))=(V(G)V)V(GR).V(r(G)) = (V(G) \smallsetminus V^-) \sqcup V(G_R).


  1. This makes use of the fact that unlike in DPO, our rewrite definition allows the (implicit) deletion of edges with one endvertex in VV^-↩︎

5.1. Related work

The unfolding of a graph transformation system (GTS) was first proposed in Baldan, 1999Paolo Baldan, Andrea Corradini and Ugo Montanari. 1999. Unfolding and Event Structure Semantics for Graph Grammars. In Foundations of Software Science and Computation Structures, Berlin, Heidelberg. Springer Berlin Heidelberg, 73--89. doi: 10.1007/3-540-49019-1_6 as a generalisation of a well-known construction on Petri nets Winskel, 1987Glynn Winskel. 1987. Event structures. Originally defined for DPO rewriting, the unfolding was later generalised to SPO Baldan, 2007Paolo Baldan, Andrea Corradini, Ugo Montanari and Leila Ribeiro. 2007. Unfolding semantics of graph transformation. Information and Computation 205, 5 (May 2007, 733--782). doi: 10.1016/j.ic.2006.11.004 Baldan, 2014Paolo Baldan, Andrea Corradini, Tobias Heindel, Barbara König and Paweł Sobociński. 2014. Processes and unfoldings: concurrent computations in adhesive categories. Mathematical Structures in Computer Science 24, 4 (June 2014). doi: 10.1017/s096012951200031x and SqPO Behr, 2019Nicolas Behr. 2019. Sesqui-Pushout Rewriting: Concurrency, Associativity and Rule Algebra Framework. Electronic Proceedings in Theoretical Computer Science 309 (December 2019, 23--52). doi: 10.4204/eptcs.309.2 in arbitrary adhesive categories. The unfolding is a powerful GTS technique that has found applications in model verification Baldan, 2008Paolo Baldan, Andrea Corradini and Barbara König. 2008. Unfolding Graph Transformation Systems: Theory and Applications to Verification Baldan, 2008Paolo Baldan, Andrea Corradini and Barbara König. 2008. A framework for the verification of infinite-state graph transformation systems. Information and Computation 206, 7 (July 2008, 869--907). doi: 10.1016/j.ic.2008.04.002 Costa, 2012Simone André da Costa and Leila Ribeiro. 2012. Verification of graph grammars using a logical approach. Science of Computer Programming 77, 4 (April 2012, 480--504). doi: 10.1016/j.scico.2010.02.006 and other formal analysis tools such as model-based diagnosis Baldan, 2008Paolo Baldan, Thomas Chatain, Stefan Haar and Barbara König. 2008. Unfolding-Based Diagnosis of Systems with an Evolving Topology and model transformation analysis Bisztr., 2009Dénes András Bisztray. 2009. Compositional verification of model-level refactorings based on graph transformations. PhD Thesis. University of Leicester.

Unfoldings of finite GSTs are often infinite. A lot of work has therefore concerned itself with finding sufficient conditions for finiteness or the existence of finite complete prefixes of unfoldings Baldan, 2008Paolo Baldan, Andrea Corradini and Barbara König. 2008. Unfolding Graph Transformation Systems: Theory and Applications to Verification Baldan, 2004Paolo Baldan, Andrea Corradini and Barbara König. 2004. Verifying Finite-State Graph Grammars: An Unfolding-Based Approach. In CONCUR 2004 - Concurrency Theory, Berlin, Heidelberg. Springer Berlin Heidelberg, 83--98. doi: 10.1007/978-3-540-28644-8_6 Baldan, 2008Paolo Baldan, Andrea Corradini, Barbara König and Stefan Schwoon. 2008. McMillan's Complete Prefix for Contextual Nets Baldan, 2010Paolo Baldan, Alessandro Bruni, Andrea Corradini, Barbara König and Stefan Schwoon. 2010. On the Computation of McMillan's Prefix for Contextual Nets and Graph Grammars. In Graph Transformations, Berlin, Heidelberg. Springer Berlin Heidelberg, 91--106. doi: 10.1007/978-3-642-15928-2_7 Schwoon, 2013Stefan Schwoon. 2013. Efficient verification of sequential and concurrent systems. PhD Thesis. École normale supérieure de Cachan-ENS Cachan. On the other hand, unfoldings of GTSs of quantum computation are expected to be intractably large Yang, 2021Yichen Yang, Mangpo Phitchaya Phothilimtha, Yisu Remy Wang, Max Willsey, Sudip Roy and Jacques Pienaar. 2021. Equality Saturation for Tensor Graph Superoptimization. CoRR abs/2101.01332. doi: 10.48550/ARXIV.2101.01332, with no complete prefixes in general. Rather, our interests lie in finding heuristics that determine the subspace of the unfolding of interest, combined with fast algorithms to expand finite unfolding prefixes into larger ones. This chapter is to our knowledge the first work in this direction.

Persistent data structures on the other hand have a rich history in computer science Drisco., 1989James R. Driscoll, Neil Sarnak, Daniel D. Sleator and Robert E. Tarjan. 1989. Making data structures persistent. Journal of Computer and System Sciences 38, 1 (February 1989, 86--124). doi: 10.1016/0022-0000(89)90034-2 Lagogi., 2005George Lagogiannis, Yannis Panagis, Spyros Sioutas and Athanasios Tsakalidis. 2005. A survey of persistent data structures. In Proceedings of the 9th WSEAS International Conference on Computers, Stevens Point, Wisconsin, USA. World Scientific and Engineering Academy and Society (WSEAS), and particularly within functional programming Okasaki, 1996Chris Okasaki and Peter Lee. 1996. Purely functional data structures. Carnegie Mellon University, USA Okasaki, 1998Chris Okasaki and Andy Gill. 1998. Fast Mergeable Integer Maps. In Workshop on ML, Septempter 1998, 77--86 Hinze, 2005Ralf Hinze and Ross Paterson. 2005. Finger trees: a simple general-purpose data structure. Journal of Functional Programming 16, 02 (November 2005, 197). doi: 10.1017/s0956796805005769. Confluently persistent data structures were first explored in Drisco., 1994James R. Driscoll, Daniel D. K. Sleator and Robert E. Tarjan. 1994. Fully persistent lists with catenation. Journal of the ACM 41, 5 (Septempter 1994, 943--959). doi: 10.1145/185675.185791. A general treatment of the approach was subsequently presented in Fiat, 2003Amos Fiat and Haim Kaplan. 2003. Making data structures confluently persistent. Journal of Algorithms 48, 1 (August 2003, 16--58). doi: 10.1016/s0196-6774(03)00044-0 and improved in Collet., 2012Sébastien Collette, John Iacono and Stefan Langerman. 2012. Confluent Persistence Revisited. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, January 2012. Society for Industrial and Applied Mathematics, 593--601. doi: 10.1137/1.9781611973099.50. Chaler., 2018Parinya Chalermsook, Mayank Goswami, László Kozma, Kurt Mehlhorn and Thatchaphol Saranurak. 2018. Multi-Finger Binary Search Trees. In International Symposium on Algorithms and Computation 2018. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi: 10.4230/LIPICS.ISAAC.2018.55 proposed a data structure for confluently persistent tries.

Within the field of graph transformations, there is a well-developed theory for persistent (and confluently persistent) transformations in the form of the concurrent graph transformation formalism of Corradini et al. Corrad., 1996A. Corradini, U. Montanari and F. Rossi. 1996. Graph Processes. Fundamenta Informaticae 26, 3,4 (241--265). doi: 10.3233/fi-1996-263402 and Baldan et al. Baldan, 1999Paolo Baldan, Andrea Corradini, Ugo Montanari, Francesca Rossi, Hartmut Ehrig and Michael Löwe. 1999. Concurrent Semantic of Algebraic Graph Transformations. In Handbook of Graph Grammars and Computing by Graph Transformation, August 1999. World Scientific, 107--187. doi: 10.1142/9789812814951_0003. This categorical formalism has also been extended to include support for overlapping transformations in Echahed, 2017Rachid Echahed and Aude Maignan. 2017. Parallel Graph Rewriting with Overlapping Rules. In Proceedings of the 21st International Conference on Logic for Programming, Artificial Intelligence and Reasoning, LPAR, 300--318. doi: 10.29007/576h.

The first practical application of persistent graph rewriting was developed by the graph rewriting engine GRAPE Weber, 2017Jens H. Weber. 2017. GRAPE – A Graph Rewriting and Persistence Engine. In Graph Transformation. Springer International Publishing, 209--220. doi: 10.1007/978-3-319-61470-0_13 and was originally based on ephemeral data structures with transactional ACID semantics. Later, its successor, GrapeVine Weber, 2022Jens H. Weber. 2022. Tool Support for Functional Graph Rewriting with Persistent Data Structures - GrapeVine. In Graph Transformation. ICGT 2022. Springer International Publishing, 195--206. doi: 10.1007/978-3-031-09843-7_11, enhanced this with the first fully persistent data structure for graph rewriting. In this work, the vertices and edges that result from graph rewrites are stored individually in a specialised database. Depending on the requested data version, the graph can be retrieved from the database’s individual vertex and edge entities using the database’s graph query language. To our knowledge, no confluently persistent data structure has been proposed for graph rewriting.

As we will see in section 5.5, confluent persistence is a particularly valuable property in the absence of a rewriting strategy, i.e. a procedure to select and prioritise among possible graph transformations Echahed, 2008Rachid Echahed. 2008. Inductively Sequential Term-Graph Rewrite Systems. In Graph Transformations, ICGT. Springer Berlin Heidelberg, 84--98. doi: 10.1007/978-3-540-87405-8_7. This distinguishes the approach presented in this thesis from most previous work. Rewriting strategies feature prominently in PORGY, a tool for port graph rewriting Andrei, 2011Oana Andrei, Maribel Fernández, Hélène Kirchner, Guy Melançon, Olivier Namet and Bruno Pinaud. 2011. PORGY: Strategy-Driven Interactive Transformation of Graphs. Electronic Proceedings in Theoretical Computer Science 48 (February 2011, 54--68). doi: 10.4204/eptcs.48.7 Ferná., 2010Maribel Fernández and Olivier Namet. 2010. Strategic programming on graph rewriting systems. Electronic Proceedings in Theoretical Computer Science 44 (December 2010, 1--20). doi: 10.4204/eptcs.44.1; the graph rewriting software GROOVE provides the notion of a control program to govern the transformation order Rensink, 2004Arend Rensink. 2004. The GROOVE Simulator: A Tool for State Space Generation. In Applications of Graph Transformations with Industrial Relevance. Springer Berlin Heidelberg, 479--485. doi: 10.1007/978-3-540-25959-6_40; and finally, tools such as GrGen provide advanced control flow primitives to specify rewrite rule execution Geiß, 2006Rubino Geiß, Gernot Veit Batz, Daniel Grund, Sebastian Hack and Adam Szalkowski. 2006. GrGen: A Fast SPO-Based Graph Rewriting Tool. In Graph Transformations. ICGT 2006.. Springer Berlin Heidelberg, 383--397. doi: 10.1007/11841883_27.

Specifying rewriting strategies yields efficient graph transformation procedures and is particularly effective for systems with provable properties such as confluence and termination Verma, 1995Rakesh M. Verma. 1995. Transformations and confluence for rewrite systems. Theoretical Computer Science 152, 2 (December 1995, 269--283). doi: 10.1016/0304-3975(94)00255-0. As a result, rewriting strategies have also been used successfully within classical compiler optimisations Assmann, 2000Uwe Assmann. 2000. Graph rewrite systems for program optimization. ACM Transactions on Programming Languages and Systems 22, 4 (July 2000, 583--637). doi: 10.1145/363911.363914 and quantum circuit optimisation Fagan, 2018Andrew Fagan and Ross Duncan. 2018. Optimising Clifford Circuits with Quantomatic. In Proceedings 15th International Conference on Quantum Physics and Logic, QPL 2018, Halifax, Canada, 3-7th June 2018, 85--105. doi: 10.4204/EPTCS.287.5 Duncan, 2020Ross Duncan, Aleks Kissinger, Simon Perdrix and John van de Wetering. 2020. Graph-theoretic Simplification of Quantum Circuits with the ZX-calculus. Quantum 4 (June 2020, 279). doi: 10.22331/q-2020-06-04-279.

However, such properties of the transition system – or successful heuristic approximations for it – cannot always be derived. In these cases, the space of graphs reachable from an input graph within the transition system must be explored non-deterministically. In the absence of a control program, GROOVE will fall back to an exhaustive exploration of the search space – for an exploration up to depth Δ\Delta, the result is a search tree of size O(γΔ)\mathcal O(\gamma^\Delta), where γ\gamma is the number of possible rewrites at every graph in the search space (assuming γ\gamma is constant for every reachable graph).

Exhaustive exploration is used extensively in model checking, typically to verify properties that must hold for all reachable graphs Rensink, 2004Arend Rensink, Ákos Schmidt and Dániel Varró. 2004. Model Checking Graph Transformations: A Comparison of Two Approaches. In Graph Transformations. ICGT 2004. Springer Berlin Heidelberg, 226--241. doi: 10.1007/978-3-540-30203-2_17. It has also proven to be very useful for compiler optimisation, where the constantly evolving rewrite rules, instruction sets and complex, architecture-dependent cost functions render it challenging to fix a deterministic program transformation schedule.

Jia et al. showed in Jia, 2019Zhihao Jia, Oded Padon, James Thomas, Todd Warszawski, Matei Zaharia and Alex Aiken. 2019. TASO: optimizing deep learning computation with automatic generation of graph substitutions. In Proceedings of the 27th ACM Symposium on Operating Systems Principles, October 2019. ACM, 47--62. doi: 10.1145/3341301.3359630 that computation graph optimisation using graph transformations was achievable without predefined rewriting strategies. They discovered new state-of-the-art implementations for computation graphs of interest to the deep learning community using a simple exhaustive search of the space of possible rewrites with backtracking. This approach was then adapted to quantum circuit optimisation in Xu, 2022Mingkuan Xu, Zikun Li, Oded Padon, Sina Lin, Jessica Pointing, Auguste Hirth, Henry Ma, Jens Palsberg, Alex Aiken, Umut A. Acar and Zhihao Jia. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433 and Xu, 2023Amanda Xu, Abtin Molavi, Lauren Pick, Swamit Tannu and Aws Albarghouthi. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (June 2023, 835--859). doi: 10.1145/3591254.

These recent results fit within a long line of compiler research called superoptimisation Fraser, 1979Christopher W. Fraser. 1979. A compact, machine-independent peephole optimizer. In Proceedings of the 6th ACM SIGACT-SIGPLAN symposium on Principles of programming languages - POPL ’79. ACM Press, 1--6. doi: 10.1145/567752.567753 Massal., 1987Henry Massalin. 1987. Superoptimizer: a look at the smallest program. In Proceedings of the second international conference on Architectual support for programming languages and operating systems, October 1987. ACM, 122--126. doi: 10.1145/36206.36194 Sands, 2011Duncan Sands. 2011. Super-optimizing LLVM IR. (November 2011). Retrieved on 13/01/2025 (LLVM Developer's meeting) from http://llvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdf Bansal, 2006Sorav Bansal and Alex Aiken. 2006. Automatic generation of peephole superoptimizers. ACM SIGARCH Computer Architecture News 34, 5 (October 2006, 394--403). doi: 10.1145/1168919.1168906 Sasnau., 2017Raimondas Sasnauskas, Yang Chen, Peter Collingbourne, Jeroen Ketema, Jubi Taneja and John Regehr. 2017. Souper: A Synthesizing Superoptimizer. CoRR abs/1711.04422. On top of excellent optimisation performance, this approach to compiler optimisation using graph transformation systems (GTS) is exceptionally flexible, as rewrite rules can be generated and tailored on demand to the constraints and instruction set of the target hardware. For any supplied cost function, the compiler can explore all valid program transformations to find the rewrites sequence that minimises cost. This keeps the cost function-specific logic separate from the transformation semantics of the program, making it straightforward to replace or update the optimisation objective.

The adaptation of superoptimisation to quantum optimisation of Xu, 2022Mingkuan Xu, Zikun Li, Oded Padon, Sina Lin, Jessica Pointing, Auguste Hirth, Henry Ma, Jens Palsberg, Alex Aiken, Umut A. Acar and Zhihao Jia. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433 and Xu, 2023Amanda Xu, Abtin Molavi, Lauren Pick, Swamit Tannu and Aws Albarghouthi. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (June 2023, 835--859). doi: 10.1145/3591254 is, however, showing scaling difficulties: unlike classical superoptimisation, which is usually designed to optimise small subroutines within programs, e.g. focusing on arithmetic instructions, single instruction multiple data (SIMD) etc., the technique should in principle be able to optimise quantum programs in their entirety and requires tens of thousands of rewrite rules. This leads to immense search spaces that superoptimisation does not scale well to.

For the special case of term rewriting, i.e. rewriting of tree expressions, a technique known as equality saturation was introduced in Tate, 2009Ross Tate, Michael Stepp, Zachary Tatlock and Sorin Lerner. 2009. Equality saturation: a new approach to optimization. ACM SIGPLAN Notices 44, 1 (January 2009, 264--276). doi: 10.1145/1594834.1480915 to compress and reduce the size of the search space significantly. Equality saturation can be viewed as a twist on persistent data structures designed to optimise terms through term rewriting. It is persistent insofar as it preserves all data inserted into it, though, unlike persistent data structures, it does not retain the history of transformations. This introduces a new step in the optimisation process known as the extraction phase, where the best term stored within the data structure must be identified and recovered.

An efficient implementation was presented in Willsey, 2021Max Willsey. 2021. Practical and Flexible Equality Saturation. PhD Thesis. University of Washington, and it has recently been adopted in modern compiler optimisation pipelines Fallin, 2022Chris Fallin. 2022. Cranelift: Using E-Graphs for Verified, Cooperating Middle-End Optimizations. (August 2022). Retrieved on 14/01/2025 (RFC) from https://github.com/bytecodealliance/rfcs/blob/main/accepted/cranelift-egraph.md. Though the approach was extended to optimise computation graphs for deep learning in Yang, 2021Yichen Yang, Mangpo Phitchaya Phothilimtha, Yisu Remy Wang, Max Willsey, Sudip Roy and Jacques Pienaar. 2021. Equality Saturation for Tensor Graph Superoptimization. CoRR abs/2101.01332. doi: 10.48550/ARXIV.2101.01332, equality saturation does not generalise to graph rewriting. Equality saturation and the difficulties of adapting it to graph rewriting are interesting (and subtle!) enough to warrant their own section.

5.2. A closer look at equality saturation

Below, we provide a succinct introduction to equality saturation and discuss its shortcomings in the context of quantum computation and graph rewriting in general. For further details on equality saturation, we recommend the presentation of Willsey, 2021Max Willsey. 2021. Practical and Flexible Equality Saturation. PhD Thesis. University of Washington, its implementation Willsey, 2025Max Willsey. 2025. egg: egraphs good. Retrieved on 14/01/2025 (code repository) from https://github.com/egraphs-good/egg, and this blog discussion Bernst., 2024Max Bernstein. 2024. What's in an e-graph?. (Septempter 2024). Retrieved on 14/01/2025 (blog post) from https://bernsteinbear.com/blog/whats-in-an-egraph/.

Unlike a general-purpose compiler utility, equality saturation is specifically a technique for term rewriting. Terms1 are algebraic expressions represented as trees, in which tree nodes correspond to operations, the children of an operation are the subterms passed as arguments to the operation, and leaf nodes are either constants or unbound variables. For instance, the term f(x×2,3)f(x \times 2, 3) would be represented as the tree:

This representation is particularly suited for any pure functional (i.e. side-effect-free) classical computation. Every node of a term is identified with its own term: the subterm given by the subtree the node is the root of. Given term transformation rules, term rewriting consists of finding subterms that match known transformation patterns. The matching subtrees can then be replaced with the new equivalent trees.

In equality saturation, all terms obtained through term rewriting are stored within a single persistent data structure. Term optimisation proceeds in two stages. First, an exploration phase adds progressively more terms to the data structure to discover and capture all possible terms that the input can be rewritten to until saturation (see below), or a timeout, is reached. In the second phase, the saturated data structure is passed to an extraction algorithm tasked with finding the term that minimises the cost function of interest among all terms discovered during exploration.

The data structure that enables this is a generalisation of term trees. Just as in terms, nodes correspond to operations and have children subterms corresponding to the operation’s arguments. To record that a new term obtained through a rewrite is equivalent to an existing subterm, we extend the data structure we employ to also store equivalence classes of nodes, typically implemented as Union-Find data structures Galler, 1964Bernard A. Galler and Michael J. Fisher. 1964. An improved equivalence algorithm. Communications of the ACM 7, 5 (May 1964, 301--303). doi: 10.1145/364099.364331 Cormen, 2009Thomas H Cormen, Charles E Leiserson, Ronald L Rivest and Clifford Stein. 2009. Introduction to algorithms (Third edition ed.). MIT Press, Cambridge, Massachusetts. If we, for instance, applied the rewrite x2x+xx * 2 \mapsto x + x to the term above, we would obtain

Nodes within a grey box indicate equivalent subterms. This diagram encodes that any occurrence of the x * 2 term can equivalently be expressed by the x + x term. Henceforth, when matching terms for rewriting, both the * term and the + term are valid choices for the first argument of the f operation. Suppose, for example, the existence of a rewrite f(x+y,z)f(x,yz)f(x + y, z) \mapsto f(x, y * z), then this would match the above data structure, resulting in

A consequence of using equivalence relations in the data structure is that the ordering in which the rewrites are considered and applied becomes irrelevant!

As presented, the exploration process would never terminate, and the data structure size would grow indefinitely: as more rewrites are applied, more and more terms are created, resulting in an ever-increasing set of possible rewrites to be considered and processed. Equality saturation resolves this by enforcing a term uniqueness invariant: every term or subterm explored is expressed by exactly one node in the data structure. We can see in the above example that this is currently not the case: the term xx for instance, is present multiple times – so is 33. As a result, the nodes no longer form a forest of trees but instead directed acyclic graphs:

This is commonly known as term sharing, and the resulting data structure is known as a term graph Willsey, 2021Max Willsey. 2021. Practical and Flexible Equality Saturation. PhD Thesis. University of Washington. Maintaining this invariance is not hard in practice: whenever a new term is about to be added by applying a rewrite, it must first be checked whether the term exists already – something that can be done cheaply by keeping track of all hashes of existing terms. In the affirmative case, rather than adding a new term to the matched term’s equivalence class, both terms’ classes must be merged.

It might be that equivalence classes must be merged recursively: given the terms f(x,3)f(x, 3) and f(y,3)f(y, 3), if the classes of xx and yy are merged (and thus xx and yy have been proven equivalent), then the classes of their respective parent f(x,3)f(x, 3) and f(y,3)f(y, 3) must also be merged. Doing so efficiently is non-trivial, so we will not go into details here and refer again to Willsey, 2021Max Willsey. 2021. Practical and Flexible Equality Saturation. PhD Thesis. University of Washington.

In the absence of terms of unbounded size, the uniqueness invariant guarantees that the exploration will eventually saturate: as rewrites are applied, there will come a point where all equivalent terms have been discovered, i.e. every applicable rewrite will correspond to an equivalence within an already known class, thereby not providing any new information. This marks the end of the exploration phase2.

Term optimisation then proceeds to the extraction phase. Reading out an optimised term out of the saturated term data structure is not trivial. For every equivalence class in the data structure, a representative node must be chosen so that the final term, extracted recursively from the root by always selecting the representative node of every term class, minimises the desired cost function3.

The strategy for choosing representative terms depends heavily on the cost function. In simple cases, such as minimising the total size of the extracted term, this can be done greedily in reverse topological order, i.e. proceeding from the leaves towards the root Willsey, 2021Max Willsey. 2021. Practical and Flexible Equality Saturation. PhD Thesis. University of Washington. There are also more complex cases, however: if the cost function allows for the sharing of subexpressions that may be used more than once in the computation, for instance, then finding the optimal solution will require more expensive computations such as solving boolean satisfiability (SAT) or Satisfiability Modulo Theories (SMT) problem instances Biere, 2021Armin Biere, Marijn J. H. Heule, Hans Maaren and Toby Walsh. 2021. Handbook of satisfiability (Second edition ed.). IOS Press, Amsterdam.

Equality saturation on graphs? #

Equality saturation is a fast-developing subfield of compilation with a growing list of applications. Unfortunately for us4, adapting these ideas to quantum computation (and graph rewriting more generally) presents several unsolved challenges.

The root of the problem lies in the program representation. The minIR representation we presented in section 3.3 – but also the quantum circuit representation – captures quantum computations, not as a term, but in a directed acyclic graph (DAG) structure.

A generalisation of equality saturation to computation DAGs was studied in Yang, 2021Yichen Yang, Mangpo Phitchaya Phothilimtha, Yisu Remy Wang, Max Willsey, Sudip Roy and Jacques Pienaar. 2021. Equality Saturation for Tensor Graph Superoptimization. CoRR abs/2101.01332. doi: 10.48550/ARXIV.2101.01332 in the context of optimisation of computation graphs for deep learning. Their approach is based on the observation that the computation of a (classical) computation DAG can always be expressed by a term for each output of the computation. Consider, for example, the simple computation that takes two inputs (x, y) representing 2D cartesian coordinates and returns its equivalent in polar coordinates (r, θ).

By introducing two operations polarr\textit{polar}_r and polarθ\textit{polar}_\theta that compute polar\textit{polar} and subsequently, discard one of the two outputs, the DAG can equivalently be formulated as two terms

corresponding to the two outputs r and θ of the computation. This involves temporarily duplicating some of the data and computations in the DAG – though all duplicates will be merged again in the term graph due to the term sharing invariant.

This duplicating and merging of data is fundamentally at odds with the constraints we must enforce on linear data, such as quantum resources. Each operation (or data) of a DAG that is split into multiple terms introduces a new constraint that must be imposed on the extraction algorithm: a computation DAG will only satisfy the no-discarding theorem (section 2.1) for linear values if, for each split operation it contains, it either contains all or none of its split components.

To illustrate this point, consider the following simple rewrite on quantum circuits that pushes X gates (\oplus) from the right of a CX gate to the left:

Both the left and right hand sides would be decomposed into two terms, one for each output qubit. The left terms could be written as

X(CX0(0,1))andCX1(0,1)X(CX0(0, 1)) \quad\textrm{and}\quad CX1(0, 1)

whereas the right terms would be

CX0(X(0),X(1))andCX1(X(0),X(1)).CX0(X(0), X(1)) \quad\textrm{and}\quad CX1(X(0), X(1)).

We introduced the term X()X(\cdot) for the single-qubit X gate and two terms CX1(,)CX1(\cdot, \cdot) and CX2(,)CX2(\cdot, \cdot) for the terms that produce the first, respectively second, output of the two-qubit CX gate. 11 and 00 denote the input qubits of the computation. This would be interpreted as two different rewrites

X(CX0(0,1))CX0(X(0),X(1))andCX1(0,1)CX1(X(0),X(1)).\begin{aligned}X(CX0(0, 1)) &\mapsto CX0(X(0), X(1))\\\textrm{and}\quad CX1(0, 1) &\mapsto CX1(X(0), X(1)).\end{aligned}

Unlike classical computations, however, either of these rewrites on their own would be unphysical: there is no implementation of either split operations CX0CX0 or CX1CX1 on their own. We would thus have to enforce at extraction time that for every application of this pair of rewrite rules, either both or none of the rewrites are applied.

Conversely, satisfying the no-cloning theorem requires verification that during extraction, terms that share a subterm but correspond to distinct graph rewrites are never selected simultaneously – otherwise, the linear value corresponding to the shared subterm would require cloning to be used twice.

The no-discarding and no-cloning restrictions result in a complex web of AND respectively XOR relationships between individual terms in the term graph. These constraints could be ignored during the exploration phase and then be modelled in the extraction phase by an integer linear programming (ILP) problem. However, Yang, 2021Yichen Yang, Mangpo Phitchaya Phothilimtha, Yisu Remy Wang, Max Willsey, Sudip Roy and Jacques Pienaar. 2021. Equality Saturation for Tensor Graph Superoptimization. CoRR abs/2101.01332. doi: 10.48550/ARXIV.2101.01332 observed that this approach causes the term graph to encode a solution space that grows super-exponentially with rewrite depth (see Fig. 7 in Yang, 2021Yichen Yang, Mangpo Phitchaya Phothilimtha, Yisu Remy Wang, Max Willsey, Sudip Roy and Jacques Pienaar. 2021. Equality Saturation for Tensor Graph Superoptimization. CoRR abs/2101.01332. doi: 10.48550/ARXIV.2101.01332), rendering the ILP extraction problem computationally intractable beyond 3 subsequent rewrites. Recent work has attempted to tackle this issue using reinforcement learning Bărbu., 2024George-Octavian Bărbulescu, Taiyi Wang, Zak Singh and Eiko Yoneki. 2024. Learned Graph Rewriting with Equality Saturation: A New Paradigm in Relational Query Rewrite and Beyond. arXiv: 2407.12794 [cs.DB].

Linearity-preserving rewrites are an exponentially small subset #

A simple calculation shows that in the case that all values in the computation graph are linear and only graphs up to a maximal size are considered, the number of possible rewrites only grows exponentially in the rewrite depth. In other words, for optimisation of quantum computations, the solution space of valid computations is much smaller5 than the space explored by the equality saturation approach of Yang, 2021Yichen Yang, Mangpo Phitchaya Phothilimtha, Yisu Remy Wang, Max Willsey, Sudip Roy and Jacques Pienaar. 2021. Equality Saturation for Tensor Graph Superoptimization. CoRR abs/2101.01332. doi: 10.48550/ARXIV.2101.01332.

Indeed suppose there is a maximal graph size V(G)Θ|V(G)| \leqslant \Theta and suppose that all rewrite patterns, i.e. the subgraph induced by the vertex deletion set VV^- of a rewrite, are connected. This is an assumption that was also made in chapter 4, see section 4.2 for a discussion.

In a computation graph of linear values GG, every vertex (value in the computation) vV(G)v \in V(G) has a unique incoming and outgoing edge. This means that any pattern embedding φ:PG\varphi: P \hookrightarrow G is uniquely defined by the image φ(vP)\varphi(v_P) of a single vertex vPV(P)v_P \in V(P). Thus for a GTS with mm transformation rules, there can be at most a constant number

mV(G)mΘ=:αm \cdot |V(G)| \leqslant m \cdot \Theta =: \alpha

of possible rewrites that can be applied to any graph GG. Let Gd\mathcal{G}_d be the set of all graphs that can be reached in the GTS in at most dd rewrites from some input graph G0G_0. Gd+1\mathcal{G}_{d+1} is the set of all graphs obtained by applying a rewrite to a graph GGdG \in \mathcal{G}_d. Thus we have the relation:

Gd+1αGd,|\mathcal{G}_{d+1}| \leqslant \alpha \cdot |\mathcal{G}_d|,

The total number of rewrites RdR_d that can be applied on any graph in Gd\mathcal{G}_d is thus

RdαGd=O(eαd).R_d \leqslant \alpha \cdot |\mathcal{G}_d| = O(e^{\alpha \cdot d}).


In summary, equality saturation is a specialisation of persistent data structures uniquely suited to the problem of term rewriting. It succinctly encodes the space of all equivalent terms, and using term sharing does away with the need to apply equivalent rewrites on multiple copies of the same term, which inevitably occurs on more naive rewriting approaches.

However, equality saturation cannot model rewrites that require deleting parts of the data. This is not a problem for terms representing classical operations, as data can always be implicitly copied during exploration and discarded during extraction as required. This is not the case for quantum computations – and for graph rewriting in general, where explicit vertex (and edge) deletions are an integral part of graph transformation semantics.

As a result, numerous constraints would have to be imposed to restrict the solution space encoded by term graphs to valid outcomes of graph rewriting procedures. This would make extraction algorithms complex and cumbersome. More importantly, we showed that in the case of computation graphs on linear values, such as quantum computations, the solution space explored by equality saturation is super-exponentially larger than the space of valid computations, rendering the extraction algorithm and meaningful exploration of the relevant rewriting space computationally intractable.


  1. Depending on the context, computer scientists also call them abstract syntax trees (AST) – for our purposes, it’s the same thing. ↩︎

  2. Of course, it is also practical to include a timeout parameter in implementations to guarantee timely termination even on large or ill-behaved systems. ↩︎

  3. Note that we are omitting a subtle point here that arises due to term sharing: depending on the cost function, choosing different representative nodes for the same class could be favourable for the other occurrences of the term in the computation. ↩︎

  4. but fortunately for this thesis ↩︎

  5. Exponential is super-exponentially smaller than super-exponential! Or put mathematically eo(n)/eΘ(n)=eo(n)Θ(n)=eo(n)e^{o(n)}/e^{\Theta(n)} = e^{o(n) - \Theta(n)} = e^{o(n)}↩︎

5.3. The data structure

We now present a data structure that is closely related to equality saturation but supports arbitrary graph rewriting. It is modelled on the graph unfolding construction as presented in Baldan, 2008Paolo Baldan, Andrea Corradini and Barbara König. 2008. Unfolding Graph Transformation Systems: Theory and Applications to Verification.

Rather than maintaining equivalence relations between terms, as done in term graphs, we maintain equivalence relations between graph vertices. Our data structure stores the set of all applied rewrites – the main subject of this section is to show how all operations of interest on this data structure can be implemented efficiently.

The persistent graph rewriting data structure is given by a set D\mathcal{D} of events δ=(GR,V,μ)D\delta = (G_R, V^-, \mu) \in \mathcal{D}, with

  • vertex deletion set VV(Dδ)V^- \subseteq V(\mathcal{D} \smallsetminus \delta) and
  • glueing relation μ:VV(GR)\mu: V^- \rightharpoonup V(G_R).

We have extended the V()V(\cdot) notation to D\mathcal{D} by defining it as the union of all vertex sets of replacement graphs in D\mathcal{D}. We will similarly use V(δ)V(\delta) to denote the set of vertices in the replacement graph of a rewrite δ\delta.

Events resemble rewrites as defined in Definition . but differ in that they do not apply to a single graph GG, i.e. there is no graph such that VV(G)V^- \subseteq V(G). Instead,

V  δDV(δ)=V(D).V^-\ \subseteq\ \bigsqcup_{\delta \in \mathcal{D}} V(\delta) = V(\mathcal{D}).

We will see below how a graph GG can be constructed such that an event δD\delta \in \mathcal{D} is indeed a valid rewrite on GG.

Using the disjointness of the union in (1), for all vV(D)v \in V(\mathcal{D}), there is a unique δD\delta \in \mathcal{D} such that vV(δ)v \in V(\delta) that we call the owner of vv. The parents (or directed causes) P(δ)P(\delta) of an event δ\delta are then the owners of the vertices in the deletion set VV^- of δ\delta:

P(δ)={δpDVV(δp)}.P(\delta) = \left\{ \delta_p \in \mathcal{D} \mid V^- \cap V(\delta_p) \neq \varnothing \right\}.

Inversely, we define the children of δ\delta as the set of event whose parents include δ\delta

P1(δ)={δcDδP(δc)}.P^{-1}(\delta) = \left\{\delta_c \in \mathcal{D} \mid \delta \in P(\delta_c)\right\}.

The following figure shows an example of a data structure D\mathcal{D} on undirected graphs.

Events on an undirected graph with their history. Coloured directed edges represent the parent-child relationship. The area that they rewrite in the parent event is represented by dashed regions of the same colour. The map between graphs is given by the vertex IDs.

Events on an undirected graph with their history. Coloured directed edges represent the parent-child relationship. The area that they rewrite in the parent event is represented by dashed regions of the same colour. The map between graphs is given by the vertex IDs.

Merges, confluent persistence and event creation #

A rewrite r=(GR,V,μ)r = (G_R, V^-, \mu) that applies to a replacement graph of an event δp\delta_p, i.e. VV(δp)V^- \subseteq V(\delta_p) immediately defines a valid event δr:=r\delta_r := r. In that case, δr\delta_r has a unique parent

P(δr)={δp}.P(\delta_r) = \{ \,\delta_p\, \}.

Creating an event δr\delta_r from a rewrite rr is the simplest type of data mutation that can be recorded in D\mathcal{D}. For D\mathcal{D} to be a confluently persistent data structure, it must also be allowed to merge mulitple data mutations together. Rather than handling merges of versions of the data structure explicitly, an event δD\delta \in \mathcal{D} can define graph mutation operations that apply on collections of events PDP \subseteq \mathcal{D} – the resulting mutation is equivalent to explicitly creating a merged version of the versions in PP, followed by the desired rewrite. In this case, the parents of δ\delta are precisely the set P=P(δ)P = P(\delta).

In other words, the parent-child relationships of D\mathcal{D} is precisely the event history of D\mathcal{D}: a directed graph with vertex set D\mathcal{D} and edges δ1δ2\delta_1 \to \delta_2 if δ1P(δ2)\delta_1 \in P(\delta_2). For D\mathcal{D} to define a valid confluently persistent data structure, we need to

  1. Ensure that the event history is acyclic, and
  2. Define conditions that guarantee that events correspond to valid data mutations.

We hit both birds with one stone by restricting how D\mathcal{D} can be constructed and modified in such a way that acyclicity is guaranteed. Specifically, we introduce two procedures:

  • CreateEmpty constructs an empty D=\mathcal{D} = \varnothing, and
  • AddEvent, adds an event δ\delta to D\mathcal{D}.

The first is straightforward – and importantly, the only way to construct an instance D\mathcal{D}. AddEvent, on the other hand, enforces two conditions that δ\delta must satisfy to be added to a set D\mathcal{D}:

  • P(δ)DP(\delta) \subseteq \mathcal{D}, and
  • all parents of δ\delta must be compatible.

We defer the discussion on the second condition, enforced by the AreCompatible procedure, to its dedicated section below. The restriction P(δ)DP(\delta) \subseteq \mathcal{D} defines a partial order on events by guaranteeing that an event δ\delta can only be defined and added to D\mathcal{D} after all its parents P(δ)P(\delta) have been added.

We say that D\mathcal{D} is valid if it can be constructed from a single call to CreateEmpty, followed by a sequence of calls to AddEvent. This is equivalent to requiring that

  1. the parent-child relationship is acylic and
  2. the parents of every event satisfy AreCompatible.

For the remainder of this chapter, we will always assume that D\mathcal{D} is valid, and thus the event history of D\mathcal{D} is always well-defined and acyclic.

def CreateEmpty() -> Set[Event]:
    return set()

def AddEvent(
    events: Set[Event],
    replacement_graph: Graph
    deletion_set: Set[V],
    glueing_relation: EquivalenceRelation[V]
) -> Set[Event]:
    new_event = (
        replacement_graph,
        deletion_set,
        glueing_relation
    )
    parents = parents(new_event)
    assert(issubset(parents, events))
    assert(AreCompatible(parents))

    events = union(events, {new_event})

Compatible events #

Assuming the parent-child relationship is acylic, we can define the ancestors (or causes) δ\lfloor\delta\rfloor of an event δ\delta recursively

δ={δ}  δpP(δ)δp.\lfloor\delta\rfloor = \{\,\delta\,\}\ \cup\ \bigcup_{\delta_p \in P(\delta)} \lfloor\delta_p\rfloor.

Events DDD \subseteq \mathcal{D} are compatible (or a configuration) if all vertex deletion sets VV^- for all ancestors of δD\delta \in D are disjoint. That is, writing

D=δDδ,\lfloor D\rfloor = \bigcup_{\delta \in D} \lfloor \delta\rfloor,

we require that all sets {V(GR,V,μ)D}\{V^- \mid (G_R, V^-, \mu) \in \lfloor D \rfloor \} are disjoint. In the example above, events δ5\delta_5 and δ6\delta_6 are compatible, wheresa δ5\delta_5 and δ4\delta_4 are not. As pseudocode, this is implemented by the following procedure.

def AreCompatible(events: Set[Event]) -> bool:
    all_ancestors = union([ancestors(d) for d in events])
    deleted_vertices = set()
    for d in all_ancestors:
        for v in deletion_set(d):
            if v in deleted_vertices:
                return False
            deleted_vertices.add(v)
    return True

Note that this definition of event compatibility is a strictly stronger version of parallel independence as is typically defined in DPO rewriting Corrad., 2018Andrea Corradini, Dominique Duval, Michael Löwe, Leila Ribeiro, Rodrigo Machado, Andrei Costa, Guilherme Grochau Azzi, Jonas Santos Bezerra and Leonardo Marques Rodrigues. 2018. On the Essence of Parallel Independence for the Double-Pushout and Sesqui-Pushout Approaches. In Graph Transformation, Specifications, and Nets, Cham. Springer International Publishing, 1--18. doi: 10.1007/978-3-319-75396-6_1. It does not allow for events δ1\delta_1 and δ2\delta_2 such that a vertex vv is both

  • in the read only context of δ1\delta_1, i.e. vV1dom(μ1)v \in V_1^- \cap dom(\mu_1), and thus present both before and after the application of δ1\delta_1,
  • in the deletion set of δ2\delta_2, i.e. vV2v \in V^-_2.

This excludes asymmetric conflicts as discussed in e.g. Baldan, 2008Paolo Baldan, Andrea Corradini and Barbara König. 2008. Unfolding Graph Transformation Systems: Theory and Applications to Verification, which arise in the more generali definition. This restriction simplifies our considerations as makes the event history of any event unique.

The runtime is O(DAlogDA)O(D_A \log D_A), where DAD_A is the sum of the sizes of all vertex deletion sets of events in D\lfloor D \rfloor

δDVδ.\sum_{\delta \in \lfloor D \rfloor} |V^-_\delta|.

The log\log factor can typically be removed if the vertices vv span a contiguous integer range or by using a hash function. Alternatively, the log\log factor can also be reduced by using separate sets to track deleted vertices of each event.

When talking about compatible sets of events DDD \subseteq \mathcal{D}, it simplifies considerations to always choose DD such that the ancestors of δD\delta \in D are also in DD, i.e. D=D.D = \lfloor D \rfloor. We introduce the notation

Γ(D)={DDD and D is compatible}P(D)\Gamma(\mathcal{D}) = \{ \lfloor D \rfloor \mid D \subseteq \mathcal{D} \text{ and } D \text{ is compatible}\} \subseteq \mathcal{P}(\mathcal{D})

for the set of all compatible sets of rewrites of the form D\lfloor D \rfloor.

Events are rewrites on the flattened history #

We have so far explored how events can be added to D\mathcal{D}, as well as when they are compatible. However, until we have established that adding events to D\mathcal{D} is in some sense equivalent to applying rewrites on a graph, it is hard to see how the data structure D\mathcal{D} would be useable for graph rewriting. This is precisely our next point.

In a valid non-empty D\mathcal{D}, events δD\delta \in \mathcal{D} form a directed acyclic graph and therefore there must always be (at least) one “root” event δ1D\delta_1 \in \mathcal{D} with no parents P(δ1)=P(\delta_1) = \varnothing. δ1\delta_1 is thus a valid rewrite that can be applied to any graph.

For the applications of D\mathcal{D} that we consider, it will always be sufficient to have a unique root event δ1\delta_1. Viewing δ1\delta_1 as a rewrite that applies to the empty graph G0=G1G_0 = \varnothing \to G_1, we can understand it as injecting the input graph G1G_1 into D\mathcal{D}.

Non-root events in D\mathcal{D} on the other hand typically correspond to valid (semantics preserving) rewrites in the GTS under consideration.

Consider a set of compatible events DΓ(D)D \in \Gamma(\mathcal{D}). Define a topological ordering δ1,,δk\delta_1, \ldots, \delta_k of the events in DD, i.e. if δjP(δi)\delta_j \in P(\delta_i) then i<ji < j.

Proposition 5.1Events as valid rewrites

There are graphs G0,,GkG_0, \ldots, G_k such that for all 1ik1 \leqslant i \leqslant k, the event δi\delta_i defines a valid rewrite rir_i on Gi1G_{i-1} and Gi=ri(Gi1)G_i = r_i(G_{i-1}).

Define the empty graph G0=G_0 = \varnothing. The event δ1\delta_1 has no parent and thus must have an empty vertex deletion set and glueing relation. It is thus a valid rewrite r1r_1 on G0G_0. Define G1=r1(G0)G_1 = r_1(G_0).

We can similarly define Gi=ri(Gi1)G_i = r_i(G_{i-1}) inductively for graphs G2,,GkG_2, \ldots, G_k if we show for 2ik2 \leqslant i \leqslant k that the ii-th event δi\delta_i defines a valid rewrite rir_i on Gi1G_{i-1}. The set of vertices in Gi1G_{i-1} is the union of all vertices in the replacement graph of δ1,,δi1\delta_1, \ldots, \delta_{i-1} minus their vertex deletion sets

V(Gi1)=(1j<iV(δj))(1j<iVj),V(G_{i-1}) = \left(\bigsqcup_{1 \leqslant j < i} V(\delta_j)\right) \setminus \left(\bigsqcup_{1 \leqslant j < i} V^-_j\right),

where VjV^-_j is the vertex deletion set of δj\delta_j.

Now, by definition of the event δi\delta_i,

Vi1j<iV(δj).V^-_i \subseteq \bigsqcup_{1 \leqslant j < i} V(\delta_j).

On the other hand, because of the compatibility of all events in DD, we know that ViVj=V^-_i \cap V^-_j =\varnothing for all 1j<i1 \leqslant j < i. It thus follows ViV(Gi1)V^-_i \subseteq V(G_{i-1}). Hence δi\delta_i is indeed a valid rewrite of Gi1G_{i-1}, and thus rir_i and GiG_i are well-defined.

This construction is illustrated in the following figure for the compatible set δ5δ6\lfloor \delta_5 \rfloor \cup \lfloor \delta_6 \rfloor of the previous example.

Applying events as rewrites in topological order. The result is a sequence of valid graph rewrites that start from the graph of δ1\delta_1δ1​.

Applying events as rewrites in topological order. The result is a sequence of valid graph rewrites that start from the graph of δ1\delta_1.

We now show that the graph GkG_k is determined uniquely by DΓ(D)D \in \Gamma(\mathcal{D}) and provide an explicit procedure to construct it.

Proposition 5.2Flat graph extraction

The graph GkG_k obtained by applying the set of compatible rewrites DΓ(D)D \in \Gamma(\mathcal{D}) in topological order on the empty graph is independent of the topological ordering chosen.

Given the set of rewrites DDD \subseteq \mathcal{D}, the procedure FlattenHistory returns GkG_k in time

O(m+n)O(m + n)

where nn and mm are the total number of vertices and edges across all replacement graphs in DD.

Let us start with the definition of FlattenHistory:

 1def FlattenHistory(events: Set[Event]) -> Graph:
 2    all_ancestors = union([ancestors(d) for d in events])
 3    graph = Graph()
 4    for a in toposort(all_ancestors):
 5        add_graph(graph, replacement_graph(a))
 6        for (del_v, repl_v) in glueing_relation(a):
 7            move_edges(graph, repl_v, del_v)
 8        for v in deletion_set(a):
 9            remove_vertex(graph, v)
10    return graph

toposort is a function that returns a topological ordering of the rewrites in DD according to the parent-child rewrite relation, add_graph inserts the graph passed as second argument into the graph passed as first argument, remove_vertex removes the vertex along with all incident edges from the graph and move_edges moves all edges of the second vertex to the first vertex.

Correctness of FlattenHistory.  It is easy to see that if the graph GkG_k that is obtained from applying the rewrites in order is independent of the choice of the toplogical ordering, then FlattenHistory is a correct implementation of the procedure, as it applies one rewrite at a time, in topological order.

Rewrite order invariance.  Consider two rewrites δ1,δ2D\delta_1, \delta_2 \in D such that neither is an ancestor of the other. Let

Dpre=δ1δ2DD_{pre} = \lfloor \delta_1 \rfloor \cup \lfloor \delta_2 \rfloor \subseteq D

and proceed by induction over DpreD_{pre}: assume the graph GpreG_{pre} obtained by applying the rewrites in DpreD_{pre} is invariant on the choice of the topological ordering of DpreD_{pre}. Clearly this is true for Dpre=D_{pre} = \varnothing. All that remains to be shown is that GpostG_{post} obtained by applying first δ1\delta_1 then δ2\delta_2 on GpreG_{pre} is equal to GpostG_{post}', obtained by applying the same rewrites in the reverse order on GpreG_{pre}.

The vertex sets V1V^-_1 and V2V^-_2 of δ1\delta_1 and δ2\delta_2 must be disjoint because δ1,δ2D\delta_1, \delta_2 \in D and hence are compatible. Furthermore, the replacement graphs (by definition of the rewrites) and the glueing relations of δ1\delta_1 and δ2\delta_2 (by rewrite compatibility) cannot contain vertices in V1V2V^-_1 \sqcup V^-_2. It follows that the order in which vertices of V1V2V^-_1 \sqcup V^-_2 are removed from GpreG_{pre} does not affect the graph GpostG_{post}. Furthermore, vertex merging is a commutative operation, and so is disjoint graph addition. It follows Gpost=GpostG_{post} = G_{post}' and hence the result.

Runtime.  In total nn vertices and mm edges will be added to graph by add_graph on line 5. As a result, at most nn vertices can ever be deleted by line 9. Finally, while a naive implementation of move_edges of line 7 might result in the same edge being moved many times, all edge moves can be cached and only executed once at the end: notice that every time edges are moved away from a vertex, that vertex is subsequently removed from the graph. Instead of removing the vertex, keep it “hidden”, with a link to the vertex that the edges should be moved to. Once all graph operations are completed, traverse all hidden vertices and follow the links to the vertices that the edges should be moved to. This can be done in O(n)O(n) time. Then move all edges to the correct vertex, in time O(m)O(m), and delete the hidden vertices.

Now instead of exploring the space of all graphs G\mathcal{G} reachable by repeatedly applying rewrites, we can explore the rewrite space by adding events to D\mathcal{D}. Write flat(D)flat(D) for the graph returned by FlattenHistory on set DD. If G\mathcal{G}' is the set of all graphs returned by FlattenHistory on compatible events

G={ flat(D)DΓ(D)},\mathcal{G}' = \{\ flat(D) \mid D \in \Gamma(\mathcal{D})\},

then Proposition . and Proposition . combined guarantee that GG\mathcal{G}' \subseteq \mathcal{G}. To conclude, we show that indeed any graph in G\mathcal{G} is in G\mathcal{G}', and hence G=G\mathcal{G} = \mathcal{G}'.

Proposition 5.3Rewrites as valid events

Let DΓ(D)D \in \Gamma(\mathcal{D}) be a set of compatible events and G=flat(D)G = flat(D). Any rewrite rr that can be applied on GG defines an on GG defines an event δ=r\delta = r that can be added to D\mathcal{D}.

We recall that a rewrite r=(GR,V,μ)r = (G_R, V^-, \mu) defines an event δ=(GR,V,μ)\delta = (G_R, V^-, \mu) that can be added to D\mathcal{D} if

  • P(δ)DP(\delta) \subseteq \mathcal{D}, and
  • all rewrites in P(δ)P(\delta) are compatible.

By the rewrite definition, VV(G)V^- \subseteq V(G). It follows in particular that

VδDV(δ),V^- \subseteq \bigcup_{\delta' \in D} V(\delta'),

and thus VV(D)V^- \subseteq V(\mathcal{D}), as well as P(δ)DDP(\delta) \subseteq D \subseteq \mathcal{D}. This proves both conditions.

Starting from the empty graph D=\mathcal{D} = \varnothing, we can create a root event δ0=(G,,)\delta_0 = (G, \varnothing, \varnothing) with an empty vertex deletion set and glueing relation and add it to D.\mathcal{D}.

Clearly, flat({δ0})=Gflat(\{\delta_0\}) =G. We then apply Proposition . repeatedly. If we have a sequence r1,,rkr_1, \ldots, r_k of valid rewrites that can be applied on GG, then the sequence of events δ1=r1,,δk=rk\delta_1 = r_1, \ldots, \delta_k = r_k that it defines can also be added to D\mathcal{D} in this order. As we have further seen in Proposition . and Proposition ., the graph GkG_k that is obtained as a result of the rewrites is the same graph returned by FlattenHistory called on D=DD = \mathcal{D}.

In other words, we conclude that exploring the rewrite space on GG is fully equivalent to exploring the space of valid events starting from D={δ0}\mathcal{D} = \{ \delta_0 \}.

5.4. Exploration and extraction

In the previous section, we proposed a data structure D\mathcal{D} that is confluently persistent and can be used to explore the space of all possible transformations of a graph transformation system (GTS). We are now interested in using D\mathcal{D} to solve optimisation problems over the space of reachable graphs in the GTS. Following the blueprint of equality saturation (see section 5.2), we proceed in two phases:

  1. Exploration.  Given an input graph GG, populate D\mathcal{D} with events that correspond to rewrites applicable to graphs reachable from GG,
  2. Extraction.  Given a cost function ff, extract the optimal graph in D\mathcal{D}, i.e. the graph that is a flattening of a set of compatible edits DDD \subseteq \mathcal{D} and minimises ff.

Each phase comes with its respective challenges, which we discuss in this section. We will first look at the exploration phase, which requires a way to find and construct new events δ\delta that can be added to D\mathcal{D}. We will consider the extraction phase in the second part of this section and see that the problem of optimisation over the power set P(D)\mathcal P(\mathcal D) can be reduced to boolean satisfiability formula that admit simple cost functions in the use cases of interest.

There is an additional open question that we do not cover in this section and would merit a study of its own: the choice of heuristics that guide the exploration phase to ensure the “most interesting parts” of the GTS rewrite space are explored. We propose a very simple heuristic to this end in the benchmarks of section 5.5, but further investigations are called for.

Exploring the data structure with pattern matching #

We established in the previous section that rewrites that apply on GG can equivalently be added as events to D\mathcal{D}. In other words, a graph GG' is reachable from GG using the rewrites of a GTS if and only if there is a set of compatible events DDD \subseteq \mathcal{D} such that GG' is the graph obtained from FlattenHistory on input DD.

To expand D\mathcal{D} to a larger set DD\mathcal{D}' \supseteq \mathcal{D}, we must find all applicable rewrites on all graphs within D\mathcal{D}. A naive solution would iterate over all subsets of DDD \subseteq \mathcal{D}, check whether they form a compatible set of events, compute FlattenHistory if they do, and finally run pattern matching on the obtained graph to find the applicable rewrites. We can do better.

The idea is to traverse the set of events in D\mathcal{D} using the glueing relations μ\mu that connect vertices between events. Define the function μˉ:V(D)P(V(D))\bar{\mu}: V(\mathcal{D}) \to \mathcal{P}(V(\mathcal{D})) that is the union of all glueing relations μ\mu in events in D\mathcal{D}:

μˉ(v)={μc(v)δc=(Vc,Vc,μc)P1(δv)}.\bar\mu(v) = \{\mu_c(v) \mid \delta_c = (V_c, V^-_c, \mu_c) \in P^{-1}(\delta_v)\}.

where we write δv\delta_v for the owner of vv, i.e. the (unique) event δvD\delta_v \in \mathcal{D} such that vV(δv)v \in V(\delta_v). We define the set ED(v)\mathcal{E}_D(v) of equivalent vertices of vv that are compatible with DD by applying μˉ(v)\bar\mu(v) recursively and filtering out vertices whose owner is not compatible with DD. It is easiest to formalise this definition using pseudocode for the EquivalentVertices procedure. The set of vertices in ED(v)\mathcal{E}_D(v) are vertices of descendant events of δv\delta_v.

def EquivalentVertices(
    v: Vertex, events: Set[Event]
) -> Set[Vertex]:
    all_vertices = set({v})
    for w in mu_bar(v):
        new_events = union(events, {owner(w)})
        if AreCompatible(new_events):
            all_vertices = union(all_vertices,
                EquivalentVertices(w, new_events)
            )
    return all_vertices

Whilst it looks as though EquivalentVertices does not depend on D\mathcal{D}, it does so through the use of the function calls to mu_bar.

We use EquivalentVertices to repeatedly extend a set of pinned vertices πV(D)\pi \subseteq V(\mathcal{D}). A set of pinned vertices must satisfy two properties:

  • the set Dπ={δvvπ}D_\pi = \{\delta_v \mid v \in \pi \} is a set of compatible events,
  • there is no vertex vπv \in \pi and event δD\delta \in D such that vV(δ)v \in V^-(\delta).

As a result, for the flattened graph G=flat(Dπ)G = flat(D_\pi), it always holds that πV(G)\pi \subseteq V(G). Furthermore, if G(π)GG(\pi) \subseteq G is the subgraph of GG induced by π\pi, then for any superset of pinned vertices ππ\pi' \supseteq \pi, we have G(π)G(π)G(\pi) \subseteq G'(\pi') where G=flat(Dπ)G' = flat(D_{\pi'}). In other words: extending a set of pinned vertices results in an extension of the flattened graph – a very useful property when pattern matching. This property follows from the second property above and the definition of FlattenHistory.

This gives us the following simple procedure for pattern matching:

  1. Start with a single pinned vertex π={v}\pi = \{v\}.
  2. Construct partial embeddings PG(π)P \rightharpoonup G(\pi) for patterns PP.
  3. Pick a new vertex vv in G=flat(Dπ)G = flat(D_\pi) but not in G(π)G(\pi) (that we would like to extend the domain of definition of our pattern embeddings to).
  4. For all vertices vED(v)v' \in \mathcal{E}_D(v), build new pinned vertex sets π=π{v}\pi' = \pi \cup \{v'\}, filter out the sets π\pi' that are not valid pinned vertex sets.
  5. Repeat steps 2–4 until all pattern embeddings have been found.

Step 1 is straightforward – notice that pattern matching must be started at a vertex in V(D)V(\mathcal{D}), so finding all patterns will require iterating over all choices of vv. The pattern embeddings are constructed over iterations of step 2: each iteration can be seen as one step of the pattern matcher – for instance, as presented in chapter 4 – extending the pattern embeddings that can be extended and discarding those that cannot. If all possible pattern embeddings have been discarded, then matching can be aborted for that π\pi set.

How step 3 should be implemented depends on the types of graphs and patterns that are matched on. It is straightforward in the case of computation graphs with only linear values, i.e. hypergraphs with hyperedges that have directed, ordered endpoints and vertices that are incident to exactly one incoming and one outgoing edge. In that case, vv can always be chosen in such a way as to ensure progress on the next iteration of step 2, i.e. the domain of definition of at least one partial pattern embedding PG(π)P \hookrightarrow G(\pi) will be extended by one vertex. The text in the blue box below explains this case in more detail.

Step 4 produces all possible extensions of π\pi to pinned vertex sets π\pi' that include a descendant vv' of vv (or vv itself). All vertices in ED(v)\mathcal{E}_D(v) are in events compatible with DD by definition, so to check that π\pi' is a valid pinned vertex set, we only need to check the second property of pinned vertices. Let PP be a pattern, let SS be the set of all π\pi sets under consideration. Step 4 increments the sizes of all pinned vertex sets πS\pi \in S whilst maintaining the following invariant.

Invariant for step 4.  If there is a superset DDπD' \supseteq D_\pi of compatible events such that PP embeds in G=flat(D)G' = flat(D'), then there is a superset ππ\pi' \supseteq \pi of vertices such that PP embeds in flat(Dπ)flat(D_{\pi'}).

Finally, step 5 ensures the process is repeated until, for all partial pattern embeddings, either the domain of definition is complete, or the embedding of PP is not possible. Given that step 4 increments the size of π\pi sets at each iteration, this will terminate as long as the vertex picking strategy of step 3 selects vertices that allow to extend (or refute) the partial pattern embeddings constructed and extended in step 2. This is satisfied, for example, in the case of linear minIR graphs, as explained in the box.

Choosing the next vertex to pin in linear minIR (step 3).   Assuming patterns are connected, for any partial pattern embedding PG(π)P \hookrightarrow G(\pi) there is an edge ePE(P)e_P \in E(P) with no image in G(π)G(\pi) but such that at least one of the endvertex vPv_P of ePe_P has an image vGv_G in π\pi – say, ePe_P is the outgoing edge of vPv_P. Let vPv'_P be an endvertex of ePe_P in PP that has no image in G(π)G(\pi) – and say, it is the ii-th outgoing endvertex of ePe_P in PP.

Then vPv_P uniquely identifies an edge eGe_G in G=flat(Dπ)G = flat(D_\pi) – the unique outgoing edge of vGv_G – which, in turn, uniquely identifies a vertex vGV(G)v'_G \in V(G) – the ii-th outgoing endvertex of eGe_G. By choosing vGv'_G in step 3, step 4 will create pinned vertex sets that include all possible vertices equivalent to vGv_G', which are all vertices that vGv_G might be connected to through its outgoing edge1. The next iteration of step 2 will then either extend the partial pattern embedding to vPv_P or conclude that an embedding of PP is not possible.

Using the approach just sketched, pattern matching can be performed on the persistent data structure D\mathcal{D}. The runtime of steps 2 and 3 depend on the type of graphs and patterns that are matched on – these are, however, typical problems that appear in most instances of pattern matching, independently of the data structure D\mathcal{D} used here. A concrete approach to pattern matching and results for the graph types of interest to quantum compilation was presented in chapter 4.

The runtime of step 4 and the number of overall iterations of steps 2–4 required for pattern matching will depend on the number of events in D\mathcal{D} (AreCompatible runs in runtime linear in the number of ancestors), the number of equivalent vertices that successive rounds of step 4 will return and the types of patterns and pattern matching strategies.

Extraction using SAT #

Moving on to the extraction phase, we are now interested in extracting the optimal graph from D\mathcal{D}, according to some cost function of interest. Unlike exploring the “naive” search space of all graphs reachable in the GTS, the optimal solution within the persistent data structure D\mathcal{D} cannot simply be read out.

We showed in section 5.3 that finding an optimal graph GG' that is the result of a sequence of rewrites on an input graph GG is equivalent to finding an optimal set of compatible events DΓ(D)P(D)D \in \Gamma(\mathcal{D}) \subseteq \mathcal{P}(\mathcal{D}) – the optimal graph GG' is then recoved by taking G=flat(D)G' = flat(D).

There are 2D2^{|\mathcal{D}|} elements in P(D)\mathcal{P}(\mathcal{D}), which we encode as a boolean assignment problem by introducing a boolean variable xδx_\delta for all events δD\delta \in \mathcal{D}. The set of events DD is then given by

D={δDxδ}P(D).D = \{\delta \in \mathcal{D} \mid x_\delta\} \in \mathcal{P}(\mathcal{D}).

We can constrain the boolean assignments to compatible sets DD by introducing a boolean formula

¬(xδxδ)\neg (x_\delta \land x_{\delta'})

for all δ,δD\delta,\delta' \in \mathcal{D} such that their vertex deletion sets intersect V(δ)V(δ)V^-(\delta) \cap V^-(\delta') \neq \varnothing. Any assignment of {xδδD}\{x_\delta \mid \delta \in \mathcal{D}\} that satisfies all constraints of this format defines a compatible set of events.

How many such pairs of events (δ,δ(\delta,\delta') are there? By definition of parents, two events δ\delta and δ\delta' can only have overlapping vertex deletion sets if they share a parent. Assuming all events have at most ss children, ensuring DD is a set of compatible events requires at most O(s2D)O(s^2 \cdot |\mathcal{D}|) constraints.

To further restrict to DΓ(D)D \in \Gamma(\mathcal{D}), i.e. to sets of compatible events D=DD = \lfloor D \rfloor that contain all ancestors, we can add the further constraints: δD\delta \in D implies P(δ)DP(\delta) \subseteq D. This introduces up to sDs \cdot |\mathcal{D}| implication constraints

xδ(¬xδ),x_\delta \lor (\neg x_{\delta'}),

for all δ,δD\delta,\delta' \in \mathcal{D} such that δchildren(δ)\delta' \in children(\delta).

For any set of events D\mathcal{D}, the conjunction of all constraints presented above, i.e. the event compatibility constraints (1) and the parent-child relation constraints (2), defines a boolean satisfiability problem (SAT) with variables xδx_\delta. We have shown:

Proposition 5.4Extraction as SAT problem

Consider a GTS with a constant upper bound ss on the number of rewrites that may overlap any previous rewrite.

The set of valid sequences of rewrites that can be extracted from a set of events D\mathcal{D} in the GTS is given by the set of satisfying assignments of a SAT problem Cook, 1971Stephen A. Cook. 1971. The complexity of theorem-proving procedures. In Proceedings of the third annual ACM symposium on Theory of computing - STOC ’71. ACM Press, 151--158. doi: 10.1145/800157.805047 Moskew., 2001Matthew W. Moskewicz, Conor F. Madigan, Ying Zhao, Lintao Zhang and Sharad Malik. 2001. Chaff: engineering an efficient SAT solver. In Proceedings of the 38th conference on Design automation - DAC ’01. ACM Press, 530--535. doi: 10.1145/378239.379017 with D|\mathcal{D}| variables of size O(D)O(|\mathcal{D}|).

Finding the optimal assignment #

We now have to find the optimal assignment among all satisfiable assignments for the SAT problem given above. In the most general case where the cost function ff to be minimised is given as a black box oracle on the graph GG', i.e. on the flattened history of the solution set DDD \subseteq \mathcal{D}, this optimisation problem is hard2.

However, if ff can be expressed as a function of xδx_\delta instead of the flattened history G=flat(D)G' = flat(D), then the ‘hardness’ can be encapsulated within an instance of a SMT problem (satisfiability modulo theories Nieuwe., 2006Robert Nieuwenhuis and Albert Oliveras. 2006. On SAT Modulo Theories and Optimization Problems. In Theory and Applications of Satisfiability Testing - SAT 2006. Springer Berlin Heidelberg, 156--169. doi: 10.1007/11814948_18 Barrett, 2018Clark Barrett and Cesare Tinelli. 2018. Satisfiability Modulo Theories), a well-studied generalisation of SAT problems for which highly optimised solvers exist Moura, 2008Leonardo de Moura and Nikolaj Bjørner. 2008. Z3: An Efficient SMT Solver. In Tools and Algorithms for the Construction and Analysis of Systems. Springer Berlin Heidelberg, 337--340. doi: 10.1007/978-3-540-78800-3_24 Sebast., 2015Roberto Sebastiani and Patrick Trentin. 2015. OptiMathSAT: A Tool for Optimization Modulo Theories. In Computer Aided Verification. Springer International Publishing, 447--454. doi: 10.1007/978-3-319-21690-4_27. A class of cost functions for which the SMT encoding of the optimisation problem becomes particularly simple are local cost functions:

Definition 5.1Local cost function

A cost function ff on graphs is local if for all rewrites rr there is a cost Δfr\Delta f_r such that for all graphs GG that rr applies to

f(r(G))=f(G)+Δfr.f(r(G)) = f(G) + \Delta f_r.

The cost Δfr\Delta f_r of a rewrite rr also immediately defines a cost to the event that rr defines δ=r\delta = r. We can thus associate a cost Δfδ\Delta f_\delta with each event δD\delta \in \mathcal{D}, given by the cost of any of the rewrites that δ\delta defines.

An instance of such a local cost function often used in the context of the optimisation of computation graphs are functions of the type

f(G)=vV(G)w(v)f(G) = \sum_{v \in V(G)} w(v)

for some vertex weight function ww – i.e. cost functions that can be expressed as sums over the costs w()w(\cdot) associated to individual vertices in GG3. Indeed, it is easy to see that in this case we can write

f(r(G))=vr(V(G))w(v)=vV(G)w(v)vVw(v)+vVRw(v):=Δfr=f(G)+Δfr,\begin{aligned}f(r(G)) &= \sum_{v \in r(V(G))} w(v)\\&= \sum_{v\in V(G)} w(v) - \underbrace{\sum_{v \in V^-} w(v) + \sum_{v \in V_R} w(v)}_{:= \Delta f_r}\\&= f(G) + \Delta f_r,\end{aligned}

where VV^- and VRV_R are the vertex deletion set and replacement graph of rr respectively.

As discussed in section 2.2, many of the most widely used cost functions in quantum compilation are local, as the cost of a quantum computation is often estimated by the required number of instances of the most expensive gate type (such as \texttt{CX} gates on noisy devices, or \texttt{T} gates for hardware with built-in fault tolerance protocols).

In these cases, the cost function is integer valued and the extraction problem is indeed often sparse:

Definition 5.2Sparse cost function

The local cost function ff is said to be ε\varepsilon-sparse on D\mathcal{D} if

{δDΔfδ=0}(1ε)D.\big|\{\delta \in \mathcal{D}\,|\,\Delta f_\delta = 0 \}\big| \geq (1 - \varepsilon) |\mathcal{D}|.

In case of ε\varepsilon-sparse local cost functions, the SAT problem on D\mathcal{D} can be simplified to only include

D0={δDΔfδ0}\mathcal{D}_{\neq 0} = \{\delta \in \mathcal{D} \mid \Delta f_\delta \neq 0\}

by repeatedly applying the following constraint simplification rules on any δ0D\delta_0 \in \mathcal{D} such that Δfδ0=0\Delta f_{\delta_0} = 0:

  • for every parent δpparents(δ0)\delta_p \in parents(\delta_0) and child δcchildren(δ0)\delta_c \in children(\delta_0), remove the parent-child constraints between δp\delta_p and δ0\delta_0 and between δ0\delta_0 and δc\delta_c. Insert in their place a parent-child constraint between δp\delta_p and δc\delta_c.
  • for every non-compatible sibling event δsD,δsδ0\delta_s \in \mathcal{D}, \delta_s \neq \delta_0, remove the compatibility constraint between δ0\delta_0 and δs\delta_s. Insert in its place a compatibility constraint between δs\delta_s and δc\delta_c for all δcchildren(δs)\delta_c \in children(\delta_s).

This reduces the SAT or SMT problem to a problem with D0=εD|\mathcal D_{\neq 0}| = \varepsilon |\mathcal{D}| variables and at most O(min(D,ε2D2)O(min(|\mathcal{D}|, \varepsilon^2|\mathcal{D}|^2) constraints.

With the completion of this section, we have described an equivalent computation on D\mathcal{D} for every step of a GTS-based optimisation problem:

  1. a rewrite that can be applied on a graph GG can be added as an event to D\mathcal{D},
  2. a graph GG' that results from a sequence of rewrites can be recovered from D\mathcal{D} using FlattenHistory,
  3. the set of all graphs reachable from events in D\mathcal{D} can be expressed as a SAT problem; depending on the cost function, the optimisation over that space can then take the form of an SMT problem.

In essence, using the confluently persistent data structure D\mathcal{D} we replace a naive, exhaustive search over the space G\mathcal{G} of all graphs reachable in the GTS with a SAT (or SMT) problem – solvable using highly optimised dedicated solvers that could in principle handle search spaces with up to millions of possible rewrites Zulkos., 2018Edward Zulkoski. 2018. Understanding and Enhancing CDCL-based SAT Solvers. PhD Thesis. University of Waterloo.


  1. To realise this, notice that all vertices equivalent to vGv_G' are vertices that will be merged with vGv_G'. Hence, they will all be attached to the outgoing edge of vGv_G at its ii-th outgoing endvertex. ↩︎

  2. Hardness can be seen by considering the special case of the extraction problem in which all events are compatible and no two events have a parent-child relation: then there are no constraints on the solution space and the optimisation problem requires finding the minimum of an arbitrary oracle over 2D2^{|\mathcal{D}|} inputs. ↩︎

  3. A similar argument also applies to cost functions that sum over graph edges, as would be the case in minIR, where operations are modelled as hyperedges. ↩︎

5.5. Bounding the search space size

We show in this section that under some assumptions on the GTSs that hold in the use cases of interest to quantum compilation, there is a provable gap between the size of the search space of all reachable graphs in the GTS and the size of the corresponding confluently persistent data structure D\mathcal{D}.

Let us introduce first the notion of overwriting rewrites.

Definition 5.3Overwriting rewrite

For two rewrites r1r_1 and r2r_2, we say that r2r_2 overwrites r1r_1, written r1r2r_1 \twoheadrightarrow r_2, if the deletion set V2V^-_2 of δ2\delta_2 includes a vertex of the vertex set V1V_1 of the replacement graph of r1r_1

V2V1.V^-_2 \cap V_1 \neq \varnothing.

The definition can identically be applied to events. In this case, the overwriting events are precisely given by the parent-child relation: the set of all overwriting events of δD\delta \in \mathcal{D} is by definition the set of parents P(δ)P(\delta) of δ\delta.

Our argument relies on the comparison of asymptotic bounds for the sizes of two sets GΔ\mathcal{G}_\Delta and DΔ\mathcal{D}_\Delta, which we now define. Consider a GTS and an input graph GG. A graph GG' is reachable from GG within depth Δ0\Delta \geqslant 0 if there is a sequence of rewrites r1,,rr_1, \dots, r_\ell in the GTS from GG to GG' such that all subsequences rβ1,,rβkr_{\beta_1}, \dots, r_{\beta_k} formed of overwriting rewrites rβirβi+1r_{\beta_i} \twoheadrightarrow r_{\beta_{i + 1}} have length at most kΔk \leqslant \Delta.

The set GΔ\mathcal{G}_\Delta is the set of all graphs reachable within depth Δ\Delta. We derive:

  1. a lower bound on the size of GΔ|\mathcal{G}_\Delta|, the space of all graphs reachable in at most Δ\Delta rewrites from some input GG, and
  2. an upper bound on the size of the equivalent confluently persistent data structure DΔ\mathcal{D}_\Delta, i.e. such that
    GΔ={flat(D)DΓ(DΔ)}.\mathcal{G}_\Delta = \{ flat(D) \mid D \in \Gamma(\mathcal{D}_\Delta) \}.

In order to obtain bounds, we will introduce hypotheses that the GTSs must satisfy. Throughout this section, we will illustrate and motivate the restrictions that they impose in the following two use cases.

Use case 1: ζ\zeta-complete GTS #

The first GTS we consider can be defined on any graph domain that has a notion of graph size (e.g. based on number of nodes, number of edges, etc.1) and for any graph size ζ\zeta. The GTS is such that for any subgraph HGH \subseteq G of size H=ζ|H| = \zeta, there is at least one transformation rule in the GTS that matches HH. We will call this case CompleteGTS.

This is the use case of quantum superoptimisation discussed in section 3.1 and used for benchmarking in section 4.7. In those cases, the transformation rules are obtained by enumerating all small circuits up to a certain size ζ\zeta, thus guaranteeing that any subcircuit of size ζ\zeta will be matched by the GTS.

Note that there is also an (obvious) upper bound on the number of transformation rules that can match on any given subgraph: the total number of transformation rules in the GTS.

Use case 2: single-rule GTS in a uniform domain #

At the other extreme of the GTS spectrum, we can consider a GTS made of a single (arbitrary) transformation rule. In this case, we require that graphs are drawn from a domain uniformly at random, so that for any subgraph HGH \subseteq G, all patterns of size H|H| are equally likely. We will call this case SingleRuleGTS.

In this case, we will not show that our hypotheses hold for all inputs, but rather that they hold with a high probability. We will phrase our statements as a function of ϵ>0\epsilon > 0 and will require that they hold with probability 1ϵ1 - \epsilon for randomly drawn GG.

This regime is interesting as it is the simplest instance of problem domains for which few assumptions can be made about the GTS themselves, but all inputs are expected to be equally likely.

Lower bound on the naive search tree #

The event history of the set of graphs GΔ\mathcal{G}_\Delta defines a tree TΔT_\Delta, where GΔ\mathcal{G}_\Delta are the nodes and GpGΔG_p \in \mathcal{G}_\Delta is the parent of GcGΔG_c \in \mathcal{G}_\Delta if there is a rewrite rr rewriting GpG_p to GcG_c in the GTS. Paths in TΔT_\Delta are sequences of rewrites. We call TΔT_\Delta the naive search tree of the GTS. We wish to derive a lower bound for TΔ|T_\Delta|.

Graph partitioning #

For fixed search depth Δ\Delta, let n>0n > 0 be the largest integer such that for all graphs GTΔG' \in T_\Delta, there exists disjoint subgraphs Π1,,Πn\Pi_1, \ldots, \Pi_n of GG' that satisfies the following property, for all 1in:1 \leq i \leq n:

there exists a rewrite in the GTS that can be applied to Πi\Pi_i.

Hypothesis 1Linear scaling of nn

For a fixed GTS, a fixed depth Δ\Delta and a family of input graphs GG, we have the scaling n=Θ(G)n = \Theta(|G|).

We conjecture that this scaling holds for many GTSs of interest:

Proposition 5.5
Hypothesis 1 holds for CompleteGTS and for SingleRuleGTS probabilistically.

In CompleteGTS, it suffices to partition any input GG into n=Gζ=Θ(G)n = \lfloor \frac{|G|}\zeta \rfloor = \Theta(|G|) disjoint subgraphs of size at least ζ\zeta. Each subgraph will match a rule of the GTS by definition.

For SingleRuleGTS, let ϵ>0\epsilon > 0. Let L|L| be the size of the left hand side LL of the GTS rule. By assumption, for any subgraph HH of size L|L| of an input GG, there is a constant probability pp that the rule matches HH. For a subgraph HH of size kLk |L|, the probability of the rule not matching in HH is (1p)k(1 - p)^k. Picking

k>11+δlnϵ+lnLGln(1p)k > \frac1{1 + \delta}\frac{\ln \epsilon + \ln {\frac{|L|}{|G|}}}{\ln (1 - p)}

for some δ>0\delta > 0 ensures that whenever δ>lnkkln(1p)\delta > - \frac{\ln k}{k \ln (1 - p)} (i.e. for kk large enough),

klnkln(1p)>(1+δ)k>lnϵ+lnLGln(1p)k>lnϵnln(1p), \begin{aligned} &k - \frac {\ln k}{\ln (1 - p)} > (1 + \delta)k > \frac{\ln \epsilon + \ln {\frac{|L|}{|G|}}}{\ln (1 - p)}\\ \Leftrightarrow\quad &k > \frac{\ln{\frac \epsilon n}}{\ln (1 - p)}, \end{aligned}

where n=GkLn = \frac{|G|}{k |L|} was chosen. It follows that a partition of GG into n\lfloor n \rfloor disjoint subgraphs of size at least kLk |L| satisfies the hypothesis with probability 1ϵ1 - \epsilon.

Lower bound on TΔ|T_\Delta| #

Fix the tree depth Δ\Delta and the GTS. Any rewrite from the GTS removes at most a constant KK number of vertices from the graph it applies on. Thus, Any graph in TΔT_\Delta is at least of size (1KΔ)G=Θ(G)(1 - K\Delta)|G| = \Theta(|G|). Let nn' be the smallest value of nn for a graph GminGΔG_{min} \in \mathcal G_\Delta. Whenever hypothesis 1 applies, we have n=Θ(Gmin)=Θ(G)n' = \Theta(|G_{min}|) = \Theta(|G|).

For each GTΔG' \in T_\Delta and Πi\Pi_i, pick a rewrite ri(G)r_i(G') that applies to Πi\Pi_i and let RR be the set of all such rewrites

RΔ={ri(G)1in,GTΔ}.R_\Delta = \{ r_i(G') \mid 1 \leq i \leq n', G' \in T_\Delta \}.

We can consider the subtree TΔTΔT'_\Delta \subseteq T_\Delta that only contains graphs obtained by applying rewrites in RΔR_\Delta.

For Δ=1\Delta = 1, the search tree TΔT'_\Delta will contain 2n2^{n'} graphs: for each subgraph Πi\Pi_i of the input graph GG, we can choose to either apply ri(G)r_i(G) or not2.

By repeating Δ\Delta times the search tree of size 2n2^{n'} for Δ=1\Delta = 1, we obtain a lower bound

GΔ=TΔTΔ=(2n)Δ.|\mathcal{G}_\Delta| = |T_\Delta| \geqslant |T'_\Delta| = (2^{n'})^\Delta.

We frame this result as the following proposition.

Proposition 5.6Lower bound for GΔ|\mathcal{G}_\Delta|
The naive search tree size GΔ|\mathcal{G}_\Delta| is in Ω(2Δn)\Omega(2^{\Delta \cdot n'}).

As a result, for any GTS satisfying Hypothesis 1 the size of GΔ\mathcal G_\Delta grows at least exponentially with input graph size GG and search depth Δ\Delta.

Upper bound on the factorised search space #

Consider two rewrites r1r_1 and r2r_2. If neither overwrites the other, i.e. r1↠̸r2r_1 \not\twoheadrightarrow r_2 and r2↠̸r1r_2 \not\twoheadrightarrow r_1, then the order in which they are applied is irrelevant (see also Proposition 5.2). The persistent data structure D\mathcal{D} uses this symmetry explicitly when exploring the set of reachable graphs.

This drastically reduces the size of the event history of DΔ\mathcal{D}_\Delta. The event history defines a directed acyclic graph FΔF_\Delta that is the equivalent for DΔ\mathcal{D}_\Delta to the naive search tree TΔT_\Delta of GΔ\mathcal{G}_\Delta. The vertices of FΔF_\Delta are the flattened histories of events in DΔ\mathcal{D}_\Delta:

V(FΔ)={flat({δ})δDΔ},V(F_\Delta) = \{flat(\{\,\delta\,\}) \mid \delta \in \mathcal{D}_\Delta\},

with an edge from flat({δp})flat(\{\,\delta_p\,\}) to flat({δc})flat(\{\,\delta_c\,\}) if there is a parent-child relation δpP(δc)\delta_p \in P(\delta_c). We call FΔF_\Delta the factorised search space of DΔ\mathcal{D}_\Delta.

By construction, any graph GV(T)G' \in V(T) in the naive search tree maps injectively to a subgraph SFΔS \subseteq F_\Delta of the factorised search space, given by the subgraph of FF induced by the rewrites on the path from GG to GG' in TT.

Whereas our earlier discussion was focused on proving a lower bound for the size of the search tree, we now show an upper bound on the number of graphs FΔ|F_\Delta| in the factorised search space.

Graph covering #

Instead of considering partitions of graphs in TΔT_\Delta as we did above, we now consider coverings of graphs GG' in FΔF_\Delta, i.e. a set of subgraphs Γ1,,Γn\Gamma_1, \ldots, \Gamma_n such that their union is GG' but that might not be disjoint.

Let mm, ss and γ\gamma be parameters and fix a covering Γ1,,Γm\Gamma_1, \ldots, \Gamma_m for each graph GV(FΔ)G' \in V(F_\Delta) such that:

  • for all GG' and all 1im1 \leqslant i \leqslant m, there are at most ss applicable rewrites to the ii-th covering set Γi\Gamma_i of GG'. Furthermore, all rewrites within Γi\Gamma_i are mutually exclusive, i.e. they modify a shared subgraph so that it is never possible to apply more than one rewrite among the (up to ss) applicable ones3;
  • for all GG' and for all rewrites rr that apply to GG', there is 1im1 \leqslant i \leqslant m such that rr applies to Γi\Gamma_i (i.e. the matching subgraph of rr is fully contained within one of the coverings). Furthermore, the matching subgraph of rr overlaps with at most γ\gamma distinct Γi\Gamma_i;
  • for all rewrites rr that apply to the covering set Γi\Gamma_i of a graph GG', the image r(Γi)Γir(\Gamma_i) \subseteq \Gamma_i' must be a subgraph of the ii-th covering subgraph Γi\Gamma_i' of r(G)r(G').

The first condition is satisfied whenever the size of the coverings can be bounded: in that case ss can be chosen based on the number of distinct subgraphs that can be contained in a covering set and the number of rules that can apply to each. The second condition is related to the connectivity between the covering sets: γ\gamma can thus often be derived by considering how many neighbours a covering set has, and how many of those neighbours can a match of a GTS rule span.

The third condition above can be understood as “rewrites must preserve the coverings”. In other words, the coverings are chosen such that a graph mutation produced by the application of a rewrite rE(FΔ)r \in E(F_\Delta) on GG is always contained within a single covering subgraph of r(G)r(G').

Hypothesis 2Linear scaling of mm and constant s,γs, \gamma

For a fixed GTS, a fixed depth Δ\Delta and a family of input graphs GG, we have the scaling m=Θ(G)m = \Theta(|G|) and s=Θ(1)s = \Theta(1) and γ=Θ(1)\gamma = \Theta(1).

These conditions along with Hypothesis 2 are somewhat restrictive and future work should explore how to relax them. For our use cases CompleteGTS and SingleRuleGTS, we restrict our considerations to the special case where the graph domain is quantum circuits. We make the further simplifying assumptions (similar results can be obtained with variations on these assumptions)

  • transformation rules have two-qubit circuits as left and right hand sides, and
  • the number of qubits on the inputs is fixed (i.e. the number of gates on each qubit scales with circuit size).

Define \ell to be the largest number of gates on any one qubit in the left hand sides of the GTS transformation rules. Consider a partition of the gates on each qubit into sequences of \ell gates. We can obtain a covering of a quantum circuit GG by considering covering sets Γ1,,Γm\Gamma_1, \ldots, \Gamma_m defined for all 1im1 \leq i \leq m such that for all matches HH of a left hand side of a transformation rule of the GTS, HΓiH \subseteq \Gamma_i if and only if there is a gate vv in HH such that vv is in the ii-th sequence of \ell gate on some qubit in GG. Imposing the condition that rewrites must preserve coverings, the covering of the input fixes the covering of all reachable graphs GG' in the GTS.

Proposition 5.7
Restrict the graph domain to quantum circuits, the GTS to two-qubit rules and the inputs to a fixed number of qubits. Then hypothesis 2 holds for CompleteGTS and for SingleRuleGTS.

Let GG be the input circuit with qq qubits. Consider the covering Γ1,,Γm\Gamma_1, \dots, \Gamma_m of GG as constructed above. By construction m=N/=Θ(G)m = \lceil N / \ell \rceil = \Theta(|G|), where NN is the maximum number of gates on a qubit of GG.

The covering set Γi\Gamma_i contains the set ViV_i of gates composed of the ii-th sequence of \ell gates for each qubit in GG. Furthermore, for all vViv \in V_i, if HH is a match of a two-qubit rule that contains vv, then HH may contain at most 212\ell - 1 other gates. Hence by construction, Γi2V=2q2=Θ(1)|\Gamma_i| \leqslant 2\ell|V| = 2q\ell^2 = \Theta(1). This is a constant and thus there is a constant s=Θ(1)s = \Theta(1) such that for all 1im1 \leqslant i \leqslant m there are most ss matches of a two-qubit rule that intersect Γi\Gamma_i.

Finally, any match HH spans two qubits; the gates on each qubit (at most \ell) may belong to at most two distinct sequences of \ell gates of that qubit. Thus, any match HH spans at most γ=4\gamma = 4 distinct covering sets. These arguments made no assumption on properties of the rule set and thus apply equally to CompleteGTS and SingleRuleGTS.

Upper bound on FΔ|F_\Delta| #

The preservation of coverings under rewrites allows us to consider a covering of V(FΔ)V(F_\Delta): for each 1im1 \leqslant i \leqslant m, let FΔ(i)V(FΔ)F_\Delta^{(i)} \subseteq V(F_\Delta) be the set of graphs in V(FΔ)V(F_\Delta) that are the result of a rewrite of its ii-th covering subgraph. Every graph in V(FΔ)V(F_\Delta) is the result of a rewrite on some covering subgraph ii, or is the input graph GG. So, from a bound UΔFΔ(i)U_\Delta \geqslant |F_\Delta^{(i)}| for all 1im1 \leqslant i \leqslant m, we can obtain a bound

V(FΔ)1+mUΔ.|V(F_\Delta)| \leqslant 1 + m \cdot U_\Delta.

The bound UΔU_\Delta can be obtained recursively: U1=sU_1 = s upper bounds by definition the number of rewrites in any covering subgraph of the root graph GG, and thus the number of graphs in F1(i)F_1^{(i)}. We then proceed by induction for 1<δΔ1 < \delta \leqslant \Delta.

A rewrite rE(FΔ)r \in E(F_\Delta) overlaps with at most γ\gamma other covering subgraphs. It can overwrite at most one previous rewrite for each subgraph, and thus will have at most γ\gamma parent graphs in sets FΔ1(i1),,FΔ1(iγ)F_{\Delta-1}^{(i_1)}, \cdots, F_{\Delta-1}^{(i_\gamma)}. Each of the γ\gamma sets is of size at most Uδ1U_{\delta - 1}. Furthermore, there are at most ss rewrites in any covering subgraph. We thus obtain the recursion:

UδsUδ1γ,U_\delta \leqslant s \cdot U_{\delta - 1}^\gamma,

Unrolling the recursion, we can write this as

Uδs1+γ+γ2++γδ1=sγδ1γ1=sΘ(γδ1).U_\delta \leqslant s^{1 + \gamma + \gamma^2 + \dots + \gamma^{\delta - 1}} = s^{\frac{\gamma^\delta - 1}{\gamma - 1}} = s^{\Theta(\gamma^{\delta - 1})}.

Recalling that by construction FΔ=DΔ|F_\Delta| = |\mathcal{D}_\Delta|, we obtain:

Proposition 5.8Upper bound for DΔ|\mathcal{D}_\Delta|

The factorised search space size DΔ|\mathcal{D}_\Delta| is in msΘ(γΔ1)m \cdot s^{\Theta(\gamma^{\Delta - 1})}.

Discussion and empirical exploratory analysis #

We have derived bounds on the size of the search spaces and shown that under some assumptions on the properties of the GTS, the factorised search space grows linearly in the size of the input graph GG. This stands in stark contrast to the lower bound of the naive search tree, which scales exponentially with the size of the input graph.

However, when considering the overall optimisation problem of finding the optimal solution over the set of reachable graphs in a GTS, the exponential overhead does not disappear: it is rather shifted to the extraction phase that relies on a SAT solver. It is therefore an open question whether the factorised search space can be used to improve optimisation problems on GTSs.

To this end, we devise a simple numerical experiment that assesses the potential of using the unfolding construction as presented in this chapter in the context of quantum computation optimisation.

The toy problem.  We consider a very simple circuit optimisation problem that is desiged to require a deep search space (i.e. a large number of rewrites) to be solved. This will exacerbate the scaling difference between an optimiser that must traverse the naive search space and another that relies on the factorised representation instead.

The inputs are quantum circuits composed of two-qubit CX\textit{CX} and single-qubit Rz\textit{Rz} rotation gates. The angles of the rotations are not relevant and set randomly. They are of the following form:

i.e. each pair of subsequent qubits have 2 CX\textit{CX} gates at either end and 10 Rz\textit{Rz} rotation in-between, on the control qubits of the CX\textit{CX} gates. These circuits admit a very simple optimisation that can be expressed by the following two transformation rules:

Given the objective of minimising the number of CX\textit{CX} gates, the optimiser must commute the leftmost CX\textit{CX} gates through all of the rotation gates, until the two CX\textit{CX} on each qubit are adjacent and cancel out. We study the performance of the optimisers as we increase the number 2q2q of qubits in the circuit.

Optimisers.  We define two optimisers. Badger is a backtracking search through the naive search space of reachable graphs in the GTS: starting from the input, the search space is expanded by computing all possible rewrites at a given state. States with the lowest cost function are processed first. This is similar to an A* search Hart, 1968Peter Hart, Nils Nilsson and Bertram Raphael. 1968. A Formal Basis for the Heuristic Determination of Minimum Cost Paths. IEEE Transactions on Systems Science and Cybernetics 4, 2 (100--107). doi: 10.1109/tssc.1968.300136.

Seadog on the other hand performs the backtracking search on the factorised search space instead: when expanding a state of the search space, only rewrites that overlap with the last rewrite are considered and added to the search space, as discussed in section 5.4. In a second phase, the search space is encoded as a SAT problem that is solved using Z3 Moura, 2008Leonardo de Moura and Nikolaj Bjørner. 2008. Z3: An Efficient SMT Solver. In Tools and Algorithms for the Construction and Analysis of Systems. Springer Berlin Heidelberg, 337--340. doi: 10.1007/978-3-540-78800-3_24.

The Badger optimiser is released and publicly available as part of the open-source TKET package4. The Seadog optimiser on the other hand is still in early development; more benchmarks and a release will follow.

Results.  We ran the experiment on an Apple M3 Max CPU (4.05GHz) for inputs between 22 (1212 gates) and 7878 qubits (468468 gates). Both optimisers ran on a single core. For each instance, we set a timeout of 22 seconds and report the relative CX\textit{CX} gate reduction, i.e.

CXinitCXfinalCXinit.\frac{\textit{CX}_\text{init} - \textit{CX}_\text{final}}{\textit{CX}_\text{init}}.

The results are shown in the figures on the right.

Discussion.  On the left, we observe that both optimisers are able to find the optimum for circuits with up to 30 CX gates. Beyond that point, the time limit starts impacting Badger performance, which drops continuously and reaches 0% for inputs of 50 CX gates and above. Seadog on the other hand does not time out and is able to explore the entire (factorised) search space exhaustively up until 70 CX gates.

Observe that the Badger optimiser reaches the time limit for as few as 10 CX. Indeed, the complete naive search space size can be calculated to have 12q12^q states (each pair of qubits can be in one of 12 states). For 2q=62q = 6 we get 17281728 states, but this already reaches over 2000020'000 states for 2q=82q = 8.

CX gate count reduction (left) and runtime (right) for the Badger and Seadog optimisers. 100% gate count reduction is optimal. A timeout was set to 2 seconds.

CX gate count reduction (left) and runtime (right) for the Badger and Seadog optimisers. 100% gate count reduction is optimal. A timeout was set to 2 seconds.

Size of factorised search space for Seadog.

Size of factorised search space for Seadog.

On the other hand, the factorised search space will only contain 1212 states for each qubit pair. This results in a linear scaling of the search space size, as can clearly be seen in the second figure.

Where the runtime exceeds the 2 second timeout, this is due to pre- and post-optimisation steps such as memory allocation/deallocation, I/O, file parsing etc that are included in the measurements. The quadratic runtime scaling that we observe in Seadog is due to a hash function that is run on every state of the search space to detect and discard duplicates: as the number of states in the search space grows linearly with input size and each state requires a hash in linear time, the overall runtime grows quadratically. Future work may be able to address this issue by designing updateable hash functions that do not require the full graph to be rehashed when applying a local rewrite.

Future work should also investigate how to scale Seadog to larger input sizes on a broader class of problems. We have observed that the SAT-based extraction phase of Seadog corresponds to less than 1% of the runtime budget (under 15ms for all input sizes). Whilst being asymptotically exponential in the worst case, it is thus not currently a bottleneck. On the other hand, the number of states visited per second in the exploration phase is currently up to 10×10\times slower for Seadog compared to Badger. Further investigations into the causes of this are still required, but we expect that large performance improvements can be realised on the current implementation and as a result could scale to larger inputs.


  1. The only constraint on the notion of graph size |\cdot| is that it must be compatible with the subgraph relation: if GGG \subseteq G', then GG.|G| \leqslant |G'|. ↩︎

  2. Note that this counting is already an act of clemency: we are not counting all permutations of the rewrites, which would be considered separately by a naive exploration that applies one graph rewrite at a time. In this case, the search tree for Δ=1\Delta = 1 would contain n!2nn! \cdot 2^n graphs. ↩︎

  3. This can always be made to hold by replacing any set of mutually disjoint rewrites with their cartesian product, in effect viewing the application of multiple disjoint rewrites as one large rewrite. Thi comes at the cost of a larger value for s.s. ↩︎

  4. As a Python package on PyPI and a rust crate on crates.io). ↩︎