Chapter 1
Introduction
Quantum computing is the computational model that arises from the quantum mechanical manipulation of finite dimensional physical systems. Realising this new computing paradigm requires an entirely new technology stack: most obviously, new dedicated hardware, but also an extensive collection of software tools that transform the intents of a human user into a symphony of electric pulses that operate all components of the hardware installation (lasers, magnetic fields, currents, photodetectors, etc.).
Turning human-readable code into machine instructions is the realm of compilers, a problem as old as classical¹ computer science itself. By analogy, the same problem in the quantum world was named quantum compilation.
Interestingly, whereas the term quantum compilation has been in use for the longest part of the existence of quantum computing as a field, it is only recently that the quantum compilation community has started to adopt tools, ideas and results from our classical counterparts.
Meanwhile, quantum computing has a long history of adopting diagrammatic and graph-based representations to model and reason about computations and their quantum mechanical properties. The most famous example of this is undoubtedly the quantum circuit, a quantum analogue to boolean circuits that visualises how data flows from one operation to the next (cf. section 2.1).
Going beyond circuits, the field of categorical quantum mechanics has embraced and extended diagrammatic formalisms to model a variety of quantum processes and computations. Particularly noteworthy in this line of work are the numerous advances in quantum circuit optimisation (e.g. Duncan, 2020. 2020. Graph-theoretic Simplification of Quantum Circuits with the ZX-calculus. Quantum 4 (June 2020, 279). doi: 10.22331/q-2020-06-04-279 Gogioso, 2022. 2022. Annealing Optimisation of Mixed ZX Phase Circuits. In Proceedings 19th International Conference on Quantum Physics and Logic, QPL 2022, Wolfson College, Oxford, UK, 27 June - 1 July 2022, 415--431. doi: 10.4204/EPTCS.394.20), quantum simulations (e.g. Kissin., 2022. 2022. Simulating quantum circuits with ZX-calculus reduced stabiliser decompositions. Quantum Science and Technology 7, 4 (July 2022, 044001). doi: 10.1088/2058-9565/ac5d20 Sutcli., 2025. 2025. Fast classical simulation of quantum circuits via parametric rewriting in the ZX-calculus. arXiv: 2403.06777 [quant-ph]), error correction (e.g. Beaudr., 2020. 2020. The ZX calculus is a language for surface code lattice surgery. Quantum 4 (January 2020, 218). doi: 10.22331/q-2020-01-09-218 Cowtan, 2024. 2024. CSS code surgery as a universal construction. Quantum 8 (May 2024, 1344). doi: 10.22331/q-2024-05-14-1344) and many more related subjects (e.g. Simmons, 2021. 2021. Relating Measurement Patterns to Circuits via Pauli Flow. Electronic Proceedings in Theoretical Computer Science 343 (September 2021, 50--101). doi: 10.4204/eptcs.343.4 Felice, 2023. 2023. Quantum Linear Optics via String Diagrams. In Proceedings 19th International Conference on Quantum Physics and Logic, Wolfson College, Oxford, UK, 27 June - 1 July 2022. Open Publishing Association, 83-100. doi: 10.4204/EPTCS.394.6) that the family of ZX-like calculi have enabled in the last five years alone.
A challenge in quantum compilation has been to combine the principled and abstract graph-based transformation semantics of diagrammatic reasoning with the feature set and performance requirements of practical compilation tools. General-purpose graph rewriting tools such as Quantomatic Fagan, 2018. 2018. Optimising Clifford Circuits with Quantomatic. In Proceedings 15th International Conference on Quantum Physics and Logic, QPL 2018, Halifax, Canada, 3-7th June 2018, 85--105. doi: 10.4204/EPTCS.287.5 proved too slow for quantum circuit optimisation, and other tools from the graph transformation community such as GROOVE Rensink, 2004. 2004. The GROOVE Simulator: A Tool for State Space Generation. In Applications of Graph Transformations with Industrial Relevance. Springer Berlin Heidelberg, 479--485. doi: 10.1007/978-3-540-25959-6_40 and GrGen.NET Geiß, 2006. 2006. GrGen: A Fast SPO-Based Graph Rewriting Tool. In Graph Transformations. ICGT 2006. Springer Berlin Heidelberg, 383--397. doi: 10.1007/11841883_27 have not been adopted.
Instead, successful graph-based tools such as PyZX Kissin., 2020. 2020. PyZX: Large Scale Automated Diagrammatic Reasoning. In Proceedings 16th International Conference on Quantum Physics and Logic, Chapman University, Orange, CA, USA., 10-14 June 2019. Open Publishing Association, 229-241. doi: 10.4204/EPTCS.318.14 and its faster re-implementation QuiZX Kissin., 2022. 2022. Classical Simulation of Quantum Circuits with Partial and Graphical Stabiliser Decompositions. In . Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi: 10.4230/LIPICS.TQC.2022.5 focused on performant rewriting for a restricted subdomain (in this case, the ZX calculus). This specialisation makes it difficult to expand these approaches to new primitives and constraints that are emerging from hardware advances within quantum computing. It also limits the interaction and sharing across field boundaries and impedes the development of tools applicable to a broader range of graph transformation domains.
The ambitious aim of this thesis is to advocate for graph transformation as a robust basis for a scalable and modular compiler platform for quantum computations. We hope that, in the process, our contributions will strengthen the bridge between research in classical compilation, quantum computing and graph transformations. The key desired properties of our compilation framework can be summarised as follows:
Scalable. The compiler should handle quantum computations of the kind we realistically expect to execute within the coming decade: thousands of logical qubits, relying on possibly millions of physical qubits. Just as importantly, the compiler architecture should scale to take advantage of large classical computational resources, in order to maximise the optimisation potential when available.
Modular. The computational primitives available on present quantum hardware are wide-ranging and evolving rapidly, the programming models for end users are adapting, and hardware constraints and characteristics change from device to device. A future-proof compiler platform must therefore be extensible in its supported instruction set, its cost function and its program transformation strategies.
Why is now the time for such a compiler and why are these qualities so important? We develop some arguments in section 1.1. Our concrete contributions to this goal are then summarised in section 1.2, along with an outline of the thesis.

This thesis hopes to strengthen the bridge between the fields of classical compilation, quantum computing and graph transformations. Three-legged bridges also exist in the real world – here the Butterfly Bridge in Copenhagen. Image credits: Christian Lindgren, archdaily.com.
¹ To distinguish traditional computing from quantum computing, the field refers to the former as classical computing. We will adopt this term throughout, for lack of a better word.
1.1. A new compilation regime
We have introduced quantum compilation by drawing an analogy with the well-developed field of classical compilers. The novel directions in which quantum compilation is taking the field make for exciting new challenges. Three new quantum-specific properties of compilation form the core motivation for this work.
Large variations in architecture
The vast differences between proposed hardware architectures are a first distinguishing characteristic of current quantum computing developments. Unlike classical computing, where silicon-based transistors have become the definitive physical foundation for all electronic chips, the search for the most scalable and reliable technology for quantum computing is ongoing – and doubtless one of the most burning questions for the nascent industry. This introduces an incredible variety of compilation problems.
Quantum hardware designs differ both in the types of quantum particles used to implement qubits and in the control systems employed to manipulate these particles. Suggestions for the former include charged ions Kielpi., 2002. 2002. Architecture for a large-scale ion-trap quantum computer. Nature 417, 6890 (June 2002, 709--711). doi: 10.1038/nature00784 T. Pel., 1995. 1995. Decoherence, Continuous Observation, and Quantum Computing: A Cavity QED Model. Physical Review Letters 75, 21 (November 1995, 3788--3791). doi: 10.1103/PhysRevLett.75.3788, neutral atoms Jaksch, 2000. 2000. Fast Quantum Gates for Neutral Atoms. Physical Review Letters 85, 10 (September 2000, 2208--2211). doi: 10.1103/physrevlett.85.2208 Deutsch, 2000. 2000. Quantum Computing with Neutral Atoms in an Optical Lattice. Fortschritte der Physik 48, 9–11 (September 2000, 925--943). doi: 10.1002/1521-3978(200009)48:9/11<925::aid-prop925>3.0.co;2-a, photons Knill, 2001. 2001. A scheme for efficient quantum computation with linear optics. Nature 409, 6816 (January 2001, 46--52). doi: 10.1038/35051009, transmons Blais, 2007. 2007. Quantum-information processing with circuit quantum electrodynamics. Physical Review A 75, 3 (March 2007, 032329). doi: 10.1103/physreva.75.032329 and even Majorana Sau, 2010. 2010. Generic New Platform for Topological Quantum Computation Using Semiconductor Heterostructures. Physical Review Letters 104, 4 (January 2010, 040502). doi: 10.1103/physrevlett.104.040502 particles. The equipment that drives the desired operations on these particles is then drawn from a jolly mixture of lasers, magnetic fields, microwaves, dilution fridges, etc. Each combination results in different trade-offs: some will render a specific computation particularly easy; others promise to scale well to large systems but are very error-prone and unreliable; others still achieve high fidelities at the expense of slow operations.
From the perspective of a compiler engineer, this means we must equip quantum compilers to handle a wide variety of hardware primitives, multiple optimisation goals, and hardware-specific program constraints. Traditional compilation is ill-equipped to handle this considerable challenge.
A comparison of machine code for different architectures illustrates the difference between the quantum and classical worlds. Classical CPUs are dominated by two architectures: x86, used mainly by Intel and AMD, and ARM, used by a wide range of desktop and mobile chip manufacturers¹.
x86 CPU (e.g. Intel and AMD)
mov eax, 5 ; Load 5 into EAX
add eax, 3 ; Add 3 to EAX
mov [result], eax ; Store the result in memory
ARM CPU (e.g. mobile, Apple M-series)
ldr r0, =5 ; Load 5 into R0
add r0, r0, #3 ; Add 3 to R0
ldr r1, =result ; Load address of result
str r0, [r1] ; Store the result in memory
There are noticeable differences between the two architectures, mostly in register naming conventions and in ARM's explicit load (ldr) and store (str) instructions, which x86 handles implicitly through mov. This simplistic example naturally ignores some of the more fine-grained considerations that can make translation hard in certain edge cases; a discussion of these can be found in Ford, 2021. 2021. Migrating Software from x86 to ARM Architecture: An Instruction Prediction Approach. In 2021 IEEE International Conference on Networking, Architecture and Storage (NAS), October 2021. IEEE, 1--6. doi: 10.1109/NAS51552.2021.9605443. Overall, however, the instructions and capabilities of the two platforms are broadly equivalent, as confirmed by the existence of emulation tools such as Apple Rosetta.
Let us contrast this with the difference between two quantum architectures. Consider, on the one hand, an architecture that can natively perform CX and H gates on qubits (e.g. superconducting qubits, ion traps, etc.) and, on the other hand, a platform based on photons and optical components.
Quantum circuit (qubits)
h q[0];
cz q[0],q[1];
Linear circuit (photons)
bs.h(5*pi/2, pi, pi, 2*pi) m[0], m[1];
bs.h m[2], m[3];
perm([2, 1, 3, 0]) m[1], m[2], m[3], m[4];
barrier m[0], m[1], m[2], m[3], m[4], m[5];
bs.h(1.910633) m[0], m[1];
bs.h(1.910633) m[2], m[3];
bs.h(1.910633) m[4], m[5];
perm([1, 0]) m[2], m[3];
bs.h m[3], m[4];
perm([3, 0, 1, 2]) m[1], m[2], m[3], m[4];
The first listing is a quantum circuit expressed in the OpenQASM2 standard Cross, 2017. 2017. Open Quantum Assembly Language. arXiv: 1707.03429 [quant-ph]. The second listing is the equivalent linear optics circuit computed by Perceval Heurtel, 2023. 2023. Perceval: A Software Platform for Discrete Variable Photonic Quantum Computing. Quantum 7 (February 2023, 931). doi: 10.22331/q-2023-02-21-931, expressed in a custom, OpenQASM2-like format. The conversion is by no means straightforward! Some of the challenges include encoding qubits into multiple photon modes and mapping quantum operations to an optically realisable procedure made of optical components and measurements Felice, 2023. 2023. Quantum Linear Optics via String Diagrams. In Proceedings 19th International Conference on Quantum Physics and Logic, Wolfson College, Oxford, UK, 27 June - 1 July 2022. Open Publishing Association, 83-100. doi: 10.4204/EPTCS.394.6.
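The first of these challenges, encoding qubits into photon modes, can be illustrated with a minimal sketch. It assumes the common dual-rail convention (one photon shared between two modes per qubit); the function names are hypothetical and this is not Perceval's API, nor necessarily the exact mode layout used in the listing above.

```python
def dual_rail(bits: str) -> list[int]:
    """Encode a computational basis state as photon counts per optical mode.

    Assumed convention: qubit i occupies modes 2i and 2i+1, with
    |0> = one photon in mode 2i and |1> = one photon in mode 2i+1.
    """
    occupation: list[int] = []
    for b in bits:
        if b not in ("0", "1"):
            raise ValueError(f"not a bit: {b!r}")
        occupation += [1, 0] if b == "0" else [0, 1]
    return occupation


def is_valid_dual_rail(occupation: list[int]) -> bool:
    """A photon configuration encodes qubits only if every mode pair
    holds exactly one photon."""
    if len(occupation) % 2 != 0:
        return False
    pairs = zip(occupation[0::2], occupation[1::2])
    return all(sorted(pair) == [0, 1] for pair in pairs)


two_qubit_state = dual_rail("10")  # qubit 0 in |1>, qubit 1 in |0>
```

Even in this toy form, the sketch shows why the translation is not a one-to-one instruction mapping: two qubits already require four modes, and most photon configurations (for instance two photons in the same mode) do not correspond to any qubit state at all.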
Other architectures, such as neutral atoms, may broadly support qubit-based operations but might not offer control over individual qubits, instead requiring operations to be applied in parallel to large groups of qubits Bluvst., 2022. 2022. A quantum processor based on coherent transport of entangled atom arrays. Nature 604, 7906 (April 2022, 451--456). doi: 10.1038/s41586-022-04592-6. Finally, it is to be expected that the error-correcting codes that individual platforms adopt to reduce error rates at the hardware level will introduce further constraints and new instruction sets yet again.
It is noteworthy that current trends in the classical world are also pushing compilers towards more heterogeneous architectures that may include GPUs, FPGAs and other accelerators. This has led to significant changes in the design of current compilers, which we will touch upon later. Nonetheless, this shift has, so far, mostly “limited” itself to new forms of parallelism and the introduction of more specialised instruction sets rather than a fundamental redesign of existing tools and computing paradigms. The breadth of technologies and trade-offs that quantum compilers must face has no equivalent in the classical world – at least for the time being.
Asymmetric computational resources
A second exciting paradigm shift in compilation that quantum is driving forward is cross-compilation. A common assumption in compilation is that the program is executed on the same machine (or at least the same architecture) on which it was compiled. By contrast, in cross-compilation, the compiler and the compiled binary program run on different machines, possibly with different architectures. An instance of this would be using a recent ARM system-on-chip machine to create a binary program for a traditional Windows PC with an Intel CPU. This is a supported feature of most modern compilers (and made easier by the relative similarities between processor architectures, as seen above), but such tasks are by no means trivial and can be laborious to get to work well in practice².
The situation is very different for quantum computing. Quantum computational resources are so limited that native compilation, in which the program is compiled and run on the same machine, is unfeasible – and will remain so for the foreseeable future³. When we put the possibility of pure-quantum compilation aside, we are left with a cross-compilation problem that is entirely the realm of classical computer science, the output of which happens to be destined to run on a quantum computer. This is similar to how, in classical computing, GPU programs are typically compiled on CPUs before being uploaded and executed on the GPU.
Cross-compilation presents significant challenges. As quantum programs grow in size and complexity, debugging and verifying their correctness without access to the target hardware becomes increasingly difficult Rovara, 2024. 2024. A Framework for Debugging Quantum Programs. arXiv: 2412.12269 [quant-ph], as we hit the limits of what can be simulated classically. Quantum simulation is a vibrant research area that is the subject of theses (e.g. Flanni., 2020. 2020. The application of quantum simulation to topological and open many-body systems. PhD Thesis. University of Strathclyde Azad, 2024. 2024. Tensor networks for classical and quantum simulation of open and closed quantum systems. PhD Thesis. University College London.) in its own right.
On the flip side, using classical hardware for quantum program compilation comes with a giant opportunity for compilers: the classical computational resources available to the compiler, measured in the size of the memory and the number of operations that can be handled, are many orders of magnitude larger (and cheaper!) than what the quantum hardware that will execute the program is capable of. We can today execute tens to hundreds of billions of operations per second (GFLOPS) on desktop computers, up to the “exascale”, i.e. $10^{18}$ FLOPS, for the largest supercomputers Dongar., 2024. 2024. TOP500 List. (November 2024). Retrieved on 30/12/2024 from https://top500.org/lists/top500/list/2024/11/. Quantum hardware, on the other hand, will not be executing programs with sizes beyond 1000 error-corrected gates, or 10,000 physical gates, for another three years – that is believing the most optimistic roadmaps in the industry IBM, 2024. 2024. Expanding the IBM Quantum roadmap to anticipate the future of quantum-centric supercomputing. Retrieved on 30/12/2024 from https://www.ibm.com/quantum/blog/ibm-quantum-roadmap-2025 Quanti., 2024. 2024. Quantinuum Unveils Accelerated Roadmap to Achieve Universal, Fully Fault-Tolerant Quantum Computing by 2030. Retrieved on 30/12/2024 from https://www.quantinuum.com/press-releases/quantinuum-unveils-accelerated-roadmap-to-achieve-universal-fault-tolerant-quantum-computing-by-2030.
It is expected that even a few thousand quantum gates will suffice to solve problems that our largest supercomputers struggle with. Meanwhile, every gate that must be performed comes at a high cost: it may fail, introduce errors, or take a long time to complete. It therefore behoves us to use all the classical resources at our disposal to reduce quantum operations to a minimum.
Given the strict hardware limitations that all near-term architectures are expected to face, quantum compilers must evolve into cross-compilers that are able to utilise the full power of the classical hardware available to them; doing so will push the boundaries of what is possible with quantum computing just a bit further – in a field where every marginal gain may unlock new applications.
The confluence of classical and quantum compilation
Finally, quantum compilation also faces some momentous engineering challenges. As we will see in section 2.2, significant research efforts have focused on the compilation and optimisation of quantum programs expressed as quantum circuits (cf. section 2.1). This formalism has its roots in quantum information theory, the field that gave birth to quantum computing, and makes for an ideal framework in which to develop the theory and optimisation techniques. However, it does not include any of the fundamentals of compiler and programming language design that make classical software as composable and scalable as it is today.
For example, there is no concept of subroutine or function calls; neither can a program execution be branching or looping based on runtime values. This makes code reuse impossible, resulting in huge program sizes and unsurmountable challenges for scaling up compilation to problems of real-world interest Ittah, 2022. 2022. QIRO: A Static Single Assignment-based Quantum Program Representation for Optimization. ACM Transactions on Quantum Computing 3, 3 (June 2022, 1--32). doi: 10.1145/3491247. The absence of code abstractions is being felt even more acutely with the emergence of hybrid quantum-classical computations, as we discuss in section 2.3.
With applications of quantum computing that cannot be expressed as quantum circuits proliferating, a move away from circuit-based representations is becoming unavoidable Hossei., 2023. 2023. OpenQASM 3.0 Specification. Retrieved on 15/03/2025 from https://openqasm.com/versions/3.0/intro.html QIR Al., 2021. 2021. QIR Specification v0.1. Retrieved on 31/12/24 from https://www.qir-alliance.org/. This is also an opportunity to incorporate learnings from the decades of experience that have been gathered in classical computer science. Many of the tools and software that were originally developed for classical computations are thus being adopted and adapted to the specificities of quantum. This convergence of quantum computing and classical compiler technologies is heralding new opportunities – but also poses important questions around how to represent quantum programs and optimise them.
¹ There are other architectures, such as RISC-V Waterm., 2016. 2016. Design of the RISC-V Instruction Set Architecture. PhD Thesis. University of Berkeley and MIPS Hennes., 1982. 1982. MIPS: A microprocessor architecture. ACM SIGMICRO Newsletter 13, 4 (December 1982, 17--22). doi: 10.1145/1014194.800930, but as of 2025 almost all consumer and professional CPUs, from mobile phones to laptops, desktops and data centres, run on x86 or ARM. See Valve ., 2024. 2024. Steam Hardware & Software Survey: December 2024. (December 2024). Retrieved on 30/01/2025 from https://store.steampowered.com/hwsurvey/processormfg/ for a detailed hardware market share analysis, albeit focused on gaming. Details on mobile market share can be found in this survey – all of the listed manufacturers use the ARM architecture.
² There are new tools promising to make cross-compilation easier, such as Zig. This only proves our point, though: classical cross-compilation has long been a neglected edge case.
³ First valiant efforts at defining optimisation problems relevant to quantum compilation that could be run on quantum hardware have been recently presented in Rattac., 2024. 2024. Quantum circuit compilation with quantum computers. arXiv: 2408.00077 [quant-ph]. However, this concerns only specific optimisation subroutines of the overall compilation problem. It is hard to imagine today that deploying an entire compilation stack such as LLVM on quantum hardware would ever be sensible. Why tooling so close to the classical compiler frameworks will be required for quantum compilation is a topic we will return to in section 2.4.
1.2. Contributions and thesis outline
Preliminaries
The thesis starts in chapter 2 with a review of the main concepts on which the rest of the thesis is built. Aside from a short introduction to quantum computations (section 2.1) and a survey of the major quantum circuit optimisation techniques (section 2.2), this chapter makes two observations that impart a research direction to the rest of the thesis:
- The emergence of hybrid quantum-classical computations is rendering the quantum circuit obsolete as the main representation of quantum computations within compilers (section 2.3).
- The best optimisation outcomes will combine classical and quantum compiler optimisations. This can be achieved by adopting abstractions that are interoperable with classical compiler infrastructure (section 2.4).
A graph transformation formalism for quantum computations
Chapters 3, 4, and 5 form the core of this thesis and present our main contributions. The results in chapter 3 are crucial stepping stones for the rest of the thesis. Chapters 4 and 5 meanwhile present our most significant contributions to the state of the art.
In chapter 3, we propose minIR, a new graph-based intermediate representation (IR) for quantum computations. MinIR is a minimal subset of the Hierarchical Unified Graph Representation (HUGR), recently presented in joint work Mark K., 2025. 2025. HUGR: A Quantum-Classical Intermediate Representation. Retrieved (talk recording) from https://www.youtube.com/live/D8esZrt7ogk?feature=shared&t=5217 and the subject of ongoing development. It is to our knowledge the first compiler IR with support for linear types – required to model the restrictions that quantum mechanics imposes on quantum computations.
Unlike quantum circuits, minIR (and HUGR) programs can model computations that act on arbitrary combinations of classical (bits) and quantum data (qubits) within a single, unified representation. It represents the best of two worlds: it combines the safety guarantees of quantum-specific representations such as quantum circuits (i.e. it is impossible to declare physically unrealisable computations), whilst at the same time being interoperable with classical compiler IRs.
Graph-based representations of computations, known as computation graphs in deep learning and dataflow graphs within the compiler community, are common in these fields. Our original contribution is in the formalisation of the IR transformation semantics: whereas classical compilers typically define IR transformations in terms of the values that they depend on and the values that they overwrite, this approach implicitly relies on value copying and discarding and thus does not generalise to linear values. Instead, we define graph rewriting semantics on minIR and show sufficient conditions for which minIR transformations preserve the validity of the program, and in particular the linearity conditions.
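To make the linearity condition concrete, the following minimal sketch checks that every linear value in a small dataflow program is defined exactly once and used exactly once, i.e. never copied and never discarded. The operation format and the names (`check_linearity`, `q0`, `alloc`, ...) are hypothetical and chosen for illustration; this is not minIR's actual definition.

```python
from collections import Counter

# An operation is (name, input values, output values).
Op = tuple[str, list[str], list[str]]


def check_linearity(ops: list[Op], linear: set[str],
                    program_outputs: list[str]) -> list[str]:
    """Return a list of violations; an empty list means every linear
    value is defined exactly once and used exactly once."""
    defined = Counter(v for _, _, outs in ops for v in outs)
    used = Counter(v for _, ins, _ in ops for v in ins)
    used.update(program_outputs)  # returning a value counts as its one use
    errors = []
    for v in sorted(linear):
        if defined[v] != 1:
            errors.append(f"{v}: defined {defined[v]} times")
        if used[v] != 1:
            errors.append(f"{v}: used {used[v]} times")
    return errors


valid = [
    ("alloc", [], ["q0"]),
    ("h", ["q0"], ["q1"]),
    ("measure", ["q1"], ["c0"]),  # c0 is classical, hence not linear
]
invalid = [
    ("alloc", [], ["q0"]),
    ("h", ["q0"], ["q1"]),
    ("x", ["q0"], ["q2"]),  # q0 is used twice (a copy); q2 is never used
]

ok = check_linearity(valid, {"q0", "q1"}, ["c0"])
bad = check_linearity(invalid, {"q0", "q1", "q2"}, ["q1"])
```

A transformation that is described only by the values it reads and overwrites can silently turn the first program into something like the second. This is the kind of failure that rewriting semantics with explicit validity conditions is meant to rule out.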
The encoding of quantum computations as graphs sets the stage for quantum compilation and optimisation using graph transformation systems (GTS), in which the set of transformations that the compiler is allowed to perform is expressed by a set of graph transformation rules. This is in effect a generalisation of an approach first proposed in Xu, 2022. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433 in the context of quantum circuits. We improve on this work with two major contributions that resolve critical issues concerning the scaling of the technique to large numbers of transformation rules and large inputs respectively.
Pattern matching
Our first major contribution is a pattern matching algorithm, presented in chapter 4. The main result is a runtime complexity bound independent of the number of patterns being matched, achieved using a one-off pre-computation. This is to our knowledge the first pattern matching algorithm for quantum circuits that does not depend on the number of patterns. Whilst similar multi-pattern matching techniques have been explored in other domains such as RETE networks Forgy, 1982. 1982. Rete: A fast algorithm for the many pattern/many object pattern match problem. Artificial Intelligence 19, 1 (September 1982, 17--37). doi: 10.1016/0004-3702(82)90020-0 Varró, 2013. 2013. A Rete Network Construction Algorithm for Incremental Pattern Matching Ian, 2003. 2003. The execution kernel of RC++: RETE*, a faster RETE with TREAT as a special case. International Journal of Intelligent Games and Simulation 2, 1 (Feb 2003, 36-48) and computational biology Danos, 2007. 2007. Scalable Simulation of Cellular Signaling Networks Boutil., 2017. 2017. Incremental Update for Graph Rewriting, no algorithm is known with provable sub-exponential worst-case complexity. These results were published in Mondada, 2025. 2025. Scalable Pattern Matching in Computation Graphs. Electronic Proceedings in Theoretical Computer Science 417 (March 2025, 71--95). doi: 10.4204/eptcs.417.5.
The proved complexity bound applies to computations with only linear values¹, of which quantum circuits are a special case. The result is expressed in terms of maximal pattern width and depth, two measures of pattern size defined in section 4.2. The main result, presented in Proposition 4.13, is reproduced here:
Let be patterns with width and depth . The pre-computation runs in time and space complexity
For any subject graph , the pre-computed prefix tree can be used to find all pattern embeddings in time
where is a constant.
The runtime complexity is dominated by an exponential scaling in maximal pattern width . Meanwhile, the advantage of our approach over matching one pattern at a time grows with the number of patterns . It is thus of particular interest for matching numerous small width patterns.
We illustrate this point by comparing our approach to a standard algorithm that matches one pattern at a time Jiang, 1998. 1998. Marked subgraph isomorphism of ordered graphs. In Advances in Pattern Recognition, Berlin, Heidelberg. Springer Berlin Heidelberg, 122--131. doi: 10.1007/bfb0033230, with runtime complexity . Using (cf. section 4.2), and comparing to eq. (2), we thus have a speedup in the regime . On the other hand, is upper bounded by the maximum number of patterns of bounded width and depth. Using a crude lower-bound for derived in Appendix , we obtain a computational advantage for our approach when
In the case of quantum circuits, the width of the patterns is given by the number of qubits. The low-qubit regime where our approach shines coincides exactly with the typical applications of GTSs in quantum compilation: in Xu, 2022. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433 and Xu, 2023. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (June 2023, 835--859). doi: 10.1145/3591254, all rules used have at most 4 qubits.
We present benchmarks on a real world dataset of 10,000 quantum circuits in section 4.7, showing a 20x speedup over a leading C++ implementation of pattern matching for quantum circuits.
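The idea of paying a one-off pre-computation to make matching independent of the number of patterns can be illustrated on a much simpler problem. The sketch below matches many gate sequences on a single wire using a prefix tree; it is a one-dimensional analogy under our own simplifying assumptions (patterns are plain lists of gate names, and all function names are hypothetical), not the graph algorithm of chapter 4.

```python
def build_prefix_tree(patterns: dict[str, list[str]]) -> dict:
    """One-off pre-computation: merge all patterns into a single trie."""
    root: dict = {"children": {}, "matches": []}
    for name, gates in patterns.items():
        node = root
        for gate in gates:
            node = node["children"].setdefault(
                gate, {"children": {}, "matches": []}
            )
        node["matches"].append(name)
    return root


def find_all(tree: dict, subject: list[str]) -> list[tuple[int, str]]:
    """Return (start position, pattern name) for every occurrence.

    From each start position a single path is followed in the trie, so
    the number of steps per position is bounded by the longest pattern,
    not by how many patterns were inserted.
    """
    found = []
    for start in range(len(subject)):
        node = tree
        for i in range(start, len(subject)):
            node = node["children"].get(subject[i])
            if node is None:
                break
            found.extend((start, name) for name in node["matches"])
    return found


patterns = {"hh": ["h", "h"], "hth": ["h", "t", "h"], "tt": ["t", "t"]}
tree = build_prefix_tree(patterns)
matches = find_all(tree, ["h", "h", "t", "h", "t", "t"])
```

Adding more patterns grows the tree, and hence the pre-computation, but the traversal from each position in the subject still follows one path only. Matching one pattern at a time would instead repeat the scan once per pattern.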
Confluently persistent graph rewriting
Our second major contribution, in chapter 5, uses a well-known construction in GTSs, the unfolding Baldan, 1999. 1999. Unfolding and Event Structure Semantics for Graph Grammars. In Foundations of Software Science and Computation Structures, Berlin, Heidelberg. Springer Berlin Heidelberg, 73--89. doi: 10.1007/3-540-49019-1_6, to derive a novel data structure that compresses the representation of the space of all graphs reachable from an input within a GTS. We call the factorised search space of . Optimisation problems over the space of reachable graphs in a GTS can then equivalently be expressed as optimisation problems over .
We show in section 5.5 that under some assumptions on the GTS and input, there is an exponential complexity separation in the input size between the size of the factorised search space – which admits an asymptotically linear upper bound – and the size of the rewrite space that it encodes – which grows at least exponentially.
is furthermore the first confluently persistent data structure Drisco., 1994. 1994. Fully persistent lists with catenation. Journal of the ACM 41, 5 (September 1994, 943--959). doi: 10.1145/185675.185791 Fiat, 2003. 2003. Making data structures confluently persistent. Journal of Algorithms 48, 1 (August 2003, 16--58). doi: 10.1016/s0196-6774(03)00044-0 [?]: it performs non-destructive rewrites on immutable graph objects by maintaining an explicit history of all graph rewrites and their dependencies. This allows the concurrent application of multiple rewrites and the merging of rewritten graphs that were obtained independently. This represents an exciting development in its own right that opens the door to functional programming and massively parallelised approaches to graph rewriting (see section 6.2).
The intuition behind the exponential reduction in search space size is as follows: if n rewrites apply to disjoint subgraphs of a common graph G, then the factorised search space will be of size O(n), storing the set of n possible rewrites, rather than the up to 2^n distinct graphs in the rewrite space obtained by applying a subset of the rewrites. To generalise to arbitrary rewrites, the data structure must keep track of the dependencies and overlaps between rewrites and update these as more rewrites are added to it.
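The counting argument can be made concrete with a small sketch. This is a toy model, not the data structure of chapter 5: we assume the rewrites are pairwise disjoint and identify each reachable graph with the set of rewrites that produced it. The names `reachable_graphs`, `naive_size` and `factorised_size` are ours, chosen for illustration.

```python
from itertools import combinations

def reachable_graphs(rewrites):
    """Enumerate every graph reachable by applying a subset of pairwise
    disjoint rewrites. Toy assumption: a graph is identified by the
    frozenset of rewrites applied to the common input graph."""
    graphs = set()
    for k in range(len(rewrites) + 1):
        for subset in combinations(rewrites, k):
            graphs.add(frozenset(subset))
    return graphs

n = 10
rewrites = [f"r{i}" for i in range(n)]

naive_size = len(reachable_graphs(rewrites))  # one entry per reachable graph
factorised_size = len(rewrites)               # one entry per stored rewrite
print(naive_size, factorised_size)            # 1024 versus 10
```

Each additional disjoint rewrite doubles the first number but only adds one to the second.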
A lot of parallels can be drawn between this approach and equality saturation, a technique for term rewriting with applications in classical compilers. We explore these connections in section 5.2.
Unlike the results of chapter 4, the construction and bounds proven in chapter 5 can be applied to a wide range of graph rewriting domains. They have particularly significant implications for applications of GTSs that are unable to derive rewriting strategies from first principles, and hence have to resort to an exhaustive (or heuristic) exploration of the rewrite space. They can proceed as follows:
- Exploration phase. Construct the factorised search space by finding and applying rewrites, in time proportional to its size. With our results, this yields an exponential speedup over the naive exploration of the rewrite space (section 5.3).
- Extraction phase. Unlike in the rewrite space, where the optimal solution is one of its elements, constructing the optimal solution from the factorised search space is a non-trivial extraction problem. We show in section 5.4 that the extraction can be expressed as a boolean satisfiability (SAT) problem; depending on the cost function, the optimisation can then be encoded as a side condition on SAT or by a generalisation of the problem to Satisfiability Modulo Theories (SMT).
In the worst case, all known algorithms for SAT and SMT problems require exponential time Cook, 1971. 1971. The complexity of theorem-proving procedures. In Proceedings of the third annual ACM symposium on Theory of computing - STOC ’71. ACM Press, 151--158. doi: 10.1145/800157.805047 Moskew., 2001. 2001. Chaff: engineering an efficient SAT solver. In Proceedings of the 38th conference on Design automation - DAC ’01. ACM Press, 530--535. doi: 10.1145/378239.379017 Biere, 2021. 2021. Handbook of satisfiability (Second edition). IOS Press, Amsterdam, thus cancelling the exponential compression of the search space. However, SAT and SMT are standardised problems for which heavily optimised solvers and optimisers have been developed Moura, 2008. 2008. Z3: An Efficient SMT Solver. In Tools and Algorithms for the Construction and Analysis of Systems. Springer Berlin Heidelberg, 337--340. doi: 10.1007/978-3-540-78800-3_24 Sebast., 2015. 2015. OptiMathSAT: A Tool for Optimization Modulo Theories. In Computer Aided Verification. Springer International Publishing, 447--454. doi: 10.1007/978-3-319-21690-4_27. We expect that the instances of SAT and SMT that encode the extraction problem will scale well in practice:
- Clauses in the problem encode local properties that SAT solvers are well-suited to solve Zulkos., 2018. 2018. Understanding and Enhancing CDCL-based SAT Solvers. PhD Thesis. University of Waterloo: the boolean variables represent rewrites, which only impose restrictions on other rewrites that apply in the same neighbourhood of the graph.
- Furthermore, in quantum compilation applications, the factorised search space can be sparsified: most rewrites do not change the cost function (think of IR transformations that reorder operations but do not reduce the runtime) and thus do not need to be encoded in the SAT problem.
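To give a flavour of what such an extraction problem might look like, here is a minimal sketch under our own assumptions. It does not reproduce the encoding of section 5.4: the instance (`savings`, `conflicts`, `depends_on`) is hypothetical, the clause shapes are a guess at the simplest local constraints, and an exhaustive loop stands in for a real SAT/SMT optimiser.

```python
from itertools import product

# Hypothetical toy instance: cost saving (e.g. gates removed) of each rewrite.
savings = {"r1": 2, "r2": 3, "r3": 1, "r4": 0}
# Rewrites that overlap in the same neighbourhood cannot both be selected.
conflicts = [("r1", "r2")]
# (child, parent): the child rewrite applies to the output of the parent.
depends_on = [("r3", "r2")]

def is_consistent(chosen):
    """Clauses: (not a or not b) per conflict, (not child or parent) per dependency."""
    no_conflict = all(not (chosen[a] and chosen[b]) for a, b in conflicts)
    deps_met = all((not chosen[c]) or chosen[p] for c, p in depends_on)
    return no_conflict and deps_met

def extract_best(names):
    """Brute-force stand-in for a SAT/SMT optimiser: try every assignment."""
    best, best_saving = None, -1
    for bits in product([False, True], repeat=len(names)):
        chosen = dict(zip(names, bits))
        if is_consistent(chosen):
            total = sum(savings[r] for r in names if chosen[r])
            if total > best_saving:
                best, best_saving = chosen, total
    return best, best_saving

# Sparsification: r4 does not change the cost and nothing depends on it here,
# so it needs no boolean variable.
relevant = [r for r in savings if savings[r] != 0]
selection, saving = extract_best(relevant)
print(sorted(r for r in relevant if selection[r]), saving)
```

Every clause mentions only rewrites that touch the same region of the graph, which is the locality property referred to in the first bullet above.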
In a first exploratory analysis, we present some empirical results that support our claims: by searching over the factorised search space instead of the naive search space, the optimiser is able to find the global optimum for circuits that are twice as large. Our results also exhibit a linear scaling in the size of the factorised search space, confirming that the approach should scale well to larger problems.
Conclusion #
The thesis concludes in chapter 6 with a discussion on how our contributions serve our overall goal of a scalable and modular quantum compiler platform. We discuss in particular two extensions of our work that we see as particularly promising: the generalisation of fast multi-pattern matching to non-linear values and to the persistent data structure of chapter 5 (section 6.1) and the deployment of confluently persistent graph rewriting to a massively parallel distributed compute architecture (section 6.2).
- In the absence of linearity, pattern matching is an instance of the subgraph isomorphism problem, an NP-complete problem. The assumption is therefore necessary and expected.
Chapter 2
Quantum Computing: a Computer Scientist's Perspective
Many (too many?) introductions to quantum computing have been written, so we will refrain from adding another entry to the collection. Instead, beyond the absolute basics, our focus is on the expressive power and syntax of quantum programs. This recasts quantum compilation as a set of program transformation problems that can be tackled with traditional compiler methods familiar to computer scientists.
In this chapter, we lay the groundwork for this thesis by introducing what programs meant to run on quantum computers look like today, what we expect they will look like in the (near) future, and how quantum compilers have been built to optimise them. We start in section 2.1 with a review of the basic computation primitives of quantum computers and how they are composed to form quantum circuits, the simplest form of quantum programs. This is followed by a review of the leading quantum circuit optimisation techniques in section 2.2. Finally, sections 2.3 and 2.4 introduce and discuss the impact of hybrid quantum computations, and how they challenge existing quantum compiler designs and optimisations.
2.1. Foundations of quantum computing
The most widespread computational model in quantum computing – and arguably its
simplest – is built on the qubit abstraction. As its name suggests, it is the
quantum analogue of the classical bit, i.e., a value that can take the values
0 or 1.
We will stick to our promise of not delving into the details of the physical
realisations of qubits in real-world architectures. Nonetheless, it is important
to note one fundamental difference with classical systems. Classical bit values
(the famous 0s and 1s of our computers) are typically encoded using two
voltages; another way of saying this is that bit values, and hence data,
correspond to electrical currents in the wires¹ of a chip. Gates,
i.e. the lowest level of operations that can be applied to bits, then correspond
to barriers that let the electrical current flow through to outgoing wires, or
block it.
We can thus picture a classical gate as a black box with n input wires going
into the box and m output wires leaving it. For any combination of on and off
voltages on the input wires, the box will turn on some of the output wires. The
vital point to take away from this classical state of affairs is that we can
think of the carriers of input and output data (i.e. the input and output wires)
as physically distinct objects that can exist and can be read simultaneously.
Quantum physics rules out such an implementation of qubits. In the case of matter-based qubits, such as ions in traps or Josephson junctions on superconductors, quantum gates are operations that modify – or “mutate”, to borrow a term from programming languages – the physical qubits themselves. An input qubit to a gate is thus subjected to physical interactions that change its internal state. After the gate execution is completed, the qubits that held the input states now contain the operation’s output.
Similarly, photonic systems encode qubits using modes of the electromagnetic field. A gate in this setting acts by transforming these modes – mixing them, shifting their phases, or entangling them with ancillary modes. It is never possible to modify qubit data coherently whilst keeping access to the original input data.
This has several profound implications for quantum computing. First and
foremost, every quantum gate must have the same number of inputs as outputs.
Most iconic classical gates (AND, OR, XOR, etc.) are thus impossible to
implement on a quantum computer without some adjustments². This also means
that the number of qubits must remain unchanged throughout the computation. A
computation that starts with n qubits must also end in n qubits – and have
n qubits at every point throughout the computation.
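A purely classical sketch illustrates the kind of adjustment that is needed. AND has two inputs and one output and cannot be undone, whereas the three-bit Toffoli gate (introduced later in this section) has as many outputs as inputs and computes AND into an extra target bit. The function names below are ours.

```python
from itertools import product

def classical_and(a, b):
    return a & b

def toffoli(a, b, c):
    """Flip the target bit c when both control bits a and b are 1."""
    return a, b, c ^ (a & b)

inputs2 = list(product([0, 1], repeat=2))
and_outputs = [classical_and(a, b) for a, b in inputs2]
# AND maps four inputs onto two possible outputs, so it cannot be undone.
and_is_injective = len(set(and_outputs)) == len(inputs2)

inputs3 = list(product([0, 1], repeat=3))
toffoli_outputs = [toffoli(*bits) for bits in inputs3]
# Toffoli permutes the eight 3-bit strings: three inputs, three outputs.
toffoli_is_bijective = sorted(toffoli_outputs) == sorted(inputs3)

# With the target initialised to 0, the third output holds a AND b.
and_via_toffoli = [toffoli(a, b, 0)[2] for a, b in inputs2]
print(and_is_injective, toffoli_is_bijective, and_via_toffoli == and_outputs)
# prints: False True True
```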
At this point, taking the preservation of qubits just described seriously, we should be asking how a quantum computation can even come to be at all, given that no qubit can be created out of thin air. In our attempt to remain blissfully ignorant of physical realities, we suggest adopting the following abstracted mental model of qubits: qubits can neither be created nor deleted³, they simply i) exist at all times, and ii) can be reset to the 0 state.
For our convenience, we can ignore qubits that are unimportant to us. If all we
need are n qubits, then we will limit our considerations to these and pretend
none other exists. Pushing further our myopic focus on qubits with a direct
utility, we can also adjust the window of qubits of interest as we progress
through the computation. If, for instance, a new qubit becomes useful halfway
through our program execution, we can enlarge the set of qubits we are keeping
track of and refer to this as “creating” a qubit. Conversely, qubits often
become irrelevant, in which case we move them outside of our field of
consideration and say that the qubits were discarded.
A final consequence of mutating qubits that we will highlight is that once a gate has been applied, the input states to the gate no longer exist! In other words, any state that we reach throughout our execution can only be used at most once. Here, your classical intuition might kick in:
Let us just maintain a copy of the original state before modifying it!
This would allow us to do more than one computation from a temporary value. However, copying is a big NO in quantum computing. It is a profound restriction (or property, depending on your point of view) with deep roots in the physics of quantum mechanics. This principle, the no-cloning theorem, is one of three fundamental properties of quantum physics that quantum computing builds upon.
The physical constraints of quantum computation #
No-cloning theorem #
The no-cloning principle Wootte., 1982. 1982. A single quantum cannot be cloned. Nature 299, 5886 (October 1982, 802--803). doi: 10.1038/299802a0 provides a formal foundation for the vague statement “qubits live forever” we made earlier. It is a fundamental tenet of quantum information, deserving a more rigorous treatment than we are giving it here. We recommend that the curious reader consult more thorough references such as Nielsen, 2016. 2016. Quantum Computation and Quantum Information (10th Anniversary edition). Cambridge University Press.
No-cloning theorem: it is impossible to copy an arbitrary unknown state onto another (possibly known) qubit, or to copy a (possibly known) qubit to a qubit with unknown arbitrary state.
If we use |ψ⟩ to denote an arbitrary state and |0⟩ to denote a known state, the principle can be restated as: there are no quantum computations mapping |ψ⟩|0⟩ ↦ |ψ⟩|ψ⟩, nor |ψ⟩|0⟩ ↦ |0⟩|0⟩. The consequences of this are profound.
A consequence of the first half is what we alluded to in the previous section: any qubit state can only be used once in a computation. This statement also justifies why every quantum gate implementation, no matter the hardware specifics, will mutate its input qubits to produce the output states.
The second half of the statement is often referred to as the “no delete” theorem. Indeed, if we view |ψ⟩ as a state encoding some data, we can interpret it as some amount of information. The state |0⟩, on the other hand, is a fixed state and thus cannot store any information. From the perspective of information theory, the map |ψ⟩|0⟩ ↦ |0⟩|0⟩ thus destroys information: it turns an information-storing left-hand side into a product of |0⟩ states, devoid of any information.
We can also revisit the first map |ψ⟩|0⟩ ↦ |ψ⟩|ψ⟩ and understand it from an information theoretic perspective as an attempt to create information out of thin air! Using this interpretation, the no-cloning theorem is thus the statement that quantum information is a preserved quantity in quantum computations: its amount will never increase or decrease.
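The following sketch illustrates (but does not prove) the first half of the theorem on a natural candidate for a copying gate. We assume the two-qubit CX (CNOT) gate written as a 4×4 matrix in the basis 00, 01, 10, 11; the helper names are ours. CX does copy the two basis states onto a target prepared in the 0 state, but linearity then forces it to fail on a superposition.

```python
import math

def kron_vec(u, v):
    """Tensor product of two state vectors."""
    return [a * b for a in u for b in v]

def apply(matrix, vec):
    """Apply a matrix (list of rows) to a state vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]

def close(u, v, tol=1e-9):
    return all(abs(a - b) < tol for a, b in zip(u, v))

# CX in the basis 00, 01, 10, 11, with the first qubit as control.
CX = [[1, 0, 0, 0],
      [0, 1, 0, 0],
      [0, 0, 0, 1],
      [0, 0, 1, 0]]

zero = [1, 0]
one = [0, 1]
plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]  # equal superposition of 0 and 1

# CX copies the basis states onto a target prepared in the 0 state...
copies_zero = close(apply(CX, kron_vec(zero, zero)), kron_vec(zero, zero))
copies_one = close(apply(CX, kron_vec(one, zero)), kron_vec(one, one))
# ...but on the superposition it produces an entangled state, not two copies.
out = apply(CX, kron_vec(plus, zero))
copies_plus = close(out, kron_vec(plus, plus))
print(copies_zero, copies_one, copies_plus)  # prints: True True False
```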
Reversibility #
The fact that the amount of quantum information can never increase by transforming quantum states matches our intuition: if no new information is added from outside the system, then the total information encoded should not be increasing. Why, however, is it impossible to erase some information and thus reduce its total? The answer is reversibility of closed quantum systems: if we exclude the option of discarding parts of the physical system, every quantum operation can be undone. In other words, a computation must have an inverse operation that recovers the input when applied to the output.
If a quantum operation were thus to erase any information, then an inverse operation would exist that creates information from nothing! The two halves of the no-cloning theorem, as we presented it, thus state the same principle once we consider that every operation must be reversible.
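In matrix terms, and assuming the standard description of gates as unitary matrices, the inverse operation is simply the conjugate transpose. A small sketch with helper names of our choosing:

```python
import cmath
import math

def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Conjugate transpose, which is the inverse of a unitary matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
T = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]
U = matmul(T, H)  # apply H first, then T

identity = [[1, 0], [0, 1]]
undone = close(matmul(dagger(U), U), identity)

state = [[0.6], [0.8]]  # column vector with amplitudes 0.6 and 0.8
recovered = matmul(dagger(U), matmul(U, state))
print(undone, close(recovered, state))  # prints: True True
```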
Universality #
Finally, a third distinguishing property of quantum computation is how arbitrarily large computations can be generated from single-qubit gates and pairwise entangling interactions between qubits (two-qubit gates) Barenco, 1995. 1995. Elementary gates for quantum computation. Physical Review A 52, 5 (November 1995, 3457--3467). doi: 10.1103/PhysRevA.52.3457. It is furthermore the case that the choice of a fixed two-qubit gate, along with single-qubit gates, is sufficient to generate any arbitrary quantum computation. We call a set of gates that can be used to construct any arbitrary quantum computation a universal gate set.
This is a boon for hardware design, as manipulating single-qubit systems is often much more manageable than controlling physical interactions between multiple entities. This decomposition into single-qubit and (a fixed) two-qubit gates means that the architecture i) does not need to support interactions between more than two qubits at a time, and ii) can be specialised and hand-tuned to execute the two-qubit interaction of choice as faithfully as possible. Having a two-qubit gate as the entangling operation is not the only choice. Some architectures, such as neutral atoms, choose instead to replace it with a global entangling operation that applies to many qubits simultaneously Evered, 2023. 2023. High-fidelity parallel entangling gates on a neutral-atom quantum computer. Nature 622, 7982 (October 2023, 268--272). doi: 10.1038/s41586-023-06481-y, resulting in a universal gate set that is more convenient to implement experimentally in their system.
Gate set universality can be generalised further to approximate universality, which is at the centre of the development of error-correcting codes. Indeed, any quantum computation can be approximated to arbitrary precision using only a discrete, finite set of one- and two-qubit gates Kitaev, 2002. 2002. Classical and Quantum Computation. American Mathematical Society Dawson, 2006. 2006. The Solovay-Kitaev algorithm. Quantum Information and Computation 6, 1 (January 2006, 81--95). doi: 10.26421/QIC6.1-6. This represents a significant simplification for error correction, as it removes the need for continuously parametrised gates and discretises the problem space.
Leveraging quantum properties for compilation #
We have introduced the universality, reversibility and no-cloning properties of quantum computations for a reason: these laws of physics that govern quantum computations and are absent from classical computer science are an excellent foundation for developing quantum-specific computation optimisations and compilation techniques in general.
As we have just discussed, the wide variety of universal gate sets gives the compiler degrees of freedom to exploit. Using universality to translate computations between universal gate sets, enabling programmers to seamlessly target different hardware, is one of a quantum compiler’s first and most fundamental functions Sivara., 2020. 2020. t|ket⟩: a retargetable compiler for NISQ devices. Quantum Science and Technology 6, 1 (November 2020, 014003). doi: 10.1088/2058-9565/ab8e92.
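As a small, hedged illustration of such a translation, the sketch below checks a textbook identity between two common entangling gates: surrounding the target qubit of a CX gate with Hadamards yields a CZ gate, and vice versa. The matrices are the standard ones; the helper names are ours, and a real compiler would of course apply the rule symbolically rather than multiply matrices.

```python
import math

def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Kronecker product: a acts on the first qubit, b on the second."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

s = 1 / math.sqrt(2)
I2 = [[1, 0], [0, 1]]
H = [[s, s], [s, -s]]
CX = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
CZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]

# Hadamard on the second (target) qubit, before and after the entangling gate.
IH = kron(I2, H)
cz_from_cx = matmul(IH, matmul(CX, IH))
cx_from_cz = matmul(IH, matmul(CZ, IH))
print(close(cz_from_cx, CZ), close(cx_from_cz, CX))  # prints: True True
```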
Reversibility is also a source of flexibility when expressing quantum programs. Suppose the user wants to execute an operation U but it is more convenient to execute a different gate V, or V is the only gate that the hardware is capable of executing. Then, using the inverse V⁻¹ of V, it is always possible to rewrite the program as
[Diagram: U on its own is equal to V, followed by V⁻¹, followed by U]
where these diagrams should be read as operations to be executed from left to right. This is nothing but the mathematical trick of multiplying the left-hand side with the identity operation expressed as V⁻¹ ∘ V⁴.
Now, of course, this rewrite is only sensible if the operation is reasonably cheap to perform. There are plenty of instances where this is indeed the case. Morally, the quantum compiler always has the freedom to execute any quantum operation – at the risk of producing very inefficient code – given that reversibility always guarantees that the operation can be reversed and the computation undone whenever necessary.
Finally, no-cloning is a very useful guarantee that the compiler can use to simplify reasoning about computations⁵. In chapter 4 we will see that it dramatically simplifies pattern matching, which helps identify all possible optimisations quickly. More generally, no-cloning restricts the set of programs that the compiler must consider, resulting in elegant graph transformation semantics – a topic we explore in chapter 3.
The quantum circuit representation #
We could not conclude our overview of the basics of quantum computing without mentioning the quantum circuit, a representation of quantum computation ubiquitous in the field. With the understanding that we have gained in this section, the two building blocks of the circuit model and the conventions around their graphical representation should be of no surprise to the reader:
- Qubits are represented by straight, horizontal lines. Their evolution through time can be followed along the line from left to right: At the leftmost point on the line, the qubit is in its input state; when the qubit reaches its rightmost point, operations have mutated it into the output state of the circuit.
- Gates on qubits are boxes placed vertically across one or multiple qubit lines. The qubit lines that a box covers indicate the qubits that the gate may act on (and mutate), whereas the left-to-right ordering of the gates reflects their ordering in time.
A simple circuit composed of two qubits and three gates A, B and C could for instance look like this
[Diagram: a two-qubit circuit in which A acts on both qubits, followed by B on the first qubit and C on the second qubit]
The earlier rewrite diagram was in fact also a circuit, in which each arrow pointing to the right was a segment of a qubit line. In this case, A would be executed before B and C; A would act on both qubits, whereas B and C would only modify the first and second qubits, respectively. Note that there is no ordering specified between B and C: because they act on disjoint sets of qubits, their relative ordering makes no difference. It is thus common to display them as acting at the same time. We could have equivalently chosen to draw them as:
[Diagrams: the same circuit with B drawn before C, and with C drawn before B]
All these circuits represent the same computation.
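The claim that gates on disjoint qubits can be drawn in any order can be checked numerically. The sketch below uses concrete gates of our choosing (CX on both qubits, then a Hadamard on the first qubit and an X on the second) and the usual convention that a gate on one qubit is extended with the identity on the other.

```python
import math

def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Kronecker product: a acts on the first qubit, b on the second."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

s = 1 / math.sqrt(2)
I2 = [[1, 0], [0, 1]]
H = [[s, s], [s, -s]]
X = [[0, 1], [1, 0]]
CX = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

# Gates on different qubits: either order, or both at once, give one matrix.
h_then_x = matmul(kron(I2, X), kron(H, I2))  # H on qubit 1 first, then X on qubit 2
x_then_h = matmul(kron(H, I2), kron(I2, X))
together = kron(H, X)
same_computation = close(h_then_x, x_then_h) and close(h_then_x, together)

# On the same qubit, by contrast, the order of H and X does matter.
order_matters = not close(matmul(H, X), matmul(X, H))

# Whole circuit: CX acts first, so it is the rightmost factor of the product.
circuit = matmul(together, CX)
print(same_computation, order_matters)  # prints: True True
```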
Certain quantum gates are particularly useful and appear very regularly in practice. These have standard names that are widely used in the field. The most common single-qubit gates are arguably the Hadamard, represented in circuits by an H box, and the X, Y and Z-axis rotations, drawn as Rx, Ry and Rz boxes respectively. Note that rotation gates are parametrised by an angle that must be specified to execute the rotation.
There are also commonly used multi-qubit gates. For these, it becomes slightly awkward to draw them as boxes, as they may act on qubits that are not drawn next to each other in the circuit⁶ or might be applied to qubits in a specified order. As a solution, common gates were given representations that do not spell out their name but mark which qubit they are acting on and in what order. Here are the representations of three of the most famous ones, in order: the CX (also known as CNOT) gate, the CZ and the CCX (also known as the three-qubit Toffoli):
You will probably notice that there seems to be a system to this graphical notation. There is, but unfortunately, explaining it would require us to discuss Pauli matrices and commutation relations and quickly lead us astray. The references in section 2.5 are a good starting point for further reading.
1. In the case of integrated circuits and printed circuit boards, the wires we refer to here would be called “interconnects” or “traces”.
2. The NOT gate is the notable exception to this. It is often found in quantum programs and called X.
3. This is true physically: the carriers of quantum information, typically atoms or photons, live forever in the absence of interactions with their environment. However, we would be seriously deluding ourselves if we believed that the control systems we use to manipulate and keep these particles trapped could do so for any significant amount of time. Instead, experimentalists must constantly devise creative ways to stop the qubits from escaping or interacting with their surroundings (and destroying themselves in the process).
4. The ∘ symbol denotes the composition of functions, so unlike the left-to-right diagram, it must be read from right to left.
5. In particular, no-cloning resolves the problem of aliasing once and for all!
6. This becomes immediately apparent if you attempt to draw a gate that should act on the first and third qubit line of a circuit, but leave the second one untouched.
2.2. Quantum circuit optimisation: a review
Much of the foundations of classical computer science rely on boolean logic and discrete mathematics Lehman, 2017. 2017. Mathematics for Computer Science. Samurai Media Limited. In some regards, this is a poor man’s maths, as much of the structure that comes with continuous infinite mathematical objects is lost along the way when discretised.
In contrast, quantum computation encompasses the whole breadth of (finite dimensional) quantum physical system evolution. Underlying this is a rich mathematical theory steeped in the theory of Hilbert spaces and Lie groups¹. A direct consequence of the mathematics of quantum computations is the flourishing of an entire field of research dedicated to quantum circuit optimisations Karupp., 2025. 2025. A Comprehensive Review of Quantum Circuit Optimization: Current Trends and Future Directions. Quantum Reports 7, 1 (January 2025, 2). doi: 10.3390/quantum7010002. They leverage the unique structure and symmetries of quantum physics to reduce the noise and resource requirements of quantum computations significantly.
In this section, we will review the main optimisation techniques that established themselves within quantum compilers, focusing on the representation of quantum computations they use and their assumptions about the computations they are optimising.
Cost function #
A key point to settle first when discussing circuit optimisations is the objective of the optimisation – the cost function to be minimised. Unlike much of classical compiler research, which can rely on an established set of hardware targets and benchmarking datasets to profile the empirical, “real world” performance of compiled programs, the quantum world must often contend with simplified noise and architecture models to design proxy metrics, given the limited scale and availability of current quantum devices.
The quantum compilers research community has mostly coalesced around cost functions based on gate count statistics Karupp., 2025. 2025. A Comprehensive Review of Quantum Circuit Optimization: Current Trends and Future Directions. Quantum Reports 7, 1 (January 2025, 2). doi: 10.3390/quantum7010002. Counting a type of gate is a simple and popular choice. Making some additional assumptions on the gate parallelism of future hardware, one may also consider cost functions based on gate depth, i.e. the length of the longest chain of gates that cannot be run simultaneously Seling., 2013. 2013. Quantum circuits of T-depth one. Physical Review A 87, 4 (April 2013, 042302). doi: 10.1103/physreva.87.042302 Basile., 2024. 2024. Comparing planar quantum computing platforms at the quantum speed limit. Physical Review Research 6, 2 (April 2024, 023026). doi: 10.1103/physrevresearch.6.023026. In spite (or precisely because) of their simplicity, gate counts serve well as cost functions in many quantum compilation use cases. Most circuit optimisations target one of two hardware regimes.
On most current hardware architectures, the major challenge is achieving high accuracy on entangling operations, i.e. quantum gates that make two or more qubits interact Acharya, 2024. 2024. Quantum error correction below the surface code threshold. Nature (December 2024). doi: 10.1038/s41586-024-08449-y Pino, 2021. 2021. Demonstration of the trapped-ion quantum CCD computer architecture. Nature 592, 7853 (April 2021, 209--213). doi: 10.1038/s41586-021-03318-4 Koch, 2007. 2007. Charge-insensitive qubit design derived from the Cooper pair box. Physical Review A 76, 4 (October 2007, 042319). doi: 10.1103/PhysRevA.76.042319 Blais, 2007. 2007. Quantum-information processing with circuit quantum electrodynamics. Physical Review A 75, 3 (March 2007, 032329). doi: 10.1103/physreva.75.032329. In superconducting qubit and ion trap architectures², for example, the gate set is typically composed of one- and two-qubit gate types, with the error rates of the latter higher by about an order of magnitude Steiger, 2018. 2018. ProjectQ: an open source software framework for quantum computing. Quantum 2 (January 2018, 49). doi: 10.22331/q-2018-01-31-49 Sivara., 2020. 2020. t|ket⟩: a retargetable compiler for NISQ devices. Quantum Science and Technology 6, 1 (November 2020, 014003). doi: 10.1088/2058-9565/ab8e92. Circuit optimisations for computations on such noisy hardware thus often define cost functions based on the number of two-qubit gates – typically the CX gate, though many other two-qubit gates could be used equivalently.
On the other hand, future generations of hardware for larger scale computations are expected to be more resilient to noise, with the help of error detection and correction techniques. In this regime, the computational power of the hardware is no longer limited by hardware noise but rather by the affordances of the error-correcting code. Depending on how the quantum data is redundantly encoded in the code space, the fault-tolerant execution of specific operations may be anywhere between very straightforward and nigh-impossible. The bottleneck is widely expected to be the execution of single-qubit (non-Clifford) gates, such as the T gate³. These cases can thus just as well be modelled by cost functions based on gate counts.
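The cost functions discussed in this subsection are straightforward to compute. The sketch below assumes a deliberately simple circuit representation of our own (a time-ordered list of gate names with the qubits they act on) and computes gate counts, the two-qubit gate count, the depth and the T-depth.

```python
from collections import Counter

# A circuit as a time-ordered list of (gate name, qubits acted on).
circuit = [
    ("H", (0,)),
    ("CX", (0, 1)),
    ("T", (1,)),
    ("CX", (1, 2)),
    ("T", (0,)),
    ("H", (2,)),
]

def gate_counts(circ):
    """Number of occurrences of each gate type."""
    return Counter(name for name, _ in circ)

def two_qubit_count(circ):
    """Proxy cost for noisy hardware: number of two-qubit gates."""
    return sum(1 for _, qubits in circ if len(qubits) == 2)

def depth(circ, counts=lambda name: True):
    """Length of the longest chain of counted gates that share qubits and
    therefore cannot be run simultaneously."""
    reached = {}  # qubit -> depth reached so far on that qubit
    for name, qubits in circ:
        start = max((reached.get(q, 0) for q in qubits), default=0)
        end = start + (1 if counts(name) else 0)
        for q in qubits:
            reached[q] = end
    return max(reached.values(), default=0)

t_count = gate_counts(circuit)["T"]
t_depth = depth(circuit, counts=lambda name: name == "T")
print(two_qubit_count(circuit), t_count, depth(circuit), t_depth)
# prints: 2 2 5 1
```

Here the two T gates act on different qubits with no T gate between them on a shared qubit, so they contribute a T-depth of one even though the T-count is two.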
Unitary synthesis: the perfect optimisation #
The ne plus ultra of quantum circuit optimisation is unitary synthesis. It leverages the representation of a quantum computation as a square, complex-valued, unitary matrix, which is then re-synthesised as a new, equivalent (and ideally optimised!) quantum circuit. This approach thus breaks down quantum optimisation into two separate sub-problems:
- Reduce an n-qubit quantum circuit into a 2^n × 2^n matrix. This matrix is a unique representation of the computation, meaning that any two equivalent computations will be mapped to the same matrix.
- Find the optimal matrix decomposition into primitive quantum gates, thus obtaining a new quantum circuit, equivalent to the original.
The uniqueness of the unitary matrix representation makes it invaluable as a resource for computation optimisation. Not only does it reduce any potentially large collection of equivalent inputs to a single form; it also – crucially – provides a sound distance metric on the space of all circuits, in the form of the Haar measure. This can be used in search-based approaches to measure the distance between synthesised circuits and thus direct a search heuristic towards the optimal solution.
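Both ideas can be sketched on a single qubit. The code below multiplies gate matrices into the unitary of a circuit and compares unitaries with the quantity 1 − |Tr(U†V)|/d. This particular formula is our choice for illustration, a common one in numerical synthesis that ignores global phases, and is not taken from the works cited here.

```python
import math

def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def circuit_unitary(gates):
    """Unitary of a one-qubit circuit whose gates are given in execution order."""
    u = [[1, 0], [0, 1]]
    for g in gates:
        u = matmul(g, u)  # later gates multiply from the left
    return u

def distance(u, v):
    """1 - |Tr(U^dagger V)| / d, which vanishes exactly when the two
    unitaries agree up to a global phase."""
    d = len(u)
    trace = sum(complex(u[k][i]).conjugate() * v[k][i]
                for i in range(d) for k in range(d))
    return 1 - abs(trace) / d

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

# Two different circuits, H-Z-H and a single X, reduce to the same matrix.
hzh = circuit_unitary([H, Z, H])
print(distance(hzh, X))  # numerically zero
print(distance(X, Z))    # X and Z are as far apart as this measure allows
```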
Early work explored general unitary decomposition schemes obtained analytically from linear algebra. These express arbitrary unitaries as a product of unitaries that typically correspond to one- and two-qubit gates in the quantum circuit model Iten, 2016. 2016. Quantum circuits for isometries. Physical Review A 93, 3 (March 2016, 032318). doi: 10.1103/PhysRevA.93.032318 Iten, 2019. 2019. Introduction to UniversalQCompiler. arXiv: 1904.01072 [quant-ph]. Approaches have been proposed using the Cosine-Sine decomposition Mött., 2004. 2004. Quantum Circuits for General Multiqubit Gates. Physical Review Letters 93, 13 (September 2004, 130502). doi: 10.1103/PhysRevLett.93.130502, the Quantum Shannon decomposition Krol, 2022. 2022. Efficient Decomposition of Unitary Matrices in Quantum Circuit Compilers. Applied Sciences 12, 2 (January 2022, 759). doi: 10.3390/app12020759, and the QR decomposition Sedlák, 2008. 2008. Towards optimization of quantum circuits. Open Physics 6, 1 (March 2008, 128--134). doi: 10.2478/s11534-008-0039-8. While some schemes have been shown to be asymptotically efficient for almost all unitaries Iten, 2016. 2016. Quantum circuits for isometries. Physical Review A 93, 3 (March 2016, 032318). doi: 10.1103/PhysRevA.93.032318, such strategies typically generate fixed-sized circuits and fail to synthesise short circuits when such circuits exist. The size of synthesised circuits grows exponentially with the number of qubits, making most such schemes impractical beyond three qubits.
Unitary matrix decomposition can also be combined with tools from classical circuit design: in Loke, 2014. 2014. OptQC : An optimized parallel quantum compiler. Computer Physics Communications 185, 12 (December 2014, 3307--3316). doi: 10.1016/j.cpc.2014.07.022, Loke et al. proposed an approach merging reversible circuit synthesis (see below), a classical compilation problem, with unitary matrix synthesis. They show that searching for decompositions , where and are classical reversible circuits can yield shorter circuits when using the Cosine-Sine decomposition for the unitaries and .
Search-based approaches have been developed to address the shortcomings of analytical decompositions. Unlike the algebraic approaches, the circuit decomposition problem is viewed as an optimisation problem in search-based circuit synthesis. The space of all possible quantum circuits is explored to find the one that implements the desired unitary whilst minimising the cost function. The major challenge of such methods is the gigantic (typically super-exponential) size of the search space of all possible programs. Without mitigation, most work in this space struggles to scale beyond a handful of qubits.
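To make the search formulation concrete, here is a deliberately naive sketch: an exhaustive enumeration of single-qubit circuits over the gate set {H, T}, ordered by length, that stops at the first circuit matching the target up to a global phase. It is not one of the algorithms cited in this section; the function names and the choice of target (the S gate) are ours.

```python
import cmath
import math
from itertools import product

def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def distance(u, v):
    """1 - |Tr(U^dagger V)| / d, zero iff u and v agree up to a global phase."""
    d = len(u)
    trace = sum(complex(u[k][i]).conjugate() * v[k][i]
                for i in range(d) for k in range(d))
    return 1 - abs(trace) / d

s = 1 / math.sqrt(2)
GATES = {
    "H": [[s, s], [s, -s]],
    "T": [[1, 0], [0, cmath.exp(1j * math.pi / 4)]],
}

def synthesise(target, max_len=6, tol=1e-9):
    """Return the first shortest gate sequence implementing target, together
    with the number of candidate circuits explored."""
    explored = 0
    for length in range(max_len + 1):
        for seq in product(GATES, repeat=length):
            explored += 1
            u = [[1, 0], [0, 1]]
            for name in seq:  # sequence is in execution order
                u = matmul(GATES[name], u)
            if distance(u, target) < tol:
                return seq, explored
    return None, explored

S = [[1, 0], [0, 1j]]
sequence, explored = synthesise(S)
print(sequence, explored)  # prints: ('T', 'T') 7
```

Even in this tiny setting the number of candidates doubles with every additional gate, which hints at why practical tools rely on pruning and heuristics.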
Up to 3 qubits, T-depth optimal circuits can be found using exhaustive brute force search first proposed in Amy, 2013. 2013. A Meet-in-the-Middle Algorithm for Fast Synthesis of Depth-Optimal Quantum Circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 32, 6 (June 2013, 818--830). doi: 10.1109/TCAD.2013.2244643 and improved in Gheorg., 2022. 2022. A (quasi-)polynomial time heuristic algorithm for synthesizing T-depth optimal circuits. npj Quantum Information 8, 1 (September 2022). doi: 10.1038/s41534-022-00624-1. Asymptotic bounds on the number of T gates required for general unitary synthesis were recently given in Gosset, 2024. 2024. Quantum state preparation with optimal T-count. arXiv: 2411.04790 [quant-ph].
Scaling to 4 qubits and handling gate sets with continuous parameters, required for non-fault tolerant circuits, an A* search with smart pruning heuristics was proposed in Davis, 2020. 2020. Towards Optimal Topology Aware Quantum Circuit Synthesis. In 2020 IEEE International Conference on Quantum Computing and Engineering (QCE), October 2020. IEEE, 223--234. doi: 10.1109/QCE49297.2020.00036. This approach’s outputs are no longer provably optimal, but the results match optimal decompositions in all known instances. This line of work has subsequently been further refined with heuristics based on pre-defined circuit templates Smith, 2023. 2023. LEAP: Scaling Numerical Optimization Based Synthesis Using an Incremental Approach. ACM Transactions on Quantum Computing 4, 1 (February 2023, 1--23). doi: 10.1145/3548693 Madden, 2022. 2022. Best Approximate Quantum Compiling Problems. ACM Transactions on Quantum Computing 3, 2 (March 2022, 1--29). doi: 10.1145/3505181, parameter instantiation Younis, 2022. 2022. Quantum Circuit Optimization and Transpilation via Parameterized Circuit Instantiation. arXiv: 2206.07885 [quant-ph] Younis, 2021. 2021. QFAST: Conflating Search and Numerical Optimization for Scalable Quantum Circuit Synthesis. In 2021 IEEE International Conference on Quantum Computing and Engineering (QCE), October 2021. IEEE, 232--243. doi: 10.1109/QCE52317.2021.00041 Rakyta, 2022. 2022. Approaching the theoretical limit in quantum gate decomposition. Quantum 6 (May 2022, 710). doi: 10.22331/q-2022-05-11-710, machine learning Weiden, 2023. 2023. Improving Quantum Circuit Synthesis with Machine Learning. In 2023 IEEE International Conference on Quantum Computing and Engineering (QCE). IEEE. doi: 10.1109/QCE57702.2023.00093 and tensor networks Kuklia., 2023. 2023. QFactor: A Domain-Specific Optimizer for Quantum Circuit Instantiation. In 2023 IEEE International Conference on Quantum Computing and Engineering (QCE), September 2023. IEEE, 814--824. doi: 10.1109/QCE57702.2023.00096.
Some of these heuristics also make it possible to synthesise circuits with device constraints in mind and can trade off decomposition accuracy for shallower circuit depth and lower noise. In Wu, 2020. 2020. QGo: Scalable Quantum Circuit Optimization Using Automated Synthesis. arXiv: 2012.09835 [quant-ph] Kuklia., 2023. 2023. QFactor: A Domain-Specific Optimizer for Quantum Circuit Instantiation. In 2023 IEEE International Conference on Quantum Computing and Engineering (QCE), September 2023. IEEE, 814--824. doi: 10.1109/QCE57702.2023.00096, the authors have also explored partitioning the circuit into smaller parts optimised independently to scale these techniques to large circuit sizes. Despite the reduced optimisation performance that the boundaries of the partitioned circuits introduce, the combination of circuit partitioning with the techniques listed above yields some of the best-performing circuit optimisation techniques developed to date Costin., 2025. 2025. Berkeley Quantum Synthesis Toolkit. Retrieved on 09/01/2025 from https://bqskit.lbl.gov/. Circuit synthesis schemes have also been extended to generate circuits on a more expressive gate set, including elementary classical operations Alam, 2024. 2024. Learning dynamic quantum circuits for efficient state preparation. arXiv: 2410.09030 [quant-ph] Niu, 2024. 2024. AC/DC: Automated Compilation for Dynamic Circuits. arXiv: 2412.07969 [quant-ph].
However, a fundamental flaw of all unitary synthesis schemes is the exponential scaling, in the number of qubits $n$, of the unitary representation itself: an $n$-qubit unitary is a $2^n \times 2^n$ matrix. This means that no matrix-based synthesis method, however efficient, will ever be able to handle computations with much more than a dozen qubits. Circuit partitioning schemes such as Wu, 2020. 2020. QGo: Scalable Quantum Circuit Optimization Using Automated Synthesis. arXiv: 2012.09835 [quant-ph] Kuklia., 2023. 2023. QFactor: A Domain-Specific Optimizer for Quantum Circuit Instantiation. In 2023 IEEE International Conference on Quantum Computing and Engineering (QCE), September 2023. IEEE, 814--824. doi: 10.1109/QCE57702.2023.00096 effectively circumvent the problem, but they are heavily dependent on the partitioning quality.
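To make the scaling argument concrete, the following minimal sketch computes the memory needed to store a dense $n$-qubit unitary. It assumes double-precision complex entries (16 bytes each); the function name `dense_unitary_bytes` is purely illustrative.

```python
def dense_unitary_bytes(n_qubits: int, bytes_per_entry: int = 16) -> int:
    """Bytes needed to store a dense 2^n x 2^n matrix (complex128 by default)."""
    dim = 2 ** n_qubits
    return dim * dim * bytes_per_entry


# Each additional qubit multiplies the storage requirement by four.
for n in (4, 8, 12, 16, 20):
    gib = dense_unitary_bytes(n) / 2**30
    print(f"{n:2d} qubits: {gib:.3e} GiB")
```

Under these assumptions, the storage alone becomes impractical well before any synthesis algorithm is run on the matrix.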
The search for scalable representations #
Our study of unitary synthesis introduced us to a convenient two-step approach to quantum computation optimisation. First, the input circuit is transformed into a “global” representation that captures the computation as a whole, abstracting away the precise sequences of gates that compose the original circuit. This representation is then the input for the second half of the problem, which produces a circuit of the desired shape, equivalent to the original input but with reduced cost.
In addition to simplifying the original problem, such global intermediate representations are well-positioned to leverage the quantum-specific structure and symmetries in the computation. They can thus enable more advanced optimisations and are robust to variations in the circuit representation and in the local optimisation landscape.
The unitary matrix is the most common representation of quantum computations, but as we have seen, it suffers from severe scaling problems in the number of qubits. The problem is not so much that quantum computations require exponential space to be described in the worst case – after all, the space $U(2^n)$ of all $n$-qubit unitaries is exponentially large. However, the set of unitaries implementable in practice can only be a tiny subset4 of $U(2^n)$ – the set of unitaries that admit a polynomial-sized circuit representation.
Another fruitful avenue of work for quantum optimisation has thus been the development of alternative representations for quantum computations that can encode polynomially sized quantum programs efficiently whilst enabling novel optimisations.
Phase Polynomials and Pauli Gadgets #
A particularly convenient global representation of many quantum circuits is as products of Pauli exponentials, also known as Pauli gadgets Cowtan, 2019. 2019. Phase Gadget Synthesis for Shallow Circuits. In Proceedings 16th International Conference on Quantum Physics and Logic, QPL 2019, Chapman University, Orange, CA, USA, June 10-14, 2019, 213--228. doi: 10.4204/EPTCS.318.13. These unitaries are of the form
$$U = \prod_{k} \exp\left(i \alpha_k P_k\right), \tag{1}$$
where $\alpha_k \in \mathbb{R}$ are real coefficients and $P_k \in \{I, X, Y, Z\}^n$ are strings of length $n$ of the four Pauli matrices – so-called Pauli strings. In this formulation, $n$ fixes the number of qubits of the computation.
These exponentials are always valid $n$-qubit unitaries and can express entangling operations across any number of qubits: the qubits on which an operation acts non-trivially are given by the indices of the characters in $P_k$ that are not the identity $I$. For instance, the exponential
$$\exp\left(i \alpha \, XIZ\right)$$
is a valid quantum computation on 3 qubits, entangling the first and third qubits. Beyond useful abstractions for optimisation, such entangling operations appear naturally when simulating quantum systems, for example in quantum chemistry McClean, 2016. 2016. The theory of variational hybrid quantum-classical algorithms. New Journal of Physics 18, 2 (February 2016, 023023). doi: 10.1088/1367-2630/18/2/023023.
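As an illustration, a single Pauli gadget can be built numerically for small qubit counts. The sketch below assumes the convention $\exp(i \alpha P)$ for a gadget with angle $\alpha$ and Pauli string $P$ (other works use a different sign or a factor of one half), and relies on the identity $\exp(i \alpha P) = \cos(\alpha) I + i \sin(\alpha) P$, which holds because $P^2 = I$. The helper names are hypothetical.

```python
import numpy as np

PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def pauli_string_matrix(pauli: str) -> np.ndarray:
    """Tensor product of single-qubit Paulis, e.g. 'XIZ' -> X (x) I (x) Z."""
    mat = np.eye(1, dtype=complex)
    for ch in pauli:
        mat = np.kron(mat, PAULIS[ch])
    return mat


def pauli_gadget(alpha: float, pauli: str) -> np.ndarray:
    """exp(i * alpha * P), using P @ P = identity for any Pauli string P."""
    p = pauli_string_matrix(pauli)
    return np.cos(alpha) * np.eye(p.shape[0]) + 1j * np.sin(alpha) * p


def support(pauli: str) -> list[int]:
    """Indices of the qubits on which the gadget acts non-trivially."""
    return [i for i, ch in enumerate(pauli) if ch != "I"]


U = pauli_gadget(0.3, "XIZ")
print(U.shape, support("XIZ"))
```

The dense matrix is only practical for a handful of qubits; the point of the Pauli gadget representation is precisely that the pair (angle, Pauli string) stays small as the number of qubits grows.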
The use of these primitives for quantum compilation was first explored in Cowtan, 2019. 2019. Phase Gadget Synthesis for Shallow Circuits. In Proceedings 16th International Conference on Quantum Physics and Logic, QPL 2019, Chapman University, Orange, CA, USA, June 10-14, 2019, 213--228. doi: 10.4204/EPTCS.318.13, and further generalised in Cowtan, 2020. 2020. A Generic Compilation Strategy for the Unitary Coupled Cluster Ansatz. arXiv: 2007.10515 [quant-ph]. Starting from an (unordered) sequence of Pauli gadgets, the gadgets are clustered into sets of mutually commuting gadgets. These can then be jointly synthesised into a circuit, markedly reducing the number of entangling operations as compared to naively implementing one exponential at a time.
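The clustering step can be sketched as follows. This is a simple greedy grouping, not the specific strategy of the cited works, and it assumes that the gadgets may be freely reordered. Two Pauli strings commute exactly when they differ on an even number of positions where both are non-identity; the function names are illustrative.

```python
def commute(p: str, q: str) -> bool:
    """Two Pauli strings commute iff they clash on an even number of qubits."""
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0


def greedy_commuting_clusters(paulis: list[str]) -> list[list[str]]:
    """Place each string in the first cluster whose members all commute with it."""
    clusters: list[list[str]] = []
    for p in paulis:
        for cluster in clusters:
            if all(commute(p, q) for q in cluster):
                cluster.append(p)
                break
        else:
            # No existing cluster is compatible: open a new one.
            clusters.append([p])
    return clusters


print(greedy_commuting_clusters(["XX", "ZZ", "XI", "IZ"]))
```

Each resulting cluster contains only mutually commuting strings, so its gadgets can be handed jointly to a synthesis routine.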
Further improvements to this work have since been presented in Huang, 2024. 2024. Redefining Lexicographical Ordering: Optimizing Pauli String Decompositions for Quantum Compiling. CoRR abs/2408.00354. doi: 10.48550/ARXIV.2408.00354 and Schmitz, 2024. 2024. Graph Optimization Perspective for Low-Depth Trotter-Suzuki Decomposition. Physical Review A 109, 4 (April 2024, 042418). doi: 10.1103/PhysRevA.109.042418, where new heuristics are introduced to choose the Pauli gadget ordering. In Huang, 2024. 2024. Redefining Lexicographical Ordering: Optimizing Pauli String Decompositions for Quantum Compiling. CoRR abs/2408.00354. doi: 10.48550/ARXIV.2408.00354, the hardware-specific connectivity constraints between qubits are also taken into account to produce programs that can be executed on the targeted architecture without overhead.
A close relative of Pauli gadgets – a special case of them, to be precise – are the so-called phase polynomials Amy, 2018. 2018. On the controlled-NOT complexity of controlled-NOT–phase circuits. Quantum Science and Technology 4, 1 (September 2018, 015002). doi: 10.1088/2058-9565/aad8ca, obtained when restricting the Pauli strings to combinations of Z Pauli matrices and identities: $P_k \in \{I, Z\}^n$. These are particularly amenable to optimisation as, in this case, the ordering of the gadgets becomes irrelevant – all exponential terms commute. This gives the compiler a lot of freedom during circuit synthesis.
The action of phase polynomials on quantum states is quite easy to understand. Instead of exponentials of $I$- and $Z$-based Pauli strings, the computation can equivalently be given by its action on the basis states. A quantum basis state – just like a classical state – is given by a bitstring of $n$ bits $x = x_1 x_2 \cdots x_n \in \{0, 1\}^n$. Writing $|x\rangle$ for the basis state corresponding to the bitstring $x$, the action of a phase polynomial with coefficients $\alpha_k$ on $|x\rangle$ is given by
$$|x\rangle \mapsto \exp\Big(i \sum_{k} \alpha_k \, (-1)^{s_{k,1} x_1 \oplus \cdots \oplus s_{k,n} x_n}\Big) \, |x\rangle, \tag{2}$$
where $s_k$ is now also a bitstring of $n$ booleans $s_{k,1} \cdots s_{k,n}$, and $\oplus$ denotes the boolean XOR operation. The boolean $s_{k,j}$ has value $1$ if and only if the $j$-th character in the Pauli string $P_k$ is $Z$.
The sum inside the exponential in (2) is just a real number – indeed each term in the sum simply evaluates to either $\alpha_k$ or $-\alpha_k$. A phase polynomial is thus a diagonal unitary matrix: it maps every basis state to itself, multiplied by some phase $e^{i \phi(x)}$.
Polynomially-sized circuits that implement diagonal matrices correspond to phase polynomials with polynomially many non-zero terms $\alpha_k \neq 0$, i.e. phase polynomials represent such computations efficiently, which in turn allows algorithms that scale polynomially in the number of qubits $n$.
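The diagonal action of a phase polynomial can be checked numerically on a few qubits. The sketch below assumes the convention $\exp(i \alpha P)$ for each term, so that a term with coefficient $\alpha$ and $Z$-positions $s$ contributes $\alpha (-1)^{s \cdot x}$ to the phase of basis state $|x\rangle$, where $s \cdot x$ is the parity of the bits of $x$ selected by $s$. The names and the big-endian bit ordering are choices made for this example.

```python
import numpy as np


def parity(s: list[int], x: list[int]) -> int:
    """XOR of the bits of x selected by s, i.e. s_1 x_1 (+) ... (+) s_n x_n."""
    return sum(si & xi for si, xi in zip(s, x)) % 2


def phase(terms: list[tuple[float, list[int]]], x: list[int]) -> float:
    """Real-valued exponent: each term contributes +alpha or -alpha."""
    return sum(alpha * (-1) ** parity(s, x) for alpha, s in terms)


def phase_polynomial_diagonal(terms, n: int) -> np.ndarray:
    """Diagonal of the unitary, listing basis states in big-endian order."""
    diag = []
    for idx in range(2**n):
        x = [(idx >> (n - 1 - j)) & 1 for j in range(n)]
        diag.append(np.exp(1j * phase(terms, x)))
    return np.array(diag)


# A single term with Z on the first and third qubits, i.e. Pauli string ZIZ.
diag = phase_polynomial_diagonal([(0.3, [1, 0, 1])], n=3)
print(np.round(diag, 3))
```

Only the list of (coefficient, bitstring) pairs needs to be stored; the exponentially long diagonal is materialised here purely for inspection.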
The Graysynth algorithm, as presented in Amy, 2018. 2018. On the controlled-NOT complexity of controlled-NOT–phase circuits. Quantum Science and Technology 4, 1 (September 2018, 015002). doi: 10.1088/2058-9565/aad8ca, has become the reference synthesis method for phase polynomials. The key observation made by its authors is that all terms of the sum within the exponential can be cycled through and obtained following the binary Gray codes Gray, 1953. 1953. Pulse code communication. Retrieved from http://www.google.com/patents/US2632058. The Hamming distance of one that separates successive bitstrings in the code translates into a single two-qubit gate when synthesised to a quantum circuit by Graysynth.
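The property of Gray codes that this relies on is easy to verify. The sketch below generates the standard reflected binary Gray code and checks that successive codewords differ in exactly one bit; it illustrates the code only and is not an implementation of Graysynth.

```python
def gray_code(n: int) -> list[int]:
    """Reflected binary Gray code on n bits, as integers."""
    return [i ^ (i >> 1) for i in range(2**n)]


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


codes = gray_code(3)
print([format(c, "03b") for c in codes])
print(all(hamming_distance(a, b) == 1 for a, b in zip(codes, codes[1:])))
```

Every bitstring appears exactly once, so walking along the code visits all parity terms while changing a single bit at each step.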
This approach was adapted to work with hardware connectivity constraints in Griend, 2022. 2022. Architecture-Aware Synthesis of Phase Polynomials for NISQ Devices. In Proceedings 19th International Conference on Quantum Physics and Logic, QPL 2022, Wolfson College, Oxford, UK, 27 June - 1 July 2022, 116--140. doi: 10.4204/EPTCS.394.8, Gogioso, 2022. 2022. Annealing Optimisation of Mixed ZX Phase Circuits. In Proceedings 19th International Conference on Quantum Physics and Logic, QPL 2022, Wolfson College, Oxford, UK, 27 June - 1 July 2022, 415--431. doi: 10.4204/EPTCS.394.20 and Vandae., 2022. 2022. Phase polynomials synthesis algorithms for NISQ architectures and beyond. Quantum Science and Technology 7, 4 (September 2022, 045027). doi: 10.1088/2058-9565/ac5a0e. An up-to-date study of the performance of phase polynomial-based compiler optimisations and comparisons with standard approaches is performed in Meijer., 2025. 2025. A comparison of quantum compilers using a DAG-based or phase polynomial-based intermediate representation. Journal of Systems and Software 221 (March 2025, 112224). doi: 10.1016/j.jss.2024.112224.
The study of phase polynomials can also be generalised to arbitrary diagonal operators. Tight asymptotic bounds on the resource requirements for arbitrary diagonal operator synthesis and their implementation were recently given in Sun, 2023. 2023. Asymptotically Optimal Circuit Depth for Quantum State Preparation and General Unitary Synthesis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42, 10 (October 2023, 3301--3314). doi: 10.1109/TCAD.2023.3244885. The authors propose using a smart meshing of different Gray codes in parallel and, where available, additional qubits as ancilla registers to parallelise computations further and minimise circuit depth. The resulting general-purpose decomposition of arbitrary diagonal operators yields circuits whose depth and size attain these bounds, and comes with improved bounds in the presence of ancilla qubits.
Clifford synthesis #
The group of all $n$-qubit unitaries contains a subgroup that has become an object of study across many domains of quantum computing science: the Clifford group. We have already mentioned that it is at the centre of quantum error correction theory Bravyi, 2005. 2005. Universal quantum computation with ideal Clifford gates and noisy ancillas. Physical Review A 71, 2 (February 2005, 022316). doi: 10.1103/PhysRevA.71.022316; it is also a cornerstone of measurement-based quantum computing Rausse., 2001. 2001. A One-Way Quantum Computer. Physical Review Letters 86, 22 (May 2001, 5188--5191). doi: 10.1103/PhysRevLett.86.5188 and graph states Hein, 2004. 2004. Multiparty entanglement in graph states. Physical Review A 69, 6 (June 2004, 062311). doi: 10.1103/physreva.69.062311, as well as one of the most promising approaches for fast quantum simulations Gottes., 1999. 1999. The Heisenberg representation of quantum computers. In Group 22: Proceedings of the 12th International Colloquium on Group Theoretical Methods in Physics. International Press, 32--43 Bravyi, 2019. 2019. Simulation of quantum circuits by low-rank stabilizer decompositions. Quantum 3 (September 2019, 181). doi: 10.22331/q-2019-09-02-181 Kissin., 2022. 2022. Simulating quantum circuits with ZX-calculus reduced stabiliser decompositions. Quantum Science and Technology 7, 4 (July 2022, 044001). doi: 10.1088/2058-9565/ac5d20.
The Clifford subgroup of quantum circuits admits an efficient $O(n^2)$-sized program representation known as the Clifford tableau Aarons., 2004. 2004. Improved simulation of stabilizer circuits. Physical Review A 70, 5 (November 2004, 052328). doi: 10.1103/PhysRevA.70.052328. This has been used profusely for compiler optimisation. In Aarons., 2004. 2004. Improved simulation of stabilizer circuits. Physical Review A 70, 5 (November 2004, 052328). doi: 10.1103/PhysRevA.70.052328 the first Clifford circuit synthesis procedure is given, using an analytical decomposition of Clifford tableaus into one- and two-qubit gates. An improved, Bruhat-based decomposition that is optimal in the number of Hadamard gates was subsequently proposed in Maslov, 2018. 2018. Shorter Stabilizer Circuits via Bruhat Decomposition and Quantum Circuit Transformations. IEEE Transactions on Information Theory 64, 7 (July 2018, 4729--4738). doi: 10.1109/tit.2018.2825602. In the case of a Clifford fragment directly followed by measurements, the procedure can be further refined to replace gates with classical computation on the measurement outcomes Bravyi, 2021. 2021. Hadamard-Free Circuits Expose the Structure of the Clifford Group. IEEE Transactions on Information Theory 67, 7 (July 2021, 4546--4563). doi: 10.1109/TIT.2021.3081415. Finally, an alternative normal form that is well-suited to hardware with limited nearest-neighbour connectivity was also derived using a diagrammatic approach Maslov, 2023. 2023. CNOT circuits need little help to implement arbitrary Hadamard-free Clifford transformations they generate. npj Quantum Information 9, 1 (September 2023). doi: 10.1038/s41534-023-00760-2.
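To give a feel for the tableau representation, here is a minimal sketch that tracks how a Clifford circuit maps each single-qubit $X_i$ and $Z_i$ under conjugation, using the gate update rules of the stabiliser formalism (Hadamard, phase gate S and CNOT). The class and method names are hypothetical, and the layout (rows for $X_i$ first, then $Z_i$, with one sign bit per row) is one possible convention among several.

```python
import numpy as np


class CliffordTableau:
    """Rows 0..n-1 hold the images of X_i, rows n..2n-1 the images of Z_i."""

    def __init__(self, n: int):
        self.n = n
        self.x = np.zeros((2 * n, n), dtype=np.uint8)  # X component per row
        self.z = np.zeros((2 * n, n), dtype=np.uint8)  # Z component per row
        self.r = np.zeros(2 * n, dtype=np.uint8)       # sign bit (1 means -)
        for i in range(n):
            self.x[i, i] = 1
            self.z[n + i, i] = 1

    def h(self, a: int) -> None:
        self.r ^= self.x[:, a] & self.z[:, a]
        self.x[:, a], self.z[:, a] = self.z[:, a].copy(), self.x[:, a].copy()

    def s(self, a: int) -> None:
        self.r ^= self.x[:, a] & self.z[:, a]
        self.z[:, a] ^= self.x[:, a]

    def cnot(self, a: int, b: int) -> None:
        self.r ^= self.x[:, a] & self.z[:, b] & (self.x[:, b] ^ self.z[:, a] ^ 1)
        self.x[:, b] ^= self.x[:, a]
        self.z[:, a] ^= self.z[:, b]

    def row(self, i: int) -> str:
        letters = {(0, 0): "I", (1, 0): "X", (1, 1): "Y", (0, 1): "Z"}
        sign = "-" if self.r[i] else "+"
        return sign + "".join(
            letters[(int(xb), int(zb))] for xb, zb in zip(self.x[i], self.z[i])
        )


# Bell-pair preparation circuit: H on qubit 0, then CNOT from 0 to 1.
t = CliffordTableau(2)
t.h(0)
t.cnot(0, 1)
print([t.row(i) for i in range(4)])
```

The whole tableau occupies $2n(2n+1)$ bits, and each gate update touches a constant number of columns, which is what makes the representation attractive for compiler passes.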
Just as in unitary synthesis, circuit decompositions of Clifford operations more efficient than the general analytical expressions can be obtained case-by-case using search and optimisation. The counterpart to the provably optimal decompositions of unitaries obtained through brute force search Amy, 2013. 2013. A Meet-in-the-Middle Algorithm for Fast Synthesis of Depth-Optimal Quantum Circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 32, 6 (June 2013, 818--830). doi: 10.1109/TCAD.2013.2244643 also exists for Clifford circuits Kliuch., 2013. 2013. Optimization of Clifford circuits. Physical Review A 88, 5 (November 2013, 052307). doi: 10.1103/physreva.88.052307. Due to the more efficient representation and smaller search space, all optimal Clifford circuits could be found up to 6 qubits. Using modern SAT solvers, optimal Clifford synthesis has recently been pushed much further, with known optimal circuits beyond 20 qubits Peham, 2023. 2023. Depth-Optimal Synthesis of Clifford Circuits with SAT Solvers. In IEEE International Conference on Quantum Computing and Engineering, QCE 2023, Bellevue, WA, USA, September 17-22, 2023. IEEE, 802--813. doi: 10.1109/QCE57702.2023.00095 Schnei., 2023. 2023. A SAT Encoding for Optimal Clifford Circuit Synthesis. In Proceedings of the 28th Asia and South Pacific Design Automation Conference, January 2023. IEEE. doi: 10.1145/3566097.3567929.
Heuristic optimisation approaches have also been shown to be effective on Clifford optimisation Bravyi, 2021. 2021. Hadamard-Free Circuits Expose the Structure of the Clifford Group. IEEE Transactions on Information Theory 67, 7 (July 2021, 4546--4563). doi: 10.1109/TIT.2021.3081415 Fagan, 2018. 2018. Optimising Clifford Circuits with Quantomatic. In Proceedings 15th International Conference on Quantum Physics and Logic, QPL 2018, Halifax, Canada, 3-7th June 2018, 85--105. doi: 10.4204/EPTCS.287.5 and scale to larger systems. For Clifford computations on devices with restricted connectivity, an architecture-aware synthesis method was proposed in Winderl, 2024. 2024. Architecture-Aware Synthesis of Stabilizer Circuits from Clifford Tableaus. arXiv: 2309.08972 [quant-ph].
Diagrammatic representations #
Quantum computer science and quantum mechanics have a rich history in diagrammatic representations Feynman, 1949. 1949. Space-Time Approach to Quantum Electrodynamics. Physical Review 76, 6 (September 1949, 769--789). doi: 10.1103/physrev.76.769 Coecke, 2017. 2017. Picturing Quantum Processes: A First Course in Quantum Theory and Diagrammatic Reasoning. Cambridge University Press. doi: 10.1017/9781316219317 Backens, 2019. 2019. ZH: A Complete Graphical Calculus for Quantum Computations Involving Classical Non-linearity. Electronic Proceedings in Theoretical Computer Science 287 (January 2019, 23--42). doi: 10.4204/EPTCS.287.2. These have allowed one to picture complex physical processes as intuitive operations in a graphical language and have – as a nice side effect – led to a plethora of state-of-the-art quantum circuit optimisation techniques!
A diagrammatic representation of quantum computation is obtained by lifting the gates that form a quantum circuit into the nodes of a more abstract graph-based graphical calculus. The most commonly used flavour of calculus for circuit optimisation is the ZX calculus Coecke, 2008. 2008. Interacting Quantum Observables Coecke, 2012. 2012. Strong Complementarity and Non-locality in Categorical Quantum Mechanics. In 2012 27th Annual IEEE Symposium on Logic in Computer Science, June 2012. IEEE, 245--254. doi: 10.1109/lics.2012.35 Weteri., 2020. 2020. ZX-calculus for the working quantum computer scientist. arXiv: 2012.13966 [quant-ph] Yeung, 2024. 2024. Teaching small transformers to rewrite ZX diagrams. MathAI submission.
By breaking up multi-qubit gates into several non-unitary tensors, the ZX calculus and related variants Roy, 2011. 2011. Towards Normal Forms for GHZ∕W Calculus. In AIP Conference Proceedings. AIP, 112--119. doi: 10.1063/1.3635852 Backens, 2019. 2019. ZH: A Complete Graphical Calculus for Quantum Computations Involving Classical Non-linearity. Electronic Proceedings in Theoretical Computer Science 287 (January 2019, 23--42). doi: 10.4204/EPTCS.287.2 Felice, 2023. 2023. Quantum Linear Optics via String Diagrams. In Proceedings 19th International Conference on Quantum Physics and Logic, Wolfson College, Oxford, UK, 27 June - 1 July 2022. Open Publishing Association, 83-100. doi: 10.4204/EPTCS.394.6 expose some of the symmetry and structure of quantum physics in the form of simple and intuitive graphical rules. This has enabled the discovery of many quantum optimisation techniques (e.g. Duncan, 2019. 2019. Graph-theoretic Simplification of Quantum Circuits with the ZX-calculus. arXiv: 1902.03178 [quant-ph] Weteri., 2024. 2024. Optimal compilation of parametrised quantum circuits. arXiv: 2401.12877 [quant-ph]), some of which we have already reviewed Huang, 2024. 2024. Redefining Lexicographical Ordering: Optimizing Pauli String Decompositions for Quantum Compiling. CoRR abs/2408.00354. doi: 10.48550/ARXIV.2408.00354 Gogioso, 2022. 2022. Annealing Optimisation of Mixed ZX Phase Circuits. In Proceedings 19th International Conference on Quantum Physics and Logic, QPL 2022, Wolfson College, Oxford, UK, 27 June - 1 July 2022, 415--431. doi: 10.4204/EPTCS.394.20 Griend, 2022. 2022. Architecture-Aware Synthesis of Phase Polynomials for NISQ Devices. In Proceedings 19th International Conference on Quantum Physics and Logic, QPL 2022, Wolfson College, Oxford, UK, 27 June - 1 July 2022, 116--140. doi: 10.4204/EPTCS.394.8 Cowtan, 2019. 2019. Phase Gadget Synthesis for Shallow Circuits. In Proceedings 16th International Conference on Quantum Physics and Logic, QPL 2019, Chapman University, Orange, CA, USA, June 10-14, 2019, 213--228. doi: 10.4204/EPTCS.318.13 Cowtan, 2020. 2020. A Generic Compilation Strategy for the Unitary Coupled Cluster Ansatz. arXiv: 2007.10515 [quant-ph]. This selection of papers is not quite exhaustive5 – there are currently over 300 papers on the topic, as indexed by zxcalculus.com.
Aside from being an invaluable tool for research and compiler pass design, a significant contribution of these diagrammatic representations is the introduction of graph transformation systems (GTS) Ehrig, 1973. 1973. Graph-Grammars: An Algebraic Approach. In 14th Annual Symposium on Switching and Automata Theory, Iowa City, Iowa, USA, October 15-17, 1973. IEEE Computer Society, 167--180. doi: 10.1109/SWAT.1973.11 Rozenb., 1997. 1997. Handbook of Graph Grammars and Computing by Graph Transformations, Volume 1: Foundations. World Scientific König, 2018. 2018. A Tutorial on Graph Transformation. In Graph Transformation, Specifications, and Nets - In Memory of Hartmut Ehrig. Springer, 83--104. doi: 10.1007/978-3-319-75396-6_5 to quantum computing. More on this in chapter 3 (and much of the rest of this thesis)!
Reversible classical circuits #
Many more representations have either been taken over from classical compiler optimisations or were developed for specific purposes. The last we will mention is reversible circuit synthesis, an entirely classical circuit design problem which can draw from the results of decades of research. From a quantum perspective, reversible classical circuits correspond to unitaries (and more generally, isometries) that send basis states to basis states – and thus do not introduce any complex phase Shende, 2002. 2002. Reversible logic circuit synthesis. In IEEE/ACM International Conference on Computer Aided Design, 2002, November 2002. IEEE, 353--360. doi: 10.1109/iccad.2002.1167558. We highlight a selection of the more recent work in the field and refer the reader to the much more complete, albeit ageing, survey of Saeedi, 2013. 2013. Synthesis and optimization of reversible circuits—a survey. ACM Computing Surveys 45, 2 (February 2013, 1--34). doi: 10.1145/2431211.2431220.
Up to 4 (qu)bits, all reversible circuits and their optimal synthesis can be generated by brute force Li, 2014. 2014. A Synthesis Algorithm for 4-Bit Reversible Logic Circuits with Minimum Quantum Cost. ACM Journal on Emerging Technologies in Computing Systems 11, 3 (December 2014, 1--19). doi: 10.1145/2629542. Viewing reversible circuits as permutations of all bitstrings, Susam et al. pre-compute optimal circuits only for swaps of two bitstrings (transpositions). These can then be used as part of a standard selection sort to synthesise arbitrary permutations. The number of such transpositions scales much more favourably than the number of arbitrary permutations, allowing fast circuit synthesis of up to 20+ (qu)bits in a fraction of a second, with good performance.
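The selection-sort idea can be sketched in a few lines. The code below is a generic illustration rather than the implementation of Susam et al.: it treats a reversible function as a list `perm`, where `perm[x]` is the image of bitstring `x` written as an integer, and records the transpositions that sort it. In an actual synthesis flow, each recorded transposition would be replaced by its pre-computed circuit; the function names are hypothetical.

```python
def selection_sort_transpositions(perm: list[int]) -> list[tuple[int, int]]:
    """Position swaps that sort `perm` into the identity, in order of application."""
    work = list(perm)
    swaps = []
    for i in range(len(work)):
        j = work.index(i, i)  # value i always sits somewhere in work[i:]
        if j != i:
            work[i], work[j] = work[j], work[i]
            swaps.append((i, j))
    return swaps


def apply_swaps(swaps, n_items: int) -> list[int]:
    """Apply position swaps, in the given order, to the identity arrangement."""
    arr = list(range(n_items))
    for i, j in swaps:
        arr[i], arr[j] = arr[j], arr[i]
    return arr


perm = [2, 0, 3, 1]  # a reversible function on 2 bits
swaps = selection_sort_transpositions(perm)
print(swaps)
print(apply_swaps(reversed(swaps), len(perm)) == perm)
```

A permutation of $N = 2^n$ bitstrings needs at most $N - 1$ transpositions in this scheme, so the output size still grows with the number of bitstrings that are actually moved.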
Truth table or matrix representations of reversible circuits suffer from the same exponential scaling as unitaries. To address this, other representations that have been used include exclusive sums of product terms (ESOP) Fazel, 2007. 2007. ESOP-based Toffoli Gate Cascade Generation. In 2007 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, August 2007. IEEE. doi: 10.1109/pacrim.2007.4313212 Bandyo., 2014. 2014. A Cube Pairing Approach for Synthesis of ESOP-Based Reversible Circuit. In 2014 IEEE 44th International Symposium on Multiple-Valued Logic, May 2014. IEEE, 109--114. doi: 10.1109/ismvl.2014.27, positive polarity Reed-Muller codes (PPRM) Jegier, 2017. 2017. PPRM-based approach to synthesis of reversible functions. In Photonics Applications in Astronomy, Communications, Industry, and High Energy Physics Experiments 2017, August 2017. SPIE, 1044523. doi: 10.1117/12.2280943 and decision diagrams Stojko., 2019. 2019. Reversible Circuits Synthesis from Functional Decision Diagrams by using Node Dependency Matrices. Journal of Circuits, Systems and Computers 29, 05 (August 2019, 2050079). doi: 10.1142/s0218126620500796 Wille, 2010. 2010. Effect of BDD Optimization on Synthesis of Reversible and Quantum Logic. Electronic Notes in Theoretical Computer Science 253, 6 (March 2010, 57--70). doi: 10.1016/j.entcs.2010.02.006 Pang, 2011. 2011. Positive Davio-based synthesis algorithm for reversible logic. In 2011 IEEE 29th International Conference on Computer Design (ICCD), October 2011. IEEE, 212--218. doi: 10.1109/iccd.2011.6081399.
The quantum framework is strictly more general than the classical regime in which the problem was studied initially. This affords additional freedom for decomposition schemes, such as decompositions of gates on 3 qubits into single and two-qubit gates Shende, 2008. 2008. On the CNOT-cost of TOFFOLI gates. arXiv: 0803.2316 [quant-ph]. Various optimised decompositions for sequences of Toffoli gates have also been similarly developed Scott, 2008. 2008. Pairwise decomposition of toffoli gates in a quantum circuit. In Proceedings of the 18th ACM Great Lakes symposium on VLSI, May 2008. ACM, 231--236. doi: 10.1145/1366110.1366168 Arabza., 2010. 2010. Rule-based optimization of reversible circuits. In 2010 15th Asia and South Pacific Design Automation Conference (ASP-DAC), January 2010. IEEE, 849--854. doi: 10.1109/aspdac.2010.5419684 Datta, 2013. 2013. Exploiting Negative Control Lines in the Optimization of Reversible Circuits Rahman, 2014. 2014. Templates for Positive and Negative Control Toffoli Networks Datta, 2015. 2015. A Post-Synthesis Optimization Technique for Reversible Circuits Exploiting Negative Control Lines. IEEE Transactions on Computers 64, 4 (April 2015, 1208--1214). doi: 10.1109/tc.2014.2315641 Arpita, 2015. 2015. Optimization of reversible circuits using triple-gate templates at quantum gate level. In 2015 International Conference on Electronic Design, Computer Networks & Automated Verification (EDCAV), January 2015. IEEE, 120--124. doi: 10.1109/edcav.2015.7060551 Abdess., 2016. 2016. Technology Mapping of Reversible Circuits to Clifford+T Quantum Circuits. In 2016 IEEE 46th International Symposium on Multiple-Valued Logic (ISMVL), May 2016. IEEE, 150--155. doi: 10.1109/ismvl.2016.33 Gado, 2021. 2021. Optimization of Reversible Circuits Using Toffoli Decompositions with Negative Controls. Symmetry 13, 6 (June 2021, 1025). doi: 10.3390/sym13061025. 
Mohammadi and Eshghi introduced 4-valued truth tables to extend classical circuit synthesis to include $V$ (also known as $\sqrt{X}$) gates Mohamm., 2008. 2008. Behavioral description of quantum V and V+ gates to design quantum logic circuits. In 2008 5th International Multi-Conference on Systems, Signals and Devices, July 2008. IEEE, 1--5. doi: 10.1109/ssd.2008.4632850. References Soeken, 2012. 2012. Optimizing the Mapping of Reversible Circuits to Four-Valued Quantum Gate Circuits. In 2012 IEEE 42nd International Symposium on Multiple-Valued Logic, May 2012. IEEE, 173--178. doi: 10.1109/ismvl.2012.64 as well as Rahman, 2012. 2012. Optimization of Reversible Circuits Using Reconfigured Templates incorporated controlled-$V$ gates into template matching strategies and showed significant improvements in synthesised gate count. Finally, Maslov, 2016. 2016. Advantages of using relative-phase Toffoli gates with an application to multiple control Toffoli optimization. Physical Review A 93, 2 (February 2016, 022311). doi: 10.1103/physreva.93.022311 proposed decomposing Toffolis only up to relative phase, introducing a lot of freedom in the quantum decompositions that are required compared to the traditional classical decompositions.
In summary, a variety of scalable representations – such as phase polynomials, Pauli gadgets, Clifford tableaus, diagrammatic calculi, and reversible circuits – have been developed to abstract computations and enable highly tailored optimisation methods. These approaches leverage the unique structure and symmetries of quantum computations, achieving significant reductions in circuit size, depth, and hardware-specific overheads. Techniques such as phase polynomial synthesis and Clifford tableau representations are widely applicable and are a cornerstone of modern quantum compilers Amy, 2019. 2019. Formal methods in Quantum Circuit Design. PhD Thesis. University of Waterloo Meijer., 2025. 2025. A comparison of quantum compilers using a DAG-based or phase polynomial-based intermediate representation. Journal of Systems and Software 221 (March 2025, 112224). doi: 10.1016/j.jss.2024.112224. Meanwhile, diagrammatic calculi, such as the ZX calculus, provide a flexible and theoretically robust framework for optimisations, often revealing simplifications invisible in the traditional gate-based model.
- If intrigued, look at this nice introduction Kottma., 2024. 2024. Introducing (dynamical) Lie algebras for quantum practitioners. (February 2024). Retrieved on 08/01/2025 from https://pennylane.ai/qml/demos/tutorial_liealgebra and the references therein. It’s not as scary as it sounds.
- Experimental realisations of many-qubit interactions have also been demonstrated Erhard, 2019. 2019. Characterizing large-scale quantum computers via cycle benchmarking. Nature Communications 10, 1 (November 2019). doi: 10.1038/s41467-019-13068-7 Bluvst., 2022. 2022. A quantum processor based on coherent transport of entangled atom arrays. Nature 604, 7906 (April 2022, 451--456). doi: 10.1038/s41586-022-04592-6 Arrazo., 2021. 2021. Quantum circuits with many photons on a programmable nanophotonic chip. Nature 591, 7848 (March 2021, 54--60). doi: 10.1038/s41586-021-03202-1 Evered, 2023. 2023. High-fidelity parallel entangling gates on a neutral-atom quantum computer. Nature 622, 7982 (October 2023, 268--272). doi: 10.1038/s41586-023-06481-y and are at the core of other proposed architectures Bartol., 2023. 2023. Fusion-based quantum computation. Nature Communications 14, 1 (February 2023). doi: 10.1038/s41467-023-36493-1 Bouras., 2021. 2021. Blueprint for a Scalable Photonic Fault-Tolerant Quantum Computer. Quantum 5 (February 2021, 392). doi: 10.22331/q-2021-02-04-392.
- Much of quantum error correction theory is built on the Clifford group, a subset of quantum operations that preserve “Pauli errors” and can thus be corrected easily. The flip side of this is that correcting any non-Clifford operation is very hard, something that is resolved by constructing “error-free” magic states ahead of time. For more details, refer to a quantum error correction textbook such as Gottes., 2024. 2024. Surviving as a Quantum Computer in a Classical World. (February 2024). Retrieved on 08/01/2025 (lecture notes) from https://www.cs.umd.edu/class/spring2024/cmsc858G/QECCbook-2024-ch1-15.pdf.
- Polynomial-sized quantum circuits constitute a polynomial-dimensional submanifold of the exponential-dimensional Lie group. They are, hence, a measure zero subset of $U(2^n)$ with respect to the Haar measure.
- and totally arbitrary!
2.3. Rise of hybrid quantum-classical computation
Quantum measurements #
We have, until now, skipped over a crucial part of the quantum computation process: the role of quantum measurements. Quantum data, in isolation, is inherently inaccessible to us and the broader macroscopic world. A result from a quantum computation is only of value if we can probe it and get some readout value that we can display to the user or return to whoever launched the quantum computation.
Measurements in quantum physics differ fundamentally from our classical understanding of just “reading out” data that is already there. This is the lesson of the famous Schrödinger’s cat thought experiment of quantum mechanics: what data is within the qubits remains undefined until a measurement is performed. The act of observation will transform the quantum data: looking inside the box will, at random, either kill the cat or spare it1.
We thus need to add the measurement operation as a special case to our computer scientist’s model of quantum computing. Unlike purely quantum operations, measurements inherently involve interaction with the environment to produce a readout. Consequently, the no-delete and reversibility principles discussed earlier do not apply. Indeed, measurement is a lossy (and therefore irreversible) operation that projects the quantum state into one of a small subset of classical states. Which state the quantum state is projected into is non-deterministic. If one has access to an infinite supply of the same quantum state, then the whole state can be reconstructed by repeating measurements and analysing the distribution of outcomes2. Given no-cloning, however, this is unlikely to be the case, and so the full quantum result is hardly ever known. Instead, we must rely on well-designed measurement schemes to extract useful information from our partial access to the quantum states.
We model measurement as an operation that takes one qubit and outputs one purely classical bit3. In the circuit formalism, measurements are often implicitly added at the end of every qubit. Suppose we wish to make them explicit or add them elsewhere in the computation. In that case, we must introduce a graphical representation for the classical bit of data the measurement produces. The field has adopted the double-wire for this, even though a “half” wire would arguably have been more appropriate to reflect the reduced information content relative to quantum wires. I present to you the measurement box:
Measurements as first-class citizens
It is very tempting to our feeble classical brains – and admittedly, we just did it ourselves in the previous paragraphs – to view measurements as merely a readout operation, an auxiliary operation that we are forced to perform at the end of a computation for operative reasons. This could not be further from the truth! In many ways, measurements are just as powerful tools as any other quantum operation – if not more so!
One eye-opening perspective on this is the field of measurement-based quantum computing (MBQC). Raussendorf and Briegel showed indeed Rausse., 2001. 2001. A One-Way Quantum Computer. Physical Review Letters 86, 22 (May 2001, 5188--5191). doi: 10.1103/PhysRevLett.86.5188 that arbitrary quantum computations can be reproduced in the MBQC framework using only some resource quantum states that can be prepared ahead of time and measurements! In other words, given entangled qubits, measurements are all you need to perform quantum operations.
We will not explore MBQC further in this chapter (nor in this thesis, for that matter). Instead, we will use this as a motivation to explore what we can achieve with measurements. We have so far spared you from any mathematical alphabet soup. As we start discussing more concrete constructions of quantum computations, some introductory linear algebra and conventions around notation will become unavoidable.
Dirac formalism. Quantum states are nearly unanimously written using kets: instead of referring to a quantum state as $\psi$, we write it wrapped in special brackets as $|\psi\rangle$. This notation is also used when referring to the $0$ and $1$ states of qubits, written $|0\rangle$ and $|1\rangle$.
Several states can be joined and considered together as one overall state. This is expressed using the tensor symbol $\otimes$: $|\psi\rangle \otimes |\phi\rangle$ is the joint system of $|\psi\rangle$ and $|\phi\rangle$. When the states in question are all explicitly qubit states, we use the shorthand binary notation, e.g. $|01\rangle = |0\rangle \otimes |1\rangle$.
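As a rough illustration of this notation, a ket can be stored as a list of amplitudes, in which case the tensor symbol becomes a Kronecker product of those lists. The helper names `tensor` and `ket` are hypothetical and only serve this sketch; the leftmost qubit is assumed to be the most significant bit of the index.

```python
from itertools import product

# Computational basis states of one qubit as lists of amplitudes.
KET0 = [1.0, 0.0]
KET1 = [0.0, 1.0]


def tensor(*states):
    """Kronecker product of state vectors (leftmost factor is most significant)."""
    result = [1.0]
    for state in states:
        result = [a * b for a, b in product(result, state)]
    return result


def ket(bits):
    """Shorthand binary notation: ket("01") stands for |0> tensor |1>."""
    return tensor(*(KET1 if bit == "1" else KET0 for bit in bits))


print(ket("01"))  # [0.0, 1.0, 0.0, 0.0]
```

The single non-zero amplitude sits at index 1, the integer whose binary representation is 01.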
We will introduce more notation along the way.
With this out of the way, let us look in more detail at the first smart use of measurements: the block-encoding technique. Consider the following scenario: you would like to perform an operation $A$ on an arbitrary quantum state $|\psi\rangle$. Now, there are, unfortunately, many cases where implementing $A$ as a quantum circuit made of primitive gates that can be executed on hardware is very expensive⁴.
However, what we can always do is express $A$ as a matrix of dimensions $2^n \times 2^n$, where $n$ is the number of qubits in the state $|\psi\rangle$. Then, there is a neat trick that we can sometimes apply: instead of trying to execute $A$, we enlarge the matrix to a bigger $U$:
$$U = \begin{pmatrix} A & G' \\ G & G'' \end{pmatrix}$$
where $G$, $G'$ and $G''$ are “garbage” matrices that we do not care about, but should combine into a matrix $U$ that we know how to execute on a quantum computer. Quantum computations must be matrices with a row and column number that is a power of two; so at a minimum, $U$ must be of size $2^{n+1} \times 2^{n+1}$, i.e. be a computation on $n+1$ qubits.
We restrict our considerations in the following to the case of $U$ on $n+1$ qubits – other cases are similar. We thus need to add a qubit to our state to be able to pass it to our new operation. Such qubits that are added temporarily to facilitate a computation are a recurring feature in quantum computing and have thus earned themselves a name – ancilla qubits.
Let us take a look at the quantum states that result from executing $U$. If we add to $|\psi\rangle$ a $|0\rangle$ ancilla state, our quantum operation acts as⁵
$$U\,(|0\rangle \otimes |\psi\rangle) = |0\rangle \otimes A|\psi\rangle + |1\rangle \otimes G|\psi\rangle. \tag{3}$$
The expression $A|\psi\rangle$ means the operation $A$ applied to $|\psi\rangle$ – exactly the output state we are seeking. If we input the ancilla qubit in state $|1\rangle$, we get garbage:
$$U\,(|1\rangle \otimes |\psi\rangle) = |0\rangle \otimes G'|\psi\rangle + |1\rangle \otimes G''|\psi\rangle.$$
So $|0\rangle$ is definitely the input state we are more interested in.
How can we recover $A|\psi\rangle$ from (3)? This is precisely what measurements do! When quantum states are expressed as a sum of states, the terms of the sum form the possible measurement outcomes⁶. If we only measure a subset of the qubits, then the term corresponding to that measurement is isolated and all other terms disappear. Hence, if we measure the first qubit (that we introduced ourselves) in the zero state, then the remaining qubits will be precisely in the desired state $A|\psi\rangle$. Success!
Using this “term isolating” property of measurements, known as state collapse, we can thus effect computations that would have been otherwise difficult or impossible to perform. There is however one important wrinkle that we cannot forget about: measurements are non-deterministic! We cannot assume that all measurements of the ancilla qubit will return the zero state. When $1$ is measured on the ancilla, the remaining qubits are left in the $G|\psi\rangle$ state. The computation has thus failed, and the execution must be aborted and restarted. How often the block-encoding protocol that we have presented fails depends on the details of $A$ and the choices of $G$, $G'$ and $G''$ and is the main disadvantage of an otherwise very powerful quantum technique.
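The protocol can be sketched numerically under a few assumptions that are not taken from the text: the target is a hypothetical non-unitary one-qubit operation diag(0.6, 0.8), the enlarged unitary uses one particular completion of the remaining blocks, and the ancilla is the most significant qubit. The function name `run_block_encoding` is likewise made up for this example.

```python
import math
import random

# Assumed example: a non-unitary target A = diag(0.6, 0.8) on one qubit.
A = [[0.6, 0.0], [0.0, 0.8]]
# S = sqrt(1 - A^2) makes the enlarged matrix [[A, S], [S, -A]] unitary.
S = [[0.8, 0.0], [0.0, 0.6]]
U = [A[0] + S[0],
     A[1] + S[1],
     S[0] + [-a for a in A[0]],
     S[1] + [-a for a in A[1]]]


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def run_block_encoding(psi, rng):
    """Apply U to |0>|psi>, measure the ancilla, return (outcome, remaining state)."""
    dim = len(psi)
    out = matvec(U, list(psi) + [0.0] * dim)  # ancilla |0> selects the first half
    branches = (out[:dim], out[dim:])         # ancilla measured as 0 / as 1
    p0 = sum(abs(x) ** 2 for x in branches[0])
    outcome = 0 if rng.random() < p0 else 1
    norm = math.sqrt(sum(abs(x) ** 2 for x in branches[outcome]))
    return outcome, [x / norm for x in branches[outcome]]


rng = random.Random(0)
psi = [1 / math.sqrt(2), 1 / math.sqrt(2)]
outcome, state = run_block_encoding(psi, rng)
print(outcome, state)
```

When the outcome is 0, the returned state is the normalised result of applying the target operation to the input; when it is 1, the state is the garbage branch and the run would have to be discarded.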
We will now explore two strategies to deal with “fails” in measurements. At the core of them is the idea of hybrid quantum-classical programs.
Who said quantum computers could not fix their mistakes
Failed computations are an expensive mistake in quantum computing as the no-cloning theorem prevents us from keeping a “backup” of the initial state. The fact that failures are in fact unlucky measurement outputs makes matters worse, given that measurements are the only irreversible quantum operation. It is therefore impossible in general to recover from a “wrong” measurement.
There are, however, prominent cases in which the computation can be corrected based on the measurement outcome, thus yielding deterministic results. Recall equation (3) of the previous section: there is a computation $A$ on $n$ qubits that can be probabilistically computed using $n+1$ qubits using $U$:
$$U\,(|0\rangle \otimes |\psi\rangle) = |0\rangle \otimes A|\psi\rangle + |1\rangle \otimes G|\psi\rangle$$
for some “garbage” $G$. What if $G$ is a reversible operation, i.e. there is an operation $G^{-1}$ to undo $G$? Well then, we can still, at least in theory, recover $A|\psi\rangle$ by applying $A G^{-1}$:
$$A G^{-1} \left( G|\psi\rangle \right) = A|\psi\rangle$$
but only if 1 was measured on the ancilla qubit⁷!
This is the beginning of quantum-classical hybrid computing: we start by performing quantum operations followed by measurements, the outcomes of which dictate what further quantum operations must be applied. We define for this purpose a classically controlled gate: a quantum operation that is only executed if a certain classical bit (the condition) is set. This bit will typically be a value derived from a previous measurement: it could be as simple as the outcome that a previous measurement yielded, or a function of multiple past outcomes that must be evaluated on classical hardware (e.g. a CPU).
Mixing classical and quantum operations is a sure way to bring the quantum
circuit representation to its knees. We adopt the following representation, in
which a quantum gate that has an additional classical bit wire attached to it
represents a classically controlled operation that is only executed if the bit
value is 1.
Quantum Teleportation
Quantum teleportation is a simple example of performing classically controlled quantum operations to do circuit corrections based on measurement outcomes. It is also coincidentally one of the most fundamental protocols of quantum theory. Its name is slightly misleading. Think of it as data transfer for quantum data, with a mind-bending twist: at the time of the transfer, only classical data must be communicated between the sending and receiving parties. As a result of this protocol, quantum information can be transferred using plain old-school copper wires (or any other classical communication channels)!
This is predicated on one crucial action being performed before the start of the
communication. For every qubit that should be transmitted, the parties must
beforehand create and share among themselves a pair of qubits that will serve as
the quantum resource during the protocol execution. This resource state is
widespread enough that it got its own name: the Bell pair state. It is written in
Dirac notation as $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$. As the notation indicates, it is a
state with perfectly correlated measurements: when measured, the two qubits will
always yield the same outcome, either both 0 or both 1.
There turns out to be a straightforward circuit that maps the two-qubit $|00\rangle$ state, which every two-qubit computation starts in, into the Bell pair state:
It is enough for us to think of it as a black box – or a grey box in this case.
We are interested in “teleporting” an arbitrary, single-qubit quantum state.
Such a state can always be expressed as
$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, i.e. in the most general case, a
one-qubit state will be in some superposition of the states $|0\rangle$ and
$|1\rangle$. The parameters $\alpha$ and $\beta$ are complex coefficients that
encode the probabilities of measuring 0 or 1 – we can view them as the
weights of a weighted sum.
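We are now interested in combining a Bell resource state in a joint system with the arbitrary state $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$. The resulting three-qubit state is obtained with the $\otimes$ operation, which distributes over sums just like usual multiplication:
$$\frac{1}{\sqrt{2}}\left(|00\rangle + |11\rangle\right) \otimes \left(\alpha|0\rangle + \beta|1\rangle\right) = \frac{1}{\sqrt{2}}\left(\alpha|000\rangle + \beta|001\rangle + \alpha|110\rangle + \beta|111\rangle\right)$$
We chose to place the Bell pair on the first two qubits and the arbitrary state on the third. The goal is to move the data that sits on that last qubit to the first qubit. Looking at the first qubit in the above expression, notice that the desired state appears in the first qubit if we can discard the second and third terms:
$$\frac{1}{\sqrt{2}}\left(\alpha|000\rangle + \beta|111\rangle\right)$$
This sounds very much like the measurement operations we have used before to isolate terms – but we need to isolate two terms simultaneously. We can resolve this issue by reorganising the expression⁸
$$\frac{1}{2}\Big[(\alpha|0\rangle + \beta|1\rangle) \otimes \tfrac{|00\rangle + |11\rangle}{\sqrt{2}} + (\beta|0\rangle + \alpha|1\rangle) \otimes \tfrac{|01\rangle + |10\rangle}{\sqrt{2}} + (\alpha|0\rangle - \beta|1\rangle) \otimes \tfrac{|00\rangle - |11\rangle}{\sqrt{2}} + (\beta|0\rangle - \alpha|1\rangle) \otimes \tfrac{|01\rangle - |10\rangle}{\sqrt{2}}\Big] \tag{4}$$
Obtaining the state $|\psi\rangle$ on the first qubit is thus as simple as isolating the first of these four terms. We do not know a priori how to measure $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ but we do know how to map that state to $|00\rangle$: that’s the inverse of the Bell pair state preparation circuit! This results in the following circuit:
This brings us to the same situation as we had for the block encoding application above: conditioned on the measurement outcome of the second and third qubits being 0, the computation performs a state “teleportation”, moving $|\psi\rangle$ from the third to the first qubit. We can compute the effect of the inverse Bell pair preparation on the overall expression of (4) to find all possible output states:
$$\frac{1}{2}\Big[(\alpha|0\rangle + \beta|1\rangle) \otimes |00\rangle + (\beta|0\rangle + \alpha|1\rangle) \otimes |01\rangle + (\alpha|0\rangle - \beta|1\rangle) \otimes |10\rangle + (\beta|0\rangle - \alpha|1\rangle) \otimes |11\rangle\Big]$$
As expected, we do get $|\psi\rangle$ on the first qubit for the measurement 00 (corresponding to the state $|00\rangle$), but as it stands, this only has a $\frac{1}{4}$ probability of success.
You might notice, however, that the other states in which the first qubit can end up look remarkably similar, up to some sign flips and swaps $\alpha \leftrightarrow \beta$. In particular, all states still have the amplitudes $\alpha$ and $\beta$ somewhere, so it does not seem unfathomable that these “wrong” states can be mapped back to $|\psi\rangle$.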
We can use the measurement outcomes of the second and third qubit to infer which
of the “mistakes” occurred, and hence what state the first qubit has ended in.
The 01 measurement outcome, for instance, results in the
$\beta|0\rangle + \alpha|1\rangle$ state – this is just a bit flip away from
$|\psi\rangle$! The bit flip gate is known as $X$. Its colleague the $Z$ gate on the other
hand leaves $|0\rangle$ states untouched but flips the sign of $|1\rangle$. This
would fix the 10 outcome. Finally, 11 requires both a Z and an X
correction.
Putting these observations together, we can leverage classically controlled operations to obtain a fully deterministic protocol! The correct circuit implementing quantum teleportation is given by
In the scenario where a first party (Alice) wants to send a one-qubit quantum state to Bob, they can achieve that by creating a Bell pair state, the first qubit of which is given to Bob and the second to Alice. When Alice then comes into possession of another qubit whose data she wants to transmit to Bob, she can achieve that by executing the inverse of the Bell pair preparation circuit on her two qubits, measuring them and communicating the (classical) measurement outcomes to Bob. Bob can perform the necessary corrections and will then have state $|\psi\rangle$.
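The whole protocol is small enough to simulate with a plain state vector. The sketch below makes several assumptions of its own: qubit 0 is Bob's and is the most significant bit of the index, the Bell pair is prepared with a Hadamard followed by a CNOT, and all function names are hypothetical. Bob's state is only expected to match the input up to a global phase.

```python
import math
import random

R = 1 / math.sqrt(2)
H = [[R, R], [R, -R]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
N = 3  # qubit 0 ends up with Bob; qubits 1 and 2 are Alice's


def apply_1q(gate, q, state):
    """Apply a 2x2 gate to qubit q (qubit 0 is the most significant bit)."""
    shift = N - 1 - q
    new = [0j] * len(state)
    for i, amp in enumerate(state):
        bit = (i >> shift) & 1
        for out in (0, 1):
            new[i ^ ((bit ^ out) << shift)] += gate[out][bit] * amp
    return new


def apply_cnot(control, target, state):
    c_shift, t_shift = N - 1 - control, N - 1 - target
    new = [0j] * len(state)
    for i, amp in enumerate(state):
        new[i ^ (1 << t_shift) if (i >> c_shift) & 1 else i] += amp
    return new


def measure(q, state, rng):
    """Measure qubit q; return the outcome and the collapsed, renormalised state."""
    shift = N - 1 - q
    p1 = sum(abs(a) ** 2 for i, a in enumerate(state) if (i >> shift) & 1)
    outcome = 1 if rng.random() < p1 else 0
    norm = math.sqrt(p1 if outcome else 1 - p1)
    return outcome, [a / norm if ((i >> shift) & 1) == outcome else 0j
                     for i, a in enumerate(state)]


def teleport(alpha, beta, rng):
    # Qubits 0 and 1 start in |00>; qubit 2 holds alpha|0> + beta|1>.
    state = [0j] * 8
    state[0b000], state[0b001] = alpha, beta
    # Bell pair preparation on qubits 0 and 1.
    state = apply_cnot(0, 1, apply_1q(H, 0, state))
    # Inverse Bell preparation on Alice's qubits 1 and 2, then measure both.
    state = apply_1q(H, 1, apply_cnot(1, 2, state))
    m1, state = measure(1, state, rng)
    m2, state = measure(2, state, rng)
    # Bob's classically controlled corrections.
    if m2:
        state = apply_1q(X, 0, state)
    if m1:
        state = apply_1q(Z, 0, state)
    # Alice's qubits are now in the basis state |m1 m2>; read off Bob's qubit.
    base = (m1 << 1) | m2
    return (m1, m2), (state[base], state[base | 0b100])


print(teleport(0.6, 0.8j, random.Random(0)))
```

Whatever pair of outcomes Alice obtains, the two amplitudes returned for Bob's qubit agree with the input amplitudes up to an overall sign, which has no observable effect.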
It is beautiful and often overlooked how one of the most fundamental protocols of quantum information theory is, in fact, a hybrid classical-quantum computation. Quantum teleportation without classical communication is physically impossible: it would let Alice communicate with Bob instantly, even though he could be light years away – in other words, it would fundamentally break relativity.
Repeat until success: If you fail, retry!
Classical computer science has a straightforward solution whenever probabilistic computations that can fail are used: probability amplification or boosting Scheid., 2018. 2018. Probability amplification. Retrieved lecture notes, online, visited 30/12/2024 from https://cs.uni-paderborn.de/fileadmin-eim/informatik/fg/ti/Lehre/SS_2018/AA/lecture_5.pdf. The idea is so simple that it barely deserves a name: execute several independent runs of the computation and choose the most common outcome. If the probability of failure is below a certain threshold (e.g. 50% for a binary output), then with basic statistics, one can extrapolate the number of runs required to obtain any desired accuracy⁹.
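For a binary output, the number of runs can be estimated with Hoeffding's inequality, as in the following sketch. The function names and the 75% success rate of the simulated computation are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def runs_needed(p_correct, delta):
    """Runs after which a majority vote is wrong with probability at most delta.

    Hoeffding's inequality gives P(majority wrong) <= exp(-2 n (p_correct - 1/2)^2).
    """
    gap = p_correct - 0.5
    return math.ceil(math.log(1 / delta) / (2 * gap ** 2))


def amplify(noisy_run, n_runs):
    """Execute independent runs and return the most common outcome."""
    votes = Counter(noisy_run() for _ in range(n_runs))
    return votes.most_common(1)[0][0]


rng = random.Random(0)


def noisy_bit():
    # Hypothetical computation: the correct answer is 1, returned 75% of the time.
    return 1 if rng.random() < 0.75 else 0


n = runs_needed(0.75, 0.01)
print(n, amplify(noisy_bit, n))
```

The bound is conservative: the true error probability of the majority vote is typically much smaller than the requested `delta`.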
We have been ignoring this approach so far since no-cloning prohibits us from repeating a procedure more than once on an input state $|\psi\rangle$. However, in the scenario that the computation should only be executed on a specific, known input state and the computation that prepares that state is known, we can recover from computation failures by just preparing a new state identically.
Suppose we know how to execute the quantum computation $P$ mapping
$$P: |0 \cdots 0\rangle \mapsto |\psi\rangle.$$
As before, we would like to compute $A|\psi\rangle$ given an implementation of the
computation $U$ that acts on an $n$-qubit state $|\psi\rangle$ and an ancilla
qubit in the $|0\rangle$ state. If the measurement of the ancilla
returns 1, then the computation failed.
We can then discard all qubits and restart from the $|0 \cdots 0\rangle$ state, applying
$P$ followed by $U$ and an ancilla measurement, repeating until we
measure 0. As a pseudo-quantum circuit, we could express this as:
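psi_qs = create_qubits(n)
while True:
    ancilla_q = create_qubit()
    # obtain measurement m from: prepare the input state on psi_qs, apply the
    # block-encoding to (ancilla_q, psi_qs), then measure ancilla_q
    m = measure(ancilla_q)
    if m == 0:
        break  # success! we can exit loop and proceed
    else:
        reset_qubits(psi_qs)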
At each iteration, we can either exit the loop if the state collapse was
successful (m == 0), or reset the qubits to zero and try again. But pseudo
circuits do not run on hardware! The only way to express this computation as an
actual circuit is to unroll the loop, i.e. repeat the block within the loop as
many times as we expect might be necessary¹⁰. The first two iterations
would look as follows:
It should be obvious why we haven’t unrolled the loop any further – it quickly becomes unwieldy. The resulting program is not only hard to display and read, but it also suffers from fundamental issues in practice. For one, the program size becomes hugely bloated, and beyond slowing down the compiler, it will also cause a host of issues on the control hardware in real-time, such as long load times, inefficient execution, and low cache efficiency.
Even more worryingly, when picking the maximum number of iterations, we face an impossible tradeoff: if the number of iterations is small, then the probability of failure will remain non-negligible. As we scale this value up, however, we are introducing more and more gates into the program to cover the odd case of multiple successive repeated failures. We do not intend to execute these gates on most computation runs, yet they come at a significant cost to the runtime: for each gate listed in the circuit, the condition for the gate’s execution must be checked, whether or not the gate ends up being executed. Furthermore, hardware schedulers might be forced to be pessimistic and schedule a time window for all conditional operations ahead of time. This will significantly delay any operation to be performed after the loop.
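The tradeoff can be made concrete with a little arithmetic. The sketch below assumes independent iterations that each fail with the same probability; the function names are hypothetical. It compares the number of copies an unrolled circuit must contain with the number of iterations a genuine loop would execute on average.

```python
def unroll_depth(p_fail, target):
    """Smallest number k of unrolled iterations with p_fail**k <= target."""
    k, remaining = 0, 1.0
    while remaining > target:
        remaining *= p_fail
        k += 1
    return k


def expected_iterations(p_fail):
    """Mean number of iterations executed by a true loop (geometric distribution)."""
    return 1 / (1 - p_fail)


for p_fail in (0.25, 0.5, 0.75):
    print(p_fail, unroll_depth(p_fail, 1e-6), expected_iterations(p_fail))
```

With a failure probability of one half per iteration, a loop runs its body twice on average, whereas 20 unrolled copies are needed to push the residual failure probability below one in a million.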
We, therefore, argue that the quantum circuit model is ill-suited as the representation for quantum programs that combine classical and quantum data. Such programs, however, are a fundamental building block towards developing meaningful large-scale quantum computations and are bound to become the norm. Beyond the examples discussed above – including block-encodings, repeat-until-success schemes, distributed quantum computing and measurement-based quantum computing – one application of hybrid quantum-classical operations stands out as critically important for the large-scale deployment of quantum computing: quantum error correction (QEC) schemes. We discuss this use case in the next section.
-
It is ironic that Schrödinger’s thought experiment Schrö., 1935. 1935. Die gegenwärtige Situation in der Quantenmechanik. Naturwissenschaften, intended to highlight the absurdity of quantum mechanics, has become the field’s most famous PR campaign. Sorry to disappoint – you won’t find felines occupying multiple states of existence (though qubits do!). ↩︎
-
This is known as state tomography Allahv., 2004. 2004. Determining a Quantum State by Means of a Single Apparatus. Physical Review Letters 92, 12 (March 2004, 120402). doi: 10.1103/PhysRevLett.92.120402. One must perform measurements in multiple bases, i.e., different choices of classical states to project to. ↩︎
-
Where did the qubit go? All the information in a qubit post-measurement is also contained in the classical bit of output data – it is, therefore, redundant and renders the qubit useless. In our model, we, therefore, bundle measurement and qubit discard into one operation. ↩︎
-
or outright impossible, in cases where $A$ is not a unitary linear operation, for example. ↩︎
-
This is obtained by a simple matrix multiplication. The vector representation of the quantum state is obtained using the Kronecker product. You can also just trust me that this works out this way. ↩︎
-
This is simplifying slightly. There is a necessary condition for this to be a valid measurement: the states in the sum must form a measurement basis, i.e. they must be orthogonal. This is satisfied here. ↩︎
-
Notice that, informally, we would hope to get a computation $G$ such that $G \approx A$, in the sense that it should somehow be closely related to $A$. This way, the resulting correction $A G^{-1}$ would be close to the identity, and would be cheap to compute. ↩︎
-
Apologies, it seems at this point that we are conjuring up a complex expression out of nowhere. It is in fact just a change of basis – plain old linear algebra. The formula can be obtained easily by writing out the basis change matrix. ↩︎
-
This is fiendishly effective: the Hoeffding bounds guarantee that the probability of success will converge to 1 exponentially with the number of runs. ↩︎
-
In other words, we must pick a constant $k$ for the maximum number of times we expect the loop to be executed. If a single loop iteration has a failure probability of $p$, the failure probability of the program with $k$ unrolled iterations is then $p^k$. ↩︎
2.4. Quantum compilers cannot do it alone
We have (hopefully!) by now convinced our readership that quantum programs must interface with our established classical infrastructure and should rather be understood as an interleaved execution of both classical and quantum operations. The obvious question that thus poses itself is
How do we equip quantum compilers to deal with classical operations?
The simplest solution is to adopt the extended quantum circuit formalism with
support for classically controlled operations, as we have introduced in the
previous section. Using this representation, the basic types available for
computation are the qubit and the classical bit. We can also, at that point,
introduce purely classical operations on bits, for instance, to compute boolean
logic on measurement outcomes, such as “if both the first AND the second
measurement outcomes are 1, then …”.
However, the circuit model is inherently designed with the no-cloning principle in mind: specifically, with the assumption that at any one time, there are exactly $n$ (for some fixed value of $n$) resources available for computation. This means, for example, that in the following program
in which two measurements write to the same classical bit, it would be impossible to append a gate controlled on the first measurement outcome after the gate, as that value was overwritten on the classical wire by the second measurement. The solution could be to introduce¹ a new, fresh classical wire for each measurement and avoid overwriting outcomes. However, there are also many other ways to break this wires-based representation: suppose you have an operation with one input and two outputs, such as a copy operation on a classical bit. We would need two wires for the output, but the input would only provide us with one… We now have to start creating additional wires ahead of time for this purpose and solve memory allocation problems to decide which wire should be given to which operation.
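The fresh-wire idea is essentially a renaming pass of the kind classical compilers perform. The following sketch assumes a made-up encoding of operations as tuples whose last entry names a classical bit; neither the encoding nor the function name comes from an existing tool.

```python
from collections import defaultdict


def to_fresh_wires(ops):
    """Rename classical bits so that every measurement writes to a fresh wire.

    Each op is a tuple (kind, ..., bit). Only "measure" ops write to their bit;
    every other op reads the most recent version of it.
    """
    version = defaultdict(int)  # number of writes seen so far, per bit name
    renamed = []
    for kind, *args, bit in ops:
        if kind == "measure":
            version[bit] += 1
        renamed.append((kind, *args, f"{bit}_{version[bit]}"))
    return renamed


program = [
    ("measure", "q0", "c"),
    ("measure", "q1", "c"),  # would overwrite the first outcome
    ("cond_x", "q2", "c"),   # controlled on the latest value of c
]
for op in to_fresh_wires(program):
    print(op)
```

After renaming, the first outcome lives on its own wire `c_1` and remains available to later operations, at the price of having to decide how many wires exist and when they can be reused.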
These are run-of-the-mill classical compiler problems! One might at first hope that the set of overlapping problems between classical and quantum compilers is manageably small. After all, in all the use cases we have covered so far, the amount of classical computation was very minimal, limiting itself to conditionals and loops based on simple boolean expressions. Surely the full-blown powers of a classical compiler are not required!
Unfortunately (and as usual), scientists have shown no lack of imagination in this field – and so have found very compelling use cases for complex classical computations within quantum programs. To drive this point home, let us consider the concrete example of quantum error correction.
The quantum error correction use case
Error-correcting protocols do as their name suggests: they detect whenever data is subjected to errors and thus modified in an unexpected way. They then attempt to recover the intended valid state. In the classical world, such schemes are employed whenever the hardware is not reliable enough: this is hardly the case for computations themselves but is widespread in communications (e.g. within the TCP/IP protocol for the internet Eddy, 2022. 2022. Transmission Control Protocol (TCP). (August 2022). Retrieved as RFC 9293 from https://www.rfc-editor.org/info/rfc9293) or for memory and storage in critical applications.
For a very long time to come, no one expects to be able to manipulate matter-based qubits without introducing errors. Photons, on the other hand, are prone to data losses through absorption and can only be entangled using complex and noisy schemes such as the Knill–Laflamme–Milburn protocol Knill, 2001. 2001. A scheme for efficient quantum computation with linear optics. Nature 409, 6816 (January 2001, 46--52). doi: 10.1038/35051009². Simply put, it is safe to assume that error correction will be found everywhere – as soon as our quantum computers manage to implement such protocols.
A sketch of quantum error correction goes roughly as follows: the data that would be stored on qubits is instead encoded in a redundant way on a larger number of qubits. Thus, when errors occur on a subset of the qubits, the data can be restored using the qubits that have not been corrupted. Before errors can be corrected, they must be detected. To this end, we first add fresh ancilla qubits to the program. Through smartly designed interactions with the data qubits, the ancilla qubits pick up the errors from the data. When we subsequently perform measurements on the ancilla qubits, these errors result in modified outcomes, called the error syndrome.
The challenging bit comes next: from a run of syndrome measurements, one must infer the most likely errors – a step known as syndrome decoding. This is a purely classical maximum likelihood problem that requires a non-trivial amount of computations to resolve. For small problem instances, all possible input syndromes can be tabled, and the outputs precomputed – in which case the problem at runtime is reduced to fast table lookups. However, the higher the fault tolerance we require, the more qubits must be used in the encodings, and so invariably, the problem quickly becomes very demanding computationally.
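The table-lookup approach for small instances can be sketched on the simplest possible example. The code below assumes the three-bit repetition code with two parity checks and independent bit flips that each occur with probability below one half, so that the lowest-weight error consistent with a syndrome is also the most likely one. All names are hypothetical.

```python
from itertools import product

# Each check compares two neighbouring bits of the three-bit codeword.
CHECKS = [(0, 1), (1, 2)]


def syndrome(error):
    """Parity-check outcomes caused by a given pattern of bit flips."""
    return tuple(error[a] ^ error[b] for a, b in CHECKS)


# Precompute, for every syndrome, the lowest-weight error that produces it.
LOOKUP = {}
for error in sorted(product((0, 1), repeat=3), key=sum):
    LOOKUP.setdefault(syndrome(error), error)


def decode(measured_syndrome):
    """Runtime decoding is reduced to a single table lookup."""
    return LOOKUP[measured_syndrome]


print(decode((1, 1)))  # (0, 1, 0): flip the middle bit back
```

The table has one entry per syndrome, so its size doubles with every additional check, which is why this approach stops being practical for larger codes.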
Meanwhile, these “cycles” of error detection and correction are under strict latency constraints: idling qubits waiting for corrections to be applied will accumulate new errors that must themselves be corrected – for error correction to be workable, we must be capable of detecting and correcting for errors faster than they are being introduced. The entire error correction cycle just described can be summarised by the following diagram:
The decoding time is a crucial factor in determining the overall cycle time and, thus, the clock rate of fault-tolerant quantum hardware. Consider, for example, a 32-qubit Toric code Kitaev, 2003. 2003. Fault-tolerant quantum computation by anyons. Annals of Physics 303, 1 (January 2003, 2--30). doi: 10.1016/S0003-4916(02)00018-0, one of the most well-studied quantum error-correcting codes. Without going into the details of the code itself, we can use the C++ implementation made available by the MQT toolkit Burgho., 2021. 2021. Advanced Equivalence Checking for Quantum Circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 40, 9 (September 2021, 1810--1824). doi: 10.1109/tcad.2020.3032630 to study the decoder performance for this code.
Consider first a “naive” compilation of the decoder – the kind of program that we could hope to get from a quantum compiler that “understands” classical operations but only implements optimisations directly relevant to quantum computations. Such a compiler does not currently exist, but the decoder being a C++ program, we can approximate what the compiled binary would look like by turning off all optimisations from an established classical compiler³.
The runtime averaged over 1000 runs of the decoder is . This is within the latency requirements of certain trapped ion architectures Ryan-A., 2021. 2021. Realization of Real-Time Fault-Tolerant Quantum Error Correction. Physical Review X 11, 4 (December 2021, 041058). doi: 10.1103/physrevx.11.041058, but far beyond the sub-microsecond regime that will be required to make error correction a reality on superconductor-based quantum computers Carrer., 2024. 2024. Combining quantum processors with real-time classical communication. Nature 636, 8041 (November 2024, 75--79). doi: 10.1038/s41586-024-08178-2. This can be contrasted with the program output by the same compiler, but with all compiler optimisations enabled: the average runtime is reduced by a factor close to 10x to – still a factor 100x away from the required performance on superconductors, but huge gains nonetheless! The details of the experiment with all build flags, the hardware used and how to reproduce the results are available here.
There is no hope of obtaining these types of speedups without an in-depth
understanding of classical hardware and battle-tested implementations for every
optimisation pass under the sun – in short, the full thrust of a modern
state-of-the-art compiler such as clang or gcc.
To make matters worse, such classical computations are bound to move to dedicated accelerators that require specialised compilation, such as GPUs and FPGAs, for the most time-critical subroutines: quantum error decoding using GPUs is already well-developed Bausch, 2024. 2024. Learning high-accuracy error decoding for quantum processors. Nature 635, 8040 (November 2024, 834--840). doi: 10.1038/s41586-024-08148-8 Cao, 2023. 2023. qecGPT: decoding Quantum Error-correcting Codes with Generative Pre-trained Transformers. arXiv: 2307.09025 [quant-ph] and more esoteric platforms such as FPGAs Overwa., 2022. 2022. Neural-Network Decoders for Quantum Error Correction Using Surface Codes: A Space Exploration of the Hardware Cost-Performance Tradeoffs. IEEE Transactions on Quantum Engineering 3 (1--19). doi: 10.1109/tqe.2022.3174017 Meinerz, 2022. 2022. Scalable Neural Decoder for Topological Surface Codes. Physical Review Letters 128, 8 (February 2022, 080505). doi: 10.1103/physrevlett.128.080505, superconducting circuits Ueno, 2021. 2021. QECOOL: On-Line Quantum Error Correction with a Superconducting Decoder for Surface Code. In 2021 58th ACM/IEEE Design Automation Conference (DAC), December 2021. IEEE, 451--456. doi: 10.1109/dac18074.2021.9586326 and compute-in-memory architectures Wang, 2024. 2024. CIM-Based Parallel Fully FFNN Surface Code High-Level Decoder for Quantum Error Correction. arXiv: 2411.18090 [cs.AR] are being actively studied.
These observations should leave the reader convinced that in order to compile and realise the kind of hybrid quantum-classical programs that we expect will become the norm in the field, quantum compilers will need to embrace and encompass the full breadth and depth of classical compilers. This leaves us with no choice but to fully transform and integrate the existing quantum tooling and quantum optimisation research into the established compiler ecosystem. What this means exactly is the subject of the rest of this chapter.
A new quantum programming paradigm?
We have seen it – quantum circuits are very limited in their expressiveness. They are well suited to presenting sequences of purely quantum operations and how the computation is parallelised across qubits, but they quickly become limiting once both quantum and classical data types are mixed and any type of control flow (conditionals, loops, function calls, etc.) is introduced.
How users express programs in the front end has deep implications for the kind of computations that the compiler must be capable of reasoning about and, hence, for the compiler’s architecture. The great merging of classical and quantum compilers is the perfect opportunity to reconcile program representations and integrate the learnings from decades of classical programming language research into quantum computing.
There have been several trailblazing initiatives to formalise quantum programming and create dedicated languages, such as QCL Ömer, 2000. 2000. Quantum Programming in QCL. (January 2000). Retrieved from http://tph.tuwien.ac.at/ oemer/doc/quprog.pdf, Quipper Green, 2013. 2013. Quipper: a scalable quantum programming language. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2013, New York, NY, USA. Association for Computing Machinery, 333--342. doi: 10.1145/2491956.2462177 Rios, 2018. 2018. A Categorical Model for a Quantum Circuit Description Language (Extended Abstract). Electronic Proceedings in Theoretical Computer Science 266 (February 2018, 164--178). doi: 10.4204/eptcs.266.11 Fu, 2023. 2023. Proto-Quipper with Dynamic Lifting. Proceedings of the ACM on Programming Languages 7, POPL (January 2023, 309--334). doi: 10.1145/3571204, Q# Micros., 2024. 2024. Introduction to the quantum programming language Q#. Retrieved on 31/12/2024 from https://learn.microsoft.com/en-us/azure/quantum/qsharp-overview and Silq Bichsel, 2020. 2020. Silq: a high-level quantum language with safe uncomputation and intuitive semantics. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2020. ACM, 286--300. doi: 10.1145/3385412.3386007. Their adoption in the quantum ecosystem has so far remained limited, overshadowed by the popularity of Python-based APIs for quantum circuit-based representations, as offered by Qiskit Javadi., 2024. 2024. Quantum computing with Qiskit. arXiv: 2405.08810 [quant-ph], Pennylane Bergho., 2022. 2022. PennyLane: Automatic differentiation of hybrid quantum-classical computations. arXiv: 1811.04968 [quant-ph] and Cirq Cirq D., 2024. 2024. Cirq. There is, as a result, a justified dose of scepticism in the quantum community on how well the ideas from classical programming really translate to quantum.
It is thus all the more notable that we are seeing a new generation of quantum programming tooling being developed Koch, 2024. 2024. GUPPY: Pythonic Quantum-Classical Programming. (January 2024). Retrieved (talk recording) from https://www.youtube.com/live/D8esZrt7ogk?feature=shared&t=31448 Ittah, 2024. 2024. Catalyst: a Python JIT compiler for auto-differentiable hybrid quantum programs. Journal of Open Source Software 9, 99 (July 2024, 6720). doi: 10.21105/joss.06720 CUDA-Q., 2024. 2024. CUDA-Q Documentation. Retrieved on 31/12/24 from https://nvidia.github.io/cuda-quantum/latest/index.html, driven by the need to write more expressive programs for the improving hardware (as we have been discussing), as well as by performance considerations: to scale up quantum compilation Ittah, 2022. 2022. QIRO: A Static Single Assignment-based Quantum Program Representation for Optimization. ACM Transactions on Quantum Computing 3, 3 (June 2022, 1--32). doi: 10.1145/3491247, accelerate quantum simulations Ittah, 2024. 2024. Catalyst: a Python JIT compiler for auto-differentiable hybrid quantum programs. Journal of Open Source Software 9, 99 (July 2024, 6720). doi: 10.21105/joss.06720 and integrate with classical high-performance computing (HPC) NVIDIA, 2024. 2024. NVIDIA Accelerates Quantum Computing Centers Worldwide With CUDA-Q Platform. Retrieved on 31/12/2024 from https://investor.nvidia.com/news/press-release-details/2024/NVIDIA-Accelerates-Quantum-Computing-Centers-Worldwide-With-CUDA-Q-Platform/default.aspx.
The history of programming is, first and foremost, a masterclass in constructing abstractions. Many of the higher-level primitives that have proven invaluable classically solve problems that we expect to encounter very soon in our hybrid programs – when we have not already. Examples include
- structured control flow to simplify reasoning about branching in quantum-classical hybrid programs,
- type systems to encode program logic and catch errors at compile time – this is particularly important for quantum programs as there is no graceful way of handling runtime errors on quantum hardware: by the time the error has been propagated to the caller, all quantum data stored on qubits is probably corrupted and lost,
- memory management such as reference counting and data ownership models. Current hardware follows a static memory model, in which the number of available qubits is fixed, and every operation acts on a set of qubits assigned at compile time. This static assignment breaks down when, for example, qubits are allocated inside a loop whose number of iterations is unknown at compile time. It thus becomes necessary to manage qubits dynamically, just like classical memory.
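The last point can be made concrete with a minimal sketch of dynamic qubit management. It is an illustration only: the names `QubitAllocator` and `run_loop` are hypothetical and do not correspond to any existing library, and the list of outcomes stands in for measurement results that would only become available at run time.

```python
class QubitAllocator:
    """Hands out qubit indices from a fixed-size device and reclaims them on release.

    Hypothetical helper for illustration; not taken from any real toolkit.
    """

    def __init__(self, n_qubits):
        # Stack of free indices; pop() returns the lowest free index first.
        self.free = list(range(n_qubits - 1, -1, -1))
        self.live = set()

    def alloc(self):
        if not self.free:
            raise RuntimeError("out of qubits")
        q = self.free.pop()
        self.live.add(q)
        return q

    def release(self, q):
        if q not in self.live:
            raise ValueError(f"qubit {q} is not allocated")
        self.live.remove(q)
        self.free.append(q)


def run_loop(allocator, outcomes):
    """Loop whose trip count depends on run-time data (one iteration per outcome)."""
    used = []
    for outcome in outcomes:
        q = allocator.alloc()      # allocation inside the loop body
        used.append(q)
        # ... gates and a measurement on q would go here ...
        allocator.release(q)       # the qubit is returned before the next iteration
        if outcome == 1:           # stop on "success"
            break
    return used


allocator = QubitAllocator(2)
used = run_loop(allocator, [0, 0, 0, 1, 0])
```

Because each iteration releases its qubit before the next one allocates, the loop runs on a two-qubit device however many iterations are needed; without the release, the same loop would exhaust the device after two iterations.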
To facilitate such a large swath of abstractions, the first step quantum compilers must take is to make a distinction between the language frontend and the intermediate representation (IR) that the compiler uses to reason about the program and perform optimisations. This will be the topic of chapter 3. The graph-based IR that we introduce in that chapter will then form the foundation for the new quantum compilation techniques that will be developed throughout the remainder of the thesis.
-
Or, as we would say in programming parlance, to allocate. ↩︎
-
We should at this point – at the risk of stoking controversy – acknowledge the commendable efforts of scientists chasing the Majorana particle Sau, 2010. 2010. Generic New Platform for Topological Quantum Computation Using Semiconductor Heterostructures. Physical Review Letters 104, 4 (January 2010, 040502). doi: 10.1103/physrevlett.104.040502 Haaf, 2024. 2024. A two-site Kitaev chain in a two-dimensional electron gas. Nature 630, 8016 (June 2024, 329--334). doi: 10.1038/s41586-024-07434-9 Mourik, 2012. 2012. Signatures of Majorana Fermions in Hybrid Superconductor-Semiconductor Nanowire Devices. Science 336, 6084 (May 2012, 1003--1007). doi: 10.1126/science.1222360. The topological quantum computers these would enable are, to our knowledge, the only quantum architecture proposed that could do away with error correction. ↩︎
-
Here we are using Apple clang v15.0.0, running macOS 14.7 on an Apple M3 Max chip. ↩︎
2.5. Summary and further reading
This introductory chapter covered some of the basic principles of quantum computation and, in doing so, hopefully, made a convincing argument as to why we should expect the programs running on quantum hardware to become more complex in the future, with the intertwining of classical and quantum computations – processes we refer to as hybrid quantum-classical programs. Prior to that, we also presented quantum compilation, an emerging discipline that is introducing many new problems and ideas to the established corpus of work on compiler research.
If this quantum taster has intrigued you or you would like to learn the basics from people who actually know what they are talking about, nothing beats the reference book for quantum information and quantum computing by Nielsen and Chuang Nielsen, 2016. 2016. Quantum Computation and Quantum Information (10th Anniversary edition). Cambridge University Press. A fascinating alternative perspective on quantum theory has also been developed within the programme of categorical quantum mechanics, for which the illustrious “Dodo book” Coecke, 2017. 2017. Picturing Quantum Processes: A First Course in Quantum Theory and Diagrammatic Reasoning. Cambridge University Press. doi: 10.1017/9781316219317 would be the go-to introductory material1.
At the risk of turning this thesis into absolutely shameless Oxford self-promotion, guess what else was a product of this university’s world-class research? The quantum circuit itself! These diagrams came from theoretical physicists (no surprise here) interested in capturing thought experiments in quantum information theory Deutsch, 1989. 1989. Quantum Computational Networks. In Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences. Royal Society, 73--90.
The idea caught on, and soon, software tools were created to facilitate building such diagrams. The Quantum Computation Language (QCL) was one of the first Ömer, 2000. 2000. Quantum Programming in QCL. (January 2000). Retrieved from http://tph.tuwien.ac.at/ oemer/doc/quprog.pdf. Quantum software2 has since proliferated, especially as the possibility of actually performing these thought experiments on quantum hardware became more tangible. The result was a series of software packages for quantum computing, designed for the automatic transformation and optimisation of quantum computations for execution on real hardware Javadi., 2024. 2024. Quantum computing with Qiskit. arXiv: 2405.08810 [quant-ph] Cirq D., 2024. 2024. Cirq Steiger, 2018. 2018. ProjectQ: an open source software framework for quantum computing. Quantum 2 (January 2018, 49). doi: 10.22331/q-2018-01-31-49 Sivara., 2020. 2020. t|ket⟩: a retargetable compiler for NISQ devices. Quantum Science and Technology 6, 1 (November 2020, 014003). doi: 10.1088/2058-9565/ab8e92 – we called them quantum compilers.
A recent development for quantum compilers focuses on scalability and first-class support for hybrid quantum-classical computations. Quantum circuits that include some form of classical control have been variously called “dynamic circuits” (e.g. Córco., 2021. 2021. Exploiting Dynamic Quantum Circuits in a Quantum Algorithm with Superconducting Qubits. Physical Review Letters 127, 10 (August 2021, 100501). doi: 10.1103/physrevlett.127.100501), “adaptive circuits” (e.g. Smith, 2024. 2024. Constant-Depth Preparation of Matrix Product States with Adaptive Quantum Circuits. PRX Quantum 5, 3 (September 2024, 030344). doi: 10.1103/prxquantum.5.030344), “circuits with measurements and feedforward” (e.g. Graham, 2023. 2023. Midcircuit Measurements on a Single-Species Neutral Alkali Atom Quantum Processor. Physical Review X 13, 4 (December 2023, 041051). doi: 10.1103/physrevx.13.041051), and “circuits assisted by local operations and classical communication” (e.g. Piroli, 2021. 2021. Quantum Circuits Assisted by Local Operations and Classical Communication: Transformations and Phases of Matter. Physical Review Letters 127, 22 (November 2021, 220503). doi: 10.1103/physrevlett.127.220503).
Besides supporting advances in quantum hardware Córco., 2021. 2021. Exploiting Dynamic Quantum Circuits in a Quantum Algorithm with Superconducting Qubits. Physical Review Letters 127, 10 (August 2021, 100501). doi: 10.1103/physrevlett.127.100501 Graham, 2023. 2023. Midcircuit Measurements on a Single-Species Neutral Alkali Atom Quantum Processor. Physical Review X 13, 4 (December 2023, 041051). doi: 10.1103/physrevx.13.041051 Pino, 2021. 2021. Demonstration of the trapped-ion quantum CCD computer architecture. Nature 592, 7853 (April 2021, 209--213). doi: 10.1038/s41586-021-03318-4, hybrid classical-quantum computations are central to many quantum computing applications. As put recently by Alam and Clark Alam, 2024. 2024. Learning dynamic quantum circuits for efficient state preparation. arXiv: 2410.09030 [quant-ph]:
“[…] dynamic quantum circuits are a crucial milestone on the roadmap to fault-tolerant quantum computers.”
We have covered a small subset of applications of hybrid quantum-classical computations. Quantum teleportation is undoubtedly one of the oldest Bennett, 1993. 1993. Teleporting an unknown quantum state via dual classical and Einstein-Podolsky-Rosen channels. Physical Review Letters 70, 13 (March 1993, 1895--1899). doi: 10.1103/physrevlett.70.1895. The block-encoding technique that we discussed in section 2.3 is the foundation of several algorithms, including the Quantum Singular Value Transformation (QSVT) Gilyén, 2019. 2019. Quantum singular value transformation and beyond: exponential improvements for quantum matrix arithmetics. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, June 2019. ACM, 193--204. doi: 10.1145/3313276.3316366 and the Linear Combination of Unitaries (LCU) Chakra., 2024. 2024. Implementing any Linear Combination of Unitaries on Intermediate-term Quantum Computers. Quantum 8 (October 2024, 1496). doi: 10.22331/q-2024-10-10-1496 Sze, 2025. 2025. Hamiltonian dynamics simulation using linear combination of unitaries on an ion trap quantum computer. arXiv: 2501.18515 [quant-ph]. Measurement-based quantum computing (MBQC) was introduced in Rausse., 2001. 2001. A One-Way Quantum Computer. Physical Review Letters 86, 22 (May 2001, 5188--5191). doi: 10.1103/PhysRevLett.86.5188 and forms the basis for some photonic quantum computing architectures Bartol., 2023. 2023. Fusion-based quantum computation. Nature Communications 14, 1 (February 2023). doi: 10.1038/s41467-023-36493-1 Bouras., 2021. 2021. Blueprint for a Scalable Photonic Fault-Tolerant Quantum Computer. Quantum 5 (February 2021, 392). doi: 10.22331/q-2021-02-04-392. Hybrid programs have also been shown to be useful for implementing the Quantum Fourier Transform (QFT) Bäumer, 2024. 2024. Quantum Fourier Transform Using Dynamic Circuits. Physical Review Letters 133, 15 (October 2024, 150602). doi: 10.1103/physrevlett.133.150602 and the Quantum Phase Estimation (QPE) algorithms Córco., 2021. 2021. Exploiting Dynamic Quantum Circuits in a Quantum Algorithm with Superconducting Qubits. Physical Review Letters 127, 10 (August 2021, 100501). doi: 10.1103/physrevlett.127.100501, two of the most fundamental computation primitives for quantum algorithms.
On the other hand, repeat-until-success schemes Paetzn., 2014. 2014. Repeat-until-success: non-deterministic decomposition of single-qubit unitaries. Quantum Information & Computation 14, 15–16 (November 2014, 1277–1301) are widespread in state preparation routines and will play a key role in fault-tolerant (FT) quantum computing. Arguably, the most well-known scheme for FT is magic state distillation Bravyi, 2005. 2005. Universal quantum computation with ideal Clifford gates and noisy ancillas. Physical Review A 71, 2 (February 2005, 022316). doi: 10.1103/PhysRevA.71.022316, a procedure expected to be a core building block of many FT architectures. State preparation is generally a ubiquitous problem for FT, as the error-correcting codes that are employed initiate computations starting from a logical zero state, which may be expensive to prepare on the qubits of the hardware Fowler, 2012. 2012. Surface codes: Towards practical large-scale quantum computation. Physical Review A 86, 3 (September 2012, 032324). doi: 10.1103/physreva.86.032324.
Finally, quantum error-correcting (QEC) codes themselves must be implemented using hybrid programs. The QEC literature is vast and can get very technical very quickly, but diving into it promises bountiful rewards. The field is one of quantum information’s fastest-evolving areas of research. These work-in-progress lecture notes Gottes., 2024. 2024. Surviving as a Quantum Computer in a Classical World. (February 2024). Retrieved on 08/01/2025 (lecture notes) from https://www.cs.umd.edu/class/spring2024/cmsc858G/QECCbook-2024-ch1-15.pdf by a coryphaeus of the field make for excellent introductory material.
-
And while we’re on the topic of my supervisor’s brilliant work, there is also a very recent textbook, a sort of spiritual successor to Coecke, 2017. 2017. Picturing Quantum Processes: A First Course in Quantum Theory and Diagrammatic Reasoning. Cambridge University Press. doi: 10.1017/9781316219317, particularly focused on quantum compilation Kissin., 2024. 2024. Picturing Quantum Software: An Introduction to the ZX-Calculus and Quantum Compilation. Preprint. It is just as worth a read and might appeal more to the computer science-y reader. ↩︎
-
That is classical software written to control and optimise quantum computations. ↩︎
Chapter 3
Quantum Compilation as a Graph Transformation Problem
The specialised optimisation techniques that we reviewed in section 2.2 are effective for the scenarios they were designed for, but they are challenging to adapt to new hardware primitives, constraints, or cost functions.
This thesis proposes interpreting quantum compilation as a graph transformation system (GTS). GTSs endow quantum compilation with well-defined semantics and strong theoretical foundations Lack, 2005. 2005. Adhesive and quasiadhesive categories. RAIRO - Theoretical Informatics and Applications 39, 3 (July 2005, 511--545). doi: 10.1051/ITA:2005028. They establish a practical, purely declarative framework in which compiler transformations can be defined and studied.
This allows us to decouple the semantics of quantum programs and the architecture specifics from the compiler infrastructure itself. We can thus focus on designing and building scalable and efficient graph transformation algorithms that can then be applied to a wide range of compilation problems and hardware targets.
In this chapter, we formalise quantum computation and optimisation based on graphs and graph transformations, providing the foundation for all considerations in later chapters. Albeit slightly simplified, the intermediate representation (IR) we propose here is based on joint work Mark K., 2025. 2025. HUGR: A Quantum-Classical Intermediate Representation. Retrieved (talk recording) from https://www.youtube.com/live/D8esZrt7ogk?feature=shared&t=5217, as well as ongoing development.
The words graph rewrite and graph transformation are often used interchangeably in the literature. In the context of this thesis, we will take these words to distinguish two slightly different problems:
The study of equivalences and other relations between graphs under well-defined semantics is the subject of graph transformations. For instance:
- a graph transformation rule (Definition .) expresses that an instance of one pattern graph can always be transformed into an instance of another, reflecting the semantics of the system that the graph is modelling.
- a minIR equivalence class (Definition .) is an instance of a graph transformation system (GTS), which uses known semantic relations, expressed, for example, as graph transformation rules, to define how graphs can be transformed.
Graph rewriting, on the other hand, encapsulates the algorithmic procedures and data structures that mutate graphs. A rewrite (Definition 3.9) is the tuple of data required to turn a graph into a new graph.
Given matches of patterns on a graph, a graph transformation system can consider the set of graph transformation rules that define the semantics of that graph to produce a set of rewrites that can be applied to the graph to mutate it.
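The distinction between rules, matches and rewrites can be illustrated with a small sketch. It makes several assumptions: the `Graph` and `Rewrite` classes and their fields are hypothetical and do not reproduce the formal content of Definition 3.9, and the only rule considered is the cancellation of two consecutive Hadamard nodes on a wire.

```python
from dataclasses import dataclass


@dataclass
class Graph:
    labels: dict  # node id -> operation label
    edges: set    # set of (source id, target id) pairs


@dataclass(frozen=True)
class Rewrite:
    """Plain data describing one mutation (hypothetical fields, for illustration)."""
    remove_nodes: frozenset
    add_edges: frozenset


def hh_cancel_rewrites(g):
    """Rule 'H followed by H is the identity': turn every match into a Rewrite."""
    rewrites = []
    for a, b in sorted(g.edges):
        if g.labels[a] == "H" and g.labels[b] == "H":
            preds = [s for (s, t) in g.edges if t == a]
            succs = [t for (s, t) in g.edges if s == b]
            if len(preds) == 1 and len(succs) == 1:
                # Remove both H nodes and reconnect their neighbours directly.
                rewrites.append(
                    Rewrite(frozenset({a, b}), frozenset({(preds[0], succs[0])}))
                )
    return rewrites


def apply_rewrite(g, rw):
    """Graph rewriting proper: mutate g using only the data stored in rw."""
    for n in rw.remove_nodes:
        del g.labels[n]
    g.edges = {
        (s, t)
        for (s, t) in g.edges
        if s not in rw.remove_nodes and t not in rw.remove_nodes
    }
    g.edges |= rw.add_edges


g = Graph(
    labels={"in": "input", "h1": "H", "h2": "H", "out": "output"},
    edges={("in", "h1"), ("h1", "h2"), ("h2", "out")},
)
rewrites = hh_cancel_rewrites(g)
apply_rewrite(g, rewrites[0])
```

In this sketch, `hh_cancel_rewrites` plays the role of the transformation system (it knows the semantics that justify the change), whereas `apply_rewrite` knows nothing about Hadamard gates and only executes the mutation it is handed.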
Our contributions in the subsequent chapters are mostly preoccupied with problems of graph rewriting, i.e. the definition and application of the data required to mutate graphs, as opposed to graph transformations. This chapter nonetheless considers both, using the mature graph transformation framework as a foundation to define IR rewriting semantics.
Section 3.1 starts with a review of previous related work at the intersection of graph transformation software and quantum program optimisation. We then discuss in section 3.2 a fundamental difference between classical computation graphs and the requirements of quantum computation. This motivates a new graph-based IR tailored to quantum computation that we present in section 3.3, along with formal graph rewriting semantics based on sesqui-pushout (SqPO) transformations (section 3.4). Whilst the SqPO transformation definition is constructive, its existence is not guaranteed. We conclude the chapter in section 3.5 by discussing a more restricted “operational” notion of graph rewriting that will be useful for the rest of the thesis.
3.1. Related work
Graph rewriting on computation graphs. Optimisation of computation graphs is a long-standing problem in computer science that is seeing renewed interest in the compiler Lattner, 2021. 2021. MLIR: Scaling Compiler Infrastructure for Domain Specific Computation. In 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), February 2021. IEEE, 2--14. doi: 10.1109/CGO51591.2021.9370308, machine learning (ML) Jia, 2019. 2019. TASO: optimizing deep learning computation with automatic generation of graph substitutions. In Proceedings of the 27th ACM Symposium on Operating Systems Principles, October 2019. ACM, 47--62. doi: 10.1145/3341301.3359630 Fang, 2020. 2020. Optimizing DNN computation graph using graph substitutions. Proceedings of the VLDB Endowment 13, 12 (August 2020, 2734--2746). doi: 10.14778/3407790.3407857 and quantum computing communities Xu, 2022. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433 Xu, 2023. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (June 2023, 835--859). doi: 10.1145/3591254. In all these domains, graphs encode computations that are either expensive to execute or evaluated repeatedly over many iterations, making the optimisation of the execution cost of the computation a primary concern.
Domain-specific heuristics are the most common approach in compiler optimisations Paszke, 2019. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Neural Information Processing Systems. doi: 10.5555/3454287.3455008 Sivara., 2020. 2020. t|ket⟩: a retargetable compiler for NISQ devices. Quantum Science and Technology 6, 1 (November 2020, 014003). doi: 10.1088/2058-9565/ab8e92 – a more flexible alternative is to use optimisation engines based on declarative sets of graph transformations Bonchi, 2022. 2022. String Diagram Rewrite Theory I: Rewriting with Frobenius Structure. Journal of the ACM 69, 2 (March 2022, 1 - 58). doi: 10.1145/3502719 Bonchi, 2022. 2022. String diagram rewrite theory II: Rewriting with symmetric monoidal structure. Mathematical Structures in Computer Science 32, 4 (April 2022, 511--541). doi: 10.1017/s0960129522000317. In such systems, a graph transformation system (GTS) is used to find a sequence of allowed transformations that rewrite a computation graph given as input into a computation graph with minimal cost.
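As a toy illustration of such a cost-directed search, the sketch below explores every circuit reachable from an input through a declarative rule set and keeps the cheapest one. The setting is deliberately simplified and all names are hypothetical: circuits are tuples of single-qubit gate names on one wire rather than graphs, the cost is the gate count, and the exhaustive breadth-first search is only feasible because no rule increases the circuit length, which keeps the reachable set finite.

```python
from collections import deque

# Declarative rule set: (left-hand side, right-hand side) pairs of gate sequences.
RULES = [
    (("H", "H"), ()),          # H is self-inverse
    (("X", "X"), ()),          # X is self-inverse
    (("Z", "Z"), ()),          # Z is self-inverse
    (("H", "Z"), ("X", "H")),  # commute H past Z (from HZH = X)
    (("H", "X"), ("Z", "H")),  # commute H past X (from HXH = Z)
]


def neighbours(circuit):
    """Yield every circuit obtained by applying one rule at one position."""
    for lhs, rhs in RULES:
        k = len(lhs)
        for i in range(len(circuit) - k + 1):
            if circuit[i:i + k] == lhs:
                yield circuit[:i] + rhs + circuit[i + k:]


def optimise(circuit, cost=len):
    """Breadth-first exploration of the rewrite space, returning a cheapest circuit."""
    best = circuit
    seen = {circuit}
    queue = deque([circuit])
    while queue:
        current = queue.popleft()
        if cost(current) < cost(best):
            best = current
        for nxt in neighbours(current):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return best


result = optimise(("H", "Z", "H", "X"))
```

Here the length-preserving commutation rules are what expose the cancellations: no self-inverse pair is adjacent in the input, yet the search reduces it to the empty circuit. Practical engines cannot enumerate the search space in this way and must rely on heuristics to decide which rewrites to explore.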
Transformation systems were first studied on strings Dersho., 1990. 1990. Rewrite Systems, then generalised to trees and terms Bezem, 2003. 2003. Term Rewriting Systems (1. publ. ed.). Cambridge University Press, Cambridge, before being applied to graph domains Ehrig, 1973. 1973. Graph-Grammars: An Algebraic Approach. In 14th Annual Symposium on Switching and Automata Theory, Iowa City, Iowa, USA, October 15-17, 1973. IEEE Computer Society, 167--180. doi: 10.1109/SWAT.1973.11 Rozenb., 1997. 1997. Handbook of Graph Grammars and Computing by Graph Transformations, Volume 1: Foundations. World Scientific König, 2018. 2018. A Tutorial on Graph Transformation. In Graph Transformation, Specifications, and Nets - In Memory of Hartmut Ehrig. Springer, 83--104. doi: 10.1007/978-3-319-75396-6_5. Their use in quantum computing is part of a long tradition of diagrammatic reasoning in physics Penrose, 1964. 1964. Conformal treatment of infinity. General Relativity and Gravitation 43, 3 (901--922 (reprint)). doi: 10.1007/s10714-010-1110-5 Feynman, 1949. 1949. Space-Time Approach to Quantum Electrodynamics. Physical Review 76, 6 (September 1949, 769--789). doi: 10.1103/physrev.76.769, and particularly in quantum mechanics with the advent of categorical quantum mechanics Abrams., 2008. 2008. Categorical quantum mechanics. arXiv: 0808.1023 [quant-ph] Coecke, 2012. 2012. Strong Complementarity and Non-locality in Categorical Quantum Mechanics. In 2012 27th Annual IEEE Symposium on Logic in Computer Science, June 2012. IEEE, 245--254. doi: 10.1109/lics.2012.35 Coecke, 2017. 2017. Picturing Quantum Processes: A First Course in Quantum Theory and Diagrammatic Reasoning. Cambridge University Press. doi: 10.1017/9781316219317.
GTS in quantum computing. In quantum computing, the ZX calculus Coecke, 2008. 2008. Interacting Quantum Observables and other diagrammatic theories that derive from it are particularly important. Properties of GTSs such as completeness, confluence and termination Verma, 1995. 1995. Transformations and confluence for rewrite systems. Theoretical Computer Science 152, 2 (December 1995, 269--283). doi: 10.1016/0304-3975(94)00255-0 are well-studied within this field Backens, 2014. 2014. The ZX-calculus is complete for stabilizer quantum mechanics. New Journal of Physics 16, 9 (September 2014, 093021). doi: 10.1088/1367-2630/16/9/093021 Backens, 2019. 2019. ZH: A Complete Graphical Calculus for Quantum Computations Involving Classical Non-linearity. Electronic Proceedings in Theoretical Computer Science 287 (January 2019, 23--42). doi: 10.4204/EPTCS.287.2 Biamon., 2023. 2023. The ZX-Calculus is Canonical in the Heisenberg Picture for Stabilizer Quantum Mechanics. arXiv: 2301.05717 [quant-ph]. These results have formed the basis for software implementations of circuit optimisations with soundness and performance guarantees Duncan, 2020. 2020. Graph-theoretic Simplification of Quantum Circuits with the ZX-calculus. Quantum 4 (June 2020, 279). doi: 10.22331/q-2020-06-04-279 Kissin., 2020. 2020. PyZX: Large Scale Automated Diagrammatic Reasoning. In Proceedings 16th International Conference on Quantum Physics and Logic, Chapman University, Orange, CA, USA., 10-14 June 2019. Open Publishing Association, 229-241. doi: 10.4204/EPTCS.318.14 Sivara., 2020. 2020. t|ket⟩: a retargetable compiler for NISQ devices. Quantum Science and Technology 6, 1 (November 2020, 014003). doi: 10.1088/2058-9565/ab8e92 Borgna, 2023. 2023. Towards a compiler toolchain for quantum programs. PhD Thesis. Loria, Université de Lorraine.
Great strides are also being made in our theoretical understanding of transformation systems for quantum circuits. Recently, Clément et al. established completeness for the first time Cléme., 2023. 2023. A Complete Equational Theory for Quantum Circuits. In 2023 38th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), June 2023. IEEE, 1--13. doi: 10.1109/lics56636.2023.10175801 as well as minimality Cléme., 2024. 2024. Minimal Equational Theories for Quantum Circuits. In Proceedings of the 39th Annual ACM/IEEE Symposium on Logic in Computer Science, July 2024. ACM, 1--14. doi: 10.1145/3661814.3662088 of a GTS for quantum circuits. A set of circuit transformation rules was presented such that no rule is redundant, and for any two equivalent quantum circuits, there exists a sequence of local transformations rewriting one into the other. Such systems are, however, not confluent, and this is unlikely to change: most circuit optimisation problems are known to be computationally hard Weteri., 2024. 2024. Optimising quantum circuits is generally hard. arXiv: 2310.05958 [quant-ph].
There is also another inherent tension in integrating diagrammatic calculi into compilers. Diagrammatic theories arise from abstract primitives that admit a simple rewriting logic Heurtel, 2024. 2024. A Complete Graphical Language for Linear Optical Circuits with Finite-Photon-Number Sources and Detectors. arXiv: 2402.17693 [quant-ph] Booth, 2024. 2024. Graphical Symplectic Algebra. arXiv: 2401.07914 [cs.LO] Felice, 2023. 2023. Light-Matter Interaction in the ZXW Calculus. Electronic Proceedings in Theoretical Computer Science 384 (August 2023, 20--46). doi: 10.4204/EPTCS.384.2 Carette, 2023. 2023. Complete Graphical Language for Hermiticity-Preserving Superoperators. In 2023 38th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), June 2023. IEEE, 1--22. doi: 10.1109/LICS56636.2023.10175712; compilers meanwhile must capture all the expressivity, constraints and messiness of real-world hardware targets, with all the edge cases and exceptions that this entails.
An example of this is the ZX circuit extraction problem Quanz, 2024. 2024. Parallel Quantum Circuit Extraction from MBQC-Patterns. In 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 1078-1087. doi: 10.1109/IPDPSW63119.2024.00179 Backens, 2021. 2021. There and back again: A circuit extraction tale. Quantum 5 (March 2021, 421). doi: 10.22331/q-2021-03-25-421: it is in general hard to recover an executable quantum circuit from a ZX diagram as the latter is strictly more general and primitives cannot be mapped one-to-one. Similarly, while simple quantum-classical hybrid computations can be expressed using extensions of ZX Borgna, 2021. 2021. Hybrid Quantum-Classical Circuit Simplification with the ZX-Calculus. In Programming Languages and Systems, Cham. Springer International Publishing, 121--139. doi: 10.1007/978-3-030-89051-3_8 Carette, 2021. 2021. Completeness of Graphical Languages for Mixed State Quantum Mechanics. ACM Transactions on Quantum Computing 2, 4 (December 2021, 1--28). doi: 10.1145/3464693 Koziel., 2024. 2024. Hybrid Quantum-Classical Machine Learning with String Diagrams. arXiv: 2407.03673 [quant-ph], it will never be possible to capture the full breadth and generality of classical CPU instruction sets in a practical and extensible (and algebraically satisfying) way.
Peephole optimisations. As an alternative to the very principled approach of elegant calculi, graph transformations can also be used in the absence of theoretical guarantees in a more ad hoc fashion. Indeed, many existing (classical and quantum) compiler optimisations can already be understood as graph transformations. For as long as compilation has existed, compilers have relied on local transformations of the IR, typically referred to as peephole optimisations McKeem., 1965. 1965. Peephole optimization. Communications of the ACM 8, 7 (July 1965, 443--444). doi: 10.1145/364995.365000 Tanenb., 1982. 1982. Using Peephole Optimization on Intermediate Code. ACM Transactions on Programming Languages and Systems 4, 1 (January 1982, 21--36). doi: 10.1145/357153.357155. Such optimisation strategies are based on the heuristic that local optimisations to the program will produce a well-optimised result overall. Mature compiler ecosystems have developed tools for declarative definitions, as well as automatic generation and correctness proving of peephole optimisations Menend., 2017. 2017. Alive-Infer: data-driven precondition inference for peephole optimizations in LLVM. ACM SIGPLAN Notices 52, 6 (June 2017, 49--63). doi: 10.1145/3140587.3062372 Lopes, 2015. 2015. Provably correct peephole optimizations with alive. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2015. ACM, 22--32. doi: 10.1145/2737924.2737965 Riddle, 2021. 2021. PDLL: a new declarative rewrite frontend for MLIR. (November 2021). Retrieved on 13/01/2025 (RFC on Discourse) from https://discourse.llvm.org/t/rfc-pdll-a-new-declarative-rewrite-frontend-for-mlir/4798. We refer to the classical compiler literature, e.g. Muchni., 2007. 2007. Advanced compiler design and implementation ([Nachdr.] ed.). Morgan Kaufmann, San Francisco, Calif. [u.a.], for more details on the various types of common peephole optimisations.
Quantum compilers adopted peephole-style optimisations from the beginning Cheung, 2007. 2007. Translation techniques between quantum circuit architectures. In Workshop on quantum information processing, 1--3 Steiger, 2018. 2018. ProjectQ: an open source software framework for quantum computing. Quantum 2 (January 2018, 49). doi: 10.22331/q-2018-01-31-49 Sivara., 2020. 2020. t|ket⟩: a retargetable compiler for NISQ devices. Quantum Science and Technology 6, 1 (November 2020, 014003). doi: 10.1088/2058-9565/ab8e92. They encompass some of the most common optimisations in quantum computing, including the Euler angle reduction Chatzi., 2009. 2009. Improving quantum gate fidelities using optimized Euler angles. Physical Review A 80, 5 (November 2009, 052329). doi: 10.1103/physreva.80.052329, the two-qubit KAK decomposition Tucci, 2005. 2005. An Introduction to Cartan's KAK Decomposition for QC Programmers. arXiv: quant-ph/0507171 [quant-ph] Cross, 2019. 2019. Validating quantum computers using randomized model circuits. Physical Review A 100, 3 (September 2019, 032328). doi: 10.1103/physreva.100.032328 and all gate set rebases contri., 2025. 2025. Documentation: pytket.passes.AutoRebase. Retrieved on 13/01/2025 (TKET docs) from https://docs.quantinuum.com/tket/api-docs/passes.html#pytket.passes.AutoRebase. A quantum-specific flavour of peephole optimisation with close links to GTSs, template matching Maslov, 2008. 2008. Quantum Circuit Simplification and Level Compaction. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 27, 3 (March 2008, 436--444). doi: 10.1109/tcad.2007.911334 Iten, 2022. 2022. Exact and Practical Pattern Matching for Quantum Circuit Optimization. ACM Transactions on Quantum Computing 3, 1 (January 2022, 1--41). doi: 10.1145/3498325, achieved state-of-the-art results for Clifford circuit optimisation Bravyi, 2021. 2021. Clifford Circuit Optimization with Templates and Symbolic Pauli Gates. Quantum 5 (November 2021, 580). doi: 10.22331/q-2021-11-16-580. Recently, quantum peephole optimisations were also proposed that leverage provable state information to perform contextual optimisations Liu, 2021. 2021. Relaxed Peephole Optimization: A Novel Compiler Optimization for Quantum Circuits. In 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), February 2021. IEEE, 301--314. doi: 10.1109/cgo51591.2021.9370310, similar to strength reduction and optimisation with preconditions in classical compilation Lopes, 2015. 2015. Provably correct peephole optimizations with alive. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2015. ACM, 22--32. doi: 10.1145/2737924.2737965.
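A minimal sketch of a quantum peephole pass is given below. It is not taken from any of the cited compilers: the instruction format `(name, qubits, angle)` and the function `peephole` are assumptions made for illustration. The pass only inspects the instruction at the top of the output stack and the next incoming instruction, cancelling adjacent self-inverse gates and merging adjacent Rz rotations on the same qubit.

```python
SELF_INVERSE = {"H", "X", "CX"}


def peephole(ops):
    """Single left-to-right pass with a two-instruction window.

    ops is a list of (name, qubits, angle) tuples; angle is None for
    non-parametrised gates. Using the output list as a stack lets a
    cancellation expose the next one (e.g. H X X H collapses fully).
    """
    out = []
    for op in ops:
        name, qubits, angle = op
        if out and out[-1][0] == name and out[-1][1] == qubits:
            prev = out.pop()
            if name in SELF_INVERSE:
                continue  # the pair multiplies to the identity
            if name == "Rz":
                total = prev[2] + angle
                if abs(total) > 1e-12:  # drop rotations by a zero angle
                    out.append(("Rz", qubits, total))
                continue
            out.append(prev)  # no rule applies: restore the previous op
        out.append(op)
    return out


circuit = [
    ("H", (0,), None),
    ("Rz", (1,), 0.5),
    ("Rz", (1,), 0.25),
    ("CX", (0, 1), None),
    ("CX", (0, 1), None),
    ("Rz", (1,), -0.75),
    ("H", (0,), None),
]
optimised = peephole(circuit)
```

The window is purely positional: two H gates on qubit 0 separated by an unrelated gate on qubit 1 are left untouched, even though they commute past it. Capturing such cases requires matching on the dataflow structure of the circuit rather than on the order of the instruction list.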
Intermediate representations. The graph formalisation of quantum computations we will define in this chapter also draws a lot from the intermediate representations (IRs) of programs in classical compilers. The classical compilation community has found significant advantages in sharing a common standardised IR format. Indeed, while the exact syntax constructs and abstractions vary across programming languages, and, at the other end of the compiler stack, the specific assembly instructions emitted differ between hardware targets, much of the compiler middleware can be broadly shared across use cases. This gave rise to the LLVM Lattner, 2004. 2004. LLVM: A compilation framework for lifelong program analysis & transformation. In International Symposium on Code Generation and Optimization, 2004. CGO 2004.. IEEE, 75--86. doi: 10.1109/CGO.2004.1281665 and, more recently, the MLIR Lattner, 2021. 2021. MLIR: Scaling Compiler Infrastructure for Domain Specific Computation. In 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), February 2021. IEEE, 2--14. doi: 10.1109/CGO51591.2021.9370308 projects, which provide common compiler IRs, along with all the infrastructure compilers typically require: IR transformation tooling, translation into hardware-specific assembly, efficient serialisations, in-memory formats, etc.
The idea of adopting LLVM for quantum was championed by QIR QIR Al., 2021. 2021. QIR Specification v0.1. Retrieved on 31/12/24 from https://www.qir-alliance.org/, a standard introducing quantum primitives into the LLVM IR. This was subsequently adopted by many quantum hardware providers for its superior expressive power compared to circuit-based formats QIR Al., 2023. 2023. NVIDIA Joins the QIR Alliance as the Effort Enters Year Two. Retrieved on 02/01/2025 from https://www.qir-alliance.org/posts/year_one_in_review/. Building on top of QIR, an IR specifically for quantum-classical programs was proposed in Mark K., 2025. 2025. HUGR: A Quantum-Classical Intermediate Representation. Retrieved (talk recording) from https://www.youtube.com/live/D8esZrt7ogk?feature=shared&t=5217, with additional soundness guarantees based, among others, on the no-cloning principle of quantum information. In parallel, projects with similar aims have also emerged McCask., 2021. 2021. A MLIR Dialect for Quantum Assembly Languages. In 2021 IEEE International Conference on Quantum Computing and Engineering (QCE), October 2021. IEEE. doi: 10.1109/QCE52317.2021.00043 Ittah, 2022. 2022. QIRO: A Static Single Assignment-based Quantum Program Representation for Optimization. ACM Transactions on Quantum Computing 3, 3 (June 2022, 1--32). doi: 10.1145/3491247 that use the full MLIR and LLVM toolchain.
Challenges of GTS for compilation. Peephole optimisations of compiler IRs have proven to be a fast, general and scalable approach to compilation and code optimisation in practice. However, the optimisation results depend heavily on well-designed transformation orderings and the performance may vary widely across (equivalent) input programs. This is commonly known in compiler research as the phase ordering problem Click, 1995. 1995. Combining analyses, combining optimizations. ACM Transactions on Programming Languages and Systems 17, 2 (March 1995, 181--196). doi: 10.1145/201059.201061. When a compiler can modify code in multiple ways, it must determine which transformations to apply and in what sequence to achieve optimal results Whitfi., 1997. 1997. An approach for exploring code improving transformations. ACM Transactions on Programming Languages and Systems 19, 6 (November 1997, 1053--1084). doi: 10.1145/267959.267960 Liang, 2023. 2023. Learning Compiler Pass Orders using Coreset and Normalized Value Prediction. In Proceedings of the 40th International Conference on Machine Learning. JMLR.org. doi: 10.48550/ARXIV.2301.05104. This is a common design challenge in GTSs, often addressed through mechanisms such as rule controls Heckel, 2020. 2020. Graph Transformation for Software Engineers: With Applications to Model-Based Development and Domain-Specific Language Engineering. Springer International Publishing. doi: 10.1007/978-3-030-43916-3.
This issue is also a key challenge within quantum compilation, as can be verified by comparing the performance of peephole-based compilers with provably optimal circuit synthesis strategies. On problem sizes where exhaustive search is feasible, unitary synthesis tools can sometimes outperform current, mostly peephole-based compilers Sivara., 2020. 2020. t|ket⟩: a retargetable compiler for NISQ devices. Quantum Science and Technology 6, 1 (November 2020, 014003). doi: 10.1088/2058-9565/ab8e92 by up to 50%1 Wu, 2020. 2020. QGo: Scalable Quantum Circuit Optimization Using Automated Synthesis. arXiv: 2012.09835 [quant-ph].
1. At the cost of many hours of compute, of course.
3.2. Computation graphs and linearity
Computation graphs represent the flow of data between operations in a program, with nodes as operations and edges as data dependencies. Widely used in machine learning frameworks and GPU optimisations Bergst., 2011. 2011. Theano: Deep learning on gpus with python. In NIPS 2011, BigLearning Workshop, Granada, Spain Zhao, 2023. 2023. AutoGraph: Optimizing DNN Computation Graph for Parallel GPU Kernel Execution. Proceedings of the AAAI Conference on Artificial Intelligence 37, 9 (June 2023, 11354--11362). doi: 10.1609/aaai.v37i9.26343, they are conceptually equivalent to dataflow graphs used in compiler design, which were pioneered by Feo, 1990. 1990. A report on the sisal language project. Journal of Parallel and Distributed Computing 10, 4 (December 1990, 349--366). doi: 10.1016/0743-7315(90)90035-n and Kahn, 1976. 1976. Coroutines and networks of parallel processes. PhD Thesis. IRIA and are now central to most compiler IRs.
In classical computations, these graph representations of computations are essentially term graphs Barend., 1987. 1987. Term Graph Rewriting – sets of algebraic expressions that are stored as trees, combined with an important optimisation known as term sharing. When identical subexpressions appear multiple times, they can be represented as one computation and referenced from multiple locations, creating a directed acyclic graph rather than a term tree Plump, 1999. 1999. Term Graph Rewriting. (October 1999). This sharing enables a more efficient representation. It can also be used as a compiler optimisation to identify subexpressions that can be cached and shared across expression evaluations for a more efficient execution – a technique known as common subexpression elimination (CSE) Cocke, 1970. 1970. Global common subexpression elimination. In Proceedings of a symposium on Compiler optimization -. ACM Press, 20--24. doi: 10.1145/800028.808480.
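As an illustration of term sharing, the following is a minimal sketch of how a term tree can be folded into a directed acyclic graph by reusing structurally identical subterms (sometimes called hash-consing). The names `Node`, `TermGraph` and `tree_size` are hypothetical and chosen for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A term: an operator applied to already-built subterms (leaves have no args)."""
    op: str
    args: tuple = ()


class TermGraph:
    """Builds terms with sharing: structurally equal terms map to one node id."""

    def __init__(self):
        self.ids = {}    # Node (with child ids as args) -> node id
        self.nodes = []  # node id -> Node

    def add(self, op, *arg_ids):
        key = Node(op, tuple(arg_ids))
        if key not in self.ids:  # first occurrence: allocate a new node
            self.ids[key] = len(self.nodes)
            self.nodes.append(key)
        return self.ids[key]     # later occurrences reuse the same node


def tree_size(graph, node_id):
    """Number of nodes if the term were unfolded back into a tree."""
    return 1 + sum(tree_size(graph, a) for a in graph.nodes[node_id].args)


# (a + b) * (a + b): the subterm a + b appears twice in the tree.
g = TermGraph()
a, b = g.add("a"), g.add("b")
left = g.add("+", a, b)
right = g.add("+", a, b)  # shared with `left`
root = g.add("*", left, right)
```

Here `left` and `right` are the same node, so the graph stores four nodes where the unfolded tree has seven. Evaluating each node once and reusing the result is the idea behind common subexpression elimination.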
Each edge of a computation graph corresponds to a unique value: the output of a previous computation that is being passed on to new operations. These values flow along edges in the graph – hence dataflow graph. Values are immutable: they are defined once and then passed as input to further operations, where they can only be consumed, never modified. In compiler speak, programs expressed using such immutable values are often called static single assignment (SSA) programs Cytron, 1991. 1991. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems 13, 4 (October 1991, 451--490). doi: 10.1145/115372.115320 Rosen, 1988. 1988. Global Value Numbers and Redundant Computations. In Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages - POPL ’88. ACM Press, 12--27. doi: 10.1145/73560.73562. In SSA:
- Every value is defined exactly once,
- Every value may be used any number of times (including zero).
Quantum computing throws this second pillar of SSA into the bin. Values in quantum computations are the result of computations on quantum data, and as such must obey the no-cloning and no-deleting theorems (section 2.1). We call values subject to these restrictions linear1. They introduce the following constraint on valid computation graphs:
Every linear value must be used exactly once.
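The two SSA conditions and the linearity constraint can be checked by simple counting. The sketch below assumes a program given as a list of `(outputs, op, inputs)` triples and a set of value names marked as linear; this encoding and the function name `check_ssa_linearity` are assumptions made for illustration, not part of any existing IR.

```python
from collections import Counter


def check_ssa_linearity(statements, linear):
    """Return a list of violations for a list of (outputs, op, inputs) statements.

    `linear` is the set of value names that hold quantum (linear) data.
    Values that are never defined are treated as program inputs.
    """
    defs = Counter(v for outs, _, _ in statements for v in outs)
    uses = Counter(v for _, _, ins in statements for v in ins)
    errors = []
    for v, n in defs.items():
        if n != 1:  # SSA: every value is defined exactly once
            errors.append(f"{v} defined {n} times")
    for v in sorted(linear):
        if uses[v] != 1:  # linearity: exactly one use, no copy and no discard
            errors.append(f"{v} used {uses[v]} times")
    return errors


# A valid program: each qubit value is consumed exactly once.
ok = [(("q1",), "h", ("q0",)), (("b",), "measure", ("q1",))]

# An invalid program: q0 is consumed by both h and x, i.e. it is cloned.
bad = [
    (("q1",), "h", ("q0",)),
    (("q2",), "x", ("q0",)),
    (("b1",), "measure", ("q1",)),
    (("b2",), "measure", ("q2",)),
]
```

Calling the checker on `ok` with linear values `q0` and `q1` reports no violation, whereas `bad` is rejected because `q0` has two uses. A classical value in the same position would be accepted.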
Linear values change fundamentally how transformations of the computation graph must be specified. Where compilers on classical data can:
- freely share common subexpressions (term sharing),
- undo term sharing, i.e. duplicate shared terms into independent subterms, and
- delegate the identification and deletion of obsolete code to specialised passes (e.g., dead code elimination Cytron, 1991. 1991. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems 13, 4 (October 1991, 451--490). doi: 10.1145/115372.115320 Briggs, 1994. 1994. Effective partial redundancy elimination. ACM SIGPLAN Notices 29, 6 (June 1994, 159--170). doi: 10.1145/773473.178257),
quantum compilers must enforce much stricter invariants on IR transformations – or risk producing invalid programs.
In classical compilers, IR modification APIs (such as MLIR’s PatternRewriter) decouple program transformation from code deletion. Program transformations are specified by copying existing values and introducing new values and operations as needed, while the actual deletion of unused code is deferred to specialised dead code elimination passes. This approach is no longer feasible in the presence of linear values. Computation graphs for quantum computations must adopt proper graph rewriting semantics, in which the explicit deletion of obsolete values and operations is just as much a part of the rewriting data as the new code generation.
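To make the role of explicit deletion concrete, here is a small sketch of a rewrite on linear values: the cancellation of two consecutive `h` operations. The statement encoding (triples of output, operation name and input, restricted to single-qubit operations) and the function name `cancel_double_h` are assumptions for this example.

```python
def cancel_double_h(statements):
    """Rewrite `b := h(a); c := h(b)` to nothing, renaming the use of c to a.

    Statements are (output, op, input) triples over single-qubit operations.
    Both matched operations are deleted as part of the rewrite itself: if
    they were left for a later clean-up pass, `a` would have two uses.
    """
    def_of = {out: i for i, (out, _, _) in enumerate(statements)}
    for j, (c, op2, b) in enumerate(statements):
        i = def_of.get(b)
        if op2 == "h" and i is not None and statements[i][1] == "h":
            a = statements[i][2]
            kept = [s for k, s in enumerate(statements) if k not in (i, j)]
            # Rewire the unique use of c to consume a instead.
            return [(o, op, a if x == c else x) for o, op, x in kept]
    return statements


prog = [("q1", "h", "q0"), ("q2", "h", "q1"), ("q3", "x", "q2")]
rewritten = cancel_double_h(prog)
```

After the rewrite, only `q3 := x(q0)` remains. A classical-style rewrite that merely redirected the use of `q2` to `q0` and kept `q1 := h(q0)` around would leave `q0` with two uses, which is invalid for a linear value.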
1. The terminology comes from “linear” logic Girard, 1987. 1987. Linear logic. Theoretical Computer Science 50, 1 (1--101). doi: 10.1016/0304-3975(87)90045-4. I apologise for slamming additional semantics on what I recognise is an already very overloaded term.
3.3. A graph representation for quantum programs
For the purposes of this thesis, we introduce a simplified graph-based IR for quantum computations that we will call minIR. It captures all the expressiveness that we require for hybrid programs whilst remaining sufficiently abstract to be applicable to a variety of IRs and, by extension, quantum programming languages and compiler frameworks.
MinIR can be thought of as being built from statements of the form
x, y, ... := op(a, b, c, ...)
to be understood as an operation op applied on the SSA values a, b, c, ...
and producing the output values x, y, ....
Computation and dataflow graphs are commonly defined with operations as vertices
and values as edges. To faithfully capture the function signature of op, this
requires storing and preserving an ordering of the incoming and outgoing edges
(also known as port graphs Ferná., 2018. 2018. Labelled Port Graph – A Formal Structure for Models and Computations. Electronic Notes in Theoretical Computer Science 338 (October 2018, 3--21). doi: 10.1016/j.entcs.2018.10.002).
Instead, we adopt the hypergraph formalisation of computation graphs, which is more common within category theory (see string diagrams Seling., 2010. 2010. A Survey of Graphical Languages for Monoidal Categories and their formalisation in the hypergraph category Bonchi, 2022. 2022. String Diagram Rewrite Theory I: Rewriting with Frobenius Structure. Journal of the ACM 69, 2 (March 2022, 1 - 58). doi: 10.1145/3502719 Wilson, 2022. 2022. The Cost of Compositionality: A High-Performance Implementation of String Diagram Composition. Electronic Proceedings in Theoretical Computer Science 372 (November 2022, 262--275). doi: 10.4204/eptcs.372.19). This definition is particularly well-suited for our purposes because it frames the graph transformations of interest to us in the well-studied language of rewriting within adhesive categories Lack, 2004. 2004. Adhesive Categories. In Foundations of Software Science and Computation Structures, Berlin, Heidelberg. Springer Berlin Heidelberg, 273--288. doi: 10.1007/978-3-540-24727-2_20.
Hypergraphs and minIR #
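At a minimum, a directed hypergraph – for simplicity sometimes in the following referred to simply as graph – is defined by a set of vertices $\mathbf{V}$ and a set of (hyper)edges $\mathbf{E}$. We will always consider hypergraphs where the edges are directed and the vertices attached to $e \in \mathbf{E}$ are given by ordered lists. We formalise this incidence relation between vertices in $\mathbf{V}$ and edges in $\mathbf{E}$ by writing $\mathbf{E}$ as the partition over the disjoint sets
$$\mathbf{E} = \bigsqcup_{s, t \in \mathbb{N}} \mathbf{E}_{st}$$
and introducing source and target maps $\mathit{src}_{st,i}, \mathit{tgt}_{st,j}: \mathbf{E}_{st} \to \mathbf{V}$ for each $1 \leq i \leq s$ and $1 \leq j \leq t$. Why we write sets in boldface will become clear in a moment.
A directed hypergraph is given by sets $\mathbf{V}$ and $\mathbf{E}_{st}$ for $s, t \in \mathbb{N}$, along with maps
$$\mathit{src}_{st,i}: \mathbf{E}_{st} \to \mathbf{V}, \qquad \mathit{tgt}_{st,j}: \mathbf{E}_{st} \to \mathbf{V} \tag{1}$$
for all $s, t \in \mathbb{N}$, $1 \leq i \leq s$ and $1 \leq j \leq t$.
Note that in this thesis, as in most common uses of hypergraphs, the sets $\mathbf{V}$ and $\mathbf{E}$ will always be finite, and thus $\mathbf{E}_{st} \neq \emptyset$ for a finite number of $s, t$ only.
For simplicity, we can further omit the subscript $st$ of the source and target maps whenever it can be inferred from the domain of definition of the map. For $e \in \mathbf{E}_{st}$, we call $\mathit{src}_i(e)$ the source vertices of $e$ and $\mathit{tgt}_j(e)$ the target vertices of $e$.
We introduce the notation $v \leadsto w$ to signify that there is an edge from $v$ to $w$, i.e. there is $e \in \mathbf{E}_{st}$ for some $s, t$ and $i, j$ such that $\mathit{src}_i(e) = v$ and $\mathit{tgt}_j(e) = w$. We define the equivalence relation $\sim$ of connected vertices, given by the transitive, symmetric and reflexive closure of $\leadsto$. The equivalence classes of $\sim$ are the connected components of the graph. We will write $[v]$, resp. $[e]$ for the connected component that contains the vertex $v$, resp. the edge $e$.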
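The definition above translates directly into a small data structure. The following sketch stores, for each edge, its ordered lists of source and target vertices, and computes the connected components as the classes of the symmetric, transitive and reflexive closure of the "edge from source to target" relation. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Hypergraph:
    """Directed hypergraph: each edge has ordered source and target vertex lists."""
    vertices: set = field(default_factory=set)
    edges: dict = field(default_factory=dict)  # edge name -> (sources, targets)

    def add_edge(self, name, sources, targets):
        self.vertices.update(sources, targets)
        self.edges[name] = (tuple(sources), tuple(targets))

    def connected_components(self):
        """Vertex sets of the components, via union-find over source-target pairs."""
        parent = {v: v for v in self.vertices}

        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]  # path halving
                v = parent[v]
            return v

        for sources, targets in self.edges.values():
            for s in sources:
                for t in targets:  # s and t are related by this edge
                    parent[find(s)] = find(t)
        components = {}
        for v in self.vertices:
            components.setdefault(find(v), set()).add(v)
        return sorted(components.values(), key=sorted)


g = Hypergraph()
g.add_edge("f", ["A", "B"], ["C"])  # two sources, one target
g.add_edge("g", ["D"], ["E"])
components = g.connected_components()
```

In this example the graph has two components, one containing `A`, `B` and `C` and one containing `D` and `E`. Note that, following the definition literally, an edge with no targets (or no sources) does not connect the vertices attached to it.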
To proceed, it is useful to frame the hypergraph definition in a categorical setting. We write for the presheaf topos of the category , i.e. the category with functors as objects and natural transformations as morphisms. Definition 3.1 can be equivalently restated as:
hypergraphs are objects in the presheaf topos ,
where the category has objects and for and arrows given by (1), now interpreted as morphisms in rather than as functions in . In this framing, a graph is a functor that defines a set for each object of and specifies functions between these sets – one for each arrow in .
This is where the distinction between bold and non-bold typeface comes from: we use bold letters to refer to images in of a hypergraph functor, whereas the non-bold typeface refers to objects in the indexing category . The distinction between and is less important for morphisms – it will typically be clear from the context. We thus use the same symbols for both.
Linearity constraints #
The introduction of not only gives us a notion of hypergraph homomorphisms – maps between hypergraphs that preserve the structure of the graph. It also provides us with a way to express the linearity constraints that arise from our discussion in section 3.2, and which we must enforce on our computation graphs.
The definition that follows adds the coproduct explicitly as an object of the category (which we did not need to do in Definition 3.1), as we need it as the codomain of the new morphisms and . The adhesivity of the category no longer comes for free – we will get back to this in section 3.4.
The category is the category given by objects Its arrows are the incidence morphisms given in (1), along with
and for all . The morphisms are split monomorphisms and the following diagrams commute for all and :
Directed hypergraphs with linearity conditions are objects in the full subcategory given by objects such that is the coproduct in and are the injections into .
We probably owe an explanation for this definition – at least for the sake of the few computer scientists that are still following us.
First of all, notice that every hypergraph with linearity constraints corresponds to a hypergraph in the sense of Definition 3.1: there is an obvious functor that maps each object and morphism in to the object or morphism with the same name in . By contravariance, we can thus (functorially) map every hypergraph with linearity constraints to the hypergraph .
Another way of looking at this is to realise that by requiring that be split monomorphisms, we obtain that the resulting functions in are injective. Up to isomorphism, we can consider that are subsets of vertices in . A hypergraph with linearity constraints is thus a directed hypergraph with two selected subsets of vertices and .
Vertices within these subsets are special. For every , there exist unique indices , and edge such that . In words, for every there is a unique edge in the hypergraph that has as one of its sources. We then say that is the unique use of vertex . Similarly, vertices in have a unique edge in the hypergraph with as one of its targets – it is the unique definition of .
Typed graphs #
MinIR graphs are strongly typed. We introduce typed graphs for this purpose, a concept for which graph transformation was first formalised in Ehrig, 2004. 2004. Fundamental Theory for Typed Attributed Graph Transformation. In Graph Transformations, Berlin, Heidelberg. Springer Berlin Heidelberg, 161--177. doi: 10.1007/978-3-540-30203-2_13. A type system for directed hypergraphs is just another object . A typed graph is then an object of the slice category , that is to say, typed graphs are morphisms of and morphisms between typed graphs are given by the subset of morphisms of that make the triangle diagram formed by , and commute.
To type hypergraphs with linearity constraints, we do not pick the type system in , as the existence of and morphisms imposes restrictions that are too strict. We consider instead the category given by the same objects as , as well as the same morphisms, with the omission of and . There is an obvious functor and thus, by contravariance, every hypergraph with linearity constraints can be mapped to a hypergraph in . We say that a hypergraph with linearity constraints is -typed for a type system if there is a morphism , when interpreting is as an object of .
Two example hypergraphs. Vertices (labelled with capital letters) are circles and hyperedges (labelled with small letters) span between them. Vertices that are attached to an edge in the black half of the circle correspond to source vertices of the edge; those in the white half correspond to target vertices. The functions and map edges to the incident vertices, defining the directed hypergraph. On the left, there are further functions and that map vertices to the unique edge that uses or defines them. This defines a hypergraph with linearity constraints, with and . These cannot be defined on the graph on the right. On the other hand, the edges b and c are in , and are in the domain of functions and , thus defining a child region of a hierarchical graph. Note that it would be invalid to have any edge connecting or to or . The and vertices also have incidence morphisms, not displayed here.
Hierarchical hypergraphs #
A final bit of structure that minIR graphs require is a notion of hierarchy
between regions of the graph. This will be useful to define functions, control
flow blocks such as if-else, or any subroutine that can itself be viewed as an
operation in the computation.
Hierarchical hypergraphs were first proposed in Drewes, 2002. 2002. Hierarchical Graph Transformation. Journal of Computer and System Sciences 64, 2 (March 2002, 249--283). doi: 10.1006/jcss.2001.1790 and further generalised in Busatto, 2005. 2005. Abstract hierarchical graph transformation. Mathematical Structures in Computer Science 15, 4 (July 2005, 773--819). doi: 10.1017/s0960129505004846 Palacz, 2004. 2004. Algebraic hierarchical graph transformation. Journal of Computer and System Sciences 68, 3 (May 2004, 497--520). doi: 10.1016/s0022-0000(03)00064-3. However, we opt to use a more restrictive definition, closer to the notion of flattened hypergraphs of Drewes, 2002. 2002. Hierarchical Graph Transformation. Journal of Computer and System Sciences 64, 2 (March 2002, 249--283). doi: 10.1006/jcss.2001.1790. The reason for this is twofold. Firstly, hierarchical (hyper)graphs are typically defined recursively. It is not obvious under which conditions (if any) such definitions form adhesive categories, although progress in this direction was made in Padberg, 2017. 2017. Hierarchical Graph Transformation Revisited: Transformations of Coalgebraic Graphs. In Graph Transformation, Cham. Springer International Publishing, 20--35. doi: 10.1007/978-3-319-61470-0_2 with the introduction of coalgebraic graphs. As a result, to the extent that graph transformation results can be applied to such structures, this must be done carefully.
The second, more practical, reason is that the notion of typed graph introduced above cannot be directly lifted to the hierarchical graph setting: while some subset of the hierarchical relation in minIR should be enforced by the type system, the type graph of a nested graph should be identical to the parent’s (as opposed to being itself nested within the type graph of the parent).
It is therefore more convenient for us to encode hierarchy in directed hypergraphs as follows. Note that it is not clear that our definition is adhesive either1 – but at least it is framed as a subcategory of a base category that is.
The category is the category with objects and arrows of along with additional objects for and arrows:
- that are split monomorphisms,
- input arrows , and output arrows .
Hierarchical hypergraphs are the objects in the full subcategory given by objects such that
- for any edge of , the set has at most one element.
- the transitive and reflexive closure of for is a partial order on the connected components of .
Here and are the functions with domain defined piecewise as and for all on their respective (disjoint) domains of definition.
The same definition can also be applied to to obtain the category of hierarchical directed hypergraphs with linearity constraints . Similarly, we define the associated category for type systems; however, we do not impose either of the two conditions related to for the type category, i.e. is the full presheaf category, rather than a subcategory of it.
As with the incidence morphisms and , we will drop the subscript for the IO arrows and when it can be inferred from the domain of definition.
Just as in the discussion of Definition 3.2, we interpret the split monomorphisms as equivalent to requiring . Taking over terminology from Drewes, 2002. 2002. Hierarchical Graph Transformation. Journal of Computer and System Sciences 64, 2 (March 2002, 249--283). doi: 10.1006/jcss.2001.1790, we call elements the frames of . For each frame , there is thus a unique input edge and a unique output edge in that have respectively targets and zero sources, and sources and zero targets.
By the first condition we imposed on , the partial function mapping connected components to their parent edge:
is well-defined. We call the subgraphs of that share the same parent a region of 2. The subgraph of vertices and edges without parent is the root region of .
The minIR computation graph #
In minIR, the vertices are the values of the computation, while the hyperedges define the operations. This imposes some constraints for a hypergraph to be a valid minIR graph:
- all values in minIR must have a unique operation that defines them;
- values that are linear must also have a unique operation that uses them;
- the graph must be acyclic, meaning that no value can be defined in terms of itself.
This can be expressed as a hypergraph with linearity constraints by choosing and , where is the subset of linear values.
The following definition then comes as no surprise:
Let be a type system. A minIR graph typed in is an object of that is -typed and such that the adjacency relation is acyclic. We call , the linear values of .
In the context of minIR, relations encode the data flow of values across the computation. The lack of explicit operation ordering differentiates minIR (and HUGR) from most classical IRs, which, unless specified otherwise, typically assume that instructions may have side effects and thus cannot be reordered. All quantum operations (and the classical operations we are interested in) are side-effect free, which significantly simplifies our IR.
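Since operations carry no explicit ordering, an execution order has to be derived from the data flow, and such an order exists exactly when the graph is acyclic. The sketch below checks acyclicity with a standard topological sort (Kahn's algorithm). The encoding of operations as a dictionary from names to `(inputs, outputs)` pairs and the function name `schedule` are assumptions for this example.

```python
from collections import deque


def schedule(ops):
    """Return an execution order of operation names, or None if there is a cycle.

    `ops` maps an operation name to (input values, output values). An
    operation is ready once every input with a definition has been produced;
    values without a definition count as inputs of the graph.
    """
    producer = {v: name for name, (_, outs) in ops.items() for v in outs}
    pending = {name: sum(v in producer for v in ins) for name, (ins, _) in ops.items()}
    consumers = {}
    for name, (ins, _) in ops.items():
        for v in ins:
            consumers.setdefault(v, []).append(name)
    ready = deque(name for name, n in pending.items() if n == 0)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for v in ops[name][1]:
            for user in consumers.get(v, []):
                pending[user] -= 1
                if pending[user] == 0:
                    ready.append(user)
    return order if len(order) == len(ops) else None


acyclic = {
    "rz": (("q1", "theta"), ("q2",)),
    "h": (("q0",), ("q1",)),
    "add": (("a", "b"), ("theta",)),
}
cyclic = {"f": (("x",), ("y",)), "g": (("y",), ("x",))}
```

For `acyclic`, any returned order places `h` and `add` before `rz`, while `h` and `add` themselves may be swapped freely since neither depends on the other. For `cyclic`, no operation ever becomes ready and the function returns `None`.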
Input and output values #
Notice that in Definition 3.4, it is not enforced that every value has a definition, i.e. there might be ; nor that every value with a linear type is in , i.e. if is the typing morphism and are the linear types in the type system, there might be .
This would be easy to fix: we could on the one hand enforce the equality , thus guaranteeing that every value has a unique definition in the graph. On the other hand, we could define as the coproduct , where is a new object introduced to explicitly capture the set of non-linear values. Morphisms in this category would guarantee that the linearity of values is always preserved, and thus in particular the type morphism would map a value to a linear type if and only if it is linear.
Instead, we opt to allow undefined values and unused linear values, so that rewrite rules that match subgraphs of minIR graphs can be expressed within the same category.
For a minIR graph with typing morphism , we call the set the input values of and its output values, where are the linear values in .
If , we say that is IO-free.
Note that by this definition, an output value in always has a linear type! This is because non-linear values do not need to be treated specially when they are outputs: unlike linear values that must always be used in a non-output position, non-linear values may have no outgoing edge, in which case they are simply discarded in the computation.
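Read operationally, and assuming that input values are the values without a definition and output values are the linear values without a use, both sets can be computed in one pass over the graph. The encoding below (a dictionary of values with a linearity flag, and a dictionary of operations) and the name `io_values` are hypothetical.

```python
def io_values(values, ops):
    """Split off the input and output values of a graph.

    `values` maps each value name to (type name, is_linear); `ops` maps an
    operation name to (input values, output values).
    """
    defined = {v for _, outs in ops.values() for v in outs}
    used = {v for ins, _ in ops.values() for v in ins}
    inputs = {v for v in values if v not in defined}
    # Only linear values can be outputs: an unused non-linear value is discarded.
    outputs = {v for v, (_, is_linear) in values.items() if is_linear and v not in used}
    return inputs, outputs


# A pattern graph consisting of a single rz operation.
values = {
    "q0": ("Qubit", True),
    "q1": ("Qubit", True),
    "theta": ("Angle", False),
}
ops = {"rz": (("q0", "theta"), ("q1",))}
inputs, outputs = io_values(values, ops)
```

In this pattern, `q0` and `theta` are input values and `q1` is the only output value. A graph is IO-free when both returned sets are empty.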
Structured control flow #
The operations and values of a minIR graph define the data flow of a program. However, a program must also be able to control and change the data flow at run time in order to express loops, conditionals, function calls etc. This is the program control flow, which minIR expresses using regions and so-called structured control flow.
Using regions, any non-trivial control flow (function calls, conditionals, loops etc.) is captured by a frame, a “black box” operation within the data flow of the program. Its implementation is then defined in the nested region of the frame. This can be used for function calls, but also for branches of control flow. A simple function call, that unconditionally redirects the control flow to the operations within a nested region, could for example be represented as follows:
In this figure and below, circles are SSA values (the vertices of the
hypergraph), while the edges spanning between them are the operations. Edges
attached to the white half of circles are value definitions, while hyperedges
attached to the black half of circles are value uses. The call and +
operation can be read left to right: for instance, the two values x and y on
the left of call are inputs (the operation uses those values, and is thus
attached to the black half of the circles), whereas the x + y value on the
right is the output of the call operation (the operation defines this value,
which is thus attached to the white half of the circle).
Dashed arrows indicate hierarchical dependencies that map the frame edge to the two input and output edges in the child region; dashed rectangles mark the non-root regions of the graph.
Importantly, the frame edge representing the call operation must intuitively
“forward” all its input values to the in operation of the child region, and
similarly pass on the value at the out operation to the output value of the
call operation in the parent graph. Passing function arguments and retrieving
returned values in this fashion will be very familiar to any computer scientist.
Unlike in most programming languages, this is also how values are passed in
minIR to and from any control flow construct we would wish to model.
In terms of graph structure, this relation between values in parent and child
regions means that the arity and types of the inputs and outputs of call fix
the signatures of the child in and out operations.
Definition 3.3 already ensures that the input and
output arities of in and out are correct. The correct typing of these input
and output values will be ensured by the type system, which we discuss in a
separate section below.
To handle constructs that require more than one child region, such as an
if-else statement, we can use frames that have zero input and one output:
The output of the ifblock is intuitively a higher-order type representing an
operation that takes two inputs and outputs the sum.
An if-else statement might then look as follows:
The if and else blocks must expect the same input and output values. This is
key to respecting any linearity constraints that values passed to ifelse might
be subject to. By definition, all operations that use or define a value will
be in the same region – in other words, values are only available within their
defining region. This in effect implements “variable scoping”. With some
imagination, this construction can easily be adapted to model loops, complex
control flow graphs, or any other control flow structures.
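The forwarding of values between a frame and its child region can be illustrated with a toy interpreter over classical values. This is a simplified sketch under stated assumptions: regions are attached directly to the `call` and `ifelse` operations rather than being passed as higher-order values, statements are listed in dependency order, and all names (`Region`, `run`, `primitives`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A nested region: `params` are defined by its `in` operation and
    `results` are consumed by its `out` operation."""
    params: tuple
    body: list  # list of (outputs, op, inputs) statements
    results: tuple


def run(region, args, primitives):
    """Evaluate a region on classical arguments (a toy interpreter)."""
    assert len(args) == len(region.params), "frame and `in` arities must match"
    env = dict(zip(region.params, args))  # values are scoped to this region
    for outs, op, ins in region.body:
        vals = [env[v] for v in ins]
        if isinstance(op, tuple) and op[0] == "call":
            res = run(op[1], vals, primitives)
        elif isinstance(op, tuple) and op[0] == "ifelse":
            # First input is the condition; both branches share one signature.
            _, then_region, else_region = op
            res = run(then_region if vals[0] else else_region, vals[1:], primitives)
        else:
            res = primitives[op](*vals)
        env.update(zip(outs, res))
    return tuple(env[v] for v in region.results)


prims = {"+": lambda x, y: (x + y,), "-": lambda x, y: (x - y,)}
add = Region(("x", "y"), [(("s",), "+", ("x", "y"))], ("s",))
sub = Region(("x", "y"), [(("d",), "-", ("x", "y"))], ("d",))
main = Region(("c", "a", "b"), [(("r",), ("ifelse", add, sub), ("c", "a", "b"))], ("r",))
```

Each call to `run` creates a fresh environment, so values defined in a child region are not visible in the parent, which mirrors the scoping described above. Exactly one of the two branches consumes the forwarded inputs, which is what keeps linear inputs used once.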
Why not plain branch statements? #
There is a simpler – and at least as popular – way of expressing control flow in IRs without requiring regions and operation hierarchies, using branch statements3. For instance, LLVM IR provides a conditional branch statement
br i1 %cond, label %iftrue, label %iffalse
that will jump to the operations under the iftrue label if %cond is true and
to the iffalse label otherwise.
This is a simple and versatile approach to control flow that can be used to express any higher-level language constructs. Unfortunately, conditional branching does not mix well with linear values.
Linearity, as defined in Definition 3.4, is a simple
constraint to impose on minIR graphs. In the presence of conditional branching,
however, the constraint would have to be relaxed to allow for single use in
each mutually exclusive branch of control flow. For instance, the following two
uses of b should be allowed (in pseudo-IR):
b := h(a)
<if cond> c := x(b)
<else> d := h(b)
This is a much harder invariant to check on the IR: linearity would no longer be enforceable as a syntactic constraint on the minIR graph as in Definition 3.4, but would instead depend on the semantics of the operations to establish mutual exclusivity of control flow4. Forbidding arbitrary branching in minIR and resorting instead to structured control flow as described above to express control flow is just as expressive and gives the linearity constraint a much simpler form.
Type graph #
We have seen how minIR graphs impose some structure on the types of
computations that can be expressed: a linear value cannot be used by two (or
zero) operations, frames will always have a unique in and out operation in
their child region with correct arities, etc.
However, without a “good” type system and associated semantics, it is still
possible to express nonsensical programs: we have mentioned for instance earlier
that it is up to the type system to enforce that the types of the in and out
operations match the types of the frame. Similarly, it is possible to construct
programs that break linearity: take the ifelse operation discussed in the
previous section, but now replace its semantics to be do-in-parallel, i.e. it
will execute both the if-block and else-block in parallel on the inputs that
it is given. This would violate the linearity of its inputs, but would
nonetheless be a syntactically valid minIR graph!
To resolve this, we present here some typed operations, along with their semantics, that can be used to construct well-behaved type systems: programs typed in such a system model the kind of quantum programs that we are interested in expressing and are guaranteed to be valid computations. A categorisation of all valid constructions, or an exhaustive enumeration of the conditions that type systems must satisfy to guarantee the validity of programs, is beyond the scope of this thesis. In practice, it is often straightforward to combine and extend the elements presented here to support further custom syntactic constructs and types.
Basic types and operations #
The most elementary types in our computations are Bits and Qubits. The
former is typically known as a Boolean and represents the purely classical
values 0 and 1. The latter is the canonical quantum example of a linear
type. Indeed, just like values in minIR graphs, the type system in
distinguishes between linear and
non-linear types.
Other typical classical types such as integers, floats, strings, custom
algebraic data types (ADT) etc could also be introduced as required. In the
figure below, we for instance introduce the Angle type to represent rotation
angles that parametrise quantum gates. Further examples of linear types, on the
other hand, include higher-dimensional qudits, but also any ADT that contains a
linear type within it.
As we saw in section 2.1, the number of input qubits in pure
quantum operations will match the number of output qubits: the single-qubit h
(Hadamard gate) and two-qubit cx (controlled NOT) operations thus have one and
two Qubits respectively as both inputs and outputs. rz (Z rotation) is also a
single-qubit operation, but it takes an additional input of type Angle to
specify the rotation angle.
On the purely classical side, we are free to add any side-effect-free operations
on our types; in our example we model addition + on Angles and negation
not on Bits. In the type system, each type is represented by a single vertex.
In our example, we thus have three vertices, one each for Bit, Qubit and Angle.
We introduce a different colour for each type. Operations such as cx are
represented by a hyperedge with two sources on Qubit and two targets on
Qubit. As in the previous diagrams, we can distinguish operation inputs from
outputs by whether they are attached to the dark or light half of the type
vertex: the rz operation thus has one Qubit input, one Angle input and one
Qubit output.
As you can tell from the diagram, whilst Qubit is a linear type in the type
system, it is not a linear value in the sense of a minIR graph: the Qubit type
has multiple uses and definitions in the cx operation alone. This is the key
difference between type graphs and minIR graphs.
Qubit allocation and measurement #
We also introduce non-pure quantum operations qalloc and measure which
respectively “create” a qubit (no input, one Qubit output) and “destroy” it
(one Qubit input, one Bit output, whose value depends on whether the qubit was
projected onto the |0⟩ or |1⟩ state). Remember that the reason these
operations seem to “break” the laws of pure quantum physics is that they
result from interactions with the classical environment.
measure is fundamental, as it connects quantum values with classical ones!
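The operation types introduced so far can be written down as plain data. The following is a minimal sketch in Python, not part of minIR itself: the names `OpType`, `OP_TYPES` and `preserves_qubits` are hypothetical, and the order of the two rz inputs is an assumption.

```python
from dataclasses import dataclass

# Types of the running example; only Qubit is linear.
LINEAR_TYPES = {"Qubit"}


@dataclass(frozen=True)
class OpType:
    """Signature of an operation: ordered input and output types."""
    inputs: tuple
    outputs: tuple


# Operation types named in the text. The order of the rz inputs is an assumption.
OP_TYPES = {
    "h": OpType(("Qubit",), ("Qubit",)),
    "cx": OpType(("Qubit", "Qubit"), ("Qubit", "Qubit")),
    "rz": OpType(("Qubit", "Angle"), ("Qubit",)),
    "+": OpType(("Angle", "Angle"), ("Angle",)),
    "not": OpType(("Bit",), ("Bit",)),
    "qalloc": OpType((), ("Qubit",)),
    "measure": OpType(("Qubit",), ("Bit",)),
}


def linear_count(types):
    """Number of entries of a type string that are linear types."""
    return sum(1 for t in types if t in LINEAR_TYPES)


def preserves_qubits(op):
    """True if the op has as many linear inputs as linear outputs."""
    return linear_count(op.inputs) == linear_count(op.outputs)


for name, op in OP_TYPES.items():
    print(f"{name}: {op.inputs} -> {op.outputs}, "
          f"preserves qubits: {preserves_qubits(op)}")
```

In this encoding the pure operations h, cx and rz preserve the number of qubits, whereas qalloc and measure are exactly the operations that do not.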
Region definition and structured control flow #
Our type system is so far missing a crucial aspect of minIR: the hierarchical structures. For this we need frame types, i.e. frames in the type graph. We must introduce a distinct type for each possible type signature of a frame. To keep this as simple as possible, we will introduce exactly one type for each signature.
If we write T for the set of types in our type system (i.e. Bit, Qubit and
Angle in our example), then a type signature of an edge is given by a pair
(I, O) of ordered lists of types in T. For each such pair, we introduce
- the frame type regiondef<I, O>,
- the in and out types in<I,O> and out<I,O>,
- along with a new non-linear type Region<I, O>, the higher-order type representing a region with inputs I and outputs O.
The regiondef<I, O> op takes zero inputs and returns one output
Region<I, O>, whereas in<I,O> takes zero inputs and returns values of types
I, and out<I,O> takes inputs of types O and returns nothing. For instance,
for I = (Qubit, Qubit) and O = (Qubit, Bit), we have the
following type graph.
Note that there is an important distinction between the type system and
minIR graphs: there is no notion of regions in the type system.
The Qubit and Bit types in the above diagram would be in the child region of
regiondef<I, O> if it were a minIR graph, but in the
type system, they might also be used by other operations in other regions (such
as cx, rz, h etc. defined earlier).
Using the Region<I,O> types, it is then easy to define typed operations for
any structured control flow of interest, such as the if-else example above.
The following figure gives an overview of the entire type system of our example.
For display purposes, we have included multiple copies of each type vertex; we
remind the reader that in the actual type graph, all circles of the same type
(colour) are one and the same.
A complete minIR type graph, following the example in this section. Value vertices with the same label (and same colour) form a single vertex in the type graph. They have been split into multiple vertices in this representation for better readability. The data types and op types with the <I,O> suffix are parametrised on the type signature (I, O).
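The op types introduced for a given signature can be generated mechanically. The sketch below is illustrative only: `region_op_types` is a hypothetical helper, and the ifelse signature (a Bit condition, then the region inputs, then two Region values) is an assumption rather than a definition taken from the text.

```python
def fmt(types):
    """Render a type string such as (Qubit, Bit)."""
    return "(" + ", ".join(types) + ")"


def region_op_types(inputs, outputs):
    """Return the op signatures introduced for one type signature (I, O).

    Each entry maps an op type name to a pair (input types, output types).
    """
    i, o = tuple(inputs), tuple(outputs)
    params = f"<{fmt(i)}, {fmt(o)}>"
    region = "Region" + params
    return {
        "regiondef" + params: ((), (region,)),
        "in" + params: ((), i),
        "out" + params: (o, ()),
        # Assumed signature: condition bit, region inputs, two branch regions.
        "ifelse" + params: (("Bit",) + i + (region, region), o),
    }


for name, (ins, outs) in region_op_types(("Qubit", "Qubit"), ("Qubit", "Bit")).items():
    print(f"{name}: {fmt(ins)} -> {fmt(outs)}")
```

Because one set of op types is created per signature, a concrete type system only needs to instantiate the signatures that actually occur in the programs of interest.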
An example minIR program #
Taking a step back, let us make the introduced ideas more concrete through an example. We demonstrate how a simple program written in textual form can be translated and expressed as a minIR graph. All statements are of the form
x, y, ... := op(a, b, c, ...)
where a, b, c etc. are the SSA values passed to op (or used by op),
and x, y etc. are the SSA values returned by op (or defined by op). We
use curly brackets to define the child region of a regiondef operation. A valid
minIR program might then look as follows:
main := regiondef<(Qubit, Qubit), (Qubit, Bit)> {
    q0, q1 := in()

    q0_1 := h(q0)
    q0_2, q1_1 := cx(q0_1, q1)

    m0 := measure(q0_2)

    ifregion := regiondef<(Qubit,), (Qubit,)> {
        q1 := in()
        out(q1)
    }
    elseregion := regiondef<(Qubit,), (Qubit,)> {
        q1 := in()
        q1_1 := h(q1)
        out(q1_1)
    }
    q1_2 := ifelse(m0, q1_1, ifregion, elseregion)

    out(q1_2, m0)
}
Note that the in() and out(..) operations are only allowed within nested
regions (as required by the type system). We have omitted the type parameters on
these operations, as they mirror exactly the parameters of the enclosing regiondef.
This program corresponds to the two minIR graphs shown in the figures below. We use “wiggly hyperedges” that stretch between values, as in the first figure. They may look unusual if you are used to computation graphs. One can opt to draw the same graph with boxes for hyperedges and wires for values, yielding the second figure. The two representations are equivalent, but the rewriting semantics are most explicit when viewing values as vertices.
An example of an IO-free minIR graph. The vertex colours indicate their types in the type system presented in the previous figure. The main, ifregion and elseregion ops are all of op type regiondef (with type parameters omitted), labelled here with custom names for clarity. The type parameters of the ifelse, in and out op type have similarly been omitted. All other operation types are given as labels on the edges.
An equivalent representation of the computation above, now representing operations as boxes and values as wires. The arrow direction indicates the flow from value definition to value use(s). Dashed arrows have been changed to point to regions instead of individual operations.
Differences to the quantum circuit model #
We conclude this presentation of minIR by highlighting the differences between this IR-based representation and the quantum circuit model that most quantum computing and quantum information scientists are familiar with5.
When restricted to purely quantum operations and no nested regions, the string diagram representation of a minIR graph (i.e. operations as boxes and values as wires) looks very similar to a quantum circuit. There is, however, a fundamental shift under the hood from reference to value semantics – to borrow terminology from C++.
In the reference semantics of quantum circuits, operations are typically thought of as “placed” on a qubit (the “lines” in the circuit representation), for instance, by referring to a qubit index. This qubit reference exists for the entire computation duration, and the quantum data it refers to will change over time as operations are applied to that qubit.
In the value semantics of computation graphs and SSA, on the other hand, qubits only exist in the form of the data they encode. When applying an operation, the (quantum) data is consumed by the operation and new data is returned. Given that the input data no longer exists, linearity conditions are required to ensure that no other operation can be fed the same value.
To make the difference clear, compare the program representations of the following computation:
Quantum circuit (pytket)6
import pytket as tk
circ = tk.Circuit(2)
circ.H(0)
circ.CX(0, 1)
circ.X(1)
SSA (minIR)
q0_0, q1_0 := in()
q0_1 := h(q0_0)
q0_2, q1_1 := cx(q0_1, q1_0)
q1_2 := x(q1_1)
out(q0_2, q1_2)
In value semantics, it becomes much harder to track physical qubits across their lifespan. This has very practical implications: without the convenient naming scheme, it would, for example, be non-trivial to count how many qubits are required in the SSA representation of the computation above. However, it is a drastically simpler picture from the point of view of the compiler and the optimiser – hence its popularity in classical compilers. When operations are defined based on qubit references, the compiler must carefully track the ordering of these operations: operations on the same qubit must always be applied in order. Through multi-qubit gates, this also imposes a partial ordering on operations across different qubits that must be respected.
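To illustrate what recovering qubit references from SSA values involves, here is a minimal Python sketch. It assumes that statements are listed in a valid execution order, that all values are qubits, and that every op other than in returns its qubits in the same order as it receives them; none of these assumptions is imposed by minIR, and the function name `assign_wires` is hypothetical.

```python
# Each statement is (outputs, op, inputs), mirroring `x, y := op(a, b)`.
PROGRAM = [
    (("q0_0", "q1_0"), "in", ()),
    (("q0_1",), "h", ("q0_0",)),
    (("q0_2", "q1_1"), "cx", ("q0_1", "q1_0")),
    (("q1_2",), "x", ("q1_1",)),
    ((), "out", ("q0_2", "q1_2")),
]


def assign_wires(program):
    """Map every SSA qubit value to a wire index and count the wires.

    Assumption: the i-th output of an op reuses the wire of its i-th input.
    """
    wire_of = {}
    n_wires = 0
    for outputs, op, inputs in program:
        if op == "in":
            # Fresh qubits: allocate one new wire per value.
            for value in outputs:
                wire_of[value] = n_wires
                n_wires += 1
        else:
            # Thread each output onto the wire of the matching input.
            for value, source in zip(outputs, inputs):
                wire_of[value] = wire_of[source]
    return wire_of, n_wires


wire_of, n_wires = assign_wires(PROGRAM)
print(n_wires, wire_of)
```

Without the ordering assumption on op outputs, the wire assignment would need per-op knowledge of which output carries which input qubit, which is precisely the information that value semantics does not record.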
SSA values remove this dependency tracking altogether: the notion of physical qubit disappears, and the ordering of statements becomes irrelevant. All that matters is connecting each use of a value (i.e. an input to an operation) with its unique definition, the output of a previous operation. In other words, the global ordering imposed by reference semantics is replaced by a causal order on the diagram (Kissin., 2019. A categorical semantics for causal structure. Logical Methods in Computer Science Volume 15, Issue 3 (August 2019). doi: 10.23638/lmcs-15(3:15)2019).
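The irrelevance of statement order can be made concrete: from def-use information alone, a valid execution order can always be recomputed. The sketch below uses the standard library `graphlib` module (Python 3.9 or later); the statement encoding and the function name `schedule` are illustrative assumptions.

```python
from graphlib import TopologicalSorter

# The SSA program from the comparison above, with its statements shuffled.
SHUFFLED = [
    ((), "out", ("q0_2", "q1_2")),
    (("q1_2",), "x", ("q1_1",)),
    (("q0_2", "q1_1"), "cx", ("q0_1", "q1_0")),
    (("q0_0", "q1_0"), "in", ()),
    (("q0_1",), "h", ("q0_0",)),
]


def schedule(statements):
    """Return the statements in an order that respects def-use dependencies."""
    # Index of the statement that defines each value.
    defined_by = {}
    for idx, (outputs, _op, _inputs) in enumerate(statements):
        for value in outputs:
            defined_by[value] = idx
    # A statement depends on the definitions of all the values it uses.
    deps = {
        idx: {defined_by[v] for v in inputs}
        for idx, (_outputs, _op, inputs) in enumerate(statements)
    }
    order = TopologicalSorter(deps).static_order()
    return [statements[i] for i in order]


for outputs, op, inputs in schedule(SHUFFLED):
    print(", ".join(outputs), ":=", f"{op}({', '.join(inputs)})")
```

Any topological order of the def-use graph is a valid schedule, which is the causal order referred to above.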
All the concepts of minIR embed easily within the MLIR-based quantum IRs and the HUGR IR (Mark K., 2025. HUGR: A Quantum-Classical Intermediate Representation. Retrieved (talk recording) from https://www.youtube.com/live/D8esZrt7ogk?feature=shared&t=5217). In this sense, our toy IR serves as the lowest common denominator across IRs and compiler technologies, so that the proposals and contributions we are about to make can be applied regardless of the underlying technical details.
By waving goodbye to the circuit model, we have been able to integrate much of the theory of traditional compiler design, bringing us in the process much closer to traditional compiler research and the large-scale software infrastructure that is already available. This gives us access to all the classical optimisation and program transformation techniques developed over decades. Using structured control flow, we were also able to model linear resources such as qubits well – by using value semantics and SSA, checking that no-cloning is not violated is as simple as checking that each linear value is used exactly once.
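The linearity check mentioned above amounts to counting definitions and uses. A minimal sketch for a flat (region-free) program follows; the statement encoding and the function name `check_ssa` are assumptions made for illustration.

```python
from collections import Counter


def check_ssa(statements, linear_values):
    """Return a list of SSA violations (an empty list means the program is valid).

    Every value must be defined exactly once, every used value must be
    defined, and every linear value must be used exactly once.
    """
    defs = Counter(v for outputs, _op, _inputs in statements for v in outputs)
    uses = Counter(v for _outputs, _op, inputs in statements for v in inputs)
    errors = []
    for value, n in sorted(defs.items()):
        if n != 1:
            errors.append(f"{value} is defined {n} times")
    for value in sorted(uses):
        if value not in defs:
            errors.append(f"{value} is used but never defined")
    for value in sorted(linear_values):
        if uses[value] != 1:
            errors.append(f"linear value {value} is used {uses[value]} times")
    return errors


VALID = [
    (("q0", "q1"), "in", ()),
    (("q0_1",), "h", ("q0",)),
    (("q0_2", "q1_1"), "cx", ("q0_1", "q1")),
    ((), "out", ("q0_2", "q1_1")),
]
# Feeding the same qubit to both inputs of cx would clone it.
CLONING = [
    (("q0",), "in", ()),
    (("a", "b"), "cx", ("q0", "q0")),
    ((), "out", ("a", "b")),
]

print(check_ssa(VALID, {"q0", "q1", "q0_1", "q0_2", "q1_1"}))
print(check_ssa(CLONING, {"q0", "a", "b"}))
```

Extending the check to nested regions would additionally require that each use lies in the region of its definition or in a descendant of it, which this sketch does not attempt.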
Finally, this new design is also extremely extensible. Not only does it support arbitrary operations, but the type system is also very flexible. There is dedicated support for linear types, but this does not have to be restricted to qubits: lists of qubits could be added or even, depending on the target hardware, higher dimensional qudits, continuous variable quantum data, etc.
- Note that a region may not be a connected subgraph. However, it is a simple exercise to convince yourself that any non-root region contains either one or two connected components.
- You may know this from prehistoric times as the goto statement, in languages such as Fortran, C, and, yes, even Go.
- You might be thinking “oh, but all that is required here are phi nodes!”, if you are familiar with those. No – you’d also need a sort of “phi inverse”. Besides, see this discussion for more arguments on why no phi nodes.
- Note that these comments apply specifically to characteristics of quantum circuits. Other diagrammatic representations of quantum processes in use, such as string diagrams, quantum combs etc. may not share the same properties.
- This is Python code; the library can be installed with pip install pytket.
3.4. Graph transformation in minIR
As discussed in section 3.2, computation graphs with linear values, such as minIR, must adopt strict graph transformation semantics to ensure that linear constraints are satisfied at all times. In this section, we use the minIR graph category presented in the previous section to define transformation semantics that lean on the double pushout (DPO) (Ehrig, 1976. Parallelism of manipulations in multidimensional information structures) and sesqui-pushout (SqPO) (Corrad., 2006. Sesqui-Pushout Rewriting. In Graph Transformations, Berlin, Heidelberg. Springer Berlin Heidelberg, 30--45. doi: 10.1007/11841883_4) semantics in adhesive categories (Lack, 2005. Adhesive and quasiadhesive categories. RAIRO - Theoretical Informatics and Applications 39, 3 (July 2005, 511--545). doi: 10.1051/ITA:2005028).
Adhesivity of hypergraph categories #
The natural place to start this section is by studying which of the categories defined in section 3.3 are adhesive. It follows from adhesivity that transforming graphs using DPO and SqPO constructions is well-defined and unique, at least in the regimes of interest to us.
A category is said to be adhesive if it has all pullbacks and pushouts along monos, as well as some compatibility conditions between them, the so-called “Van Kampen squares”. We refer to the literature (e.g. Lack, 2005. Adhesive and quasiadhesive categories. RAIRO - Theoretical Informatics and Applications 39, 3 (July 2005, 511--545). doi: 10.1051/ITA:2005028) for a complete definition. For our purposes, the following two results are sufficient:
- Every presheaf topos is adhesive (Corollary 3.6 in Lack, 2005);
- Every full subcategory D of an adhesive category C is adhesive if the pullbacks and pushouts in C of objects in D are again in D (a simple result; if the Van Kampen squares commute in C, they must commute in D).
A first proposition follows immediately from the first of these results:
It is a presheaf.
This does not immediately generalise to , as unlike , Definition 3.2 imposes that be a coproduct. However, the result still holds:
is a full subcategory of the adhesive category . We must show the existence of pullbacks and pushouts along monos in .
Pullbacks. Consider a pullback of in , with . We must show that is in . Limits are computed pointwise in presheaves, so we know that is the pullback of in . If we can show that is the coproduct of for , then we are done.
Let . Because and are coproducts in Set, i.e. disjoint unions, there must be such that and . By naturality of and , it follows that and . But by commutativity of the pullback diagram, , and thus and . We conclude by uniqueness of the pullback that and thus .
Pushouts. The same argument as for pullbacks also applies to pushouts: given a pushout of in with , an element that makes the pushout square commute must have preimages in and for some . Thus the pushout distributes over the coproduct, and we can conclude that is the coproduct of pushouts.
The same argument also applies to 1.
Now to the spicy stuff:
is a presheaf – hence adhesive.
The following pushout square shows that cannot be adhesive: the pushout square is valid in , but the pushout at the bottom right is not in , because the child regions cannot each be assigned a unique parent.
Double pushout semantics #
From Proposition 3.3, it follows that minIR graph transformations can be performed through the double pushout (DPO) construction (Ehrig, 1976. Parallelism of manipulations in multidimensional information structures) in the category.
A transformation rule in an adhesive category is a span . For objects , we then write or if there is a matching morphism and a context object along with morphisms and such that the following diagram commutes and both squares are pushouts:
If the DPO transformation exists for some rule and match , then we say is a valid DPO rewrite.
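To make the DPO construction concrete, the following Python sketch implements it for simple directed graphs rather than minIR graphs. It is a simplification under stated assumptions: graphs are pairs (vertices, edges) with edges as (source, target) tuples, the rule's interface is a set of vertices shared by both sides (so both legs of the span are inclusions and every left-hand side edge is deleted), and the match is injective. The name `dpo_rewrite` is hypothetical.

```python
def dpo_rewrite(graph, lhs, rhs, interface, match):
    """Apply the rule lhs <- interface -> rhs to graph at an injective match.

    Returns the rewritten graph, or None when the pushout complement does
    not exist because an edge would be left dangling.
    """
    g_vertices, g_edges = graph
    l_vertices, l_edges = lhs
    r_vertices, r_edges = rhs
    if len(set(match.values())) != len(match):
        raise ValueError("match must be injective")
    matched_edges = {(match[s], match[t]) for s, t in l_edges}
    if not matched_edges <= g_edges:
        raise ValueError("match is not a graph morphism")

    # Left square (pushout complement): remove everything matched by the
    # left-hand side, except for the interface vertices.
    deleted = {match[v] for v in l_vertices - interface}
    context_edges = g_edges - matched_edges
    if any(s in deleted or t in deleted for s, t in context_edges):
        return None  # dangling condition violated
    context_vertices = g_vertices - deleted

    # Right square (pushout): glue the right-hand side onto the context,
    # identifying interface vertices and adding fresh copies of the others.
    embed = {v: match[v] for v in interface}
    for v in r_vertices - interface:
        embed[v] = ("new", v)  # assumed not to clash with existing names
    new_vertices = context_vertices | set(embed.values())
    new_edges = context_edges | {(embed[s], embed[t]) for s, t in r_edges}
    return new_vertices, new_edges


# Rule: contract a path a -> b -> c into a single edge a -> c.
LHS = ({"a", "b", "c"}, {("a", "b"), ("b", "c")})
RHS = ({"a", "c"}, {("a", "c")})
print(dpo_rewrite(({1, 2, 3}, {(1, 2), (2, 3)}), LHS, RHS, {"a", "c"},
                  {"a": 1, "b": 2, "c": 3}))
```

When the matched middle vertex has a further incident edge in the host graph, the function returns None: no context object exists and the rewrite is not a valid DPO rewrite.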
To ensure that a DPO rewrite is valid in minIR, we must impose certain conditions. Let be an IO-free minIR graph, i.e. , there is a morphism in for some type system and .
A DPO rewrite is a valid minIR DPO rewrite if there is a transformation in and
- is left-mono, i.e. the morphism is mono,2
- the pushout complement and pushout also exist in the slice category ,
- satisfies the hierarchy condition of Definition 3.3,
- is IO-free.
We know by construction that . We must show that further satisfies the constraints to be an object in the full subcategory of minIR graphs.
The first condition is standard in DPO and guarantees that and are unique if they exist.
The third condition we impose on corresponds directly to the constraint that defines hierarchical graphs in . The fourth condition ensures that valid minIR DPO rewrites map IO-free graphs to IO-free graphs.
Finally, the second condition is imposed to ensure well-typedness of . The functor that forgets the and morphisms is a left adjoint (it possesses a right Kan extension defined pointwise), and thus preserves colimits. The images of and thus form pushout squares in , and by unicity, must match the pushout squares in . Hence is well-typed.
The restriction to rewrites of IO-free graphs is not a restriction of
generality: if we are interested in rewriting computations with inputs and
outputs, we can always express them as IO-free graphs by adding input and
output ops with the values in as outputs, respectively as inputs. We
assign them dedicated types distinct from all other operations; these operations
will never be matched by transformation rules and can be removed at the end of
rewriting.
Generalising to sesqui-pushouts #
We restricted minIR rewrites to DPO transformations obtained from left-mono rules, to ensure that the construction is unique. This excludes rules that may identify two values in but split them into two different values in . Such rules allow for cloning values, which is a useful transformation in minIR for non-linear values. An example of a transformation rule that we would like to allow in minIR:
For this example we added a 2x operation that multiplies an angle value passed
as input by two. The transformation rule replaces a rotation of angle 2θ
by two rotations of angle θ by cloning the input angle.
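The identity underlying this rule can be checked numerically. The sketch below assumes the common convention Rz(θ) = diag(e^{-iθ/2}, e^{iθ/2}); since the matrices are diagonal, only the diagonal entries are stored. The helper names are illustrative.

```python
import cmath


def rz(theta):
    """Diagonal entries of Rz(theta) = diag(e^{-i theta/2}, e^{+i theta/2})."""
    return (cmath.exp(-0.5j * theta), cmath.exp(0.5j * theta))


def compose(a, b):
    """Product of two diagonal matrices, given by their diagonals."""
    return tuple(x * y for x, y in zip(a, b))


def close(a, b, tol=1e-12):
    """Entry-wise comparison up to a numerical tolerance."""
    return all(abs(x - y) < tol for x, y in zip(a, b))


theta = 0.37
# A rotation by 2*theta equals two successive rotations by theta.
print(close(rz(2 * theta), compose(rz(theta), rz(theta))))
```

The rewrite rule is therefore semantics-preserving; what makes it non-trivial as a graph transformation is only that the angle value gains a second use.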
Such semantics are possible using the sesqui-pushout construction (SqPO) by Corradini et al. (Corrad., 2006. Sesqui-Pushout Rewriting. In Graph Transformations, Berlin, Heidelberg. Springer Berlin Heidelberg, 30--45. doi: 10.1007/11841883_4). We can reuse the same notation: when DPO is restricted to left-mono rules as we have done, SqPO is a generalisation of DPO (i.e. the construction coincides whenever the DPO exists).
A transformation rule in an adhesive category is a span . For objects , we then write or if there is a matching morphism and a context object along with morphisms and such that is the final pullback complement of and the right square is a pushout:
If the SqPO transformation exists for some rule and match , then we say is a valid (SqPO) rewrite.
The left square is redundant in the diagram above, as it follows from the requirement that be the final pullback complement (FPC). It is kept to highlight the similarities to DPO. As the commuting diagram indicates, the FPC construction forms a pullback square. Furthermore, unlike pushout complements, the FPC is defined by a universal property that ensures uniqueness if it exists. We refer to (Corrad., 2006. Sesqui-Pushout Rewriting. In Graph Transformations, Berlin, Heidelberg. Springer Berlin Heidelberg, 30--45. doi: 10.1007/11841883_4) for the exact FPC construction.
With SqPO, we can define the set of valid minIR rewrites as given by the SqPO transformations in satisfying the relaxed set of conditions
- the pushout complement and pushout also exist in the slice category ,
- satisfies the hierarchy condition of Definition 3.3,
- is IO-free.
We conclude this section with a discussion of some of the properties of minIR transformations using SqPO. For a more detailed explanation of the concepts discussed, we refer again to Corradini et al. (Corrad., 2006. Sesqui-Pushout Rewriting. In Graph Transformations, Berlin, Heidelberg. Springer Berlin Heidelberg, 30--45. doi: 10.1007/11841883_4) or to (König, 2018. A Tutorial on Graph Transformation. In Graph Transformation, Specifications, and Nets - In Memory of Hartmut Ehrig. Springer, 83--104. doi: 10.1007/978-3-319-75396-6_5).
Deletion in unknown context. A key difference between DPO and SqPO transformations is that SqPO transformations on graphs will delete edges attached to a vertex that is deleted by the transformation rule (i.e. but of the rule). The DPO transformation on the other hand is only well-defined when all edges incident to are in the image of and thus explicitly deleted (this is known as the dangling condition).
As minIR rewrites follow SqPO semantics, transformation rules such as the following are allowed:
Here denotes the multiplication of angles and the zero angle. Any operation that would be connected to the starred value on the left would be deleted by this rule. However, such an implicit operation deletion only yields valid minIR graphs if all incident values are non-linear and none of the target values of the deleted operation are used.
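The difference between the two deletion semantics can be sketched on simple directed graphs, restricted to injective rules and matches. In that setting the context graph is obtained by removing the matched elements; SqPO additionally prunes dangling edges, whereas DPO fails. The function name and graph encoding below are assumptions for illustration, not the minIR construction.

```python
def remove_matched(graph, deleted_vertices, deleted_edges, semantics="sqpo"):
    """Compute the context graph after deleting the matched elements.

    graph is a pair (vertices, edges) with edges as (source, target) tuples.
    deleted_edges are the edges explicitly deleted by the rule. Returns None
    under DPO semantics if an edge would be left dangling.
    """
    vertices, edges = graph
    remaining = edges - deleted_edges
    # Edges in unknown context: still attached to a vertex being deleted.
    dangling = {
        (s, t) for s, t in remaining
        if s in deleted_vertices or t in deleted_vertices
    }
    if dangling and semantics == "dpo":
        return None  # dangling condition violated
    return vertices - deleted_vertices, remaining - dangling


GRAPH = ({1, 2, 3}, {(1, 2), (2, 3)})
# The rule deletes vertex 2 and the edge (1, 2), but does not mention (2, 3).
print(remove_matched(GRAPH, {2}, {(1, 2)}, semantics="dpo"))
print(remove_matched(GRAPH, {2}, {(1, 2)}, semantics="sqpo"))
```

In minIR terms, the implicitly pruned edges are operations, which is why the caveat on linear and used values above is needed.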
Non-left-mono rules. As discussed in the introduction to SqPO, the
cloning of values is allowed in minIR rewrites. However, linear values may never
be cloned (the FPC or pushout will not exist in these cases). Thus any minIR
transformation rule will be left-mono on linear values. It must further be
left-mono on all (linear and non-linear) values in that are mapped to
outputs in : if a value is produced by op applied to , then cloning
and op will result in two definitions of .
Non-right-mono rules. Non-right-mono rules are allowed in both DPO and SqPO. They result in vertex merges. In minIR, the situation for right-mono is symmetric to left-mono: the map must be mono on linear values (otherwise the same value will have multiple uses or definitions) and it must be mono on all values in that are mapped to inputs in (otherwise a value in the rewritten minIR graph will have more than one value definition).
- In fact, a much simpler argument applies: the category is isomorphic to the presheaf category , where is obtained from by removing the object . Adhesivity follows.
- This is often called left-linear in the literature. We avoid this term in this thesis to avoid confusion with the linearity property of values in minIR.
3.5. MinIR rewriting, operationally
The previous section proposed to view minIR rewrites as the result of a (DPO or SqPO) graph transformation. This yields valid rewriting semantics elegantly (and with little effort!). However, the conditions that must be imposed on the transformation to be valid, along with the fact that pushouts may not exist, mean that the existence of a rewrite given a transformation rule and a match is not guaranteed.
In this section, we address this by considering a more restricted notion of minIR rewriting, for which the existence of the right-hand side of the rewrite is guaranteed. In addition, in place of the categorical presentation of the last section, we express the rewriting operation operationally, i.e. as data and a procedure on sets that translates directly into an algorithmic implementation.
We find that this rewrite definition is sufficient in practice. We conclude the section with an example of how more complex rewrites can be achieved by composition of simpler rewrites that can be expressed in this framework.
Graph glueings and rewrites #
Throughout, we consider graph glueings on disjoint vertex and (hyper)edge sets. To underline this, we will use the symbol to denote disjoint set unions.
As we will be working exclusively with vertex and edge sets in this section (as opposed to the objects in the indexing category), we will drop the bold typeface for sets, writing e.g. instead of for the set of vertices of a hypergraph.
Finally, all minIR graphs in this section are IO-free.
We define local graph rewrites in terms of graph glueings. Consider first the case of two arbitrary graphs and , along with a relation . Let be the equivalence relation induced by , i.e. the smallest relation on that is reflexive, symmetric and transitive, and satisfies for all and ,
Then, we can define
- is the set of all equivalence classes of , and
- for , is the equivalence class of that belongs to.
The glueing of and according to the glueing relation is given by the vertices and the edges
We write the glueing graph as .
In other words, the glueing is the disjoint union of the two graphs, with identification (and merging) of vertices that are related in .
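A graph glueing can be computed with a union-find structure over the disjoint union of the vertex sets. The following is a sketch under assumptions: graphs are pairs (vertices, edges) where edges is a dictionary from edge names to (source, target) pairs, vertex and edge names of the two graphs are disjoint, and the function name `glue` is hypothetical.

```python
def glue(graph_a, graph_b, relation):
    """Glue two graphs along a relation given as pairs (vertex_a, vertex_b).

    The vertices of the result are equivalence classes (frozensets) of the
    original vertices; all edges are kept, with endpoints mapped to classes.
    """
    (va, ea), (vb, eb) = graph_a, graph_b
    parent = {v: v for v in va | vb}

    def find(v):
        # Find the representative of v, with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Merge the classes of all related vertices.
    for a, b in relation:
        parent[find(a)] = find(b)

    members = {}
    for v in parent:
        members.setdefault(find(v), set()).add(v)
    class_of = {v: frozenset(members[find(v)]) for v in parent}

    vertices = set(class_of.values())
    edges = {
        name: (class_of[s], class_of[t])
        for name, (s, t) in {**ea, **eb}.items()
    }
    return vertices, edges


A = ({"a1", "a2"}, {"e": ("a1", "a2")})
B = ({"b1", "b2"}, {"f": ("b1", "b2")})
print(glue(A, B, {("a2", "b1")}))
```

Glueing a2 with b1 turns the two separate edges into a path of length two through the merged vertex.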
This allows us to define a rewrite on a graph :
A rewrite on a graph is given by a tuple , with
- is a graph called the replacement graph,
- is the vertex deletion set,
- is the edge deletion set, and
- is the glueing relation, a partial function that maps a subset of the deleted vertices of to vertices in the replacement graph.
The domain of definition is known as the boundary values of .
A graph rewrite per this definition can always be generated by a single pushout (SPO) transformation (Löwe, 1991. Extended algebraic graph transformation. PhD Thesis. Technical University of Berlin):
- define as the graph . Then the injection is the match morphism ;
- the partial map maps a subset of to vertices in the replacement . By injectivity of the match morphism, it also defines a partial map .
We opted for SPO-like semantics in this definition, as they are the simplest to write in set-theoretic terms and coincide with DPO and SqPO in our restricted domain of interest.
The result of the rewrite is computed by glueing the right-hand side to the context subgraph of given by
The partial function is a special case of a glueing relation , and thus defines a glueing of with . The rewritten graph resulting from applying to is
An example of a graph rewrite is given in the next figure. This is equivalent to an SPO transformation with the graph induced by on the left-hand side, the graph on the right-hand side and the partial map given by .
Application of a graph rewrite. On the left, the original graph along with the replacement graph (grey box). On the right, the rewritten graph . Only the vertex has been deleted, as other vertices in are in the boundary (in orange). The (singleton) edge deletion set is red. The blue edge connects a vertex of to a boundary vertex, and is thus also present on the right-hand side. The purple edge, on the other hand, connects a vertex of to a non-boundary vertex of , and is thus deleted.
When there are no edges between and (purple in the example above), this definition corresponds to graph rewrites that can be produced using DPO transformations (see discussion in section 3.4). Otherwise, such edges are deleted.
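The rewrite procedure described above translates directly into code. The sketch below is one possible reading of the definition, not a reference implementation: graphs are pairs (vertices, edges) with edges as a dictionary from edge names to (source, target) pairs, the glueing relation is a dictionary from boundary vertices to replacement vertices, and merged vertices are named after the replacement vertex. The name `apply_rewrite` is hypothetical.

```python
def apply_rewrite(graph, replacement, deleted_vertices, deleted_edges, glue_map):
    """Apply a rewrite (replacement, deleted vertices, deleted edges, glueing map)."""
    vertices, edges = graph
    r_vertices, r_edges = replacement
    boundary = set(glue_map)
    if not boundary <= deleted_vertices:
        raise ValueError("boundary values must be in the vertex deletion set")
    if not set(glue_map.values()) <= r_vertices:
        raise ValueError("glueing map must land in the replacement graph")

    # Context subgraph: keep boundary vertices, drop deleted edges and any
    # edge incident to a deleted non-boundary vertex.
    kept = (vertices - deleted_vertices) | boundary
    context_edges = {
        name: (s, t) for name, (s, t) in edges.items()
        if name not in deleted_edges and s in kept and t in kept
    }

    def rename(v):
        # Boundary vertices are identified with their image in the replacement.
        return glue_map.get(v, v)

    new_vertices = (vertices - deleted_vertices) | r_vertices
    new_edges = {name: (rename(s), rename(t)) for name, (s, t) in context_edges.items()}
    new_edges.update(r_edges)
    return new_vertices, new_edges


G = ({1, 2, 3, 4}, {"a": (1, 2), "b": (2, 3), "c": (3, 4)})
R = ({"r1", "r2"}, {"x": ("r1", "r2")})
# Vertex 2 is a boundary vertex glued to r1; vertex 3 is deleted outright.
print(apply_rewrite(G, R, {2, 3}, {"b"}, {2: "r1"}))
```

In the example, edge a survives because it ends on a boundary vertex, edge b is deleted explicitly, and edge c is dropped because it is attached to the deleted non-boundary vertex 3.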
The notions of graph glueing and graph rewrite can straightforwardly be lifted to hypergraphs and, by extension, to minIR graphs. Notice that in this case, values are glued together, not operations (the former were defined as the graph’s vertices, the latter as its hyperedges).
However, the glueing of two valid minIR graphs – and the result of applying a valid rewrite – may not be a valid minIR graph. Glueing two values of a linear type, for instance, is a sure way to introduce multiple uses (or definitions) of it. Thus, we must be careful to only consider glueings and rewrites of minIR graphs that preserve all the constraints we have imposed in Definition 3.4.
Ensuring rewrite validity: interfaces #
As a sufficient condition for valid minIR rewrites, we introduce minIR interfaces, a concept closely related to the “hypergraph with interfaces” construction of (Bonchi, 2017. Confluence of Graph Rewriting with Interfaces) or the supermaps of quantum causality (Hefford, 2024. A Profunctorial Semantics for Quantum Supermaps. In Proceedings of the 39th Annual ACM/IEEE Symposium on Logic in Computer Science, July 2024. ACM, 1--15. doi: 10.1145/3661814.3662123). We eschew the presentation of holes as a slice category in favour of a definition that fits naturally within minIR and is sufficient for our purposes.
Let be a -typed minIR graph with data types and linear types . Consider type strings . We define the index sets
corresponding respectively to the set of all indices into and the subset of indices of linear types. For any , we denote by the type at position i in .
We define a partial order 1 on where and say that can be coerced into if there exists an index map such that
- types are preserved: , and
- is well-defined and bijective on the restriction to indices of linear types
Let be a set of data types. An interface is a pair of type strings .
We say that an interface can be coerced into an interface , written , if and .
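The two conditions on an index map can be checked mechanically. The sketch below fixes one direction for the map (from positions of a source type string to positions of a target type string) as an assumption, since either orientation is compatible with the conditions as stated here; the function name `is_coercion` is hypothetical.

```python
def is_coercion(src, dst, index_map, linear_types):
    """Check the two coercion conditions for an index map src -> dst.

    index_map is a dict from positions in src to positions in dst. It must
    preserve types, and its restriction to linear positions must be defined
    everywhere and be a bijection onto the linear positions of dst.
    """
    # Condition 1: types are preserved.
    for i, j in index_map.items():
        if src[i] != dst[j]:
            return False
    # Condition 2: well-defined and bijective on linear positions.
    linear_src = {i for i, t in enumerate(src) if t in linear_types}
    linear_dst = {j for j, t in enumerate(dst) if t in linear_types}
    if not linear_src <= set(index_map):
        return False
    images = [index_map[i] for i in sorted(linear_src)]
    return len(set(images)) == len(images) and set(images) == linear_dst


LINEAR = {"Qubit"}
# Two Angle positions may be sent to the same position; the Qubit may not.
print(is_coercion(("Qubit", "Angle", "Angle"), ("Angle", "Qubit"),
                  {0: 1, 1: 0, 2: 0}, LINEAR))
print(is_coercion(("Qubit", "Qubit"), ("Qubit",), {0: 0, 1: 0}, LINEAR))
```

The first call succeeds because only non-linear positions are identified; the second fails because two linear positions are mapped to the same index.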
We can define the interface associated with an operation in a minIR graph by considering the values used and defined by . Calling the type morphism on and assuming to be an operation in with inputs and outputs, we define the interface of in as the pair of strings in
Similarly, we can assign interfaces to subgraphs of minIR graphs:
Consider a subset of values and operations and . Define the use and define boundary sets
The tuple of is called a minIR subgraph of if there exists a region of such that all boundary values of are in :
We write to indicate that is a minIR subgraph of .
Note that is exactly the set of inputs in the non-IO free minIR graph given by the subgraph of the minIR graph. is a superset of the outputs of : it includes all linear values in that do not have a use in , but also any non-linear value that has a use outside of .
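Following the description in the preceding paragraph, the two boundary sets of a subgraph can be computed from def-use information. This sketch reconstructs the sets from that prose description only, so it should be read as an approximation of the formal definition; the statement encoding and the function name `boundaries` are assumptions.

```python
def boundaries(statements, sub_ops, sub_values, linear_values):
    """Return (use boundary, define boundary) of a subgraph.

    statements is a list of (outputs, op, inputs); sub_ops is a set of
    statement indices and sub_values the set of values in the subgraph.
    """
    defined_inside = {v for idx in sub_ops for v in statements[idx][0]}
    used_inside = {v for idx in sub_ops for v in statements[idx][2]}
    used_outside = {
        v
        for idx, (_outputs, _op, inputs) in enumerate(statements)
        if idx not in sub_ops
        for v in inputs
    }
    # Inputs: values of the subgraph without a definition inside it.
    use_boundary = {v for v in sub_values if v not in defined_inside}
    # Linear values without a use inside, and non-linear values used outside.
    define_boundary = {
        v for v in sub_values
        if (v in linear_values and v not in used_inside)
        or (v not in linear_values and v in used_outside)
    }
    return use_boundary, define_boundary


PROGRAM = [
    (("q0", "t"), "in", ()),
    (("q1",), "rz", ("q0", "t")),
    (("q2",), "h", ("q1",)),
    (("q3",), "rz", ("q2", "t")),
    ((), "out", ("q3",)),
]
print(boundaries(PROGRAM, {1, 2}, {"q0", "t", "q1", "q2"},
                 {"q0", "q1", "q2", "q3"}))
```

In the example, the angle t appears in both sets: it is an input of the subgraph, and as a non-linear value it is also used by the second rz outside of it.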
Unlike interfaces, subgraph boundary values are not ordered. An ordering of is a string along with a bijective map
If there are strings and orderings of and
then we can set and in complete analogy to operations. We will write and for the strings and respectively. We say that the subgraph implements the interface
where the type morphism was extended element-wise to strings .
Remark, though, that unlike operations, the same subgraph may implement more than one interface as a result of various choices of orderings and .
As mentioned, the subgraph forms a non-IO free minIR graph. We can always construct an IO-free minIR graph from by adding two operations and in the root region respectively in and inputs-outputs, defined by
We call the resulting graph an interface graph. It implements the
interface if implements . Calling to mind the illustrations of
section 3.3, looks like one of the nested regions
within regiondef operations that we were considering.
MinIR operation rewrite #
Consider
- an operation in a minIR graph with values
- an interface graph with values and its associated subgraph , such that implements an interface
- the index maps and that define the generalisation (per Definition 3.10).
We can define a glueing relation
This is almost enough to define a rewrite that replaces the operation in with the values and operations of – the interface compatibility constraint that we have imposed ensures that the resulting minIR graph is valid. Unfortunately, is not a partial function as required by Definition 3.4.
This is resolved in the following proposition:
Let , and such that , as defined above. Then
i.e. the graph obtained by removing the operation from the glueing of and along , is a valid minIR graph.
There is a graph with values and a partial function such that the graph (13) is the graph , obtained from the rewrite
We call the rewrite of into .
The definition of the rewrite of into a graph behaves as one would expect – the only subtleties relate to handling non-linear (i.e. copyable) values at the boundary of the rewrite. The following example illustrates some of these edge cases.
Rewriting operation in the graph (top left) into the operations and of the graph (bottom left). Coloured dots indicate the index maps and from inputs of to inputs of , respectively from outputs of to outputs of .
When the index maps and are not injective (yellow and green dots), values are merged, resulting in multiple uses of the value (i.e. copies). This is why the index maps must be injective on linear values (dots in shades of blue). Value merging also happens when a value is used multiple times in (yellow and red dots). This will never happen with linear values (as they can never have more than one use in ), nor with any value definitions (the same value can never be defined more than once). Finally, values not in the image of or (purple dot) are discarded. This case is also excluded for linear values by requiring surjectivity.
We start this proof with the explicit construction of and . Define as the smallest equivalence relation such that
Then we define , the graph obtained by glueing together values within the same equivalence class of .
Claim 1: is a valid minIR graph.
Claim 1 follows from the observation that only values of non-linear types are glued together. If , then either or there exist such that If , then is not injective on and , and by the definition of , and . Otherwise, there are such that . The same value is used twice, which is only a valid minIR graph if and are not linear, thus proving Claim 1.
Define as the subgraph obtained from by removing the operations . Let be the set of values of (and of ). Writing for the equivalence class of that belongs to, we can define as:
Claim 2: is a partial function .
In other words, for all , then . Let and be values in . First of all, for all , otherwise is not acyclic. So either , or , but not both.
The simpler case: if , then there exists such that . Furthermore is unique because by minIR definition, has a unique definition in . It follows from (12) that and hence .
Otherwise, there exists and such that and as well as . By definition of , we have , and thus
proving Claim 2.
Claim 3: is given by .
It follows directly from our construction of and that the equivalence classes of (the smallest equivalence relation closure of) is equal to the equivalence classes of (the smallest equivalence relation closure of) . The claim follows by Definition 3.8 and the definition of .
And finally, Claim 4: is a valid minIR graph.
Per Definition 3.4, we must check four properties: (i) every value is defined exactly once, (ii) every linear value is used exactly once, (iii) the graph is acyclic, and (iv) every region has (at most) one parent.
(iii) follows from the fact that and are acyclic and a single operation in is replaced: any cycle across and would also be a cycle in by replacing the subpath in with . (iv) follows from the fact that and are in the root region of , by definition of interface implementation. (i): removing from removes the unique definitions of all values that are targets of . Each such value is glued to a unique value in – the new and unique definition of in . (ii) follows from the same argument as in (i), but relying on injectivity of on linear values to establish uniqueness.
Arbitrary minIR rewrites
We have so far defined rewrites of single operations into graphs . We can generalise these rewrites to rewrite subgraphs , provided the minIR subgraphs satisfy some constraints. We require for this a notion of convexity, as discussed in Bonchi, 2022. 2022. String diagram rewrite theory II: Rewriting with symmetric monoidal structure. Mathematical Structures in Computer Science 32, 4 (April 2022, 511--541). doi: 10.1017/s0960129522000317.
As usual, let us consider a minIR graph with values , linear values , edges , the incidence maps and as well as their inverses and . Consider further a subgraph of that we will now call , to distinguish from .
Let us further define the partial morphism that maps a value to the parent of the region of .
A minIR subgraph is convex if the following conditions hold:
- for all , any path along from to contains only vertices in ,
- parent-child relations are contained within the subgraph, i.e.
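The first convexity condition can be checked directly on a directed acyclic graph. The following is a minimal sketch under assumptions: the graph is given as a successor map over hypothetical node labels, and only the path condition is modelled (the parent-child condition is not). A subset is path-convex exactly when no node outside it is both reachable from the subset and able to reach it.

```python
def reachable(adj, sources):
    """Nodes reachable from `sources` by following at least one edge of `adj`."""
    seen = set()
    stack = list(sources)
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_path_convex(succ, subset):
    """True if every path between two nodes of `subset` stays within `subset`."""
    subset = set(subset)
    # Build the predecessor map by reversing every edge.
    pred = {}
    for u, vs in succ.items():
        for v in vs:
            pred.setdefault(v, []).append(u)
    downstream = reachable(succ, subset)
    upstream = reachable(pred, subset)
    # A node outside the subset that lies both downstream and upstream of it
    # sits on a path that leaves the subset and re-enters it.
    return not ((downstream & upstream) - subset)


# Assumed toy example: a diamond a -> {b, c} -> d.
diamond = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
```

In the diamond, `{"a", "b", "d"}` is not convex because the path through `c` leaves the subset, whereas `{"a", "b"}` is.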
Define the sets of boundary values and , as in (10); then fix the boundary orderings and as in (11). The subgraph implements the interface
Consider an interface graph that implements such that . Instead of defining a glueing relation from values of an operation to values of , we replace the interface with . This generalises the definition of from (12) to a glueing defined as
With the set of boundary operations defined as2
we are able to define minIR rewrites in their most general form.
Let and such that and is convex, as defined above. Then,
i.e. the graph obtained by removing the values and operations from the glueing of and along , is a valid minIR graph.
There is a graph with values and a partial function such that the graph (16) is the graph , obtained from the rewrite
We call the rewrite of into .
Consider an operation that implements . We can define the interface graph given by three operations , and . Its associated subgraph only includes . Let be the glueing relation
Consider the rewrite . If we write for the subgraph of given by then according to (9), the graph resulting from applying to can be expressed as the glueing
Our claim is that is a valid minIR graph.
The graph (16) is then obtained by applying the rewrite as given by (14) to . Defining the rewrite as the composition of followed by , the result follows from our claim and Proposition 3.5.
We now prove the claim, by showing the four properties of minIR graphs as per Definition 3.4. Property i) requires showing that every value is defined exactly once. As is obtained by removing values and operations from a valid minIR graph , no value in can be defined more than once. A value that is not defined in must be in the boundary of . By the boundary definitions of (10), cannot be in and thus must be in . It follows by the definition of the glueing that in , will be in the definitions of : . The glueing is bijective between the values of and and thus we can conclude that has a unique definition in .
The same argument applies to property ii). Property iii) follows from the convexity requirement of . Finally, property iv) (every region has at most one parent) follows from two observations. First, by convexity of , no deleted value or operation could be the parent of any value not in , and thus the relation is well-defined on : . Secondly, all new values and operations added to the boundary region of are from the root region of , and thus do not have a parent, ensuring that parent uniqueness is preserved.
This simple and limited graph transformation framework captures a remarkably large set of minIR program transformations. It may seem at first that the restriction to boundary values within a single region of Definition 3.11, as well as the convexity requirements of Definition 3.12 represent significant limitations on the expressivity of the rewrites. In practice, however, the semantics of minIR operations can be used to decompose more complex rewrites into a sequence of simple rewrites to which Proposition 3.6 applies.
Consider minIR graphs with a type system that includes regiondef and call operations as discussed in examples of the previous section – respectively defining a code block by a nested region and redirecting control flow to a code block defined using a regiondef. Then all constraints that we impose on rewriting can be effectively side-stepped using the region outlining and value hoisting transformations.
Region outlining moves a valid minIR subgraph into its own separate region, and replaces the hole left by the subgraph in the computation by a call operation to the newly outlined region.
Value hoisting moves a value definition within a region to its parent region and passes the value down to the nested region through an additional input. In case of linear values, we can similarly hoist the unique use of the value to the parent region.
Using these transformations, non-convex subgraphs can always be made convex by taking the convex hull and outlining any parts within it that are not part of the subgraph. Outlined regions can then be passed as additional inputs to the subgraph. Step 1 of the figure below illustrates this transformation. Similarly, a subgraph that includes operations without their parent can be extended to cover the entire region and its parent, outlining any parts of the region that are not part of the subgraph.
Finally, whenever a boundary value belongs to a region that is not the top level region of the subgraph3, we can repeatedly hoist to its parent region until it is in the top level region. The value is then recursively passed as argument to descendant regions until the region that it is required in. Subgraphs can thus always be transformed to only have input and output boundary values at the top level region. Step 2 of the figure below illustrates this transformation.
A non-convex minIR graph rewrite, obtained by decomposition into valid convex rewrites, using outlining and hoisting. For simplicity, regiondef operations were made implicit and represented by nested boxes: a region within an operation corresponds to a region definition that is passed as an argument to the operation. Edge colours correspond to value types. Step 1 outlines the ... operations into a dedicated region, which step 2 hoists outside of the region being rewritten. Steps 3 and 4 together correspond to a minIR subgraph rewrite. They have been split into two steps following the proof strategy. Step 4 is an instance of a minIR operation rewrite.
-
To be precise, is a partial order on the type strings up to isomorphism. ↩︎
-
The set operations and are again understood to apply to the unordered set of elements contained in the lists and . ↩︎
-
We can always extend a subgraph to contain more ancestor regions, until there is indeed a unique top-level region in the subgraph. ↩︎
Chapter 4
Pattern Matching in large Graph Transformation Systems
To our knowledge, the first practical proposal for a GTS-based quantum compiler was presented in Xu, 2022. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433 and then refined in Xu, 2023. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (June 2023, 835--859). doi: 10.1145/3591254. In these, the set of possible graph transformations is obtained by exhaustive enumeration. Using SAT solvers and fingerprinting techniques, the set of all small programs up to a certain size can be generated ahead of time and clustered into disjoint partitions of equivalent programs. This concisely expresses every possible peephole optimisation up to the specified size: for every small enough subset of operations of an input program, its equivalence class can be determined. Any replacement of that set of operations with another program in the same equivalence class is a valid transformation and, thus, a potential peephole optimisation. Transformation systems on minIR graphs based on equivalence classes were formalised in section 3.4.
First results of this approach are promising. Xu, 2022. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433 demonstrated that optimisation performance improves markedly with larger sets of transformation rules. Such workloads however rely heavily on pattern matching, the computational task that identifies subgraphs on which transformation rules apply. In Xu, 2022. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433 and Xu, 2023. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (June 2023, 835--859). doi: 10.1145/3591254, pattern matching is carried out separately for each pattern. This becomes a significant bottleneck for large rule sets. In Xu, 2022. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433, performance peaks at around 50,000 transformation rules, after which the additional overhead from pattern matching becomes dominant, deteriorating the compilation results.
In this chapter, we solve these scaling difficulties by presenting an algorithm for pattern matching on minIR graphs that uses a pre-computed data structure to return all pattern matches in a single query. The set of transformation rules is directly encoded in this data structure. After a one-time cost for construction, pattern-matching queries can be answered in running time independent of the number of rules in the transformation system.
The asymptotic complexity results presented in this chapter depend on some simplifying assumptions on the properties that the pattern graphs and embeddings must satisfy. These assumptions restrict the generality of minIR graphs, but we do not find that they restrict the usefulness of the result in practice. As discussed in section 4.7, none of these assumptions are required in practice for the implementation. We have not observed any impact on performance when the imposed constraints are lifted, so we conjecture that at least some of these assumptions can be relaxed and our results generalised.
After a discussion of related work in section 4.1, section 4.2 presents the assumptions that we are making in detail, along with some relevant definitions for the rest of the chapter. Sections 4.3, 4.4, and 4.5 present the core ideas of our approach, respectively introducing: a reduction of minIR graphs to equivalent trees, a canonical construction for the tree reduction and an efficient way to enumerate all possible subtrees of a graph. We also prove bounds on the size and number of the resulting trees.
In section 4.6, we introduce a pre-computation step and show that the pattern-matching problem reduced to tree structures can be solved using a prefix tree-like automaton that is fixed and pre-computed for a given set of patterns. Combining the automaton construction with bounds from section 4.5 leads to the final result. We conclude in section 4.7 with benchmarks on a real-world dataset of 10000 quantum circuits, obtaining a 20x speedup over a leading C++ implementation of pattern matching for quantum circuits.
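To give some intuition for why a pre-computed prefix tree makes the query cost independent of the number of patterns, consider the following string analogue. It is only a sketch under assumptions and not the automaton of section 4.6: patterns are plain sequences of hypothetical gate names, and `build_trie` and `match_prefixes` are illustrative helpers.

```python
def build_trie(patterns):
    """Pre-compute a prefix tree; each node records the patterns that end there."""
    root = {"children": {}, "matches": []}
    for pattern_id, pattern in enumerate(patterns):
        node = root
        for symbol in pattern:
            node = node["children"].setdefault(
                symbol, {"children": {}, "matches": []}
            )
        node["matches"].append(pattern_id)
    return root


def match_prefixes(trie, sequence):
    """Return the ids of all patterns that are prefixes of `sequence`."""
    found = list(trie["matches"])
    node = trie
    for symbol in sequence:
        node = node["children"].get(symbol)
        if node is None:
            break
        found.extend(node["matches"])
    return found


# Assumed toy rule set over gate names.
patterns = [("h",), ("h", "cx"), ("h", "cz"), ("cx",)]
trie = build_trie(patterns)
hits = match_prefixes(trie, ("h", "cx", "t"))
```

A single traversal of the input reports every matching pattern: the number of steps is bounded by the length of the input sequence, however many patterns were inserted into the trie.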
4.1. Related work
Our proposed solution can be seen as a specialisation of RETE networks Forgy, 1982. 1982. Rete: A fast algorithm for the many pattern/many object pattern match problem. Artificial Intelligence 19, 1 (September 1982, 17--37). doi: 10.1016/0004-3702(82)90020-0 Varró, 2013. 2013. A Rete Network Construction Algorithm for Incremental Pattern Matching and derivatives Ian, 2003. 2003. The execution kernel of RC++: RETE*, a faster RETE with TREAT as a special case. International Journal of Intelligent Games and Simulation 2, 1 (Feb 2003, 36-48) Armstr., 2014. 2014. Memory Efficient Stream Reasoning on Resource-Limited Devices. PhD Thesis. University of Dublin, Trinity College Mirank., 1987. 1987. TREAT: a new and efficient match algorithm for AI production systems. PhD Thesis. Columbia University, 2960 Broadway New York, NY United States to the case of graph pattern matching. The additional structure obtained from restricting our considerations to graphs results in a simplified network design that allows us to derive worst-case asymptotic runtime and space bounds that are polynomial in the parameters relevant to our use case1 – overcoming a key limitation of RETE.
Another well-studied application of large-scale pattern matching is in the context of stochastic biomolecular simulations Sneddon, 2010. 2010. Efficient modeling, simulation and coarse-graining of biological complexity with NFsim. Nature Methods 8, 2 (December 2010, 177--183). doi: 10.1038/nmeth.1546 Bachman, 2011. 2011. New approaches to modeling complex biochemistry. Nature Methods 8, 2 (January 2011, 130--131). doi: 10.1038/nmeth0211-130, particularly the Kappa project Danos, 2004. 2004. Formal molecular biology. Theoretical Computer Science 325, 1 (September 2004, 69--110). doi: 10.1016/j.tcs.2004.03.065. Stochastic simulations depend on performing many rounds of fast pattern-matching for continuous Monte Carlo simulations Yang, 2008. 2008. Kinetic Monte Carlo method for rule-based modeling of biochemical networks. Physical Review E 78, 3 (September 2008, 031910). doi: 10.1103/physreve.78.031910. However, unlike our use case, the procedure typically does not need to scale well to a large number of patterns. In Danos, 2007. 2007. Scalable Simulation of Cellular Signaling Networks, Danos et al. introduced a pre-computation step to accelerate matching by establishing relations between patterns that activate or inhibit further patterns. This idea was later expanded upon and formalised in categorical language in Boutil., 2017. 2017. Incremental Update for Graph Rewriting. The ideas presented in Boutil., 2017. 2017. Incremental Update for Graph Rewriting are similar to ours; their formalism has the advantage of being more general but does not present any asymptotic complexity bounds and suffers from similar worst-case complexities as RETE.
A similar problem has also been studied in the context of multiple-query optimisation for database queries Sellis, 1988. 1988. Multiple-query optimization. ACM Transactions on Database Systems 13, 1 (March 1988, 23–52). doi: 10.1145/42201.42203 Ren, 2016. 2016. Multi-Query Optimization for Subgraph Isomorphism Search. Proceedings of the VLDB Endowment 10, 3 (November 2016, 121–132). doi: 10.14778/3021924.3021929, but it has limited itself to developing caching strategies and search heuristics for specific use cases. Finally, using a pre-compiled data structure for pattern matching was already proposed in Messmer, 1999. 1999. A decision tree approach to graph and subgraph isomorphism detection. Pattern Recognition 32, 12 (December 1999, 1979--1998). doi: 10.1016/S0031-3203(98)90142-X. However, with a space complexity – is the input size and the pattern size – it does not scale to large input graphs, even for small patterns.
-
RETE networks have been shown to have exponential worst-case space (and thus time) complexity Rakib, 2018. 2018. An Efficient Rule-Based Distributed Reasoning Framework for Resource-bounded Systems. Mobile Networks and Applications 24, 1 (October 2018, 82--99). doi: 10.1007/s11036-018-1140-x, although performance in practical use cases can vary widely Uddin, 2016. 2016. Resource-Bounded Context-Aware Applications: A Survey and Early Experiment. ↩︎
4.2. Preliminaries and simplifying assumptions
For simplicity, we will throughout consider minIR graphs that admit a type system , though most of the results can also be adapted to other graph domains. We will write for the types of (i.e. its values) and for the operation types (i.e. its edges).
Linear paths and operation splitting
An operation type in the type system is a hyperedge. Its endpoints
are strings of data types that define the input and output signature of the operation . We can refer to the set of all hyperedge endpoints of using the string indices ( denotes the disjoint set union):
Fix a partition of into disjoint pairs
where the last set of the partition may be a singleton if is odd. For every , we then define new split operation types , each with two endpoints: the -th operation type has endpoints and in . For every operation of type , we can then split into operations each of arity 1 or 2 and of types respectively. We will refer to the graph transformation that replaces an operation in a minIR graph with the operations for as operation splitting.
It is important to note that the splitting of an operation is unique and given by the type of , and thus invariant under (typed) morphisms: there is a morphism of a pattern into a graph if and only if there is a morphism from the split pattern into the split graph .
A transformation rule splitting an operation with 3 sources and 2 targets. The choice of endpoint partition made here, obtained by pairing the -th use with the -th define, is arbitrary but convenient for quantum gates as they correspond to the input and output values of the same qubit.
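The endpoint partition used in this figure can be sketched as follows. This is an illustration under assumptions: endpoints are encoded as hypothetical `("in", i)` and `("out", i)` tuples, and the rule for pairing leftover endpoints among themselves is our own choice, made so that at most the last set of the partition is a singleton.

```python
def endpoint_partition(n_inputs, n_outputs):
    """Pair the i-th input with the i-th output, then pair up what is left."""
    common = min(n_inputs, n_outputs)
    pairs = [(("in", i), ("out", i)) for i in range(common)]
    # Leftover endpoints are all on the same side (assumption: pair them
    # consecutively so that only the last set can be a singleton).
    if n_inputs > n_outputs:
        leftover = [("in", i) for i in range(common, n_inputs)]
    else:
        leftover = [("out", i) for i in range(common, n_outputs)]
    for j in range(0, len(leftover), 2):
        pairs.append(tuple(leftover[j:j + 2]))
    return pairs


# The operation of the figure: 3 sources and 2 targets.
partition = endpoint_partition(3, 2)
```

Each set of the partition gives one split operation, so the operation with 3 sources and 2 targets splits into two operations with two endpoints and one operation with a single endpoint.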
The endpoint partitions also define linear paths. Two values in a minIR graph are on the same linear path if there are values with and such that is connected to through an operation and they correspond to the same pair of endpoints in the endpoints partition (i.e. the indices of correspond to values and in ).
Linearity assumption and rigidity
Recall that in Definition 3.2, and refer to the subset of values that are within the domain of definition of and respectively. For this chapter, we will assume ; in other words, minIR graphs are IO-free (this is w.l.o.g., see discussion after Proposition 3.4) and all values are linear1 (this is definitely not w.l.o.g.!).
As a result of this assumption, the subcategory of minIR graphs that we consider forms a rigid category, as introduced by Danos et al. Danos, 2014. 2014. Transformation and Refinement of Rigid Structures. The definition, which we reproduce here, is given in terms of morphisms that intersect all components of the codomain. We refer to Danos, 2014. 2014. Transformation and Refinement of Rigid Structures for the precise definition of that notion: in the context of linear-valued minIR graphs, this is equivalent to requiring that the image of the graph homomorphism intersects every connected component of the codomain.
A category is rigid if for all morphisms in that intersect all components of and for all that factorise as , then is unique.
In other words, there is a unique way to extend a morphism to a morphism , if it exists. If we interpret and as graph patterns that we are interested in matching with , then rigidity guarantees that there is (at most) a unique way to extend a match morphism into a match morphism on the larger pattern .
The linearity assumption also has other useful consequences. Every linear value has exactly one use and one definition. As a result, all linear paths are disjoint and form a partition of the values of the graph. They correspond to the paths that form the connected components of the fully split graph, i.e. the graph obtained by splitting every operation. We call the number of linear paths (and hence the number of connected components in the fully split graph) the circuit width, written . We also use the linear path decomposition to define circuit depth, written , as the longest linear path in .
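As a minimal sketch of these definitions, linear paths can be computed as the connected components of the fully split graph. The encoding below is an assumption made for illustration: `links` lists the pairs of values joined by a two-endpoint split operation, and the length of a linear path is measured in number of values (the text does not fix the unit).

```python
from collections import defaultdict


def linear_paths(values, links):
    """Group values into linear paths, i.e. components of the fully split graph."""
    parent = {v: v for v in values}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Two values joined by a split operation lie on the same linear path.
    for u, v in links:
        parent[find(u)] = find(v)

    groups = defaultdict(list)
    for v in values:
        groups[find(v)].append(v)
    return list(groups.values())


# Assumed toy circuit: a one-qubit gate on qubit 0 followed by a two-qubit gate.
# The two-qubit gate splits into one link per qubit.
values = ["q0_a", "q0_b", "q0_c", "q1_a", "q1_b"]
links = [("q0_a", "q0_b"), ("q0_b", "q0_c"), ("q1_a", "q1_b")]
paths = linear_paths(values, links)
width = len(paths)
depth = max(len(path) for path in paths)
```

For this toy circuit there is one linear path per qubit, so the width is 2, and the longest path contains 3 values.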
As discussed in section 3.4, minIR rewrites are instantiated from transformation rules by minIR match morphisms . Restricting our considerations to linear-valued minIR graphs has the further implication that all such match morphisms will be injective. We call an embedding and write it using Greek letters and a hooked arrow
Finding such embeddings is the pattern matching problem that we are solving. This problem is equivalent to finding minIR subgraphs of such that is isomorphic to the pattern .
Convexity
According to Proposition 3.6, a necessary condition for a subgraph to define a valid minIR rewrite is convexity. In this chapter we weaken this requirement and propose a condition based on circuit width:
Let be an embedding of a pattern into a linear-valued minIR graph such that is a convex subgraph of . Then for every subgraph such that , it holds that
Up to isomorphism, we can assume . Suppose there is such that and . Let be partitions of and into sets of values that are on the same linear path of and respectively. It must hold that for all there is such that , because and operation splitting is preserved under embeddings. As the map from to cannot be injective, there must be and , such that and . We conclude that there must be a path in the fully split graph of between a value of and a value of that is not in the fully split graph of . Given that is convex, this path must be in , which contradicts the preservation of operation splitting under embeddings.
In this chapter, whenever we define a subgraph of a graph , we will assume that satisfies the above weakened convexity condition.
The converse of Proposition 4.1, however, is not true. The pattern-matching technique presented below will find a strict superset of convex embeddings. To restrict considerations to convex embeddings, it suffices to filter out the non-convex ones in a post-processing step.
Ignoring minIR Hierarchy
So far, we have omitted discussing one part of the minIR structure: the nested hierarchy of operations. Syntactically, the hierarchy formed by relations between minIR operations can be viewed as just another value type that operations are incident to: parent operations define an additional output that children operations consume as additional input. Because of the bijectivity requirement of minIR morphisms on parent-child relations of Definition ., these parent-child relations behave, in fact, like linear values – and hence do not violate the linearity assumption we have imposed.
However, by treating them as such, we have further weakened the constraints on pattern embeddings. We do not enforce that boundary values must be in the same regions or that parent-child relations cannot be boundary values. Similarly to convexity, we defer checking these properties to a post-processing step.
Further assumptions (harmless)
We will further simplify the problem by making presentation choices that do not imply any loss of generality. First of all, we assume that all patterns have the same width and depth , are connected graphs and have at least 2 operations. These conditions can always be fulfilled by adding “dummy” operations if necessary. Embeddings of disconnected patterns can be computed one connected component at a time.
We will further assume that all operations are on at most two linear paths (and thus in particular, have at most 4 endpoints). Operations on linear paths can always be broken up into a composition of operations, each on two linear paths as follows:
Expressing an operation on linear paths as a composition of two operations on 2 linear paths.
This transformation leaves circuit width unchanged but may multiply the graph depth by up to a factor .
We furthermore define the set of all port labels
so that we can associate every operation endpoint in a minIR graph with a port label from the set . We further endow the labels with a total order (for instance, based on the string index values). The total order on then induces a total order on the paths in that start in the same value : the paths are equivalently described by the sequence of port labels of the operations traversed. These form strings in , which we order lexicographically. Given a root value , for every value in there is thus a unique smallest path from to in 2. This path is invariant under isomorphism of the underlying graph (i.e. relabelling of the values and operations but preserving the port labels). With this we conclude the discussions of the specificities of minIR graphs related to typing, linearity and hierarchy, and the related assumptions that we are making.
To summarise, minIR graphs as they are considered in this chapter are hypergraphs (Definition 3.1) that satisfy the following properties
- every vertex (value) is incident to exactly two hyperedges (operations). It is the target of one hyperedge (its definition) and the source of another one (its use),
- every hyperedge is incident to at most four vertices,
- every hyperedge can be split in a unique way (and invariant under isomorphism) into at most two split operations, with each at most two endpoints.
When modelling subgraphs of IO-free minIR graphs (typically patterns for pattern matching), some hyperedge connections at the boundary of the subgraph will be missing. We say a value is open if a use or define operation is missing (i.e. it is a boundary value in a minIR subgraph).
We will simply refer to hypergraphs that satisfy the above assumptions as graphs. In the single instance in this chapter where we refer to a graph that does not satisfy these assumptions, we will specifically call it a simple graph.
We conclude with the following notable bound on circuit width.
Let be a graph with operations of odd arity (i.e. is odd) and open values. Then, the circuit width of is
For any linear path in consider its two ends and , i.e. the two values in with only one neighbouring value in (by definition linear paths cannot be empty). In the fully split graph of , these values are either open or must be connected to two operations. In the latter case, at least one of the operations must have a single endpoint (otherwise by acyclicity, the operation would have two neighbours).
In a fully split graph, operations with a single endpoint result from a split operation with an odd number of endpoints. We conclude that for every linear path, there are either two operations with an odd number of endpoints in , or one such operation and one open value, or two open values. The result follows.
-
This restriction is necessary for our results: copyable values may admit an arbitrary number of adjacent hyperedges. As a result, minIR graph pattern matching with copyable values is a generalisation of the subgraph isomorphism problem, a well-known NP-complete problem Cook, 1971. 1971. The complexity of theorem-proving procedures. In Proceedings of the third annual ACM symposium on Theory of computing - STOC ’71. ACM Press, 151--158. doi: 10.1145/800157.805047. The approach generalises to non-linear types, but the complexity analysis no longer holds (we pay a computational price for every non-linear value matched). ↩︎
-
Remark that the ordering of the operations thus defined is a particular case of a depth-first search (DFS) ordering of the graph: given an operation that has been visited, all its descendants will be visited before proceeding to any other operation. ↩︎
4.3. Tree reduction
We reduce the problem of graph pattern matching to matching on rooted trees – as we will see in section 4.6, a much simpler problem to solve. The map between graphs and (rooted) trees is given by rooted dual trees. Call tree-like if is connected and the underlying undirected graph of is acyclic.
Let be a tree-like graph with operations . Then given a root operation , the rooted dual tree of rooted at , written is the tree given by
- the nodes of the tree are the operations of ,
- the parent and children of are the operations that share a value with in ; the parent is the unique operation on the path from to ,
- the children of an operation are ordered according to the port labels.
Unlike graphs, tree nodes are identified uniquely by their path from the root. Trees isomorphic as graphs with identical root are thus considered equal.
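The rooted dual tree construction can be sketched as a recursive traversal. The encoding is an assumption made for illustration: `neighbours[op]` lists `(port_label, neighbouring_op)` pairs with integer port labels, and `optype` gives the operation type. The returned tree only records operation types and port labels, so that nodes are identified by their path from the root rather than by their name.

```python
def rooted_dual_tree(optype, neighbours, root, parent=None):
    """Nested tuple (type, ((port, subtree), ...)) with children ordered by port."""
    children = sorted(
        (port, op) for port, op in neighbours[root] if op != parent
    )
    return (
        optype[root],
        tuple(
            (port, rooted_dual_tree(optype, neighbours, op, root))
            for port, op in children
        ),
    )


# Assumed tree-like graph with three operations, rooted at "a".
optype_1 = {"a": "cx", "b": "h", "c": "cx"}
neighbours_1 = {"a": [(3, "c"), (2, "b")], "b": [(0, "a")], "c": [(1, "a")]}
tree_1 = rooted_dual_tree(optype_1, neighbours_1, "a")

# The same graph with relabelled operations yields an equal tree.
optype_2 = {"x": "cx", "y": "h", "z": "cx"}
neighbours_2 = {"x": [(2, "y"), (3, "z")], "y": [(0, "x")], "z": [(1, "x")]}
tree_2 = rooted_dual_tree(optype_2, neighbours_2, "x")
```

Because the operation names do not appear in the result, the two trees compare equal, which illustrates why trees that are isomorphic as graphs with identical root are considered equal.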
A tree reduction using path splitting
To reduce a graph to a tree using the rooted dual tree construction, it suffices to reduce to a tree-like graph. The following result shows that this can always be achieved by repeatedly applying operation splitting transformations.
A tree-like graph can be obtained from any connected graph by applying operation splittings. The resulting graph is a path-split graph (PSG) of .
Consider the undirected simple graph , where vertices are linear paths, and there is an edge between two linear paths for every operation that belongs to both paths. We call the interface graph of .
Splitting an operation in a graph corresponds to removing the corresponding edge in . On the other hand, the underlying undirected graph of has a cycle if and only if there is a cycle in . Indeed, a cycle in cannot belong to a single linear path in , by acyclicity of minIR graphs. There is, therefore, a cycle of operations that span multiple linear paths, thus forming a cycle in .
Hence, the operations to be split to turn into a tree-like graph are given by the set of edges in that must be removed to obtain a spanning tree of 1
As we consider typed graphs, the splitting of an operation is unique; however the choice of spanning tree of is not unique, and thus multiple PSGs exist for a given graph .
If is a PSG of some graph , then call an operation of an anchor operation if it is on two linear paths and it is not split in . The set of all anchor operations fully determines the path-split graph. We write for the PSG of obtained by anchors .
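One possible way to pick a set of anchors is to grow a spanning forest of the interface graph greedily, in the style of Kruskal's algorithm. This is a sketch under assumptions rather than the canonical construction: `op_paths` is a hypothetical map from each operation to the one or two linear paths it lies on, and the result depends on the iteration order, reflecting the fact that the spanning tree is not unique.

```python
def choose_anchors(op_paths):
    """Split two-path operations into anchors (tree edges) and operations to split."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    anchors, to_split = [], []
    for op, paths in op_paths.items():
        if len(paths) < 2:
            # Operations on a single linear path are not edges of the interface graph.
            find(paths[0])
            continue
        a, b = find(paths[0]), find(paths[1])
        if a == b:
            # The two paths are already connected: keeping this operation
            # would close a cycle, so it must be split.
            to_split.append(op)
        else:
            parent[a] = b
            anchors.append(op)
    return anchors, to_split


# Assumed toy graph on three linear paths 0, 1 and 2.
op_paths = {"g1": (0, 1), "g2": (1, 2), "g3": (0, 2), "g4": (0, 1), "g5": (2,)}
anchors, to_split = choose_anchors(op_paths)
```

For a connected graph the number of anchors is one less than the number of linear paths, as expected for a spanning tree.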
We have assumed in section 4.2 that every operation in is on at most two linear paths and thus can be connected to at most four values. Each value is linear and hence connected to at most one other operation. It follows that every operation in has at most four neighbouring operations – one parent and three children. A tree leaf can be chosen as the root operation to ensure the root does not have four children.
We can make the path splitting transformation reversible by separately storing the set of split operations in that correspond to a single operation in . As every operation of can get split into at most two split operations, we can store the pairs of split operations in that correspond to an operation in in a partial map that defines weights for (a subset of) the operations of :
This maps a split operation to the unique undirected path in from to the other half of the split operation.
This defines a map , the inverse of the path splitting transformation .
Contracted path-split graphs
We can further simplify the structure of the data of a PSG by contracting all operations of that are on a single linear path. The result is the contracted path-split graph (cPSG) of , written .
We employ a similar trick as above to make this transformation reversible, this time by introducing weights on the values of that store the string of operations that were contracted
where are the values of and are the optypes of operations in , i.e. the optypes of the minIR graph along with the optypes of the split operations. This defines a second map that is the inverse of the path-split graph contraction transformation . In summary, we have the composition
Contracted PSGs are particularly useful for the study of the asymptotic complexity of the pattern matching algorithm we propose, as they have a very regular structure. This is expressed by the following proposition that further extends the statement of Proposition 4.3:
That the tree is ternary follows from Proposition 4.3. Every node of the tree corresponds to an operation in , which is on exactly two linear paths. As a result of acyclicity of the tree, a tree of nodes spans linear paths – and hence, we conclude .
We conclude the construction presented in this section with the following result, expressing graph pattern matching in terms of tree equality:
Let be a pattern graph and a graph. Let be a PSG of . There is an embedding if and only if there is and a PSG of such that
and the trees have equal weight maps and .
The proof follows directly from our construction, the uniqueness of trees under isomorphism and the bijection between the graphs and their cPSGs.
We have thus successfully reduced the problem of pattern matching to the problem of matching on trees. Given that the ordering of children of a node in a tree is fixed, checking trees for equality is a simple matter of checking node and weight equality, one node (and edge) at a time.
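As an illustration of this node-by-node check, the following minimal sketch compares two ternary trees with ordered children. The class `DualTreeNode` and its fields `op_weight` and `edge_weight` are hypothetical names chosen for this example; they stand in for the operation and value weights attached to the rooted dual trees.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DualTreeNode:
    op_weight: str                      # weight of the node (operation)
    edge_weight: str = ""               # weight of the edge from the parent
    children: List[Optional["DualTreeNode"]] = field(
        default_factory=lambda: [None, None, None]
    )


def trees_equal(a: Optional[DualTreeNode], b: Optional[DualTreeNode]) -> bool:
    """Check equality one node (and edge) at a time, respecting child order."""
    if a is None or b is None:
        return a is None and b is None
    if a.op_weight != b.op_weight or a.edge_weight != b.edge_weight:
        return False
    if len(a.children) != len(b.children):
        return False
    return all(trees_equal(x, y) for x, y in zip(a.children, b.children))


left = DualTreeNode("cx", "", [DualTreeNode("cx", "h"), None, None])
same = DualTreeNode("cx", "", [DualTreeNode("cx", "h"), None, None])
moved = DualTreeNode("cx", "", [None, DualTreeNode("cx", "h"), None])
print(trees_equal(left, same), trees_equal(left, moved))
```

Because children are compared position by position, no search over child permutations is needed: the cost is linear in the size of the smaller tree.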
We conclude this section with a figure summarising the constructions we have presented.
A graph , along with the path-split graph , the contracted path-split graph and their rooted dual trees. The anchor operations are (grey) and (red). The root of the rooted dual trees is .
4.4. Canonicalising the tree reduction
The reduction of graph matching to ternary trees from the previous section is a big step towards an algorithm for graph matching. However, Proposition 4.5 is expressed in terms of existence of PSGs – it is as yet unclear how the trees can be constructed. This is the purpose of this section.
We introduce for this purpose a canonical, that is, invariant under isomorphism, choice of PSG of . The result is a unique canonical transformation from to a cPSG that we can use for pattern matching.
We proceed by using the total order that we have defined on port labels, which can be extended lexicographically to paths outgoing from a shared root operation (see section 4.2 for more details). Whenever more than one path from the root to a given operation exists, it suffices to consider the smallest one. For a choice of root operation, we thus obtain a total order of all operations in the graph.
We then restrict our attention to operations on two linear paths and consider them in order. We keep track of linear paths that have been visited and proceed as follows to determine whether an operation must be split:
- if the operation is on a linear path that was not seen before, it is left unchanged and the set of visited linear paths is updated;
- otherwise, i.e. the operation is on two linear paths that have already been visited, it is split, resulting in two operations on a single linear path.
The pseudocode CanonicalPathSplit implements this algorithm. We use Operations(G) to retrieve all the operations on the graph G and LinearPaths(G, op) to retrieve the linear paths of the operation op. The linear paths are identified using integer indices that can be pre-computed and stored in linear time in the graph size. SplitOperation(G, op) returns the graph resulting from splitting op into two operations on a single linear path. Finally, PathAsPortLabels(G, root, v) returns the string of the port labels that encode the path from root to v in the graph G. The strings are ordered lexicographically. The non-capitalized functions set, union, sort, len, and issubset have their standard meanings.
 1  def CanonicalPathSplit(G: Graph, root: Operation) -> Graph:
 2      new_G := G
 3      all_operations := Operations(G)
 4      sorted_operations := sort(
 5          all_operations,
 6          sort_key=lambda v: PathAsPortLabels(G, root, v)
 7      )
 8
 9      # keep track of the visited linear paths
10      seen_paths := set()
11      for op in sorted_operations:
12          # Get the (pre-computed) indices of the linear paths
13          op_linear_paths := LinearPaths(G, op)
14          if len(op_linear_paths) == 2:
15              if issubset(op_linear_paths, seen_paths):
16                  # The two linear paths of `op` are already visited
17                  new_G = SplitOperation(new_G, op)
18              else:
19                  # Mark the new linear paths as visited
20                  seen_paths = union(seen_paths, op_linear_paths)
21      return new_G
The following figure shows an example of splitting a graph into its canonical
PSG using CanonicalPathSplit.
Splitting a graph into its canonical PSG. Ports are ordered counter-clockwise on each edge, and numbered according to the lexicographic order of the paths from root to the port, as returned by PathAsPortLabels. This induces an order on the hyperedges, reflected in the alphabetic order of the edge labels. Linear paths are formed by ports in a horizontal line (as marked by the dotted lines). Vertex root is chosen as the root of the canonical splitting. Vertices d and g are not split because they are the smallest edges that contain the fourth, respectively first linear path.
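The core loop of CanonicalPathSplit can be made runnable on a toy representation. The sketch below is not the full procedure: it assumes that the operations are already supplied in the canonical order (the argument `sorted_ops`, a hypothetical name) and that each operation is described only by the integer indices of its linear paths (`linear_paths`). Instead of returning a modified graph, it returns which operations stay anchors and which are split. The visited paths are stored in a boolean array, so each membership test takes constant time.

```python
from typing import Dict, List, Sequence, Tuple


def canonical_split(
    sorted_ops: Sequence[str],
    linear_paths: Dict[str, Tuple[int, ...]],
    n_paths: int,
) -> Tuple[List[str], List[str]]:
    """Classify two-path operations as anchors or split operations."""
    seen = [False] * n_paths  # seen[i] is True once linear path i was visited
    anchors: List[str] = []
    split: List[str] = []
    for op in sorted_ops:
        paths = linear_paths[op]
        if len(paths) != 2:
            continue  # operations on a single linear path are never split
        if all(seen[p] for p in paths):
            split.append(op)  # both linear paths already visited
        else:
            anchors.append(op)  # op reaches a new linear path
            for p in paths:
                seen[p] = True
    return anchors, split


order = ["root", "a", "b", "c", "d"]
paths = {"root": (0, 1), "a": (1,), "b": (1, 2), "c": (0, 2), "d": (2, 3)}
print(canonical_split(order, paths, n_paths=4))
```

In this example four linear paths are joined by three anchors, and the single operation that would close a cycle between already visited paths is split.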
CanonicalPathSplit. CanonicalPathSplit(G) is a valid PSG of $G$. It is deterministic and invariant under isomorphism of $G$. The runtime of CanonicalPathSplit is linear in the number of operations in the graph $G$.
Consider the graph returned by CanonicalPathSplit(G). From the discussion in the proof of Proposition 4.6, we know it is sufficient to show that the interface graph of the returned graph is acyclic and connected.
is acyclic. If there was a cycle in , then there
would be operations in that pairwise
share a linear path. One of these operations must be
considered last in the for loop of lines 11–20, suppose it is . But
every linear path of is either also a linear path of or a
linear path of : thus satisfies the condition on line
15 and is split, and thus cannot be in , a contradiction. Hence is
acyclic.
is connected. We proceed inductively to show the following
invariant for the main for loop (lines 11–20): for all linear paths in
seen_paths, there is a path in to a linear path of the root
operation. seen_paths is only modified on line 20. If op is the root
operation, then trivially there is a path from the linear paths
op_linear_paths to a linear path of the root operation. Otherwise, we claim
that there must be one of the paths in op_linear_paths that is already in
seen_paths. From there it follows that there is a path in the interface graph from
the root path to the unseen linear path, given by the path to the linear path in
seen_paths followed by the edge of the interface graph that corresponds to op.
By connectedness of , there is a path from the root operation to op. The
path is not empty because op is not the root operation, so we can consider the
prefix of the path of all operations excluding op. Call op' the last
operation preceding op and op_linear_paths' its linear paths. Two successive
operations on a path must share a linear path: op_linear_paths $\cap$
op_linear_paths' cannot be empty. According to line 4, op' must have been
visited before op, thus op_linear_paths' $\subseteq$ seen_paths. It
follows that at least one element of op_linear_paths must be in seen_paths.
Deterministic and isomorphism invariant. The pseudocode above is deterministic and only depends on paths in the graph encoded as strings of port labels, which are invariant under isomorphism.
Runtime complexity. Lines 2 and 3 run in time linear in the number of operations. With the exception of the sort function on lines 4–7, every other line can be run in constant time:
- lines 13 and 15 run in constant time because the size of op_linear_paths is always at most 2;
- line 20 (and the membership check on line 15) can be run in constant time by representing the seen_paths set as a fixed-size boolean array with one entry per linear path, with the $i$-th bit indicating whether the $i$-th linear path has been seen;
- line 17 is a constant time transformation if we allow in-place modification of new_G.
The for loop runs one iteration per operation, for a total runtime linear in the number of operations.
Finally, the sorting operation would naively take time $O(n \log n)$ for $n$ operations. However, given that the ordering is obtained lexicographically from the paths starting at the root, we can obtain the sorted list of operations by depth-first traversal of the graph starting at the root. The result follows.
Using CanonicalPathSplit, we can now sketch what the pattern matching algorithm should look like. For each pattern, we first compute its canonical PSG for an arbitrary choice of pattern root operation; then, given a graph $G$, we can find all embeddings of patterns into $G$ by iterating over all possible PSGs within $G$. Naively, this involves enumerating all possible subgraphs of $G$, and then, for each of them, iterating over all possible root choices.
This can be significantly sped up by realising that many of the PSGs that are computed when iterating over all possible subgraphs and root choices are redundant. We will see in the next section that we can i) iterate once over all possible root choices $r$ in $G$ and ii) introduce a new procedure AllPathSplits that will efficiently enumerate all possible rooted dual trees of PSGs that are rooted in $r$ for subgraphs within $G$. In the process, we will also see that we can replace the tree equality check of line 12 with a subtree inclusion check, further reducing the number of PSGs that must be considered.
Naive pattern matching.
 1  # Precompute all PSGs
 2  allT = [CanonicalPathSplit(
 3      P, root_P
 4  ) for (P, root_P) in patterns]
 5
 6  for S in Subgraphs(G):
 7      for root_S in Operations(S):
 8          TG = CanonicalPathSplit(
 9              S, root_S
10          )
11          for T in allT:
12              if T == TG:
13                  yield T
Improved using AllPathSplits (section 4.5).
 1  # Precompute all PSGs
 2  allT = [CanonicalPathSplit(
 3      P, root_P
 4  ) for (P, root_P) in patterns]
 5
 6  for root_G in Operations(G):
 7      for TG in AllPathSplits(
 8          G, root_G
 9      ):
10          for T in allT:
11              # Replace == with subtree
12              if IsSubTree(T, TG):
13                  yield T
4.5. Enumerating all path-split graphs
The CanonicalPathSplit procedure in the previous section defines for all
graphs and choice of root operation a canonical PSG , and thus a
canonical set of anchors that we write as
Instead of CanonicalPathSplit, we can equivalently consider a
CanonicalAnchors procedure, which computes directly instead of the
graph .
We formulate this computation below, using recursion instead of a for loop.
This form generalises better to the AllAnchors procedure that we will
introduce next.
The equivalence of the CanonicalAnchors procedure with CanonicalPathSplit
follows from the observation made in
section 4.2 that ordering operations in
lexicographic order of the port labels is equivalent to a depth-first traversal
of the graph.
CanonicalAnchors implements a recursive depth-first traversal (DFS), with the
twist that the recursion is explicit only on the anchor nodes and otherwise
relies on the lexicographic ordering just like in CanonicalPathSplit: lines
5–15 of CanonicalAnchors correspond to the iterations of the for loop (lines
11–20) of CanonicalPathSplit until an anchor operation is found (i.e. the
else branch on lines 18–20 is executed). From there, the graph traversal
proceeds recursively.
We introduce the ConnectedComponent, Neighbours and RemoveOperation
procedures; the first returns the connected component of the current operation,
whereas the other two procedures are used to traverse, respectively modify, the
graph . Importantly, Neighbours(G, op) returns the neighbours of op
ordered by port label order.
To ensure that the recursive DFS does not visit the same operation twice, we
modify the graph with RemoveOperation on lines 11 and 15, ensuring that no
visited operation remains in G. As a consequence, CanonicalAnchors may be
called on disconnected graphs, which explains why an additional call to
ConnectedComponent (line 4) is required.
CanonicalPathSplit and CanonicalAnchors. Let $G$ be a connected graph and let $r$ be a root operation in $G$. Then
CanonicalAnchors maps the graph to the canonical anchor set:
where is the set of all paths in and designates the empty graph.
The proof follows directly from the previous paragraphs.
 1  def CanonicalAnchors(
 2      G: Graph, root: Operation, seen_paths: Set[int]
 3  ) -> (Set[Operation], Set[int], Graph):
 4      operations = Operations(ConnectedComponent(G, root))
 5      # sort by PathAsPortLabels, as previously
 6      sorted_operations := sort(operations)
 7      operations_queue := queue(sorted_operations)
 8
 9      # Skip all operations that are not anchors
10      op := operations_queue.pop()  # never empty, contains root
11      G = RemoveOperation(G, op)
12      while len(LinearPaths(G, op)) == 1 or
13              issubset(LinearPaths(G, op), seen_paths):
14          op = operations_queue.pop() or return ({}, {}, G)
15          G = RemoveOperation(G, op)
16
17      # op is anchor, update seen_paths and recurse
18      seen_paths = union(seen_paths, LinearPaths(G, op))
19      anchors := [op]
20      # sort by port labels
21      for child in Neighbours(G, op):
22          (new_anchors, seen_paths, G) = CanonicalAnchors(
23              G, child, seen_paths
24          )
25          anchors += new_anchors
26
27      return (anchors, seen_paths, G)
Maximal PSGs
In addition to “simplifying” the data required to define path splitting, the definition of PSGs using anchor operations has another advantage that is fundamental to the pattern matching algorithm.
Consider the rooted dual tree of a PSG with root operation in . Recall that tree nodes are uniquely identified by their path from the root and thus are considered equal if they are isomorphic as graphs. We can in the same way define a tree inclusion relation on rooted dual trees that corresponds to checking that the trees have the same root and that the left-hand side is isomorphic to a subtree of the right-hand side. We also require that the operation weights given by the map map coincide on the common subtree.
Let be a connected graph, a set of operations in and a root operation. Consider the set
There is a subgraph such that for all subgraphs : . Furthermore, for all graph , there is and such that
We call the maximal PSG with anchors in .
The proof gives an explicit construction for .
Assume , otherwise the statement is trivial.
Construction of . Let be the set of linear paths in that go through at least one operation in . Consider the set of operations in given by the operations whose linear paths are contained in . This defines a subgraph of . Since , there exists . By assumption, is connected, and thus the anchors of are connected in . There is therefore a connected component that contains the set .
Well-definedness of . Consider the PSG of . We must show that is a tree-like graph for the proposition statement to be well-defined. In other words, we must show that the interface graph of is acyclic and connected. is connected by construction, which implies connectedness of and thus of . It is acyclic because and has exactly operations on more than one linear path. is thus a tree.
. For any subgraph , its operations must be contained in . Since any is connected and contains , it must further hold that .
We can now prove the equivalence of (2).
: If , then there exists with rooted dual tree
Furthermore, by definition of on rooted trees, a map is defined on , given by the map of on the domain . Recall from section 4.3 that there is a map that maps It merges split operations pairwise, and thus it is immediate that implies . Thus and . By construction, one can also derive that . The statement follows.
: Since , we know from point 1 that . Thus we can define an injective embedding .
Operation splitting leaves the set of values from to , as well as from to unchanged. Similarly, there is a bijection between values in and and thus between edges in and . The pattern embedding hence defines an injective map from tree edges in to tree edges in . We extend this map to a map on the trees by induction over the nodes set of . We start by the root map . Using , we can then uniquely define the image of any child node of in , and so forth inductively.
We show now that the map thus defined is injective. Suppose are nodes in such that . By the inductive construction there are paths from the root to and respectively such that their image under are two paths from to . But is a tree, so both paths must be equal. By bijectivity of , it follows , and thus is injective. Finally, the value and operation weights are invariant under pattern embedding and thus are preserved by definition.
This result means that instead of listing all PSGs for every possible subgraph of $G$, it is sufficient to proceed as follows:
- for every pattern, fix a root operation and construct the rooted tree dual of its canonical PSG,
- enumerate every possible root operation $r$ in $G$,
- enumerate every possible set of anchors $\pi$ in $G$ with root $r$,
- for each set $\pi$, find the maximal PSG with anchors $\pi$ in $G$, and take its rooted tree dual,
- find all patterns whose rooted tree dual is included in the rooted tree dual of that maximal PSG.
In other words, if AllAnchors is a procedure that enumerates all possible sets of anchors in $G$ and MaximalPathSplit computes the maximal PSG as presented in the proof of Proposition 4.9, then AllPathSplits(G, root) can simply be obtained by calling AllAnchors and then returning their respective maximal PSGs in $G$:
def AllPathSplits(G: Graph, root: Operation) -> Set[Graph]:
    all_anchors = AllAnchors(G, root)
    return {MaximalPathSplit(G, pi) for pi in all_anchors}
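The subgraph construction used in the proof of the maximal PSG can be sketched as follows. This is an illustrative reading of that construction, not the thesis implementation: the function name `maximal_subgraph_ops` and the dictionaries `linear_paths` (operation to linear path indices) and `neighbours` (operation to adjacent operations) are assumptions made for the example. The function returns only the operations of the maximal subgraph; the PSG itself would then be obtained by splitting every two-path operation in it that is not an anchor.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def maximal_subgraph_ops(
    linear_paths: Dict[str, Tuple[int, ...]],
    neighbours: Dict[str, List[str]],
    anchors: Set[str],
) -> Set[str]:
    """Operations of the largest connected subgraph spanned by the anchors' paths."""
    # Linear paths that go through at least one anchor.
    covered: Set[int] = set()
    for anchor in anchors:
        covered.update(linear_paths[anchor])
    # Keep only operations whose linear paths are all covered.
    candidates = {op for op, ps in linear_paths.items() if set(ps) <= covered}
    # Connected component of the candidates that contains the anchors
    # (assumes `anchors` is non-empty and connected within the candidates).
    start = next(iter(anchors))
    component = {start}
    queue = deque([start])
    while queue:
        op = queue.popleft()
        for nb in neighbours[op]:
            if nb in candidates and nb not in component:
                component.add(nb)
                queue.append(nb)
    return component


linear_paths = {"a": (0, 1), "x": (1,), "b": (1, 2), "y": (2, 3), "u": (2,), "z": (0,)}
neighbours = {
    "a": ["x", "z"], "x": ["a", "b"], "b": ["x", "y"],
    "y": ["b", "u"], "u": ["y"], "z": ["a"],
}
print(sorted(maximal_subgraph_ops(linear_paths, neighbours, {"a", "b"})))
```

In the example, operation `y` is excluded because it touches a linear path without an anchor, and `u` is excluded because it can only be reached through `y`.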
The missing piece: AllAnchors
We can now complete the definition of AllPathSplits by defining the AllAnchors procedure, which enumerates all possible sets of anchors in $G$ given a root operation $r$.
The procedure is similar to CanonicalAnchors, described in detail in the previous paragraphs. In addition to the arguments of CanonicalAnchors, AllAnchors requires a width argument $w$. It then returns all sets of at most $w$ operations[1] that form the canonical anchors of some width-$w$ subgraph of $G$ with root $r$. The main difference between CanonicalAnchors and AllAnchors is that the successive recursive calls (line 22 in CanonicalAnchors) are replaced by a series of nested loops (lines 27–36 in AllAnchors) that exhaustively iterate over the possible outcomes for different subgraphs of $G$. The results of every possible combination of recursive calls are then collected into a list of anchor sets, which is returned.
The part of the pseudocode that is without comments is unchanged from
CanonicalAnchors. Using Proposition 4.3, we
know that we can assume that every operation has at most 3 children, and thus 3
neighbours in G, given that the operations equivalent to parent nodes were
removed.
 1  def AllAnchors(
 2      G: Graph, root: Operation, w: int,
 3      seen_paths: Set[int] = {}
 4  ) -> List[(Set[Operation], Set[int], Graph)]:
 5      # Base case: return one empty anchor list
 6      if w == 0:
 7          return [({}, {}, G)]
 8
 9      operations = Operations(ConnectedComponent(G, root))
10      sorted_operations := sort(operations)
11      operations_queue := queue(sorted_operations)
12
13      op := operations_queue.pop()
14      G = RemoveOperation(G, op)
15      while len(LinearPaths(G, op)) == 1 or
16              issubset(LinearPaths(G, op), seen_paths):
17          op = operations_queue.pop() or return [({}, {}, G)]
18          G = RemoveOperation(G, op)
19
20      seen0 = union(seen_paths, LinearPaths(G, op))
21      # There are always at most three neighbours: we
22      # unroll the for loop of CanonicalAnchors.
23      [child1, child2, child3] = Neighbours(G, op)
24      # Iterate over all ways to split w-1 anchors over
25      # the three children and solve recursively
26      all_anchors = []
27      for 0 <= w1, w2, w3 < w with w1 + w2 + w3 == w - 1:
28          for (anchors1, seen1, G1) in
29                  AllAnchors(G, child1, w1, seen0):
30              for (anchors2, seen2, G2) in
31                      AllAnchors(G1, child2, w2, seen1):
32                  for (anchors3, seen3, G3) in
33                          AllAnchors(G2, child3, w3, seen2):
34                      # Concatenate new anchor with anchors from all paths
35                      anchors = union([op], anchors1, anchors2, anchors3)
36                      all_anchors.push((anchors, seen3, G3))
37      return all_anchors
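The loop header on line 27 of the pseudocode is written in mathematical shorthand. One possible way to realise it in Python is the small generator below, which enumerates all ordered triples of non-negative integers summing to w - 1. The function name `width_splits` is an assumption made for this sketch.

```python
from typing import Iterator, Tuple


def width_splits(w: int) -> Iterator[Tuple[int, int, int]]:
    """Yield every (w1, w2, w3) with 0 <= w1, w2, w3 < w and w1 + w2 + w3 == w - 1."""
    for w1 in range(w):
        for w2 in range(w - w1):
            # w3 is fully determined by w1 and w2 and is never negative here.
            yield (w1, w2, w - 1 - w1 - w2)


print(list(width_splits(3)))
```

For w = 0 the generator yields nothing, which is consistent with the base case being handled separately before the loop. For w >= 1 it yields w(w + 1)/2 triples.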
We can represent the sequence of recursive calls to AllAnchors as a tree. The
call tree for the graph used as example to illustrate CanonicalAnchors earlier
is given on the next page.
We now show correctness of the procedure. Let us write for the set
of sets of anchors returned by AllAnchors(G, r, w, {}).
Let be a graph and be a subgraph of of width . Let be a choice of root operation in . We have
A call tree for an execution of AllAnchors on the example graph of the previous figure with . Starting from the root, each node in the tree corresponds to either picking an operation as anchor or not (thus splitting it). Edges are labelled by the values assigned to for the respective children of the source node. One path from root to leaf leads to no solution (it is impossible to find an unseen linear path from operation ). The other paths each lead to a valid set of three anchors.
The proof is by induction over the width of the subgraph . The idea is to
map every recursive call in CanonicalAnchors to one of the calls to
AllAnchors on lines 29, 31 or 33. All recursive results are concatenated on
line 36, and thus, the value returned by CanonicalAnchors will be one of the
anchor sets in the list returned by AllAnchors.
Let $H$ be a connected subgraph of $G$ of width $w$. We prove inductively over $w$ that if
$$(X, S', H') = \texttt{CanonicalAnchors}(H, r, S)$$
then there is a graph $G'$ with $H' \subseteq G'$ such that
$$(X, S', G') \in \texttt{AllAnchors}(G, r, w, S)$$
for all valid root operations $r$ of $H$ and all subsets $S$ of the linear paths of $H$ passed as seen_paths. The statement in the proposition directly follows from this claim.
For the base case $w = 1$, CanonicalAnchors will return the anchors
anchors = [op] as defined on line 19: there is only one linear path, and it is
already in seen_paths, thus for every recursive call to CanonicalAnchors,
the while condition on line 12 will always be satisfied until all operations
have been exhausted and empty sets are returned. In AllAnchors, on the other
hand, the only values of w1, w2 and w3 that satisfy the loop condition on
line 27 for $w = 1$ are w1 $=$ w2 $=$ w3 $= 0$. As a result, given the w $= 0$
base case on lines 6–7, the lines 35 and 36 of AllAnchors are only
executed once, and the definition of anchors on line 35 is equivalent to its
definition in CanonicalAnchors.
We now prove the claim for by induction. As documented in AllAnchors,
we can assume that every operation has at most 3 children. This simplifies the
loop on lines 21–25 of CanonicalAnchors to, at most, three calls to
CanonicalAnchors.
Consider a call to CanonicalAnchors for a graph , a root
operation in and a set of linear paths. Let , and
be the length of the values returned by the three recursive calls to
CanonicalAnchors of line 22 for the execution of CanonicalAnchors with
arguments , and . Let and be the three neighbours of
in . If the child does not exist, then one can set and it
can be ignored – the argument below still holds in that case. The definition of
seen0 on line 20 in AllAnchors coincides with the update to the variable
seen_paths on line 18 of CanonicalAnchors; similarly, the updates to G on
lines 14 and 18 of AllAnchors are identical to the lines 11 and 15 of
CanonicalAnchors that update H. Let the updated seen_paths be the set
, the updated G be and the updated be , with
.
As every anchor operation reduces the number of unseen linear paths by exactly
one (using the simplifying assumptions of
section 4.2), it must hold that
. Thus, for a call to AllAnchors with the arguments
, , and , there is an iteration of the for loop on line 27 of
AllAnchors such that w1 , w2 and w3 . It follows
that on line 29 of AllAnchors, the procedure is called recursively with
arguments . From the induction hypothesis, we obtain that
there is an iteration of the for loop on line 29 in which the values of
anchors1 and seen1 coincide with the values of the new_anchors and
seen_paths variables after the first iteration of the for loop on line 21 of
CanonicalAnchors. Call the value of seen1 (and seen_paths) .
Similarly, call the updated value of G in AllAnchors and the updated
value of G in CanonicalAnchors . We have, by the induction hypothesis,
that .
Repeating the argument, we obtain that there are iterations of the for loops
on lines 30 and 32 of AllAnchors that correspond to the second and third
recursive calls to CanonicalAnchors on line 22 of the procedure. Finally, the
concatenation of anchor lists on line 36 of AllAnchors is equivalent to the
repeated concatenations on line 25 of CanonicalAnchors, and so we conclude
that the induction hypothesis holds for .
We will see that the overall runtime complexity of AllAnchors can be easily
derived from a bound on the size of the returned list. For this, we use the
following result:
AllAnchors. The length of the list returned by AllAnchors is in $O(c^w / w^{3/2})$, where $c = 27/4 = 6.75$ is a constant.
Let $C_w$ be an upper bound for the length of the list returned by a call to AllAnchors for width $w$. For the base case $w = 0$, $C_0 = 1$. The returned all_anchors list is obtained by pushing anchor lists one by one on line 36. We can count the number of times this line is executed by multiplying the length of the lists returned by the recursive calls on lines 28–32, giving us the recursion relation
$$C_w \leq \sum_{\substack{0 \leq w_1, w_2, w_3 < w \\ w_1 + w_2 + w_3 = w - 1}} C_{w_1} \cdot C_{w_2} \cdot C_{w_3}.$$
Since $C_w$ is meant to be an upper bound, we replace $\leq$ with equality above to obtain a recurrence relation for $C_w$. This recurrence relation is a generalisation of the well-known Catalan numbers (Stanley, 2015. Catalan Numbers. Cambridge University Press. doi: 10.1017/CBO9781139871495), equivalent to counting the number of ternary trees with $w$ internal nodes: a ternary tree with $w \geq 1$ internal nodes is made of a root along with three subtrees with $w_1$, $w_2$ and $w_3$ internal nodes respectively, with $w_1 + w_2 + w_3 = w - 1$. A closed form solution to this problem can be found in (Aval, 2008. Multivariate Fuss–Catalan numbers. Discrete Mathematics 308, 20 (October 2008), 4660–4669. doi: 10.1016/j.disc.2007.08.100):
$$C_w = \frac{1}{2w + 1} \binom{3w}{w} = \Theta\left(\frac{c^w}{w^{3/2}}\right),$$
satisfying the above recurrence relation with equality, where $c = 27/4$ is a constant obtained from the Stirling approximation:
$$\binom{3w}{w} = \Theta\left(\frac{1}{\sqrt{w}} \left(\frac{27}{4}\right)^{w}\right).$$
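The recurrence and its closed form can be checked numerically for small widths. The sketch below evaluates the recurrence with memoisation and compares it with the ternary-tree (Fuss–Catalan) count $\binom{3w}{w}/(2w+1)$. The function names `anchor_list_bound` and `ternary_tree_count` are chosen for this illustration only.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def anchor_list_bound(w: int) -> int:
    """Evaluate the recurrence with equality: one empty list for w == 0."""
    if w == 0:
        return 1
    return sum(
        anchor_list_bound(w1)
        * anchor_list_bound(w2)
        * anchor_list_bound(w - 1 - w1 - w2)
        for w1 in range(w)
        for w2 in range(w - w1)
    )


def ternary_tree_count(w: int) -> int:
    """Closed form: number of ternary trees with w internal nodes."""
    return comb(3 * w, w) // (2 * w + 1)


for w in range(6):
    print(w, anchor_list_bound(w), ternary_tree_count(w))
```

Both columns agree on the values 1, 1, 3, 12, 55, 273 for w = 0, ..., 5. The ratio of successive terms approaches 27/4 = 6.75 from below as w grows, which is the exponential base appearing in the asymptotic bound.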
To obtain a runtime bound for AllAnchors, it is useful to identify how much of
needs to be traversed. If we suppose all patterns have at most depth ,
then it immediately follows that any operation in that is in the image of a
pattern embedding must be at most a distance away from an anchor operation.
We can thus equivalently call AllAnchors on a subgraph of such that no
linear path is longer than . We thus obtain the following runtime.
AllAnchors. For patterns with at most width $w$ and depth $d$, the total runtime of AllAnchors is in
$$O\left(\frac{c^w \cdot d}{w^{1/2}}\right),$$
where $c$ is the constant of Proposition 4.10.
We restrict Operations on line 9 to only return the first $d$ operations on the linear path in each direction, starting at the anchor operation: operations more than distance $d$ away from the anchor cannot be part of a pattern of depth $d$.
We use the bound on the length of the list returned by calls to AllAnchors of
Proposition 4.10 to bound the runtime. We can ignore the
non-constant runtime of the concatenation of the outputs of recursive calls on
line 35, as the total size of the outputs is asymptotically at worst of the same
complexity as the runtime of the recursive calls themselves. Excluding the
recursive calls, the only remaining lines of AllAnchors that are not executed
in constant time are the while loop on lines 15–18 and the Operations and
sort calls on lines 9–11. Using the same argument as in CanonicalAnchors,
we can ignore the latter two calls by replacing the queue of operations by a
lazy iterator of operations. The next operation given op and the graph G can
always be computed in time using a depth-first traversal of G.
Consider the recursion tree of AllAnchors, i.e. the tree in which the nodes are the recursive calls to AllAnchors and the children are the executions spawned by the nested for loops on lines 28–32. By Proposition 4.10, this tree has at most $O(c^w / w^{3/2})$ leaves. A path from the root to a leaf corresponds to a stack of recursive calls to AllAnchors. Along this recursion path, the seen_paths set is always strictly growing (line 20) and the operations removed from G on lines 14 and 18 are all distinct. For each linear path, at most $O(d)$ operations are traversed. Thus the total runtime of the while loop (lines 15–18) along a path from root to leaf in the recursion tree is in $O(w \cdot d)$. We can thus bound the overall complexity of executing the entire recursion tree by
$$O\left(\frac{c^w}{w^{3/2}} \cdot w \cdot d\right) = O\left(\frac{c^w \cdot d}{w^{1/2}}\right).$$
[1] Every anchor operation is on at least one previously unseen linear path, thus there can be at most $w$ operations in the set of anchors.
4.6. An automaton for multi-pattern matching
We have shown in the previous sections that graph pattern-matching can be reduced to a problem of tree inclusions, with trees of fixed width . To complete the pattern-matching algorithm, we must provide a fast way to evaluate the subtree relation for many trees representing the set of all patterns we wish to match.
More precisely, for patterns with width , fix a root operation in for each and consider the rooted tree duals of the canonical PSGs , with the canonical anchors. Then given a subject graph , we wish to compute the set
for all anchor sets and root operation in . This
corresponds to the IsSubTree predicate introduced in the sketch of the
algorithm in section 4.4.
Instead of considering the trees of PSGs, it will prove easier to consider the contracted PSGs (cPSGs)
Such tree inclusions are equivalent to finding embeddings in the subject graph itself, provided that we keep track of the and weight maps (see section 4.3).
It will be useful to remind ourselves of the following properties of contracted PSGs. Every operation of a cPSG (and thus every node in its rooted dual tree) is an anchor operation of the PSG. Per Proposition 4.4, the rooted dual tree of a cPSG is a ternary tree and has exactly nodes. Finally, recall the concept of an open value of a graph, i.e. a value that is missing either a use or define operation (see section 4.2).
Reduction of tree inclusion to string prefix matching
Now consider two contracted spanning tree reductions and with values and . To simplify notation, define
for some choice of root operations and in and , respectively. We lift the relation on rooted dual trees of PSGs introduced in section 4.5 to rooted dual trees of cPSGs in such a way that there is an inclusion relation between two rooted dual trees of PSGs if and only if the same relation holds on the rooted duals of cPSGs.
We say that if and only if
- the trees share the same root operation,
- is a subtree of ,
- the map coincides on the common subtree, and
- the map satisfies for all :
where designates the embedding of into given by the tree embedding.
The first three conditions are taken as-is from the relation on non-contracted trees, whilst the fourth condition on the map is specific to contracted trees.
Using Proposition 4.2, there are at most 2 open values for each linear path in the graph, and thus at most $2w$ open values in a rooted dual tree of a cPSG of width $w$. For each such contracted rooted dual, we can thus define a contracted string tuple given by the values of the map evaluated in the (up to) $2w$ open values.
If is the restriction of to the domain of definition of non-open values of a cPSG, the fourth condition for the inclusion relation on rooted dual cPSGs, given above becomes an equality condition when restricted to non-open values. A special case of this property of particular interest to us is stated as the following result. The relation on strings refers to prefix inclusion, i.e. if and only if is a prefix of .
Let and be the contracted string tuples of and respectively. Then if and only if the trees share the same root, are isomorphic, have the same and maps and for all : .
The proof of this follows directly from observing that rooted duals of cPSGs have the same set of nodes and that the restriction to non-open values must satisfy equality.
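As a minimal illustration of the prefix condition in the result above, component-wise prefix inclusion on string tuples can be sketched in Rust as follows. The function name `tuple_is_prefix` and the representation of strings as vectors of symbols are our own assumptions for this sketch, as is the reading that the condition compares the strings of both tuples position by position.

```rust
/// Component-wise prefix inclusion on tuples of strings:
/// returns true if both tuples have the same length and every string
/// in `ss` is a prefix of the string at the same position in `ts`.
/// (Hypothetical helper, not taken from the implementation.)
fn tuple_is_prefix<T: PartialEq>(ss: &[Vec<T>], ts: &[Vec<T>]) -> bool {
    ss.len() == ts.len() && ss.iter().zip(ts).all(|(s, t)| t.starts_with(s))
}
```

The check only inspects the leading symbols of each string in `ts`, so any symbols beyond the pattern strings are ignored.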
Why restrict ourselves to trees of the same width ? It is sufficient for our purposes! All patterns are of width by assumption and so are the rooted dual trees of the form , given that .
The string prefix matching problem is a simple computational task that can be generalised to check for multiple string patterns at the same time using a prefix tree. An overview of this problem can be found in appendix A. We can thus obtain a solution for the pattern matching problem for patterns:
As above, let
- be a graph, with a set of operations and a choice of root operation,
- be patterns of width and depth , with choices of root operations and canonical anchors
The set of all pattern embeddings mapping the canonical anchor set to and root to for can be computed in time using at most pre-computed prefix trees of size at most , each constructed in time complexity .
For each pattern, we consider its canonical spanning tree reduction and construct a multi-dimensional prefix tree (see Appendix ) for each group of patterns that share the same spanning tree reduction.
Given a graph , we can compute the cPSG of for anchors and map its rooted dual tree to the corresponding prefix tree. This can be done in time by using a search tree. We can restrict to a graph of size by truncating the linear paths to at most length, as in the proof of Proposition 4.12. Thus we can assume .
The rest of the proof and the runtime follow from the multi-dimensional prefix tree construction detailed in the appendix.
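To make the role of the prefix tree more concrete, the following is a minimal Rust sketch of the one-dimensional case: a prefix tree that stores several pattern strings and reports all of them that are prefixes of a given input string. The type `PrefixTree` and its methods are hypothetical names chosen for this sketch; the multi-dimensional construction used in the proof is not reproduced here.

```rust
use std::collections::HashMap;

/// A prefix tree (trie) over byte strings. Each node records the ids
/// of the patterns that end exactly at that node.
#[derive(Default)]
struct PrefixTree {
    children: HashMap<u8, PrefixTree>,
    pattern_ids: Vec<usize>,
}

impl PrefixTree {
    /// Insert the pattern with identifier `id`, creating nodes as needed.
    fn insert(&mut self, pattern: &[u8], id: usize) {
        let mut node = self;
        for &symbol in pattern {
            node = node.children.entry(symbol).or_default();
        }
        node.pattern_ids.push(id);
    }

    /// Return the ids of all stored patterns that are prefixes of `text`,
    /// ordered by increasing pattern length.
    fn prefixes_of(&self, text: &[u8]) -> Vec<usize> {
        let mut node = self;
        let mut found = node.pattern_ids.clone();
        for symbol in text {
            match node.children.get(symbol) {
                Some(child) => node = child,
                None => break,
            }
            found.extend(&node.pattern_ids);
        }
        found
    }
}
```

A query walks a single path from the root, so it visits at most one node per symbol of the input string, however many patterns are stored; only the size of the output grows with the number of matching patterns.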
Combining everything
Finally, putting Proposition 4.15 and Proposition 4.12 together, we obtain our main result.
Let be patterns with width and depth . The pre-computation runs in time and space complexity
For any subject graph , the pre-computed prefix tree can be used to find all pattern embeddings in time
where is a constant.
The pre-computation consists of running the CanonicalAnchors procedure on each of the patterns and then transforming them into a map of prefix trees using Proposition 4.15. By Proposition 4.7, CanonicalAnchors runs in for each pattern, where we used that for all patterns. The total runtime of prefix construction is thus
The complexity of pattern matching itself on the other hand is composed of two parts: the computation of all possible anchor sets , and the execution of the prefix string matcher for each of the trees resulting from these sets . As AllAnchors must be run for every choice of root vertex in , the runtime is thus obtained by multiplying i) with ii) the runtime of the prefix tree matching (Proposition 4.15), and with iii) , i.e. the number of anchor lists returned by AllAnchors (Proposition 4.10):
where is the bound for the number of anchor lists returned by AllAnchors. The result follows.
-
The values can be ordered as usual by using the total lexicographic order on port labels of the tree. ↩︎
4.7. Benchmarks
Proposition 4.13 shows that pattern-independent matching can scale to large datasets of patterns but imposes some restrictions on the patterns and embeddings that can be matched. In this section, we discuss these limitations and give empirical evidence that the pattern-matching approach we have presented can be used on a large scale and outperform existing solutions.
Pattern limitations
In section 4.2, we imposed conditions on the pattern embeddings to obtain a complexity bound for pattern-independent matching. We argued how these restrictions are natural for applications in quantum computing, and most of the arguments will also hold for a much broader class of computation graphs.
In future work, it would nonetheless be of theoretical interest to explore the importance of these assumptions and their impact on the complexity of the problem. As a first step towards a generalisation, our implementation and all our benchmarks in this section do not make any of these simplifying assumptions. Our results below give empirical evidence that a significant performance advantage can be obtained regardless.
Implementation
We provide an open-source implementation in Rust of pattern-independent matching using the results of this chapter, available on GitHub. The code and datasets used for the benchmarks themselves are available in a dedicated repository.
The implementation works for weighted or unweighted port graphs – of which typed minIR graphs are a special case – and makes none of the simplifying assumptions employed in the theoretical analysis. Pattern matching proceeds in two phases: precomputation and runtime.
Precomputation. In a first step, all graph patterns are processed and compiled into a single state automaton that will be used at runtime for fast pattern-independent matching. The automaton in the implementation combines in one data structure two distinct computations of this chapter:
- the recursive branching logic used in the AllAnchors procedure to enumerate all possible choices of anchors, and
- the automaton described in section 4.6 that matches patterns for a fixed set of anchors.
The former is implemented with non-deterministic state transitions (each transition corresponding to choosing an additional anchor), whereas the latter is implemented deterministically.
Concretely, the automaton is constructed by following the construction of section 4.4 to decompose each pattern into its canonical path-split graph. We then order the nodes of the PSG and express each node as a condition that ensures the connectivity and node weight in the graph matches the pattern. We thus obtain a chain of conditions, with a transition between any two consecutive conditions; transitions are deterministic by default and marked as non-deterministic whenever they lead to a condition on an anchor node. The state automaton for all patterns is then obtained by joining all chains of conditions into a tree.
Runtime. Pattern matching is then as simple as simulating the state automaton, evaluating all conditions on the graph passed as input. The states in the automaton corresponding to the last condition of a pattern must be marked as end states, along with a label identifying the pattern that was matched. This can then be used at runtime to report all patterns found.
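The following Rust sketch illustrates the idea of joining chains of conditions into a tree and simulating the result on an input graph. It is a deliberately simplified toy and not the actual implementation: the types `Graph`, `Condition` and `State` are hypothetical, conditions refer to absolute node indices rather than to positions relative to a root and anchors, and no distinction is made between deterministic and non-deterministic transitions (every transition whose condition holds is followed).

```rust
use std::collections::BTreeMap;

/// Toy input graph: node weights and directed edges (assumption).
struct Graph {
    weights: Vec<char>,
    edges: Vec<(usize, usize)>,
}

/// A condition evaluated on the input graph (simplified, hypothetical).
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord)]
enum Condition {
    /// The given node exists and carries the given weight.
    Weight(usize, char),
    /// There is an edge from the first node to the second.
    Edge(usize, usize),
}

impl Condition {
    fn holds(&self, graph: &Graph) -> bool {
        match *self {
            Condition::Weight(node, weight) => graph.weights.get(node) == Some(&weight),
            Condition::Edge(src, dst) => graph.edges.contains(&(src, dst)),
        }
    }
}

/// A state of the automaton. Chains with a common prefix of conditions
/// share the corresponding states, so the automaton is a tree.
#[derive(Default)]
struct State {
    transitions: BTreeMap<Condition, State>,
    /// Labels of the patterns whose last condition leads to this state.
    matched: Vec<usize>,
}

impl State {
    /// Join the chain of conditions of pattern `id` into the tree.
    fn add_pattern(&mut self, chain: &[Condition], id: usize) {
        let mut state = self;
        for cond in chain {
            state = state.transitions.entry(cond.clone()).or_default();
        }
        state.matched.push(id);
    }

    /// Simulate the automaton on `graph`, collecting matched pattern labels.
    fn run(&self, graph: &Graph, out: &mut Vec<usize>) {
        out.extend(&self.matched);
        for (cond, next) in &self.transitions {
            if cond.holds(graph) {
                next.run(graph, out);
            }
        }
    }
}
```

In this sketch a condition shared by several patterns is stored and evaluated once, which is the source of the savings when many patterns overlap.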
Our implementation has been tested for correctness, i.e. on the one hand that all matches that are reported are correct, and on the other hand that all pattern matches are found. This was done by comparing the matches of our implementation with the results obtained from matching every pattern separately on millions of randomly generated graphs and edge cases. We also ensured during benchmarking that the number of matches reported by our implementation and by Quartz were always the same.
Benchmarks
Baseline. To assess practical use, we have benchmarked our implementation against a leading C++ implementation of pattern matching for quantum circuits from the Quartz superoptimiser project Xu, 2022. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433. This implementation is the principal component of an end-to-end quantum circuit optimisation pipeline. The results and speedups we obtain here thus apply and transfer directly to this application.
Dataset. We further ensure that our results apply in practice by using a real-world dataset of patterns. The Quartz optimiser finds opportunities for circuit optimisation by relying on precomputed equivalence classes of circuits (ECC). These are obtained exhaustively by enumerating all possible small quantum circuits, computing their unitaries and clustering them into classes of circuits with identical unitaries.
The generation of ECC sets is parametrised on the number of qubits, the maximum number of gates and the gate set in use. For these benchmarks we chose the minimal set of gates and considered circuits with up to 6 gates and 2, 3 or 4 qubits. The size of these pattern circuits is typical for the application1.
Thus, for our patterns, we have the bound for the maximum depth and width . In all experiments, the graph subject to pattern matching was the barenco_tof_10 input, i.e. a 19-qubit circuit with 674 gates obtained by decomposing a 10-qubit Toffoli gate using the Barenco decomposition Barenco, 1995. 1995. Elementary gates for quantum computation. Physical Review A 52, 5 (November 1995, 3457--3467). doi: 10.1103/PhysRevA.52.3457.
Results. We study the runtime of our implementation as a function of the number of patterns being matched, up to patterns. We expect the runtime of pattern matching algorithms that match one pattern at a time to scale linearly with . On the other hand, Proposition 4.13 results in a complexity that is independent of .
For each value of , we select a subset of all patterns in the ECC sets at random. For , there are only a total of patterns, explaining why we do not report results beyond that number. For patterns, our proposed algorithm is faster than Quartz. As expected, the advantage of our approach increases as we match more patterns, scaling up to a speedup for . The results are summarised in the following figure:
Runtime of pattern matching for patterns on 2, 3 and 4 qubit quantum circuits from the Quartz ECC dataset, for our implementation (Portmatching) and the Quartz project. All two-qubit circuits were used, whereas for 3 and 4 qubit circuits, random samples were drawn.
Dependency on and . We further study the runtime of our algorithm as a function of its two main parameters, the number of patterns and the pattern width , on an expanded dataset. To this end, we generate random sets of 10,000 pattern circuits with 15 gates and between and qubits, using the same gate set as previously. The resulting pattern matching runtimes are shown in the figure below.
From Proposition 4.13, we expect that the pattern matching runtime is upper bounded by a -independent constant. Our results support this prediction for and qubit patterns, where the runtime indeed seems to saturate, reaching an observable plateau at large .
We suspect on the other hand that the exponential dependency in the complexity bound of Proposition 4.13 makes it difficult to observe a similar plateau for , as we expect this upper bound on the runtime to increase rapidly with qubit counts . A runtime ceiling is not directly observable at this experiment size, but the gradual decrease in the slope of the curve is consistent with the existence of the -independent upper bound predicted in Proposition 4.13.
Runtime of our pattern matching for random quantum circuits with up to 10 qubits.
-
Such small circuit sizes are imposed in part by the fact that ECCs of larger circuits quickly become infeasible to generate as their number grows exponentially. In practice, large circuit transformations can often be expressed as the composition of smaller atomic transformations, hence the good performance of this approach. ↩︎
Chapter 5
Fully and Confluently Persistent Graph Rewriting
This chapter leverages another construction from graph rewriting theory that finds a direct application in quantum compilation: the unfolding of graph transformation systems Baldan, 1999. 1999. Unfolding and Event Structure Semantics for Graph Grammars. In Foundations of Software Science and Computation Structures, Berlin, Heidelberg. Springer Berlin Heidelberg, 73--89. doi: 10.1007/3-540-49019-1_6 Winskel, 1987. 1987. Event structures. Whereas most applications to date have focused on using unfolding for model verification (see section 5.1 for a review), we will use the same techniques to instead speed up optimisation problems over the space of reachable graphs in a GTS.
In the unfolding, graph rewrites are expressed as persistent modifications of the graph. Mutable data structures are typically ephemeral: modifying the data structure overwrites information and invalidates any references to the old data. In contrast, a persistent data structure applies changes to the data so that both the old and new versions remain accessible – a famous example of this is a version control system such as git.
A data structure is fully persistent if modifications can be applied not only to the latest version but also to previous versions of the data structure. In that case, a version of the data may be used to create several new versions. Instead of a linear edit history of all mutations, the result is an edit history tree, with possibly many “most recent” versions – leaves in the edit history.
Finally, a fully persistent data structure is also confluently persistent if different versions of the data in the edit history can be joined together. As a result, the edit history forms a directed acyclic graph (DAG) of versions of the data, linked by data mutation and joining operations. Adopting terminology from git, we call a join of two or more versions a merge of multiple versions.
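The three notions can be illustrated on a much simpler data structure than graphs. The Rust sketch below is a generic textbook-style persistent list built on reference counting, not the data structure developed in this chapter; the names `List`, `push`, `concat` and `to_vec` are our own. Every update returns a new version and leaves the old one accessible (persistence), any version can be updated again (full persistence), and two versions can be joined (confluent persistence).

```rust
use std::rc::Rc;

/// An immutable singly linked list. Versions share their common tail.
enum List {
    Nil,
    Cons(i32, Rc<List>),
}

/// Create a new version with `x` prepended. `list` remains valid and
/// unchanged, so it can serve as the parent of further versions.
fn push(list: &Rc<List>, x: i32) -> Rc<List> {
    Rc::new(List::Cons(x, Rc::clone(list)))
}

/// Join two versions into one: the nodes of `a` are copied, whereas
/// the whole of `b` is shared with the result.
fn concat(a: &Rc<List>, b: &Rc<List>) -> Rc<List> {
    match &**a {
        List::Nil => Rc::clone(b),
        List::Cons(x, tail) => Rc::new(List::Cons(*x, concat(tail, b))),
    }
}

/// Read out the elements of a version, front to back.
fn to_vec(list: &Rc<List>) -> Vec<i32> {
    let mut out = Vec::new();
    let mut cur = list;
    while let List::Cons(x, tail) = &**cur {
        out.push(*x);
        cur = tail;
    }
    out
}
```

Pushing twice onto the same version yields two sibling versions, i.e. a branching edit history; calling `concat` on these siblings produces a version with two parents, so that the history becomes a DAG.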
In this chapter, we will consider all graphs to be hypergraphs with vertex set and hyperedge set . All results can easily be adapted to accommodate graph attributes, weights, and types as required by applications. This means the data structure and algorithms we present apply directly to minIR graphs and, more broadly, to most instances of graph rewriting.
The central object of study in this chapter is the graph rewrite. We restate a simplified version of Definition 3.9 here. For convenience, we opt for a rewrite definition that omits the edge deletion set of Definition 3.9. This is not a restriction of the general case, as can be seen by adding a “dummy” vertex for each edge in a graph: a rewrite that removes an edge can equivalently be expressed by the rewrite that removes the dummy vertex 1.
As in previous chapters, denotes disjoint union, denotes a partial function and denotes the domain of definition of a (partial) function.
A rewrite on a graph is given by a tuple , with
- is a graph called the replacement graph,
- is the vertex deletion set, and
- is the glueing relation, a partial function that maps a subset of the deleted vertices of to vertices in the replacement graph.
Define the context subgraph as the subgraph induced by the vertices
The rewritten graph resulting from applying to is the glueing
obtained from the union of and by merging all vertices within the same class in the equivalence relation that is the closure of . We refer to section 3.5 for more details and an illustration of glueings and rewrites.
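As an illustration, the following Rust sketch applies a rewrite to a small graph. It rests on several assumptions of ours that go beyond the text: graphs are simple directed graphs rather than hypergraphs, vertices are integer identifiers that are disjoint between the graph and the replacement, a glued vertex is identified with its image in the replacement graph, and an edge of the original graph survives only if each of its endpoints is either kept or glued. The names `Graph`, `Rewrite` and `apply` are hypothetical.

```rust
use std::collections::{BTreeSet, HashMap};

type Vertex = u32;

#[derive(Clone, Debug, PartialEq)]
struct Graph {
    vertices: BTreeSet<Vertex>,
    edges: BTreeSet<(Vertex, Vertex)>,
}

/// A rewrite: replacement graph, vertex deletion set and glueing map.
struct Rewrite {
    replacement: Graph,
    deleted: BTreeSet<Vertex>,
    /// Partial map from (some) deleted vertices to replacement vertices.
    glue: HashMap<Vertex, Vertex>,
}

/// Apply `r` to `g` under the assumptions stated in the lead-in.
fn apply(g: &Graph, r: &Rewrite) -> Graph {
    // Image of a vertex of `g` in the rewritten graph, if any: kept
    // vertices map to themselves, glued vertices to their replacement
    // vertex, and deleted vertices without glueing disappear.
    let image = |v: Vertex| -> Option<Vertex> {
        if !r.deleted.contains(&v) {
            Some(v)
        } else {
            r.glue.get(&v).copied()
        }
    };
    // Vertices: the non-deleted vertices of `g` plus all of the replacement.
    let mut vertices: BTreeSet<Vertex> =
        g.vertices.difference(&r.deleted).copied().collect();
    vertices.extend(&r.replacement.vertices);
    // Edges: those of the replacement, plus the redirected edges of `g`.
    // Edges with an endpoint that is deleted and not glued are dropped.
    let mut edges = r.replacement.edges.clone();
    for &(a, b) in &g.edges {
        if let (Some(a), Some(b)) = (image(a), image(b)) {
            edges.insert((a, b));
        }
    }
    Graph { vertices, edges }
}
```

For example, on the path 1 → 2 → 3 → 4, deleting vertices 2 and 3, glueing 2 to a replacement vertex 10 and inserting the replacement edge 10 → 11 yields the graph with edges 1 → 10 and 10 → 11; the edges incident to vertex 3 are removed implicitly.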
In this chapter, we will consider sequences of multiple rewrites. We will use the notation and to designate the vertices, respectively the edges, of a graph . It is further assumed that the vertices and for are always disjoint, a fact that we underline by always writing unions of graphs and vertices with .
We make use of the fact that for every rewrite , the equivalence classes of are of the form
for some . For every set of merged vertices in , there is thus a unique vertex not in :
We choose to always identify the merged vertex in with . Using this convention, the set of vertices of is simply
-
This makes use of the fact that unlike in DPO, our rewrite definition allows the (implicit) deletion of edges with one endvertex in . ↩︎
5.1. Related work
The unfolding of a graph transformation system (GTS) was first proposed in Baldan, 1999. 1999. Unfolding and Event Structure Semantics for Graph Grammars. In Foundations of Software Science and Computation Structures, Berlin, Heidelberg. Springer Berlin Heidelberg, 73--89. doi: 10.1007/3-540-49019-1_6 as a generalisation of a well-known construction on Petri nets Winskel, 1987. 1987. Event structures. Originally defined for DPO rewriting, the unfolding was later generalised to SPO Baldan, 2007. 2007. Unfolding semantics of graph transformation. Information and Computation 205, 5 (May 2007, 733--782). doi: 10.1016/j.ic.2006.11.004 Baldan, 2014. 2014. Processes and unfoldings: concurrent computations in adhesive categories. Mathematical Structures in Computer Science 24, 4 (June 2014). doi: 10.1017/s096012951200031x and SqPO Behr, 2019. 2019. Sesqui-Pushout Rewriting: Concurrency, Associativity and Rule Algebra Framework. Electronic Proceedings in Theoretical Computer Science 309 (December 2019, 23--52). doi: 10.4204/eptcs.309.2 in arbitrary adhesive categories. The unfolding is a powerful GTS technique that has found applications in model verification Baldan, 2008. 2008. Unfolding Graph Transformation Systems: Theory and Applications to Verification Baldan, 2008. 2008. A framework for the verification of infinite-state graph transformation systems. Information and Computation 206, 7 (July 2008, 869--907). doi: 10.1016/j.ic.2008.04.002 Costa, 2012. 2012. Verification of graph grammars using a logical approach. Science of Computer Programming 77, 4 (April 2012, 480--504). doi: 10.1016/j.scico.2010.02.006 and other formal analysis tools such as model-based diagnosis Baldan, 2008. 2008. Unfolding-Based Diagnosis of Systems with an Evolving Topology and model transformation analysis Bisztr., 2009. 2009. Compositional verification of model-level refactorings based on graph transformations. PhD Thesis. University of Leicester.
Unfoldings of finite GTSs are often infinite. A lot of work has therefore concerned itself with finding sufficient conditions for finiteness or the existence of finite complete prefixes of unfoldings Baldan, 2008. 2008. Unfolding Graph Transformation Systems: Theory and Applications to Verification Baldan, 2004. 2004. Verifying Finite-State Graph Grammars: An Unfolding-Based Approach. In CONCUR 2004 - Concurrency Theory, Berlin, Heidelberg. Springer Berlin Heidelberg, 83--98. doi: 10.1007/978-3-540-28644-8_6 Baldan, 2008. 2008. McMillan's Complete Prefix for Contextual Nets Baldan, 2010. 2010. On the Computation of McMillan's Prefix for Contextual Nets and Graph Grammars. In Graph Transformations, Berlin, Heidelberg. Springer Berlin Heidelberg, 91--106. doi: 10.1007/978-3-642-15928-2_7 Schwoon, 2013. 2013. Efficient verification of sequential and concurrent systems. PhD Thesis. École normale supérieure de Cachan-ENS Cachan. On the other hand, unfoldings of GTSs of quantum computation are expected to be intractably large Yang, 2021. 2021. Equality Saturation for Tensor Graph Superoptimization. CoRR abs/2101.01332. doi: 10.48550/ARXIV.2101.01332, with no complete prefixes in general. Rather, our interests lie in finding heuristics that determine the subspace of the unfolding of interest, combined with fast algorithms to expand finite unfolding prefixes into larger ones. This chapter is to our knowledge the first work in this direction.
Persistent data structures on the other hand have a rich history in computer science Drisco., 1989. 1989. Making data structures persistent. Journal of Computer and System Sciences 38, 1 (February 1989, 86--124). doi: 10.1016/0022-0000(89)90034-2 Lagogi., 2005. 2005. A survey of persistent data structures. In Proceedings of the 9th WSEAS International Conference on Computers, Stevens Point, Wisconsin, USA. World Scientific and Engineering Academy and Society (WSEAS), and particularly within functional programming Okasaki, 1996. 1996. Purely functional data structures. Carnegie Mellon University, USA Okasaki, 1998. 1998. Fast Mergeable Integer Maps. In Workshop on ML, September 1998, 77--86 Hinze, 2005. 2005. Finger trees: a simple general-purpose data structure. Journal of Functional Programming 16, 02 (November 2005, 197). doi: 10.1017/s0956796805005769. Confluently persistent data structures were first explored in Drisco., 1994. 1994. Fully persistent lists with catenation. Journal of the ACM 41, 5 (September 1994, 943--959). doi: 10.1145/185675.185791. A general treatment of the approach was subsequently presented in Fiat, 2003. 2003. Making data structures confluently persistent. Journal of Algorithms 48, 1 (August 2003, 16--58). doi: 10.1016/s0196-6774(03)00044-0 and improved in Collet., 2012. 2012. Confluent Persistence Revisited. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, January 2012. Society for Industrial and Applied Mathematics, 593--601. doi: 10.1137/1.9781611973099.50. Chaler., 2018. 2018. Multi-Finger Binary Search Trees. In International Symposium on Algorithms and Computation 2018. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi: 10.4230/LIPICS.ISAAC.2018.55 proposed a data structure for confluently persistent tries.
Within the field of graph transformations, there is a well-developed theory for persistent (and confluently persistent) transformations in the form of the concurrent graph transformation formalism of Corradini et al. Corrad., 1996. 1996. Graph Processes. Fundamenta Informaticae 26, 3,4 (241--265). doi: 10.3233/fi-1996-263402 and Baldan et al. Baldan, 1999. 1999. Concurrent Semantic of Algebraic Graph Transformations. In Handbook of Graph Grammars and Computing by Graph Transformation, August 1999. World Scientific, 107--187. doi: 10.1142/9789812814951_0003. This categorical formalism has also been extended to include support for overlapping transformations in Echahed, 2017. 2017. Parallel Graph Rewriting with Overlapping Rules. In Proceedings of the 21st International Conference on Logic for Programming, Artificial Intelligence and Reasoning, LPAR, 300--318. doi: 10.29007/576h.
The first practical application of persistent graph rewriting was developed by the graph rewriting engine GRAPE Weber, 2017. 2017. GRAPE – A Graph Rewriting and Persistence Engine. In Graph Transformation. Springer International Publishing, 209--220. doi: 10.1007/978-3-319-61470-0_13 and was originally based on ephemeral data structures with transactional ACID semantics. Later, its successor, GrapeVine Weber, 2022. 2022. Tool Support for Functional Graph Rewriting with Persistent Data Structures - GrapeVine. In Graph Transformation. ICGT 2022. Springer International Publishing, 195--206. doi: 10.1007/978-3-031-09843-7_11, enhanced this with the first fully persistent data structure for graph rewriting. In this work, the vertices and edges that result from graph rewrites are stored individually in a specialised database. Depending on the requested data version, the graph can be retrieved from the database’s individual vertex and edge entities using the database’s graph query language. To our knowledge, no confluently persistent data structure has been proposed for graph rewriting.
As we will see in section 5.5, confluent persistence is a particularly valuable property in the absence of a rewriting strategy, i.e. a procedure to select and prioritise among possible graph transformations Echahed, 2008. 2008. Inductively Sequential Term-Graph Rewrite Systems. In Graph Transformations, ICGT. Springer Berlin Heidelberg, 84--98. doi: 10.1007/978-3-540-87405-8_7. This distinguishes the approach presented in this thesis from most previous work. Rewriting strategies feature prominently in PORGY, a tool for port graph rewriting Andrei, 2011. 2011. PORGY: Strategy-Driven Interactive Transformation of Graphs. Electronic Proceedings in Theoretical Computer Science 48 (February 2011, 54--68). doi: 10.4204/eptcs.48.7 Ferná., 2010. 2010. Strategic programming on graph rewriting systems. Electronic Proceedings in Theoretical Computer Science 44 (December 2010, 1--20). doi: 10.4204/eptcs.44.1; the graph rewriting software GROOVE provides the notion of a control program to govern the transformation order Rensink, 2004. 2004. The GROOVE Simulator: A Tool for State Space Generation. In Applications of Graph Transformations with Industrial Relevance. Springer Berlin Heidelberg, 479--485. doi: 10.1007/978-3-540-25959-6_40; and finally, tools such as GrGen provide advanced control flow primitives to specify rewrite rule execution Geiß, 2006. 2006. GrGen: A Fast SPO-Based Graph Rewriting Tool. In Graph Transformations. ICGT 2006.. Springer Berlin Heidelberg, 383--397. doi: 10.1007/11841883_27.
Specifying rewriting strategies yields efficient graph transformation procedures and is particularly effective for systems with provable properties such as confluence and termination Verma, 1995. 1995. Transformations and confluence for rewrite systems. Theoretical Computer Science 152, 2 (December 1995, 269--283). doi: 10.1016/0304-3975(94)00255-0. As a result, rewriting strategies have also been used successfully within classical compiler optimisations Assmann, 2000. 2000. Graph rewrite systems for program optimization. ACM Transactions on Programming Languages and Systems 22, 4 (July 2000, 583--637). doi: 10.1145/363911.363914 and quantum circuit optimisation Fagan, 2018. 2018. Optimising Clifford Circuits with Quantomatic. In Proceedings 15th International Conference on Quantum Physics and Logic, QPL 2018, Halifax, Canada, 3-7th June 2018, 85--105. doi: 10.4204/EPTCS.287.5 Duncan, 2020. 2020. Graph-theoretic Simplification of Quantum Circuits with the ZX-calculus. Quantum 4 (June 2020, 279). doi: 10.22331/q-2020-06-04-279.
However, such properties of the transition system – or successful heuristic approximations for it – cannot always be derived. In these cases, the space of graphs reachable from an input graph within the transition system must be explored non-deterministically. In the absence of a control program, GROOVE will fall back to an exhaustive exploration of the search space – for an exploration up to depth , the result is a search tree of size , where is the number of possible rewrites at every graph in the search space (assuming is constant for every reachable graph).
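To give a feeling for how quickly such a search tree grows, the following Rust sketch counts its nodes under the stated assumption of a constant number of rewrites per graph, without any deduplication of graphs reached along different rewrite sequences. The function `search_tree_size` is a hypothetical helper of ours; it returns `None` once the count no longer fits in 64 bits.

```rust
/// Number of nodes of a search tree of depth `depth` in which every
/// graph admits exactly `branching` rewrites, i.e. the sum of
/// `branching^i` for `i = 0..=depth`. Returns `None` on overflow.
fn search_tree_size(branching: u64, depth: u32) -> Option<u64> {
    let mut total: u64 = 1; // the input graph at depth 0
    let mut level: u64 = 1; // number of graphs at the current depth
    for _ in 0..depth {
        level = level.checked_mul(branching)?;
        total = total.checked_add(level)?;
    }
    Some(total)
}
```

With 10 rewrites per graph, a search to depth 5 already visits 111,111 graphs; with 1,000 rewrites per graph, the count for depth 10 exceeds the range of a 64-bit integer.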
Exhaustive exploration is used extensively in model checking, typically to verify properties that must hold for all reachable graphs Rensink, 2004. 2004. Model Checking Graph Transformations: A Comparison of Two Approaches. In Graph Transformations. ICGT 2004. Springer Berlin Heidelberg, 226--241. doi: 10.1007/978-3-540-30203-2_17. It has also proven to be very useful for compiler optimisation, where the constantly evolving rewrite rules, instruction sets and complex, architecture-dependent cost functions render it challenging to fix a deterministic program transformation schedule.
Jia et al. showed in Jia, 2019. 2019. TASO: optimizing deep learning computation with automatic generation of graph substitutions. In Proceedings of the 27th ACM Symposium on Operating Systems Principles, October 2019. ACM, 47--62. doi: 10.1145/3341301.3359630 that computation graph optimisation using graph transformations was achievable without predefined rewriting strategies. They discovered new state-of-the-art implementations for computation graphs of interest to the deep learning community using a simple exhaustive search of the space of possible rewrites with backtracking. This approach was then adapted to quantum circuit optimisation in Xu, 2022. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433 and Xu, 2023. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (June 2023, 835--859). doi: 10.1145/3591254.
These recent results fit within a long line of compiler research called superoptimisation Fraser, 1979. 1979. A compact, machine-independent peephole optimizer. In Proceedings of the 6th ACM SIGACT-SIGPLAN symposium on Principles of programming languages - POPL ’79. ACM Press, 1--6. doi: 10.1145/567752.567753 Massal., 1987. 1987. Superoptimizer: a look at the smallest program. In Proceedings of the second international conference on Architectual support for programming languages and operating systems, October 1987. ACM, 122--126. doi: 10.1145/36206.36194 Sands, 2011. 2011. Super-optimizing LLVM IR. (November 2011). Retrieved on 13/01/2025 (LLVM Developer's meeting) from http://llvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdf Bansal, 2006. 2006. Automatic generation of peephole superoptimizers. ACM SIGARCH Computer Architecture News 34, 5 (October 2006, 394--403). doi: 10.1145/1168919.1168906 Sasnau., 2017. 2017. Souper: A Synthesizing Superoptimizer. CoRR abs/1711.04422. On top of excellent optimisation performance, this approach to compiler optimisation using graph transformation systems (GTS) is exceptionally flexible, as rewrite rules can be generated and tailored on demand to the constraints and instruction set of the target hardware. For any supplied cost function, the compiler can explore all valid program transformations to find the rewrite sequence that minimises cost. This keeps the cost function-specific logic separate from the transformation semantics of the program, making it straightforward to replace or update the optimisation objective.
The adaptation of superoptimisation to quantum optimisation of Xu, 2022. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433 and Xu, 2023. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (June 2023, 835--859). doi: 10.1145/3591254 is, however, showing scaling difficulties: unlike classical superoptimisation, which is usually designed to optimise small subroutines within programs, e.g. focusing on arithmetic instructions, single instruction multiple data (SIMD) etc., the technique should in principle be able to optimise quantum programs in their entirety and requires tens of thousands of rewrite rules. This leads to immense search spaces that superoptimisation does not scale well to.
For the special case of term rewriting, i.e. rewriting of tree expressions, a technique known as equality saturation was introduced in Tate, 2009. 2009. Equality saturation: a new approach to optimization. ACM SIGPLAN Notices 44, 1 (January 2009, 264--276). doi: 10.1145/1594834.1480915 to compress and reduce the size of the search space significantly. Equality saturation can be viewed as a twist on persistent data structures designed to optimise terms through term rewriting. It is persistent insofar as it preserves all data inserted into it, though, unlike persistent data structures, it does not retain the history of transformations. This introduces a new step in the optimisation process known as the extraction phase, where the best term stored within the data structure must be identified and recovered.
An efficient implementation was presented in Willsey, 2021. 2021. Practical and Flexible Equality Saturation. PhD Thesis. University of Washington, and it has recently been adopted in modern compiler optimisation pipelines Fallin, 2022. 2022. Cranelift: Using E-Graphs for Verified, Cooperating Middle-End Optimizations. (August 2022). Retrieved on 14/01/2025 (RFC) from https://github.com/bytecodealliance/rfcs/blob/main/accepted/cranelift-egraph.md. Though the approach was extended to optimise computation graphs for deep learning in Yang, 2021. 2021. Equality Saturation for Tensor Graph Superoptimization. CoRR abs/2101.01332. doi: 10.48550/ARXIV.2101.01332, equality saturation does not generalise to graph rewriting. Equality saturation and the difficulties of adapting it to graph rewriting are interesting (and subtle!) enough to warrant their own section.
5.2. A closer look at equality saturation
Below, we provide a succinct introduction to equality saturation and discuss its shortcomings in the context of quantum computation and graph rewriting in general. For further details on equality saturation, we recommend the presentation of Willsey, 2021. 2021. Practical and Flexible Equality Saturation. PhD Thesis. University of Washington, its implementation Willsey, 2025. 2025. egg: egraphs good. Retrieved on 14/01/2025 (code repository) from https://github.com/egraphs-good/egg, and this blog discussion Bernst., 2024. 2024. What's in an e-graph?. (September 2024). Retrieved on 14/01/2025 (blog post) from https://bernsteinbear.com/blog/whats-in-an-egraph/.
Unlike a general-purpose compiler utility, equality saturation is specifically a technique for term rewriting. Terms¹ are algebraic expressions represented as trees, in which tree nodes correspond to operations, the children of an operation are the subterms passed as arguments to the operation, and leaf nodes are either constants or unbound variables. For instance, the term would be represented as the tree:
This representation is particularly suited for any pure functional (i.e. side-effect-free) classical computation. Every node of a term is identified with its own term: the subterm given by the subtree the node is the root of. Given term transformation rules, term rewriting consists of finding subterms that match known transformation patterns. The matching subtrees can then be replaced with the new equivalent trees.
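As a minimal sketch of the matching and replacement step just described, the following Python snippet rewrites terms encoded as nested tuples. The example term f(x * 2, 3), the tuple encoding and the single rule a * 2 → a + a are illustrative assumptions, not definitions taken from this chapter.

```python
from typing import Union

# Assumed encoding: a term is a leaf (str or int) or a tuple (op, child, ...).
Term = Union[str, int, tuple]


def rewrite_once(term: Term) -> Term:
    """Apply the illustrative rule  a * 2 -> a + a  to every matching subterm."""
    if not isinstance(term, tuple):
        return term  # leaves (constants, variables) are left untouched
    op, *children = term
    # Rewrite the subterms first, then try to match the rule at this node.
    children = [rewrite_once(c) for c in children]
    if op == "*" and len(children) == 2 and children[1] == 2:
        return ("+", children[0], children[0])
    return (op, *children)


# Hypothetical input term f(x * 2, 3).
term = ("f", ("*", "x", 2), 3)
print(rewrite_once(term))
```

Note that this destructive style of rewriting forgets the original subterm, which is precisely what equality saturation avoids by keeping both alternatives.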
In equality saturation, all terms obtained through term rewriting are stored within a single persistent data structure. Term optimisation proceeds in two stages. First, an exploration phase adds progressively more terms to the data structure to discover and capture all possible terms that the input can be rewritten to until saturation (see below), or a timeout, is reached. In the second phase, the saturated data structure is passed to an extraction algorithm tasked with finding the term that minimises the cost function of interest among all terms discovered during exploration.
The data structure that enables this is a generalisation of term trees. Just as in terms, nodes correspond to operations and have children subterms corresponding to the operation’s arguments. To record that a new term obtained through a rewrite is equivalent to an existing subterm, we extend the data structure to also store equivalence classes of nodes, typically implemented as Union-Find data structures Galler, 1964. 1964. An improved equivalence algorithm. Communications of the ACM 7, 5 (May 1964, 301--303). doi: 10.1145/364099.364331 Cormen, 2009. 2009. Introduction to algorithms (Third edition ed.). MIT Press, Cambridge, Massachusetts. If we, for instance, applied the rewrite x * 2 → x + x to the term above, we would obtain
Nodes within a grey box indicate equivalent subterms. This diagram encodes that
any occurrence of the x * 2 term can equivalently be expressed by the x + x
term. Henceforth, when matching terms for rewriting, both the * term and the
+ term are valid choices for the first argument of the f operation. Suppose,
for example, the existence of a rewrite , then
this would match the above data structure, resulting in
A consequence of using equivalence relations in the data structure is that the ordering in which the rewrites are considered and applied becomes irrelevant!
As presented, the exploration process would never terminate, and the data structure size would grow indefinitely: as more rewrites are applied, more and more terms are created, resulting in an ever-increasing set of possible rewrites to be considered and processed. Equality saturation resolves this by enforcing a term uniqueness invariant: every term or subterm explored is expressed by exactly one node in the data structure. We can see in the above example that this is currently not the case: the term for instance, is present multiple times – so is . Once such duplicates are merged, the nodes no longer form a forest of trees but directed acyclic graphs:
This is commonly known as term sharing, and the resulting data structure is known as a term graph Willsey, 2021. 2021. Practical and Flexible Equality Saturation. PhD Thesis. University of Washington. Maintaining this invariant is not hard in practice: whenever a new term is about to be added by applying a rewrite, it must first be checked whether the term exists already – something that can be done cheaply by keeping track of all hashes of existing terms. If it does, rather than adding a new term to the matched term’s equivalence class, the classes of both terms must be merged.
It might be that equivalence classes must be merged recursively: given the terms f(x) and f(y), if the classes of x and y are merged (and thus x and y have been proven equivalent), then the classes of their respective parents f(x) and f(y) must also be merged. Doing so efficiently is non-trivial, so we will not go into details here and refer again to Willsey, 2021. 2021. Practical and Flexible Equality Saturation. PhD Thesis. University of Washington.
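To make term sharing and recursive merging concrete, here is a deliberately naive Python sketch of a term graph: a union-find over class identifiers combined with a table of canonical nodes. The class name `EGraph` and its methods are hypothetical and chosen for illustration; this is not the API of egg, nor its efficient deferred rebuilding algorithm.

```python
class EGraph:
    """Toy term graph: union-find over class ids plus a table of unique nodes."""

    def __init__(self):
        self.parent = []    # union-find parent pointers, one per class id
        self.hashcons = {}  # canonical node (op, child class ids...) -> class id

    def find(self, a: int) -> int:
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]  # path halving
            a = self.parent[a]
        return a

    def canonical(self, node):
        op, *children = node
        return (op, *(self.find(c) for c in children))

    def add(self, op, *children) -> int:
        """Insert a node, reusing the existing class if the term is known."""
        node = self.canonical((op, *children))
        if node in self.hashcons:
            return self.find(self.hashcons[node])
        cid = len(self.parent)
        self.parent.append(cid)
        self.hashcons[node] = cid
        return cid

    def merge(self, a: int, b: int) -> None:
        """Record that classes a and b are equal, then restore uniqueness."""
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
            self.rebuild()

    def rebuild(self) -> None:
        """Naive congruence closure: re-canonicalise until nothing changes."""
        changed = True
        while changed:
            changed = False
            table = {}
            for node, cid in self.hashcons.items():
                node, cid = self.canonical(node), self.find(cid)
                other = self.find(table.setdefault(node, cid))
                if other != cid:
                    self.parent[cid] = other  # identical parents: merge classes
                    changed = True
            self.hashcons = table


eg = EGraph()
x, y = eg.add("x"), eg.add("y")
fx, fy = eg.add("f", x), eg.add("f", y)
eg.merge(x, y)  # proving x = y forces the classes of f(x) and f(y) together
print(eg.find(fx) == eg.find(fy))
```

The full table scan in `rebuild` is quadratic in the worst case; it is only meant to show why merging two classes can cascade to their parents.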
In the absence of terms of unbounded size, the uniqueness invariant guarantees that the exploration will eventually saturate: as rewrites are applied, there will come a point where all equivalent terms have been discovered, i.e. every applicable rewrite will correspond to an equivalence within an already known class, thereby not providing any new information. This marks the end of the exploration phase².
Term optimisation then proceeds to the extraction phase. Reading an optimised term out of the saturated term data structure is not trivial. For every equivalence class in the data structure, a representative node must be chosen so that the final term, extracted recursively from the root by always selecting the representative node of every term class, minimises the desired cost function³.
The strategy for choosing representative terms depends heavily on the cost function. In simple cases, such as minimising the total size of the extracted term, this can be done greedily in reverse topological order, i.e. proceeding from the leaves towards the root Willsey, 2021. 2021. Practical and Flexible Equality Saturation. PhD Thesis. University of Washington. There are also more complex cases, however: if the cost function allows for the sharing of subexpressions that may be used more than once in the computation, for instance, then finding the optimal solution will require more expensive computations such as solving boolean satisfiability (SAT) or Satisfiability Modulo Theories (SMT) problem instances Biere, 2021. 2021. Handbook of satisfiability (Second edition ed.). IOS Press, Amsterdam.
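For the simple case of minimising term size, the greedy choice of representatives can be sketched as follows. The encoding of the term graph as a dictionary from class identifiers to lists of nodes, and the function name `extract_smallest`, are assumptions made for this example; the fixed-point loop is a simple stand-in for a proper leaves-to-root traversal.

```python
def extract_smallest(classes, root):
    """classes: class id -> list of nodes (op, child class ids...).

    Returns (size, term) for a smallest term represented by class `root`.
    """
    best = {}  # class id -> (cost, chosen representative node)
    changed = True
    while changed:  # costs propagate from the leaves towards the root
        changed = False
        for cid, nodes in classes.items():
            for op, *children in nodes:
                if any(c not in best for c in children):
                    continue  # some argument has no known term yet
                cost = 1 + sum(best[c][0] for c in children)
                if cid not in best or cost < best[cid][0]:
                    best[cid] = (cost, (op, *children))
                    changed = True

    def build(cid):
        op, *children = best[cid][1]
        return (op, *(build(c) for c in children)) if children else op

    return best[root][0], build(root)


# Hypothetical saturated term graph encoding that (x * 2) / 2 equals x.
classes = {
    0: [("x",), ("/", 2, 1)],  # x and (x * 2) / 2 share a class
    1: [("2",)],
    2: [("*", 0, 1)],
}
print(extract_smallest(classes, 0))
print(extract_smallest(classes, 2))
```

Because every chosen node has arguments of strictly smaller cost, the recursion in `build` terminates even though the class structure itself contains a cycle.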
Equality saturation on graphs?
Equality saturation is a fast-developing subfield of compilation with a growing list of applications. Unfortunately for us⁴, adapting these ideas to quantum computation (and graph rewriting more generally) presents several unsolved challenges.
The root of the problem lies in the program representation. The minIR representation we presented in section 3.3 – but also the quantum circuit representation – captures quantum computations, not as a term, but in a directed acyclic graph (DAG) structure.
A generalisation of equality saturation to computation DAGs was studied in
Yang, 2021. 2021. Equality Saturation for Tensor Graph Superoptimization. CoRR abs/2101.01332. doi: 10.48550/ARXIV.2101.01332 in the context of optimisation of computation graphs for deep
learning. Their approach is based on the observation that the computation of a
(classical) computation DAG can always be expressed by a term for each output of
the computation. Consider, for example, the simple computation that takes two
inputs (x, y) representing 2D cartesian coordinates and returns its equivalent
in polar coordinates (r, θ).
By introducing two operations and that compute and subsequently, discard one of the two outputs, the DAG can equivalently be formulated as two terms
corresponding to the two outputs r and θ of the computation. This involves
temporarily duplicating some of the data and computations in the DAG – though
all duplicates will be merged again in the term graph due to the term sharing
invariant.
This duplicating and merging of data is fundamentally at odds with the constraints we must enforce on linear data, such as quantum resources. Each operation (or data) of a DAG that is split into multiple terms introduces a new constraint that must be imposed on the extraction algorithm: a computation DAG will only satisfy the no-discarding theorem (section 2.1) for linear values if, for each split operation it contains, it either contains all or none of its split components.
To illustrate this point, consider the following simple rewrite on quantum circuits that pushes X gates () from the right of a CX gate to the left:
Both the left and right hand sides would be decomposed into two terms, one for each output qubit. The left terms could be written as
whereas the right terms would be
We introduced the term for the single-qubit X gate and two terms and for the terms that produce the first, respectively second, output of the two-qubit CX gate. and denote the input qubits of the computation. This would be interpreted as two different rewrites
Unlike classical computations, however, either of these rewrites on their own would be unphysical: there is no implementation of either split operations or on their own. We would thus have to enforce at extraction time that for every application of this pair of rewrite rules, either both or none of the rewrites are applied.
Conversely, satisfying the no-cloning theorem requires verification that during extraction, terms that share a subterm but correspond to distinct graph rewrites are never selected simultaneously – otherwise, the linear value corresponding to the shared subterm would require cloning to be used twice.
The no-discarding and no-cloning restrictions result in a complex web of AND
respectively XOR relationships between individual terms in the term graph.
These constraints could be ignored during the exploration phase and then be
modelled in the extraction phase by an integer linear programming (ILP) problem.
However, Yang, 2021. 2021. Equality Saturation for Tensor Graph Superoptimization. CoRR abs/2101.01332. doi: 10.48550/ARXIV.2101.01332 observed that this approach causes the term graph to encode a
solution space that grows super-exponentially with rewrite depth (see Fig. 7 in
Yang, 2021. 2021. Equality Saturation for Tensor Graph Superoptimization. CoRR abs/2101.01332. doi: 10.48550/ARXIV.2101.01332), rendering the ILP extraction problem computationally intractable
beyond 3 subsequent rewrites. Recent work has attempted to tackle this issue
using reinforcement learning Bărbu., 2024. 2024. Learned Graph Rewriting with Equality Saturation: A New Paradigm in Relational Query Rewrite and Beyond. arXiv: 2407.12794 [cs.DB].
Linearity-preserving rewrites are an exponentially small subset
A simple calculation shows that when all values in the computation graph are linear and only graphs up to a maximal size are considered, the number of possible rewrites only grows exponentially in the rewrite depth. In other words, for optimisation of quantum computations, the solution space of valid computations is much smaller⁵ than the space explored by the equality saturation approach of Yang, 2021. 2021. Equality Saturation for Tensor Graph Superoptimization. CoRR abs/2101.01332. doi: 10.48550/ARXIV.2101.01332.
Indeed suppose there is a maximal graph size and suppose that all rewrite patterns, i.e. the subgraph induced by the vertex deletion set of a rewrite, are connected. This is an assumption that was also made in chapter 4, see section 4.2 for a discussion.
In a computation graph of linear values , every vertex (value in the computation) has a unique incoming and outgoing edge. This means that any pattern embedding is uniquely defined by the image of a single vertex . Thus for a GTS with transformation rules, there can be at most a constant number
of possible rewrites that can be applied to any graph . Let be the set of all graphs that can be reached in the GTS in at most rewrites from some input graph . is the set of all graphs obtained by applying a rewrite to a graph . Thus we have the relation:
The total number of rewrites that can be applied on any graph in is thus
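The counting argument of this subsection can be written out as follows. This is a sketch under assumed notation: the symbols $N$ (maximal graph size), $m$ (number of transformation rules), $\alpha$ (bound on the number of rewrites per graph), $G_0$ (input graph) and $\mathcal{G}_d$ (graphs reachable from $G_0$ in at most $d$ rewrites) are chosen for this illustration and may differ from the original formulas.

```latex
% Assumed symbols: N = maximal graph size, m = number of rules,
% \alpha = bound on rewrites per graph, \mathcal{G}_d = graphs reachable in <= d rewrites.
\begin{align*}
  \#\{\text{rewrites applicable to a graph } G\}
      &\le m \cdot N =: \alpha
      && \text{(one embedding per rule and anchor vertex)} \\
  |\mathcal{G}_{d+1}|
      &\le |\mathcal{G}_d| + \alpha\,|\mathcal{G}_d| = (1 + \alpha)\,|\mathcal{G}_d| \\
  |\mathcal{G}_d|
      &\le (1 + \alpha)^d
      && \text{(since } \mathcal{G}_0 = \{G_0\}\text{)} \\
  \#\{\text{rewrites applicable to graphs in } \mathcal{G}_d\}
      &\le \alpha\,(1 + \alpha)^d
\end{align*}
```

The bound is exponential in the rewrite depth $d$ for fixed $\alpha$, in line with the claim made at the start of this subsection.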
In summary, equality saturation is a specialisation of persistent data structures uniquely suited to the problem of term rewriting. It succinctly encodes the space of all equivalent terms, and using term sharing does away with the need to apply equivalent rewrites on multiple copies of the same term, which inevitably occurs in more naive rewriting approaches.
However, equality saturation cannot model rewrites that require deleting parts of the data. This is not a problem for terms representing classical operations, as data can always be implicitly copied during exploration and discarded during extraction as required. This is not the case for quantum computations – and for graph rewriting in general, where explicit vertex (and edge) deletions are an integral part of graph transformation semantics.
As a result, numerous constraints would have to be imposed to restrict the solution space encoded by term graphs to valid outcomes of graph rewriting procedures. This would make extraction algorithms complex and cumbersome. More importantly, we showed that in the case of computation graphs on linear values, such as quantum computations, the solution space explored by equality saturation is super-exponentially larger than the space of valid computations, rendering the extraction algorithm and meaningful exploration of the relevant rewriting space computationally intractable.
1. Depending on the context, computer scientists also call them abstract syntax trees (AST) – for our purposes, it’s the same thing.
2. Of course, it is also practical to include a timeout parameter in implementations to guarantee timely termination even on large or ill-behaved systems.
3. Note that we are omitting a subtle point here that arises due to term sharing: depending on the cost function, choosing different representative nodes for the same class could be favourable for the other occurrences of the term in the computation.
4. But fortunately for this thesis.
5. Exponential is super-exponentially smaller than super-exponential! Or put mathematically .
5.3. The data structure
We now present a data structure that is closely related to equality saturation but supports arbitrary graph rewriting. It is modelled on the graph unfolding construction as presented in Baldan, 2008. 2008. Unfolding Graph Transformation Systems: Theory and Applications to Verification.
Rather than maintaining equivalence relations between terms, as done in term graphs, we maintain equivalence relations between graph vertices. Our data structure stores the set of all applied rewrites – the main subject of this section is to show how all operations of interest on this data structure can be implemented efficiently.
The persistent graph rewriting data structure is given by a set of events , with
- vertex deletion set and
- glueing relation .
We have extended the notation to by defining it as the union of all vertex sets of replacement graphs in . We will similarly use to denote the set of vertices in the replacement graph of a rewrite .
Events resemble rewrites as defined in Definition 5.4 but differ in that they do not apply to a single graph , i.e. there is no graph such that . Instead,
We will see below how a graph can be constructed such that an event is indeed a valid rewrite on .
Using the disjointness of the union in (2), for all , there is a unique such that that we call the owner of . The parents (or directed causes) of an event are then the owners of the vertices in the deletion set of :
Conversely, we define the children of as the set of events whose parents include
The following figure shows an example of a data structure on undirected graphs.
Events on an undirected graph with their history. Coloured directed edges represent the parent-child relationship. The area that they rewrite in the parent event is represented by dashed regions of the same colour. The map between graphs is given by the vertex IDs.
Merges, confluent persistence and event creation
A rewrite that applies to a replacement graph of an event , i.e. immediately defines a valid event . In that case, has a unique parent
Creating an event from a rewrite is the simplest type of data mutation that can be recorded in . For to be a confluently persistent data structure, it must also be possible to merge multiple data mutations together. Rather than handling merges of versions of the data structure explicitly, an event can define graph mutation operations that apply on collections of events – the resulting mutation is equivalent to explicitly creating a merged version of the versions in , followed by the desired rewrite. In this case, the parents of are precisely the set .
In other words, the parent-child relationship of is precisely the event history of : a directed graph with vertex set and edges if . For to define a valid confluently persistent data structure, we need to
- Ensure that the event history is acyclic, and
- Define conditions that guarantee that events correspond to valid data mutations.
We kill two birds with one stone by restricting how can be constructed and modified in such a way that acyclicity is guaranteed. Specifically, we introduce two procedures:
- CreateEmpty constructs an empty , and
- AddEvent adds an event to .
The first is straightforward – and importantly, the only way to construct an
instance . AddEvent, on the other hand, enforces two conditions
that must satisfy to be added to a set :
- , and
- all parents of must be compatible.
We defer the discussion on the second condition, enforced by the AreCompatible
procedure, to its dedicated section below. The restriction
defines a partial order on events by
guaranteeing that an event can only be defined and added to
after all its parents have been added.
We say that is valid if it can be constructed from a single call
to CreateEmpty, followed by a sequence of calls to AddEvent. This is
equivalent to requiring that
- the parent-child relationship is acyclic and
- the parents of every event satisfy
AreCompatible.
For the remainder of this chapter, we will always assume that is valid, and thus the event history of is always well-defined and acyclic.
def CreateEmpty() -> Set[Event]:
    return set()

def AddEvent(
    events: Set[Event],
    replacement_graph: Graph,
    deletion_set: Set[V],
    glueing_relation: EquivalenceRelation[V],
) -> Set[Event]:
    new_event = (
        replacement_graph,
        deletion_set,
        glueing_relation,
    )
    # All parents must already be recorded and be mutually compatible.
    event_parents = parents(new_event)
    assert issubset(event_parents, events)
    assert AreCompatible(event_parents)
    return union(events, {new_event})
Compatible events
Assuming the parent-child relationship is acyclic, we can define the ancestors (or causes) of an event recursively
Events are compatible (or a configuration) if all vertex deletion sets for all ancestors of are disjoint. That is, writing
we require that all sets are disjoint. In the example above, events and are compatible, whereas and are not. As pseudocode, this is implemented by the following procedure.
def AreCompatible(events: Set[Event]) -> bool:
    all_ancestors = union([ancestors(d) for d in events])
    deleted_vertices = set()
    for d in all_ancestors:
        for v in deletion_set(d):
            if v in deleted_vertices:
                return False
            deleted_vertices.add(v)
    return True
Note that this definition of event compatibility is a strictly stronger version of parallel independence as is typically defined in DPO rewriting Corrad., 2018. 2018. On the Essence of Parallel Independence for the Double-Pushout and Sesqui-Pushout Approaches. In Graph Transformation, Specifications, and Nets, Cham. Springer International Publishing, 1--18. doi: 10.1007/978-3-319-75396-6_1. It does not allow for events and such that a vertex is both
- in the read only context of , i.e. , and thus present both before and after the application of ,
- in the deletion set of , i.e. .
This excludes asymmetric conflicts as discussed in e.g. Baldan, 2008. 2008. Unfolding Graph Transformation Systems: Theory and Applications to Verification, which arise in the more general definition. This restriction simplifies our considerations as it makes the event history of any event unique.
The runtime is , where is the sum of the sizes of all vertex deletion sets of events in
The factor can typically be removed if the vertices span a contiguous integer range or by using a hash function. Alternatively, the factor can also be reduced by using separate sets to track deleted vertices of each event.
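The compatibility check can be tried out on a small example with the following runnable sketch. The `Event` class and its fields are hypothetical simplifications: replacement graphs are reduced to their vertex sets, glueing relations are omitted, and `ancestors` is assumed to include the event itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    vertices: frozenset                    # vertices of the replacement graph
    deletion_set: frozenset = frozenset()  # vertices of parent events it deletes


def parents(event, events):
    """Owners of the vertices in the deletion set of `event`."""
    return {e for e in events if e.vertices & event.deletion_set}


def ancestors(event, events):
    """The event itself together with all of its transitive parents."""
    result, stack = {event}, [event]
    while stack:
        for p in parents(stack.pop(), events):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def are_compatible(selected, events):
    """True iff the deletion sets of all ancestors are pairwise disjoint."""
    deleted = set()
    for e in set().union(*(ancestors(s, events) for s in selected)):
        if deleted & e.deletion_set:
            return False
        deleted |= e.deletion_set
    return True


root = Event("root", frozenset({1, 2, 3, 4}))
a = Event("a", frozenset({5}), frozenset({1, 2}))
b = Event("b", frozenset({6}), frozenset({3}))
c = Event("c", frozenset({7}), frozenset({2, 3}))  # overlaps with both a and b
d = Event("d", frozenset({8}), frozenset({5}))     # child of a
events = {root, a, b, c, d}

print(are_compatible({a, b}, events))
print(are_compatible({d, c}, events))
```

In this example `d` and `c` do not overlap directly; they are incompatible only because `a`, an ancestor of `d`, deletes vertex 2, which `c` also deletes.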
When talking about compatible sets of events , it simplifies considerations to always choose such that the ancestors of are also in , i.e. We introduce the notation
for the set of all compatible sets of rewrites of the form .
Events are rewrites on the flattened history
We have so far explored how events can be added to , as well as when they are compatible. However, until we have established that adding events to is in some sense equivalent to applying rewrites on a graph, it is hard to see how the data structure would be useable for graph rewriting. This is precisely our next point.
In a valid non-empty , events form a directed acyclic graph and therefore there must always be (at least) one “root” event with no parents . is thus a valid rewrite that can be applied to any graph.
For the applications of that we consider, it will always be sufficient to have a unique root event . Viewing as a rewrite that applies to the empty graph , we can understand it as injecting the input graph into .
Non-root events in on the other hand typically correspond to valid (semantics preserving) rewrites in the GTS under consideration.
Consider a set of compatible events . Define a topological ordering of the events in , i.e. if then .
There are graphs such that for all , the event defines a valid rewrite on and .
Define the empty graph . The event has no parent and thus must have an empty vertex deletion set and glueing relation. It is thus a valid rewrite on . Define .
We can similarly define inductively for graphs if we show for that the -th event defines a valid rewrite on . The set of vertices in is the union of all vertices in the replacement graph of minus their vertex deletion sets
where is the vertex deletion set of .
Now, by definition of the event ,
On the other hand, because of the compatibility of all events in , we know that for all . It thus follows . Hence is indeed a valid rewrite of , and thus and are well-defined.
This construction is illustrated in the following figure for the compatible set of the previous example.
Applying events as rewrites in topological order. The result is a sequence of valid graph rewrites that start from the graph of .
We now show that the graph is determined uniquely by and provide an explicit procedure to construct it.
The graph obtained by applying the set of compatible rewrites in topological order on the empty graph is independent of the topological ordering chosen.
Given the set of rewrites , the procedure
FlattenHistory returns in time
where and are the total number of vertices and edges across all replacement graphs in .
Let us start with the definition of FlattenHistory:
 1  def FlattenHistory(events: Set[Event]) -> Graph:
 2      all_ancestors = union([ancestors(d) for d in events])
 3      graph = Graph()
 4      for a in toposort(all_ancestors):
 5          add_graph(graph, replacement_graph(a))
 6          for (del_v, repl_v) in glueing_relation(a):
 7              move_edges(graph, repl_v, del_v)
 8          for v in deletion_set(a):
 9              remove_vertex(graph, v)
10      return graph
toposort is a function that returns a topological ordering of the rewrites in
according to the parent-child rewrite relation, add_graph inserts the
graph passed as second argument into the graph passed as first argument,
remove_vertex removes the vertex along with all incident edges from the graph
and move_edges moves all edges of the second vertex to the first vertex.
Correctness of FlattenHistory. It is easy to see that if the graph
that is obtained from applying the rewrites in order is independent of the
choice of the topological ordering, then FlattenHistory is a correct
implementation of the procedure, as it applies one rewrite at a time, in
topological order.
Rewrite order invariance. Consider two rewrites such that neither is an ancestor of the other. Let
and proceed by induction over : assume the graph obtained by applying the rewrites in is invariant on the choice of the topological ordering of . Clearly this is true for . All that remains to be shown is that obtained by applying first then on is equal to , obtained by applying the same rewrites in the reverse order on .
The vertex sets and of and must be disjoint because and hence are compatible. Furthermore, the replacement graphs (by definition of the rewrites) and the glueing relations of and (by rewrite compatibility) cannot contain vertices in . It follows that the order in which vertices of are removed from does not affect the graph . Furthermore, vertex merging is a commutative operation, and so is disjoint graph addition. It follows and hence the result.
Runtime. In total vertices and edges will be added to graph
by add_graph on line 5. As a result, at most vertices can ever be deleted
by line 9. Finally, while a naive implementation of move_edges of line 7 might
result in the same edge being moved many times, all edge moves can be cached and
only executed once at the end: notice that every time edges are moved away from
a vertex, that vertex is subsequently removed from the graph. Instead of
removing the vertex, keep it “hidden”, with a link to the vertex that the edges
should be moved to. Once all graph operations are completed, traverse all hidden
vertices and follow the links to the vertices that the edges should be moved to.
This can be done in time. Then move all edges to the correct vertex, in
time , and delete the hidden vertices.
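The deferred edge-moving strategy described above can be sketched in runnable form as follows. The `Event` fields, the restriction to undirected simple graphs and the encoding of the glueing relation as a dictionary from deleted vertex to replacement vertex (so that each deleted vertex is glued to at most one vertex) are assumptions made for this example. The input is assumed to be a compatible set of events that contains all of its ancestors.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    vertices: set                                   # replacement graph vertices
    edges: list                                     # pairs (u, v) of the replacement graph
    deletion_set: set = field(default_factory=set)  # vertices removed by the event
    glueing: dict = field(default_factory=dict)     # deleted vertex -> replacement vertex


def flatten_history(events):
    vertices, edges = set(), []
    link = {}        # hidden vertex -> vertex that inherits its edges
    removed = set()  # vertices deleted by some event
    for ev in events:
        vertices |= ev.vertices     # add_graph
        edges += ev.edges
        link.update(ev.glueing)     # move_edges, deferred
        removed |= ev.deletion_set  # remove_vertex, deferred

    def resolve(v):
        # Follow the links of hidden vertices to the vertex that survives.
        while v in link:
            v = link[v]
        return v

    live = vertices - removed
    flat_edges = set()
    for u, v in edges:
        u, v = resolve(u), resolve(v)
        if u in live and v in live:  # edges at unglued deleted vertices vanish
            flat_edges.add(frozenset((u, v)))
    return live, flat_edges


# Path a - b - c, then an event replacing b by the edge b1 - b2, gluing b to b1.
root = Event(vertices={"a", "b", "c"}, edges=[("a", "b"), ("b", "c")])
split = Event(vertices={"b1", "b2"}, edges=[("b1", "b2")],
              deletion_set={"b"}, glueing={"b": "b1"})
live, flat_edges = flatten_history([root, split])
print(sorted(live), sorted(sorted(e) for e in flat_edges))
```

Following links without memoisation can revisit long chains several times; caching the result of `resolve` per vertex would avoid this, at the cost of a slightly longer example.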
Now instead of exploring the space of all graphs reachable by
repeatedly applying rewrites, we can explore the rewrite space by adding events
to . Write for the graph returned by FlattenHistory on
set . If is the set of all graphs returned by FlattenHistory
on compatible events
then Proposition 5.1 and Proposition 5.2 combined guarantee that . To conclude, we show that indeed any graph in is in , and hence .
Let be a set of compatible events and . Any rewrite that can be applied on defines an event that can be added to .
We recall that a rewrite defines an event that can be added to if
- , and
- all rewrites in are compatible.
By the rewrite definition, . It follows in particular that
and thus , as well as . This proves both conditions.
Starting from the empty graph , we can create a root event with an empty vertex deletion set and glueing relation and add it to
Clearly, . We then apply
Proposition 5.3 repeatedly. If we have a sequence
of valid rewrites that can be applied on , then the
sequence of events that it defines can
also be added to in this order. As we have further seen in
Proposition 5.1 and
Proposition 5.2, the graph that is obtained as a
result of the rewrites is the same graph returned by FlattenHistory called on
.
In other words, we conclude that exploring the rewrite space on is fully equivalent to exploring the space of valid events starting from .
5.4. Exploration and extraction
In the previous section, we proposed a data structure that is confluently persistent and can be used to explore the space of all possible transformations of a graph transformation system (GTS). We are now interested in using to solve optimisation problems over the space of reachable graphs in the GTS. Following the blueprint of equality saturation (see section 5.2), we proceed in two phases:
- Exploration. Given an input graph , populate with events that correspond to rewrites applicable to graphs reachable from ,
- Extraction. Given a cost function , extract the optimal graph in , i.e. the graph that is a flattening of a set of compatible edits and minimises .
Each phase comes with its respective challenges, which we discuss in this section. We will first look at the exploration phase, which requires a way to find and construct new events that can be added to . We will consider the extraction phase in the second part of this section and see that the problem of optimisation over the power set can be reduced to boolean satisfiability formulae that admit simple cost functions in the use cases of interest.
There is an additional open question that we do not cover in this section and would merit a study of its own: the choice of heuristics that guide the exploration phase to ensure the “most interesting parts” of the GTS rewrite space are explored. We propose a very simple heuristic to this end in the benchmarks of section 5.5, but further investigations are called for.
Exploring the data structure with pattern matching
We established in the previous section that rewrites that apply on can
equivalently be added as events to . In other words, a graph
is reachable from using the rewrites of a GTS if and only if there is a set
of compatible events such that is the graph
obtained from FlattenHistory on input .
To expand to a larger set , we
must find all applicable rewrites on all graphs within . A naive
solution would iterate over all subsets of , check
whether they form a compatible set of events, compute FlattenHistory if they
do, and finally run pattern matching on the obtained graph to find the
applicable rewrites. We can do better.
The idea is to traverse the set of events in using the glueing relations that connect vertices between events. Define the function that is the union of all glueing relations in events in :
where we write for the owner of , i.e. the (unique) event
such that . We define the set
of equivalent vertices of that are compatible with by
applying recursively and filtering out vertices whose owner is not
compatible with . It is easiest to formalise this definition using pseudocode
for the EquivalentVertices procedure. The set of vertices in
are vertices of descendant events of .
def EquivalentVertices(
    v: Vertex, events: Set[Event]
) -> Set[Vertex]:
    all_vertices = {v}
    for w in mu_bar(v):
        new_events = union(events, {owner(w)})
        if AreCompatible(new_events):
            all_vertices = union(
                all_vertices,
                EquivalentVertices(w, new_events),
            )
    return all_vertices
Whilst it looks as though EquivalentVertices does not depend on ,
it does so through the use of the function calls to mu_bar.
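The traversal performed by EquivalentVertices can be exercised with the self-contained sketch below. The `History` wrapper, the `Event` fields and the encoding of glueing relations as pairs (deleted vertex, replacement vertex) are hypothetical choices made for this example; replacement graphs are again reduced to their vertex sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    vertices: frozenset                    # vertices of the replacement graph
    deletion_set: frozenset = frozenset()
    glueing: tuple = ()                    # pairs (deleted vertex, replacement vertex)


class History:
    def __init__(self, events):
        self.events = list(events)

    def owner(self, v):
        return next(e for e in self.events if v in e.vertices)

    def ancestors(self, event):
        result, stack = {event}, [event]
        while stack:
            for v in stack.pop().deletion_set:
                p = self.owner(v)
                if p not in result:
                    result.add(p)
                    stack.append(p)
        return result

    def are_compatible(self, selected):
        deleted = set()
        for e in set().union(*(self.ancestors(s) for s in selected)):
            if deleted & e.deletion_set:
                return False
            deleted |= e.deletion_set
        return True

    def mu_bar(self, v):
        """All vertices that some event glues to the deleted vertex v."""
        return {w for e in self.events for (d, w) in e.glueing if d == v}

    def equivalent_vertices(self, v, selected):
        result = {v}
        for w in self.mu_bar(v):
            extended = set(selected) | {self.owner(w)}
            if self.are_compatible(extended):
                result |= self.equivalent_vertices(w, extended)
        return result


root = Event("root", frozenset({1, 2, 3}))
a = Event("a", frozenset({4}), frozenset({1}), ((1, 4),))
b = Event("b", frozenset({5}), frozenset({1, 2}), ((1, 5),))  # conflicts with a
c = Event("c", frozenset({6}), frozenset({4}), ((4, 6),))     # child of a
history = History([root, a, b, c])

print(sorted(history.equivalent_vertices(1, {root})))
print(sorted(history.equivalent_vertices(1, {root, a})))
```

Starting from the root alone, vertex 1 has descendants through both `a` and `b`; once `a` is part of the selected events, the branch through `b` is filtered out because `a` and `b` both delete vertex 1.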
We use EquivalentVertices to repeatedly extend a set of pinned vertices
. A set of pinned vertices must satisfy two
properties:
- the set is a set of compatible events,
- there is no vertex and event such that .
As a result, for the flattened graph , it always holds that
. Furthermore, if is the subgraph of
induced by , then for any superset of pinned vertices
, we have where
. In other words: extending a set of pinned vertices
results in an extension of the flattened graph – a very useful property when
pattern matching. This property follows from the second property above and the
definition of FlattenHistory.
This gives us the following simple procedure for pattern matching:
1. Start with a single pinned vertex .
2. Construct partial embeddings for patterns .
3. Pick a new vertex in but not in (that we would like to extend the domain of definition of our pattern embeddings to).
4. For all vertices , build new pinned vertex sets , and filter out the sets that are not valid pinned vertex sets.
5. Repeat steps 2–4 until all pattern embeddings have been found.
Step 1 is straightforward – notice that pattern matching must be started at a vertex in , so finding all patterns will require iterating over all choices of . The pattern embeddings are constructed over iterations of step 2: each iteration can be seen as one step of the pattern matcher – for instance, as presented in chapter 4 – extending the pattern embeddings that can be extended and discarding those that cannot. If all possible pattern embeddings have been discarded, then matching can be aborted for that set.
How step 3 should be implemented depends on the types of graphs and patterns that are matched on. It is straightforward in the case of computation graphs with only linear values, i.e. hypergraphs with hyperedges that have directed, ordered endpoints and vertices that are incident to exactly one incoming and one outgoing edge. In that case, can always be chosen in such a way as to ensure progress on the next iteration of step 2, i.e. the domain of definition of at least one partial pattern embedding will be extended by one vertex. The text in the blue box below explains this case in more detail.
Step 4 produces all possible extensions of to pinned vertex sets that include a descendant of (or itself). All vertices in are in events compatible with by definition, so to check that is a valid pinned vertex set, we only need to check the second property of pinned vertices. Let be a pattern, let be the set of all sets under consideration. Step 4 increments the sizes of all pinned vertex sets whilst maintaining the following invariant.
Invariant for step 4. If there is a superset of compatible events such that embeds in , then there is a superset of vertices such that embeds in .
Finally, step 5 ensures the process is repeated until, for all partial pattern embeddings, either the domain of definition is complete, or the embedding of is not possible. Given that step 4 increments the size of sets at each iteration, this will terminate as long as the vertex picking strategy of step 3 selects vertices that make it possible to extend (or refute) the partial pattern embeddings constructed and extended in step 2. This is satisfied, for example, in the case of linear minIR graphs, as explained in the box.
Choosing the next vertex to pin in linear minIR (step 3). Assuming patterns are connected, for any partial pattern embedding there is an edge with no image in but such that at least one of the endvertices of has an image in – say, is the outgoing edge of . Let be an endvertex of in that has no image in – and say, it is the -th outgoing endvertex of in .
Then uniquely identifies an edge in – the unique outgoing edge of – which, in turn, uniquely identifies a vertex – the -th outgoing endvertex of . By choosing in step 3, step 4 will create pinned vertex sets that include all possible vertices equivalent to , which are all vertices that might be connected to through its outgoing edge1. The next iteration of step 2 will then either extend the partial pattern embedding to or conclude that an embedding of is not possible.
Using the approach just sketched, pattern matching can be performed on the persistent data structure . The runtimes of steps 2 and 3 depend on the type of graphs and patterns that are matched on – these are, however, typical problems that appear in most instances of pattern matching, independently of the data structure used here. A concrete approach to pattern matching and results for the graph types of interest to quantum compilation was presented in chapter 4.
The runtime of step 4 and the number of overall iterations of steps 2–4
required for pattern matching will depend on the number of events in
(AreCompatible runs in time linear in the number of
ancestors), the number of equivalent vertices that successive rounds of step 4
will return and the types of patterns and pattern matching strategies.
Extraction using SAT
Moving on to the extraction phase, we are now interested in extracting the optimal graph from , according to some cost function of interest. Unlike exploring the “naive” search space of all graphs reachable in the GTS, the optimal solution within the persistent data structure cannot simply be read out.
We showed in section 5.3 that finding an optimal graph that is the result of a sequence of rewrites on an input graph is equivalent to finding an optimal set of compatible events – the optimal graph is then recovered by taking .
There are elements in , which we encode as a boolean assignment problem by introducing a boolean variable for all events . The set of events is then given by
We can constrain the boolean assignments to compatible sets by introducing a boolean formula
for all such that their vertex deletion sets intersect . Any assignment of that satisfies all constraints of this format defines a compatible set of events.
How many such pairs of events are there? By definition of parents, two events and can only have overlapping vertex deletion sets if they share a parent. Assuming all events have at most children, ensuring is a set of compatible events requires at most constraints.
To further restrict to , i.e. to sets of compatible events that contain all ancestors, we can add the further constraints: implies . This introduces up to implication constraints
for all such that .
For any set of events , the conjunction of all constraints presented above, i.e. the event compatibility constraints (3) and the parent-child relation constraints (4), defines a boolean satisfiability problem (SAT) with variables . We have shown:
Consider a GTS with a constant upper bound on the number of rewrites that may overlap any previous rewrite.
The set of valid sequences of rewrites that can be extracted from a set of events in the GTS is given by the set of satisfying assignments of a SAT problem Cook, 1971. 1971. The complexity of theorem-proving procedures. In Proceedings of the third annual ACM symposium on Theory of computing - STOC ’71. ACM Press, 151--158. doi: 10.1145/800157.805047 Moskew., 2001. 2001. Chaff: engineering an efficient SAT solver. In Proceedings of the 38th conference on Design automation - DAC ’01. ACM Press, 530--535. doi: 10.1145/378239.379017 with variables of size .
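The encoding above can be illustrated on a toy history. The following is a minimal sketch, not the implementation used in this thesis: the event names, the `EVENTS` table and the helper functions are hypothetical, and the satisfying assignments are enumerated by brute force rather than handed to a SAT solver. It builds one conflict constraint per pair of events with intersecting vertex deletion sets, and one implication constraint per parent-child pair.

```python
from itertools import combinations, product

# Hypothetical toy history: event name -> (parents, vertex deletion set).
# Events "a" and "b" both delete vertex "v2", so they are not compatible.
EVENTS = {
    "root": (set(), set()),
    "a": ({"root"}, {"v1", "v2"}),
    "b": ({"root"}, {"v2", "v3"}),
    "c": ({"a"}, {"w1"}),
}


def build_constraints(events):
    """Return (conflicts, implications) as lists of pairs of event names."""
    # Compatibility constraints: not (x_d and x_e) when deletion sets intersect.
    conflicts = [
        (d, e)
        for d, e in combinations(sorted(events), 2)
        if events[d][1] & events[e][1]
    ]
    # Parent-child constraints: x_child implies x_parent.
    implications = [
        (child, parent)
        for child, (parents, _) in sorted(events.items())
        for parent in sorted(parents)
    ]
    return conflicts, implications


def satisfying_sets(events):
    """Enumerate all event sets that satisfy every constraint (brute force)."""
    names = sorted(events)
    conflicts, implications = build_constraints(events)
    solutions = []
    for bits in product([False, True], repeat=len(names)):
        x = dict(zip(names, bits))
        no_conflict = all(not (x[d] and x[e]) for d, e in conflicts)
        has_parents = all((not x[c]) or x[p] for c, p in implications)
        if no_conflict and has_parents:
            solutions.append(frozenset(n for n in names if x[n]))
    return solutions


if __name__ == "__main__":
    for solution in satisfying_sets(EVENTS):
        print(sorted(solution))
```

The brute-force loop is exponential in the number of events and only serves to make the constraint semantics explicit; in practice the same two constraint lists would be passed to a dedicated solver.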
Finding the optimal assignment
We now have to find the optimal assignment among all satisfiable assignments for the SAT problem given above. In the most general case where the cost function to be minimised is given as a black box oracle on the graph , i.e. on the flattened history of the solution set , this optimisation problem is hard2.
However, if can be expressed as a function of instead of the flattened history , then the ‘hardness’ can be encapsulated within an instance of a SMT problem (satisfiability modulo theories Nieuwe., 2006. 2006. On SAT Modulo Theories and Optimization Problems. In Theory and Applications of Satisfiability Testing - SAT 2006. Springer Berlin Heidelberg, 156--169. doi: 10.1007/11814948_18 Barrett, 2018. 2018. Satisfiability Modulo Theories), a well-studied generalisation of SAT problems for which highly optimised solvers exist Moura, 2008. 2008. Z3: An Efficient SMT Solver. In Tools and Algorithms for the Construction and Analysis of Systems. Springer Berlin Heidelberg, 337--340. doi: 10.1007/978-3-540-78800-3_24 Sebast., 2015. 2015. OptiMathSAT: A Tool for Optimization Modulo Theories. In Computer Aided Verification. Springer International Publishing, 447--454. doi: 10.1007/978-3-319-21690-4_27. A class of cost functions for which the SMT encoding of the optimisation problem becomes particularly simple are local cost functions:
A cost function on graphs is local if for all rewrites there is a cost such that for all graphs that applies to
The cost of a rewrite also immediately defines a cost to the event that defines . We can thus associate a cost with each event , given by the cost of any of the rewrites that defines.
An instance of such a local cost function often used in the context of the optimisation of computation graphs are functions of the type
for some vertex weight function – i.e. cost functions that can be expressed as sums over the costs associated to individual vertices in 3. Indeed, it is easy to see that in this case we can write
where and are the vertex deletion set and replacement graph of respectively.
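As a small illustration of locality, the sketch below takes the cost of a graph to be a sum of vertex weights and computes the cost of a rewrite as the weight of its replacement graph minus the weight of its vertex deletion set. The weight table, the representation of a graph as a list of operation names and the example rewrite are assumptions made for illustration only.

```python
# Hypothetical vertex weights: only CX gates contribute to the cost.
WEIGHT = {"CX": 1, "RZ": 0, "H": 0}


def graph_cost(ops):
    """Cost of a graph, given as the list of operations on its vertices."""
    return sum(WEIGHT[op] for op in ops)


def event_cost(deleted_ops, replacement_ops):
    """Cost change of a rewrite: replacement weight minus deleted weight."""
    return graph_cost(replacement_ops) - graph_cost(deleted_ops)


if __name__ == "__main__":
    # Illustrative rewrite: the pattern CX, RZ, CX is replaced by a single RZ.
    before = ["CX", "RZ", "CX", "H"]
    deleted = ["CX", "RZ", "CX"]
    replacement = ["RZ"]
    after = ["RZ", "H"]

    delta = event_cost(deleted, replacement)
    # The cost change does not depend on the untouched part of the graph.
    assert graph_cost(after) == graph_cost(before) + delta
    print(delta)
```

Because the cost change depends only on the deleted and inserted vertices, it can be attached to the event once and reused for every graph the rewrite applies to.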
As discussed in section 2.2, many of the most widely used cost functions in quantum compilation are local, as the cost of a quantum computation is often estimated by the required number of instances of the most expensive gate type (such as CX gates on noisy devices, or T gates for hardware with built-in fault tolerance protocols).
In these cases, the cost function is integer valued and the extraction problem is indeed often sparse:
The local cost function is said to be -sparse on if
In case of -sparse local cost functions, the SAT problem on can be simplified to only include
by repeatedly applying the following constraint simplification rules on any such that :
- for every parent and child , remove the parent-child constraints between and and between and . Insert in their place a parent-child constraint between and .
- for every non-compatible sibling event , remove the compatibility constraint between and . Insert in its place a compatibility constraint between and for all .
This reduces the SAT or SMT problem to a problem with variables and at most constraints.
With the completion of this section, we have described an equivalent computation on for every step of a GTS-based optimisation problem:
- a rewrite that can be applied on a graph can be added as an event to ,
- a graph that results from a sequence of rewrites can be recovered from using FlattenHistory,
- the set of all graphs reachable from events in can be expressed as a SAT problem; depending on the cost function, the optimisation over that space can then take the form of an SMT problem.
In essence, using the confluently persistent data structure we replace a naive, exhaustive search over the space of all graphs reachable in the GTS with a SAT (or SMT) problem – solvable using highly optimised dedicated solvers that could in principle handle search spaces with up to millions of possible rewrites Zulkos., 2018. 2018. Understanding and Enhancing CDCL-based SAT Solvers. PhD Thesis. University of Waterloo.
-
To realise this, notice that all vertices equivalent to are vertices that will be merged with . Hence, they will all be attached to the outgoing edge of at its -th outgoing endvertex. ↩︎
-
Hardness can be seen by considering the special case of the extraction problem in which all events are compatible and no two events have a parent-child relation: then there are no constraints on the solution space and the optimisation problem requires finding the minimum of an arbitrary oracle over inputs. ↩︎
-
A similar argument also applies to cost functions that sum over graph edges, as would be the case in minIR, where operations are modelled as hyperedges. ↩︎
5.5. Bounding the search space size
We show in this section that under some assumptions on the GTSs that hold in the use cases of interest to quantum compilation, there is a provable gap between the size of the search space of all reachable graphs in the GTS and the size of the corresponding confluently persistent data structure .
Let us introduce first the notion of overwriting rewrites.
For two rewrites and , we say that overwrites , written , if the deletion set of includes a vertex of the vertex set of the replacement graph of
The definition can identically be applied to events. In this case, the overwriting events are precisely given by the parent-child relation: the set of all overwriting events of is by definition the set of parents of .
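The overwriting relation can be sketched in a few lines. The `Rewrite` record and its field names below are hypothetical; we assume the reading in which the overwriting rewrite is the one whose deletion set touches the replacement graph of the other, so that the parents of an event are exactly the events it overwrites.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rewrite:
    """Hypothetical record of a rewrite (or event)."""

    name: str
    deleted: frozenset  # vertex deletion set
    added: frozenset  # vertices of the replacement graph


def overwrites(later, earlier):
    """True if `later` deletes at least one vertex created by `earlier`."""
    return bool(later.deleted & earlier.added)


def parents(event, history):
    """All events in `history` that `event` overwrites."""
    return [other for other in history if overwrites(event, other)]


if __name__ == "__main__":
    r1 = Rewrite("r1", frozenset({"u"}), frozenset({"a", "b"}))
    r2 = Rewrite("r2", frozenset({"b"}), frozenset({"c"}))
    r3 = Rewrite("r3", frozenset({"x"}), frozenset({"y"}))
    print(overwrites(r2, r1), overwrites(r1, r2), overwrites(r3, r1))
    print([p.name for p in parents(r2, [r1, r3])])
```

Note that the relation is not symmetric: here `r2` overwrites `r1`, but `r1` does not overwrite `r2`.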
Our argument relies on the comparison of asymptotic bounds for the sizes of two sets and , which we now define. Consider a GTS and an input graph . A graph is reachable from within depth if there is a sequence of rewrites in the GTS from to such that all subsequences formed of overwriting rewrites have length at most .
The set is the set of all graphs reachable within depth . We derive:
- a lower bound on the size of , the space of all graphs reachable in at most rewrites from some input , and
- an upper bound on the size of the equivalent confluently persistent data structure , i.e. such that
In order to obtain bounds, we will introduce hypotheses that the GTSs must satisfy. Throughout this section, we will illustrate and motivate the restrictions that they impose in the following two use cases.
Use case 1: -complete GTS
The first GTS we consider can be defined on any graph domain that has a notion of graph size (e.g. based on number of nodes, number of edges, etc.1) and for any graph size . The GTS is such that for any subgraph of size , there is at least one transformation rule in the GTS that matches . We will call this case CompleteGTS.
This is the use case of quantum superoptimisation discussed in section 3.1 and used for benchmarking in section 4.7. In those cases, the transformation rules are obtained by enumerating all small circuits up to a certain size , thus guaranteeing that any subcircuit of size will be matched by the GTS.
Note that there is also an (obvious) upper bound on the number of transformation rules that can match on any given subgraph: the total number of transformation rules in the GTS.
Use case 2: single-rule GTS in a uniform domain
At the other extreme of the GTS spectrum, we can consider a GTS made of a single (arbitrary) transformation rule. In this case, we require that graphs are drawn from a domain uniformly at random, so that for any subgraph , all patterns of size are equally likely. We will call this case SingleRuleGTS.
In this case, we will not show that our hypotheses hold for all inputs, but rather that they hold with a high probability. We will phrase our statements as a function of and will require that they hold with probability for randomly drawn .
This regime is interesting as it is the simplest instance of problem domains for which few assumptions can be made about the GTS themselves, but all inputs are expected to be equally likely.
Lower bound on the naive search tree
The event history of the set of graphs defines a tree , where are the nodes and is the parent of if there is a rewrite rewriting to in the GTS. Paths in are sequences of rewrites. We call the naive search tree of the GTS. We wish to derive a lower bound for .
Graph partitioning
For fixed search depth , let be the largest integer such that for all graphs , there exist disjoint subgraphs of that satisfy the following property, for all
there exists a rewrite in the GTS that can be applied to .
For a fixed GTS, a fixed depth and a family of input graphs , we have the scaling .
We conjecture that this scaling holds for many GTSs of interest:
CompleteGTS and for SingleRuleGTS probabilistically.
In CompleteGTS, it suffices to partition any input into disjoint subgraphs of size at least . Each subgraph will match a rule of the GTS by definition.
For SingleRuleGTS, let . Let be the size of the left hand side of the GTS rule. By assumption, for any subgraph of size of an input , there is a constant probability that the rule matches . For a subgraph of size , the probability of the rule not matching in is . Picking
for some ensures that whenever (i.e. for large enough),
where was chosen. It follows that a partition of into disjoint subgraphs of size at least satisfies the hypothesis with probability .
Lower bound on
Fix the tree depth and the GTS. Any rewrite from the GTS removes at most a constant number of vertices from the graph it applies to. Thus, any graph in is of size at least . Let be the smallest value of for a graph . Whenever Hypothesis 1 applies, we have .
For each and , pick a rewrite that applies to and let be the set of all such rewrites
We can consider the subtree that only contains graphs obtained by applying rewrites in .
For , the search tree will contain graphs: for each subgraph of the input graph , we can choose to either apply or not2.
By repeating times the search tree of size for , we obtain a lower bound
We frame this result as the following proposition.
As a result, for any GTS satisfying Hypothesis 1 the size of grows at least exponentially with input graph size and search depth .
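The counting argument for a single layer of rewrites can be checked numerically. In the sketch below, `k` stands in for the number of disjoint subgraphs provided by the partition (the symbol used in the text is not reproduced here), and a graph is modelled, as a simplifying assumption, by a tuple recording which of the `k` disjoint rewrites have been applied. The first function counts distinct graphs, the second counts the nodes of a search tree that treats every ordering of the rewrites as a separate path.

```python
from math import factorial


def distinct_graphs(k):
    """Number of distinct graphs reachable by applying any subset of k disjoint rewrites."""
    start = (False,) * k
    seen = {start}
    frontier = [start]
    while frontier:
        graph = frontier.pop()
        for i in range(k):
            if not graph[i]:
                # Apply the i-th rewrite to its own subgraph.
                nxt = graph[:i] + (True,) + graph[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return len(seen)


def tree_nodes(k):
    """Number of ordered sequences of distinct rewrites, including the empty one."""
    return sum(factorial(k) // factorial(k - j) for j in range(k + 1))


if __name__ == "__main__":
    for k in range(1, 6):
        print(k, distinct_graphs(k), tree_nodes(k))
```

The first count doubles with every additional disjoint subgraph, which is the source of the exponential lower bound; the second count grows faster still, which is why counting graphs rather than paths is the conservative choice.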
Upper bound on the factorised search space
Consider two rewrites and . If neither overwrites the other, i.e. and , then the order in which they are applied is irrelevant (see also Proposition 5.2). The persistent data structure uses this symmetry explicitly when exploring the set of reachable graphs.
This drastically reduces the size of the event history of . The event history defines a directed acyclic graph that is the equivalent for to the naive search tree of . The vertices of are the flattened histories of events in :
with an edge from to if there is a parent-child relation . We call the factorised search space of .
By construction, any graph in the naive search tree maps injectively to a subgraph of the factorised search space, given by the subgraph of induced by the rewrites on the path from to in .
Whereas our earlier discussion was focused on proving a lower bound for the size of the search tree, we now show an upper bound on the number of graphs in the factorised search space.
Graph covering
Instead of considering partitions of graphs in as we did above, we now consider coverings of graphs in , i.e. a set of subgraphs such that their union is but that might not be disjoint.
Let , and be parameters and fix a covering for each graph such that:
- for all and all , there are at most applicable rewrites to the -th covering set of . Furthermore, all rewrites within are mutually exclusive, i.e. they modify a shared subgraph so that it is never possible to apply more than one rewrite among the (up to ) applicable ones3;
- for all and for all rewrites that apply to , there is such that applies to (i.e. the matching subgraph of is fully contained within one of the coverings). Furthermore, the matching subgraph of overlaps with at most distinct ;
- for all rewrites that apply to the covering set of a graph , the image must be a subgraph of the -th covering subgraph of .
The first condition is satisfied whenever the size of the coverings can be bounded: in that case can be chosen based on the number of distinct subgraphs that can be contained in a covering set and the number of rules that can apply to each. The second condition is related to the connectivity between the covering sets: can thus often be derived by considering how many neighbours a covering set has, and how many of those neighbours a match of a GTS rule can span.
The third condition above can be understood as “rewrites must preserve the coverings”. In other words, the coverings are chosen such that a graph mutation produced by the application of a rewrite on is always contained within a single covering subgraph of .
For a fixed GTS, a fixed depth and a family of input graphs , we have the scaling and and .
These conditions along with Hypothesis 2 are somewhat restrictive and future
work should explore how to relax them. For our use cases CompleteGTS and
SingleRuleGTS, we restrict our considerations to the special case where the
graph domain is quantum circuits. We make the further simplifying assumptions
(similar results can be obtained with variations on these assumptions):
- transformation rules have two-qubit circuits as left and right hand sides, and
- the number of qubits on the inputs is fixed (i.e. the number of gates on each qubit scales with circuit size).
Define to be the largest number of gates on any one qubit in the left hand sides of the GTS transformation rules. Consider a partition of the gates on each qubit into sequences of gates. We can obtain a covering of a quantum circuit by considering covering sets defined for all such that for all matches of a left hand side of a transformation rule of the GTS, if and only if there is a gate in such that is in the -th sequence of gates on some qubit in . Imposing the condition that rewrites must preserve coverings, the covering of the input fixes the covering of all reachable graphs in the GTS.
CompleteGTS and for SingleRuleGTS.
Let be the input circuit with qubits. Consider the covering of as constructed above. By construction , where is the maximum number of gates on a qubit of .
The covering set contains the set of gates composed of the -th sequence of gates for each qubit in . Furthermore, for all , if is a match of a two-qubit rule that contains , then may contain at most other gates. Hence by construction, . This is a constant and thus there is a constant such that for all there are at most matches of a two-qubit rule that intersect .
Finally, any match spans two qubits; the gates on each qubit (at most
) may belong to at most two distinct sequences of gates of that
qubit. Thus, any match spans at most distinct covering sets.
These arguments made no assumption on properties of the rule set and thus apply
equally to CompleteGTS and SingleRuleGTS.
Upper bound on
The preservation of coverings under rewrites allows us to consider a covering of : for each , let be the set of graphs in that are the result of a rewrite of its -th covering subgraph. Every graph in is the result of a rewrite on some covering subgraph , or is the input graph . So, from a bound for all , we can obtain a bound
The bound can be obtained recursively: upper bounds by definition the number of rewrites in any covering subgraph of the root graph , and thus the number of graphs in . We then proceed by induction for .
A rewrite overlaps with at most other covering subgraphs. It can overwrite at most one previous rewrite for each subgraph, and thus will have at most parent graphs in sets . Each of the sets is of size at most . Furthermore, there are at most rewrites in any covering subgraph. We thus obtain the recursion:
Unrolling the recursion, we can write this as
Recalling that by construction , we obtain:
The factorised search space size is in .
Discussion and empirical exploratory analysis
We have derived bounds on the size of the search spaces and shown that under some assumptions on the properties of the GTS, the factorised search space grows linearly in the size of the input graph . This stands in stark contrast to the lower bound of the naive search tree, which scales exponentially with the size of the input graph.
However, when considering the overall optimisation problem of finding the optimal solution over the set of reachable graphs in a GTS, the exponential overhead does not disappear: it is rather shifted to the extraction phase that relies on a SAT solver. It is therefore an open question whether the factorised search space can be used to improve optimisation problems on GTSs.
To this end, we devise a simple numerical experiment that assesses the potential of using the unfolding construction as presented in this chapter in the context of quantum computation optimisation.
The toy problem. We consider a very simple circuit optimisation problem that is designed to require a deep search space (i.e. a large number of rewrites) to be solved. This will exacerbate the scaling difference between an optimiser that must traverse the naive search space and another that relies on the factorised representation instead.
The inputs are quantum circuits composed of two-qubit and single-qubit rotation gates. The angles of the rotations are not relevant and set randomly. They are of the following form:
i.e. each pair of subsequent qubits has 2 gates at either end and 10 rotations in between, on the control qubits of the gates. These circuits admit a very simple optimisation that can be expressed by the following two transformation rules:
Given the objective of minimising the number of gates, the optimiser must commute the leftmost gates through all of the rotation gates, until the two on each qubit are adjacent and cancel out. We study the performance of the optimisers as we increase the number of qubits in the circuit.
Optimisers. We define two optimisers. Badger is a backtracking search through the naive search space of reachable graphs in the GTS: starting from the input, the search space is expanded by computing all possible rewrites at a given state. States with the lowest cost function are processed first. This is similar to an A* search Hart, 1968. 1968. A Formal Basis for the Heuristic Determination of Minimum Cost Paths. IEEE Transactions on Systems Science and Cybernetics 4, 2 (100--107). doi: 10.1109/tssc.1968.300136.
Seadog on the other hand performs the backtracking search on the factorised search space instead: when expanding a state of the search space, only rewrites that overlap with the last rewrite are considered and added to the search space, as discussed in section 5.4. In a second phase, the search space is encoded as a SAT problem that is solved using Z3 Moura, 2008. 2008. Z3: An Efficient SMT Solver. In Tools and Algorithms for the Construction and Analysis of Systems. Springer Berlin Heidelberg, 337--340. doi: 10.1007/978-3-540-78800-3_24.
The Badger optimiser is released and publicly available as part of the open-source TKET package4. The Seadog optimiser on the other hand is still in early development; more benchmarks and a release will follow.
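To make the search strategy of the naive optimiser concrete, the following is a minimal sketch of a lowest-cost-first search over a string-rewriting stand-in for the toy problem. It is not the Badger implementation: the encoding of one qubit pair as a word over `C` (CX gate) and `R` (rotation), the two string rules, and the `max_states` budget that replaces the wall-clock timeout are all assumptions made for illustration.

```python
import heapq

# Hypothetical stand-in rules: commute a CX past a rotation, cancel adjacent CX gates.
RULES = [("CR", "RC"), ("CC", "")]


def rewrites(state):
    """All states reachable from `state` by one rule application."""
    out = []
    for lhs, rhs in RULES:
        start = state.find(lhs)
        while start != -1:
            out.append(state[:start] + rhs + state[start + len(lhs):])
            start = state.find(lhs, start + 1)
    return out


def cost(state):
    """Cost function: number of CX gates."""
    return state.count("C")


def best_first(start, max_states=10_000):
    """Expand states with the lowest cost first; return (best state, states visited)."""
    best = start
    seen = {start}
    queue = [(cost(start), start)]
    visited = 0
    while queue and visited < max_states:
        current_cost, state = heapq.heappop(queue)
        visited += 1
        if current_cost < cost(best):
            best = state
        for nxt in rewrites(state):
            if nxt not in seen:  # discard duplicate states
                seen.add(nxt)
                heapq.heappush(queue, (cost(nxt), nxt))
    return best, visited


if __name__ == "__main__":
    circuit = "C" + "R" * 10 + "C"
    print(best_first(circuit))
```

On a single qubit pair the search is trivial, since every state has exactly one successor. The difficulty in the experiment arises once many such pairs are present and the search interleaves rewrites on all of them.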
Results. We ran the experiment on an Apple M3 Max CPU (4.05GHz) for inputs between ( gates) and qubits ( gates). Both optimisers ran on a single core. For each instance, we set a timeout of seconds and report the relative gate reduction, i.e.
The results are shown in the figures on the right.
Discussion. On the left, we observe that both optimisers are able to find the optimum for circuits with up to 30 CX gates. Beyond that point, the time limit starts impacting Badger performance, which drops continuously and reaches 0% for inputs of 50 CX gates and above. Seadog on the other hand does not time out and is able to explore the entire (factorised) search space exhaustively up until 70 CX gates.
Observe that the Badger optimiser reaches the time limit for as few as 10 CX. Indeed, the complete naive search space size can be calculated to have states (each pair of qubits can be in one of 12 states). For we get states, but this already reaches over states for .

CX gate count reduction (left) and runtime (right) for the Badger and Seadog optimisers. 100% gate count reduction is optimal. A timeout was set to 2 seconds.
Size of factorised search space for Seadog.
On the other hand, the factorised search space will only contain states for each qubit pair. This results in a linear scaling of the search space size, as can clearly be seen in the second figure.
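The difference in scaling can be reproduced on a string-rewriting stand-in for the toy circuits. This is a hedged sketch rather than the experiment itself: each qubit pair is encoded as a word over `C` and `R` with two hypothetical string rules, the naive search space is the set of joint states of all pairs, and the factorised space is approximated by exploring each pair independently.

```python
# Hypothetical stand-in rules: commute a CX past a rotation, cancel adjacent CX gates.
RULES = [("CR", "RC"), ("CC", "")]
PAIR = "C" + "R" * 10 + "C"  # one qubit pair: CX, ten rotations, CX


def successors(word):
    """All words reachable from `word` by one rule application."""
    out = []
    for lhs, rhs in RULES:
        i = word.find(lhs)
        while i != -1:
            out.append(word[:i] + rhs + word[i + len(lhs):])
            i = word.find(lhs, i + 1)
    return out


def joint_step(state):
    """One rewrite on any single pair of a tuple of independent pairs."""
    for i, word in enumerate(state):
        for nxt in successors(word):
            yield state[:i] + (nxt,) + state[i + 1:]


def reachable(start, step):
    """Set of all states reachable from `start` under `step`."""
    seen = {start}
    stack = [start]
    while stack:
        state = stack.pop()
        for nxt in step(state):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def naive_size(pairs):
    """States explored when all pairs are rewritten jointly."""
    return len(reachable((PAIR,) * pairs, joint_step))


def factorised_size(pairs):
    """States explored when each pair is rewritten independently."""
    return pairs * len(reachable(PAIR, successors))


if __name__ == "__main__":
    for pairs in (1, 2, 3):
        print(pairs, naive_size(pairs), factorised_size(pairs))
```

In this stand-in a single pair has 12 reachable states, in agreement with the count given above, so the joint search space is multiplied by 12 with every additional pair while the independent exploration only adds 12 states.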
Where the runtime exceeds the 2 second timeout, this is due to pre- and post-optimisation steps such as memory allocation/deallocation, I/O, file parsing etc. that are included in the measurements. The quadratic runtime scaling that we observe in Seadog is due to a hash function that is run on every state of the search space to detect and discard duplicates: as the number of states in the search space grows linearly with input size and each state requires a hash in linear time, the overall runtime grows quadratically. Future work may be able to address this issue by designing updateable hash functions that do not require the full graph to be rehashed when applying a local rewrite.
Future work should also investigate how to scale Seadog to larger input sizes on a broader class of problems. We have observed that the SAT-based extraction phase of Seadog corresponds to less than 1% of the runtime budget (under 15ms for all input sizes). Whilst being asymptotically exponential in the worst case, it is thus not currently a bottleneck. On the other hand, the number of states visited per second in the exploration phase is currently up to slower for Seadog compared to Badger. Further investigations into the causes of this are still required, but we expect that large performance improvements can be realised on the current implementation and as a result could scale to larger inputs.
-
The only constraint on the notion of graph size is that it must be compatible with the subgraph relation: if , then ↩︎
-
Note that this counting is already an act of clemency: we are not counting all permutations of the rewrites, which would be considered separately by a naive exploration that applies one graph rewrite at a time. In this case, the search tree for would contain graphs. ↩︎
-
This can always be made to hold by replacing any set of mutually disjoint rewrites with their cartesian product, in effect viewing the application of multiple disjoint rewrites as one large rewrite. This comes at the cost of a larger value for ↩︎
-
As a Python package on PyPI and a Rust crate on crates.io. ↩︎
Chapter 6
Future Work and Conclusions
The time has now come to conclude this thesis. In summary, our claim is that given
- the modularity and expressiveness that quantum compilers will require to simultaneously express higher level abstractions, hardware primitives and interleaved quantum classical computation (cf. sections 3.3, 2.3, and 2.4),
- the challenge of scaling up quantum program sizes to make the most of the computational capabilities of upcoming hardware (cf. sections 1.1 and 2.2),
- the linearity restrictions that quantum data imposes on the compiler’s intermediate representation (IR) of the computation (cf. sections 2.1, 3.3, and 3.4),
graph transformation systems (GTS) are uniquely positioned to serve as the backbone of a quantum compilation framework.
To this aim, chapter 3 presented minIR, a graph-based compiler IR with explicit support for linear types. To go along with it, we proposed the first formalisation of graph transformation semantics that preserve linearity.
Chapters 4 and 5 built on this foundation and solved two critical scaling problems for the adoption of GTS techniques in quantum compilers.
Pattern matching. Successful implementations of GTSs for quantum circuit optimisation rely on thousands to hundreds of thousands of transformation rules Xu, 2022. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433 Xu, 2023. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (June 2023, 835--859). doi: 10.1145/3591254, for which techniques matching one pattern at a time become a significant bottleneck. Chapter 4 presented an approach based on state automata with an asymptotic runtime that is independent of the number of patterns. This resulted in a 20x speedup for a real-world pattern matching task of direct utility to quantum compilers.
Efficient rewrite space exploration. Applications of GTS to quantum compilation distinguish themselves – unfortunately – by a lack of successful rewriting strategies or other rule control mechanisms. Consequently, the optimisation of quantum computations is framed as a search problem over the space of all reachable graphs in the GTS. Chapter 5 introduced a novel confluently persistent data structure that uses the structure of the rewrite search space to speed up its exploration. In typical applications, the factorised search space thus obtained is conjectured to grow linearly with the size of the input – an exponential improvement over the naive search strategy, without which GTS-based compiler optimisations on real-world computations with thousands of gates will be infeasible.
In both cases, the guarantees that linear values provide and that minIR enforces translate into asymptotic runtime guarantees that cannot be derived otherwise. In the absence of linearity, the pattern matching of chapter 4 becomes an NP-hard problem; meanwhile, the graph rewriting space considered in chapter 5 would grow super-exponentially and require pruning heuristics for the extraction problem, as studied in Yang, 2021. 2021. Equality Saturation for Tensor Graph Superoptimization. CoRR abs/2101.01332. doi: 10.48550/ARXIV.2101.01332 and Bărbu., 2024. 2024. Learned Graph Rewriting with Equality Saturation: A New Paradigm in Relational Query Rewrite and Beyond. arXiv: 2407.12794 [cs.DB].
Combined, these contributions lay the groundwork for a quantum compiler platform that is modular in the hardware primitives, high-level programming abstractions and transformation rules that it can model, and scalable in the size of the computation and number of rules that it can match and optimise over. Work on such a platform is well underway within the TKET2 open-source compiler, available on GitHub.
Further work could take many directions. The graph transformation semantics of chapter 3 that are presented operationally could for example be categorified and generalised. This would open many promising bridges and parallels to work in related domains, such as string diagrams, DPO-based GTSs and even the family of ZX calculi.
There are also immediate opportunities in extending the work of chapters 4 and 5, in particular around weakening the assumptions that had to be made on the structure of the graph, respectively on the properties of the GTS and graph domain. In both cases, a more in-depth study of how the runtime of actual implementations depends on properties of the inputs would be very informative. We suspect from anecdotal observations that many assumptions we have imposed can be relaxed with little impact on performance – conversely, there may be large variations in runtimes for different regimes within the asymptotic guarantees of our results.
Another crucially important question that this thesis has not addressed is the choice of transformation rules. Beyond the results of Xu, 2022. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433 and Xu, 2023. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (June 2023, 835--859). doi: 10.1145/3591254 that we have referred to repeatedly throughout this thesis, very recent work by Amy and Lunderville Amy, 2025. 2025. Linear and Non-linear Relational Analyses for Quantum Program Optimization. Proceedings of the ACM on Programming Languages 9, POPL (January 2025, 1072--1103). doi: 10.1145/3704873 has presented what amounts to the first inroads into hybrid classical-quantum optimisations. Developing comprehensive transformation rules for hybrid computations would represent a significant advance for the field.
Among the myriad of options, we opt to conclude this thesis with the discussion of two particularly promising avenues for future work. The first (section 6.1) relates to increasing the expressivity of the pattern matching language; such an extended framework would also enable fast pattern matching directly on the persistent data structure of chapter 5, rather than having to match patterns in each graph of separately.
The second (section 6.2) is a proposal to use the persistent data structure of chapter 5 for large scale distributed graph rewriting. With this, the optimisation of quantum computations could be distributed across multiple machines, potentially scaling up to high-performance computing (HPC) clusters and opening the door to optimisation capabilities that could significantly advance the state of the art of quantum circuit optimisation.
6.1. More expressive pattern matching
Pattern matching as defined in chapter 4 is the problem of finding pattern embeddings for patterns from a fixed set of patterns . We are interested in lifting two limitations of this definition.
Firstly, it would be desirable to be able to define patterns that are not a concrete graph instance, but instead a (potentially infinite) family of graphs. Examples of such pattern families that could be useful in quantum computing are
- “a sequence of gates that commute with each other”, or
- “a subgraph that only contains Clifford gates”, or
- “all operations within the body of a loop”.
To express these patterns as concrete graph instances would require an infinite number of graphs. The study of pattern languages that allow the expression of such higher-level graphs is a mature field of graph transformations, with tools such as GrGen.NET Geiß, 2006. 2006. GrGen: A Fast SPO-Based Graph Rewriting Tool. In Graph Transformations. ICGT 2006.. Springer Berlin Heidelberg, 383--397. doi: 10.1007/11841883_27 offering many advanced capabilities. It would be of great interest to establish what classes of pattern languages could be supported by generalisations of the state automaton approach presented in chapter 4.
Secondly, our approach currently only supports linear values, and thus in its current form is unsuitable for hybrid quantum-classical computations. Coincidentally, supporting non-linear values is very similar to finding embeddings of patterns into the confluently persistent data structure of chapter 5. The case of a non-linear value that is used multiple times in a computation is syntactically very similar to having to consider a value in that may be connected to operations in different ways, depending on the variant of the “multiverse” of equivalent graphs that are stored simultaneously in .
Pattern matching generalisation
The following generalisation of pattern matching might be able to achieve these two goals whilst still being compatible with the state automaton approach that we presented. We suggest defining patterns and how they match using three concepts:
Constraints. A pattern is given by a set of constraints . They encode the conditions under which a pattern matches. They would for instance assert that two vertices are connected by an edge, or that a vertex is of a certain type. A pattern that is a concrete graph would then at a minimum have a constraint for each edge in .
Constraints correspond to edges (transitions) in the state automaton. Pattern matching evaluates all outgoing constraints from the current state and proceeds to the states for which the respective constraint is satisfied.
Indexing schemes. An indexing scheme assigns each object (e.g. vertex) in the patterns a unique key in , and each object (e.g. vertex) in the input domain a unique value . Embeddings of patterns into are then given by key-value maps , mapping keyed objects from patterns to objects in the input domain. Each constraint has a set of keys associated with it; can then be evaluated by passing it all the values in bound to its keys.
Indexing schemes are designed to give overlapping patterns the same key on their overlap, so that the overlap need only be matched once. This models how in chapter 4, patterns are clustered into patterns that share the same contracted tree and are differentiated by their contracted string tuples only.
Key-value map expansion. Indexing schemes abstract away the pattern and input data in such a way that the pattern matcher only needs to keep track of key-value maps . These maps can be created recursively using an expansion function
This provides all the ways in which the domain of definition of an index map can be extended. The returned set of new index maps should coincide with on but expand their domain of definition to include new keys . By making it possible to extend in more than one way, we can model the existence of non-linear values (i.e. the index map could be extended to any of the operations that use a certain value ), as well as the fact that a persistent data structure such as may be keeping track of multiple versions of the graph, and thus expand a key in multiple ways.
Execution of the pattern matcher
Starting from an empty key-value map at the root state of the state automaton, the pattern matcher keeps track of a set of key-value maps, along with, for each map, the state it is in. It then proceeds by repeatedly performing the following two actions:
- Expand the domain of definition of a key-value map by calling the expansion function;
- Evaluate the constraints for a key-value map; if a constraint is satisfied, move to the next state, otherwise try another constraint. If no constraint is satisfied, delete the map.
The performance of the pattern matcher will be highly dependent on choosing a smart ordering of these two actions, as well as prioritising the right key-value maps to be expanded and evaluated.
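The two actions can be illustrated with a minimal sketch. Everything in it is a hypothetical toy and not the portmatching implementation: the input domain (`OPS`, `SUCC`), the integer keys, the chain-shaped indexing scheme and the names `expand`, `op_is`, `bind` and `run_matcher` are all assumptions made for illustration. For simplicity the sketch follows every satisfied transition and uses a first-in-first-out ordering, whereas a practical matcher would choose the ordering more carefully.

```python
from collections import deque

# Hypothetical toy input domain. A vertex with several successors stands in
# for a non-linear value (or for several versions of a persistent graph).
OPS = {"a": "H", "b": "CX", "c": "CX", "d": "T"}
SUCC = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}


def expand(kv):
    """Return all maps that agree with `kv` and bind one additional key.

    Keys are the integers 0, 1, 2, ...; key i + 1 is bound to a successor
    of the value bound to key i (an assumed, very simple indexing scheme).
    """
    new_key = len(kv)
    candidates = list(OPS) if new_key == 0 else SUCC[kv[new_key - 1]]
    return [{**kv, new_key: value} for value in candidates]


def op_is(key, op):
    """Constraint: the value bound to `key` is an operation of type `op`."""
    return (key,), lambda value: OPS[value] == op


# State automaton whose transitions are labelled with constraints. The
# patterns "H-CX" and "H-CX-T" share the states 0, 1 and 2.
TRANSITIONS = {
    0: [(op_is(0, "H"), 1)],
    1: [(op_is(1, "CX"), 2)],
    2: [(op_is(2, "T"), 3)],
    3: [],
}
ACCEPTING = {2: "H-CX", 3: "H-CX-T"}


def bind(kv, keys):
    """Expand `kv` until every key in `keys` is bound (possibly in many ways)."""
    maps = [kv]
    while maps and any(k not in maps[0] for k in keys):
        maps = [new for m in maps for new in expand(m)]
    return maps


def run_matcher():
    """Return (pattern name, key-value map) pairs for all matches."""
    matches = []
    worklist = deque([({}, 0)])  # empty map at the root state
    while worklist:
        kv, state = worklist.popleft()
        if state in ACCEPTING:
            matches.append((ACCEPTING[state], kv))
        for (keys, predicate), next_state in TRANSITIONS[state]:
            for new_kv in bind(kv, keys):
                if predicate(*(new_kv[k] for k in keys)):
                    worklist.append((new_kv, next_state))
        # A map that satisfies no constraint is dropped: it is not re-queued.
    return matches


MATCHES = run_matcher()
```

In this toy domain the vertex `a` has two successors, so the map binding key 0 to `a` is expanded in two ways and the pattern "H-CX" is reported twice, while "H-CX-T" is reported once.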
With this proposal, it would appear possible to combine the fast state automaton-based approach of chapter 4 and its scaling to a very large number of patterns, with a more expressive pattern language and support for non-linear types as well as persistent graph rewriting. An implementation of this is currently being worked on in the open-source portmatching project, available on GitHub.
6.2. Massively parallel graph rewriting
Persistent data structures – and particularly fully and confluently persistent ones – are well-suited for distributed applications. In persistent data structures, data can always be added but never deleted, and is thus immutable. This removes the need for locks and synchronisation primitives across processes. Furthermore, using confluence, edits can be made concurrently in different processes and then eventually merged asynchronously, as follows:
The contributions presented in chapter 5 thus translate directly into a proposal for a massively parallel graph rewriting system. In summary, we have shown that graph rewrites can be tracked in a persistent data structure in the form of edits . New edits added to can refer to previous edits, and thus create an acyclic edit history. Sets of edits and can also be merged (confluence) and as a result, new edits that build on top of edits from both and can be defined.
We describe in slightly more detail what a massively parallel graph rewriting architecture might look like.
Inter-process communication
During the rewriting process, the processes involved must regularly broadcast the edits they have added to (their copy of) the data . Such broadcast edits must then be merged by the other processes into their respective local copies, so that progress made by one process can be shared with, and built upon by, the others.
Technologies such as the Message Passing Interface (MPI) Dongar., 1993. 1993. MPI: a message passing interface. In Proceedings of the 1993 ACM/IEEE conference on Supercomputing. ACM Press, 878--883. doi: 10.1145/169627.169855 would be well-suited to such inter-process communications. To reduce the number of messages that senders and receivers must process, edits should not be broadcast one by one, but rather grouped together. For this, we propose the notion of a salient edit, reflecting that an edit is deemed of importance.
Non-salient edits are not broadcast as they are added to . When, on the other hand, an edit is deemed salient, it is broadcast along with all its ancestors (i.e. all edits that depends on). As the edit history deepens, it might become inefficient to broadcast all the ancestry of an edit, in which case more advanced communication protocols would have to be devised.
Finally, a procedure must be put in place to identify identical edits that may be added and/or broadcast by different processes, to avoid duplication. Hashing techniques and hash tables are well-suited for this kind of problem.
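As a rough illustration of how these pieces could fit together, the sketch below models each process's local copy as an append-only table of edits keyed by a content hash. The class `EditStore`, the function `edit_id` and the string payloads are hypothetical names chosen for this example. Using SHA-256 over a JSON encoding is only one possible way to identify identical edits, and the transport layer (for instance MPI) is left out entirely.

```python
import hashlib
import json


def edit_id(payload, parents):
    """Content hash of an edit: identical edits get identical identifiers."""
    data = json.dumps({"payload": payload, "parents": sorted(parents)}, sort_keys=True)
    return hashlib.sha256(data.encode()).hexdigest()


class EditStore:
    """Append-only local copy of the edit history (hypothetical sketch)."""

    def __init__(self):
        self.edits = {}  # edit id -> (payload, tuple of parent ids)

    def add(self, payload, parents=()):
        parents = tuple(sorted(parents))
        missing = [p for p in parents if p not in self.edits]
        if missing:
            raise KeyError(f"unknown parent edits: {missing}")
        eid = edit_id(payload, parents)
        self.edits.setdefault(eid, (payload, parents))  # duplicates collapse
        return eid

    def with_ancestors(self, eid):
        """Message for a salient edit: the edit together with all its ancestors."""
        message, stack = {}, [eid]
        while stack:
            current = stack.pop()
            if current not in message:
                message[current] = self.edits[current]
                stack.extend(self.edits[current][1])
        return message

    def merge(self, message):
        """Merge received edits; edits that are already present are skipped."""
        new = {eid: e for eid, e in message.items() if eid not in self.edits}
        self.edits.update(new)
        return len(new)


# Two processes: the second one finds "rewrite A" independently.
p1, p2 = EditStore(), EditStore()
root = p1.add("rewrite A")
child = p1.add("rewrite B", parents=[root])
p2.add("rewrite A")
added = p2.merge(p1.with_ancestors(child))
```

Because edits are never deleted, merging amounts to a union of tables, and receiving the same message twice has no further effect.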
Process types
At a minimum, the distributed graph rewriting system should distinguish between two types of processes.
The vast majority of processes would be rewrite factories. Their purpose is to create new edits, add them to and broadcast them whenever they are deemed salient. These processes will be responsible for driving forward the search space exploration and, in the end, the optimisation. A good candidate for a rewrite factory is the pattern matching automaton of chapter 4 and its generalisation just described in section 6.1. Different processes may specialise in different transformation rule sets; others still could implement dedicated optimisations such as ZX-based optimisations or optimal Clifford synthesis (see discussion in section 2.2).
The other type of process would be a result extractor; a read-only process that runs the SAT-based optimisation and graph extraction algorithm of section 5.4. Such a process would run the computation at regular intervals to track the optimisation progress.
As the distributed architecture grows in complexity, more tasks and more process types may be required. It might for instance be desirable to have a process that identifies under-explored parts of the search space to direct rewrite factories in that direction.
Using such an architecture, it might be possible for the first time to scale quantum compilation workloads to large clusters of machines. This could significantly advance compilation performance of quantum programs, a particularly valuable contribution at a time when quantum computers are on the edge of utility. Nevertheless, such distributed systems often prove difficult to design and run successfully. Open questions include how to coordinate the search across processes in such a way that the most promising parts of the search space are explored whilst avoiding work duplication; whether communication will become the bottleneck in the computation; what the most effective transformation rules and cost functions are; and what the limits of modern SAT solvers are on our problem of interest.
Appendix
A. Prefix trees
Our main result is achieved by reducing a tree inclusion problem to the following problem.
String prefix matching. Consider the following computational problem over strings. Let be a finite alphabet and consider the set of -tuples of strings over . For a string tuple and a set of string tuples , the -dimensional string prefix matching consists in finding the set
This string problem can be solved using a -dimensional prefix tree. We give a short introduction to prefix trees for the string case but refer to standard literature for more details Knuth, 1999. 1999. The Art of Computer Programming: Sorting and Searching, Volume 3. Addison-Wesley, Reading MA.
One-dimensional prefix tree. Let be strings on some alphabet . Given an input string , we wish to find the set of patterns , i.e. is a prefix of .
The prefix tree of is a tree with a tree node for each prefix of a pattern. The children of an internal node are the strings that extend the prefix by one character. The root of the tree is the empty string. Each tree node also stores a list of matching patterns, with each pattern stored in the unique corresponding node. Every prefix tree has an empty string node, which is the root of the tree. For every inserted pattern of length at most nodes are inserted, one for every non-empty prefix of the pattern. Thus a one-dimensional prefix tree has at most nodes and can be constructed in time .
Given an input , we can find the set of matching patterns by traversing the prefix tree of starting from the root. We report the list of matching patterns at the current node and move to the child node that is still a prefix of , if it exists. This procedure continues until no more such child exists. In total the traversal takes time , as every character of is visited at most once.
Note that in theory the number of reported pattern matches can dominate the runtime of the algorithm. We can avoid this by returning the list of matches as an iterator, stored as a list of pointers to the tree nodes' matching lists.
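A minimal sketch of the one-dimensional case is given below. The class name `PrefixTree` and its methods are illustrative choices. The generator in `prefixes_of` plays the role of the iterator mentioned above, yielding matches lazily from the per-node lists instead of copying them.

```python
class PrefixTree:
    """One-dimensional prefix tree over strings."""

    def __init__(self):
        self.children = {}  # character -> child PrefixTree
        self.matches = []   # patterns that end exactly at this node

    def insert(self, pattern):
        """Add `pattern`, creating at most one node per character."""
        node = self
        for char in pattern:
            node = node.children.setdefault(char, PrefixTree())
        node.matches.append(pattern)

    def prefixes_of(self, text):
        """Lazily yield every stored pattern that is a prefix of `text`."""
        node = self
        yield from node.matches
        for char in text:
            node = node.children.get(char)
            if node is None:
                return
            yield from node.matches


tree = PrefixTree()
for pattern in ["ab", "abc", "b", ""]:
    tree.insert(pattern)
found = list(tree.prefixes_of("abcd"))  # ["", "ab", "abc"]
```

The traversal follows a single root-to-node path, so each character of the input is inspected at most once.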
Multi-dimensional prefix tree. A -dimensional prefix tree for is defined recursively as a one-dimensional prefix tree that at each node stores a -dimensional prefix tree. Given an input -tuple , the traversal of the -dimensional prefix tree is done by traversing the one-dimensional prefix tree on the input until no child is a prefix of the input, and then recursively traversing the -dimensional prefix tree on . Similarly to the one-dimensional case, the list of matching patterns is stored at prefix tree nodes and reported during traversal. The traversal thus takes time , as every character of is visited at most once.
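The recursive definition can be sketched as follows, assuming all tuples have the same number of components. The class `TupleTrie` is a hypothetical name. Note one deliberate choice in this sketch: it descends into the nested tree of every node on the path for the first component, so that tuples whose first component is a proper prefix of the input are also reported. As a consequence, characters of later components may be inspected more than once, and the sketch makes no claim to match the runtime bound stated above.

```python
class TupleTrie:
    """Prefix tree over tuples of strings, nested one dimension at a time."""

    def __init__(self):
        self.children = {}  # character -> child node (same dimension)
        self.nested = None  # prefix tree for the remaining components
        self.matches = []   # tuples whose last component ends at this node

    def insert(self, strings, original=None):
        """Insert a tuple of strings; `original` is carried for reporting."""
        original = tuple(strings) if original is None else original
        node = self
        for char in strings[0]:
            node = node.children.setdefault(char, TupleTrie())
        if len(strings) == 1:
            node.matches.append(original)
        else:
            if node.nested is None:
                node.nested = TupleTrie()
            node.nested.insert(strings[1:], original)

    def prefixes_of(self, strings):
        """Yield stored tuples that are component-wise prefixes of `strings`."""
        node = self
        path = [node]
        for char in strings[0]:
            node = node.children.get(char)
            if node is None:
                break
            path.append(node)
        for node in path:
            if len(strings) == 1:
                yield from node.matches
            elif node.nested is not None:
                yield from node.nested.prefixes_of(strings[1:])


trie = TupleTrie()
for pattern in [("a", "x"), ("ab", "xy"), ("ab", "z"), ("b", "x")]:
    trie.insert(pattern)
found_tuples = sorted(trie.prefixes_of(("abc", "xyz")))
```

Here `("a", "x")` and `("ab", "xy")` are reported for the input `("abc", "xyz")`, whereas `("ab", "z")` fails on its second component and `("b", "x")` on its first.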
For tuples of size of words of maximum length , we can bound the number of nodes of the -dimensional prefix tree by . The runtime and space complexity of the construction of the -dimensional prefix tree is thus in , summarised in the result:
B. Lower bound on the number of patterns
Let be the number of port graphs of width , depth and maximum degree . We can lower bound
assuming .
In the regime of interest, is small, so the assumption is not a restriction.
Let and be integers. We wish to lower bound the number of port graphs of depth , width and maximum degree . It is sufficient to consider a restricted subset of such port graphs, whose size can be easily lower bounded. We will count a subset of CX quantum circuits, i.e. circuits with only CX gates, a two-qubit non-symmetric gate. Because we are using a single gate type, this is equivalent to counting a subset of port graphs with vertices of degree 4. Assume w.l.o.g. that is a power of two. We consider CX circuits constructed from two circuits with qubits composed in sequence:
- Fixed tree circuit: A -depth circuit that connects qubits pairwise in such a way that the resulting port graph is connected. We fix such a tree-like circuit and use the same circuit for all CX circuits. We can use this common structure to fix an ordering of the qubits, that we refer to as qubits .
- Bipartite circuit: A CX circuit of depth with exactly CX gates, each gate acting on a qubit and a qubit .
The following circuit illustrates the construction:

All that remains is to count the number of such bipartite circuits. Every slice of depth 1 must have CX gates acting on distinct qubits. Every qubit to must interact with one of the qubits to , so there are such depth 1 slices. Repeating this depth 1 construction times and using Stirling’s approximation, we obtain a lower bound for the number of port graphs of depth , width and maximum degree at least 4:
where we used to obtain in the last step.
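The counting step can be checked by brute force for small instances. The sketch below rests on assumptions of its own: it writes `width` for the number of qubits, places all controls on the first half of the qubits and all targets on the second half, and treats a depth-1 slice as a perfect matching between the two halves. Under these assumptions the number of slices is the factorial of half the width, which is the quantity to which Stirling's approximation would then be applied.

```python
from itertools import permutations
from math import e, factorial


def depth_one_slices(width):
    """Enumerate depth-1 layers of CX gates between the two halves of the qubits.

    Assumed convention: each gate has its control in the first half and its
    target in the second half, and every qubit is used exactly once.
    """
    half = width // 2
    controls = range(half)
    for targets in permutations(range(half, width)):
        yield tuple(zip(controls, targets))


def count_bipartite_circuits(width, depth):
    """Number of ways to stack `depth` independently chosen depth-1 slices."""
    return sum(1 for _ in depth_one_slices(width)) ** depth


# Brute-force slice counts for a few small widths.
SLICE_COUNTS = {w: sum(1 for _ in depth_one_slices(w)) for w in (2, 4, 6, 8)}
```

For the widths tested, the brute-force counts agree with the factorial, and the standard bound $n! \geq (n/e)^n$ can be checked numerically in the same way.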