Scaling Quantum Compilation with Graph Rewriting

Luca Mondada

Lady Margaret Hall

University of Oxford

A thesis submitted for the degree of

Doctor of Philosophy

Hilary 2025


Abstract

As the capabilities of quantum hardware advance and their architectures become more complex, quantum compilers must evolve to optimise increasingly large programs and support a growing range of computational primitives. In particular, the emergence of hybrid quantum-classical computations, required for instance by implementations of quantum error correcting protocols, poses significant challenges to established quantum compilation techniques.

This thesis argues that graph transformation systems (GTS) offer a principled foundation for compiler platforms that can express arbitrary hardware primitives and support computations over both quantum and classical data. To this end, it introduces a graph-based intermediate representation (IR), with support for linear types to model quantum data. This unifies graphical formalisms used for reasoning over quantum computations (such as the ZX calculus) with IR-based program transformation techniques from classical compiler design.

Building upon this foundation, the thesis tackles two critical scaling problems hindering the adoption of GTS in quantum compilers. First, it presents an efficient pattern matching algorithm based on a precomputed data structure that achieves query times independent of the number of transformation rules in the system. This removes a key bottleneck in quantum superoptimisers, which optimise quantum programs using tens to hundreds of thousands of rules.

Second, the thesis introduces a confluently persistent data structure that enables efficient exploration of the GTS state space of possible graph transformations. This factorised search space offers an exponential complexity advantage in search space size and traversal time compared to naively exploring all reachable graphs. The thesis also discusses the problem of extracting optimal programs from this data structure, relating it to Boolean satisfiability (SAT) problems.

Together, these contributions lay the groundwork for scalable, GTS-based quantum compilers and advance the integration of quantum and classical compilation techniques. The persistent data structure for graph rewriting also opens the door for concurrent graph rewriting in distributed systems, a technique that may have applications within the graph transformations field more broadly and merits further research.


To my family who have supported me throughout these years. I couldn’t have done it without you.

To my friends and colleagues that made this adventure an unforgettable journey. Thank you to Aleks and the entire group in Oxford.

A special mention goes to Ross and the amazing crew at Quantinuum: Dan, Pablo and Thomas – thank you for proofreading – as well as Alan, Agustin, Alec, Callum, Craig, Doug, Ian, Lukas, Mark, Richie, Silas, Seyon, Will, Yao, and so many more! I look forward to continuing this incredible undertaking with you all.

This document is also available as HTML here. If you are reading this document on a screen, you might find the experience much more pleasant in the browser (you can hover on citations, resize the window to your liking, search the document and more).


Chapter 1

Introduction

Quantum computing is the computational model that arises from the quantum mechanical manipulation of finite dimensional physical systems. Realising this new computing paradigm requires an entirely new technology stack: most obviously, new dedicated hardware, but also an extensive collection of software tools that transform the intents of a human user into a symphony of electric pulses that operate all components of the hardware installation (lasers, magnetic fields, currents, photodetectors, etc.).

Turning human-readable code into machine instructions is the realm of compilers, a problem as old as classical¹ computer science itself. By analogy, the same problem in the quantum world was named quantum compilation.

Interestingly, whereas the term quantum compilation has been in use for the longest part of the existence of quantum computing as a field, it is only recently that the quantum compilation community has started to adopt tools, ideas and results from our classical counterparts.

Meanwhile, quantum computing has a long history of adopting diagrammatic and graph-based representations to model and reason about computations and their quantum mechanical properties. The most famous example of this is undoubtedly the quantum circuit, a quantum analogue to boolean circuits that visualises how data flows from one operation to the next (cf. section 2.1).

Going beyond circuits, the field of categorical quantum mechanics has embraced and extended diagrammatic formalisms to model a variety of quantum processes and computations. Particularly noteworthy in this line of work are the numerous advances in quantum circuit optimisation (e.g. Duncan et al., 2020; Gogioso and Yeung, 2022), quantum simulations (e.g. Kissinger and van de Wetering, 2022; Sutcliffe and Kissinger, 2025), error correction (e.g. de Beaudrap and Horsman, 2020; Cowtan and Burton, 2024) and many more related subjects (e.g. Simmons, 2021; de Felice and Coecke, 2023) that the family of ZX-like calculi has enabled in the last five years alone.

A challenge in quantum compilation has been to combine the principled and abstract graph-based transformation semantics of diagrammatic reasoning with the feature set and performance requirements of practical compilation tools. General-purpose graph rewriting tools such as Quantomatic (Fagan and Duncan, 2018) proved too slow for quantum circuit optimisation, and other tools from the graph transformation community such as GROOVE (Rensink, 2004) and GrGen.NET (Geiß et al., 2006) have not been adopted.

Instead, successful graph-based tools such as PyZX (Kissinger and van de Wetering, 2020) and its faster re-implementation QuiZX (Kissinger et al., 2022) focused on performant rewriting for a restricted subdomain (in this case, the ZX calculus). This specialisation makes it difficult to expand these approaches to new primitives and constraints that are emerging from hardware advances within quantum computing. It also limits the interaction and sharing across field boundaries and impedes the development of tools applicable to a broader range of graph transformation domains.

The ambitious aim of this thesis is to advocate for graph transformation as a robust basis for a scalable and modular compiler platform for quantum computations. We hope that, in the process, our contributions will strengthen the bridge between research in classical compilation, quantum computing and graph transformations. The key desired properties of our compilation framework can be summarised as follows:

Scalable. The compiler should handle quantum computations of the kind we realistically expect to execute within the coming decade: thousands of logical qubits, relying on possibly millions of physical qubits. Just as importantly, the compiler architecture should scale to take advantage of large classical computational resources, in order to maximise the optimisation potential when available.

Modular. The computational primitives available on present quantum hardware are wide-ranging and evolving rapidly, the programming models for end users are adapting, and hardware constraints and characteristics change from device to device. A future-proof compiler platform must therefore be extensible in its supported instruction set, its optimisation cost function and its program transformation strategies.

Why is now the time for such a compiler and why are these qualities so important? We develop some arguments in section 1.1. Our concrete contributions to this goal are then summarised in section 1.2, along with an outline of the thesis.

This thesis hopes to strengthen the bridge between the fields of classical compilation, quantum computing and graph transformations. Three-legged bridges also exist in the real world – here the Butterfly Bridge in Copenhagen. Image credits: Christian Lindgren, archdaily.com.


  1. To distinguish traditional computing from quantum computing, the field refers to the former as classical computing. We will adopt this term throughout, for lack of a better word. ↩︎

1.1. A new compilation regime

We have introduced quantum compilation by drawing an analogy with the well-developed field of classical compilers. The novel directions in which quantum compilation is taking the field make for exciting new challenges. Three new quantum-specific properties of compilation form the core motivation for this work.

Large variations in architecture

The vast differences between proposed hardware architectures are a first distinguishing characteristic of current quantum computing developments. Unlike classical computing, where silicon-based transistors have become the definitive physical foundation for all electronic chips, the search for the most scalable and reliable technology for quantum computing is ongoing – and doubtless one of the most burning questions for the nascent industry. This introduces an incredible variety of compilation problems.

Quantum hardware designs differ both in the types of quantum particles used to implement qubits and in the control systems employed to manipulate these particles. Suggestions for the former include charged ions (Kielpinski et al., 2002; Pellizzari et al., 1995), neutral atoms (Jaksch et al., 2000; Deutsch et al., 2000), photons (Knill et al., 2001), transmons (Blais et al., 2007) and even Majorana particles (Sau et al., 2010). The equipment that drives the desired operations on these particles is then drawn from a jolly mixture of lasers, magnetic fields, microwaves, dilution fridges, etc. Each combination results in different trade-offs: some will render a specific computation particularly easy; others promise to scale well to large systems but are very error-prone and unreliable; others still achieve high fidelities at the expense of slow operations.

From the perspective of a compiler engineer, this means we must equip quantum compilers to handle a wide variety of hardware primitives, multiple optimisation goals, and hardware-specific program constraints. Traditional compilation is ill-equipped to handle this considerable challenge.

A comparison of machine code for different architectures illustrates the difference between the quantum and classical worlds. Classical CPUs are dominated by two architectures: x86, used mainly by Intel and AMD, and ARM, used by a wide range of desktop and mobile chip manufacturers¹.

x86 CPU (e.g. Intel and AMD)

mov eax, 5        ; Load 5 into EAX
add eax, 3        ; Add 3 to EAX
mov [result], eax ; Store the result in memory

ARM CPU (e.g. mobile, Apple M-series)

ldr r0, =5        ; Load 5 into R0
add r0, r0, #3    ; Add 3 to R0
ldr r1, =result   ; Load address of result
str r0, [r1]      ; Store the result in memory

There are noticeable differences between the two architectures, mostly around variable naming conventions, as well as the explicit memory load (ldr) and store (str) instructions in the case of ARM, which in x86 are handled implicitly by mov. This simplistic example naturally ignores some of the more fine-grained considerations that can make translations hard in certain edge cases. A discussion of these can be found in Ford et al. (2021). Overall, however, the instructions and capabilities of the two platforms are broadly equivalent, as is confirmed by the existence of emulation tools such as Apple Rosetta.

Let us contrast this with the difference between two quantum architectures. Consider, on the one hand, an architecture that can natively perform CX and H gates on qubits (e.g. superconducting qubits, ion traps, etc.) and, on the other hand, a platform based on photons and optical components.

Quantum circuit (qubits)

h q[0];
cz q[0],q[1];

Linear circuit (photons)

bs.h(5*pi/2, pi, pi, 2*pi) m[0], m[1];
bs.h m[2], m[3];
perm([2, 1, 3, 0]) m[1], m[2], m[3], m[4];
barrier m[0], m[1], m[2], m[3], m[4], m[5];
bs.h(1.910633) m[0], m[1];
bs.h(1.910633) m[2], m[3];
bs.h(1.910633) m[4], m[5];
perm([1, 0]) m[2], m[3];
bs.h m[3], m[4];
perm([3, 0, 1, 2]) m[1], m[2], m[3], m[4];

The first listing is a quantum circuit expressed in the OpenQASM2 standard (Cross et al., 2017). The second is the equivalent linear optics circuit computed by Perceval (Heurtel et al., 2023), expressed in a custom, OpenQASM2-like format. The conversion is by no means straightforward! Some of the challenges include encoding qubits into multiple photon modes and mapping quantum operations to an optically realisable procedure made of optical components and measurements (de Felice and Coecke, 2023).

Other architectures, such as neutral atoms, may broadly support qubit-based operations but might not offer control over individual qubits and instead require any operations to be applied in parallel to large groups of qubits (Bluvstein et al., 2022). Finally, it is to be expected that the error-correcting codes that individual platforms adopt to reduce error rates at the hardware level will introduce further constraints and new instruction sets yet again.

It is noteworthy that current trends in the classical world are also pushing compilers towards more heterogeneous architectures that may include GPUs, FPGAs and other accelerators. This has led to significant changes in the design of current compilers, which we will touch upon later. Nonetheless, this shift has, so far, mostly “limited” itself to new forms of parallelism and the introduction of more specialised instruction sets rather than a fundamental redesign of existing tools and computing paradigms. The breadth of technologies and trade-offs that quantum compilers must face has no equivalent in the classical world – at least for the time being.

Asymmetric computational resources

A second exciting paradigm shift in compilation that quantum is driving forward is cross-compilation. A common assumption in compilation is that the program is executed on the same machine (or at least the same architecture) on which it was compiled. By contrast, in cross-compilation, the compiler and the compiled binary program run on different machines, possibly with different architectures. An instance of this would be using a recent ARM system-on-chip machine to create a binary program for a traditional Windows PC with an Intel CPU. This is a supported feature of most modern compilers (and made easier by the relative similarities between processor architectures, as seen above), but such tasks are by no means trivial and can be laborious to get to work well in practice².

The situation is very different for quantum computing. Quantum computational resources are so limited that native compilation, in which the program is compiled and run on the same machine, is unfeasible – and will remain so for the foreseeable future³. When we put the possibility of pure-quantum compilation aside, we are left with a cross-compilation problem that is entirely the realm of classical computer science; the output of which happens to be destined to run on a quantum computer. This is similar to how, in classical computing, GPU programs are typically compiled on CPUs before being uploaded and executed on the GPU.

Cross-compilation presents significant challenges. As quantum programs grow in size and complexity, debugging and verifying their correctness without access to the target hardware becomes increasingly difficult (Rovara et al., 2024), as we hit the limits of what can be simulated classically. Quantum simulation is a vibrant research area that is the subject of theses in its own right (e.g. Flannigan, 2020; Azad, 2024).

On the flip side, using classical hardware for quantum program compilation comes with a giant opportunity for compilers: the classical computational resources available to the compiler, measured in the size of the memory and the number of operations that can be handled, are many orders of magnitude larger (and cheaper!) than what the quantum hardware that will execute the program is capable of. Today, we can execute tens to hundreds of billions of floating-point operations per second (GFLOPS) on desktop computers, and up to the “exascale”, i.e. $10^{18}$ FLOPS, on the largest supercomputers (Dongarra et al., 2024). Quantum hardware, on the other hand, will not be executing programs with sizes beyond 1000 error-corrected gates, or 10,000 physical gates, for another three years, and that is if we believe the most optimistic roadmaps in the industry (IBM, 2024; Quantinuum, 2024).

It is expected that even a few thousand quantum gates will suffice to solve problems that our largest supercomputers struggle with. Meanwhile, every gate that must be performed comes at a high cost: it may fail, introduce errors, or take a long time to complete. It therefore behoves us to use all the classical resources at our disposal to reduce quantum operations to a minimum.

Given the strict hardware limitations that all near-term architectures are expected to face, quantum compilers must evolve into cross-compilers that are able to utilise the full power of the classical hardware available to them; doing so will push the boundaries of what is possible with quantum computing just a bit further – in a field where every marginal gain may unlock new applications.

The confluence of classical and quantum compilation

Finally, quantum compilation also faces some momentous engineering challenges. As we will see in section 2.2, significant research efforts have focused on the compilation and optimisation of quantum programs expressed as quantum circuits (cf. section 2.1). This formalism has its roots in quantum information theory, the field that gave birth to quantum computing, and makes for an ideal framework to develop the theory and optimisation techniques. However, it does not include any of the fundamentals of compiler and programming language design that make classical software as composable and scalable as it is today.

For example, there is no concept of subroutine or function calls; neither can a program execution branch or loop based on runtime values. This makes code reuse impossible, resulting in huge program sizes and insurmountable challenges for scaling up compilation to problems of real-world interest (Ittah et al., 2022). The absence of code abstractions is being felt even more acutely with the emergence of hybrid quantum-classical computations, as we discuss in section 2.3.

With applications of quantum computing that cannot be expressed as quantum circuits proliferating, a move away from circuit-based representations is becoming unavoidable (Hossein Ajallooiean et al., 2023; QIR Alliance, 2021). This is also an opportunity to incorporate learnings from the decades of experience that have been gathered in classical computer science. Many of the tools and software that were originally developed for classical computations are thus being adopted and adapted to the specificities of quantum. This convergence of quantum computing and classical compiler technologies is heralding new opportunities, but also poses important questions around how to represent quantum programs and optimise them.


  1. There are other architectures, such as RISC-V (Waterman, 2016) and MIPS (Hennessy et al., 1982), but as of 2025 almost all consumer and professional CPUs, from mobile phones to laptops, desktops and data centres, run on x86 or ARM. See Valve Corporation (2024) for a detailed hardware market share analysis, albeit focused on gaming. Details on mobile market share can be found in this survey – all of the listed manufacturers use the ARM architecture. ↩︎

  2. There are new tools promising to make cross-compilation easier, such as Zig. This only proves our point, though: classical cross-compilation has long been a neglected edge case. ↩︎

  3. First valiant efforts at defining optimisation problems relevant to quantum compilation that could be run on quantum hardware have been recently presented in Rattacaso et al. (2024). However, this concerns only specific optimisation subroutines of the overall compilation problem. It is hard to imagine today that deploying an entire compilation stack such as LLVM on quantum hardware would ever be sensible. Why tooling so close to the classical compiler frameworks will be required for quantum compilation is a topic we will return to in section 2.4. ↩︎

1.2. Contributions and thesis outline

Preliminaries

The thesis starts in chapter 2 with a review of the main concepts on which the rest of the thesis is built. Aside from a short introduction to quantum computations (section 2.1) and a survey of the major quantum circuit optimisation techniques (section 2.2), this chapter makes two observations that impart a research direction to the rest of the thesis:

  1. The emergence of hybrid quantum-classical computations is rendering the quantum circuit obsolete as the main representation of quantum computations within compilers (section 2.3).
  2. The best optimisation outcomes will combine classical and quantum compiler optimisations. This can be achieved by adopting abstractions that are interoperable with classical compiler infrastructure (section 2.4).

A graph transformation formalism for quantum computations

Chapters 3, 4, and 5 form the core of this thesis and present our main contributions. The results in chapter 3 are crucial stepping stones for the rest of the thesis. Chapters 4 and 5 meanwhile present our most significant contributions to the state of the art.

In chapter 3, we propose minIR, a new graph-based intermediate representation (IR) for quantum computations. MinIR is a minimal subset of the Hierarchical Unified Graph Representation (HUGR), recently presented in joint work (Koch et al., 2025) and the subject of ongoing development. It is to our knowledge the first compiler IR with support for linear types – required to model the restrictions that quantum mechanics imposes on quantum computations.

Unlike quantum circuits, minIR (and HUGR) programs can model computations that act on arbitrary combinations of classical (bits) and quantum data (qubits) within a single, unified representation. It represents the best of two worlds: it combines the safety guarantees of quantum-specific representations such as quantum circuits (i.e. it is impossible to declare physically unrealisable computations), whilst at the same time being interoperable with classical compiler IRs.

Graph-based representations of computations, known as computation graphs in deep learning and dataflow graphs within the compiler community, are common in these fields. Our original contribution is in the formalisation of the IR transformation semantics: whereas classical compilers typically define IR transformations in terms of the values that they depend on and the values that they overwrite, this approach implicitly relies on value copying and discarding and thus does not generalise to linear values. Instead, we define graph rewriting semantics on minIR and show sufficient conditions for which minIR transformations preserve the validity of the program, and in particular the linearity conditions.
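To make the linearity condition concrete, the following is a minimal sketch of a linearity check on a toy dataflow program. It is not the minIR definition from chapter 3: the operation triples, the function name `linearity_violations` and the convention that value names starting with "q" denote linear (qubit) values are all assumptions made for illustration. The check requires every linear value to be defined exactly once and used exactly once, whereas classical values may be copied or discarded freely.

```python
from collections import Counter


def linearity_violations(ops, inputs, outputs,
                         is_linear=lambda value: value.startswith("q")):
    """Return the linear values that are not defined once and used once.

    `ops` is a list of (name, input_values, output_values) triples.
    `inputs` / `outputs` are the values entering / leaving the program.
    All of these conventions are hypothetical and chosen for illustration.
    """
    defined = Counter(inputs)   # program inputs count as definitions
    used = Counter(outputs)     # program outputs count as uses
    for _name, op_inputs, op_outputs in ops:
        used.update(op_inputs)
        defined.update(op_outputs)
    values = set(defined) | set(used)
    return sorted(
        v for v in values
        if is_linear(v) and (defined[v] != 1 or used[v] != 1)
    )


# A valid hybrid program: the classical bit b0 is used twice, which is allowed.
valid = [
    ("h", ["q0"], ["q1"]),
    ("measure", ["q1"], ["b0"]),
    ("not", ["b0"], ["b1"]),
    ("and", ["b0", "b1"], ["b2"]),
]

# An invalid program: q1 is copied (used twice) and q3 is silently discarded.
invalid = [
    ("h", ["q0"], ["q1"]),
    ("cx", ["q1", "q1"], ["q2", "q3"]),
]

print(linearity_violations(valid, ["q0"], ["b2"]))
print(linearity_violations(invalid, ["q0"], ["q2"]))
```

A transformation defined purely by "values read" and "values overwritten" could turn the first program into something like the second without noticing, which is why the rewriting semantics must track linear values explicitly.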


The encoding of quantum computations as graphs sets the stage for quantum compilation and optimisation using graph transformation systems (GTS), in which the set of transformations that the compiler is allowed to perform is expressed by a set of graph transformation rules. This is in effect a generalisation of an approach first proposed in Xu et al. (2022) in the context of quantum circuits. We improve on this work with two major contributions that resolve critical issues concerning the scaling of the technique to large numbers of transformation rules and large inputs respectively.
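As an illustration of rule-based rewriting, the sketch below applies a single transformation rule, "two consecutive Hadamard gates on the same qubit cancel", until no match remains. It is a toy under simplifying assumptions: the circuit is a plain list of (gate, qubits) pairs rather than a graph, and the function names `cancel_hh` and `rewrite_to_fixpoint` are hypothetical. A GTS-based compiler would instead hold many such rules and search over the order in which they are applied.

```python
def cancel_hh(circuit):
    """Apply the rule `h; h -> identity` once, if a match exists.

    `circuit` is a list of (gate_name, qubits) pairs in program order.
    Returns the rewritten circuit, or None when the rule has no match.
    """
    for i, (name_i, qubits_i) in enumerate(circuit):
        if name_i != "h":
            continue
        for j in range(i + 1, len(circuit)):
            name_j, qubits_j = circuit[j]
            if set(qubits_i) & set(qubits_j):
                # circuit[j] is the next operation on the same wire.
                if (name_j, qubits_j) == (name_i, qubits_i):
                    # Match found: delete both gates.
                    return circuit[:i] + circuit[i + 1:j] + circuit[j + 1:]
                break  # some other gate sits in between: no match here
    return None


def rewrite_to_fixpoint(circuit):
    """Apply the rule repeatedly until it no longer matches."""
    while (rewritten := cancel_hh(circuit)) is not None:
        circuit = rewritten
    return circuit


circuit = [("h", (0,)), ("cx", (1, 2)), ("h", (0,)), ("h", (1,))]
print(rewrite_to_fixpoint(circuit))
```

Here the two Hadamard gates on qubit 0 are adjacent on their wire even though another operation separates them in the list, which is the kind of locality that a graph representation makes explicit.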

Pattern matching

Our first major contribution is a pattern matching algorithm, presented in chapter 4. The main result is a runtime complexity bound independent of the number of patterns being matched, achieved using a one-off pre-computation. This is to our knowledge the first pattern matching algorithm for quantum circuits whose runtime does not depend on the number of patterns. Whilst similar multi-pattern matching techniques have been explored in other domains such as RETE networks [Forgy, 1982: Charles L. Forgy. 1982. Rete: A fast algorithm for the many pattern/many object pattern match problem. Artificial Intelligence 19, 1 (September 1982, 17--37). doi: 10.1016/0004-3702(82)90020-0] [Varró, 2013: Gergely Varró and Frederik Deckwerth. 2013. A Rete Network Construction Algorithm for Incremental Pattern Matching] [Ian, 2003: Wright Ian and James A. R. Marshall. 2003. The execution kernel of RC++: RETE*, a faster RETE with TREAT as a special case. International Journal of Intelligent Games and Simulation 2, 1 (Feb 2003, 36-48)] and computational biology [Danos, 2007: Vincent Danos, Jérôme Feret, Walter Fontana and Jean Krivine. 2007. Scalable Simulation of Cellular Signaling Networks] [Boutil., 2017: Pierre Boutillier, Thomas Ehrhard and Jean Krivine. 2017. Incremental Update for Graph Rewriting], no algorithm is known with provable sub-exponential worst-case complexity. These results were published in [Mondada, 2025: Luca Mondada and Pablo Andrés-Martínez. 2025. Scalable Pattern Matching in Computation Graphs. Electronic Proceedings in Theoretical Computer Science 417 (March 2025, 71--95). doi: 10.4204/eptcs.417.5].

The proved complexity bound applies to computations with only linear values¹, of which quantum circuits are a special case. The result is expressed in terms of the maximal pattern width $w$ and depth $d$, two measures of pattern size defined in section 4.2. The main result, presented in Proposition 4.13, is reproduced here:

Proposition 1.1 (Pattern matching)

Let $P_1, \dots, P_\ell$ be patterns with width $w$ and depth $d$. The pre-computation runs in time and space complexity

$$O \left( (d\cdot \ell)^w \cdot \ell + \ell \cdot w \cdot d \right).$$

For any subject graph $G$, the pre-computed prefix tree can be used to find all pattern embeddings $P_i \to G$ in time

$$O \left( |G| \cdot \frac{c^w}{w^{1/2}} \cdot d \right)$$

where $c = 6.75$ is a constant.

The runtime complexity is dominated by an exponential scaling in the maximal pattern width $w$. Meanwhile, the advantage of our approach over matching one pattern at a time grows with the number of patterns $\ell$. It is thus of particular interest for matching numerous small-width patterns.
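For intuition, the sketch below shows the prefix tree idea in the simplest possible setting: patterns of width 1, i.e. gate sequences on a single qubit. This is a deliberately simplified illustration, not the algorithm of chapter 4, and the names `build_prefix_tree` and `find_matches` are hypothetical. Once the tree is built, the work done at each position of the subject is bounded by the pattern depth, however many patterns were inserted.

```python
def build_prefix_tree(patterns):
    """Pre-computation: insert every pattern (a sequence of gate names) into a trie."""
    root = {"children": {}, "matches": []}
    for index, pattern in enumerate(patterns):
        node = root
        for gate in pattern:
            node = node["children"].setdefault(gate, {"children": {}, "matches": []})
        node["matches"].append(index)
    return root


def find_matches(tree, subject):
    """Return (position, pattern index) for every pattern occurrence in subject."""
    found = []
    for start in range(len(subject)):
        node = tree
        for gate in subject[start:]:
            node = node["children"].get(gate)
            if node is None:
                break  # no pattern continues with this gate
            found.extend((start, i) for i in node["matches"])
    return found


patterns = [("H", "H"), ("H", "X", "H"), ("X", "X")]
tree = build_prefix_tree(patterns)
print(find_matches(tree, ["H", "H", "X", "H", "X", "X"]))
```

A single walk down the tree reports all patterns that share a prefix, which is where the independence from the number of patterns comes from in this toy setting.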

We illustrate this point by comparing our approach to a standard algorithm that matches one pattern at a time [Jiang, 1998: Xiaoyi Jiang and Horst Bunke. 1998. Marked subgraph isomorphism of ordered graphs. In Advances in Pattern Recognition, Berlin, Heidelberg. Springer Berlin Heidelberg, 122--131. doi: 10.1007/bfb0033230], with runtime complexity $O(\ell \cdot |P| \cdot |G|)$. Using $|P| \leq w \cdot d$ (cf. section 4.2), and comparing to eq. (2), we thus have a speedup in the regime $\Theta(c^w / w^{3/2}) < \ell$. On the other hand, $\ell$ is upper bounded by the maximum number $N_{w,d}$ of patterns of bounded width and depth. Using a crude lower bound for $N_{w,d}$ derived in the Appendix, we obtain a computational advantage for our approach when

$$\Theta\left(\frac{c^w}{w^{\frac32}}\right) < \ell < \left(\frac{w}{2e}\right)^{\Theta(w d)} \leq N_{w, d}.$$

In the case of quantum circuits, the width of the patterns is given by the number of qubits. The low-qubit regime where our approach shines coincides exactly with the typical applications of GTSs in quantum compilation: in [Xu, 2022: Mingkuan Xu, Zikun Li, Oded Padon, Sina Lin, Jessica Pointing, Auguste Hirth, Henry Ma, Jens Palsberg, Alex Aiken, Umut A. Acar and Zhihao Jia. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433] and [Xu, 2023: Amanda Xu, Abtin Molavi, Lauren Pick, Swamit Tannu and Aws Albarghouthi. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (June 2023, 835--859). doi: 10.1145/3591254], all rules used have at most 4 qubits.
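To get a feel for the regime in question, the short Python sketch below evaluates the crossover $c^w / w^{3/2}$ from the comparison above for small widths. It drops all constant factors hidden by the asymptotic notation, so the numbers are indicative orders of magnitude only; the function name `crossover` is ours.

```python
C = 6.75  # constant from Proposition 1.1


def crossover(width):
    """Number of patterns above which the bounds favour the prefix-tree approach.

    Obtained by equating l * w * d * |G| with |G| * c^w / w^(1/2) * d,
    ignoring the constants hidden in the O-notation.
    """
    return C ** width / width ** 1.5


for w in range(1, 5):
    print(w, round(crossover(w)))
```

For $w = 4$ this gives a crossover of roughly 260 patterns, far below the tens to hundreds of thousands of rules used by the superoptimisers mentioned in the abstract.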

We present benchmarks on a real world dataset of 10,000 quantum circuits in section 4.7, showing a 20x speedup over a leading C++ implementation of pattern matching for quantum circuits.

Confluently persistent graph rewriting

Our second major contribution, in chapter 5, uses a well-known construction in GTSs, the unfolding [Baldan, 1999: Paolo Baldan, Andrea Corradini and Ugo Montanari. 1999. Unfolding and Event Structure Semantics for Graph Grammars. In Foundations of Software Science and Computation Structures, Berlin, Heidelberg. Springer Berlin Heidelberg, 73--89. doi: 10.1007/3-540-49019-1_6], to derive a novel data structure $\mathcal{D}$ that compresses the representation of the space $\mathcal{G}$ of all graphs reachable from an input within a GTS. We call $\mathcal{D}$ the factorised search space of $\mathcal{G}$. Optimisation problems over the space of reachable graphs in a GTS can then equivalently be expressed as optimisation problems over $\mathcal{D}$.

We show in section 5.5 that under some assumptions on the GTS and input, there is an exponential complexity separation in the input size between the size of the factorised search space $\mathcal{D}$ – which admits an asymptotically linear upper bound – and the size of the rewrite space $\mathcal{G}$ that it encodes – which grows at least exponentially.

$\mathcal{D}$ is furthermore the first confluently persistent data structure [Drisco., 1994: James R. Driscoll, Daniel D. K. Sleator and Robert E. Tarjan. 1994. Fully persistent lists with catenation. Journal of the ACM 41, 5 (September 1994, 943--959). doi: 10.1145/185675.185791] [Fiat, 2003: Amos Fiat and Haim Kaplan. 2003. Making data structures confluently persistent. Journal of Algorithms 48, 1 (August 2003, 16--58). doi: 10.1016/s0196-6774(03)00044-0] [?]: it performs non-destructive rewrites on immutable graph objects by maintaining an explicit history of all graph rewrites and their dependencies. This allows multiple rewrites to be applied concurrently, and rewritten graphs that were obtained independently to be merged. This represents an exciting development in its own right that opens the door to functional programming and massively parallelised approaches to graph rewriting (see section 6.2).

The intuition behind the exponential reduction in search space size is as follows: if rewrites $r_1, \dots, r_n$ apply to disjoint subgraphs of a common graph $G$, then $\mathcal{D}$ will be of size $n$, storing the set of possible rewrites, rather than the up to $2^n$ distinct graphs in $\mathcal{G}$ obtained by applying a subset of the rewrites. To generalise to arbitrary rewrites, the data structure $\mathcal{D}$ must keep track of the dependencies and overlaps between rewrites and update these as more rewrites are added to $\mathcal{D}$.
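The counting argument can be checked on a toy example. In the Python sketch below, a "graph" is a tuple of $n$ independent cells and rewrite $r_i$ flips cell $i$ from 0 to 1. This encoding and the helper names are assumptions chosen purely for illustration, and the list of rewrites is only a stand-in for $\mathcal{D}$ in the disjoint case.

```python
def naive_search_space(n):
    """Enumerate every graph reachable by applying any subset of n disjoint rewrites."""
    start = (0,) * n
    seen = {start}
    frontier = [start]
    while frontier:
        graph = frontier.pop()
        for i in range(n):
            if graph[i] == 0:  # rewrite r_i still applies to this graph
                rewritten = graph[:i] + (1,) + graph[i + 1:]
                if rewritten not in seen:
                    seen.add(rewritten)
                    frontier.append(rewritten)
    return seen


def factorised_search_space(n):
    """Store each rewrite once; disjoint rewrites need no further bookkeeping."""
    return [f"r{i + 1}" for i in range(n)]


for n in (2, 5, 10):
    print(n, len(factorised_search_space(n)), len(naive_search_space(n)))
```

The second column grows linearly in $n$ while the third doubles with every additional rewrite.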

A lot of parallels can be drawn between this approach and equality saturation, a technique for term rewriting with applications in classical compilers. We explore these connections in section 5.2.

Unlike the results of chapter 4, the construction and bounds proven in chapter 5 can be applied to a wide range of graph rewriting domains. They have particularly significant implications for applications of GTSs that are unable to derive rewriting strategies from first principles, and hence have to resort to an exhaustive (or heuristic) exploration of the rewrite space $\mathcal{G}$. Such applications can proceed as follows:

  1. Exploration phase. Construct the factorised search space $\mathcal{D}$ by finding and applying rewrites, in time proportional to $|\mathcal{D}|$. By our results, this yields an exponential speedup over the naive exploration of $\mathcal{G}$ (section 5.3).

  2. Extraction phase. Unlike the case of $\mathcal{G}$, where the optimal solution is an element $G_{opt} \in \mathcal{G}$, constructing the optimal solution $\mathcal{D} \rightarrow G_{opt}$ in $\mathcal{D}$ is a non-trivial extraction problem. We show in section 5.4 that the extraction can be expressed as a Boolean satisfiability (SAT) problem; depending on the cost function, the optimisation can then be encoded as a side condition on SAT or by a generalisation of the problem to Satisfiability Modulo Theories (SMT).

In the worst case, all known algorithms for SAT and SMT problems require exponential time [Cook, 1971: Stephen A. Cook. 1971. The complexity of theorem-proving procedures. In Proceedings of the third annual ACM symposium on Theory of computing - STOC ’71. ACM Press, 151--158. doi: 10.1145/800157.805047] [Moskew., 2001: Matthew W. Moskewicz, Conor F. Madigan, Ying Zhao, Lintao Zhang and Sharad Malik. 2001. Chaff: engineering an efficient SAT solver. In Proceedings of the 38th conference on Design automation - DAC ’01. ACM Press, 530--535. doi: 10.1145/378239.379017] [Biere, 2021: Armin Biere, Marijn J. H. Heule, Hans Maaren and Toby Walsh. 2021. Handbook of satisfiability (Second edition ed.). IOS Press, Amsterdam], thus cancelling the exponential compression of the search space $\mathcal{G} \rightarrow \mathcal{D}$. However, SAT and SMT are standardised problems for which heavily optimised solvers and optimisers have been developed [Moura, 2008: Leonardo de Moura and Nikolaj Bjørner. 2008. Z3: An Efficient SMT Solver. In Tools and Algorithms for the Construction and Analysis of Systems. Springer Berlin Heidelberg, 337--340. doi: 10.1007/978-3-540-78800-3_24] [Sebast., 2015: Roberto Sebastiani and Patrick Trentin. 2015. OptiMathSAT: A Tool for Optimization Modulo Theories. In Computer Aided Verification. Springer International Publishing, 447--454. doi: 10.1007/978-3-319-21690-4_27]. We expect that the instances of SAT and SMT that encode the extraction problem will scale well in practice:

  • Clauses in the problem encode local properties that SAT solvers are well-suited to solve [Zulkos., 2018: Edward Zulkoski. 2018. Understanding and Enhancing CDCL-based SAT Solvers. PhD Thesis. University of Waterloo]: the Boolean variables represent rewrites, which only impose restrictions on other rewrites that apply in the same neighbourhood of the graph.
  • Furthermore, in quantum compilation applications, $\mathcal{D}$ can be sparsified: most rewrites in $\mathcal{D}$ do not change the cost function (think of IR transformations that reorder operations but do not reduce the runtime) and thus do not need to be encoded in the SAT problem.
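As a rough illustration of the shape of such an encoding, the Python sketch below assigns one Boolean variable per rewrite and imposes two kinds of local constraints: overlapping rewrites exclude each other, and a rewrite can only be selected if the rewrites it builds on are selected. This is a toy model under our own assumptions, not the encoding of section 5.4: the rewrite names, the `gain` values and the brute-force enumeration (standing in for a real SAT or SMT solver) are all hypothetical.

```python
from itertools import product


def extract(rewrites, conflicts, requires, gain):
    """Return the best consistent set of rewrites and its total gain."""
    best, best_gain = frozenset(), 0
    for bits in product([False, True], repeat=len(rewrites)):
        chosen = {r for r, bit in zip(rewrites, bits) if bit}
        # Clause (not a or not b): overlapping rewrites are mutually exclusive.
        if any(a in chosen and b in chosen for a, b in conflicts):
            continue
        # Clause (not child or parent): a rewrite needs its dependencies.
        if any(child in chosen and parent not in chosen for child, parent in requires):
            continue
        total = sum(gain[r] for r in chosen)
        if total > best_gain:
            best, best_gain = frozenset(chosen), total
    return best, best_gain


rewrites = ["r1", "r2", "r3", "r4"]
conflicts = [("r1", "r2")]  # r1 and r2 overlap on the input graph
requires = [("r3", "r1")]   # r3 rewrites part of the output of r1
gain = {"r1": 1, "r2": 2, "r3": 3, "r4": 1}

print(extract(rewrites, conflicts, requires, gain))
```

Each constraint mentions only two rewrites that touch the same region of the graph, which is the locality property referred to in the first bullet above.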

In a first exploratory analysis, we present some empirical results that support our claims: by searching over the factorised search space instead of the naive search space, the optimiser is able to find the global optimum for circuits that are twice as large. Our results also exhibit a linear scaling in the size of the factorised search space, confirming that the approach should scale well to larger problems.

Conclusion

The thesis concludes in chapter 6 with a discussion on how our contributions serve our overall goal of a scalable and modular quantum compiler platform. We discuss in particular two extensions of our work that we see as particularly promising: the generalisation of fast multi-pattern matching to non-linear values and to the persistent data structure $\mathcal{D}$ of chapter 5 (section 6.1) and the deployment of confluently persistent graph rewriting to a massively parallel distributed compute architecture (section 6.2).


  1. In the absence of linearity, pattern matching is an instance of the subgraph isomorphism problem, an NP-complete problem. The assumption is therefore necessary and expected. ↩︎


Chapter 2

Quantum Computing: a Computer Scientist's Perspective

Many (too many?) introductions to quantum computing have been written, so we will refrain from adding another entry to the collection. Instead, beyond the absolute basics, our focus is on the expressive power and syntax of quantum programs. This demystifies quantum compilation into program transformation problems, amounting to traditional compiler methods that will be very familiar to computer scientists.

In this chapter, we lay the groundwork for this thesis by introducing what programs meant to run on quantum computers look like today, what we expect they will look like in the (near) future, and how quantum compilers have been built to optimise them. We start in section 2.1 with a review of the basic computation primitives of quantum computers and how they are composed to form quantum circuits, the simplest form of quantum programs. This is followed by a review of the leading quantum circuit optimisation techniques in section 2.2. Finally, sections 2.3 and 2.4 introduce and discuss the impact of hybrid quantum computations, and how they challenge existing quantum compiler designs and optimisations.

2.1. Foundations of quantum computing

The most widespread computational model in quantum computing – and arguably its simplest – is built on the qubit abstraction. As its name suggests, it is the quantum analogue of the classical bit, i.e., a value that can take the values 0 or 1.

We will stick to our promise of not delving into the details of the physical realisations of qubits in real-world architectures. Nonetheless, it is important to note one fundamental difference with classical systems. Classical bit values (the famous 0s and 1s of our computers) are typically encoded using two voltages; another way of saying this is that bit values, and hence data, correspond to electrical currents in the wires¹ of a chip. Gates, i.e. the lowest level of operations that can be applied to bits, then correspond to barriers that let the electrical current flow through to outgoing wires, or block it.

We can thus picture a classical gate as a black box with n input wires going into the box and m output wires leaving it. For any combination of on and off voltages on the input wires, the box will turn on some of the output wires. The vital point to take away from this classical state of affairs is that we can think of the carriers of input and output data (i.e. the input and output wires) as physically distinct objects that can exist and can be read simultaneously.

Quantum physics rules out such an implementation of qubits. In the case of matter-based qubits, such as ions in traps or Josephson junctions on superconductors, quantum gates are operations that modify – or “mutate”, to borrow a term from programming languages – the physical qubits themselves. An input qubit to a gate is thus subjected to physical interactions that change its internal state. After the gate execution is completed, the qubits that held the input states now contain the operation’s output.

Similarly, photonic systems encode qubits using modes of the electromagnetic field. A gate in this setting acts by transforming these modes – mixing them, shifting their phases, or entangling them with ancillary modes. It is never possible to modify qubit data coherently whilst keeping access to the original input data.

This has several profound implications for quantum computing. First and foremost, every quantum gate must have the same number of inputs as outputs. Most iconic classical gates (AND, OR, XOR, etc.) are thus impossible to implement on a quantum computer without some adjustments². This also means that the number of qubits must remain unchanged throughout the computation. A computation that starts with n qubits must also end in n qubits – and have n qubits at every point throughout the computation.
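One standard way to make such an adjustment, shown here on classical bits only, is to keep the inputs and write the result onto an extra bit. The Python sketch below contrasts the irreversible AND with the Toffoli (CCX) gate introduced later in this section, restricted to classical 0/1 inputs. It illustrates the input/output counting argument and is not a quantum simulation; the function names are ours.

```python
from itertools import product


def classical_and(a, b):
    """Two inputs, one output: the inputs cannot be recovered from the result."""
    return a & b


def toffoli(a, b, c):
    """Three inputs, three outputs: flips c exactly when a and b are both 1."""
    return a, b, c ^ (a & b)


inputs = list(product([0, 1], repeat=3))
outputs = [toffoli(*bits) for bits in inputs]

# Every output occurs exactly once, so the gate is a bijection on 3 bits.
print(sorted(outputs) == sorted(inputs))
# With the third bit initialised to 0, it ends up holding AND(a, b).
print(all(toffoli(a, b, 0)[2] == classical_and(a, b)
          for a, b in product([0, 1], repeat=2)))
```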

At this point, taking the preservation of qubits just described seriously, we should be asking how a quantum computation can even come to be at all, given that no qubit can be created out of thin air. In our attempt to remain blissfully ignorant of physical realities, we suggest adopting the following abstracted mental model of qubits: qubits can neither be created nor deleted³, they simply i) exist at all times, and ii) can be reset to the 0 state.

For our convenience, we can ignore qubits that are unimportant to us. If all we need are n qubits, then we will limit our considerations to these and pretend none other exists. Pushing further our myopic focus on qubits with a direct utility, we can also adjust the window of qubits of interest as we progress through the computation. If, for instance, a new qubit becomes useful halfway through our program execution, we can enlarge the set of qubits we are keeping track of and refer to this as “creating” a qubit. Conversely, qubits often become irrelevant, in which case we move them outside of our field of consideration and say that the qubits were discarded.

A final consequence of mutating qubits that we will highlight is that once a gate has been applied, the input states to the gate no longer exist! In other words, any state that we reach throughout our execution can only be used at most once. Here, your classical intuition might kick in:

Let us just maintain a copy of the original state before modifying it!

This would allow us to do more than one computation from a temporary value. However, copying is a big NO in quantum computing. It is a profound restriction (or property, depending on your point of view) with deep roots in the physics of quantum mechanics. This principle, the no-cloning theorem, is one of three fundamental properties of quantum physics that quantum computing builds upon.

The physical constraints of quantum computation

No-cloning theorem

The no-cloning principle [Wootte., 1982: W. K. Wootters and W. H. Zurek. 1982. A single quantum cannot be cloned. Nature 299, 5886 (October 1982, 802--803). doi: 10.1038/299802a0] provides a formal foundation for the vague statement “qubits live forever” we made earlier. It is a fundamental tenet of quantum information, deserving a more rigorous treatment than we are giving it here. We recommend that the curious reader consult more respectful references such as [Nielsen, 2016: Michael A. Nielsen and Isaac L. Chuang. 2016. Quantum Computation and Quantum Information (10th Anniversary edition). Cambridge University Press].

No-cloning theorem: it is impossible to copy an arbitrary unknown state onto another (possibly known) qubit, or to copy a (possibly known) qubit to a qubit with unknown arbitrary state.

If we use $\ket{\psi}$ to denote an arbitrary state and $\ket{0}$ to denote a known state, the principle can be restated as: there are no quantum computations mapping $\ket{\psi}\ket{0} \mapsto \ket{\psi}\ket{\psi}$, nor $\ket{\psi}\ket{0} \mapsto \ket{0}\ket{0}$. The consequences of this are profound.

A consequence of the first half is what we alluded to in the previous section: any qubit state can only be used once in a computation. This statement also justifies why every quantum gate implementation, no matter the hardware specifics, will mutate its input qubits to produce the output states.

The second half of the statement is often referred to as the “no delete” theorem. Indeed, if we view $\ket{\psi}$ as a state encoding some data, we can interpret it as some amount of information. The state $\ket{0}$, on the other hand, is a fixed state and thus cannot store any information. From the perspective of information theory, the map $\ket{\psi}\ket{0} \mapsto \ket{0}\ket{0}$ thus destroys information: it turns an information-storing left-hand side into a product of $\ket{0}$ states, devoid of any information.

We can also revisit the first map $\ket{\psi}\ket{0} \mapsto \ket{\psi}\ket{\psi}$ and understand it from an information-theoretic perspective as an attempt to create information out of thin air! Using this interpretation, the no-cloning theorem is thus the statement that quantum information is a preserved quantity in quantum computations: its amount will never increase or decrease.
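A small numerical check makes the first half of the statement tangible. The Python sketch below uses the CX gate (introduced later in this section) as a candidate copying operation, which is our choice of example rather than part of the theorem: it copies the basis states $\ket{0}$ and $\ket{1}$ correctly, but fails on the superposition $\ket{+} = (\ket{0} + \ket{1})/\sqrt{2}$. States are plain lists of amplitudes and all helper names are hypothetical.

```python
import math


def kron(a, b):
    """Tensor product of two state vectors."""
    return [x * y for x in a for y in b]


def apply(matrix, state):
    """Apply a matrix to a state vector."""
    return [sum(m * s for m, s in zip(row, state)) for row in matrix]


# CX on two qubits, basis order |00>, |01>, |10>, |11>.
CX = [[1, 0, 0, 0],
      [0, 1, 0, 0],
      [0, 0, 0, 1],
      [0, 0, 1, 0]]

ZERO, ONE = [1.0, 0.0], [0.0, 1.0]
PLUS = [1 / math.sqrt(2), 1 / math.sqrt(2)]


def cx_clones(state):
    """Does CX map |state>|0> to |state>|state>?"""
    actual = apply(CX, kron(state, ZERO))
    wanted = kron(state, state)
    return all(math.isclose(a, w, abs_tol=1e-9) for a, w in zip(actual, wanted))


print(cx_clones(ZERO), cx_clones(ONE), cx_clones(PLUS))
```

The failure on $\ket{+}$ is not specific to CX: any linear map that copies both basis states must send $\ket{+}\ket{0}$ to $(\ket{00} + \ket{11})/\sqrt{2}$, which differs from $\ket{+}\ket{+}$.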

Reversibility

The fact that the amount of quantum information can never increase by transforming quantum states matches our intuition: if no new information is added from outside the system, then the total information encoded should not be increasing. Why, however, is it impossible to erase some information and thus reduce its total? The answer is the reversibility of closed quantum systems: if we exclude the option of discarding parts of the physical system, every quantum operation can be undone. In other words, a computation must have an inverse operation that recovers the input when applied to the output.

If a quantum operation were thus to erase any information, then an inverse operation would exist that creates information from nothing! The two halves of the no-cloning theorem, as we presented it, thus state the same principle once we consider that every operation must be reversible.

Universality

Finally, a third distinguishing property of quantum computation is how arbitrarily large computations can be generated from single-qubit gates and pairwise entangling interactions between qubits (two-qubit gates) [Barenco, 1995: Adriano Barenco, Charles H. Bennett, Richard Cleve, David P. DiVincenzo, Norman Margolus, Peter Shor, Tycho Sleator, John A. Smolin and Harald Weinfurter. 1995. Elementary gates for quantum computation. Physical Review A 52, 5 (November 1995, 3457--3467). doi: 10.1103/PhysRevA.52.3457]. It is furthermore the case that the choice of a fixed two-qubit gate, along with single-qubit gates, is sufficient to generate any arbitrary quantum computation. We call a set of gates that can be used to construct any arbitrary quantum computation a universal gate set.

This is a boon for hardware design, as manipulating single-qubit systems is often much more manageable than controlling physical interactions between multiple entities. This decomposition into single-qubit and (a fixed) two-qubit gates means that the architecture i) does not need to support interactions between $n > 2$ qubits, and ii) can be specialised and hand-tuned to execute the two-qubit interaction of choice as faithfully as possible. Having a two-qubit gate as the entangling operation is not the only choice. Some architectures, such as neutral atoms, choose instead to replace it with a global entangling operation that applies to many qubits simultaneously [Evered, 2023: Simon J. Evered, Dolev Bluvstein, Marcin Kalinowski, Sepehr Ebadi, Tom Manovitz, Hengyun Zhou, Sophie H. Li, Alexandra A. Geim, Tout T. Wang, Nishad Maskara, Harry Levine, Giulia Semeghini, Markus Greiner, Vladan Vuletić and Mikhail D. Lukin. 2023. High-fidelity parallel entangling gates on a neutral-atom quantum computer. Nature 622, 7982 (October 2023, 268--272). doi: 10.1038/s41586-023-06481-y], resulting in a universal gate set that is more convenient to implement experimentally in their system.

Gate set universality can be generalised further to approximate universality, which is at the centre of the development of error-correcting codes. Indeed, any quantum computation can be approximated to arbitrary precision using only discrete finite sets of one- and two-qubit gates [Kitaev, 2002: Alexei Y. Kitaev, A. H. Shen and Mikhail N. Vyalyi. 2002. Classical and Quantum Computation. American Mathematical Society] [Dawson, 2006: Christopher M. Dawson and Michael A. Nielsen. 2006. The Solovay-Kitaev algorithm. Quantum Information and Computation 6, 1 (January 2006, 81--95). doi: 10.26421/QIC6.1-6]. This represents a significant simplification for error correction, as it removes the need for continuously parametrised gates and discretises the problem space.

Leveraging quantum properties for compilation

We have introduced the universality, reversibility and no-cloning properties of quantum computations for a reason: these laws of physics that govern quantum computations and are absent from classical computer science are an excellent foundation for developing quantum-specific computation optimisations and compilation techniques in general.

As we have just discussed, the wide variety of universal gate sets are degrees of freedom that the compiler can use. Using universality to translate computations between universal gate sets, enabling programmers to seamlessly target different hardware, is one of a quantum compiler’s first and most fundamental functions [Sivara., 2020: Seyon Sivarajah, Silas Dilkes, Alexander Cowtan, Will Simmons, Alec Edgington and Ross Duncan. 2020. t|ket⟩: a retargetable compiler for NISQ devices. Quantum Science and Technology 6, 1 (November 2020, 014003). doi: 10.1088/2058-9565/ab8e92].

Reversibility is also a source of flexibility when expressing quantum programs. Suppose the user wants to execute an operation $A$ but it is more convenient to execute a different gate $B$, or $B$ is the only gate the hardware is capable of executing. Then, using the inverse $B^{-1}$ of $B$, it is always possible to rewrite the program as

where these diagrams should be read as operations to be executed from left to right. This is nothing but the mathematical trick of multiplying the left-hand side with the identity operation expressed as $B \circ B^{-1}$⁴.

Now, of course, this rewrite is only sensible if the operation $B^{-1} \circ A$ is reasonably cheap to perform. There are plenty of instances where this is indeed the case. Morally, the quantum compiler always has the freedom to execute any quantum operation – at the risk of producing very inefficient code – given that reversibility always guarantees that the operation can be reversed and the computation undone whenever necessary.
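As a sanity check of this identity, the Python sketch below takes $A = X$ and $B = H$ as arbitrary single-qubit examples (our choice, not taken from the text) and verifies numerically that executing $B^{-1} \circ A$ followed by $B$ reproduces $A$. Matrices are nested lists and the helper names are hypothetical.

```python
import math


def matmul(p, q):
    """Product of two 2x2 matrices; matmul(p, q) applies q first, then p."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def close(p, q):
    """Entry-wise comparison up to floating-point error."""
    return all(math.isclose(p[i][j], q[i][j], abs_tol=1e-9)
               for i in range(2) for j in range(2))


s = 1 / math.sqrt(2)
X = [[0.0, 1.0], [1.0, 0.0]]  # the operation A we want
H = [[s, s], [s, -s]]         # the gate B that is convenient to execute
H_INV = H                     # the Hadamard is its own inverse

correction = matmul(H_INV, X)      # B^{-1} o A, executed first
rewritten = matmul(H, correction)  # B o (B^{-1} o A)

print(close(rewritten, X))
```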

Finally, no-cloning is a very useful guarantee that the compiler can use to simplify reasoning about computations⁵. In chapter 4 we will see that it dramatically simplifies pattern matching, which helps identify all possible optimisations quickly. More generally, no-cloning restricts the set of programs that the compiler must consider, resulting in elegant graph transformation semantics – a topic we explore in chapter 3.

The quantum circuit representation

We could not conclude our overview of the basics of quantum computing without mentioning the quantum circuit, a representation of quantum computation ubiquitous in the field. With the understanding that we have gained in this section, the two building blocks of the circuit model and the conventions around their graphical representation should be of no surprise to the reader:

  1. Qubits are represented by straight, horizontal lines. Their evolution through time can be followed along the line from left to right: at the leftmost point on the line, the qubit is in its input state; when the qubit reaches its rightmost point, operations have mutated it into the output state of the circuit.
  2. Gates on qubits are boxes placed vertically across one or multiple qubit lines. The qubit lines that a box covers are the qubits that the gate may act on (and mutate), whereas the left-to-right ordering of the gates reflects their ordering in time.

A simple circuit composed of two qubits and three gates $A$, $B$ and $C$ could for instance look like this

The previous diagram was in fact also a circuit, in which each arrow pointing to the right was a segment of a qubit line. In this case, $A$ would be executed before $B$ and $C$; $A$ would act on both qubits, whereas $B$ and $C$ would only modify the first and second qubits, respectively. Note that there is no ordering specified between $B$ and $C$: because they act on disjoint sets of qubits, their relative ordering makes no difference. It is thus common to display them as acting at the same time. We could have equivalently chosen to draw them as:

All these circuits represent the same computation.
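One way to see that these drawings describe the same computation is to record, for each qubit line, the sequence of gates it passes through. The Python sketch below does this for a circuit given as a list of (gate name, qubits) pairs; the representation and the function name `per_qubit_view` are assumptions for illustration. Swapping $B$ and $C$ in the list leaves every qubit's sequence unchanged, whereas moving $B$ before $A$ does not.

```python
def per_qubit_view(circuit, n_qubits):
    """For each qubit line, list the gates it passes through, left to right."""
    lines = [[] for _ in range(n_qubits)]
    for name, qubits in circuit:
        for q in qubits:
            lines[q].append((name, qubits))
    return lines


b_first = [("A", (0, 1)), ("B", (0,)), ("C", (1,))]
c_first = [("A", (0, 1)), ("C", (1,)), ("B", (0,))]
b_before_a = [("B", (0,)), ("A", (0, 1)), ("C", (1,))]

print(per_qubit_view(b_first, 2) == per_qubit_view(c_first, 2))
print(per_qubit_view(b_first, 2) == per_qubit_view(b_before_a, 2))
```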

Certain quantum gates are particularly useful and appear very regularly in practice. These have standard names that are widely used in the field. The most common single-qubit gates are arguably the Hadamard, represented in circuits by an $H$ box, and the $X$, $Y$ and $Z$-axis rotations, drawn as $R_x(\theta)$, $R_y(\theta)$ and $R_z(\theta)$ boxes respectively. Note that rotation gates are parametrised by an angle $0 \leqslant \theta < 4\pi$ that must be specified to execute the rotation.
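The range $0 \leqslant \theta < 4\pi$ can be checked numerically. The Python sketch below assumes the common convention $R_x(\theta) = \cos(\theta/2)\, I - i \sin(\theta/2)\, X$, which the text does not spell out, so the convention should be treated as an assumption. Under it, a rotation by $2\pi$ yields $-I$ rather than $I$, and only a rotation by $4\pi$ returns to the identity matrix.

```python
import cmath
import math


def rx(theta):
    """X-axis rotation, assuming R_x(theta) = cos(theta/2) I - i sin(theta/2) X."""
    c = math.cos(theta / 2)
    s = -1j * math.sin(theta / 2)
    return [[c, s], [s, c]]


def close(p, q):
    """Entry-wise comparison up to floating-point error."""
    return all(cmath.isclose(p[i][j], q[i][j], abs_tol=1e-9)
               for i in range(2) for j in range(2))


IDENTITY = [[1, 0], [0, 1]]
MINUS_IDENTITY = [[-1, 0], [0, -1]]

print(close(rx(2 * math.pi), MINUS_IDENTITY))
print(close(rx(4 * math.pi), IDENTITY))
```

The sign picked up at $2\pi$ is a global phase, which is why some references quote a period of $2\pi$ for these gates instead.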

There are also commonly used multi-qubit gates. For these, it becomes slightly awkward to draw them as boxes, as they may act on qubits that are not drawn next to each other in the circuit⁶ or might be applied to qubits in a specified order. As a solution, common gates were given representations that do not spell out their name but mark which qubit they are acting on and in what order. Here are the representations of three of the most famous ones, in order: the $\mathit{CX}$ (also known as CNOT) gate, the $\mathit{CZ}$ and the $\mathit{CCX}$ (also known as the three-qubit Toffoli):

You will probably notice that there seems to be a system to this graphical notation. There is, but unfortunately, explaining it would require us to discuss Pauli matrices and commutation relations and quickly lead us astray. The references in section 2.5 are a good starting point for further reading.


  1. In the case of integrated circuits and printed circuits boards, the wires we refer to here would be called “interconnects” or “traces”. ↩︎

  2. The NOT gate is the notable exception to this. It is often found in quantum programs and called X. ↩︎

  3. This is true physically: the carriers of quantum information, typically atoms or photons, live forever in the absence of interactions with their environment. However, we would be seriously deluding ourselves if we believed that the control systems we use to manipulate and keep these particles trapped could do so for any significant amount of time. Instead, experimentalists must constantly devise creative ways to stop the qubits from escaping or interacting with their surroundings (and destroying themselves in the process). ↩︎

  4. The $\circ$ denotes the composition of functions, so unlike the left-to-right diagram, it must be read from right to left. ↩︎

  5. In particular, no-cloning resolves the problem of aliasing once and for all! ↩︎

  6. This becomes immediately apparent if you attempt to draw a gate that should act on the first and third qubit line of a circuit, but leave the second one untouched. ↩︎

2.2. Quantum circuit optimisation: a review

Much of the foundations of classical computer science rely on boolean logic and discrete mathematics [Lehman, 2017: Eric Lehman, F. Thomson Leighton and Albert R. Meyer. 2017. Mathematics for Computer Science. Samurai Media Limited]. In some regards, this is a poor man’s maths, as much of the structure that comes with continuous infinite mathematical objects is lost along the way when discretised.

Quantum computation, on the other hand, encompasses the whole breadth of (finite dimensional) quantum physical system evolution. Underlying this is a rich mathematical structure rooted in the theory of Hilbert spaces and Lie groups¹. A direct consequence of the mathematics of quantum computations is the flourishing of an entire field of research dedicated to quantum circuit optimisations [Karupp., 2025: Krishnageetha Karuppasamy, Varun Puram, Stevens Johnson and Johnson P. Thomas. 2025. A Comprehensive Review of Quantum Circuit Optimization: Current Trends and Future Directions. Quantum Reports 7, 1 (January 2025, 2). doi: 10.3390/quantum7010002]. They leverage the unique structure and symmetries of quantum physics to reduce the noise and resource requirements of quantum computations significantly.

In this section, we will review the main optimisation techniques that established themselves within quantum compilers, focusing on the representation of quantum computations they use and their assumptions about the computations they are optimising.

Cost function

A key point to settle first when discussing circuit optimisations is the objective of the optimisation – the cost function to be minimised. Unlike much of classical compiler research, which can rely on an established set of hardware targets and benchmarking datasets to profile the empirical, “real world” performance of compiled programs, the quantum world must often contend with simplified noise and architecture models to design proxy metrics, given the limited scale and availability of current quantum devices.

The quantum compilers research community has mostly coalesced around cost functions based on gate count statistics Karupp., 2025Krishnageetha Karuppasamy, Varun Puram, Stevens Johnson and Johnson P. Thomas. 2025. A Comprehensive Review of Quantum Circuit Optimization: Current Trends and Future Directions. Quantum Reports 7, 1 (January 2025, 2). doi: 10.3390/quantum7010002. Counting a type of gate is a simple and popular choice. Making some additional assumptions on the gate parallelism of future hardware, one may also consider cost functions based on gate depth, i.e. the length of the longest chain of gates that cannot be run simultaneously Seling., 2013Peter Selinger. 2013. Quantum circuits of T-depth one. Physical Review A 87, 4 (April 2013, 042302). doi: 10.1103/physreva.87.042302 Basile., 2024Daniel Basilewitsch, Clemens Dlaska and Wolfgang Lechner. 2024. Comparing planar quantum computing platforms at the quantum speed limit. Physical Review Research 6, 2 (April 2024, 023026). doi: 10.1103/physrevresearch.6.023026. In spite (or precisely because) of their simplicity, gate counts serve well as cost functions in many quantum compilation use cases. Most circuit optimisations target one of two hardware regimes.

On most current hardware architectures, the major challenge is achieving high accuracy on entangling operations, i.e. quantum gates that make two or more qubits interact Acharya, 2024Rajeev Acharya, Dmitry A. Abanin, Laleh Aghababaie-Beni, Igor Aleiner, Trond I. Andersen, Markus Ansmann, Frank Arute, Kunal Arya, Abraham Asfaw, Nikita Astrakhantsev, Juan Atalaya, Ryan Babbush, Dave Bacon, Brian Ballard, Joseph C. Bardin, Johannes Bausch, Andreas Bengtsson, Alexander Bilmes, Sam Blackwell, Sergio Boixo, Gina Bortoli, Alexandre Bourassa, Jenna Bovaird, Leon Brill and et al. 2024. Quantum error correction below the surface code threshold. Nature (December 2024). doi: 10.1038/s41586-024-08449-y Pino, 2021J. M. Pino, J. M. Dreiling, C. Figgatt, J. P. Gaebler, S. A. Moses, M. S. Allman, C. H. Baldwin, M. Foss-Feig, D. Hayes, K. Mayer, C. Ryan-Anderson and B. Neyenhuis. 2021. Demonstration of the trapped-ion quantum CCD computer architecture. Nature 592, 7853 (April 2021, 209--213). doi: 10.1038/s41586-021-03318-4 Koch, 2007Jens Koch, Terri M. Yu, Jay Gambetta, A. A. Houck, D. I. Schuster, J. Majer, Alexandre Blais, M. H. Devoret, S. M. Girvin and R. J. Schoelkopf. 2007. Charge-insensitive qubit design derived from the Cooper pair box. Physical Review A 76, 4 (October 2007, 042319). doi: 10.1103/PhysRevA.76.042319 Blais, 2007Alexandre Blais, Jay Gambetta, A. Wallraff, D. I. Schuster, S. M. Girvin, M. H. Devoret and R. J. Schoelkopf. 2007. Quantum-information processing with circuit quantum electrodynamics. Physical Review A 75, 3 (March 2007, 032329). doi: 10.1103/physreva.75.032329. In superconducting qubit and ion trap architectures2, for example, the gate set is typically composed of one and two-qubit gate types, with error rates dominated by an order of magnitude by the latter Steiger, 2018Damian S. Steiger, Thomas Häner and Matthias Troyer. 2018. ProjectQ: an open source software framework for quantum computing. Quantum 2 (January 2018, 49). 
doi: 10.22331/q-2018-01-31-49 Sivara., 2020Seyon Sivarajah, Silas Dilkes, Alexander Cowtan, Will Simmons, Alec Edgington and Ross Duncan. 2020. t|ket⟩: a retargetable compiler for NISQ devices. Quantum Science and Technology 6, 1 (November 2020, 014003). doi: 10.1088/2058-9565/ab8e92. Circuit optimisations for computations on such noisy hardware thus often define cost functions based on the number of two-qubit gates – typically the $\mathit{CX}$ gate, though many other two-qubit gates could be used equivalently.

On the other hand, future generations of hardware for larger-scale computations are expected to be more resilient to noise, with the help of error detection and correction techniques. In this regime, the computational power of the hardware is no longer limited by hardware noise but rather by the affordances of the error-correcting code. Depending on how the quantum data is redundantly encoded in the code space, the fault-tolerant execution of specific operations may be anywhere between very straightforward and nigh-impossible. The bottleneck is widely expected to be the execution of single-qubit (non-Clifford) gates, such as the $\mathit{T}$ gate3. These cases can thus just as well be modelled by cost functions based on gate counts.
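The gate-count and depth metrics discussed in this section can be sketched in a few lines. The circuit encoding below (a list of gate names with the qubits they act on) and the function names are assumptions made for illustration; they do not correspond to any particular compiler's API.

```python
from collections import Counter

# Assumed encoding: a circuit is a list of (gate_name, qubits) pairs
# in program order. This is illustrative, not a real compiler IR.

def gate_count(circuit, gate_name):
    """Number of occurrences of one gate type."""
    return Counter(name for name, _ in circuit)[gate_name]

def depth(circuit, counted=None):
    """Length of the longest chain of gates that cannot run simultaneously.

    If `counted` is given, only gates whose name is in it add to the depth
    (for example {"T"} gives the T-depth); other gates still impose ordering.
    """
    frontier = {}  # qubit -> depth reached after the last gate on that qubit
    for name, qubits in circuit:
        start = max((frontier.get(q, 0) for q in qubits), default=0)
        cost = 1 if counted is None or name in counted else 0
        for q in qubits:
            frontier[q] = start + cost
    return max(frontier.values(), default=0)

circuit = [
    ("H", (0,)),
    ("CX", (0, 1)),
    ("T", (1,)),
    ("T", (2,)),
    ("CX", (1, 2)),
    ("T", (2,)),
]

two_qubit_cost = gate_count(circuit, "CX")   # noisy-hardware proxy
t_count = gate_count(circuit, "T")           # fault-tolerant proxy
total_depth = depth(circuit)
t_depth = depth(circuit, counted={"T"})
```

In this toy circuit the two T gates on qubits 1 and 2 can run in parallel, which is why the T-depth is smaller than the T-count.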

Unitary synthesis: the perfect optimisation #

The ne plus ultra of quantum circuit optimisation is unitary synthesis. It leverages the representation of a quantum computation as a square, complex-valued, unitary matrix, which is then re-synthesised as a new, equivalent (and ideally optimised!) quantum circuit. This approach thus breaks down quantum optimisation into two separate sub-problems:

  1. Reduce an $n$-qubit quantum circuit into a $2^n \times 2^n$ matrix. This matrix is a unique representation of the computation, meaning that any two equivalent computations will be mapped to the same matrix.
  2. Find the optimal matrix decomposition into primitive quantum gates, thus obtaining a new quantum circuit, equivalent to the original.
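As a minimal sketch of the first step, the snippet below multiplies out the layers of two different two-qubit circuits and checks that they reduce to the same matrix. The helper names (`kron`, `matmul`, `circuit_unitary`) and the convention that qubit 0 is the most significant bit are assumptions of this example; it uses the textbook identity that conjugating a CX gate with Hadamards on both qubits swaps its control and target.

```python
import math

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def kron(a, b):
    """Kronecker (tensor) product of two matrices."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def circuit_unitary(layers):
    """Reduce a list of full-width layer matrices to one unitary.

    The first layer in the list is applied first, so it ends up
    right-most in the matrix product.
    """
    dim = len(layers[0])
    u = [[float(i == j) for j in range(dim)] for i in range(dim)]
    for layer in layers:
        u = matmul(layer, u)
    return u

def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol
               for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
CX01 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]  # control 0
CX10 = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]  # control 1
HH = kron(H, H)

u_long = circuit_unitary([HH, CX01, HH])   # five gates
u_short = circuit_unitary([CX10])          # one gate
same = close(u_long, u_short)
```

The two circuits differ in gate count but collapse to a single matrix, which is the property that the second step (re-synthesis) exploits.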

The uniqueness of the unitary matrix representation makes it invaluable as a resource for computation optimisation. Not only does it reduce any potentially large collection of equivalent inputs to a single form; it also – crucially – provides a sound distance metric on the space of all circuits, in the form of the Haar measure. This can be used in search-based approaches to measure the distance between synthesised circuits and thus direct a search heuristic towards the optimal solution.

Early work explored general unitary decomposition schemes obtained analytically from linear algebra. These express arbitrary unitaries as a product of unitaries that typically correspond to one- and two-qubit gates in the quantum circuit model Iten, 2016Raban Iten, Roger Colbeck, Ivan Kukuljan, Jonathan Home and Matthias Christandl. 2016. Quantum circuits for isometries. Physical Review A 93, 3 (March 2016, 032318). doi: 10.1103/PhysRevA.93.032318 Iten, 2019Raban Iten, Oliver Reardon-Smith, Emanuel Malvetti, Luca Mondada, Gabrielle Pauvert, Ethan Redmond, Ravjot Singh Kohli and Roger Colbeck. 2019. Introduction to UniversalQCompiler. arXiv: 1904.01072 [quant-ph]. Approaches have been proposed using the Cosine-Sine decomposition Mött., 2004Mikko Möttönen, Juha J. Vartiainen, Ville Bergholm and Martti M. Salomaa. 2004. Quantum Circuits for General Multiqubit Gates. Physical Review Letters 93, 13 (September 2004, 130502). doi: 10.1103/PhysRevLett.93.130502, the Quantum Shannon decomposition Krol, 2022Anna M. Krol, Aritra Sarkar, Imran Ashraf, Zaid Al-Ars and Koen Bertels. 2022. Efficient Decomposition of Unitary Matrices in Quantum Circuit Compilers. Applied Sciences 12, 2 (January 2022, 759). doi: 10.3390/app12020759, and the QR decomposition Sedlák, 2008Michal Sedlák and Martin Plesch. 2008. Towards optimization of quantum circuits. Open Physics 6, 1 (March 2008, 128--134). doi: 10.2478/s11534-008-0039-8. While some schemes have been shown to be asymptotically efficient for almost all unitaries Iten, 2016Raban Iten, Roger Colbeck, Ivan Kukuljan, Jonathan Home and Matthias Christandl. 2016. Quantum circuits for isometries. Physical Review A 93, 3 (March 2016, 032318). doi: 10.1103/PhysRevA.93.032318, such strategies typically generate fixed-size circuits and fail to synthesise short circuits when such circuits exist. The size of synthesised circuits grows exponentially with the number of qubits, making most such schemes impractical beyond three qubits.

Unitary matrix decomposition can also be combined with tools from classical circuit design: in Loke, 2014T. Loke, J. B. Wang and Y. H. Chen. 2014. OptQC : An optimized parallel quantum compiler. Computer Physics Communications 185, 12 (December 2014, 3307--3316). doi: 10.1016/j.cpc.2014.07.022, Loke et al. proposed an approach merging reversible circuit synthesis (see below), a classical compilation problem, with unitary matrix synthesis. They show that searching for decompositions $U = PU'Q$, where $P$ and $Q$ are classical reversible circuits, can yield shorter circuits when using the Cosine-Sine decomposition for the unitaries $U$ and $U'$.

Search-based approaches have been developed to address the shortcomings of analytical decompositions. Unlike the algebraic approaches, search-based circuit synthesis treats circuit decomposition as an optimisation problem: the space of all possible quantum circuits is explored to find the one that implements the desired unitary whilst minimising the cost function. The major challenge of such methods is the gigantic (typically super-exponential) size of the search space of all possible programs. Without mitigation, most work in this space struggles to scale beyond a handful of qubits.

Up to 3 qubits, T-depth optimal circuits can be found using an exhaustive brute-force search, first proposed in Amy, 2013M. Amy, D. Maslov, M. Mosca and M. Roetteler. 2013. A Meet-in-the-Middle Algorithm for Fast Synthesis of Depth-Optimal Quantum Circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 32, 6 (June 2013, 818--830). doi: 10.1109/TCAD.2013.2244643 and improved in Gheorg., 2022Vlad Gheorghiu, Michele Mosca and Priyanka Mukhopadhyay. 2022. A (quasi-)polynomial time heuristic algorithm for synthesizing T-depth optimal circuits. npj Quantum Information 8, 1 (September 2022). doi: 10.1038/s41534-022-00624-1. Asymptotic bounds on the number of T gates required for general unitary synthesis were recently given in Gosset, 2024David Gosset, Robin Kothari and Kewen Wu. 2024. Quantum state preparation with optimal T-count. arXiv: 2411.04790 [quant-ph].

Scaling to 4 qubits and handling gate sets with continuous parameters, required for non-fault tolerant circuits, an A* search with smart pruning heuristics was proposed in Davis, 2020Marc G. Davis, Ethan Smith, Ana Tudor, Koushik Sen, Irfan Siddiqi and Costin Iancu. 2020. Towards Optimal Topology Aware Quantum Circuit Synthesis. In 2020 IEEE International Conference on Quantum Computing and Engineering (QCE), October 2020. IEEE, 223--234. doi: 10.1109/QCE49297.2020.00036. This approach’s outputs are no longer provably optimal, but the results match optimal decompositions in all known instances. This line of work has subsequently been further refined with heuristics based on pre-defined circuit templates Smith, 2023Ethan Smith, Marc Grau Davis, Jeffrey Larson, Ed Younis, Lindsay Bassman Oftelie, Wim Lavrijsen and Costin Iancu. 2023. LEAP: Scaling Numerical Optimization Based Synthesis Using an Incremental Approach. ACM Transactions on Quantum Computing 4, 1 (February 2023, 1--23). doi: 10.1145/3548693 Madden, 2022Liam Madden and Andrea Simonetto. 2022. Best Approximate Quantum Compiling Problems. ACM Transactions on Quantum Computing 3, 2 (March 2022, 1--29). doi: 10.1145/3505181, parameter instantiation Younis, 2022Ed Younis and Costin Iancu. 2022. Quantum Circuit Optimization and Transpilation via Parameterized Circuit Instantiation. arXiv: 2206.07885 [quant-ph] Younis, 2021Ed Younis, Koushik Sen, Katherine Yelick and Costin Iancu. 2021. QFAST: Conflating Search and Numerical Optimization for Scalable Quantum Circuit Synthesis. In 2021 IEEE International Conference on Quantum Computing and Engineering (QCE), October 2021. IEEE, 232--243. doi: 10.1109/QCE52317.2021.00041 Rakyta, 2022Péter Rakyta and Zoltán Zimborás. 2022. Approaching the theoretical limit in quantum gate decomposition. Quantum 6 (May 2022, 710). doi: 10.22331/q-2022-05-11-710, machine learning Weiden, 2023Mathias Weiden, Ed Younis, Justin Kalloor, John Kubiatowicz and Costin Iancu. 2023. 
Improving Quantum Circuit Synthesis with Machine Learning. In 2023 IEEE International Conference on Quantum Computing and Engineering (QCE). IEEE. doi: 10.1109/QCE57702.2023.00093 and tensor networks Kuklia., 2023Alon Kukliansky, Ed Younis, Lukasz Cincio and Costin Iancu. 2023. QFactor: A Domain-Specific Optimizer for Quantum Circuit Instantiation. In 2023 IEEE International Conference on Quantum Computing and Engineering (QCE), Septempter 2023. IEEE, 814--824. doi: 10.1109/QCE57702.2023.00096.

Some of these heuristics also make it possible to synthesise circuits with device constraints in mind and can trade off decomposition accuracy for shallower circuit depth and lower noise. In Wu, 2020Xin-Chuan Wu, Marc Grau Davis, Frederic T. Chong and Costin Iancu. 2020. QGo: Scalable Quantum Circuit Optimization Using Automated Synthesis. arXiv: 2012.09835 [quant-ph] Kuklia., 2023Alon Kukliansky, Ed Younis, Lukasz Cincio and Costin Iancu. 2023. QFactor: A Domain-Specific Optimizer for Quantum Circuit Instantiation. In 2023 IEEE International Conference on Quantum Computing and Engineering (QCE), Septempter 2023. IEEE, 814--824. doi: 10.1109/QCE57702.2023.00096, the authors have also explored partitioning the circuit into smaller parts optimised independently to scale these techniques to large circuit sizes. Despite the reduced optimisation performance that the boundaries of the partitioned circuits introduce, the combination of circuit partitioning with the techniques listed above yields some of the best-performing circuit optimisation techniques developed to date Costin., 2025Bert Costin Iancu. 2025. Berkeley Quantum Synthesis Toolkit. Retrieved on 09/01/2025 from https://bqskit.lbl.gov/. Circuit synthesis schemes have also been extended to generate circuits on a more expressive gate set, including elementary classical operations Alam, 2024Faisal Alam and Bryan K. Clark. 2024. Learning dynamic quantum circuits for efficient state preparation. arXiv: 2410.09030 [quant-ph] Niu, 2024Siyuan Niu, Efekan Kokcu, Anupam Mitra, Aaron Szasz, Akel Hashim, Justin Kalloor, Wibe Albert de Jong, Costin Iancu and Ed Younis. 2024. AC/DC: Automated Compilation for Dynamic Circuits. arXiv: 2412.07969 [quant-ph].

However, a fundamental flaw of all unitary synthesis schemes is the $4^n$-scaling in the number of qubits $n$ of the unitary representation itself. This means that no matrix-based synthesis method, however efficient, will ever be able to handle computations with much more than a dozen qubits. Circuit partitioning schemes such as Wu, 2020Xin-Chuan Wu, Marc Grau Davis, Frederic T. Chong and Costin Iancu. 2020. QGo: Scalable Quantum Circuit Optimization Using Automated Synthesis. arXiv: 2012.09835 [quant-ph] Kuklia., 2023Alon Kukliansky, Ed Younis, Lukasz Cincio and Costin Iancu. 2023. QFactor: A Domain-Specific Optimizer for Quantum Circuit Instantiation. In 2023 IEEE International Conference on Quantum Computing and Engineering (QCE), September 2023. IEEE, 814--824. doi: 10.1109/QCE57702.2023.00096 effectively circumvent the problem, but they are heavily dependent on the partitioning quality.
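To make the scaling concrete, the following back-of-the-envelope sketch computes the memory needed to store one dense unitary. It assumes 16 bytes per entry (double-precision complex numbers); the function name is illustrative.

```python
def dense_unitary_bytes(n_qubits, bytes_per_entry=16):
    """Bytes needed for a dense 2^n x 2^n matrix, i.e. 4^n entries.

    The default of 16 bytes per entry assumes double-precision
    complex numbers.
    """
    return 4 ** n_qubits * bytes_per_entry

MIB = 2 ** 20
TIB = 2 ** 40

for n in (4, 8, 12, 16, 20):
    print(f"{n:2d} qubits: {dense_unitary_bytes(n) / MIB:,.3f} MiB")
```

Under these assumptions a 12-qubit unitary already occupies 256 MiB, and a 20-qubit unitary would need 16 TiB, before any synthesis work has been done.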

The search for scalable representations #

Our study of unitary synthesis introduced us to a convenient two-step approach to quantum computation optimisation. First, the input circuit is transformed into a “global” representation that captures the computation as a whole, abstracting away the precise sequences of gates that compose the original circuit. This representation is then the input for the second half of the problem, which produces a circuit of the desired shape, equivalent to the original input but with reduced cost.

In addition to simplifying the original problem, such global intermediate representations are well-positioned to leverage the quantum-specific structure and symmetries in the computation. They can thus enable more advanced optimisations and are robust to variations in the circuit representation and in the local optimisation landscape.

The unitary matrix is the most common representation of quantum computations, but as we have seen, it suffers from severe scaling problems in the number of qubits. The problem is not so much that quantum computations require exponential space to be described in the worst case – after all, the space of all $n$-qubit unitaries $SU(2^n)$ is exponentially large. Rather, the set of unitaries implementable in practice can only be a tiny subset of $SU(2^n)$4 – the set of unitaries that admit a polynomial-sized circuit representation.

Another fruitful avenue of work for quantum optimisation has thus been the development of alternative representations for quantum computations that can encode polynomially sized quantum programs efficiently whilst enabling novel optimisations.

Phase Polynomials and Pauli Gadgets #

A particularly convenient global representation of many quantum circuits is as products of Pauli exponentials, also known as Pauli gadgets Cowtan, 2019Alexander Cowtan, Silas Dilkes, Ross Duncan, Will Simmons and Seyon Sivarajah. 2019. Phase Gadget Synthesis for Shallow Circuits. In Proceedings 16th International Conference on Quantum Physics and Logic, QPL 2019, Chapman University, Orange, CA, USA, June 10-14, 2019, 213--228. doi: 10.4204/EPTCS.318.13. These unitaries are of the form

$$U = \prod_{s \in P} \exp(i \alpha_s \cdot s)$$

where $\alpha_s \in [0, 2\pi)$ are real coefficients and $s \in P = \{I, X, Y, Z\}^n$ are strings of length $n$ of the four Pauli matrices – so-called Pauli strings. In this formulation, $n$ fixes the number of qubits of the computation.

These exponentials are always valid $n$-qubit unitaries and can express entangling operations across any number of qubits: the qubits on which an operation $\exp(i \alpha \cdot s)$ acts non-trivially are given by the indices of the characters in $s$ that are not the identity $I$. For instance, the exponential

$$\exp\left(i \frac{\pi}{2} XIZ\right)$$

is a valid quantum computation on 3 qubits, entangling the first and third qubits. Beyond useful abstractions for optimisation, such entangling operations appear naturally when simulating quantum systems, for example in quantum chemistry McClean, 2016Jarrod R McClean, Jonathan Romero, Ryan Babbush and Alán Aspuru-Guzik. 2016. The theory of variational hybrid quantum-classical algorithms. New Journal of Physics 18, 2 (February 2016, 023023). doi: 10.1088/1367-2630/18/2/023023.
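A single Pauli gadget can be written out as a matrix with a few lines of code. The sketch below relies on the fact that every Pauli string squares to the identity, so that exp(iαs) = cos(α)·I + i·sin(α)·s. The function names and the generic angle 0.3 are choices made for this example only.

```python
import math

PAULI = {
    "I": [[1, 0], [0, 1]],
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}

def kron(a, b):
    """Kronecker (tensor) product of two matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def pauli_matrix(s):
    """Matrix of a Pauli string such as "XIZ"."""
    m = [[1]]
    for c in s:
        m = kron(m, PAULI[c])
    return m

def gadget(alpha, s):
    """exp(i*alpha*s) = cos(alpha)*I + i*sin(alpha)*s, since s*s = I."""
    p = pauli_matrix(s)
    dim = len(p)
    return [[math.cos(alpha) * (i == j) + 1j * math.sin(alpha) * p[i][j]
             for j in range(dim)] for i in range(dim)]

def support(s):
    """Indices of the qubits on which the gadget acts non-trivially."""
    return [i for i, c in enumerate(s) if c != "I"]

def is_unitary(u, tol=1e-9):
    """Check that u times its conjugate transpose is the identity."""
    dim = len(u)
    return all(
        abs(sum(u[i][k] * u[j][k].conjugate() for k in range(dim)) - (i == j)) < tol
        for i in range(dim) for j in range(dim)
    )

u = gadget(0.3, "XIZ")
qubits = support("XIZ")
```

Here `support("XIZ")` picks out the first and third qubits (indices 0 and 2), matching the description in the text, and `u` is an 8 × 8 unitary.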

The use of these primitives for quantum compilation was first explored in Cowtan, 2019Alexander Cowtan, Silas Dilkes, Ross Duncan, Will Simmons and Seyon Sivarajah. 2019. Phase Gadget Synthesis for Shallow Circuits. In Proceedings 16th International Conference on Quantum Physics and Logic, QPL 2019, Chapman University, Orange, CA, USA, June 10-14, 2019, 213--228. doi: 10.4204/EPTCS.318.13, and further generalised in Cowtan, 2020Alexander Cowtan, Will Simmons and Ross Duncan. 2020. A Generic Compilation Strategy for the Unitary Coupled Cluster Ansatz. arXiv: 2007.10515 [quant-ph]. Starting from an (unordered) sequence of Pauli gadgets, the gadgets are clustered into sets of mutually commuting gadgets. These can then be jointly synthesised into a circuit, markedly reducing the number of entangling operations as compared to naively implementing one exponential at a time.
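The clustering step depends on a simple commutation test: two Pauli strings commute exactly when they differ on an even number of positions where both are non-identity. The sketch below pairs this test with a greedy grouping. The greedy strategy is an assumption chosen for brevity and is not claimed to be the clustering procedure of the cited works.

```python
def commute(s, t):
    """True iff the Pauli strings s and t commute.

    Single-qubit Paulis anticommute when they are different and both
    non-identity; the strings commute iff this happens an even number
    of times.
    """
    clashes = sum(1 for a, b in zip(s, t) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0

def greedy_commuting_sets(strings):
    """Partition strings into sets of mutually commuting Pauli strings.

    Each string joins the first existing set it fully commutes with;
    this is a simple heuristic, not an optimal partition.
    """
    sets = []
    for s in strings:
        for group in sets:
            if all(commute(s, t) for t in group):
                group.append(s)
                break
        else:
            sets.append([s])
    return sets

groups = greedy_commuting_sets(["XX", "ZZ", "XI", "IZ", "YY"])
```

For this input the five strings fall into two sets, `["XX", "ZZ", "YY"]` and `["XI", "IZ"]`, each of which could then be synthesised jointly.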

Further improvements to this work have since been presented in Huang, 2024Qunsheng Huang, David Winderl, Arianne Meijer-van de Griend and Richie Yeung. 2024. Redefining Lexicographical Ordering: Optimizing Pauli String Decompositions for Quantum Compiling. CoRR abs/2408.00354. doi: 10.48550/ARXIV.2408.00354 and Schmitz, 2024Albert T. Schmitz, Nicolas P. D. Sawaya, Sonika Johri and A. Y. Matsuura. 2024. Graph Optimization Perspective for Low-Depth Trotter-Suzuki Decomposition. Physical Review A 109, 4 (April 2024, 042418). doi: 10.1103/PhysRevA.109.042418, where new heuristics are introduced to choose the Pauli gadget ordering. In Huang, 2024Qunsheng Huang, David Winderl, Arianne Meijer-van de Griend and Richie Yeung. 2024. Redefining Lexicographical Ordering: Optimizing Pauli String Decompositions for Quantum Compiling. CoRR abs/2408.00354. doi: 10.48550/ARXIV.2408.00354, the hardware-specific connectivity constraints between qubits are also taken into account to produce programs that can be executed on the targeted architecture without overhead.

A close relative of Pauli gadgets – a strict subset of them, to be precise – are the so-called phase polynomials Amy, 2018Matthew Amy, Parsiad Azimzadeh and Michele Mosca. 2018. On the controlled-NOT complexity of controlled-NOT–phase circuits. Quantum Science and Technology 4, 1 (September 2018, 015002). doi: 10.1088/2058-9565/aad8ca, obtained when restricting the Pauli strings to combinations of Z Pauli matrices and identities: $s \in \{I, Z\}^n$. These are particularly amenable to optimisation as in this case, the ordering of the gadgets becomes irrelevant – all exponential terms commute. This gives the compiler a lot of freedom during circuit synthesis.

The action of phase polynomials on quantum states is quite easy to understand. Instead of exponentials of $I$ and $Z$-based Pauli strings, the computation can equivalently be given by its action on the basis states. A quantum basis state – just like a classical state – is given by a bitstring $b_1 \cdots b_n$ of bits $b_i \in \{0, 1\}$. Writing $e_{b_1 \cdots b_n}$ for the basis state corresponding to the bitstring $b_1 \cdots b_n$, the action of a phase polynomial $U$ on $e_{b_1 \cdots b_n}$ is given by

$$e_{b_1 \cdots b_n} \mapsto \exp\Big(i \cdot \underbrace{\sum_{s \in \{I, Z\}^n} a_s \cdot (\tilde{s}_1 b_1 \oplus \cdots \oplus \tilde{s}_n b_n)}_{\in\,\mathbb{R}}\Big)\, e_{b_1 \cdots b_n} \tag{2}$$

where $\tilde{s}_1 \cdots \tilde{s}_n$ is now also a bitstring of booleans $\tilde{s}_i \in \{0, 1\}$, and $\oplus$ denotes the boolean XOR operation. The boolean $\tilde{s}_i$ has value $1$ if and only if the $i$-th character in the Pauli string $s$ is $Z$.

The exponent in (2) is just a real number – indeed each term in the sum simply evaluates to either $a_s$ or $0$. A phase polynomial is thus a diagonal unitary matrix: it maps every basis state $\ket{b_1 \cdots b_n}$ to itself, multiplied by some phase $e^{i \theta}$.
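The action described by (2) is straightforward to evaluate term by term. In the sketch below, a phase polynomial is stored as a dictionary from strings over {I, Z} to coefficients; this encoding, the function names and the example coefficients are assumptions made for illustration.

```python
import cmath

def phase_exponent(terms, bits):
    """Real exponent theta acquired by the basis state `bits`.

    `terms` maps strings over {I, Z} to coefficients a_s, and `bits`
    is a string of 0s and 1s of the same length. Each term contributes
    a_s if the XOR of the bits at its Z positions is 1, and 0 otherwise.
    """
    theta = 0.0
    for s, a in terms.items():
        parity = 0
        for c, b in zip(s, bits):
            if c == "Z":
                parity ^= int(b)
        theta += a * parity
    return theta

def diagonal(terms, n):
    """Diagonal entries e^{i*theta} of the unitary, one per basis state."""
    return [cmath.exp(1j * phase_exponent(terms, format(k, f"0{n}b")))
            for k in range(2 ** n)]

terms = {"ZI": 0.5, "ZZ": 0.25}
thetas = [phase_exponent(terms, b) for b in ("00", "01", "10", "11")]
diag = diagonal(terms, 2)
```

With these coefficients the four basis states acquire exponents 0, 0.25, 0.75 and 0.5 respectively, and every entry of `diag` has modulus one, as expected of a diagonal unitary.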

Polynomially-sized circuits that implement diagonal matrices correspond to phase polynomials with $k = \mathcal{O}(n^p) \ll 2^n$ non-zero terms $a_s \neq 0$. Phase polynomials can therefore represent such computations efficiently, allowing for algorithms that scale polynomially in the number of qubits $n$.

The Graysynth algorithm, as presented in Amy, 2018Matthew Amy, Parsiad Azimzadeh and Michele Mosca. 2018. On the controlled-NOT complexity of controlled-NOT–phase circuits. Quantum Science and Technology 4, 1 (September 2018, 015002). doi: 10.1088/2058-9565/aad8ca, has become the reference synthesis method for phase polynomials. The key observation made by its authors is that all terms of the sum within the exponential can be cycled through and obtained following the binary Gray codes Gray, 1953F. Gray. 1953. Pulse code communication. Retrieved from http://www.google.com/patents/US2632058. The Hamming distance of one that separates successive bitstrings in the code translates into a single two-qubit $\mathit{CX}$ gate when synthesised to a quantum circuit by Graysynth.
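The Gray code property that this relies on can be checked directly. The snippet below generates the standard reflected binary Gray code and verifies that successive codewords differ in exactly one bit; it only illustrates the code itself and is not an implementation of Graysynth.

```python
def gray_code(n):
    """Reflected binary Gray code on n bits, as a list of integers."""
    return [k ^ (k >> 1) for k in range(2 ** n)]

def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

codes = gray_code(3)
steps = [hamming(a, b) for a, b in zip(codes, codes[1:])]

for c in codes:
    print(format(c, "03b"))
```

Every entry of `steps` is 1: moving from one parity term to the next flips a single bit, which is the step that the text associates with a single two-qubit gate.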

This approach was adapted to work with hardware connectivity constraints in Griend, 2022Arianne Meijer-van de Griend and Ross Duncan. 2022. Architecture-Aware Synthesis of Phase Polynomials for NISQ Devices. In Proceedings 19th International Conference on Quantum Physics and Logic, QPL 2022, Wolfson College, Oxford, UK, 27 June - 1 July 2022, 116--140. doi: 10.4204/EPTCS.394.8, Gogioso, 2022Stefano Gogioso and Richie Yeung. 2022. Annealing Optimisation of Mixed ZX Phase Circuits. In Proceedings 19th International Conference on Quantum Physics and Logic, QPL 2022, Wolfson College, Oxford, UK, 27 June - 1 July 2022, 415--431. doi: 10.4204/EPTCS.394.20 and Vandae., 2022Vivien Vandaele, Simon Martiel and Timothée Goubault de Brugière. 2022. Phase polynomials synthesis algorithms for NISQ architectures and beyond. Quantum Science and Technology 7, 4 (Septempter 2022, 045027). doi: 10.1088/2058-9565/ac5a0e. An up-to-date study of the performance of phase polynomial-based compiler optimisations and comparisons with standard approaches is performed in Meijer., 2025Arianne Meijer - van de Griend. 2025. A comparison of quantum compilers using a DAG-based or phase polynomial-based intermediate representation. Journal of Systems and Software 221 (March 2025, 112224). doi: 10.1016/j.jss.2024.112224.

The study of phase polynomials can also be generalised to arbitrary diagonal operators. Tight asymptotic bounds on the resource requirements for arbitrary diagonal operator synthesis and their implementation were recently given in Sun, 2023Xiaoming Sun, Guojing Tian, Shuai Yang, Pei Yuan and Shengyu Zhang. 2023. Asymptotically Optimal Circuit Depth for Quantum State Preparation and General Unitary Synthesis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42, 10 (October 2023, 3301--3314). doi: 10.1109/TCAD.2023.3244885. The authors propose using a smart meshing of different Gray codes in parallel and, where available, additional qubits as ancilla registers to parallelise computations further and minimise circuit depth. The resulting general-purpose decomposition of arbitrary diagonal operators yields circuits of depth $\mathcal{O}(\frac{2^n}{n})$ and size $\mathcal{O}(\frac{n^2}{\log n}) + 2^{n+3}$, as well as improved bounds in the presence of $m \geq 2n$ ancilla qubits.

Clifford synthesis #

The group of all $n$-qubit unitaries $SU(2^n)$ contains a subgroup that has become an object of study across many domains of quantum computing science: the Clifford group. We have already mentioned that it is at the centre of quantum error correction theory Bravyi, 2005Sergey Bravyi and Alexei Kitaev. 2005. Universal quantum computation with ideal Clifford gates and noisy ancillas. Physical Review A 71, 2 (February 2005, 022316). doi: 10.1103/PhysRevA.71.022316; it is also a cornerstone of measurement-based quantum computing Rausse., 2001Robert Raussendorf and Hans J. Briegel. 2001. A One-Way Quantum Computer. Physical Review Letters 86, 22 (May 2001, 5188--5191). doi: 10.1103/PhysRevLett.86.5188 and graph states Hein, 2004M. Hein, J. Eisert and H. J. Briegel. 2004. Multiparty entanglement in graph states. Physical Review A 69, 6 (June 2004, 062311). doi: 10.1103/physreva.69.062311, as well as one of the most promising approaches for fast quantum simulations Gottes., 1999Daniel Gottesman. 1999. The Heisenberg representation of quantum computers. In Group 22: Proceedings of the 12th International Colloquium on Group Theoretical Methods in Physics. International Press, 32--43 Bravyi, 2019Sergey Bravyi, Dan Browne, Padraic Calpin, Earl Campbell, David Gosset and Mark Howard. 2019. Simulation of quantum circuits by low-rank stabilizer decompositions. Quantum 3 (September 2019, 181). doi: 10.22331/q-2019-09-02-181 Kissin., 2022Aleks Kissinger and John van de Wetering. 2022. Simulating quantum circuits with ZX-calculus reduced stabiliser decompositions. Quantum Science and Technology 7, 4 (July 2022, 044001). doi: 10.1088/2058-9565/ac5d20.

The Clifford subgroup of quantum circuits admits an efficient $\Theta(2n^2)$-sized program representation known as the Clifford tableau Aarons., 2004Scott Aaronson and Daniel Gottesman. 2004. Improved simulation of stabilizer circuits. Physical Review A 70, 5 (November 2004, 052328). doi: 10.1103/PhysRevA.70.052328. This has been used extensively for compiler optimisation. In Aarons., 2004Scott Aaronson and Daniel Gottesman. 2004. Improved simulation of stabilizer circuits. Physical Review A 70, 5 (November 2004, 052328). doi: 10.1103/PhysRevA.70.052328 the first Clifford circuit synthesis procedure is given, using an analytical decomposition of Clifford tableaus into $O(n^2 / \log n)$ one- and two-qubit gates. An improved, Bruhat-based decomposition that is optimal in the number of Hadamard gates was subsequently proposed in Maslov, 2018Dmitri Maslov and Martin Roetteler. 2018. Shorter Stabilizer Circuits via Bruhat Decomposition and Quantum Circuit Transformations. IEEE Transactions on Information Theory 64, 7 (July 2018, 4729--4738). doi: 10.1109/tit.2018.2825602. In the case of a Clifford fragment directly followed by measurements, the procedure can be further refined to replace gates with classical computation on the measurement outcomes Bravyi, 2021Sergey Bravyi and Dmitri Maslov. 2021. Hadamard-Free Circuits Expose the Structure of the Clifford Group. IEEE Transactions on Information Theory 67, 7 (July 2021, 4546--4563). doi: 10.1109/TIT.2021.3081415. Finally, an alternative normal form that is well-suited to hardware with limited nearest-neighbour connectivity was also derived using a diagrammatic approach Maslov, 2023Dmitri Maslov and Willers Yang. 2023. CNOT circuits need little help to implement arbitrary Hadamard-free Clifford transformations they generate. npj Quantum Information 9, 1 (September 2023). doi: 10.1038/s41534-023-00760-2.
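To give a feel for why the tableau is compact, the sketch below stores one row of X bits, Z bits and a sign bit for the image of each generator X_1..X_n and Z_1..Z_n, and updates them gate by gate. The update rules are the standard ones from the Aaronson and Gottesman stabiliser formalism; the class layout and method names are assumptions of this example and the measurement rules are omitted.

```python
class Tableau:
    """Minimal Clifford tableau: 2n rows of (x bits, z bits, sign bit).

    Rows 0..n-1 start as X_1..X_n and rows n..2n-1 as Z_1..Z_n; applying
    a gate rewrites each row into its image under conjugation.
    """

    def __init__(self, n):
        self.n = n
        self.x = [[int(i == j) for j in range(n)] for i in range(2 * n)]
        self.z = [[int(i == j + n) for j in range(n)] for i in range(2 * n)]
        self.r = [0] * (2 * n)  # sign bit: 1 means a factor of -1

    def h(self, a):
        """Hadamard on qubit a: swaps X and Z, flips the sign of Y."""
        for i in range(2 * self.n):
            self.r[i] ^= self.x[i][a] & self.z[i][a]
            self.x[i][a], self.z[i][a] = self.z[i][a], self.x[i][a]

    def s(self, a):
        """Phase gate on qubit a: X -> Y, Y -> -X, Z -> Z."""
        for i in range(2 * self.n):
            self.r[i] ^= self.x[i][a] & self.z[i][a]
            self.z[i][a] ^= self.x[i][a]

    def cx(self, a, b):
        """CX with control a and target b."""
        for i in range(2 * self.n):
            self.r[i] ^= (self.x[i][a] & self.z[i][b]
                          & (self.x[i][b] ^ self.z[i][a] ^ 1))
            self.x[i][b] ^= self.x[i][a]
            self.z[i][a] ^= self.z[i][b]

    def state(self):
        return (self.x, self.z, self.r)

t = Tableau(2)
t.cx(0, 1)
x_image = (t.x[0], t.z[0], t.r[0])   # image of X on qubit 0
z_image = (t.x[3], t.z[3], t.r[3])   # image of Z on qubit 1
```

After the CX gate, X on the control has spread to both qubits and Z on the target has spread back to the control, all recorded in a number of bits that grows quadratically with the number of qubits rather than exponentially.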

Just as in unitary synthesis, circuit decompositions of Clifford operations more efficient than the general analytical expressions can be obtained case-by-case using search and optimisation. The counterpart to the provably optimal decompositions of unitaries obtained through brute-force search Amy, 2013M. Amy, D. Maslov, M. Mosca and M. Roetteler. 2013. A Meet-in-the-Middle Algorithm for Fast Synthesis of Depth-Optimal Quantum Circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 32, 6 (June 2013, 818--830). doi: 10.1109/TCAD.2013.2244643 also exists for Clifford circuits Kliuch., 2013Vadym Kliuchnikov and Dmitri Maslov. 2013. Optimization of Clifford circuits. Physical Review A 88, 5 (November 2013, 052307). doi: 10.1103/physreva.88.052307. Due to the more efficient representation and smaller search space, all optimal Clifford circuits could be found up to 6 qubits. Using modern SAT solvers, optimal Clifford synthesis has recently been pushed much further, with known optimal circuits beyond 20 qubits Peham, 2023Tom Peham, Nina Brandl, Richard Kueng, Robert Wille and Lukas Burgholzer. 2023. Depth-Optimal Synthesis of Clifford Circuits with SAT Solvers. In IEEE International Conference on Quantum Computing and Engineering, QCE 2023, Bellevue, WA, USA, September 17-22, 2023. IEEE, 802--813. doi: 10.1109/QCE57702.2023.00095 Schnei., 2023Sarah Schneider, Lukas Burgholzer and Robert Wille. 2023. A SAT Encoding for Optimal Clifford Circuit Synthesis. In Proceedings of the 28th Asia and South Pacific Design Automation Conference, January 2023. IEEE. doi: 10.1145/3566097.3567929.

Heuristic optimisation approaches have also been shown to be effective on Clifford optimisation Bravyi, 2021Sergey Bravyi and Dmitri Maslov. 2021. Hadamard-Free Circuits Expose the Structure of the Clifford Group. IEEE Transactions on Information Theory 67, 7 (July 2021, 4546--4563). doi: 10.1109/TIT.2021.3081415 Fagan, 2018Andrew Fagan and Ross Duncan. 2018. Optimising Clifford Circuits with Quantomatic. In Proceedings 15th International Conference on Quantum Physics and Logic, QPL 2018, Halifax, Canada, 3-7th June 2018, 85--105. doi: 10.4204/EPTCS.287.5 and scale to larger systems. For Clifford computations on devices with restricted connectivity, an architecture-aware synthesis method was proposed in Winderl, 2024David Winderl, Qunsheng Huang, Arianne Meijer-van de Griend and Richie Yeung. 2024. Architecture-Aware Synthesis of Stabilizer Circuits from Clifford Tableaus. arXiv: 2309.08972 [quant-ph].

Diagrammatic representations #

Quantum computer science and quantum mechanics have a rich history in diagrammatic representations Feynman, 1949R. P. Feynman. 1949. Space-Time Approach to Quantum Electrodynamics. Physical Review 76, 6 (Septempter 1949, 769--789). doi: 10.1103/physrev.76.769 Coecke, 2017Bob Coecke and Aleks Kissinger. 2017. Picturing Quantum Processes: A First Course in Quantum Theory and Diagrammatic Reasoning. Cambridge University Press. doi: 10.1017/9781316219317 Backens, 2019Miriam Backens and Aleks Kissinger. 2019. ZH: A Complete Graphical Calculus for Quantum Computations Involving Classical Non-linearity. Electronic Proceedings in Theoretical Computer Science 287 (January 2019, 23--42). doi: 10.4204/EPTCS.287.2. These have allowed one to picture complex physical processes as intuitive operations in a graphical language and have – as a nice side effect – led to a plethora of state-of-the-art quantum circuit optimisation techniques!

A diagrammatic representation of quantum computation is obtained by lifting the gates that form a quantum circuit into the nodes of a more abstract graph-based graphical calculus. The most commonly used flavour of calculus for circuit optimisation is the ZX calculus Coecke, 2008Bob Coecke and Ross Duncan. 2008. Interacting Quantum Observables Coecke, 2012Bob Coecke, Ross Duncan, Aleks Kissinger and Quanlong Wang. 2012. Strong Complementarity and Non-locality in Categorical Quantum Mechanics. In 2012 27th Annual IEEE Symposium on Logic in Computer Science, June 2012. IEEE, 245--254. doi: 10.1109/lics.2012.35 Weteri., 2020John van de Wetering. 2020. ZX-calculus for the working quantum computer scientist. arXiv: 2012.13966 [quant-ph] Yeung, 2024Richie Yeung, Konstantinos Meichanetzidis, Alexandre Krajenbrink and François Charton. 2024. Teaching small transformers to rewrite ZX diagrams. MathAI submission.

By breaking up multi-qubit gates into several non-unitary tensors, the ZX calculus and related variants Roy, 2011Shibdas Roy, Dipankar Home, Guruprasad Kar and Archan S. Majumda. 2011. Towards Normal Forms for GHZ∕W Calculus. In AIP Conference Proceedings. AIP, 112--119. doi: 10.1063/1.3635852 Backens, 2019Miriam Backens and Aleks Kissinger. 2019. ZH: A Complete Graphical Calculus for Quantum Computations Involving Classical Non-linearity. Electronic Proceedings in Theoretical Computer Science 287 (January 2019, 23--42). doi: 10.4204/EPTCS.287.2 Felice, 2023Giovanni de Felice and Bob Coecke. 2023. Quantum Linear Optics via String Diagrams. In Proceedings 19th International Conference on Quantum Physics and Logic, Wolfson College, Oxford, UK, 27 June - 1 July 2022. Open Publishing Association, 83-100. doi: 10.4204/EPTCS.394.6 expose some of the symmetry and structure of quantum physics in the form of simple and intuitive graphical rules. This has enabled the discovery of many quantum optimisation techniques (e.g. Duncan, 2019Ross Duncan, Aleks Kissinger, Simon Perdrix and John van de Wetering. 2019. Graph-theoretic Simplification of Quantum Circuits with the ZX-calculus. arXiv: 1902.03178 [quant-ph] Weteri., 2024John van de Wetering, Richie Yeung, Tuomas Laakkonen and Aleks Kissinger. 2024. Optimal compilation of parametrised quantum circuits. arXiv: 2401.12877 [quant-ph]), some of which we have already reviewed Huang, 2024Qunsheng Huang, David Winderl, Arianne Meijer-van de Griend and Richie Yeung. 2024. Redefining Lexicographical Ordering: Optimizing Pauli String Decompositions for Quantum Compiling. CoRR abs/2408.00354. doi: 10.48550/ARXIV.2408.00354 Gogioso, 2022Stefano Gogioso and Richie Yeung. 2022. Annealing Optimisation of Mixed ZX Phase Circuits. In Proceedings 19th International Conference on Quantum Physics and Logic, QPL 2022, Wolfson College, Oxford, UK, 27 June - 1 July 2022, 415--431. 
doi: 10.4204/EPTCS.394.20 Griend, 2022Arianne Meijer-van de Griend and Ross Duncan. 2022. Architecture-Aware Synthesis of Phase Polynomials for NISQ Devices. In Proceedings 19th International Conference on Quantum Physics and Logic, QPL 2022, Wolfson College, Oxford, UK, 27 June - 1 July 2022, 116--140. doi: 10.4204/EPTCS.394.8 Cowtan, 2019Alexander Cowtan, Silas Dilkes, Ross Duncan, Will Simmons and Seyon Sivarajah. 2019. Phase Gadget Synthesis for Shallow Circuits. In Proceedings 16th International Conference on Quantum Physics and Logic, QPL 2019, Chapman University, Orange, CA, USA, June 10-14, 2019, 213--228. doi: 10.4204/EPTCS.318.13 Cowtan, 2020Alexander Cowtan, Will Simmons and Ross Duncan. 2020. A Generic Compilation Strategy for the Unitary Coupled Cluster Ansatz. arXiv: 2007.10515 [quant-ph]. This selection of papers is not quite exhaustive5 – there are currently over 300 papers on the topic, as indexed by zxcalculus.com.

Aside from being an invaluable tool for research and compiler pass design, a significant contribution of these diagrammatic representations is the introduction of graph transformation systems (GTS) Ehrig, 1973Hartmut Ehrig, Michael Pfender and Hans Jürgen Schneider. 1973. Graph-Grammars: An Algebraic Approach. In 14th Annual Symposium on Switching and Automata Theory, Iowa City, Iowa, USA, October 15-17, 1973. IEEE Computer Society, 167--180. doi: 10.1109/SWAT.1973.11 Rozenb., 1997Grzegorz Rozenberg. 1997. Handbook of Graph Grammars and Computing by Graph Transformations, Volume 1: Foundations. World Scientific König, 2018Barbara König, Dennis Nolte, Julia Padberg and Arend Rensink. 2018. A Tutorial on Graph Transformation. In Graph Transformation, Specifications, and Nets - In Memory of Hartmut Ehrig. Springer, 83--104. doi: 10.1007/978-3-319-75396-6_5 to quantum computing. More on this in chapter 3 (and much of the rest of this thesis)!

Reversible classical circuits #

Many more representations have either been taken over from classical compiler optimisations or were developed for specific purposes. The last we will mention is reversible circuit synthesis, an entirely classical circuit design problem which can draw from the results of decades of research. From a quantum perspective, reversible classical circuits correspond to unitaries (and more generally, isometries) that send basis states to basis states –  and thus do not introduce any complex phase Shende, 2002V.V. Shende, A.K. Prasad, I.L. Markov and J.P. Hayes. 2002. Reversible logic circuit synthesis. In IEEE/ACM International Conference on Computer Aided Design, 2002, November 2002. IEEE, 353--360. doi: 10.1109/iccad.2002.1167558. We highlight a selection of the more recent work in the field and refer the reader to the much more complete, albeit ageing, survey of Saeedi, 2013Mehdi Saeedi and Igor L. Markov. 2013. Synthesis and optimization of reversible circuits—a survey. ACM Computing Surveys 45, 2 (February 2013, 1--34). doi: 10.1145/2431211.2431220.

Up to 4 (qu)bits, all reversible circuits and their optimal synthesis can be generated by brute force Li, 2014Zhiqiang Li, Hanwu Chen, Xiaoyu Song and Marek Perkowski. 2014. A Synthesis Algorithm for 4-Bit Reversible Logic Circuits with Minimum Quantum Cost. ACM Journal on Emerging Technologies in Computing Systems 11, 3 (December 2014, 1--19). doi: 10.1145/2629542. Viewing reversible circuits as a permutation of all $2^n$ bitstrings, Susam et al. pre-compute optimal circuits only for swaps of two bitstrings (transpositions). These can then be used as part of a standard selection sort to synthesise arbitrary permutations. The number of such permutations scales much more favourably than the number of arbitrary permutations, allowing fast circuit synthesis of up to 20+ (qu)bits in a fraction of a second, with good performance.
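The selection-sort step can be sketched in a few lines of Python. This is only an illustration of the decomposition into transpositions, not the synthesis procedure of the cited work: the lookup of a pre-computed circuit for each transposition is left out. The function names and the encoding of a reversible function as a list `perm`, where `perm[x]` is the image of bitstring `x` read as an integer, are assumptions made for this example.

```python
def transpositions_by_selection_sort(perm):
    """Decompose a permutation of range(len(perm)) into transpositions.

    Returns a list of pairs (a, b). Applying the swaps a <-> b to an input
    value, in list order, reproduces perm (see apply_transpositions).
    """
    current = list(perm)
    position = {value: index for index, value in enumerate(current)}
    swaps = []
    for i in range(len(current)):
        if current[i] == i:
            continue  # entry i is already in place
        j = position[i]  # where the value i currently sits
        swaps.append((i, j))
        displaced = current[i]
        current[i], current[j] = current[j], current[i]
        position[displaced] = j
        position[i] = i
    return swaps


def apply_transpositions(swaps, x):
    """Apply the transpositions to the value x, first swap first."""
    for a, b in swaps:
        if x == a:
            x = b
        elif x == b:
            x = a
    return x


# A Toffoli gate on 3 bits only exchanges the bitstrings 110 and 111.
toffoli = [0, 1, 2, 3, 4, 5, 7, 6]
toffoli_swaps = transpositions_by_selection_sort(toffoli)
```

Selection sort emits at most $2^n - 1$ transpositions for a permutation of $2^n$ bitstrings, so the length of the output of this sketch is bounded by the size of the truth table.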

Truth table or matrix representations of reversible circuits suffer from the same exponential scaling as unitaries. To address these, other representations that have been used include exclusive sums of product terms (ESOP) Fazel, 2007K. Fazel, M. A. Thornton and J. E. Rice. 2007. ESOP-based Toffoli Gate Cascade Generation. In 2007 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, August 2007. IEEE. doi: 10.1109/pacrim.2007.4313212 Bandyo., 2014Chandan Bandyopadhyay, Hafizur Rahaman and Rolf Drechsler. 2014. A Cube Pairing Approach for Synthesis of ESOP-Based Reversible Circuit. In 2014 IEEE 44th International Symposium on Multiple-Valued Logic, May 2014. IEEE, 109--114. doi: 10.1109/ismvl.2014.27, positive polarity Reed-Müller codes (PPRM) Jegier, 2017Jerzy Jegier and Paweł Kerntopf. 2017. PPRM-based approach to synthesis of reversible functions. In Photonics Applications in Astronomy, Communications, Industry, and High Energy Physics Experiments 2017, August 2017. SPIE, 1044523. doi: 10.1117/12.2280943 and decision diagrams Stojko., 2019Suzana Stojković, Radomir Stanković, Claudio Moraga and Milena Stanković. 2019. Reversible Circuits Synthesis from Functional Decision Diagrams by using Node Dependency Matrices. Journal of Circuits, Systems and Computers 29, 05 (August 2019, 2050079). doi: 10.1142/s0218126620500796 Wille, 2010Robert Wille and Rolf Drechsler. 2010. Effect of BDD Optimization on Synthesis of Reversible and Quantum Logic. Electronic Notes in Theoretical Computer Science 253, 6 (March 2010, 57--70). doi: 10.1016/j.entcs.2010.02.006 Pang, 2011Yu Pang, Shaoquan Wang, Zhilong He, Jinzhao Lin, Sayeeda Sultana and Katarzyna Radecka. 2011. Positive Davio-based synthesis algorithm for reversible logic. In 2011 IEEE 29th International Conference on Computer Design (ICCD), October 2011. IEEE, 212--218. doi: 10.1109/iccd.2011.6081399.

The quantum framework is strictly more general than the classical regime in which the problem was studied initially. This affords additional freedom for decomposition schemes, such as decompositions of $\textit{CCX}$ gates on 3 qubits into single and two-qubit gates Shende, 2008Vivek V. Shende and Igor L. Markov. 2008. On the CNOT-cost of TOFFOLI gates. arXiv: 0803.2316 [quant-ph]. Various optimised decompositions for sequences of Toffoli gates have also been similarly developed Scott, 2008Nathan O. Scott and Gerhard W. Dueck. 2008. Pairwise decomposition of toffoli gates in a quantum circuit. In Proceedings of the 18th ACM Great Lakes symposium on VLSI, May 2008. ACM, 231--236. doi: 10.1145/1366110.1366168 Arabza., 2010Mona Arabzadeh, Mehdi Saeedi and Morteza Saheb Zamani. 2010. Rule-based optimization of reversible circuits. In 2010 15th Asia and South Pacific Design Automation Conference (ASP-DAC), January 2010. IEEE, 849--854. doi: 10.1109/aspdac.2010.5419684 Datta, 2013Kamalika Datta, Gaurav Rathi, Robert Wille, Indranil Sengupta, Hafizur Rahaman and Rolf Drechsler. 2013. Exploiting Negative Control Lines in the Optimization of Reversible Circuits Rahman, 2014Md Zamilur Rahman and Jacqueline E. Rice. 2014. Templates for Positive and Negative Control Toffoli Networks Datta, 2015Kamalika Datta, Indranil Sengupta and Hafizur Rahaman. 2015. A Post-Synthesis Optimization Technique for Reversible Circuits Exploiting Negative Control Lines. IEEE Transactions on Computers 64, 4 (April 2015, 1208--1214). doi: 10.1109/tc.2014.2315641 Arpita, 2015Pyreddy Mary Arpita, Kamalika Datta, Rohith Vemula and Indranil Sengupta. 2015. Optimization of reversible circuits using triple-gate templates at quantum gate level. In 2015 International Conference on Electronic Design, Computer Networks & Automated Verification (EDCAV), January 2015. IEEE, 120--124. doi: 10.1109/edcav.2015.7060551 Abdess., 2016Nabila Abdessaied, Matthew Amy, Mathias Soeken and Rolf Drechsler. 2016. Technology Mapping of Reversible Circuits to Clifford+T Quantum Circuits. In 2016 IEEE 46th International Symposium on Multiple-Valued Logic (ISMVL), May 2016. IEEE, 150--155. doi: 10.1109/ismvl.2016.33 Gado, 2021Mariam Gado and Ahmed Younes. 2021. Optimization of Reversible Circuits Using Toffoli Decompositions with Negative Controls. Symmetry 13, 6 (June 2021, 1025). doi: 10.3390/sym13061025. Mohammadi and Eshghi introduced 4-valued truth tables to extend classical circuit synthesis to include $\sqrt{X}$ (also known as $\mathit{V}$) gates Mohamm., 2008Majid Mohammadi and Mohammad Eshghi. 2008. Behavioral description of quantum V and V+ gates to design quantum logic circuits. In 2008 5th International Multi-Conference on Systems, Signals and Devices, July 2008. IEEE, 1--5. doi: 10.1109/ssd.2008.4632850. References Soeken, 2012M. Soeken, Z. Sasanian, R. Wille, D. M. Miller and R. Drechsler. 2012. Optimizing the Mapping of Reversible Circuits to Four-Valued Quantum Gate Circuits. In 2012 IEEE 42nd International Symposium on Multiple-Valued Logic, May 2012. IEEE, 173--178. doi: 10.1109/ismvl.2012.64 as well as Rahman, 2012Md. Mazder Rahman, Gerhard W. Dueck and Anindita Banerjee. 2012. Optimization of Reversible Circuits Using Reconfigured Templates incorporated controlled-$\mathit{V}$ gates into template matching strategies and showed significant improvements in synthesised gate count. Finally, Maslov, 2016Dmitri Maslov. 2016. Advantages of using relative-phase Toffoli gates with an application to multiple control Toffoli optimization. Physical Review A 93, 2 (February 2016, 022311). doi: 10.1103/physreva.93.022311 proposed decomposing Toffolis only up to relative phase, introducing a lot of freedom in the quantum decompositions that are required compared to the traditional classical decompositions.


In summary, a variety of scalable representations – such as phase polynomials, Pauli gadgets, Clifford tableaus, diagrammatic calculi, and reversible circuits – have been developed to abstract computations and enable highly tailored optimisation methods. These approaches leverage the unique structure and symmetries of quantum computations, achieving significant reductions in circuit size, depth, and hardware-specific overheads. Techniques such as phase polynomial synthesis and Clifford tableau representations are widely applicable and are a cornerstone of modern quantum compilers Amy, 2019Matthew Amy. 2019. Formal methods in Quantum Circuit Design. PhD Thesis. University of Waterloo Meijer., 2025Arianne Meijer - van de Griend. 2025. A comparison of quantum compilers using a DAG-based or phase polynomial-based intermediate representation. Journal of Systems and Software 221 (March 2025, 112224). doi: 10.1016/j.jss.2024.112224. Meanwhile, diagrammatic calculi, such as the ZX calculus, provide a flexible and theoretically robust framework for optimisations, often revealing simplifications invisible in the traditional gate-based model.


  1. If intrigued, look at this nice introduction Kottma., 2024Korbinian Kottmann. 2024. Introducing (dynamical) Lie algebras for quantum practitioners. (February 2024). Retrieved on 08/01/2025 from https://pennylane.ai/qml/demos/tutorial_liealgebra and the references therein. It’s not as scary as it sounds. ↩︎

  2. Experimental realisations of many-qubit interactions have also been demonstrated Erhard, 2019Alexander Erhard, Joel J. Wallman, Lukas Postler, Michael Meth, Roman Stricker, Esteban A. Martinez, Philipp Schindler, Thomas Monz, Joseph Emerson and Rainer Blatt. 2019. Characterizing large-scale quantum computers via cycle benchmarking. Nature Communications 10, 1 (November 2019). doi: 10.1038/s41467-019-13068-7 Bluvst., 2022Dolev Bluvstein, Harry Levine, Giulia Semeghini, Tout T. Wang, Sepehr Ebadi, Marcin Kalinowski, Alexander Keesling, Nishad Maskara, Hannes Pichler, Markus Greiner, Vladan Vuletić and Mikhail D. Lukin. 2022. A quantum processor based on coherent transport of entangled atom arrays. Nature 604, 7906 (April 2022, 451--456). doi: 10.1038/s41586-022-04592-6 Arrazo., 2021J. M. Arrazola, V. Bergholm, K. Brádler, T. R. Bromley, M. J. Collins, I. Dhand, A. Fumagalli, T. Gerrits, A. Goussev, L. G. Helt, J. Hundal, T. Isacsson, R. B. Israel, J. Izaac, S. Jahangiri, R. Janik, N. Killoran, S. P. Kumar, J. Lavoie, A. E. Lita, D. H. Mahler, M. Menotti, B. Morrison, S. W. Nam, L. Neuhaus, H. Y. Qi, N. Quesada, A. Repingon and et al. 2021. Quantum circuits with many photons on a programmable nanophotonic chip. Nature 591, 7848 (March 2021, 54--60). doi: 10.1038/s41586-021-03202-1 Evered, 2023Simon J. Evered, Dolev Bluvstein, Marcin Kalinowski, Sepehr Ebadi, Tom Manovitz, Hengyun Zhou, Sophie H. Li, Alexandra A. Geim, Tout T. Wang, Nishad Maskara, Harry Levine, Giulia Semeghini, Markus Greiner, Vladan Vuletić and Mikhail D. Lukin. 2023. High-fidelity parallel entangling gates on a neutral-atom quantum computer. Nature 622, 7982 (October 2023, 268--272). doi: 10.1038/s41586-023-06481-y and are at the core of other proposed architectures Bartol., 2023Sara Bartolucci, Patrick Birchall, Hector Bombín, Hugo Cable, Chris Dawson, Mercedes Gimeno-Segovia, Eric Johnston, Konrad Kieling, Naomi Nickerson, Mihir Pant, Fernando Pastawski, Terry Rudolph and Chris Sparrow. 
2023. Fusion-based quantum computation. Nature Communications 14, 1 (February 2023). doi: 10.1038/s41467-023-36493-1 Bouras., 2021J. Eli Bourassa, Rafael N. Alexander, Michael Vasmer, Ashlesha Patil, Ilan Tzitrin, Takaya Matsuura, Daiqin Su, Ben Q. Baragiola, Saikat Guha, Guillaume Dauphinais, Krishna K. Sabapathy, Nicolas C. Menicucci and Ish Dhand. 2021. Blueprint for a Scalable Photonic Fault-Tolerant Quantum Computer. Quantum 5 (February 2021, 392). doi: 10.22331/q-2021-02-04-392↩︎

  3. Much of quantum error correction theory is built on the Clifford group, a subset of quantum operations that preserve “Pauli errors” and can thus be corrected easily. The flip side of this is that correcting any non-Clifford operation is very hard, something that is resolved by constructing “error-free” magic states ahead of time. For more details, refer to a quantum error correction textbook such as Gottes., 2024Daniel Gottesman. 2024. Surviving as a Quantum Computer in a Classical World. (February 2024). Retrieved on 08/01/2025 (lecture notes) from https://www.cs.umd.edu/class/spring2024/cmsc858G/QECCbook-2024-ch1-15.pdf↩︎

  4. Polynomial-sized quantum circuits constitute a polynomial-dimensional submanifold of the exponential-dimensional $SU(2^n)$ Lie group. They are, hence, a measure zero subset of $SU(2^n)$ with respect to the Haar measure. ↩︎

  5. and totally arbitrary! ↩︎

2.3. Rise of hybrid quantum-classical computation

Quantum measurements #

We have, until now, skipped over a crucial part of the quantum computation process: the role of quantum measurements. Quantum data, in isolation, is inherently inaccessible to us and the broader macroscopic world. A result from a quantum computation is only of value if we can probe it and get some readout value that we can display to the user or return to whoever launched the quantum computation.

Quantum physics measurements fundamentally differ from our classical understanding of just “reading out” data that is already there. This is the famous Schrödinger’s cat thought experiment of quantum mechanics: what data is within the qubits remains undefined until a measurement is performed. The act of observation will transform the quantum data: looking inside the box will, at random, either kill the cat or spare it1.

We thus need to add the measurement operation as a special case to our computer scientist’s model of quantum computing. Unlike purely quantum operations, measurements inherently involve interaction with the environment to produce a readout. Consequently, the no-delete and reversibility principles discussed earlier do not apply. Indeed, measurement is a lossy (and therefore irreversible) operation that projects the quantum state into one of a small subset of classical states. Which state the quantum state is projected into is non-deterministic. If one has access to an infinite supply of the same quantum state, then the whole state can be reconstructed by repeating measurements and analysing the distribution of outcomes2. Given no-cloning, however, this is unlikely to be the case, and so the full quantum result is hardly ever known. Instead, we must rely on well-designed measurement schemes to extract useful information from our partial access to the quantum states.

We model measurement as an operation that takes one qubit and outputs one purely classical bit3. In the circuit formalism, measurements are often implicitly added at the end of every qubit. If we wish to make them explicit or add them elsewhere in the computation, we must introduce a graphical representation for the classical bit of data the measurement produces. The field has adopted the double-wire for this, even though a “half” wire would arguably have been more appropriate to reflect the reduced information content relative to quantum wires. I present to you the measurement box:

Measurements as first-class citizens #

It is very tempting to our feeble classical brains – and admittedly, we just did it ourselves in the previous paragraphs – to view measurements as merely a readout operation, an auxiliary operation that we are forced to perform at the end of a computation for operational reasons. This could not be further from the truth! In many ways, measurements are tools just as powerful as any other quantum operation – if not more so!

One eye-opening perspective on this is the field of measurement-based quantum computing (MBQC). Raussendorf and Briegel showed indeed Rausse., 2001Robert Raussendorf and Hans J. Briegel. 2001. A One-Way Quantum Computer. Physical Review Letters 86, 22 (May 2001, 5188--5191). doi: 10.1103/PhysRevLett.86.5188 that arbitrary quantum computations can be reproduced in the MBQC framework using only some resource quantum states that can be prepared ahead of time and measurements! In other words, given entangled qubits, measurements are all you need to perform quantum operations.

We will not explore MBQC further in this chapter (nor in this thesis, for that matter). Instead, we will use this as a motivation to explore what we can achieve with measurements. We have so far spared you from any mathematical alphabet soup. As we start discussing more concrete constructions of quantum computations, some introductory linear algebra and conventions around notation will become unavoidable.

Dirac formalism. Quantum states are nearly unanimously written using kets: instead of referring to a quantum state as $\psi$, we write it wrapped in special brackets as $\ket{\psi}$. This notation is also used when referring to the $0$ and $1$ states of qubits, written $\ket{0}$ and $\ket{1}$.

Several states can be joined and considered together as one overall state. This is expressed using the tensor symbol $\otimes$: $\ket{\psi_1} \otimes \ket{\psi_2}$ is the joint system of $\ket{\psi_1}$ and $\ket{\psi_2}$. When the states in question are all explicitly qubit states, we use the shorthand binary notation $\ket{0} \otimes \ket{1} \otimes \ket{0} = \ket{010}$.

We will introduce more notation along the way.
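For readers who prefer code to brackets, the tensor notation can be mirrored numerically. The sketch below assumes the usual convention that a ket is a vector of amplitudes and that $\otimes$ is the Kronecker product; the helper names `tensor` and `basis_ket` are made up for this example.

```python
import numpy as np

# The two basis states of a single qubit as amplitude vectors.
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)


def tensor(*states):
    """Join several states into one overall state (Kronecker product)."""
    joint = np.array([1], dtype=complex)
    for state in states:
        joint = np.kron(joint, state)
    return joint


def basis_ket(bits):
    """Basis state labelled by a bitstring, e.g. basis_ket("010")."""
    vec = np.zeros(2 ** len(bits), dtype=complex)
    vec[int(bits, 2)] = 1
    return vec


# |0> (x) |1> (x) |0> is the 8-dimensional vector with a single 1 at index 0b010.
ket010 = tensor(ket0, ket1, ket0)
```

The dimension doubles with every qubit that is joined, which is why a state on $n$ qubits is described by $2^n$ amplitudes.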

With this out of the way, let us look in more detail at the first smart use of measurements: the block-encoding technique. Consider the following scenario: you would like to perform an operation $A$ on an arbitrary quantum state $\ket{\psi}$. Now, there are, unfortunately, many cases where implementing $A$ as a quantum circuit made of primitive gates that can be executed on hardware is very expensive4.

However, what we can always do is express $A$ as a matrix of dimensions $2^n \times 2^n$, where $n$ is the number of qubits in the state $\ket{\psi}$. Then, there is a neat trick that we can sometimes apply: instead of trying to execute $A$, we enlarge the matrix to a bigger $\tilde{A}$:

$$\tilde{A} = \begin{pmatrix}A & G_2\\ G_1 & G_3 \end{pmatrix}$$

where $G_1$, $G_2$ and $G_3$ are “garbage” matrices that we do not care about, but should combine into a matrix $\tilde{A}$ that we know how to execute on a quantum computer. Quantum computations must be matrices with a row and column number that is a power of two; so at a minimum, $\tilde{A}$ must be of size $2^{n+1} \times 2^{n+1}$, i.e. be a computation on $n + 1$ qubits.

We restrict our considerations in the following to the case of $m = n + 1$ qubits – other cases are similar. We thus need to add a qubit to our $\ket\psi$ state to be able to pass it to our new operation. Such qubits that are added temporarily to facilitate a computation are a recurring feature in quantum computing and have thus earned themselves a name – ancilla qubits.

Let us take a look at the quantum states that result from executing $\tilde{A}$. If we add to $\ket \psi$ a $\ket 0$ ancilla state, our quantum operation acts as5

$$\tilde{A} (\ket 0 \otimes \ket \psi) = \ket 0 \otimes A \ket \psi + \ket 1 \otimes G_1 \ket \psi.$$

The expression $A \ket \psi$ means the operation $A$ applied to $\ket \psi$ – exactly the output state we are seeking. If we input the ancilla qubit in state $\ket 1$, we get garbage:

$$\tilde{A} (\ket 1 \otimes \ket \psi) = \ket 0 \otimes G_2 \ket \psi + \ket 1 \otimes G_3 \ket \psi.$$

So $\ket 0 \otimes \ket \psi$ is definitely the input state we are more interested in.

How can we recover $A$ from (3)? This is precisely what measurements do! When quantum states are expressed as a sum of states, the terms of the sum form the possible measurement outcomes6. If we only measure a subset of the qubits, then the term corresponding to that measurement is isolated and all other terms disappear. Hence, if we measure the first qubit (that we introduced ourselves) in the zero state, then the remaining $n$ qubits will be precisely in the desired state $A \ket \psi$. Success!

Using this “term isolating” property of measurements, known as state collapse, we can thus effect computations that would have been otherwise difficult or impossible to perform. There is however one important wrinkle that we cannot forget about: measurements are non-deterministic! We cannot assume that all measurements of the ancilla qubit will return the zero state. When $\ket 1$ is measured on the ancilla, the remaining qubits are left in the $G_1 \ket\psi$ state. The computation has thus failed, and the execution must be aborted and restarted. How often the block-encoding protocol that we have presented fails depends on the details of $A$ and the choices of $G_1$, $G_2$ and $G_3$; this failure rate is the main disadvantage of an otherwise very powerful quantum technique.
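The protocol can be tried out on a small example. The sketch below makes several assumptions that are not part of the text: $A$ is taken to be Hermitian with eigenvalues in $[-1, 1]$, and the garbage blocks are fixed by one textbook choice that makes the enlarged matrix unitary, namely $\sqrt{I - A^2}$ off the diagonal and $-A$ in the bottom-right corner. The function names are hypothetical, and the simulation works with normalised states.

```python
import numpy as np


def block_encode(A):
    """Return the unitary [[A, S], [S, -A]] with S = sqrt(I - A^2).

    Assumes A is Hermitian with all eigenvalues in [-1, 1].
    """
    eigvals, eigvecs = np.linalg.eigh(A)
    s = np.sqrt(np.clip(1.0 - eigvals**2, 0.0, None))
    S = (eigvecs * s) @ eigvecs.conj().T
    return np.block([[A, S], [S, -A]])


def measure_ancilla(state, rng):
    """Measure the first qubit; return (outcome, normalised remaining state)."""
    half = len(state) // 2
    branches = (state[:half], state[half:])
    probs = np.array([np.vdot(b, b).real for b in branches])
    outcome = int(rng.choice(2, p=probs / probs.sum()))
    return outcome, branches[outcome] / np.sqrt(probs[outcome])


def apply_with_retries(A, psi, rng, max_tries=1000):
    """Repeat the block-encoding protocol until the ancilla reads 0."""
    A_tilde = block_encode(A)
    ket0 = np.array([1.0, 0.0])
    for tries in range(1, max_tries + 1):
        # On hardware every attempt needs a freshly prepared |psi>;
        # the simulator can simply reuse the stored amplitudes.
        outcome, rest = measure_ancilla(A_tilde @ np.kron(ket0, psi), rng)
        if outcome == 0:
            return rest, tries
    raise RuntimeError("ancilla never measured 0")


A = np.array([[0.6, 0.2], [0.2, 0.3]])
psi = np.array([0.6, 0.8])
result, tries = apply_with_retries(A, psi, np.random.default_rng(0))
```

In this sketch the probability of measuring 0 on the ancilla is the squared norm of $A\ket\psi$, so an operation $A$ that shrinks the input state strongly leads to many restarts.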

We will now explore two strategies to deal with “fails” in measurements. At the core of them is the idea of hybrid quantum-classical programs.

Who said quantum computers could not fix their mistakes #

Failed computations are an expensive mistake in quantum computing as the no-cloning theorem prevents us from keeping a “backup” of the initial state. The fact that failures are in fact unlucky measurement outputs makes matters worse, given that measurements are the only irreversible quantum operation. It is therefore impossible in general to recover from a “wrong” measurement.

There are, however, prominent cases in which the computation can be corrected based on the measurement outcome, thus yielding deterministic results. Recall equation (3) of the previous section: there is a computation $A$ on $n$ qubits that can be probabilistically computed on $m = n + 1$ qubits using $\tilde{A}$:

$$\tilde{A} (\ket 0 \otimes \ket \psi) = \ket 0 \otimes (A\ket\psi) + \ket 1 \otimes (G\ket\psi),$$

for some “garbage” $G$. What if $G$ is a reversible operation, i.e. there is an operation $G^{-1}$ to undo $G$? Well then, we can still, at least in theory, recover $A\ket \psi$ by applying $A \circ G^{-1}$:

$$G \ket \psi \mapsto (A \circ G^{-1}) \circ G \ket \psi = A \ket \psi,$$

but only if 1 was measured on the ancilla qubit7!

This is the beginning of quantum-classical hybrid computing: we start by performing quantum operations followed by measurements, the outcomes of which dictate what further quantum operations must be applied. We define for this purpose a classically controlled gate: a quantum operation that is only executed if a certain classical bit (the condition) is set. This bit will typically be a value derived from a previous measurement: it could be as simple as the outcome that a previous measurement yielded, or a function of multiple past outcomes that must be evaluated on classical hardware (e.g. a CPU).
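A classically controlled correction is easy to emulate. The sketch below is an illustration under stated assumptions: the operations $A = H$ (Hadamard) and $G = X$ are picked arbitrarily, the two branches are weighted by $\cos\theta$ and $\sin\theta$ so that the joint state is normalised, and the function names are hypothetical.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)


def classically_controlled(gate, bit, state):
    """Apply `gate` to `state` only if the classical `bit` is set."""
    return gate @ state if bit == 1 else state


def run_with_correction(A, G, psi, rng, theta=np.pi / 3):
    """One probabilistic attempt followed by a measurement-dependent fix."""
    # Branches after the enlarged operation: ancilla 0 carries A|psi>,
    # ancilla 1 carries G|psi> (weights cos/sin are an assumption).
    branches = (np.cos(theta) * (A @ psi), np.sin(theta) * (G @ psi))
    probs = np.array([np.vdot(b, b).real for b in branches])
    bit = int(rng.choice(2, p=probs / probs.sum()))
    state = branches[bit] / np.sqrt(probs[bit])
    # The correction A . G^{-1} is executed only when the ancilla read 1.
    correction = A @ np.linalg.inv(G)
    return bit, classically_controlled(correction, bit, state)


psi = np.array([0.6, 0.8j])
bit, output = run_with_correction(H, X, psi, np.random.default_rng(1))
```

Whichever value `bit` takes, `output` equals $A\ket\psi$: the measurement stays random, but the overall computation has become deterministic.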

Mixing classical and quantum operations is a sure way to bring the quantum circuit representation to its knees. We adopt the following representation, in which a quantum gate that has an additional classical bit wire attached to it represents a classically controlled operation that is only executed if the bit value is 1.

Quantum Teleportation #

Quantum teleportation is a simple example of performing classically controlled quantum operations to do circuit corrections based on measurement outcomes. It is also coincidentally one of the most fundamental protocols of quantum theory. Its name is slightly misleading. Think of it as data transfer for quantum data, with a mind-bending twist: at the time of the transfer, only classical data must be communicated between the sending and receiving parties. As a result of this protocol, quantum information can be transferred using plain old-school copper wires (or any other classical communication channels)!

This is predicated on one crucial action being performed before the start of the communication. For every qubit that should be transmitted, the parties must beforehand create and share among themselves a pair of qubits that will serve as the quantum resource during the protocol execution. This resource state is widespread enough that it has earned its own name: the Bell pair state. It is written in Dirac notation as $\ket{00} + \ket{11}$. As the notation indicates, it is a state with perfectly correlated measurements: when measured, the two qubits will always yield the same outcome, either both 0 or both 1.

There turns out to be a straightforward circuit that maps the two-qubit state $\ket {00}$, which every two-qubit computation starts in, into the Bell pair state:

It is enough for us to think of it as a black box – or a grey box in this case.
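For the curious, the box can be opened numerically. The sketch assumes the common textbook preparation circuit, a Hadamard gate on the first qubit followed by a CNOT from the first to the second qubit, and it keeps the normalisation factor $1/\sqrt{2}$ that the notation above leaves out.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
# CNOT with the first qubit as control and the second as target.
CNOT = np.array(
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex
)

# Assumed Bell-pair preparation: Hadamard on qubit 1, then CNOT.
BELL = CNOT @ np.kron(H, I2)

ket00 = np.array([1, 0, 0, 0], dtype=complex)
bell_pair = BELL @ ket00  # (|00> + |11>) / sqrt(2)

# Probability of each two-bit measurement outcome.
outcome_probs = {format(k, "02b"): float(abs(bell_pair[k]) ** 2) for k in range(4)}
```

Only the outcomes 00 and 11 carry non-zero probability, which is the perfect correlation described above.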

We are interested in “teleporting” an arbitrary, single-qubit quantum state. Such a state can always be expressed as $\ket \psi = \alpha \ket 0 + \beta \ket 1$, i.e. in the most general case, a one-qubit state will be in some superposition of the states $\ket 0$ and $\ket 1$. The parameters $\alpha$ and $\beta$ are complex coefficients that encode the probabilities of measuring 0 or 1 – we can view them as the weights of a weighted sum.

We are now interested in combining a Bell resource state in a joint system with the arbitrary state $\ket \psi$. The resulting three-qubit state is obtained with the $\otimes$ operation, which distributes over sums just like usual multiplication:

$$\begin{aligned} &\underbrace{(\ket {00} + \ket {11})}_{\text{first two qubits}} \otimes \underbrace{(\alpha \ket 0 + \beta \ket 1)}_{\text{third qubit}}\\=\ &\alpha \ket {000} + \alpha \ket {110} + \beta \ket {001} + \beta \ket {111}.\end{aligned}$$

We chose to place the Bell pair on the first two qubits and the arbitrary state on the third. The goal is to move the data that sits on that last qubit to the first qubit. Looking at the first qubit in the above expression, notice that the desired state $\ket \psi = \alpha \ket 0 + \beta \ket 1$ appears in the first qubit if we can discard the second and third terms:

$$\alpha \ket{\underline{\mathbf{0}}00} + {\color{gray}(\alpha \ket {110} + \beta \ket {001})} + \beta \ket{\underline{\mathbf{1}}11}$$

This sounds very much like the measurement operations we have used before to isolate terms – but we need to isolate two terms simultaneously. We can resolve this issue by reorganising the expression8 (up to an overall normalisation factor)

$$\begin{aligned}&\alpha \ket{000} + \alpha \ket{110} + \beta \ket {001} + \beta \ket {111}\\ =\ &\underbrace{(\alpha \ket 0 + \beta \ket 1)}_{= \ket \psi}\otimes \underbrace{( \ket {00} + \ket {11})}_{\text{Bell pair}}\\&+(\beta \ket 0 + \alpha \ket 1) \otimes (\ket {01} + \ket {10})\\&+(\alpha\ket 0 - \beta \ket 1) \otimes (\ket{00} - \ket {11})\\&+(\beta \ket 0 - \alpha \ket 1) \otimes (\ket {01} - \ket {10})\end{aligned}$$

Obtaining the $\ket \psi$ state on the first qubit is thus as simple as isolating the first of these four terms. We do not know a priori how to measure $\ket {00} + \ket{11}$ but we do know how to map that state to $\ket {00}$: that’s the inverse of the Bell pair state preparation circuit! This results in the following circuit:

This brings us to the same situation as we had for the block encoding application above: conditioned on the measurement outcome of the second and third qubits being 0, the computation performs a state “teleportation”, moving $\ket \psi$ from the third to the first qubit. We can compute the effect of $\textit{Bell}^{-1}$ on the overall expression of (4) to find all possible output states:

$$\begin{aligned}\alpha \ket {000} + \alpha \ket {110} + \beta \ket {001} + \beta \ket {111} \overset{\textit{Bell}^{-1}}{\mapsto}&(\alpha \ket 0 + \beta \ket 1) \otimes \ket {00} \\&+ (\beta \ket 0 + \alpha \ket 1) \otimes \ket {01} \\& + (\alpha \ket 0 - \beta \ket 1) \otimes \ket {10} \\&+ (\beta \ket 0 - \alpha \ket 1) \otimes \ket {11}\end{aligned}$$

As expected, we do get $\ket \psi$ on the first qubit for the measurement 00 (corresponding to the state $\ket{00}$), but as it stands, this only has a $\frac{1}{4}$ probability of success.

You might notice, however, that the other states in which the first qubit can end up look remarkably similar, up to some sign flips and swaps $\ket 0 \leftrightarrow \ket 1$. In particular, all states still have the amplitudes $\alpha$ and $\beta$ somewhere, so it does not seem unfathomable that these “wrong” states can be mapped back to $\ket \psi$.

We can use the measurement outcomes of the second and third qubit to infer which of the “mistakes” occurred, and hence what state the first qubit has ended in. The 01 measurement outcome, for instance, results in the $\beta \ket 0 + \alpha \ket 1$ state – this is just a bit flip away from $\ket \psi$! The gate that performs this bit flip is known as $X$. Its colleague the $Z$ gate, on the other hand, leaves $\ket 0$ states untouched but flips the sign of $\ket 1$. This would fix the 10 outcome. Finally, 11 requires both a $Z$ and an $X$ correction.

Putting these observations together, we can leverage classically controlled operations to obtain a fully deterministic protocol! The correct circuit implementing quantum teleportation is given by

In the scenario where a first party (Alice) wants to send a one-qubit quantum state to Bob, they can achieve that by creating a Bell pair state, the first qubit of which is given to Bob and the second to Alice. When Alice then comes into possession of another qubit $\ket \psi$ whose data she wants to transmit to Bob, she can do so by executing $\textit{Bell}^{-1}$, measuring her two qubits and communicating the (classical) measurement outcomes to Bob. Bob can then perform the necessary corrections and will end up with the state $\ket \psi$.
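The protocol can be checked numerically with a small state-vector simulation. The following is a minimal sketch in plain Python, not part of the original text: the helper names (`apply_1q`, `apply_cnot`, `teleport`) are hypothetical, qubits 0, 1 and 2 in the code stand for the first, second and third qubits of the text, and the Bell pair preparation is assumed to be a Hadamard followed by a CNOT.

```python
import math

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]   # Hadamard
X = [[0, 1], [1, 0]]    # bit flip
Z = [[1, 0], [0, -1]]   # sign flip on |1>

def apply_1q(state, gate, q, n=3):
    """Apply a 2x2 gate to qubit q (0 = leftmost) of an n-qubit state vector."""
    shift = n - 1 - q
    new = [0j] * len(state)
    for i, amp in enumerate(state):
        bit = (i >> shift) & 1
        for out in (0, 1):
            j = (i & ~(1 << shift)) | (out << shift)
            new[j] += gate[out][bit] * amp
    return new

def apply_cnot(state, control, target, n=3):
    """Flip the target qubit on every basis state whose control qubit is 1."""
    new = [0j] * len(state)
    for i, amp in enumerate(state):
        j = i ^ (1 << (n - 1 - target)) if (i >> (n - 1 - control)) & 1 else i
        new[j] += amp
    return new

def teleport(alpha, beta):
    """Return {(m2, m3): (probability, corrected state of qubit 0)}."""
    # Start in |00> (x) |psi>, with |psi> = alpha|0> + beta|1> on qubit 2.
    state = [0j] * 8
    state[0b000], state[0b001] = alpha, beta
    # Bell pair preparation on qubits 0 and 1 (assumed: H then CNOT).
    state = apply_cnot(apply_1q(state, H, 0), 0, 1)
    # Bell^{-1} on qubits 1 and 2: the same gates in reverse order.
    state = apply_1q(apply_cnot(state, 1, 2), H, 1)
    results = {}
    for m in range(4):                      # m encodes the outcomes (m2, m3)
        branch = [state[m], state[4 + m]]   # amplitudes of qubit 0 for outcome m
        prob = sum(abs(a) ** 2 for a in branch)
        bob = [a / math.sqrt(prob) for a in branch]
        m2, m3 = divmod(m, 2)
        if m2:                              # outcomes 10 and 11: Z correction
            bob = apply_1q(bob, Z, 0, n=1)
        if m3:                              # outcomes 01 and 11: X correction
            bob = apply_1q(bob, X, 0, n=1)
        results[(m2, m3)] = (prob, bob)
    return results
```

Calling `teleport(0.6, 0.8j)` gives four outcomes, each with probability 1/4, and in every branch the corrected first qubit carries the amplitudes of the input state. Applying the Z correction before the X correction in the 11 branch avoids a global sign, which would be physically irrelevant anyway.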

It is beautiful and often overlooked how one of the most fundamental protocols of quantum information theory is, in fact, a hybrid classical-quantum computation. Quantum teleportation without classical communication is physically impossible: it would let Alice communicate with Bob instantly, even though he could be light years away – in other words, it would fundamentally break relativity.

Repeat until success: If you fail, retry!

Classical computer science has a straightforward solution whenever probabilistic computations that can fail are used: probability amplification or boosting Scheid., 2018Christian Scheideler. 2018. Probability amplification. Retrieved lecture notes, online, visited 30/12/2024 from https://cs.uni-paderborn.de/fileadmin-eim/informatik/fg/ti/Lehre/SS_2018/AA/lecture_5.pdf. The idea is so simple that it barely deserves a name: execute several independent runs of the computation and choose the most common outcome. If the probability of failure is below a certain threshold (e.g. 50% for a binary output), then with basic statistics, one can extrapolate the number of runs required to obtain any desired accuracy⁹.
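To make the "basic statistics" concrete, here is a small sketch for a binary output decided by majority vote. It is an illustration under stated assumptions rather than anything from the cited notes: the function names are hypothetical, runs are assumed independent, and the bound used is the standard Hoeffding inequality, $\Pr[\text{majority wrong}] \le \exp(-2n(\tfrac12 - p)^2)$ for a per-run failure probability $p < \tfrac12$.

```python
import math

def runs_needed(p_fail, delta):
    """Number of runs after which Hoeffding's bound guarantees that the
    majority vote is wrong with probability at most delta."""
    if not 0 <= p_fail < 0.5:
        raise ValueError("majority voting needs a failure probability below 1/2")
    gap = 0.5 - p_fail
    return math.ceil(math.log(1 / delta) / (2 * gap ** 2))

def majority_failure(p_fail, runs):
    """Exact probability that at least half of the runs fail (ties count as failure)."""
    return sum(
        math.comb(runs, k) * p_fail ** k * (1 - p_fail) ** (runs - k)
        for k in range(math.ceil(runs / 2), runs + 1)
    )
```

For instance, `runs_needed(0.25, 1e-3)` evaluates to 56. The exact failure probability returned by `majority_failure(0.25, 56)` is smaller still, since Hoeffding's inequality is only an upper bound.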

We have been ignoring this approach so far since no-cloning prohibits us from repeating a procedure more than once on an input state $\ket\psi$. However, in the scenario that the computation should only be executed on a specific, known input state and the computation that prepares that state is known, we can recover from computation failures by just preparing a new state identically.

Suppose we know how to execute the quantum computation $P$ mapping $\ket{0\cdots 0} \mapsto P \ket{0\cdots 0} = \ket \psi$.

As before, we would like to compute $A$ given an implementation of the computation $\tilde{A}$ that acts on an $n$-qubit state $\ket\psi$ and an ancilla qubit in the $\ket 0$ state. If the measurement of $\tilde{A}(\ket 0 \otimes \ket \psi)$ returns 1, then the computation failed. We can then discard all qubits and restart from the $\ket 0$ state, applying $P$ followed by $\tilde{A}$ and an ancilla measurement, repeating until we measure 0. As a pseudo-quantum circuit, we could express this as:

psi_qs = create_qubits(n)
while True:
    ancilla_q = create_qubit()
    obtain measurement m from:
        [circuit: apply P to psi_qs, then Ã to ancilla_q and psi_qs, and measure ancilla_q]

    if m == 0:
        break  # success! we can exit loop and proceed
    else:
        reset_qubits(psi_qs)

At each iteration, we can either exit the loop if the state collapse was successful (m == 0), or reset the qubits to zero and try again. But pseudo circuits do not run on hardware! The only way to express this computation as an actual circuit is to unroll the loop, i.e. repeat the block within the loop as many times as we expect might be necessary¹⁰. The first two iterations would look as follows:

It should be obvious why we haven’t unrolled the loop any further – it quickly becomes unwieldy. The resulting program is not only hard to display and read, but it also suffers from fundamental issues in practice. For one, the program size becomes hugely bloated, and beyond slowing down the compiler, it will also cause a host of issues on the control hardware in real-time, such as long load times, inefficient execution, and low cache efficiency.

Even more worryingly, when picking the maximum number of iterations, we face an impossible tradeoff: if the number of iterations is small, then the probability of failure will remain non-negligible. As we scale this value up, however, we are introducing more and more gates into the program to cover the rare case of many successive failures. Although most computation runs will never execute these gates, they still come at a significant cost to the runtime: for each gate listed in the circuit, the condition for the gate’s execution must be checked, whether or not the gate ends up being executed. Furthermore, hardware schedulers might be forced to be pessimistic and schedule a time window for all conditional operations ahead of time. This will significantly delay any operation to be performed after the loop.
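The tradeoff can be quantified with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and each loop iteration is assumed to fail independently with the same probability $p$. An unrolled program with $M$ iterations then fails with probability $p^M$, whereas a true loop executes $1/(1-p)$ iterations on average.

```python
import math
import random

def unroll_depth(p_fail, target):
    """Smallest M such that M unrolled iterations fail with probability <= target."""
    return math.ceil(math.log(target) / math.log(p_fail))

def expected_iterations(p_fail):
    """Mean number of iterations executed by a true repeat-until-success loop."""
    return 1 / (1 - p_fail)

def run_until_success(p_fail, rng):
    """Simulate one repeat-until-success loop; return the iterations used."""
    iterations = 1
    while rng.random() < p_fail:  # this iteration failed, so try again
        iterations += 1
    return iterations
```

With $p = 0.5$ and a target failure probability of $10^{-6}$, `unroll_depth(0.5, 1e-6)` returns 20, while `expected_iterations(0.5)` is 2. Under these assumptions the unrolled circuit carries ten times more loop bodies than a typical run ever executes.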

We, therefore, argue that the quantum circuit model is ill-suited as the representation for quantum programs that combine classical and quantum data. Such programs, however, are a fundamental building block towards developing meaningful large-scale quantum computations and are bound to become the norm. Beyond the examples discussed above – including block-encodings, repeat-until-success schemes, distributed quantum computing and measurement-based quantum computing – one application of hybrid quantum-classical operations stands out as critically important for the large-scale deployment of quantum computing: quantum error correction (QEC) schemes. We discuss this use case in the next section.


  1. It is ironic that Schrödinger’s thought experiment Schrö., 1935Erwin Schrödinger. 1935. Die gegenwärtige Situation in der Quantenmechanik. Naturwissenschaften, intended to highlight the absurdity of quantum mechanics, has become the field’s most famous PR campaign. Sorry to disappoint – you won’t find felines occupying multiple states of existence (though qubits do!) ↩︎

  2. This is known as state tomography Allahv., 2004A. E. Allahverdyan, R. Balian and Th. M. Nieuwenhuizen. 2004. Determining a Quantum State by Means of a Single Apparatus. Physical Review Letters 92, 12 (March 2004, 120402). doi: 10.1103/PhysRevLett.92.120402. One must perform measurements in multiple bases, i.e., different choices of classical states to project to. ↩︎

  3. Where did the qubit go? All the information in a qubit post-measurement is also contained in the classical bit of output data – it is, therefore, redundant and renders the qubit useless. In our model, we, therefore, bundle measurement and qubit discard into one operation. ↩︎

  4. or outright impossible, in cases where $A$ is not a unitary linear operation, for example. ↩︎

  5. This is obtained by a simple matrix multiplication. The vector representation of the quantum state $\ket 0 \otimes \ket \psi$ is obtained using the Kronecker product. You can also just trust me that this works out this way. ↩︎

  6. This is simplifying slightly. There is a necessary condition for this to be a valid measurement: the states in the sum must form a measurement basis, i.e. they must be orthogonal. This is satisfied here. ↩︎

  7. Notice that, informally, we would hope to get a computation $G$ such that $G \approx A$ in the sense that it should somehow be closely related to $A$. This way, the resulting correction $A \circ G^{-1} \approx Id$ would be close to the identity, and would be cheap to compute. ↩︎

  8. Apologies, it seems at this point that we are conjuring up a complex expression out of nowhere. It is in fact just a change of basis – plain old linear algebra. The formula can be obtained easily by writing out the basis change matrix. ↩︎

  9. This is fiendishly effective: the Hoeffding bounds guarantee that the probability of success will converge to 1 exponentially with the number of runs. ↩︎

  10. In other words, we must pick a constant $M$ for the maximum number of times we expect the loop to be executed. If a single loop iteration has a failure probability of $p$, the failure probability of the program with $M$ unrolled iterations is then $p^M$. ↩︎

2.4. Quantum compilers cannot do it alone

We have (hopefully!) by now convinced our readership that quantum programs must interface with our established classical infrastructure and should rather be understood as an interleaved execution of both classical and quantum operations. The obvious question that thus poses itself is

How do we equip quantum compilers to deal with classical operations?

The simplest solution is to adopt the extended quantum circuit formalism with support for classically controlled operations, as we have introduced in the previous section. Using this representation, the basic types available for computation are the qubit and the classical bit. We can also, at that point, introduce purely classical operations on bits, for instance, to compute boolean logic on measurement outcomes, such as “if both the first AND the second measurement outcomes are 1, then …”.

However, the circuit model is inherently designed with the no-cloning principle in mind: specifically, with the assumption that at any one time, there are exactly $n$ (for some fixed value of $n$) resources available for computation. This, for example, means that in the following program

in which two measurements write to the same classical bit, it would be impossible to append a gate controlled on the first measurement outcome after the $Z$ gate, as that value was overwritten on the classical wire by the second measurement. The solution could be to introduce¹ a new, fresh classical wire for each measurement and avoid overwriting outcomes. However, there are also many other ways to break this wires-based representation: suppose you have an operation with one input and two outputs, such as a copy operation $x \mapsto (x,x)$. We would need two wires for the output, but the input would only provide us with one… We now have to start creating additional wires ahead of time for this purpose and solve memory allocation problems to decide which wire should be given to which operation.
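Giving every write a fresh wire is what classical compilers know as renaming into single-assignment form. The following sketch shows the idea for a straight-line program. The encoding of operations as `(name, reads, writes)` tuples and the function name `fresh_wires` are assumptions made for this illustration, not a representation used elsewhere in this thesis.

```python
def fresh_wires(ops):
    """Rename classical wires so that every write targets a fresh wire.

    `ops` is a list of (name, reads, writes) tuples over classical wire
    names. Each write creates a new version `wire.k`; each read refers to
    the latest version written so far. Reading a wire that was never
    written raises a KeyError.
    """
    version = {}
    renamed = []
    for name, reads, writes in ops:
        new_reads = [f"{w}.{version[w]}" for w in reads]
        new_writes = []
        for w in writes:
            version[w] = version.get(w, -1) + 1
            new_writes.append(f"{w}.{version[w]}")
        renamed.append((name, new_reads, new_writes))
    return renamed

program = [
    ("measure q0", [], ["c"]),   # first measurement writes to c
    ("measure q1", [], ["c"]),   # second measurement overwrites c
    ("Z if c", ["c"], []),       # can only see the second outcome
]
renamed = fresh_wires(program)
```

After renaming, the two measurements write to `c.0` and `c.1` respectively, so a later gate can be conditioned on either outcome. The price is that the number of wires is no longer fixed ahead of time, which is exactly the allocation problem described above.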

These are run-of-the-mill classical compiler problems! One might at first hope that the set of overlapping problems between classical and quantum compilers is manageably small. After all, in all the use cases we have covered so far, the amount of classical computation was very minimal, limiting itself to conditionals and loops based on simple boolean expressions. Surely the full-blown powers of a classical compiler are not required!

Unfortunately (and as usual), scientists have shown no lack of imagination in this field – and so have found very compelling use cases for complex classical computations within quantum programs. To drive this point home, let us consider the concrete example of quantum error correction.

The quantum error correction use case

Error-correcting protocols do as their name suggests: they detect whenever data is subjected to errors and thus modified in an unexpected way. They then attempt to recover the intended valid state. In the classical world, such schemes are employed whenever the hardware is not reliable enough: this is hardly the case for computations themselves but is widespread in communications (e.g. within the TCP/IP protocol for the internet Eddy, 2022Wesley Eddy. 2022. Transmission Control Protocol (TCP). (August 2022). Retrieved as RFC 9293 from https://www.rfc-editor.org/info/rfc9293) or for memory and storage in critical applications.

No one expects to be able to manipulate matter-based qubits without introducing errors for a very long time. Photons, on the other hand, are prone to data losses through absorption and can only be entangled using complex and noisy schemes such as the Knill–Laflamme–Milburn protocol Knill, 2001E. Knill, R. Laflamme and G. J. Milburn. 2001. A scheme for efficient quantum computation with linear optics. Nature 409, 6816 (January 2001, 46--52). doi: 10.1038/35051009.² Simply put, it is safe to assume that error correction will be found everywhere – as soon as our quantum computers manage to implement such protocols.

A sketch of quantum error correction goes roughly as follows: the data that would be stored on $k$ qubits is instead encoded in a redundant way on a larger number $n > k$ of qubits. Thus, when errors occur on a subset of the $n$ qubits, the data can be restored using the qubits that have not been corrupted. Before errors can be corrected, they must be detected. To this end, we first add fresh ancilla qubits to the program. Through smartly designed interactions with the data qubits, the ancilla qubits pick up the errors from the data. When we subsequently perform measurements on the ancilla qubits, these errors result in modified outcomes, called the error syndrome.

The challenging bit comes next: from a run of syndrome measurements, one must infer the most likely errors – a step known as syndrome decoding. This is a purely classical maximum likelihood problem that requires a non-trivial amount of computations to resolve. For small problem instances, all possible input syndromes can be tabled, and the outputs precomputed – in which case the problem at runtime is reduced to fast table lookups. However, the higher the fault tolerance we require, the more qubits must be used in the encodings, and so invariably, the problem quickly becomes very demanding computationally.
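As an illustration of the table lookup approach, consider the classical three-bit repetition code, a simplified stand-in for a quantum code in which only bit flips occur. The sketch below rests on assumptions that are not part of the text above: the function names are hypothetical, bit flips are taken to be independent and to occur with probability below 1/2 (so that lower-weight errors are more likely), and the syndrome is the pair of parities of neighbouring bits.

```python
from itertools import product

def syndrome(bits):
    """Parities of neighbouring bits; (0, 0) means no error was detected."""
    return (bits[0] ^ bits[1], bits[1] ^ bits[2])

def build_table():
    """Precompute the most likely error for every possible syndrome."""
    table = {}
    # Visit errors by increasing weight, so that the first error recorded
    # for each syndrome is the most likely one under the stated assumption.
    for error in sorted(product((0, 1), repeat=3), key=sum):
        table.setdefault(syndrome(error), error)
    return table

def correct(bits, table):
    """Decode by table lookup and undo the inferred error."""
    error = table[syndrome(bits)]
    return tuple(b ^ e for b, e in zip(bits, error))

TABLE = build_table()
```

At runtime, decoding reduces to a single dictionary lookup: `correct((1, 0, 1), TABLE)` restores the codeword `(1, 1, 1)`. The table has one entry per syndrome, so its size doubles with every additional syndrome bit, which is why this approach is limited to small codes.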

Meanwhile, these “cycles” of error detection and correction are under strict latency constraints: idling qubits waiting for corrections to be applied will accumulate new errors that must themselves be corrected – for error correction to be workable, we must be capable of detecting and correcting for errors faster than they are being introduced. The entire error correction cycle just described can be summarised by the following diagram:

[Figure: the error correction cycle acting on data qubits and ancillas, from syndrome measurements through syndrome decoding to correction.]

The decoding time is a crucial factor in determining the overall cycle time and, thus, the clock rate of fault-tolerant quantum hardware. Consider, for example, a 32-qubit toric code Kitaev, 2003A.Yu. Kitaev. 2003. Fault-tolerant quantum computation by anyons. Annals of Physics 303, 1 (January 2003, 2--30). doi: 10.1016/S0003-4916(02)00018-0, one of the most well-studied quantum error-correcting codes. Without going into the details of the code itself, we can use the C++ implementation made available by the MQT toolkit Burgho., 2021Lukas Burgholzer and Robert Wille. 2021. Advanced Equivalence Checking for Quantum Circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 40, 9 (September 2021, 1810--1824). doi: 10.1109/tcad.2020.3032630 to study the decoder performance for this code.

Consider first a “naive” compilation of the decoder – the kind of program that we could hope to get from a quantum compiler that “understands” classical operations but only implements optimisations directly relevant to quantum computations. Such a compiler does not currently exist, but the decoder being a C++ program, we can approximate what the compiled binary would look like by turning off all optimisations from an established classical compiler³.

The runtime averaged over 1000 runs of the decoder is $0.73\pm0.06\,\mathrm{ms}$. This is within the latency requirements of certain trapped ion architectures Ryan-A., 2021C. Ryan-Anderson, J. G. Bohnet, K. Lee, D. Gresh, A. Hankin, J. P. Gaebler, D. Francois, A. Chernoguzov, D. Lucchetti, N. C. Brown, T. M. Gatterman, S. K. Halit, K. Gilmore, J. A. Gerber, B. Neyenhuis, D. Hayes and R. P. Stutz. 2021. Realization of Real-Time Fault-Tolerant Quantum Error Correction. Physical Review X 11, 4 (December 2021, 041058). doi: 10.1103/physrevx.11.041058, but far beyond the sub-microsecond regime that will be required to make error correction a reality on superconductor-based quantum computers Carrer., 2024Almudena Carrera Vazquez, Caroline Tornow, Diego Ristè, Stefan Woerner, Maika Takita and Daniel J. Egger. 2024. Combining quantum processors with real-time classical communication. Nature 636, 8041 (November 2024, 75--79). doi: 10.1038/s41586-024-08178-2. This can be contrasted with the program output by the same compiler, but with all compiler optimisations enabled: the average runtime is reduced by a factor of close to 10x to $0.078\pm0.004\,\mathrm{ms}$ – still a factor of roughly 100x away from the required performance on superconductors, but huge gains nonetheless! The details of the experiment with all build flags, the hardware used and how to reproduce the results are available here.

There is no hope of obtaining these types of speedups without an in-depth understanding of classical hardware and battle-tested implementations for every optimisation pass under the sun – in short, the full thrust of a modern state-of-the-art compiler such as clang or gcc.

To make matters worse, such classical computations are bound to move to dedicated accelerators that require specialised compilation, such as GPUs and FPGAs, for the most time-critical subroutines: quantum error decoding using GPUs is already well-developed Bausch, 2024Johannes Bausch, Andrew W. Senior, Francisco J. H. Heras, Thomas Edlich, Alex Davies, Michael Newman, Cody Jones, Kevin Satzinger, Murphy Yuezhen Niu, Sam Blackwell, George Holland, Dvir Kafri, Juan Atalaya, Craig Gidney, Demis Hassabis, Sergio Boixo, Hartmut Neven and Pushmeet Kohli. 2024. Learning high-accuracy error decoding for quantum processors. Nature 635, 8040 (November 2024, 834--840). doi: 10.1038/s41586-024-08148-8 Cao, 2023Hanyan Cao, Feng Pan, Yijia Wang and Pan Zhang. 2023. qecGPT: decoding Quantum Error-correcting Codes with Generative Pre-trained Transformers. arXiv: 2307.09025 [quant-ph] and more esoteric platforms such as FPGAs Overwa., 2022Ramon W. J. Overwater, Masoud Babaie and Fabio Sebastiano. 2022. Neural-Network Decoders for Quantum Error Correction Using Surface Codes: A Space Exploration of the Hardware Cost-Performance Tradeoffs. IEEE Transactions on Quantum Engineering 3 (1--19). doi: 10.1109/tqe.2022.3174017 Meinerz, 2022Kai Meinerz, Chae-Yeun Park and Simon Trebst. 2022. Scalable Neural Decoder for Topological Surface Codes. Physical Review Letters 128, 8 (February 2022, 080505). doi: 10.1103/physrevlett.128.080505, superconducting circuits Ueno, 2021Yosuke Ueno, Masaaki Kondo, Masamitsu Tanaka, Yasunari Suzuki and Yutaka Tabuchi. 2021. QECOOL: On-Line Quantum Error Correction with a Superconducting Decoder for Surface Code. In 2021 58th ACM/IEEE Design Automation Conference (DAC), December 2021. IEEE, 451--456. doi: 10.1109/dac18074.2021.9586326 and compute-in-memory architectures Wang, 2024Hao Wang, Erjia Xiao, Songhuan He, Zhongyi Ni, Lingfeng Zhang, Xiaokun Zhan, Yifei Cui, Jinguo Liu, Cheng Wang, Zhongrui Wang and Renjing Xu. 2024. CIM-Based Parallel Fully FFNN Surface Code High-Level Decoder for Quantum Error Correction. arXiv: 2411.18090 [cs.AR] are being actively studied.

These observations should leave the reader convinced that in order to compile and realise the kind of hybrid quantum-classical programs that we expect will become the norm in the field, quantum compilers will need to embrace and encompass the full breadth and depth of classical compilers. This leaves us with no choice but to fully transform and integrate the existing quantum tooling and quantum optimisation research into the established compiler ecosystem. What this means exactly is the subject of the rest of this chapter.

A new quantum programming paradigm?

We have seen it – quantum circuits are very limited in their expressiveness. They are well suited to presenting sequences of purely quantum operations and how the computation is parallelised across qubits, but they quickly become limiting once both quantum and classical data types are mixed and any type of control flow (conditionals, loops, function calls, etc.) is introduced.

How users express programs in the front end has deep implications for the kind of computations that the compiler must be capable of reasoning about and, hence, for the compiler’s architecture. The great merging of classical and quantum compilers is the perfect opportunity to reconcile program representations and integrate the learnings from decades of classical programming language research into quantum computing.

There have been several trailblazing initiatives to formalise quantum programming and create dedicated languages, such as QCL Ömer, 2000Bernhard Ömer. 2000. Quantum Programming in QCL. (January 2000). Retrieved from http://tph.tuwien.ac.at/ oemer/doc/quprog.pdf, Quipper Green, 2013Alexander S. Green, Peter LeFanu Lumsdaine, Neil J. Ross, Peter Selinger and Benoît Valiron. 2013. Quipper: a scalable quantum programming language. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2013, New York, NY, USA. Association for Computing Machinery, 333--342. doi: 10.1145/2491956.2462177 Rios, 2018Francisco Rios and Peter Selinger. 2018. A Categorical Model for a Quantum Circuit Description Language (Extended Abstract). Electronic Proceedings in Theoretical Computer Science 266 (February 2018, 164--178). doi: 10.4204/eptcs.266.11 Fu, 2023Peng Fu, Kohei Kishida, Neil J. Ross and Peter Selinger. 2023. Proto-Quipper with Dynamic Lifting. Proceedings of the ACM on Programming Languages 7, POPL (January 2023, 309--334). doi: 10.1145/3571204, Q# Micros., 2024 Microsoft. 2024. Introduction to the quantum programming language Q#. Retrieved on 31/12/2024 from https://learn.microsoft.com/en-us/azure/quantum/qsharp-overview and Silq Bichsel, 2020Benjamin Bichsel, Maximilian Baader, Timon Gehr and Martin Vechev. 2020. Silq: a high-level quantum language with safe uncomputation and intuitive semantics. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2020. ACM, 286--300. doi: 10.1145/3385412.3386007. Their adoption in the quantum ecosystem has so far remained limited, overshadowed by the popularity of Python-based APIs for quantum circuit-based representations, as offered by Qiskit Javadi., 2024Ali Javadi-Abhari, Matthew Treinish, Kevin Krsulich, Christopher J. Wood, Jake Lishman, Julien Gacon, Simon Martiel, Paul D. Nation, Lev S. Bishop, Andrew W. Cross, Blake R. 
Johnson and Jay M. Gambetta. 2024. Quantum computing with Qiskit. arXiv: 2405.08810 [quant-ph], Pennylane Bergho., 2022Ville Bergholm, Josh Izaac, Maria Schuld, Christian Gogolin, Shahnawaz Ahmed, Vishnu Ajith, M. Sohaib Alam, Guillermo Alonso-Linaje, B. AkashNarayanan, Ali Asadi, Juan Miguel Arrazola, Utkarsh Azad, Sam Banning, Carsten Blank, Thomas R Bromley, Benjamin A. Cordier, Jack Ceroni, Alain Delgado, Olivia Di Matteo, Amintor Dusko, Tanya Garg, Diego Guala, Anthony Hayes, Ryan Hill, Aroosa Ijaz, Theodor Isacsson, David Ittah, Soran Jahangiri, Prateek Jain, Edward Jiang, Ankit Khandelwal, Korbinian Kottmann, Robert A. Lang, Christina Lee, Thomas Loke, Angus Lowe, Keri McKiernan, Johannes Jakob Meyer, J. A. Montañez-Barrera, Romain Moyard, Zeyue Niu, Lee James O'Riordan, Steven Oud, Ashish Panigrahi, Chae-Yeun Park, Daniel Polatajko, Nicolás Quesada, Chase Roberts, Nahum , Isidor Schoch, Borun Shi, Shuli Shu, Sukin Sim, Arshpreet Singh, Ingrid Strandberg, Jay Soni, Antal Száva, Slimane Thabet, Rodrigo A. Vargas-Hernández, Trevor Vincent, Nicola Vitucci, Maurice Weber, David Wierichs, Roeland Wiersema, Moritz Willmann, Vincent Wong, Shaoming Zhang and Nathan Killoran. 2022. PennyLane: Automatic differentiation of hybrid quantum-classical computations. arXiv: 1811.04968 [quant-ph] and Cirq Cirq D., 2024 Cirq Developers. 2024. Cirq. There is, as a result, a justified dose of scepticism in the quantum community on how well the ideas from classical programming really translate to quantum.

It is thus all the more notable that we are seeing a new generation of quantum programming tooling being developed Koch, 2024Mark Koch, Alan Lawrence, Kartik Singhal, Seyon Sivarajah and Ross Duncan. 2024. GUPPY: Pythonic Quantum-Classical Programming. (January 2024). Retrieved (talk recording) from https://www.youtube.com/live/D8esZrt7ogk?feature=shared&t=31448 Ittah, 2024David Ittah, Ali Asadi, Erick Ochoa Lopez, Sergei Mironov, Samuel Banning, Romain Moyard, Mai Jacob Peng and Josh Izaac. 2024. Catalyst: a Python JIT compiler for auto-differentiable hybrid quantum programs. Journal of Open Source Software 9, 99 (July 2024, 6720). doi: 10.21105/joss.06720 CUDA-Q., 2024 CUDA-Q Developers. 2024. CUDA-Q Documentation. Retrieved on 31/12/24 from https://nvidia.github.io/cuda-quantum/latest/index.html, driven by the need to write more expressive programs for the improving hardware (as we have been discussing), as well as for performance reasons, to scale quantum compilation to large scale Ittah, 2022David Ittah, Thomas Häner, Vadym Kliuchnikov and Torsten Hoefler. 2022. QIRO: A Static Single Assignment-based Quantum Program Representation for Optimization. ACM Transactions on Quantum Computing 3, 3 (June 2022, 1--32). doi: 10.1145/3491247, accelerate quantum simulations Ittah, 2024David Ittah, Ali Asadi, Erick Ochoa Lopez, Sergei Mironov, Samuel Banning, Romain Moyard, Mai Jacob Peng and Josh Izaac. 2024. Catalyst: a Python JIT compiler for auto-differentiable hybrid quantum programs. Journal of Open Source Software 9, 99 (July 2024, 6720). doi: 10.21105/joss.06720 and integrate with classical high-performance computing (HPC) NVIDIA, 2024 NVIDIA. 2024. NVIDIA Accelerates Quantum Computing Centers Worldwide With CUDA-Q Platform. Retrieved on 31/12/2024 from https://investor.nvidia.com/news/press-release-details/2024/NVIDIA-Accelerates-Quantum-Computing-Centers-Worldwide-With-CUDA-Q-Platform/default.aspx.

The history of programming is, first and foremost, a masterclass in constructing abstractions. Many of the higher-level primitives that have proven invaluable classically solve problems that we expect to encounter very soon in our hybrid programs – when we have not already. Examples include

  • structured control flow to simplify reasoning about branching in quantum-classical hybrid programs,
  • type systems to encode program logic and catch errors at compile time – this is particularly important for quantum programs as there is no graceful way of handling runtime errors on quantum hardware: by the time the error has been propagated to the caller, all quantum data stored on qubits is probably corrupted and lost,
  • memory management such as reference counting and data ownership models. Current hardware follows a static memory model, in which the number of available qubits is fixed, and every operation acts on a set of qubits assigned at compile time. This becomes impossible to keep track of in instances such as qubit allocations within loops with an unknown number of iterations at compile time. It thus becomes necessary to manage qubits dynamically, just like classical memory.

To facilitate such a large swath of abstractions, the first step quantum compilers must take is to make a distinction between the language frontend and the intermediate representation (IR) that the compiler uses to reason about the program and perform optimisations. This will be the topic of chapter 3. The graph-based IR that we introduce in that chapter will then form the foundation for the new quantum compilation techniques that will be developed throughout the remainder of the thesis.


  1. Or, as we would say in programming parlance, to allocate↩︎

  2. We should at this point – at the risk of stoking controversy – acknowledge the commendable efforts of scientists chasing the Majorana particle Sau, 2010Jay D. Sau, Roman M. Lutchyn, Sumanta Tewari and S. Das Sarma. 2010. Generic New Platform for Topological Quantum Computation Using Semiconductor Heterostructures. Physical Review Letters 104, 4 (January 2010, 040502). doi: 10.1103/physrevlett.104.040502 Haaf, 2024Sebastiaan L. D. ten Haaf, Qingzhen Wang, A. Mert Bozkurt, Chun-Xiao Liu, Ivan Kulesh, Philip Kim, Di Xiao, Candice Thomas, Michael J. Manfra, Tom Dvir, Michael Wimmer and Srijit Goswami. 2024. A two-site Kitaev chain in a two-dimensional electron gas. Nature 630, 8016 (June 2024, 329--334). doi: 10.1038/s41586-024-07434-9 Mourik, 2012V. Mourik, K. Zuo, S. M. Frolov, S. R. Plissard, E. P. A. M. Bakkers and L. P. Kouwenhoven. 2012. Signatures of Majorana Fermions in Hybrid Superconductor-Semiconductor Nanowire Devices. Science 336, 6084 (May 2012, 1003--1007). doi: 10.1126/science.1222360. The topological quantum computers these would enable are, to our knowledge, the only quantum architecture proposed that could do away with error correction. ↩︎

  3. Here we are using Apple clang v15.0.0, running macOS 14.7 on an Apple M3 Max chip. ↩︎

2.5. Summary and further reading

This introductory chapter covered some of the basic principles of quantum computation and, in doing so, hopefully, made a convincing argument as to why we should expect the programs running on quantum hardware to become more complex in the future, with the intertwining of classical and quantum computations – processes we refer to as hybrid quantum-classical programs. Prior to that, we also presented quantum compilation, an emerging discipline that is introducing many new problems and ideas to the established corpus of work on compiler research.

If this quantum taster has intrigued you or you would like to learn the basics from people who actually know what they are talking about, nothing beats the reference book for quantum information and quantum computing by Nielsen and Chuang Nielsen, 2016Michael A. Nielsen and Isaac L. Chuang. 2016. Quantum Computation and Quantum Information (10th Anniversary edition). Cambridge University Press. A fascinating alternative perspective on quantum theory has also been developed within the programme of categorical quantum mechanics, for which the illustrious “Dodo book” Coecke, 2017Bob Coecke and Aleks Kissinger. 2017. Picturing Quantum Processes: A First Course in Quantum Theory and Diagrammatic Reasoning. Cambridge University Press. doi: 10.1017/9781316219317 would be the go-to introductory material¹.

At the risk of turning this thesis into absolutely shameless Oxford self-promotion, guess what else was a product of this university’s world-class research? The quantum circuit itself! These diagrams came from theoretical physicists (no surprise here) interested in capturing thought experiments in quantum information theory Deutsch, 1989David Deutsch. 1989. Quantum Computational Networks. In Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences. Royal Society, 73--90.

The idea caught on, and soon, software tools were created to facilitate building such diagrams. The Quantum Computation Language (QCL) was one of the first Ömer, 2000Bernhard Ömer. 2000. Quantum Programming in QCL. (January 2000). Retrieved from http://tph.tuwien.ac.at/ oemer/doc/quprog.pdf. Quantum software² has since proliferated, especially as the possibility of actually performing these thought experiments on quantum hardware became more tangible. The result was software packages for quantum computing, designed for the automatic transformation and optimisation of quantum computations for execution on real hardware Javadi., 2024Ali Javadi-Abhari, Matthew Treinish, Kevin Krsulich, Christopher J. Wood, Jake Lishman, Julien Gacon, Simon Martiel, Paul D. Nation, Lev S. Bishop, Andrew W. Cross, Blake R. Johnson and Jay M. Gambetta. 2024. Quantum computing with Qiskit. arXiv: 2405.08810 [quant-ph] Cirq D., 2024 Cirq Developers. 2024. Cirq Steiger, 2018Damian S. Steiger, Thomas Häner and Matthias Troyer. 2018. ProjectQ: an open source software framework for quantum computing. Quantum 2 (January 2018, 49). doi: 10.22331/q-2018-01-31-49 Sivara., 2020Seyon Sivarajah, Silas Dilkes, Alexander Cowtan, Will Simmons, Alec Edgington and Ross Duncan. 2020. t|ket⟩: a retargetable compiler for NISQ devices. Quantum Science and Technology 6, 1 (November 2020, 014003). doi: 10.1088/2058-9565/ab8e92 – we called them quantum compilers.

A recent development for quantum compilers focuses on scalability and first-class support for hybrid quantum-classical computations. Quantum circuits that include some form of classical control have been variously called “dynamic circuits” (e.g. Córco., 2021A. D. Córcoles, Maika Takita, Ken Inoue, Scott Lekuch, Zlatko K. Minev, Jerry M. Chow and Jay M. Gambetta. 2021. Exploiting Dynamic Quantum Circuits in a Quantum Algorithm with Superconducting Qubits. Physical Review Letters 127, 10 (August 2021, 100501). doi: 10.1103/physrevlett.127.100501), “adaptive circuits” (e.g. Smith, 2024Kevin C. Smith, Abid Khan, Bryan K. Clark, S.M. Girvin and Tzu-Chieh Wei. 2024. Constant-Depth Preparation of Matrix Product States with Adaptive Quantum Circuits. PRX Quantum 5, 3 (September 2024, 030344). doi: 10.1103/prxquantum.5.030344), “circuits with measurements and feedforward” (e.g. Graham, 2023T. M. Graham, L. Phuttitarn, R. Chinnarasu, Y. Song, C. Poole, K. Jooya, J. Scott, A. Scott, P. Eichler and M. Saffman. 2023. Midcircuit Measurements on a Single-Species Neutral Alkali Atom Quantum Processor. Physical Review X 13, 4 (December 2023, 041051). doi: 10.1103/physrevx.13.041051), and “circuits assisted by local operations and classical communication” (e.g. Piroli, 2021Lorenzo Piroli, Georgios Styliaris and J. Ignacio Cirac. 2021. Quantum Circuits Assisted by Local Operations and Classical Communication: Transformations and Phases of Matter. Physical Review Letters 127, 22 (November 2021, 220503). doi: 10.1103/physrevlett.127.220503).

Besides supporting advances in quantum hardware Córco., 2021A. D. Córcoles, Maika Takita, Ken Inoue, Scott Lekuch, Zlatko K. Minev, Jerry M. Chow and Jay M. Gambetta. 2021. Exploiting Dynamic Quantum Circuits in a Quantum Algorithm with Superconducting Qubits. Physical Review Letters 127, 10 (August 2021, 100501). doi: 10.1103/physrevlett.127.100501 Graham, 2023T. M. Graham, L. Phuttitarn, R. Chinnarasu, Y. Song, C. Poole, K. Jooya, J. Scott, A. Scott, P. Eichler and M. Saffman. 2023. Midcircuit Measurements on a Single-Species Neutral Alkali Atom Quantum Processor. Physical Review X 13, 4 (December 2023, 041051). doi: 10.1103/physrevx.13.041051 Pino, 2021J. M. Pino, J. M. Dreiling, C. Figgatt, J. P. Gaebler, S. A. Moses, M. S. Allman, C. H. Baldwin, M. Foss-Feig, D. Hayes, K. Mayer, C. Ryan-Anderson and B. Neyenhuis. 2021. Demonstration of the trapped-ion quantum CCD computer architecture. Nature 592, 7853 (April 2021, 209--213). doi: 10.1038/s41586-021-03318-4, hybrid classical-quantum computations are central to many quantum computing applications. As put recently by Alam and Clark Alam, 2024Faisal Alam and Bryan K. Clark. 2024. Learning dynamic quantum circuits for efficient state preparation. arXiv: 2410.09030 [quant-ph]​:

“[…] dynamic quantum circuits are a crucial milestone on the roadmap to fault-tolerant quantum computers.”

We have covered a small subset of applications of hybrid quantum-classical computations. Quantum teleportation is undoubtedly one of the oldest Bennett, 1993Charles H. Bennett, Gilles Brassard, Claude Crépeau, Richard Jozsa, Asher Peres and William K. Wootters. 1993. Teleporting an unknown quantum state via dual classical and Einstein-Podolsky-Rosen channels. Physical Review Letters 70, 13 (March 1993, 1895--1899). doi: 10.1103/physrevlett.70.1895. The block-encoding technique that we discussed in section 2.3 is the foundation of several algorithms, including the Quantum Singular Value Transformation (QSVT) Gilyén, 2019András Gilyén, Yuan Su, Guang Hao Low and Nathan Wiebe. 2019. Quantum singular value transformation and beyond: exponential improvements for quantum matrix arithmetics. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, June 2019. ACM, 193--204. doi: 10.1145/3313276.3316366 and the Linear Combination of Unitaries (LCU) Chakra., 2024Shantanav Chakraborty. 2024. Implementing any Linear Combination of Unitaries on Intermediate-term Quantum Computers. Quantum 8 (October 2024, 1496). doi: 10.22331/q-2024-10-10-1496 Sze, 2025Michelle Wynne Sze, Yao Tang, Silas Dilkes, David Muñoz Ramo, Ross Duncan and Nathan Fitzpatrick. 2025. Hamiltonian dynamics simulation using linear combination of unitaries on an ion trap quantum computer. arXiv: 2501.18515 [quant-ph]. Measurement-based quantum computing (MBQC) was introduced in Rausse., 2001Robert Raussendorf and Hans J. Briegel. 2001. A One-Way Quantum Computer. Physical Review Letters 86, 22 (May 2001, 5188--5191). doi: 10.1103/PhysRevLett.86.5188 and forms the basis of some photonic quantum computing architectures Bartol., 2023Sara Bartolucci, Patrick Birchall, Hector Bombín, Hugo Cable, Chris Dawson, Mercedes Gimeno-Segovia, Eric Johnston, Konrad Kieling, Naomi Nickerson, Mihir Pant, Fernando Pastawski, Terry Rudolph and Chris Sparrow. 2023. Fusion-based quantum computation. Nature Communications 14, 1 (February 2023). doi: 10.1038/s41467-023-36493-1 Bouras., 2021J. Eli Bourassa, Rafael N. Alexander, Michael Vasmer, Ashlesha Patil, Ilan Tzitrin, Takaya Matsuura, Daiqin Su, Ben Q. Baragiola, Saikat Guha, Guillaume Dauphinais, Krishna K. Sabapathy, Nicolas C. Menicucci and Ish Dhand. 2021. Blueprint for a Scalable Photonic Fault-Tolerant Quantum Computer. Quantum 5 (February 2021, 392). doi: 10.22331/q-2021-02-04-392. Hybrid programs have also been shown to be useful for implementing the Quantum Fourier Transform (QFT) Bäumer, 2024Elisa Bäumer, Vinay Tripathi, Alireza Seif, Daniel Lidar and Derek S. Wang. 2024. Quantum Fourier Transform Using Dynamic Circuits. Physical Review Letters 133, 15 (October 2024, 150602). doi: 10.1103/physrevlett.133.150602 and the Quantum Phase Estimation (QPE) algorithms Córco., 2021A. D. Córcoles, Maika Takita, Ken Inoue, Scott Lekuch, Zlatko K. Minev, Jerry M. Chow and Jay M. Gambetta. 2021. Exploiting Dynamic Quantum Circuits in a Quantum Algorithm with Superconducting Qubits. Physical Review Letters 127, 10 (August 2021, 100501). doi: 10.1103/physrevlett.127.100501, two of the most fundamental computation primitives for quantum algorithms.

On the other hand, repeat-until-success schemes Paetzn., 2014Adam Paetznick and Krysta M. Svore. 2014. Repeat-until-success: non-deterministic decomposition of single-qubit unitaries. Quantum Information & Computation 14, 15–16 (November 2014, 1277–1301) are widespread in state preparation routines and will play a key role in fault-tolerant (FT) quantum computing. Arguably, the most well-known scheme for FT is magic state distillation Bravyi, 2005Sergey Bravyi and Alexei Kitaev. 2005. Universal quantum computation with ideal Clifford gates and noisy ancillas. Physical Review A 71, 2 (February 2005, 022316). doi: 10.1103/PhysRevA.71.022316, a procedure expected to be a core building block of many FT architectures. State preparation is generally a ubiquitous problem for FT, as the error-correcting codes that are employed initiate computations starting from a logical zero state, which may be expensive to prepare on the qubits of the hardware Fowler, 2012Austin G. Fowler, Matteo Mariantoni, John M. Martinis and Andrew N. Cleland. 2012. Surface codes: Towards practical large-scale quantum computation. Physical Review A 86, 3 (September 2012, 032324). doi: 10.1103/physreva.86.032324.

Finally, quantum error-correcting (QEC) codes themselves must be implemented using hybrid programs. The QEC literature is vast and can get very technical very quickly, but diving into it promises bountiful rewards. The field is one of quantum information’s fastest-evolving areas of research. These work-in-progress lecture notes Gottes., 2024Daniel Gottesman. 2024. Surviving as a Quantum Computer in a Classical World. (February 2024). Retrieved on 08/01/2025 (lecture notes) from https://www.cs.umd.edu/class/spring2024/cmsc858G/QECCbook-2024-ch1-15.pdf by a coryphaeus of the field make for excellent introductory material.


  1. And while we’re on the topic of my supervisor’s brilliant work, there is also a very recent textbook, a sort of spiritual successor to Coecke, 2017Bob Coecke and Aleks Kissinger. 2017. Picturing Quantum Processes: A First Course in Quantum Theory and Diagrammatic Reasoning. Cambridge University Press. doi: 10.1017/9781316219317, particularly focused on quantum compilation Kissin., 2024Aleks Kissinger and John van de Wetering. 2024. Picturing Quantum Software: An Introduction to the ZX-Calculus and Quantum Compilation. Preprint. It is just as worth a read and might appeal more to the computer science-y reader. ↩︎

  2. That is classical software written to control and optimise quantum computations. ↩︎


Chapter 3

Quantum Compilation as a Graph Transformation Problem

The specialised optimisation techniques that we reviewed in section 2.2 are effective for the scenarios they were designed for, but they are challenging to adapt to new hardware primitives, constraints, or cost functions.

This thesis proposes interpreting quantum compilation as a graph transformation system (GTS). GTSs endow quantum compilation with well-defined semantics and strong theoretical foundations Lack, 2005Stephen Lack and Pawel Sobocinski. 2005. Adhesive and quasiadhesive categories. RAIRO - Theoretical Informatics and Applications 39, 3 (July 2005, 511--545). doi: 10.1051/ITA:2005028. They establish a practical, purely declarative framework in which compiler transformations can be defined and studied.

This allows us to decouple the semantics of quantum programs and the architecture specifics from the compiler infrastructure itself. We can thus focus on building and designing scalable and efficient graph transformation algorithms that can then be applied on a wide range of compilation problems and hardware targets.

In this chapter, we formalise quantum computation and optimisation based on graphs and graph transformations, providing the foundation for all considerations in later chapters. Albeit slightly simplified, the intermediate representation (IR) we propose here is based on joint work Mark K., 2025Seyon Sivarajah, Alan Lawrence, Alec Edgington, Douglas Wilson, Craig Roy, Luca Mondada, Lukas Heidemann, Ross Duncan Mark Koch. 2025. HUGR: A Quantum-Classical Intermediate Representation. Retrieved (talk recording) from https://www.youtube.com/live/D8esZrt7ogk?feature=shared&t=5217, as well as ongoing development.

The words graph rewrite and graph transformation are often used interchangeably in the literature. In the context of this thesis, we will take these words to distinguish two slightly different problems:

The study of equivalences and other relations between graphs under well-defined semantics is the subject of graph transformations. For instance:

  • a graph transformation rule $L \to R$ (Definition .) expresses that an instance of $L$ can always be transformed into an instance of $R$, reflecting the semantics of the system that the graph is modelling.
  • a minIR equivalence class (Definition .) is an instance of a graph transformation system (GTS), which uses known semantic relations, expressed, for example, as graph transformation rules, to define how graphs can be transformed.

Graph rewriting, on the other hand, encapsulates the algorithmic procedures and data structures that mutate graphs. A rewrite (Definition 3.9) is the tuple of data required to turn a graph $G$ into a new graph $G'$.

Given matches of patterns $L$ on a graph $G$, a graph transformation system can use the graph transformation rules that define the semantics of $G$ to produce a set of rewrites, each of which can be applied to mutate $G$.
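To make the distinction concrete, the following is a minimal sketch of a rewrite as a plain tuple of data, together with a function that applies it to a graph. It is not the formal rewrite of Definition 3.9: the names `Graph`, `Rewrite` and `apply_rewrite`, and the choice of a simple labelled directed graph, are assumptions made purely for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Graph:
    nodes: dict[str, str]        # node id -> operation label
    edges: set[tuple[str, str]]  # directed edges (source id, target id)


@dataclass
class Rewrite:
    # Hypothetical rewrite data: what to delete and what to insert.
    remove: set[str]                 # ids of nodes to delete
    add_nodes: dict[str, str]        # new node ids -> labels
    add_edges: set[tuple[str, str]]  # new edges, possibly touching kept nodes


def apply_rewrite(g: Graph, r: Rewrite) -> Graph:
    """Return a new graph obtained by applying rewrite r to g."""
    # Keep every node that is not removed, then add the new nodes.
    nodes = {n: lbl for n, lbl in g.nodes.items() if n not in r.remove}
    nodes.update(r.add_nodes)
    # Drop edges incident to removed nodes, then add the new edges.
    edges = {(s, t) for s, t in g.edges
             if s not in r.remove and t not in r.remove}
    edges |= r.add_edges
    if any(s not in nodes or t not in nodes for s, t in edges):
        raise ValueError("rewrite leaves a dangling edge")
    return Graph(nodes, edges)


# Example: two consecutive Hadamard gates cancel, so the match {h1, h2}
# is replaced by a single wire from the input to the output.
g = Graph(
    nodes={"in": "input", "h1": "H", "h2": "H", "out": "output"},
    edges={("in", "h1"), ("h1", "h2"), ("h2", "out")},
)
r = Rewrite(remove={"h1", "h2"}, add_nodes={}, add_edges={("in", "out")})
result = apply_rewrite(g, r)
```

In this sketch, the semantic knowledge (that two Hadamards cancel) lives entirely in how `r` was produced, while `apply_rewrite` is purely mechanical. This is the separation between graph transformation and graph rewriting described above.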

Our contributions in the subsequent chapters are mostly preoccupied with problems of graph rewriting, i.e. the definition and application of the data required to mutate graphs, as opposed to graph transformations. This chapter nonetheless considers both, using the mature graph transformation framework as a foundation to define IR rewriting semantics.

Section 3.1 starts with a review of previous related work at the intersection of graph transformation software and quantum program optimisation. We then discuss in section 3.2 a fundamental difference between classical computation graphs and the requirements of quantum computation. This motivates a new graph-based IR tailored to quantum computation that we present in section 3.3, along with formal graph rewriting semantics based on sesqui-pushout (SqPO) transformations (section 3.4). Whilst the definition of SqPO transformations is constructive, the existence of the resulting transformation is not guaranteed. We conclude the chapter in section 3.5 by discussing a more restricted “operational” notion of graph rewriting that will be useful for the rest of the thesis.

3.1. Related work

Graph rewriting on computation graphs.  Optimisation of computation graphs is a long-standing problem in computer science that is seeing renewed interest in the compiler Lattner, 2021Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasilache and Oleksandr Zinenko. 2021. MLIR: Scaling Compiler Infrastructure for Domain Specific Computation. In 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), February 2021. IEEE, 2--14. doi: 10.1109/CGO51591.2021.9370308, machine learning (ML) Jia, 2019Zhihao Jia, Oded Padon, James Thomas, Todd Warszawski, Matei Zaharia and Alex Aiken. 2019. TASO: optimizing deep learning computation with automatic generation of graph substitutions. In Proceedings of the 27th ACM Symposium on Operating Systems Principles, October 2019. ACM, 47--62. doi: 10.1145/3341301.3359630 Fang, 2020Jingzhi Fang, Yanyan Shen, Yue Wang and Lei Chen. 2020. Optimizing DNN computation graph using graph substitutions. Proceedings of the VLDB Endowment 13, 12 (August 2020, 2734--2746). doi: 10.14778/3407790.3407857 and quantum computing communities Xu, 2022Mingkuan Xu, Zikun Li, Oded Padon, Sina Lin, Jessica Pointing, Auguste Hirth, Henry Ma, Jens Palsberg, Alex Aiken, Umut A. Acar and Zhihao Jia. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433 Xu, 2023Amanda Xu, Abtin Molavi, Lauren Pick, Swamit Tannu and Aws Albarghouthi. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (June 2023, 835--859). doi: 10.1145/3591254. In all these domains, graphs encode computations that are either expensive to execute or evaluated repeatedly over many iterations, making the optimisation of the execution cost of the computation a primary concern.

Domain-specific heuristics are the most common approach in compiler optimisations Paszke, 2019Adam Paszke, Sam Gross, Francisco Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai and S. Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Neural Information Processing Systems. doi: 10.5555/3454287.3455008 Sivara., 2020Seyon Sivarajah, Silas Dilkes, Alexander Cowtan, Will Simmons, Alec Edgington and Ross Duncan. 2020. t|ket⟩: a retargetable compiler for NISQ devices. Quantum Science and Technology 6, 1 (November 2020, 014003). doi: 10.1088/2058-9565/ab8e92. A more flexible alternative is offered by optimisation engines based on declarative sets of graph transformations Bonchi, 2022Filippo Bonchi, Fabio Gadducci, Aleks Kissinger, Pawel Sobocinski and Fabio Zanasi. 2022. String Diagram Rewrite Theory I: Rewriting with Frobenius Structure. Journal of the ACM 69, 2 (March 2022, 1 - 58). doi: 10.1145/3502719 Bonchi, 2022Filippo Bonchi, Fabio Gadducci, Aleks Kissinger, Pawel Sobocinski and Fabio Zanasi. 2022. String diagram rewrite theory II: Rewriting with symmetric monoidal structure. Mathematical Structures in Computer Science 32, 4 (April 2022, 511--541). doi: 10.1017/s0960129522000317. In such systems, a graph transformation system (GTS) is used to find a sequence of allowed transformations that rewrite a computation graph given as input into a computation graph with minimal cost.
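The idea of searching for a minimal-cost computation through declarative rules can be illustrated with a toy sketch. The code below assumes a drastically simplified setting: a single-qubit circuit is a tuple of gate names rather than a graph, the rule set `RULES` is a hand-picked list of exact gate identities, and the cost is the gate count. The function names are hypothetical and the exhaustive breadth-first search is only feasible because every rule shrinks the circuit.

```python
from collections import deque

# Declarative rules (lhs, rhs): any occurrence of lhs may be replaced by rhs.
# All are exact identities on single-qubit gates.
RULES = [
    (("H", "H"), ()),
    (("X", "X"), ()),
    (("Z", "Z"), ()),
    (("H", "X", "H"), ("Z",)),
    (("H", "Z", "H"), ("X",)),
]


def successors(circ):
    """Yield every circuit reachable from circ by one rule application."""
    for lhs, rhs in RULES:
        n = len(lhs)
        for i in range(len(circ) - n + 1):
            if circ[i:i + n] == lhs:
                yield circ[:i] + rhs + circ[i + n:]


def optimise(circ, cost=len):
    """Explore all reachable circuits and return one of minimal cost."""
    seen = {circ}
    queue = deque([circ])
    best = circ
    while queue:
        cur = queue.popleft()
        if cost(cur) < cost(best):
            best = cur
        for nxt in successors(cur):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return best


best_a = optimise(("H", "X", "H", "Z"))       # H X H -> Z, then Z Z -> empty
best_b = optimise(("X", "H", "H", "X", "Z"))  # H H and X X cancel, Z remains
```

The rules are data and the search strategy is generic, so adding a hardware primitive or changing the cost function means editing `RULES` or `cost`, not the engine. The price is that the set of reachable circuits can grow very quickly with realistic rule sets.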

Transformation systems were first studied on strings Dersho., 1990Nachum Dershowitz and Jean-Pierre Jouannaud. 1990. Rewrite Systems, then generalised to trees and terms Bezem, 2003Marc Bezem, Jan Willem Klop and Roel Vrijer. 2003. Term Rewriting Systems (1. publ. ed.). Cambridge University Press, Cambridge, before being applied to graph domains Ehrig, 1973Hartmut Ehrig, Michael Pfender and Hans Jürgen Schneider. 1973. Graph-Grammars: An Algebraic Approach. In 14th Annual Symposium on Switching and Automata Theory, Iowa City, Iowa, USA, October 15-17, 1973. IEEE Computer Society, 167--180. doi: 10.1109/SWAT.1973.11 Rozenb., 1997Grzegorz Rozenberg. 1997. Handbook of Graph Grammars and Computing by Graph Transformations, Volume 1: Foundations. World Scientific König, 2018Barbara König, Dennis Nolte, Julia Padberg and Arend Rensink. 2018. A Tutorial on Graph Transformation. In Graph Transformation, Specifications, and Nets - In Memory of Hartmut Ehrig. Springer, 83--104. doi: 10.1007/978-3-319-75396-6_5. Their use in quantum computing is part of a long tradition of diagrammatic reasoning in physics Penrose, 1964Roger Penrose. 1964. Conformal treatment of infinity. General Relativity and Gravitation 43, 3 (901--922 (reprint)). doi: 10.1007/s10714-010-1110-5 Feynman, 1949R. P. Feynman. 1949. Space-Time Approach to Quantum Electrodynamics. Physical Review 76, 6 (September 1949, 769--789). doi: 10.1103/physrev.76.769, and particularly in quantum mechanics with the advent of categorical quantum mechanics Abrams., 2008Samson Abramsky and Bob Coecke. 2008. Categorical quantum mechanics. arXiv: 0808.1023 [quant-ph] Coecke, 2012Bob Coecke, Ross Duncan, Aleks Kissinger and Quanlong Wang. 2012. Strong Complementarity and Non-locality in Categorical Quantum Mechanics. In 2012 27th Annual IEEE Symposium on Logic in Computer Science, June 2012. IEEE, 245--254. doi: 10.1109/lics.2012.35 Coecke, 2017Bob Coecke and Aleks Kissinger. 2017. Picturing Quantum Processes: A First Course in Quantum Theory and Diagrammatic Reasoning. Cambridge University Press. doi: 10.1017/9781316219317.

GTS in quantum computing.  In quantum computing, the ZX calculus Coecke, 2008Bob Coecke and Ross Duncan. 2008. Interacting Quantum Observables and other diagrammatic theories that derive from it are particularly important. Properties of GTSs such as completeness, confluence and termination Verma, 1995Rakesh M. Verma. 1995. Transformations and confluence for rewrite systems. Theoretical Computer Science 152, 2 (December 1995, 269--283). doi: 10.1016/0304-3975(94)00255-0 are well-studied within this field Backens, 2014Miriam Backens. 2014. The ZX-calculus is complete for stabilizer quantum mechanics. New Journal of Physics 16, 9 (September 2014, 093021). doi: 10.1088/1367-2630/16/9/093021 Backens, 2019Miriam Backens and Aleks Kissinger. 2019. ZH: A Complete Graphical Calculus for Quantum Computations Involving Classical Non-linearity. Electronic Proceedings in Theoretical Computer Science 287 (January 2019, 23--42). doi: 10.4204/EPTCS.287.2 Biamon., 2023J Biamonte and A Nasrallah. 2023. The ZX-Calculus is Canonical in the Heisenberg Picture for Stabilizer Quantum Mechanics. arXiv: 2301.05717 [quant-ph]. These results have formed the basis for software implementations of circuit optimisations with soundness and performance guarantees Duncan, 2020Ross Duncan, Aleks Kissinger, Simon Perdrix and John van de Wetering. 2020. Graph-theoretic Simplification of Quantum Circuits with the ZX-calculus. Quantum 4 (June 2020, 279). doi: 10.22331/q-2020-06-04-279 Kissin., 2020Aleks Kissinger and John van de Wetering. 2020. PyZX: Large Scale Automated Diagrammatic Reasoning. In Proceedings 16th International Conference on Quantum Physics and Logic, Chapman University, Orange, CA, USA., 10-14 June 2019. Open Publishing Association, 229-241. doi: 10.4204/EPTCS.318.14 Sivara., 2020Seyon Sivarajah, Silas Dilkes, Alexander Cowtan, Will Simmons, Alec Edgington and Ross Duncan. 2020. t|ket⟩: a retargetable compiler for NISQ devices. Quantum Science and Technology 6, 1 (November 2020, 014003). doi: 10.1088/2058-9565/ab8e92 Borgna, 2023Agustín Borgna. 2023. Towards a compiler toolchain for quantum programs. PhD Thesis. Loria, Université de Lorraine.

Great strides are also being made in our theoretical understanding of transformation systems for quantum circuits. Recently, Clément et al. established completeness for the first time Cléme., 2023Alexandre Clément, Nicolas Heurtel, Shane Mansfield, Simon Perdrix and Benoît Valiron. 2023. A Complete Equational Theory for Quantum Circuits. In 2023 38th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), June 2023. IEEE, 1--13. doi: 10.1109/lics56636.2023.10175801 as well as minimality Cléme., 2024Alexandre Clément, Noé Delorme and Simon Perdrix. 2024. Minimal Equational Theories for Quantum Circuits. In Proceedings of the 39th Annual ACM/IEEE Symposium on Logic in Computer Science, July 2024. ACM, 1--14. doi: 10.1145/3661814.3662088 of a GTS for quantum circuits. A set of circuit transformation rules was presented such that no rule is redundant, and for any two equivalent quantum circuits, there exists a sequence of local transformations rewriting one into the other. Such systems are however not confluent, and this is unlikely to change: most circuit optimisation problems are known to be computationally hard Weteri., 2024John van de Wetering and Matt Amy. 2024. Optimising quantum circuits is generally hard. arXiv: 2310.05958 [quant-ph].

There is also another inherent tension in integrating diagrammatic calculi into compilers. Diagrammatic theories arise from abstract primitives that admit a simple rewriting logic Heurtel, 2024Nicolas Heurtel. 2024. A Complete Graphical Language for Linear Optical Circuits with Finite-Photon-Number Sources and Detectors. arXiv: 2402.17693 [quant-ph] Booth, 2024Robert I. Booth, Titouan Carette and Cole Comfort. 2024. Graphical Symplectic Algebra. arXiv: 2401.07914 [cs.LO] Felice, 2023Giovanni de Felice, Razin A. Shaikh, Boldizsár Poór, Lia Yeh, Quanlong Wang and Bob Coecke. 2023. Light-Matter Interaction in the ZXW Calculus. Electronic Proceedings in Theoretical Computer Science 384 (August 2023, 20--46). doi: 10.4204/EPTCS.384.2 Carette, 2023Titouan Carette, Timothée Hoffreumon, Émile Larroque and Renaud Vilmart. 2023. Complete Graphical Language for Hermiticity-Preserving Superoperators. In 2023 38th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), June 2023. IEEE, 1--22. doi: 10.1109/LICS56636.2023.10175712; compilers meanwhile must capture all the expressivity, constraints and messiness of real-world hardware targets, with all the edge cases and exceptions that this entails.

An example of this is the ZX circuit extraction problem Quanz, 2024Marcel Quanz, Korbinian Staudacher and Karl Fürlinger. 2024. Parallel Quantum Circuit Extraction from MBQC-Patterns. In 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 1078-1087. doi: 10.1109/IPDPSW63119.2024.00179 Backens, 2021Miriam Backens, Hector Miller-Bakewell, Giovanni de Felice, Leo Lobski and John van de Wetering. 2021. There and back again: A circuit extraction tale. Quantum 5 (March 2021, 421). doi: 10.22331/q-2021-03-25-421​: it is in general hard to recover an executable quantum circuit from a ZX diagram as the latter is strictly more general and primitives cannot be mapped one-to-one. Similarly, while simple quantum-classical hybrid computations can be expressed using extensions of ZX Borgna, 2021Agustín Borgna, Simon Perdrix and Benoît Valiron. 2021. Hybrid Quantum-Classical Circuit Simplification with the ZX-Calculus. In Programming Languages and Systems, Cham. Springer International Publishing, 121--139. doi: 10.1007/978-3-030-89051-3_8 Carette, 2021Titouan Carette, Emmanuel Jeandel, Simon Perdrix and Renaud Vilmart. 2021. Completeness of Graphical Languages for Mixed State Quantum Mechanics. ACM Transactions on Quantum Computing 2, 4 (December 2021, 1--28). doi: 10.1145/3464693 Koziel., 2024Alexander Koziell-Pipe and Aleks Kissinger. 2024. Hybrid Quantum-Classical Machine Learning with String Diagrams. arXiv: 2407.03673 [quant-ph], it will never be possible to capture the full breadth and generality of classical CPU instruction sets in a practical and extensible (and algebraically satisfying) way.

Peephole optimisations.  As an alternative to the very principled approach of elegant calculi, graph transformations can also be used in the absence of theoretical guarantees in a more ad hoc fashion. Indeed, many existing (classical and quantum) compiler optimisations can already be understood as graph transformations. For as long as compilation has existed, compilers have relied on local transformations of the IR, typically referred to as peephole optimisations McKeem., 1965W. M. McKeeman. 1965. Peephole optimization. Communications of the ACM 8, 7 (July 1965, 443--444). doi: 10.1145/364995.365000 Tanenb., 1982Andrew S. Tanenbaum, Hans van Staveren and Johan W. Stevenson. 1982. Using Peephole Optimization on Intermediate Code. ACM Transactions on Programming Languages and Systems 4, 1 (January 1982, 21--36). doi: 10.1145/357153.357155. Such optimisation strategies are based on the heuristic that local optimisations to the program will produce a well-optimised result overall. Mature compiler ecosystems have developed tools for declarative definitions, as well as automatic generation and correctness proving of peephole optimisations Menend., 2017David Menendez and Santosh Nagarakatte. 2017. Alive-Infer: data-driven precondition inference for peephole optimizations in LLVM. ACM SIGPLAN Notices 52, 6 (June 2017, 49--63). doi: 10.1145/3140587.3062372 Lopes, 2015Nuno P. Lopes, David Menendez, Santosh Nagarakatte and John Regehr. 2015. Provably correct peephole optimizations with alive. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2015. ACM, 22--32. doi: 10.1145/2737924.2737965 Riddle, 2021River Riddle. 2021. PDLL: a new declarative rewrite frontend for MLIR. (November 2021). Retrieved on 13/01/2025 (RFC on Discourse) from https://discourse.llvm.org/t/rfc-pdll-a-new-declarative-rewrite-frontend-for-mlir/4798. We refer to the classical compiler literature, e.g. Muchni., 2007Steven S. Muchnick. 2007. Advanced compiler design and implementation ([Nachdr.] ed.). Morgan Kaufmann, San Francisco, Calif. [u.a.], for more details on the various types of common peephole optimisations.
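As an illustration of what a peephole optimisation looks like in the quantum setting, here is a minimal sketch of a single pass that cancels neighbouring self-inverse gates. The representation (a list of `(name, qubits)` tuples), the set `SELF_INVERSE` and the function name `peephole_cancel` are assumptions chosen for brevity and do not correspond to the API of any particular compiler.

```python
# Gates that are their own inverse; applying one twice on the same
# qubits (in the same order) is the identity.
SELF_INVERSE = {"H", "X", "Z", "CX"}


def peephole_cancel(ops):
    """Cancel pairs of identical self-inverse gates that are adjacent in the list.

    ops is a list of (gate name, tuple of qubit indices). The window only
    ever looks at the last emitted operation, so the pass is linear time,
    and a cancellation can expose a new adjacent pair further back.
    """
    out = []
    for op in ops:
        name, _qubits = op
        if out and out[-1] == op and name in SELF_INVERSE:
            out.pop()       # the pair cancels: drop both gates
        else:
            out.append(op)  # nothing to cancel: keep the gate
    return out


ops = [
    ("H", (0,)),
    ("CX", (0, 1)),
    ("CX", (0, 1)),
    ("H", (0,)),
    ("X", (1,)),
]
optimised = peephole_cancel(ops)
```

Note the limitation of this list-based window: two cancelling gates separated in the list by an unrelated gate on other qubits are missed, even though they are neighbours in the circuit. Matching on the computation graph rather than on a linear instruction sequence avoids this kind of artefact.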

Quantum compilers adopted peephole-style optimisations from the beginning Cheung, 2007Donny Cheung, Dmitri Maslov and Simone Severini. 2007. Translation techniques between quantum circuit architectures. In Workshop on quantum information processing, 1--3 Steiger, 2018Damian S. Steiger, Thomas Häner and Matthias Troyer. 2018. ProjectQ: an open source software framework for quantum computing. Quantum 2 (January 2018, 49). doi: 10.22331/q-2018-01-31-49 Sivara., 2020Seyon Sivarajah, Silas Dilkes, Alexander Cowtan, Will Simmons, Alec Edgington and Ross Duncan. 2020. t|ket⟩: a retargetable compiler for NISQ devices. Quantum Science and Technology 6, 1 (November 2020, 014003). doi: 10.1088/2058-9565/ab8e92. They encompass some of the most common optimisations in quantum computing, including Euler angle reduction Chatzi., 2009K. Ch. Chatzisavvas, G. Chadzitaskos, C. Daskaloyannis and S. G. Schirmer. 2009. Improving quantum gate fidelities using optimized Euler angles. Physical Review A 80, 5 (November 2009, 052329). doi: 10.1103/physreva.80.052329, the two-qubit KAK decomposition Tucci, 2005Robert R. Tucci. 2005. An Introduction to Cartan's KAK Decomposition for QC Programmers. arXiv: quant-ph/0507171 [quant-ph] Cross, 2019Andrew W. Cross, Lev S. Bishop, Sarah Sheldon, Paul D. Nation and Jay M. Gambetta. 2019. Validating quantum computers using randomized model circuits. Physical Review A 100, 3 (September 2019, 032328). doi: 10.1103/physreva.100.032328 and all gate set rebases contri., 2025TKET contributors. 2025. Documentation: pytket.passes.AutoRebase. Retrieved on 13/01/2025 (TKET docs) from https://docs.quantinuum.com/tket/api-docs/passes.html#pytket.passes.AutoRebase. A quantum-specific flavour of peephole optimisation with close links to GTSs, template matching Maslov, 2008D. Maslov, G.W. Dueck, D.M. Miller and C. Negrevergne. 2008. Quantum Circuit Simplification and Level Compaction. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 27, 3 (March 2008, 436--444). doi: 10.1109/tcad.2007.911334 Iten, 2022Raban Iten, Romain Moyard, Tony Metger, David Sutter and Stefan Woerner. 2022. Exact and Practical Pattern Matching for Quantum Circuit Optimization. ACM Transactions on Quantum Computing 3, 1 (January 2022, 1--41). doi: 10.1145/3498325, achieved state-of-the-art results for Clifford circuit optimisation Bravyi, 2021Sergey Bravyi, Ruslan Shaydulin, Shaohan Hu and Dmitri Maslov. 2021. Clifford Circuit Optimization with Templates and Symbolic Pauli Gates. Quantum 5 (November 2021, 580). doi: 10.22331/q-2021-11-16-580. Recently, quantum peephole optimisations were also proposed that leverage provable state information to perform contextual optimisations Liu, 2021Ji Liu, Luciano Bello and Huiyang Zhou. 2021. Relaxed Peephole Optimization: A Novel Compiler Optimization for Quantum Circuits. In 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), February 2021. IEEE, 301--314. doi: 10.1109/cgo51591.2021.9370310, similar to strength reduction and optimisation with preconditions in classical compilation Lopes, 2015Nuno P. Lopes, David Menendez, Santosh Nagarakatte and John Regehr. 2015. Provably correct peephole optimizations with alive. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2015. ACM, 22--32. doi: 10.1145/2737924.2737965.

Intermediate representations.  The graph formalisation of quantum computations we will define in this chapter also draws heavily on the intermediate representations (IR) of programs in classical compilers. The classical compilation community has found significant advantages in sharing a common standardised IR format. Indeed, while the exact syntax constructs and abstractions vary across programming languages, and, at the other end of the compiler stack, the specific assembly instructions emitted differ between hardware targets, much of the compiler middleware can be broadly shared across use cases. This gave rise to the LLVM Lattner, 2004Chris Lattner and Vikram Adve. 2004. LLVM: A compilation framework for lifelong program analysis & transformation. In International Symposium on Code Generation and Optimization, 2004. CGO 2004. IEEE, 75--86. doi: 10.1109/CGO.2004.1281665 and, more recently, the MLIR Lattner, 2021Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasilache and Oleksandr Zinenko. 2021. MLIR: Scaling Compiler Infrastructure for Domain Specific Computation. In 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), February 2021. IEEE, 2--14. doi: 10.1109/CGO51591.2021.9370308 projects, which provide common compiler IRs, along with all the infrastructure compilers typically require: IR transformation tooling, translation into hardware-specific assembly, efficient serialisations, in-memory formats etc.

The idea of adopting LLVM for quantum was championed by QIR QIR Al., 2021 QIR Alliance. 2021. QIR Specification v0.1. Retrieved on 31/12/24 from https://www.qir-alliance.org/, a standard introducing quantum primitives into the LLVM IR. This was subsequently adopted by many quantum hardware providers for its superior expressive power compared to circuit-based formats QIR Al., 2023 QIR Alliance. 2023. NVIDIA Joins the QIR Alliance as the Effort Enters Year Two. Retrieved on 02/01/2025 from https://www.qir-alliance.org/posts/year_one_in_review/. Building on top of QIR, an IR specifically for quantum-classical programs was proposed in Mark K., 2025Seyon Sivarajah, Alan Lawrence, Alec Edgington, Douglas Wilson, Craig Roy, Luca Mondada, Lukas Heidemann, Ross Duncan Mark Koch. 2025. HUGR: A Quantum-Classical Intermediate Representation. Retrieved (talk recording) from https://www.youtube.com/live/D8esZrt7ogk?feature=shared&t=5217, with additional soundness guarantees based, among others, on the no-cloning principle of quantum information. In parallel, projects with similar aims have also emerged McCask., 2021Alexander McCaskey and Thien Nguyen. 2021. A MLIR Dialect for Quantum Assembly Languages. In 2021 IEEE International Conference on Quantum Computing and Engineering (QCE), October 2021. IEEE. doi: 10.1109/QCE52317.2021.00043 Ittah, 2022David Ittah, Thomas Häner, Vadym Kliuchnikov and Torsten Hoefler. 2022. QIRO: A Static Single Assignment-based Quantum Program Representation for Optimization. ACM Transactions on Quantum Computing 3, 3 (June 2022, 1--32). doi: 10.1145/3491247 that use the full MLIR and LLVM toolchain.

Challenges of GTS for compilation.  Peephole optimisations of compiler IRs have proven to be a fast, general and scalable approach to compilation and code optimisation in practice. However, the optimisation results depend heavily on well-designed transformation orderings and the performance may vary widely across (equivalent) input programs. This is commonly known in compiler research as the phase ordering problem Click, 1995Cliff Click and Keith D. Cooper. 1995. Combining analyses, combining optimizations. ACM Transactions on Programming Languages and Systems 17, 2 (March 1995, 181--196). doi: 10.1145/201059.201061​. When a compiler can modify code in multiple ways, it must determine which transformations to apply and in what sequence to achieve optimal results Whitfi., 1997Deborah L. Whitfield and Mary Lou Soffa. 1997. An approach for exploring code improving transformations. ACM Transactions on Programming Languages and Systems 19, 6 (November 1997, 1053--1084). doi: 10.1145/267959.267960 Liang, 2023Youwei Liang, Kevin Stone, Ali Shameli, Chris Cummins, Mostafa Elhoushi, Jiadong Guo, Benoit Steiner, Xiaomeng Yang, Pengtao Xie, Hugh Leather and Yuandong Tian. 2023. Learning Compiler Pass Orders using Coreset and Normalized Value Prediction. In Proceedings of the 40th International Conference on Machine Learning. JMLR.org. doi: 10.48550/ARXIV.2301.05104. This is a common design challenge in GTSs, often addressed through mechanisms such as rule controls Heckel, 2020Reiko Heckel and Gabriele Taentzer. 2020. Graph Transformation for Software Engineers: With Applications to Model-Based Development and Domain-Specific Language Engineering. Springer International Publishing. doi: 10.1007/978-3-030-43916-3.

This issue is also a key challenge within quantum compilation, as can be verified by comparing the performance of peephole-based compilers with provably optimal circuit synthesis strategies. On problem sizes where exhaustive search is feasible, unitary synthesis tools can sometimes outperform current, mostly peephole-based compilers Sivara., 2020Seyon Sivarajah, Silas Dilkes, Alexander Cowtan, Will Simmons, Alec Edgington and Ross Duncan. 2020. t|ket⟩: a retargetable compiler for NISQ devices. Quantum Science and Technology 6, 1 (November 2020, 014003). doi: 10.1088/2058-9565/ab8e92 by up to 50%1 Wu, 2020Xin-Chuan Wu, Marc Grau Davis, Frederic T. Chong and Costin Iancu. 2020. QGo: Scalable Quantum Circuit Optimization Using Automated Synthesis. arXiv: 2012.09835 [quant-ph].


  1. at the cost of many hours of compute, of course.

3.2. Computation graphs and linearity

Computation graphs represent the flow of data between operations in a program, with nodes as operations and edges as data dependencies. Widely used in machine learning frameworks and GPU optimisations Bergst., 2011James Bergstra, Frédéric Bastien, Olivier Breuleux, Pascal Lamblin, Razvan Pascanu, Olivier Delalleau, Guillaume Desjardins, David Warde-Farley, Ian Goodfellow, Arnaud Bergeron and others. 2011. Theano: Deep learning on gpus with python. In NIPS 2011, BigLearning Workshop, Granada, Spain Zhao, 2023Yuxuan Zhao, Qi Sun, Zhuolun He, Yang Bai and Bei Yu. 2023. AutoGraph: Optimizing DNN Computation Graph for Parallel GPU Kernel Execution. Proceedings of the AAAI Conference on Artificial Intelligence 37, 9 (June 2023, 11354--11362). doi: 10.1609/aaai.v37i9.26343, they are conceptually equivalent to dataflow graphs used in compiler design, which were pioneered by Feo, 1990John T. Feo, David C. Cann and Rodney R. Oldehoeft. 1990. A report on the sisal language project. Journal of Parallel and Distributed Computing 10, 4 (December 1990, 349--366). doi: 10.1016/0743-7315(90)90035-n and Kahn, 1976Gilles Kahn and David MacQueen. 1976. Coroutines and networks of parallel processes. PhD Thesis. IRIA and are now central to most compiler IRs.

In classical computations, these graph representations of computations are essentially term graphs Barend., 1987H. P. Barendregt, M. C. J. D. Eekelen, J. R. W. Glauert, J. R. Kennaway, M. J. Plasmeijer and M. R. Sleep. 1987. Term Graph Rewriting​ – sets of algebraic expressions that are stored as trees, combined with an important optimisation known as term sharing. When identical subexpressions appear multiple times, they can be represented as one computation and referenced from multiple locations, creating a directed acyclic graph rather than a term tree Plump, 1999Detlef Plump. 1999. Term Graph Rewriting. (October 1999). This sharing enables a more efficient representation. It can also be used as a compiler optimisation to identify subexpressions that can be cached and shared across expression evaluations for a more efficient execution – a technique known as common subexpression elimination (CSE) Cocke, 1970John Cocke. 1970. Global common subexpression elimination. In Proceedings of a symposium on Compiler optimization -. ACM Press, 20--24. doi: 10.1145/800028.808480.
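Term sharing can be illustrated with a small hash-consing routine. The following is a minimal sketch, not taken from any particular compiler: the nested-tuple term encoding and the name `share_terms` are assumptions made for illustration. Structurally identical subterms receive the same node id, so the expression tree collapses into a directed acyclic graph.

```python
from typing import Dict, Tuple, Union

# A term is a leaf name, or a tuple (operator, child, child, ...).
Term = Union[str, Tuple]


def share_terms(term: Term, table: Dict[Tuple, int]) -> int:
    """Return the id of the DAG node for `term`, reusing existing nodes."""
    if isinstance(term, str):
        key = ("leaf", term)
    else:
        op, *children = term
        # Children are replaced by their node ids before building the key.
        key = (op, *(share_terms(child, table) for child in children))
    # Structurally identical subterms yield the same key, hence the same node.
    if key not in table:
        table[key] = len(table)
    return table[key]


# (a + b) * (a + b): the term tree has 7 nodes, the shared DAG only 4.
table: Dict[Tuple, int] = {}
root = share_terms(("*", ("+", "a", "b"), ("+", "a", "b")), table)
print(len(table), root)
```

The second occurrence of `a + b` is found in the table and not stored again, which is exactly the reuse that common subexpression elimination exploits.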

Each edge of a computation graph corresponds to a unique value: the output of a previous computation that is being passed on to new operations. These values flow along edges in the graph – hence dataflow graph. Values are immutable: they are defined once and then passed as input to further operations, where they can only be consumed, never modified. In compiler speak, programs expressed using such immutable values are often called static single assignment (SSA) programs Cytron, 1991Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman and F. Kenneth Zadeck. 1991. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems 13, 4 (October 1991, 451--490). doi: 10.1145/115372.115320 Rosen, 1988B. K. Rosen, M. N. Wegman and F. K. Zadeck. 1988. Global Value Numbers and Redundant Computations. In Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages - POPL ’88. ACM Press, 12--27. doi: 10.1145/73560.73562. In SSA:

  1. Every value is defined exactly once,
  2. Every value may be used any number of times (including zero).

Quantum computing throws this second pillar of SSA into the bin. Values in quantum computations are the result of computations on quantum data, and as such must obey the no-cloning and no-deleting theorems (section 2.1). We call values subject to these restrictions linear1. They introduce the following constraint on valid computation graphs:

Every linear value must be used exactly once.
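The two rules can be checked mechanically by counting definitions and uses. The sketch below assumes a hypothetical encoding of a program as a list of `(outputs, op, inputs)` triples; the operation names `alloc`, `h`, `measure` and `copy` are placeholders. It treats the program as closed, so a linear value that is meant to be returned to the caller would be reported as unused.

```python
from collections import Counter
from typing import List, Set, Tuple

# One statement: (output values, operation name, input values).
Stmt = Tuple[List[str], str, List[str]]


def check_values(program: List[Stmt], linear: Set[str]) -> List[str]:
    """Report values defined more than once and linear values not used exactly once."""
    defs = Counter(v for outs, _, _ in program for v in outs)
    uses = Counter(v for _, _, ins in program for v in ins)
    errors = [f"{v} defined {n} times" for v, n in defs.items() if n != 1]
    errors += [
        f"linear value {v} used {uses[v]} times"
        for v in sorted(linear)
        if uses[v] != 1
    ]
    return errors


ok = [(["q0"], "alloc", []), (["q1"], "h", ["q0"]), (["c"], "measure", ["q1"])]
bad = [(["q0"], "alloc", []), (["a", "b"], "copy", ["q0", "q0"])]

print(check_values(ok, {"q0", "q1"}))
print(check_values(bad, {"q0"}))
```

The classical value `c` is never used in `ok`, which is allowed; only the values declared linear are held to the exactly-once rule.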

Linear values change fundamentally how transformations of the computation graph must be specified. Where compilers on classical data can:

  • freely share common subexpressions (term sharing),
  • undo term sharing, i.e. duplicate shared terms into independent subterms, and
  • delegate the identification and deletion of obsolete code to specialised passes (e.g., dead code elimination Cytron, 1991Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman and F. Kenneth Zadeck. 1991. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems 13, 4 (October 1991, 451--490). doi: 10.1145/115372.115320 Briggs, 1994Preston Briggs and Keith D. Cooper. 1994. Effective partial redundancy elimination. ACM SIGPLAN Notices 29, 6 (June 1994, 159--170). doi: 10.1145/773473.178257),

quantum compilers must enforce much stricter invariants on IR transformations – or risk producing invalid programs.

In classical compilers, IR modification APIs (such as MLIR’s PatternRewriter) decouple program transformation from code deletion. Program transformations are specified by copying existing values and introducing new values and operations as needed, while the actual deletion of unused code is deferred to specialised dead code elimination passes. This approach is no longer feasible in the presence of linear values. Computation graphs for quantum computations must adopt proper graph rewriting semantics, in which the explicit deletion of obsolete values and operations is just as much a part of the rewriting data as the new code generation.


  1. The terminology comes from “linear” logic Girard, 1987Jean-Yves Girard. 1987. Linear logic. Theoretical Computer Science 50, 1 (1--101). doi: 10.1016/0304-3975(87)90045-4. I apologise for slamming additional semantics on what I recognise is an already very overloaded term.

3.3. A graph representation for quantum programs

For the purposes of this thesis, we introduce a simplified graph-based IR for quantum computations that we will call minIR. It captures all the expressiveness that we require for hybrid programs whilst remaining sufficiently abstract to be applicable to a variety of IRs and, by extension, quantum programming languages and compiler frameworks.

MinIR can be thought of as being built from statements of the form

x, y, ... := op(a, b, c, ...)

to be understood as an operation op applied on the SSA values a, b, c, ... and producing the output values x, y, ....

Computation and dataflow graphs are commonly defined with operations as vertices and values as edges. To faithfully capture the function signature of op, this requires storing and preserving an ordering of the incoming and outgoing edges (also known as port graphs Ferná., 2018Maribel Fernández, Hélène Kirchner and Bruno Pinaud. 2018. Labelled Port Graph – A Formal Structure for Models and Computations. Electronic Notes in Theoretical Computer Science 338 (October 2018, 3--21). doi: 10.1016/j.entcs.2018.10.002).

Instead, we adopt the hypergraph formalisation of computation graphs, which is more common within category theory (see string diagrams Seling., 2010P. Selinger. 2010. A Survey of Graphical Languages for Monoidal Categories and their formalisation in the hypergraph category Bonchi, 2022Filippo Bonchi, Fabio Gadducci, Aleks Kissinger, Pawel Sobocinski and Fabio Zanasi. 2022. String Diagram Rewrite Theory I: Rewriting with Frobenius Structure. Journal of the ACM 69, 2 (March 2022, 1 - 58). doi: 10.1145/3502719 Wilson, 2022Paul Wilson and Fabio Zanasi. 2022. The Cost of Compositionality: A High-Performance Implementation of String Diagram Composition. Electronic Proceedings in Theoretical Computer Science 372 (November 2022, 262--275). doi: 10.4204/eptcs.372.19). This definition is particularly well-suited for our purposes because it frames the graph transformations of interest to us in the well-studied language of rewriting within adhesive categories Lack, 2004Stephen Lack and Paweł Sobociński. 2004. Adhesive Categories. In Foundations of Software Science and Computation Structures, Berlin, Heidelberg. Springer Berlin Heidelberg, 273--288. doi: 10.1007/978-3-540-24727-2_20.

Hypergraphs and minIR #

At a minimum, a directed hypergraph – in the following sometimes referred to simply as a graph – is defined by a set of vertices $\mathbf V$ and a set of (hyper)edges $\mathbf E$. We will always consider hypergraphs where the edges $e \in \mathbf E$ are directed and the vertices attached to $e$ are given by ordered lists. We formalise this incidence relation between vertices in $\mathbf V$ and edges in $\mathbf E$ by writing $\mathbf E$ as the partition over the disjoint sets $\mathbf E_{st}$

$$\mathbf E = \bigsqcup_{s, t \in \mathbb{N}} \mathbf E_{st}$$

and introducing $s$ source and $t$ target maps for each $\mathbf E_{st}$. Why we write sets in boldface will become clear in a moment.

Definition 3.1 (Directed hypergraph)

A directed hypergraph is given by sets $\mathbf V$ and $\mathbf E_{st}$ for $s, t \in \mathbb{N}$, along with maps

$$\begin{aligned} \textit{src}_{st,i}&: \mathbf E_{st} \to \mathbf V\quad&\textrm{for } 1 \leqslant i \leqslant s\\ \textit{tgt}_{st,j}&: \mathbf E_{st} \to \mathbf V&\textrm{for } 1 \leqslant j \leqslant t \end{aligned} \tag{1}$$

for all $s, t \in \mathbb{N}$.

Note that in this thesis, as in most common uses of hypergraphs, the sets $\mathbf V$ and $\mathbf E = \bigsqcup \mathbf E_{st}$ will always be finite, and thus $\mathbf E_{st} \neq \varnothing$ for a finite number of $s, t \in \mathbb{N}$ only.

For simplicity, we can further omit the $st$ subscript of the source $\textit{src}_{st,i}$ and target $\textit{tgt}_{st,j}$ maps whenever it can be inferred from the domain of definition of the map. For $e \in \mathbf{E}_{st}$, we call $\textit{src}_{1}(e), \dots, \textit{src}_{s}(e) \in \mathbf{V}$ the $s$ source vertices of $e$ and $\textit{tgt}_{1}(e), \dots, \textit{tgt}_{t}(e) \in \mathbf{V}$ the $t$ target vertices of $e$.
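As an illustration, a directed hypergraph in this sense can be stored as a set of vertices together with, for each edge, its ordered lists of sources and targets. The class below is a sketch under that assumption; the name `Hypergraph` and its methods are hypothetical and not part of any existing library. The pair of list lengths identifies the set $\mathbf E_{st}$ an edge belongs to.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Hypergraph:
    vertices: Set[str] = field(default_factory=set)
    # Edge name -> (ordered source vertices, ordered target vertices).
    edges: Dict[str, Tuple[List[str], List[str]]] = field(default_factory=dict)

    def add_edge(self, name: str, sources: List[str], targets: List[str]) -> None:
        self.vertices.update(sources, targets)
        self.edges[name] = (list(sources), list(targets))

    def arity(self, e: str) -> Tuple[int, int]:
        """The pair (s, t) such that the edge e lies in E_st."""
        sources, targets = self.edges[e]
        return len(sources), len(targets)

    def src(self, e: str, i: int) -> str:
        """The i-th source of e, 1-indexed as in the text."""
        return self.edges[e][0][i - 1]

    def tgt(self, e: str, j: int) -> str:
        """The j-th target of e, 1-indexed as in the text."""
        return self.edges[e][1][j - 1]


# The statement  x, y := op(a, b, c)  as a single hyperedge in E_32.
g = Hypergraph()
g.add_edge("op", ["a", "b", "c"], ["x", "y"])
print(g.arity("op"), g.src("op", 2), g.tgt("op", 1))
```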

We introduce the notation $u \leadsto v$ to signify that there is an edge from $u$ to $v$, i.e. there is $e \in \mathbf E_{st}$ for some $s, t \in \mathbb N$ and $1 \leqslant i \leqslant s, 1 \leqslant j \leqslant t$ such that $u = \textit{src}_i(e)$ and $v = \textit{tgt}_j(e)$. We define the equivalence relation ${\sim} \subseteq \mathbf V^2$ of connected vertices, given by the transitive, symmetric and reflexive closure of $\leadsto$. The equivalence classes of $\sim$ are the connected components of the graph. We will write $[v]$, resp. $[e]$ for the connected component that contains the vertex $v$, resp. the edge $e$.
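The connected components can be computed with a union-find structure that merges the two endpoints of every source-target pair. This is a sketch that follows the definition literally, under the assumption that edges are given as a dictionary from edge names to `(sources, targets)` lists; the function name is hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def connected_components(
    vertices: Iterable[str], edges: Dict[str, Tuple[List[str], List[str]]]
) -> Set[FrozenSet[str]]:
    """Equivalence classes of the symmetric, transitive, reflexive closure of ~>."""
    parent = {v: v for v in vertices}

    def find(v: str) -> str:
        # Follow parent pointers to the representative, halving paths on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for sources, targets in edges.values():
        for u in sources:
            for v in targets:
                # u ~> v holds, so u and v are in the same component.
                parent[find(u)] = find(v)

    classes: Dict[str, Set[str]] = {}
    for v in parent:
        classes.setdefault(find(v), set()).add(v)
    return {frozenset(c) for c in classes.values()}


components = connected_components(
    ["A", "B", "C", "D"], {"e": (["A", "B"], ["C"])}
)
print(components)
```

Here `A` and `B` end up in the same component because both are related to `C`, while the isolated vertex `D` forms a component of its own.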

To proceed, it is useful to frame the hypergraph definition in a categorical setting. We write $[\mathbb{C}, \mathrm{Set}]$ for the presheaf topos of the category $\mathbb{C}$, i.e. the category with functors $\mathbb{C} \to \mathrm{Set}$ as objects and natural transformations as morphisms. Definition 3.1 can be equivalently restated as:

hypergraphs are objects in the presheaf topos $\mathbb H = [\mathbb{C}, \mathrm{Set}]$,

where the category $\mathbb{C}$ has objects $V$ and $E_{st}$ for $s, t \in \mathbb{N}$ and arrows given by (1), now interpreted as morphisms in $\mathbb{C}$ rather than as functions in $\mathrm{Set}$. In this framing, a graph is a functor that defines a set for each object of $\mathbb{C}$ and specifies functions between these sets – one for each arrow in $\mathbb{C}$.

This is where the distinction between bold and non-bold typeface comes from: we use bold letters to refer to images in $\mathrm{Set}$ of a hypergraph functor, whereas the non-bold typeface refers to objects in the indexing category $\mathbb C$. The distinction between $\mathbb C$ and $\mathrm{Set}$ is less important for morphisms – it will typically be clear from the context. We thus use the same symbols for both.

Linearity constraints #

The introduction of $\mathbb H$ not only gives us a notion of hypergraph homomorphisms – maps between hypergraphs that preserve the structure of the graph. It also provides us with a way to express the linearity constraints that arise from our discussion in section 3.2, and which we must enforce on our computation graphs.

The definition that follows adds the coproduct $E$ explicitly as an object of the category (which we did not need to do in Definition 3.1), as we need it as the codomain of the new morphisms $\textit{use}$ and $\textit{def}$. The adhesivity of the category no longer comes for free – we will get back to this in section 3.4.

Definition 3.2 (Hypergraph with linearity constraints)

The category $\textrm{lin-}\mathbb{C}$ is the category given by objects $\{V, V_\textit{src}, V_\textit{tgt}\} \cup \{ E \} \cup \{ E_{st}\, |\, s, t \in \mathbb{N}\}$. Its arrows are the incidence morphisms given in (1), along with

$$\begin{aligned} \mathit{use}&: V_\textit{src} \to E\quad& \mathit{def}&: V_\textit{tgt} \to E\\ \lambda_\mathit{src}&: V_\textit{src} \rightarrowtail V& \lambda_\mathit{tgt}&: V_\textit{tgt} \rightarrowtail V \end{aligned}$$

and $\iota_{st}: E_{st} \rightarrowtail E$ for all $s, t \in \mathbb{N}$. The morphisms $\lambda_\mathit{src}, \lambda_\mathit{tgt}, \iota_{st}$ are split monomorphisms and the following diagrams commute for all $s, t \in \mathbb{N}$ and $1 \leqslant i \leqslant s, 1 \leqslant j \leqslant t$:

Directed hypergraphs with linearity conditions are objects in the full subcategory $\textrm{lin-}\mathbb H$ given by objects $H_\mathrm{lin} \in [\textrm{lin-}\mathbb{C}, \mathrm{Set}]$ such that $\mathbf E = \bigsqcup \mathbf E_{st}$ is the coproduct in $\mathrm{Set}$ and $H_\mathrm{lin}(\iota_{st}): H_\mathrm{lin}(E_{st}) \to H_\mathrm{lin}(E)$ are the injections into $H_\mathrm{lin}(E)$.

We probably owe an explanation for this definition – at least for the sake of the few computer scientists that are still following us.

First of all, notice that every hypergraph with linearity constraints corresponds to a hypergraph in the sense of Definition 3.1: there is an obvious functor $\mathcal{L}: \mathbb{C} \to \textrm{lin-}\mathbb{C}$ that maps each object and morphism in $\mathbb{C}$ to the object or morphism with the same name in $\textrm{lin-}\mathbb{C}$. By contravariance, we can thus (functorially) map every hypergraph with linearity constraints $H_\mathrm{lin} \in \textrm{lin-}\mathbb{H}$ to the hypergraph $H = H_\mathrm{lin} \circ \mathcal{L} \in \mathbb{H}$.

Another way of looking at this is to realise that by requiring that $\lambda_\mathit{src}, \lambda_\mathit{tgt}$ be split monomorphisms, we obtain that the resulting functions in $\mathrm{Set}$ are injective. Up to isomorphism, we can consider that $\mathbf{V}_\textit{src}, \mathbf{V}_\textit{tgt} \subseteq \mathbf{V}$ are subsets of vertices in $H_\mathrm{lin}$. A hypergraph with linearity constraints is thus a directed hypergraph with two selected subsets of vertices $\mathbf{V}_\textit{src}$ and $\mathbf{V}_\textit{tgt}$.

Vertices within these subsets are special. For every $v \in \mathbf{V}_\textit{src}$, there exist unique indices $s, t \in \mathbb{N}$, $1 \leqslant i \leqslant s$ and edge $\textit{use}(v) = e \in \mathbf{E}_{st}$ such that $\textit{src}_i(e) = v$. In words, for every $v \in \mathbf{V}_\textit{src}$ there is a unique edge in the hypergraph that has $v$ as one of its sources. We then say that $e$ is the unique use of vertex $v$. Similarly, vertices in $\mathbf{V}_\textit{tgt}$ have a unique edge $e$ in the hypergraph with $v$ as one of its targets – it is the unique definition of $v$.
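Concretely, the maps $\textit{use}$ and $\textit{def}$ exist exactly when every selected vertex occurs once among the sources, resp. targets, of all edges. The sketch below assumes the same dictionary encoding of edges as `(sources, targets)` lists; the function name `use_def_maps` and the operation names are hypothetical. It either returns the two maps or reports the first vertex for which uniqueness fails.

```python
from typing import Dict, List, Set, Tuple

Edges = Dict[str, Tuple[List[str], List[str]]]


def use_def_maps(
    edges: Edges, v_src: Set[str], v_tgt: Set[str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Build the use and def maps, or raise if some occurrence is not unique."""
    uses: Dict[str, List[str]] = {}
    defs: Dict[str, List[str]] = {}
    for e, (sources, targets) in edges.items():
        for v in sources:
            uses.setdefault(v, []).append(e)
        for v in targets:
            defs.setdefault(v, []).append(e)
    # Each vertex in V_src (resp. V_tgt) must occur exactly once as a source
    # (resp. target), counting repeated occurrences within a single edge.
    for kind, subset, occurrences in (("use", v_src, uses), ("def", v_tgt, defs)):
        for v in sorted(subset):
            if len(occurrences.get(v, [])) != 1:
                raise ValueError(f"{v} has no unique {kind}")
    return (
        {v: uses[v][0] for v in v_src},
        {v: defs[v][0] for v in v_tgt},
    )


edges = {"h": (["q0"], ["q1"]), "measure": (["q1"], ["c"])}
use, define = use_def_maps(edges, {"q0", "q1"}, {"q1", "c"})
print(use, define)
```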

Typed graphs #

MinIR graphs are strongly typed. We introduce typed graphs for this purpose, a concept for which graph transformation was first formalised in Ehrig, 2004Hartmut Ehrig, Ulrike Prange and Gabriele Taentzer. 2004. Fundamental Theory for Typed Attributed Graph Transformation. In Graph Transformations, Berlin, Heidelberg. Springer Berlin Heidelberg, 161--177. doi: 10.1007/978-3-540-30203-2_13. A type system for directed hypergraphs is just another object $\Sigma \in \mathbb H$. A typed graph is then an object of the slice category $\mathbb H \searrow \Sigma$, that is to say, typed graphs are morphisms $H \to \Sigma$ of $\mathbb{H}$ and morphisms between typed graphs are given by the subset of morphisms $H_1 \rightarrow H_2$ of $\mathbb H$ that make the triangle diagram formed by $H_1$, $H_2$ and $\Sigma$ commute.

To type hypergraphs with linearity constraints, we do not pick the type system $\Sigma$ in $\textrm{lin-}\mathbb H$, as the existence of the $\textit{use}$ and $\textit{def}$ morphisms imposes restrictions that are too strict. We consider instead the category $\textrm{lin-}\mathbb C_\textrm{type}$ given by the same objects as $\textrm{lin-}\mathbb C$, as well as the same morphisms, with the omission of $\textit{use}$ and $\textit{def}$. There is an obvious functor $\textrm{lin-}\mathbb C_\textrm{type} \to \textrm{lin-}\mathbb C$ and thus, by contravariance, every hypergraph with linearity constraints $H_\textrm{lin} \in \textrm{lin-}\mathbb H$ can be mapped to a hypergraph in $\textrm{lin-}\mathbb H_\textrm{type}$. We say that a hypergraph with linearity constraints $H_\textrm{lin} \in \textrm{lin-}\mathbb H$ is $\Sigma$-typed for a type system $\Sigma \in \textrm{lin-}\mathbb H_\textrm{type}$ if there is a morphism $H_\textrm{lin} \to \Sigma$, when interpreting $H_\textrm{lin}$ as an object of $\textrm{lin-}\mathbb H_\textrm{type}$.
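In elementary terms, a typing morphism assigns a type to every value and an operation type to every edge, in such a way that the types of the sources and targets of an edge agree with the signature of its operation type. The following sketch checks this condition for the incidence structure only; it ignores the linearity and hierarchy data, and the type names `qubit`, `bit`, `H` and `Measure` are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Edges = Dict[str, Tuple[List[str], List[str]]]


def is_typing_morphism(
    graph_edges: Edges,
    type_edges: Edges,
    vertex_type: Dict[str, str],
    edge_type: Dict[str, str],
) -> bool:
    """Check that the vertex and edge maps commute with the src and tgt maps."""
    for e, (sources, targets) in graph_edges.items():
        sig_sources, sig_targets = type_edges[edge_type[e]]
        # List equality also enforces that e and its type have the same arity.
        if [vertex_type[v] for v in sources] != list(sig_sources):
            return False
        if [vertex_type[v] for v in targets] != list(sig_targets):
            return False
    return True


# A type system with two value types and two operation types.
sigma = {"H": (["qubit"], ["qubit"]), "Measure": (["qubit"], ["bit"])}
graph = {"h": (["q0"], ["q1"]), "m": (["q1"], ["c"])}
edge_type = {"h": "H", "m": "Measure"}
good = {"q0": "qubit", "q1": "qubit", "c": "bit"}
wrong = {"q0": "qubit", "q1": "qubit", "c": "qubit"}

print(is_typing_morphism(graph, sigma, good, edge_type))
print(is_typing_morphism(graph, sigma, wrong, edge_type))
```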

Two example hypergraphs. Vertices (labelled with capital letters) are circles and hyperedges (labelled with small letters) span between them. Vertices that are attached to an edge in the black half of the circle correspond to source vertices of the edge; those in the white half correspond to target vertices. The functions $src_i$ and $tgt_j$ map edges to the incident vertices, defining the directed hypergraph. On the left, there are further functions $\textit{use}$ and $\textit{def}$ that map vertices to the unique edge that uses or defines them. This defines a hypergraph with linearity constraints, with $V_\textit{tgt} = \{A_1, A_2\}$ and $V_\textit{src} = \{A_3\}$. These cannot be defined on the graph on the right. On the other hand, the edges b and c are in $F_{11}$, and are in the domain of functions $in_{11}$ and $out_{11}$, thus defining a child region of a hierarchical graph. Note that it would be invalid to have any edge connecting $i_1$ or $o_1$ to $i_2$ or $o_2$. The $i$ and $o$ vertices also have incidence morphisms, not displayed here.


Hierarchical hypergraphs #

A final bit of structure that minIR graphs require is a notion of hierarchy between regions of the graph. This will be useful to define functions, control flow blocks such as if-else, or any subroutine that can itself be viewed as an operation in the computation.

Hierarchical hypergraphs were first proposed in Drewes, 2002Frank Drewes, Berthold Hoffmann and Detlef Plump. 2002. Hierarchical Graph Transformation. Journal of Computer and System Sciences 64, 2 (March 2002, 249--283). doi: 10.1006/jcss.2001.1790 and further generalised in Busatto, 2005Giorgio Busatto, Hans-Jörg Kreowski and Sabine Kuske. 2005. Abstract hierarchical graph transformation. Mathematical Structures in Computer Science 15, 4 (July 2005, 773--819). doi: 10.1017/s0960129505004846 Palacz, 2004Wojciech Palacz. 2004. Algebraic hierarchical graph transformation. Journal of Computer and System Sciences 68, 3 (May 2004, 497--520). doi: 10.1016/s0022-0000(03)00064-3. However, we opt to use a more restrictive definition, closer to the notion of flattened hypergraphs of Drewes, 2002Frank Drewes, Berthold Hoffmann and Detlef Plump. 2002. Hierarchical Graph Transformation. Journal of Computer and System Sciences 64, 2 (March 2002, 249--283). doi: 10.1006/jcss.2001.1790. The reason for this is twofold. Firstly, hierarchical (hyper)graphs are typically defined recursively. It is not obvious under which conditions (if any) such definitions form adhesive categories, although progress in this direction was made in Padberg, 2017Julia Padberg. 2017. Hierarchical Graph Transformation Revisited: Transformations of Coalgebraic Graphs. In Graph Transformation, Cham. Springer International Publishing, 20--35. doi: 10.1007/978-3-319-61470-0_2 with the introduction of coalgebraic graphs. As a result, to the extent that graph transformation results can be applied to such structures, this must be done carefully.

The second, more practical, reason is that the notion of typed graph introduced above cannot be directly lifted to the hierarchical graph setting: while some subset of the hierarchical relation in minIR should be enforced by the type system, the type graph of a nested graph should be identical to the parent’s (as opposed to being itself nested within the type graph of the parent).

It is therefore more convenient for us to encode hierarchy in directed hypergraphs as follows. Note that it is not clear that our definition is adhesive either1 – but at least it is framed as a subcategory of a base category that is.

Definition 3.3 (Hierarchical hypergraph)

The category $\textrm{hier-}\mathbb C$ is the category with objects and arrows of $\mathbb C$ along with additional objects $F_{st}$ for $s, t \in \mathbb{N}$ and arrows:

  • $F_{st} \rightarrowtail E_{st}$ that are split monomorphisms,
  • input arrows $F_{st} \xrightarrow{in_{st}} E_{0s}$, and output arrows $F_{st} \xrightarrow{out_{st}} E_{t0}$.

Hierarchical hypergraphs are the objects in the full subcategory $\textrm{hier-}\mathbb H$ given by objects $H_\textrm{hier} \in [\textrm{hier-}\mathbb{C}, \mathrm{Set}]$ such that

  • for any edge $e \in \mathbf{E}$ of $H_\textrm{hier}$, the set $P([e]) = \overline{in}^{-1}([e]) \cup \overline{out}^{-1}([e])$ has at most one element.
  • the transitive and reflexive closure of $[e] \preccurlyeq [P([e])]$ for $e \in \mathbf E$ is a partial order on the connected components of $H_\textrm{hier}$.

Here $\overline{in}$ and $\overline{out}$ are the functions with domain $\mathbf E$ defined piecewise as $in_{st}$ and $out_{st}$ for all $s, t$ on their respective (disjoint) domains of definition.

The same definition can also be applied to $\textrm{lin-}\mathbb H$ to obtain the category of hierarchical directed hypergraphs with linearity constraints $\textrm{hier-lin-}\mathbb H$. Similarly, we define the associated category for type systems; however, we do not impose either of the two conditions related to $P(\cdot)$ for the type category, i.e. $\textrm{hier-lin-}\mathbb H_\textrm{type} = [\textrm{hier-lin-}\mathbb C_\textrm{type}, \mathrm{Set}]$ is the full presheaf category, rather than a subcategory of it.

As with the incidence morphisms $\textit{src}$ and $\textit{tgt}$, we will drop the $st$ subscript for the IO arrows $\textit{in}$ and $\textit{out}$ when it can be inferred from the domain of definition.

Just as in the discussion of Definition 3.2, we interpret the split monomorphisms as equivalent to requiring $\mathbf{F}_{st} \subseteq \mathbf{E}_{st}$. Taking over terminology from Drewes, 2002Frank Drewes, Berthold Hoffmann and Detlef Plump. 2002. Hierarchical Graph Transformation. Journal of Computer and System Sciences 64, 2 (March 2002, 249--283). doi: 10.1006/jcss.2001.1790, we call elements $f \in \mathbf{F}_{st}$ the frames of $H_\textrm{hier} \in \textrm{hier-}\mathbb H$. For each frame $f$, there is thus a unique input edge $in(f)$ and a unique output edge $out(f)$ in $H_\textrm{hier}$ that have respectively $s$ targets and zero sources, and $t$ sources and zero targets.

By the first condition we imposed on $P(\cdot)$, the partial function $parent$ mapping connected components to their parent edge:

$$parent([e]) = \begin{cases} p\quad&\textrm{ if there exists } p \in P([e])\\ \bot&\textrm{ otherwise} \end{cases}$$

is well-defined. We call the subgraph of $H_\textrm{hier}$ formed by all connected components that share the same parent a region of $H_\textrm{hier}$2. The subgraph of vertices and edges without parent is the root region of $H_\textrm{hier}$.
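The parent function and the resulting regions can be computed directly from the input and output maps of the frames. The sketch below assumes that the connected component of every edge is already known and passed in as a dictionary; all names are hypothetical. It enforces only the first condition of Definition 3.3 (at most one parent per component) and does not check the partial order condition.

```python
from typing import Dict, Optional, Set


def parent_map(
    component_of: Dict[str, str], in_edge: Dict[str, str], out_edge: Dict[str, str]
) -> Dict[str, Optional[str]]:
    """Map each connected component to its parent frame, or None for the root."""
    candidates: Dict[str, Set[str]] = {c: set() for c in component_of.values()}
    for frame in in_edge:
        # The frame is a candidate parent of the components of its IO edges.
        candidates[component_of[in_edge[frame]]].add(frame)
        candidates[component_of[out_edge[frame]]].add(frame)
    parent: Dict[str, Optional[str]] = {}
    for comp, frames in candidates.items():
        if len(frames) > 1:
            raise ValueError(f"component {comp} has several parents")
        parent[comp] = next(iter(frames), None)
    return parent


def regions(parent: Dict[str, Optional[str]]) -> Dict[Optional[str], Set[str]]:
    """Group components by parent frame; the key None is the root region."""
    grouped: Dict[Optional[str], Set[str]] = {}
    for comp, frame in parent.items():
        grouped.setdefault(frame, set()).add(comp)
    return grouped


# A frame "call" in the root component whose body holds "in", "add" and "out".
component_of = {"call": "root", "in": "body", "add": "body", "out": "body"}
parent = parent_map(component_of, {"call": "in"}, {"call": "out"})
print(parent, regions(parent))
```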

The minIR computation graph #

In minIR, the vertices are the values of the computation, while the hyperedges define the operations. This imposes some constraints for a hypergraph to be a valid minIR graph:

  • all values in minIR must have a unique operation that defines them;
  • values that are linear must also have a unique operation that uses them;
  • the graph must be acyclic, meaning that no value can be defined in terms of itself.

This can be expressed as a hypergraph with linearity constraints by choosing $\mathbf V_\textit{tgt} = \mathbf V$ and $\mathbf V_\textit{src} = \mathbf V_L$, where $\mathbf V_L \subseteq \mathbf V$ is the subset of linear values.

The following definition then comes as no surprise:

Definition 3.4 (MinIR graph)

Let $\Sigma \in \textrm{hier-lin-}\mathbb H_\textrm{type}$ be a type system. A minIR graph $H$ typed in $\Sigma$ is an object of $\textrm{hier-lin-}\mathbb H$ that is $\Sigma$-typed and such that the adjacency relation $\leadsto$ is acyclic. We call $\mathbf V_\textit{src} = \mathbf V_L$ the linear values of $H$.

In the context of minIR, $\leadsto$ relations encode the data flow of values across the computation. The lack of explicit operation ordering differentiates minIR (and HUGR) from most classical IRs, which, unless specified otherwise, typically assume that instructions may have side effects and thus cannot be reordered. All quantum operations (and the classical operations we are interested in) are side-effect free, which significantly simplifies our IR.
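The acyclicity requirement on $\leadsto$ can be tested with a standard topological sort. The sketch below uses Kahn's algorithm on the vertex graph induced by the source-target pairs of each hyperedge; the dictionary encoding of edges and the function name are assumptions for illustration.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def is_acyclic(
    vertices: Iterable[str], edges: Dict[str, Tuple[List[str], List[str]]]
) -> bool:
    """Return True if the relation ~> induced by the hyperedges has no cycle."""
    successors: Dict[str, Set[str]] = {v: set() for v in vertices}
    for sources, targets in edges.values():
        for u in sources:
            successors[u].update(targets)
    indegree = {v: 0 for v in successors}
    for u in successors:
        for v in successors[u]:
            indegree[v] += 1
    # Repeatedly remove vertices with no remaining predecessor.
    queue = deque(v for v, d in indegree.items() if d == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    # Every vertex is removed if and only if there is no cycle.
    return removed == len(successors)


chain = {"h": (["q0"], ["q1"]), "m": (["q1"], ["c"])}
loop = {"f": (["x"], ["y"]), "g": (["y"], ["x"])}
print(is_acyclic(["q0", "q1", "c"], chain), is_acyclic(["x", "y"], loop))
```

Since operations carry no ordering other than their data dependencies, any order in which the algorithm removes vertices corresponds to a valid evaluation order of the values.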

Input and output values #

Notice that in Definition 3.4, it is not enforced that every value has a definition, i.e. there might be $v \in \mathbf V \setminus \mathbf V_\textrm{tgt}$; nor that every value with a linear type is in $\mathbf V_L$, i.e. if $\tau: H \to \Sigma$ is the typing morphism and $\mathbf T_L$ are the linear types in the type system, there might be $v \in \tau^{-1}(\mathbf T_L) \setminus \mathbf V_L$.

This would be easy to fix: we could on the one hand enforce the equality $\mathbf V_\textrm{tgt} = \mathbf V$, thus guaranteeing that every value has a unique definition in the graph. On the other hand, we could define $\mathbf V$ as the coproduct $\mathbf V = \mathbf V_L \sqcup \mathbf V_\textit{NL}$, where $V_\textit{NL}$ is a new object introduced to explicitly capture the set of non-linear values. Morphisms in this category would guarantee that the linearity of values is always preserved, and thus in particular the type morphism would map a value to a linear type if and only if it is linear.

Instead, we opt to allow undefined values and unused linear values, so that rewrite rules that match subgraphs of minIR graphs can be expressed within the same category.

Definition 3.5 (Inputs and outputs of minIR graphs)

For a minIR graph $H$ with typing morphism $\tau: H \to \Sigma$, we call the set $I = \mathbf V \setminus \mathbf V_\textit{tgt}$ the input values of $H$ and $O = \tau^{-1}(\mathbf T_L) \setminus \mathbf V_L$ its output values, where $\mathbf T_L$ are the linear types in $\Sigma$.

If $I = O = \varnothing$, we say that $H$ is IO-free.

Note that by this definition, an output value in $O$ always has a linear type! This is because non-linear values do not need to be treated specially when they are outputs: unlike linear values that must always be used in a non-output position, non-linear values may have no outgoing edge, in which case they are simply discarded in the computation.
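Both sets are plain set differences and can be computed as such. In the sketch below, the graph is described by its vertex set, the subsets $\mathbf V_\textit{tgt}$ and $\mathbf V_L$, and a dictionary giving the type of each value; the function and argument names are hypothetical. The example is the kind of open graph a rewrite rule would match: a single gate whose qubit argument is defined elsewhere and whose qubit result is used elsewhere.

```python
from typing import Dict, Iterable, Set, Tuple


def inputs_outputs(
    vertices: Iterable[str],
    v_tgt: Set[str],
    v_lin: Set[str],
    vertex_type: Dict[str, str],
    linear_types: Set[str],
) -> Tuple[Set[str], Set[str]]:
    """Compute I = V minus V_tgt and O = (values of linear type) minus V_L."""
    all_values = set(vertices)
    inputs = all_values - v_tgt
    linearly_typed = {v for v in all_values if vertex_type[v] in linear_types}
    outputs = linearly_typed - v_lin
    return inputs, outputs


# The pattern  q1 := h(q0): q0 has no definition, q1 has no use.
I, O = inputs_outputs(
    vertices=["q0", "q1"],
    v_tgt={"q1"},   # only q1 is defined inside the pattern
    v_lin={"q0"},   # only q0 is used inside the pattern
    vertex_type={"q0": "qubit", "q1": "qubit"},
    linear_types={"qubit"},
)
print(I, O)
```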

Structured control flow #

The operations and values of a minIR graph define the data flow of a program. However, a program must also be able to control and change the data flow at run time in order to express loops, conditionals, function calls etc. This is the program control flow, which minIR expresses using regions and so-called structured control flow.

Using regions, any non-trivial control flow (function calls, conditionals, loops etc.) is captured by a frame, a “black box” operation within the data flow of the program. Its implementation is then defined in the nested region of the frame. This can be used for function calls, but also for branches of control flow. A simple function call, that unconditionally redirects the control flow to the operations within a nested region, could for example be represented as follows:

In this figure and below, circles are SSA values (the vertices of the hypergraph), while the hyperedges spanning between them are the operations. Hyperedges attached to the white half of circles are value definitions, while hyperedges attached to the black half of circles are value uses. The call and + operations can be read left to right: for instance, the two values x and y on the left of call are inputs (the operation uses those values, and is thus attached to the black half of the circles), whereas the x + y value on the right is the output of the call operation (the operation defines this value, and is thus attached to the white half of the circle).

Dashed arrows indicate hierarchical dependencies that map the frame edge to the two input and output edges in the child region; dashed rectangles mark the non-root regions of the graph.

Importantly, the frame edge representing the call operation must intuitively “forward” all its input values to the in operation of the child region, and similarly pass on the value at the out operation to the output value of call in the parent graph. Passing function arguments and retrieving returned values in this fashion will be very familiar to any computer scientist. Unlike in most programming languages, this is also how values are passed in minIR to and from any other control flow construct we wish to model.

In terms of graph structure, this relation between values in parent and child regions means that the arity and types of the inputs and outputs of call fix the signatures of the child in and out operations. Definition 3.3 already ensures that the input and output arities of in and out are correct. The correct typing of these input and output values will be ensured by the type system, which we discuss in a separate section below.

To handle constructs that require more than one child region, such as an if-else statement, we can use frames that have zero input and one output:

The output of the ifblock is intuitively a higher-order type representing an operation that takes two inputs and outputs the sum.

An if-else statement might then look as follows:

The if and else blocks must expect the same input and output values. This is key to respecting any linearity constraints that values passed to ifelse might be subject to. By definition, all operations that use or define a value $v$ will be in the same region – in other words, values are only available within their defining region. This in effect implements “variable scoping”. With some imagination, this construction can easily be adapted to model loops, complex control flow graphs, or any other control flow structures.

Why not plain branch statements? #

There is a simpler – and at least as popular – way of expressing control flow in IRs without requiring regions and operation hierarchies, using branch statements [3]. For instance, LLVM IR provides a conditional branch statement

br i1 %cond, label %iftrue, label %iffalse

that will jump to the operations under the iftrue label if %cond is true and to the iffalse label otherwise.

This is a simple and versatile approach to control flow that can be used to express any higher-level language constructs. Unfortunately, conditional branching does not mix well with linear values.

Linearity, as defined in Definition 3.4, is a simple constraint to impose on minIR graphs. In the presence of conditional branching, however, the constraint would have to be relaxed to allow for single use in each mutually exclusive branch of control flow. For instance, the following two uses of b should be allowed (in pseudo-IR):

b := h(a)
<if cond> c := x(b)
<else>    d := h(b)

This is a much harder invariant to check on the IR: linearity would no longer be enforceable as a syntactic constraint on the minIR graph as in Definition 3.4, but would instead depend on the semantics of the operations to establish mutual exclusivity of control flow [4]. Forbidding arbitrary branching in minIR and resorting instead to structured control flow as described above is just as expressive and gives the linearity constraint a much simpler form.
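The difference can be illustrated with a minimal Python sketch of a purely syntactic linearity check, which counts the uses of each linear value. The statement encoding (triples of outputs, operation name and inputs) and the function name `linearity_violations` are assumptions made for illustration; they are not part of minIR.

```python
from collections import Counter


def linearity_violations(statements, linear_values):
    """Return the linear values that are not used exactly once, with their use counts.

    `statements` is a list of (outputs, op_name, inputs) triples.
    """
    uses = Counter(v for _, _, inputs in statements for v in inputs)
    return {v: uses[v] for v in linear_values if uses[v] != 1}


# The branching pseudo-IR: b is used once in each mutually exclusive branch.
branching = [
    (("b",), "h", ("a",)),
    (("c",), "x", ("b",)),  # <if cond>
    (("d",), "h", ("b",)),  # <else>
]

# The same computation with structured control flow: b is passed once to ifelse.
structured = [
    (("b",), "h", ("a",)),
    (("e",), "ifelse", ("cond", "b", "ifregion", "elseregion")),
]

print(linearity_violations(branching, {"a", "b"}))
print(linearity_violations(structured, {"a", "b"}))
```

The syntactic check flags `b` in the branching program even though the two uses can never both execute, whereas the structured variant passes without any reasoning about control flow.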

Type graph #

We have seen how minIR graphs impose some structure on the types of computations that can be expressed: a linear value cannot be used by two (or zero) operations, frames will always have a unique in and out operation in their child region with correct arities, etc.

However, without a “good” type system and associated semantics, it is still possible to express nonsensical programs: we have mentioned for instance earlier that it is up to the type system to enforce that the types of the in and out operations match the types of the frame. Similarly, it is possible to construct programs that break linearity: take the ifelse operation discussed in the previous section, but now replace its semantics to be do-in-parallel, i.e. it will execute both the if-block and else-block in parallel on the inputs that it is given. This would violate the linearity of its inputs, but would nonetheless be a syntactically valid minIR graph!

To resolve this we present here some typed operations, along with their semantics, that can be used to construct well-behaved type systems: programs typed in this system model the kind of quantum programs that we are interested in expressing and are guaranteed to be valid computations. Categorising all valid constructions or an exhaustive enumeration of conditions that type systems must satisfy to guarantee the validity of programs is beyond the scope of this thesis. It is in practice often straightforward to combine and extend the elements presented here to support further custom syntactic constructs and types.

Basic types and operations #

The most elementary types in our computations are Bits and Qubits. The former is typically known as a Boolean and represents the purely classical values 0 and 1. The latter is the canonical quantum example of a linear type. Indeed, just like values in minIR graphs, the type system in $\textrm{hier-lin-}\mathbb H_\textrm{type}$ distinguishes between linear and non-linear types.

Other typical classical types such as integers, floats, strings, custom algebraic data types (ADT) etc could also be introduced as required. In the figure below, we for instance introduce the Angle type to represent rotation angles that parametrise quantum gates. Further examples of linear types, on the other hand, include higher-dimensional qudits, but also any ADT that contains a linear type within it.

As we saw in section 2.1, the number of input qubits in pure quantum operations will match the number of output qubits: the single-qubit h (Hadamard gate) and two-qubit cx (controlled NOT) operations thus have one, respectively two, Qubits as both inputs and outputs. rz (Z rotation) is also a single-qubit operation, but it takes an additional input of type Angle to specify the rotation angle.

On the pure classical side, we are free to add any side-effect free operations on our types; in our example we model addition + on Angles and negation not on Bits. In the type system $\Sigma \in \textrm{hier-lin-}\mathbb H_\textrm{type}$, each type is represented by a single vertex.

In our example, we thus have three vertices:

We introduce a different colour for each type. Operations such as cx are represented by a hyperedge with two sources on Qubit and two targets on Qubit. As in the previous diagrams, we can distinguish operation inputs from outputs by whether they are attached to the dark or light half of the type vertex: the rz operation thus has one Qubit input, one Angle input and one Qubit output.

As you can tell from the diagram, whilst Qubit is a linear type in the type system, it is not a linear value in the sense of a minIR graph: the Qubit type has multiple uses and definitions in the cx operation alone. This is the key difference between $\textrm{lin-}\mathbb H_\textrm{type}$ and $\textrm{lin-}\mathbb H$.

Qubit allocation and measurement #

We also introduce non-pure quantum operations qalloc and measure which respectively “create” a qubit (so no input, one Qubit output) and “destroy” it (one Qubit input, one Bit output – depending on whether the qubit was projected onto the $\ket{0}$ or $\ket{1}$ state). Remember that the reason these operations seem to “break” the laws of pure quantum physics is that they result from interactions with the classical environment.

measure is fundamental, as it connects quantum values with classical ones!

Region definition and structured control flow #

Our type system is so far missing a crucial aspect of minIR: the hierarchical structures. For this we need frame types, i.e. frames in the type graph. We must introduce a distinct type for each possible type signature of a frame. To keep this as simple as possible, we will introduce exactly one type for each signature.

If we write $T$ for the set of types in our type system (i.e. Bit, Qubit and Angle in our example), then a type signature of an edge is given by a pair $(I, O)$ of ordered lists of types $I, O \in T^\ast$. For each such $(I, O)$ pair, we introduce

  • the frame type regiondef<I, O>,
  • the in and out types in<I,O> and out<I,O>,
  • along with a new non-linear type Region<I, O>, the higher-order type representing a region with inputs $I$ and outputs $O$.

The regiondef<I, O> op takes zero inputs and returns one output Region<I, O>, whereas in<I,O> takes zero inputs and returns values of type $I$ and out<I,O> takes inputs of type $O$ and returns nothing. For instance, for $I = ($Qubit, Qubit$)$ and $O = ($Qubit, Bit$)$, we have the following type graph.
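As a minimal sketch of this construction, the Python function below generates the three op types and the `Region` data type for a given signature. The function name `region_types` and the representation of an op type as a pair of input and output type tuples are hypothetical choices made for illustration.

```python
def region_types(inputs, outputs):
    """Build the op types introduced for the signature (I, O).

    Each op type name maps to its (input types, output types) pair.
    """
    sig = f"<({', '.join(inputs)}), ({', '.join(outputs)})>"
    region = f"Region{sig}"  # the non-linear higher-order data type
    return {
        f"regiondef{sig}": ((), (region,)),   # no inputs, one Region output
        f"in{sig}": ((), tuple(inputs)),      # no inputs, returns values of type I
        f"out{sig}": (tuple(outputs), ()),    # takes values of type O, returns nothing
    }


types = region_types(("Qubit", "Qubit"), ("Qubit", "Bit"))
for name, (ins, outs) in types.items():
    print(f"{name}: {ins} -> {outs}")
```

Generating the types from the signature, rather than listing them by hand, reflects the fact that exactly one frame type exists per signature.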

Note that there is an important distinction between $\textrm{hier-lin-}\mathbb H_\textrm{type}$ and $\textrm{hier-lin-}\mathbb H$: there is no notion of regions in the type system. The Qubit and Bit types in the above diagram would be in the child region of regiondef<I, O> if it were a graph in $\textrm{hier-}\mathbb H$, but in the type system, they might also be used by other operations in other regions (such as cx, rz, h etc. defined earlier).

Using the Region<I,O> types, it is then easy to define typed operations for any structured control flow of interest, such as the if-else example above. The following figure gives an overview of the entire type system of our example. For display purposes, we have included multiple copies of each type vertex; we remind the reader that in the actual type graph, all circles of the same type (colour) are one and the same.

A complete minIR type graph, following the example in this section. Value vertices with the same label (and same colour) form a single vertex in the type graph. They have been split into multiple vertices in this representation for better readability. The data types and op types with the <I,O> suffix are parametrised on the signature type $(I,O)$ for $I,O \in T^\ast$.

An example minIR program #

Taking a step back, let us make the introduced ideas more concrete through an example. We demonstrate how a simple program written in textual form can be translated and expressed as a minIR graph. All statements are of the form

x, y, ... := op(a, b, c, ...)

where a, b, c etc. are the SSA values passed to op (or used by op), and x, y etc. are the SSA values returned by op (or defined by op). We use curly brackets to define the child region of a regiondef operation. A valid minIR program might then look as follows:

main := regiondef<(Qubit, Qubit), (Qubit, Bit)> {
    q0, q1 := in()

    q0_1 := h(q0)
    q0_2, q1_1 := cx(q0_1, q1)

    m0 := measure(q0_2)

    ifregion := regiondef<(Qubit,), (Qubit,)> {
        q1 := in()
        out(q1)
    }
    elseregion := regiondef<(Qubit,), (Qubit,)> {
        q1 := in()
        q1_1 := h(q1)
        out(q1_1)
    }
    q1_2 := ifelse(m0, q1_1, ifregion, elseregion)

    out(q1_2, m0)
}

Note that the in() and out(..) operations are only allowed within nested regions (as required by the type system). We have omitted the type parameters on these operations, as they mirror exactly the parameters of the enclosing regiondef.

It corresponds to the two minIR graphs on the following page. We use “wiggly hyperedges” that stretch between values, as in the first figure. They may look unusual if you are used to computation graphs. One can opt to draw the same graph with boxes for hyperedges and wires for values, yielding the second figure. The two representations are equivalent, but the rewriting semantics are most explicit when viewing values as vertices.

An example of an IO-free minIR graph. The vertex colours indicate their types in the type system presented in the previous figure. The main, ifregion and elseregion ops are all of op type regiondef (with type parameters omitted), labelled here with custom names for clarity. The type parameters of the ifelse, in and out op type have similarly been omitted. All other operation types are given as labels on the edges.


An equivalent representation of the computation above, now representing operations as boxes and values as wires. The arrow direction indicates the flow from value definition to value use(s). Dashed arrows have been changed to point to regions instead of individual operations.


Differences to the quantum circuit model #

We conclude this presentation of minIR by highlighting the differences between this IR-based representation and the quantum circuit model that most quantum computing and quantum information scientists are familiar with [5].

When restricted to purely quantum operations and no nested regions, the string diagram representation of a minIR graph (i.e. operations as boxes and values as wires) looks very similar to a quantum circuit. There is, however, a fundamental shift under the hood from reference to value semantics – to borrow terminology from C++.

In the reference semantics of quantum circuits, operations are typically thought of as “placed” on a qubit (the “lines” in the circuit representation), for instance, by referring to a qubit index. This qubit reference exists for the entire computation duration, and the quantum data it refers to will change over time as operations are applied to that qubit.

In the value semantics of computation graphs and SSA, on the other hand, qubits only exist in the form of the data they encode. When applying an operation, the (quantum) data is consumed by the operation and new data is returned. Given that the input data no longer exists, linearity conditions are required to ensure that no other operation can be fed the same value.

To make the difference clear, compare the program representations of the following computation:

Quantum circuit (pytket) [6]

import pytket as tk
circ = tk.Circuit(2)
circ.H(0)
circ.CX(0, 1)
circ.X(1)

SSA (minIR)

q0_0, q1_0 := in()
q0_1       := h(q0_0)
q0_2, q1_1 := cx(q0_1, q1_0)
q1_2       := x(q1_1)
out(q0_2, q1_2)

In value semantics, it becomes much harder to track physical qubits across their lifespan. This has very practical implications: without the convenient naming scheme, it would, for example, be non-trivial to count how many qubits are required in the SSA representation of the computation above. However, it is a drastically simpler picture from the point of view of the compiler and the optimiser – hence its popularity in classical compilers. When operations are defined based on qubit references, the compiler must carefully track the ordering of these operations: operations on the same qubit must always be applied in order. Through multi-qubit gates, this also imposes a partial ordering on operations across different qubits that must be respected.

SSA values remove this dependency tracking altogether: the notion of physical qubit disappears, and the ordering of statements becomes irrelevant. All that matters is connecting each use of a value (i.e. an input to an operation) with its unique definition, the output of a previous operation. In other words, the global ordering imposed by reference semantics is replaced by a causal order on the diagram (Kissinger and Uijlen, 2019. A categorical semantics for causal structure. Logical Methods in Computer Science Volume 15, Issue 3, August 2019. doi: 10.23638/lmcs-15(3:15)2019).
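The translation from reference to value semantics can be sketched in a few lines of Python. The sketch below is an illustration under assumptions: circuits are given as a plain list of (gate name, qubit indices) pairs rather than a pytket object, and the `q<index>_<version>` naming scheme simply mirrors the one used in the example above.

```python
def circuit_to_ssa(n_qubits, gates):
    """Translate a list of (gate_name, qubit_indices) pairs into SSA statements."""
    version = [0] * n_qubits
    current = [f"q{i}_0" for i in range(n_qubits)]  # latest value on each qubit
    lines = [f"{', '.join(current)} := in()"]
    for name, qubits in gates:
        # The operation consumes the latest value on each qubit it acts on ...
        args = [current[q] for q in qubits]
        # ... and defines a fresh value in its place.
        for q in qubits:
            version[q] += 1
            current[q] = f"q{q}_{version[q]}"
        outs = [current[q] for q in qubits]
        lines.append(f"{', '.join(outs)} := {name}({', '.join(args)})")
    lines.append(f"out({', '.join(current)})")
    return lines


ssa = circuit_to_ssa(2, [("h", [0]), ("cx", [0, 1]), ("x", [1])])
print("\n".join(ssa))
```

Note that the qubit indices survive only in the value names: once the names are replaced by arbitrary identifiers, recovering the number of physical qubits requires following the chain of definitions and uses.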


All the concepts of minIR embed themselves very easily within the MLIR-based quantum IRs and the HUGR IR (Koch et al., 2025. HUGR: A Quantum-Classical Intermediate Representation. Talk recording retrieved from https://www.youtube.com/live/D8esZrt7ogk?feature=shared&t=5217). In this sense, our toy IR serves as the lowest common denominator across IRs and compiler technologies, so that the proposals and contributions we are about to make can be applied regardless of the underlying technical details.

By waving goodbye to the circuit model, we have been able to integrate much of the theory of traditional compiler design, bringing us in the process much closer to traditional compiler research and the large-scale software infrastructure that is already available. This gives us access to all the classical optimisation and program transformation techniques developed over decades. Using structured control flow, we were also able to model linear resources such as qubits well – by using value semantics and SSA, checking that no-cloning is not violated is as simple as checking that each linear value is used exactly once.

Finally, this new design is also extremely extensible. Not only does it support arbitrary operations, but the type system is also very flexible. There is dedicated support for linear types, but this does not have to be restricted to qubits: lists of qubits could be added or even, depending on the target hardware, higher dimensional qudits, continuous variable quantum data, etc.


  1. And in fact, we will see in section 3.4 that it is not. ↩︎

  2. Note that a region may not be a connected subgraph. It is, however, a simple exercise to convince yourself that any non-root region contains either one or two connected components. ↩︎

  3. You may know this from prehistoric times as the goto statement, in languages such as Fortran, C, and, yes, even Go. ↩︎

  4. You might be thinking “oh, but all that is required here are phi nodes!”, if you are familiar with those. No – you’d also need a sort of “phi inverse”. Besides, see this discussion for more arguments on why no phi nodes. ↩︎

  5. Note that these comments apply specifically to characteristics of quantum circuits. Other diagrammatic representations of quantum processes in use, such as string diagrams, quantum combs etc may not share the same properties. ↩︎

  6. This is Python code: pip install pytket ↩︎

3.4. Graph transformation in minIR

As discussed in section 3.2, computation graphs with linear values, such as minIR, must adopt strict graph transformation semantics to ensure that linear constraints are satisfied at all times. In this section, we use the minIR graph category presented in the previous section to define transformation semantics that lean on the double pushout (DPO) (Ehrig and Kreowski, 1976. Parallelism of manipulations in multidimensional information structures) and sesqui-pushout (SqPO) (Corradini, Heindel, Hermann and König, 2006. Sesqui-Pushout Rewriting. In Graph Transformations, Berlin, Heidelberg. Springer Berlin Heidelberg, 30--45. doi: 10.1007/11841883_4) semantics in adhesive categories (Lack and Sobocinski, 2005. Adhesive and quasiadhesive categories. RAIRO - Theoretical Informatics and Applications 39, 3, July 2005, 511--545. doi: 10.1051/ITA:2005028).

Adhesivity of hypergraph categories #

The natural place to start this section is by studying which of the categories defined in section 3.3 are adhesive. It follows from adhesivity that transforming graphs using DPO and SqPO constructions is well-defined and unique, at least in the regimes of interest to us.

A category is said to be adhesive if it has all pullbacks and pushouts along monos, as well as some compatibility conditions between them, the so-called “Van Kampen squares”. We refer to the literature (e.g. Lack and Sobocinski, 2005) for a complete definition. For our purposes, the following two results are sufficient:

  • Every presheaf topos $[\mathbb C, \mathrm{Set}]$ is adhesive (Corollary 3.6 in Lack and Sobocinski, 2005);
  • Every full subcategory $\mathbb D \subseteq \mathbb C$ of an adhesive category is adhesive if the pullbacks and pushouts in $\mathbb C$ of objects in $\mathbb D$ are again in $\mathbb D$ (a simple result; if the Van Kampen squares commute in $\mathbb C$, they must commute in $\mathbb D$).

A first proposition follows immediately from the first of these two results:

Proposition 3.1. Adhesivity of directed hypergraphs
The category $\mathbb H$ of directed hypergraphs is adhesive.

$\mathbb H$ is a presheaf category, and hence adhesive.

This does not immediately generalise to $\textrm{lin-}\mathbb H$, as unlike $\mathbb H$, Definition 3.2 imposes that $E$ be a coproduct. However, the result still holds:

Proposition 3.2. Adhesivity of hypergraphs with linearity constraints
The categories $\textrm{lin-}\mathbb H$ and $\textrm{lin-}\mathbb H_\textrm{type}$ are adhesive.

$\textrm{lin-}\mathbb H$ is a full subcategory of the adhesive category $[\textrm{lin-}\mathbb C, \mathrm{Set}]$. We must show the existence of pullbacks and pushouts along monos in $\textrm{lin-}\mathbb H$.

Pullbacks. Consider a pullback $A \xleftarrow{p_a} P \xrightarrow{p_b} B$ of $A \xrightarrow{a} C \xleftarrow{b} B$ in $[\textrm{lin-}\mathbb C, \mathrm{Set}]$, with $A, B, C \in \textrm{lin-}\mathbb H$. We must show that $P$ is in $\textrm{lin-}\mathbb H$. Limits are computed pointwise in presheaf categories, so we know that $P(E)$ is the pullback of $A(E) \to C(E) \leftarrow B(E)$ in $\textrm{Set}$. If we can show that $P(E)$ is the coproduct of $P(E_{st})$ for $s, t \in \mathbb{N}$, then we are done.

Let $v \in P(E)$. Because $A(E)$ and $B(E)$ are coproducts in Set, i.e. disjoint unions, there must be $s, t, s', t' \in \mathbb{N}$ such that $p_a(v) \in A(E_{st})$ and $p_b(v) \in B(E_{s't'})$. By naturality of $a$ and $b$, it follows that $a(p_a(v)) \in C(E_{st})$ and $b(p_b(v)) \in C(E_{s't'})$. But by commutativity of the pullback diagram, $a(p_a(v)) = b(p_b(v))$, and thus $s = s'$ and $t = t'$. We conclude by unicity of the pullback that $v \in P(E_{st})$ and thus $P(E) = \bigsqcup_{st} P(E_{st})$.
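The argument can be checked on a small finite example. The following Python sketch computes a pullback in Set of hyperedge sets whose elements are tagged with their arity; the tagging of elements as `((s, t), name)` pairs, the element names and the function `pullback` are hypothetical choices made for illustration, with the maps `a` and `b` chosen to preserve the arity tag as naturality requires.

```python
from collections import Counter


def pullback(A, B, a, b):
    """Pullback in Set of a: A -> C and b: B -> C, as the set of compatible pairs."""
    return {(x, y) for x in A for y in B if a[x] == b[y]}


# Hyperedges tagged with their arity (s, t).
hA, cxA = ((1, 1), "h_A"), ((2, 2), "cx_A")
hB, xB, cxB = ((1, 1), "h_B"), ((1, 1), "x_B"), ((2, 2), "cx_B")
u, w = ((1, 1), "u"), ((2, 2), "w")

a = {hA: u, cxA: w}          # arity-preserving map A(E) -> C(E)
b = {hB: u, xB: u, cxB: w}   # arity-preserving map B(E) -> C(E)

P = pullback(a.keys(), b.keys(), a, b)

# Every element of the pullback pairs edges of the same arity, so P splits
# into one component per arity.
components = Counter(x[0] for x, y in P if x[0] == y[0])
print(len(P), dict(components))
```

Every pair in `P` contributes to exactly one arity component, which is the finite counterpart of $P(E) = \bigsqcup_{st} P(E_{st})$.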

Pushouts. The same argument as for pullbacks also applies to pushouts: given a pushout $P$ of $A \xleftarrow{a} C \xrightarrow{b} B$ in $[\textrm{lin-}\mathbb C, \mathrm{Set}]$ with $A, B, C \in \textrm{lin-}\mathbb H$, an element $v \in P(E)$ that makes the pushout square commute must have preimages in $A(E_{st})$, $B(E_{st})$ and $C(E_{st})$ for some $s, t \in \mathbb{N}$. Thus the pushout distributes over the coproduct, and we can conclude that $P(E)$ is the coproduct of pushouts.

The same argument also applies to $\textrm{lin-}\mathbb H_\textrm{type}$ [1].

Now to the spicy stuff:

Proposition 3.3. Non-adhesivity of hierarchical hypergraphs
Whilst $\textrm{hier-lin-}\mathbb H_\textrm{type}$ is adhesive, the category $\textrm{hier-lin-}\mathbb H$ is NOT adhesive.

$\textrm{hier-lin-}\mathbb H_\textrm{type}$ is a presheaf category – hence adhesive.

The following pushout square shows that $\textrm{hier-lin-}\mathbb H$ cannot be adhesive: the pushout square is valid in $[\textrm{hier-lin-}\mathbb C, \mathrm{Set}]$, but the pushout at the bottom right is not in $\textrm{hier-lin-}\mathbb H$, because the child regions cannot each be assigned a unique parent.

Double pushout semantics #

From Proposition 3.3, it follows that minIR graph transformations can be performed through the double pushout (DPO) construction (Ehrig and Kreowski, 1976) in the $[\textrm{hier-lin-}\mathbb C, \mathrm{Set}]$ category.

Definition 3.6. Double pushout (DPO) transformation

A transformation rule $p$ in an adhesive category $\mathbb A$ is a span $L \leftarrow I \rightarrow R$. For objects $G, H \in \mathbb A$, we then write $G \xRightarrow{(p,m)} H$ or $G \xRightarrow{p} H$ if there is a matching morphism $m: L \to G$ and a context object $C$ along with morphisms $G \leftarrow C \to H$ and $I \to C$ such that the following diagram commutes and both squares are pushouts:

If the DPO transformation $G \xRightarrow{(p,m)} H$ exists for some rule $p$ and match $m$, then we say $G \Rightarrow H$ is a valid DPO rewrite.
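For reference, the usual shape of a DPO diagram, using the object names of Definition 3.6, can be typeset as below. This is a sketch of the standard layout rather than a reproduction of the original figure; the vertical arrows other than $m$ are left unnamed, as in the definition.

```latex
\[
\begin{array}{ccccc}
L & \longleftarrow & I & \longrightarrow & R \\
{\scriptstyle m}\big\downarrow & & \big\downarrow & & \big\downarrow \\
G & \longleftarrow & C & \longrightarrow & H
\end{array}
\]
```

The top row is the rule $p$, the bottom row is the span through the context object $C$, and each of the two squares is required to be a pushout.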

To ensure that a DPO rewrite is valid in minIR, we must impose certain conditions. Let $G$ be an IO-free minIR graph, i.e. $G \in \textrm{hier-lin-}\mathbb H$, there is a morphism $G \to \Sigma$ in $\textrm{hier-lin-}\mathbb H_\textrm{type}$ for some type system $\Sigma$ and $I = O = \varnothing$.

A DPO rewrite $G \Rightarrow H$ is a valid minIR DPO rewrite if there is a transformation $G \xRightarrow{p} H$ in $[\textrm{hier-lin-}\mathbb C, \mathrm{Set}]$ and

  1. $p$ is left-mono, i.e. the morphism $I \to L$ is mono [2],
  2. the pushout complement $C$ and pushout $H$ also exist in the slice category $\textrm{hier-lin-}\mathbb H_\textrm{type} \searrow \Sigma$,
  3. $H$ satisfies the hierarchy condition of Definition 3.3,
  4. $H$ is IO-free.

Proposition 3.4
If $G$ is a minIR graph and $G \Rightarrow H$ is a valid minIR DPO rewrite, then $H$ is a valid minIR graph.

We know by construction that $H \in [\textrm{hier-lin-}\mathbb C, \mathrm{Set}]$. We must show that $H$ further satisfies the constraints to be an object in the full subcategory of minIR graphs.

The first condition is standard in DPO and guarantees that $C$ and $H$ are unique if they exist.

The third condition we impose on $H$ corresponds directly to the constraint that defines hierarchical graphs in $\textrm{hier-lin-}\mathbb H$. The fourth condition ensures that valid minIR DPO rewrites map IO-free graphs to IO-free graphs.

Finally, the second condition is imposed to ensure well-typedness of $H$. The functor $\textrm{hier-lin-}\mathbb H \to \textrm{hier-lin-}\mathbb H_\textrm{type}$ that forgets the $\textit{def}$ and $\textit{use}$ morphisms is a left adjoint (it possesses a right Kan extension defined pointwise), and thus preserves colimits. The images of $C$ and $H$ thus form pushout squares in $\textrm{hier-lin-}\mathbb H_\textrm{type}$, and by unicity, must match the pushout squares in $\textrm{hier-lin-}\mathbb H_\textrm{type} \searrow \Sigma$. Hence $H$ is well-typed.

The restriction to rewrites of IO-free graphs is not a restriction of generality: if we are interested in rewriting computations with inputs and outputs, we can always express them as IO-free graphs by adding input and output ops with the values in $I$ as outputs, respectively $O$ as inputs. We assign them dedicated types distinct from all other operations; these operations will never be matched by transformation rules and can be removed at the end of rewriting.

Generalising to sesqui-pushouts #

We restricted minIR rewrites to DPO transformations obtained from left-mono rules to ensure that the construction is unique. This excludes rules that may identify two values in $G$ but split them into two different values in $H$. Such rules allow for cloning values, which is a useful transformation in minIR for non-linear values. An example of a transformation rule that we would like to allow in minIR:

For this example we added a 2x operation that multiplies an angle value passed as input by two. The transformation rule replaces a rotation of angle $2\alpha$ by two rotations of angle $\alpha$ by cloning the input angle.

Such semantics are possible using the sesqui-pushout construction (SqPO) by Corradini et al. (2006). We can reuse the same $\xRightarrow{(p,m)}$ notation: when DPO is restricted to left-mono rules as we have done, SqPO is a generalisation of DPO (i.e. the construction coincides whenever the DPO exists).

Definition 3.7. Sesqui-pushout (SqPO) transformation

A transformation rule $p$ in an adhesive category $\mathbb A$ is a span $L \leftarrow I \rightarrow R$. For objects $G, H \in \mathbb A$, we then write $G \xRightarrow{(p,m)} H$ or $G \xRightarrow{p} H$ if there is a matching morphism $m: L \to G$ and a context object $C$ along with morphisms $G \leftarrow C \to H$ and $I \to C$ such that $C$ is the final pullback complement of $I \to L \xrightarrow{m} G$ and the right square is a pushout:

If the SqPO transformation $G \xRightarrow{(p,m)} H$ exists for some rule $p$ and match $m$, then we say $G \Rightarrow H$ is a valid (SqPO) rewrite.

The left square is redundant in the diagram above, as it follows from the requirement that CC be the final pullback complement (FPC). It is kept to highlight the similarities to DPO. As the commuting diagram indicates, the final pullback complement (FPC) construction forms a pullback square. Furthermore, unlike pushout complements, the FPC is defined by a universality property that ensures uniqueness if it exists. We refer to Corrad., 2006Andrea Corradini, Tobias Heindel, Frank Hermann and Barbara König. 2006. Sesqui-Pushout Rewriting. In Graph Transformations, Berlin, Heidelberg. Springer Berlin Heidelberg, 30--45. doi: 10.1007/11841883_4 for the exact FPC construction.

With SqPO, we can define the set of valid minIR rewrites as given by the SqPO transformations $G \xRightarrow{p} H$ in $[\textrm{hier-lin-}\mathbb C, \mathrm{Set}]$ satisfying the relaxed set of conditions

  1. the final pullback complement $C$ and pushout $H$ also exist in the slice category $\textrm{hier-lin-}\mathbb H_\textrm{type} \searrow \Sigma$,
  2. $H$ satisfies the hierarchy condition of Definition 3.3,
  3. $H$ is IO-free.

We conclude this section with a discussion of some of the properties of minIR transformations using SqPO. For a more detailed explanation of the concepts discussed, we refer again to [Corradini et al., 2006] or to König et al. [Barbara König, Dennis Nolte, Julia Padberg and Arend Rensink. 2018. A Tutorial on Graph Transformation. In Graph Transformation, Specifications, and Nets - In Memory of Hartmut Ehrig. Springer, 83--104. doi: 10.1007/978-3-319-75396-6_5].

Deletion in unknown context.  A key difference between DPO and SqPO transformations is that SqPO transformations on graphs will delete edges attached to a vertex $v_d$ that is deleted by the transformation rule (i.e. $v_d \in L$ but $v_d \notin R$). The DPO transformation, on the other hand, is only well-defined when all edges incident to $v_d$ are in the image of $m$ and thus explicitly deleted (this is known as the dangling condition).
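The difference between the two deletion semantics can be illustrated on plain directed graphs. The following is a minimal sketch, not the categorical construction itself: the function names, the representation of graphs as Python sets and the `matched_edges` argument (the edges in the image of the match that the rule deletes explicitly) are assumptions made for this illustration.

```python
def incident_edges(edges, deleted):
    """Return the edges with at least one endpoint in `deleted`."""
    return {(u, v) for (u, v) in edges if u in deleted or v in deleted}


def delete_dpo(vertices, edges, deleted, matched_edges):
    """DPO-like deletion: fails if an unmatched edge would be left dangling."""
    dangling = incident_edges(edges, deleted) - matched_edges
    if dangling:
        raise ValueError(f"dangling condition violated by {sorted(dangling)}")
    return vertices - deleted, edges - matched_edges


def delete_sqpo(vertices, edges, deleted, matched_edges):
    """SqPO-like deletion: edges incident to deleted vertices are removed too."""
    remaining = edges - matched_edges - incident_edges(edges, deleted)
    return vertices - deleted, remaining


# Deleting "b" while only the edge ("a", "b") is part of the match.
vertices = {"a", "b", "c"}
edges = {("a", "b"), ("b", "c")}
print(delete_sqpo(vertices, edges, {"b"}, {("a", "b")}))
```

In this sketch, `delete_dpo` rejects the same input because the edge `("b", "c")` lies outside the match, whereas `delete_sqpo` removes it implicitly.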

As minIR rewrites follow SqPO semantics, transformation rules such as the following are allowed:

Here $\times$ denotes the multiplication of angles and $\textsf{const(0)}$ the zero angle. Any operation that would be connected to the starred value on the left would be deleted by this rule. However, such an implicit operation deletion only yields valid minIR graphs if all incident values are non-linear and none of the target values of the deleted operation are used.

Non-left-mono rules.  As discussed in the introduction to SqPO, the cloning of values is allowed in minIR rewrites. However, linear values may never be cloned (the FPC or pushout will not exist in these cases). Thus any minIR transformation rule will be left-mono on linear values. It must further be left-mono on all (linear and non-linear) values in $I$ that are mapped to outputs in $R$: if a value $w$ is produced by an operation op applied to $v$, then cloning $v$ and op will result in two definitions of $w$.

Non-right-mono rules.  Non-right-mono rules are allowed in both DPO and SqPO. They result in vertex merges. In minIR, the situation for right-mono is symmetric to left-mono: the map must be mono on linear values (otherwise the same value will have multiple uses or definitions) and it must be mono on all values in $I$ that are mapped to inputs in $L$ (otherwise a value in the rewritten minIR graph will have more than one value definition).


  1. In fact, a much simpler argument applies: the category $\textrm{lin-}\mathbb H_\textrm{type}$ is isomorphic to the presheaf category $[\textrm{lin-}\tilde{\mathbb C}_\textrm{type}, \mathrm{Set}]$, where $\textrm{lin-}\tilde{\mathbb C}_\textrm{type}$ is obtained from $\textrm{lin-}\mathbb C_\textrm{type}$ by removing the object $E$. Adhesivity follows. ↩︎

  2. This is often called left-linear in the literature. We avoid this term in this thesis to avoid confusion with the linearity property of values in minIR. ↩︎

3.5. MinIR rewriting, operationally

The previous section proposed to view minIR rewrites as the result of a (DPO or SqPO) graph transformation. This yields valid rewriting semantics elegantly (and with little effort!). However, the conditions that must be imposed for a transformation to be valid, along with the fact that pushouts may not exist, mean that the existence of a rewrite for a given transformation rule and match is not guaranteed.

In this section, we address this by considering a more restricted notion of minIR rewriting, for which the existence of the right-hand side of the rewrite is guaranteed. In addition, in place of the categorical presentation of the last section, we express the rewriting operation operationally, i.e. as data and a procedure on sets that translates directly into an algorithmic implementation.

We find that this rewrite definition is sufficient in practice. We conclude the section with an example of how more complex rewrites can be achieved by composition of simpler rewrites that can be expressed in this framework.

Graph glueings and rewrites #

Throughout, we consider graph glueings on disjoint vertex and (hyper)edge sets. To underline this, we will use the $\sqcup$ symbol to denote disjoint set unions.

As we will be working exclusively with vertex and edge sets in this section (as opposed to the objects in the indexing category), we will drop the bold typeface for sets, writing e.g. $V$ instead of $\mathbf V$ for the set of vertices of a hypergraph.

Finally, all minIR graphs in this section are IO-free.

We define local graph rewrites in terms of graph glueings. Consider first the case of two arbitrary graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$, along with a relation $\mu \subseteq V_1 \times V_2$. Let $\sim_\mu\ \subseteq (V_1 \sqcup V_2)^2$ be the equivalence relation induced by $\mu$, i.e. the smallest relation on $V_1 \sqcup V_2$ that is reflexive, symmetric and transitive, and satisfies for all $v_1 \in V_1$ and $v_2 \in V_2$,

$$(v_1, v_2) \in \mu \Rightarrow v_1 \sim_\mu v_2.$$

Then, we can define

  • $V = (V_1 \sqcup V_2)/\sim_\mu$, the set of all equivalence classes of $\sim_\mu$, and
  • for $v \in V_1 \sqcup V_2$, $\alpha_\mu(v) \in V$, the equivalence class of $\sim_\mu$ that $v$ belongs to.

Definition 3.8 (Graph glueing)

The glueing of $G_1$ and $G_2$ according to the glueing relation $\mu$ is given by the vertices $V = (V_1 \sqcup V_2)/\sim_\mu$ and the edges

$$E = \{(\alpha_\mu(u), \alpha_\mu(v)) \mid (u,v) \in E_1 \sqcup E_2 \} \subseteq V^2.$$

We write the glueing graph as $(G_1 \sqcup G_2) / \sim_\mu$.

In other words, the glueing is the disjoint union of the two graphs, with identification (and merging) of vertices that are related in $\mu$.
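As an illustration of Definition 3.8, the glueing of two simple directed graphs can be computed with a union-find structure. This is a sketch under assumptions: the function name `glue`, the tagging of vertices with `1` or `2` to keep the union disjoint, and the use of frozensets to represent equivalence classes are choices made here for the example.

```python
def glue(v1, e1, v2, e2, mu):
    """Glue the graphs (v1, e1) and (v2, e2) along the relation mu ⊆ v1 × v2.

    Returns the glued vertices, the glued edges and the map alpha sending a
    tagged vertex (1, v) or (2, v) to its equivalence class.
    """
    # Tag vertices with their graph of origin to obtain a disjoint union.
    parent = {(1, v): (1, v) for v in v1}
    parent.update({(2, v): (2, v) for v in v2})

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the classes of all pairs related by mu.
    for a, b in mu:
        root_a, root_b = find((1, a)), find((2, b))
        if root_a != root_b:
            parent[root_a] = root_b

    # Collect the equivalence classes and the class map alpha.
    classes = {}
    for x in parent:
        classes.setdefault(find(x), set()).add(x)
    alpha = {x: frozenset(classes[find(x)]) for x in parent}

    vertices = set(alpha.values())
    edges = {(alpha[(1, u)], alpha[(1, v)]) for (u, v) in e1}
    edges |= {(alpha[(2, u)], alpha[(2, v)]) for (u, v) in e2}
    return vertices, edges, alpha


# Glue the edge a -> b to the edge x -> y by identifying b with x.
vertices, edges, alpha = glue({"a", "b"}, {("a", "b")}, {"x", "y"}, {("x", "y")}, {("b", "x")})
print(len(vertices), len(edges))
```

Because the closure under reflexivity, symmetry and transitivity is exactly what union-find maintains, the relation `mu` does not need to be a function in this sketch.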

This allows us to define a rewrite on a graph GG:

Definition 3.9 (Graph rewrite)

A rewrite $r$ on a graph $G = (V, E)$ is given by a tuple $r = (G_R, V^-, E^-, \mu)$, where

  • $G_R = (V_R, E_R)$ is a graph called the replacement graph,
  • $V^- \subseteq V$ is the vertex deletion set,
  • $E^- \subseteq E \cap dom(\mu)^2$ is the edge deletion set, and
  • $\mu: V^- \rightharpoonup V_R$ is the glueing relation, a partial function that maps a subset of the deleted vertices of $G$ to vertices in the replacement graph.

The domain of definition $dom(\mu)$ is known as the boundary values of $r$.

A graph rewrite per this definition can always be generated by a single pushout (SPO) transformation [Michael Löwe. 1991. Extended algebraic graph transformation. PhD Thesis. Technical University of Berlin]:

  • define $L$ as the graph $(V^-, E^-)$. Then the injection $L \subseteq G$ is the match morphism $L \to G$;
  • the partial map $\mu$ maps a subset of $V^-$ to vertices in the replacement $R = G_R$. By injectivity of the match morphism, it also defines a partial map $L \rightharpoonup R$.

We opted for SPO-like semantics in this definition, as they are the simplest to write in set-theoretic terms and coincide with DPO and SqPO in our restricted domain of interest.

The result of the rewrite is computed by glueing the right-hand side $G_R$ to the context subgraph $G_C = (V_C, E_C)$ of $G$ given by

$$\begin{aligned}V_C &= (V \smallsetminus V^-) \ \cup\ dom(\mu)\\E_C &= (E \smallsetminus E^-)\ \cap\ V_C^2.\end{aligned}$$

The partial function $\mu$ is a special case of a glueing relation $\mu \subseteq V_C \times V_R$, and thus defines a glueing of $G_C$ with $G_R$. The rewritten graph resulting from applying $r$ to $G$ is

$$r(G) = (G_C \sqcup G_R) / \sim_\mu. \tag{9}$$
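The construction of the rewritten graph translates directly into a procedure on sets. The sketch below assumes simple directed graphs and represents the partial function $\mu$ as a Python dictionary; the function name `apply_rewrite` and the tags `"G"` and `"R"` used to keep vertex sets disjoint are hypothetical. Since $\mu$ is a partial function, every equivalence class of the glueing contains at most one replacement vertex, which is used here as the class representative.

```python
def apply_rewrite(V, E, V_R, E_R, V_minus, E_minus, mu):
    """Apply the rewrite (G_R, V_minus, E_minus, mu) to the graph (V, E).

    `mu` is a dict from boundary vertices (a subset of V_minus) to V_R.
    """
    boundary = set(mu)
    assert boundary <= V_minus <= V and set(mu.values()) <= V_R
    assert E_minus <= E
    assert all(u in boundary and v in boundary for (u, v) in E_minus)

    # Context subgraph: non-deleted vertices plus the boundary.
    V_C = (V - V_minus) | boundary
    E_C = {(u, v) for (u, v) in E - E_minus if u in V_C and v in V_C}

    def alpha(v):
        # Boundary vertices are identified with their image under mu.
        return ("R", mu[v]) if v in mu else ("G", v)

    new_V = {alpha(v) for v in V_C} | {("R", w) for w in V_R}
    new_E = {(alpha(u), alpha(v)) for (u, v) in E_C}
    new_E |= {(("R", u), ("R", v)) for (u, v) in E_R}
    return new_V, new_E


# Vertex g is deleted; b and c form the boundary and are glued to x and y.
V = {"a", "b", "c", "g"}
E = {("a", "b"), ("b", "c"), ("a", "g"), ("b", "g")}
new_V, new_E = apply_rewrite(
    V, E,
    V_R={"x", "y", "z"}, E_R={("x", "z"), ("z", "y")},
    V_minus={"b", "c", "g"}, E_minus={("b", "c")},
    mu={"b": "x", "c": "y"},
)
print(sorted(new_V), sorted(new_E))
```

In this example the edges incident to the non-boundary deleted vertex `g` disappear, while the edge from `a` to the boundary vertex `b` is kept and redirected to `x`.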

An example of a graph rewrite is given in the next figure. This is equivalent to an SPO transformation with the graph induced by $V^-$ on the left-hand side, the graph $G_R$ on the right-hand side and the partial map $L \rightharpoonup R$ given by $\mu$.

Application of a graph rewrite. On the left, the original graph $G$ along with the replacement graph $G_R$ (grey box). On the right, the rewritten graph $r(G)$. Only the vertex $g$ has been deleted, as other vertices in $V^-$ are in the boundary $dom(\mu)$ (in orange). The (singleton) edge deletion set is red. The blue edge connects a vertex of $V \smallsetminus V^-$ to a boundary vertex, and is thus also present on the right-hand side. The purple edge, on the other hand, connects a vertex of $V \smallsetminus V^-$ to a non-boundary vertex of $V^-$, and is thus deleted.

When there are no edges between $V \smallsetminus V^-$ and $V^- \smallsetminus dom(\mu)$ (purple in the example above), this definition corresponds to graph rewrites that can be produced using DPO transformations (see discussion in section 3.4). Otherwise, such edges are deleted.

The notions of graph glueing and graph rewrite can straightforwardly be lifted to hypergraphs and, by extension, to minIR graphs. Notice that in this case, values are glued together, not operations (the former were defined as the graph’s vertices, the latter as its hyperedges).

However, the glueing of two valid minIR graphs – and the result of applying a valid rewrite – may not be a valid minIR graph. Glueing two values of a linear type, for instance, is a sure way to introduce multiple uses (or definitions) of it. Thus, we must be careful to only consider glueings and rewrites of minIR graphs that preserve all the constraints we have imposed in Definition 3.4.

Ensuring rewrite validity: interfaces #

As a sufficient condition for valid minIR rewrites, we introduce minIR interfaces, a concept closely related to the “hypergraph with interfaces” construction of Bonchi et al. [Filippo Bonchi, Fabio Gadducci, Aleks Kissinger, Paweł Sobociński and Fabio Zanasi. 2017. Confluence of Graph Rewriting with Interfaces] or the supermaps of quantum causality [James Hefford and Matt Wilson. 2024. A Profunctorial Semantics for Quantum Supermaps. In Proceedings of the 39th Annual ACM/IEEE Symposium on Logic in Computer Science, July 2024. ACM, 1--15. doi: 10.1145/3661814.3662123]. We eschew the presentation of holes as a slice category in favour of a definition that fits naturally within minIR and is sufficient for our purposes.

Let $G$ be a $\Sigma$-typed minIR graph with data types $T$ and linear types $T_L \subseteq T$. Consider type strings $S, S' \in T^\ast$. We define the index sets

$$\begin{aligned}\mathrm{Idx}(S) &= \{i \in \mathbb{N} \mid 1 \leq i \leq |S|\}\\\mathrm{Idx}_L(S) &= \{i \in \mathrm{Idx}(S) \mid S_i \in T_L\} \subseteq \mathrm{Idx}(S)\end{aligned}$$

corresponding respectively to the set of all indices into $S$ and the subset of indices of linear types. For any $i \in \mathrm{Idx}(S)$, we denote by $S_i$ the type at position $i$ in $S$.

We define a partial order $\preccurlyeq$¹ on $T^\ast$: we write $S \preccurlyeq S'$, and say that $S'$ can be coerced into $S$, if there exists an index map $\rho: \mathrm{Idx}(S) \to \mathrm{Idx}(S')$ such that

  • types are preserved: $S_i = S'_{\rho(i)}$, and
  • $\rho$ is well-defined and bijective on the restriction to indices of linear types $\left.\rho\right|_{\mathrm{Idx}_L(S)}: \mathrm{Idx}_L(S) \to \mathrm{Idx}_L(S').$
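Whether such an index map exists can be decided by counting types: every linear type must occur equally often in both strings, and every non-linear type of $S$ must occur at least once in $S'$. The following sketch constructs one possible $\rho$ under these conditions; the function name `find_index_map`, the representation of type strings as Python lists and the type names in the example are assumptions for illustration.

```python
from collections import Counter


def find_index_map(S, S_prime, linear):
    """Return an index map rho (1-based dict) witnessing S ≼ S', or None.

    `linear` is the set of linear types T_L.
    """
    # Linear types must match up bijectively between S and S'.
    if Counter(t for t in S if t in linear) != Counter(t for t in S_prime if t in linear):
        return None

    # Positions of each type in S'.
    positions = {}
    for j, t in enumerate(S_prime, start=1):
        positions.setdefault(t, []).append(j)

    rho, used = {}, Counter()
    for i, t in enumerate(S, start=1):
        if t not in positions:
            return None  # no index of S' carries the required type
        if t in linear:
            # Assign each linear index a fresh position: bijective on linear types.
            rho[i] = positions[t][used[t]]
            used[t] += 1
        else:
            # Non-linear values may be copied, so positions can be reused.
            rho[i] = positions[t][0]
    return rho


linear = {"qubit"}
print(find_index_map(["qubit", "angle", "angle"], ["angle", "qubit"], linear))
print(find_index_map(["qubit"], ["qubit", "qubit"], linear))
```

In the first call the single angle of $S'$ is copied to both angle positions of $S$; the second call fails because one of the two qubits of $S'$ would have to be discarded.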
Definition 3.10 (Interface)

Let $T$ be a set of data types. An interface $I = (U, D)$ is a pair of type strings $U, D \in T^\ast$.

We say that an interface $I' = (U', D')$ can be coerced into an interface $I = (U, D)$, written $I \triangleleft I'$, if $U \succcurlyeq U'$ and $D \preccurlyeq D'$.

We can define the interface associated with an operation $o$ in a minIR graph $G$ by considering the values used and defined by $o$. Calling $\tau$ the type morphism on $G$ and assuming $o \in E_{st}$ to be an operation in $G$ with $s$ inputs and $t$ outputs, we define the interface of $o$ in $G$ as the pair of strings in $T^\ast$

$$I(o) = (\tau(\textit{src}_1(o))\cdots\tau(\textit{src}_s(o)),\ \tau(\textit{tgt}_1(o))\cdots\tau(\textit{tgt}_t(o))).$$

Similarly, we can assign interfaces to subgraphs of minIR graphs:

Definition 3.11 (MinIR subgraph)

Consider a subset of values and operations $V_H \subseteq V$ and $E_H \subseteq E$. Define the use and define boundary sets

$$\begin{aligned} B_U &= \{v \in V_H \mid \mathit{def}(v) \in E \smallsetminus E_H \},\\B_D &= \{v \in V_H \mid \mathit{use}(v) \in E \smallsetminus E_H \}.\end{aligned} \tag{10}$$

The tuple $H = (V_H, E_H)$ of $G$ is called a minIR subgraph of $G$ if there exists a region $R$ of $G$ such that all boundary values of $H$ are in $R$:

$$B = B_U \cup B_D \subseteq R.$$

We write $H \subseteq G$ to indicate that $H$ is a minIR subgraph of $G$.

Note that $B_U$ is exactly the set of inputs $I$ in the non-IO-free minIR graph given by the subgraph $(V_H, E_H)$ of the minIR graph. $B_D$ is a superset of the outputs $O$ of $H$: it includes all linear values in $H$ that do not have a use in $H$, but also any non-linear value that has a use outside of $H$.
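The boundary sets can be computed by a single pass over the operations. In the sketch below, a minIR graph is represented, as an assumption for this example, by a dictionary mapping each operation name to its lists of source and target values; regions are ignored and the operation and value names are hypothetical. For non-linear values, which may have several uses, a value is placed in $B_D$ as soon as one of its uses lies outside $E_H$.

```python
def boundary_sets(ops, V_H, E_H):
    """Compute the boundary sets (B_U, B_D) of the subgraph (V_H, E_H).

    `ops` maps an operation name to a pair (sources, targets) of value lists.
    """
    defined_by = {}  # value -> the operation defining it
    used_by = {}     # value -> the set of operations using it
    for name, (sources, targets) in ops.items():
        for v in targets:
            defined_by[v] = name
        for v in sources:
            used_by.setdefault(v, set()).add(name)

    # Values of the subgraph defined by an operation outside the subgraph.
    B_U = {v for v in V_H if defined_by.get(v) not in E_H}
    # Values of the subgraph with at least one use outside the subgraph.
    B_D = {v for v in V_H if used_by.get(v, set()) - E_H}
    return B_U, B_D


ops = {
    "init": ([], ["q0", "a"]),
    "rz": (["q0", "a"], ["q1"]),
    "h": (["q1"], ["q2"]),
    "meas": (["q2"], []),
    "log": (["a"], []),
}
B_U, B_D = boundary_sets(ops, V_H={"q0", "a", "q1", "q2"}, E_H={"rz", "h"})
print(sorted(B_U), sorted(B_D))
```

Here the non-linear angle `a` appears in both boundary sets: it is defined outside the subgraph and, besides its use in `rz`, has a second use in `log` outside of it.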

Unlike interfaces, subgraph boundary values are not ordered. An ordering of $B \subseteq V$ is a string $S \in V^\ast$ along with a bijective map

$$\mathrm{ord}: B \to \mathrm{Idx}(S) \quad\textrm{such that}\quad v = S_{\mathrm{ord}(v)}.$$

If there are strings $S_U, S_D \in V^\ast$ and orderings of $B_U$ and $B_D$

$$\begin{aligned}\textrm{ord}_U:\ &B_U \to \mathrm{Idx}(S_U)&\quad\textrm{ord}_D:\ &B_D \to \mathrm{Idx}(S_D),\end{aligned}$$

then we can set $\textit{src}_i(H) = (S_U)_i$ and $\textit{tgt}_i(H) = (S_D)_i$ in complete analogy to operations. We will write $\textit{src}(H)$ and $\textit{tgt}(H)$ for the strings $\textit{src}_1(H)\cdots\textit{src}_{|S_U|}(H)$ and $\textit{tgt}_1(H)\cdots\textit{tgt}_{|S_D|}(H)$ respectively. We say that the subgraph $H$ implements the interface

$$I_H = (\tau(\textit{src}(H)), \tau(\textit{tgt}(H))),$$

where the type morphism $\tau$ was extended element-wise to strings $V^\ast$.

Remark, though, that unlike operations, the same subgraph may implement more than one interface as a result of various choices of orderings $\textrm{ord}_U$ and $\textrm{ord}_D$.

As mentioned, the subgraph $H$ forms a non-IO-free minIR graph. We can always construct an IO-free minIR graph from $H$ by adding two operations $o_{in} \in E_{0, |S_U|}$ and $o_{out} \in E_{|S_D|, 0}$ in the root region, i.e. with no inputs and $|S_U|$ outputs, respectively with $|S_D|$ inputs and no outputs, defined by

$$\textit{tgt}_i(o_{in}) = \textit{src}_i(H),\quad\quad \textit{src}_i(o_{out}) = \textit{tgt}_i(H).$$

We call the resulting graph $\bar{H}$ an interface graph. It implements the interface $I_H$ if $H$ implements $I_H$. Calling to mind the illustrations of section 3.3, $\bar{H}$ looks like one of the nested regions within regiondef operations that we were considering.

MinIR operation rewrite #

Consider

  • an operation $o$ in a minIR graph $G$ with values $V$,
  • an interface graph $\bar{H}$ with values $V_H$ and its associated subgraph $H \subseteq \bar{H}$, such that $H$ implements an interface $I_H$ with $I(o) \triangleleft I_H$,
  • the index maps $\rho: \mathrm{Idx}(\textit{src}(H)) \to \mathrm{Idx}(\textit{src}(o))$ and $\sigma: \mathrm{Idx}(\textit{tgt}(o)) \to \mathrm{Idx}(\textit{tgt}(H))$ that define the coercion $I(o) \triangleleft I_H$ (per Definition 3.10).

We can define a glueing relation $\mu_o \subseteq V \times V_H$

$$\begin{aligned}\mu_o =\ & \{ \left(\textit{src}_{\rho(i)}(o), \textit{src}_{i}(H)\right) \mid i \in \mathrm{Idx}(\textit{src}(H)) \}\ \cup \\& \{ \left(\textit{tgt}_{i}(o), \textit{tgt}_{\sigma(i)}(H)\right) \mid i \in \mathrm{Idx}(\textit{tgt}(o)) \}.\end{aligned} \tag{12}$$

This is almost enough to define a rewrite that replaces the operation $o$ in $G$ with the values and operations of $H$: the interface compatibility constraint $I(o) \triangleleft I_H$ that we have imposed ensures that the resulting minIR graph is valid. Unfortunately, $\mu_o$ is not in general a partial function, as required by Definition 3.9.
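To see concretely why $\mu_o$ may fail to be a partial function, the relation can be built from the index maps and checked. The sketch below assumes that the boundary strings are given as Python lists of value names and that $\rho$ and $\sigma$ are 1-based dictionaries; all function and value names are hypothetical.

```python
def glueing_relation(src_o, tgt_o, src_H, tgt_H, rho, sigma):
    """Build the relation mu_o as a set of pairs (value of G, value of H)."""
    mu = set()
    for i in range(1, len(src_H) + 1):
        mu.add((src_o[rho[i] - 1], src_H[i - 1]))
    for i in range(1, len(tgt_o) + 1):
        mu.add((tgt_o[i - 1], tgt_H[sigma[i] - 1]))
    return mu


def is_partial_function(mu):
    """Check that no value of G is related to two distinct values of H."""
    image = {}
    for v, w in mu:
        if image.setdefault(v, w) != w:
            return False
    return True


# The non-linear input "a" of o is copied to both inputs h1 and h2 of H.
mu = glueing_relation(
    src_o=["a"], tgt_o=["b"],
    src_H=["h1", "h2"], tgt_H=["k"],
    rho={1: 1, 2: 1}, sigma={1: 1},
)
print(sorted(mu), is_partial_function(mu))
```

In this example the non-injective $\rho$ relates the value `a` to two distinct values of $H$, so the relation has to be turned into a partial function by first merging `h1` and `h2`.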

This is resolved in the following proposition:

Proposition 3.5 (MinIR operation rewrite)

Let $G$, $o$ and $H$ be such that $I(o) \triangleleft I_H$, as defined above. Then

$$\big((G \sqcup H) / \sim_{\mu_o}\!\big) \smallsetminus \{o\}, \tag{13}$$

i.e. the graph obtained by removing the operation $o$ from the glueing of $G$ and $H$ along $\mu_o$, is a valid minIR graph.

There is a graph $G_R$ with values $V_R$ and a partial function $\mu_o': V \rightharpoonup V_R$ such that the graph (13) is the graph $r_o(G)$, obtained from the rewrite

$$r_o = (G_R, dom(\mu_o), \{o\}, \mu_o'). \tag{14}$$

We call $r_o$ the rewrite of $o$ into $H$.

The definition of the rewrite of oo into a graph HH behaves as one would expect – the only subtleties relate to handling non-linear (i.e. copyable) values at the boundary of the rewrite. The following example illustrates some of these edge cases.

Rewriting operation $o$ in the graph $G$ (top left) into the operations $o_1$ and $o_2$ of the graph $\bar{H}$ (bottom left). Coloured dots indicate the index maps $\rho$ and $\sigma$ from inputs $B_U$ of $\bar{H}$ to inputs of $o$, respectively from outputs of $o$ to outputs $B_D$ of $\bar{H}$.

When the index maps $\rho$ and $\sigma$ are not injective (yellow and green dots), values are merged, resulting in multiple uses of the value (i.e. copies). This is why the index maps must be injective on linear values (dots in shades of blue). Value merging also happens when a value is used multiple times in $o$ (yellow and red dots). This will never happen with linear values (as they can never have more than one use in $o$), nor with any value definitions (the same value can never be defined more than once). Finally, values not in the image of $\rho$ or $\sigma$ (purple dot) are discarded. This case is also excluded for linear values by requiring surjectivity.

We start this proof with the explicit construction of $G_R$ and $\mu_o'$. Define $\sim_R\ \subseteq (V_H)^2$ as the smallest equivalence relation such that

$$\textit{src}_{\rho(i)}(o) = \textit{src}_{\rho(j)}(o) \Rightarrow \textit{tgt}_i(o_{in}) \sim_R \textit{tgt}_j(o_{in}).$$

Then we define $\bar{G}_R = \bar{H} / \sim_R$, the graph obtained by glueing together values within the same equivalence class of $\sim_R$.

Claim 1: $\bar{G}_R$ is a valid minIR graph.

Claim 1 follows from the observation that only values of non-linear types are glued together. If $v \sim_R v'$, then either $v = v'$ or there exist $i \neq j$ such that $v = \textit{tgt}_i(o_{in})$, $v' = \textit{tgt}_j(o_{in})$ and $\textit{src}_{\rho(i)}(o) = \textit{src}_{\rho(j)}(o)$. If $\rho(i) = \rho(j)$, then $\rho$ is not injective on $i$ and $j$ and, by the definition of $\rho$, $\tau(v) \notin T_L$ and $\tau(v') \notin T_L$. Otherwise, there are $i' = \rho(i) \neq \rho(j) = j'$ such that $\textit{src}_{i'}(o) = \textit{src}_{j'}(o)$. The same value is then used twice by $o$, which is only possible in a valid minIR graph if that value is not linear. As $\rho$ preserves types, neither $v$ nor $v'$ is linear, thus proving Claim 1.

Define $G_R$ as the subgraph obtained from $\bar{G}_R$ by removing the operations $\{o_{in}, o_{out}\}$. Let $V_R = V_H / \sim_R$ be the set of values of $G_R$ (and of $\bar{G}_R$). Writing $\alpha_R(v) \in V_R$ for the equivalence class of $\sim_R$ that $v \in V_H$ belongs to, we can define $\mu_o' \subseteq V \times V_R$ as:

$$(v, w) \in \mu_o \Leftrightarrow (v, \alpha_R(w)) \in \mu_o'.$$

Claim 2: $\mu_o'$ is a partial function $V \rightharpoonup V_R$.

In other words, for all $(v, \alpha_1), (v, \alpha_2) \in \mu_o'$, we have $\alpha_1 = \alpha_2$. Let $w_1 \in \alpha_1$ and $w_2 \in \alpha_2$ be values in $V_H$ such that $(v, w_1), (v, w_2) \in \mu_o$. First of all, $\textit{src}_i(o) \neq \textit{tgt}_j(o)$ for all $i, j$, otherwise $G$ is not acyclic. So either $\textit{use}(v) = o$, or $\textit{def}(v) = o$, but not both.

The simpler case: if $\textit{def}(v) = o$, then there exists $i$ such that $\textit{tgt}_i(o) = v$. Furthermore, $i$ is unique because, by the definition of minIR, $v$ has a unique definition in $G$. It follows from (12) that $w_1 = \textit{src}_{\sigma(i)}(o_{out}) = w_2$ and hence $\alpha_1 = \alpha_2$.

Otherwise, there exist $i$ and $j$ such that $v = \textit{src}_{\rho(i)}(o) = \textit{src}_{\rho(j)}(o)$ and $\textit{tgt}_i(o_{in}) = w_1$ as well as $\textit{tgt}_j(o_{in}) = w_2$. By definition of $\sim_R$, we have $w_1 \sim_R w_2$, and thus

$$\alpha_1 = \alpha_R(w_1) = \alpha_R(w_2) = \alpha_2,$$

proving Claim 2.

Claim 3: $r_o(G)$ is given by $((G \sqcup H) / \sim_{\mu_o}) \smallsetminus \{o\}$.

It follows directly from our construction of $\sim_R$ and $\mu_o'$ that the equivalence classes of (the smallest equivalence relation closure of) $\mu_o' \circ \alpha_R$ are equal to the equivalence classes of (the smallest equivalence relation closure of) $\mu_o$. The claim follows by Definition 3.8 and the definition of $r_o$.

And finally, Claim 4: $r_o(G)$ is a valid minIR graph.

Per Definition 3.4, we must check four properties: (i) every value is defined exactly once, (ii) every linear value is used exactly once, (iii) the graph is acyclic, and (iv) every region has (at most) one parent.

(iii) follows from the fact that $G$ and $H$ are acyclic and a single operation $o$ in $G$ is replaced: any cycle across $G$ and $H$ would also be a cycle in $G$ by replacing the subpath in $H$ with $o$. (iv) follows from the fact that $o_{in}$ and $o_{out}$ are in the root region of $\bar{H}$, by definition of interface implementation. (i): removing $o$ from $G$ removes the unique definitions of all values that are targets of $o$. Each such value $v$ is glued to a unique value $\textit{src}_i(o_{out})$ in $H$, which provides the new and unique definition of $v$ in $r_o(G)$. (ii) follows from the same argument as in (i), but relying on injectivity of $\rho$ on linear values to establish uniqueness.

Arbitrary minIR rewrites #

We have so far defined rewrites of single operations into graphs $H$. We can generalise these rewrites to rewrite subgraphs $P \subseteq G$, provided the minIR subgraphs satisfy some constraints. We require for this a notion of convexity, as discussed in Bonchi et al. [Filippo Bonchi, Fabio Gadducci, Aleks Kissinger, Pawel Sobocinski and Fabio Zanasi. 2022. String diagram rewrite theory II: Rewriting with symmetric monoidal structure. Mathematical Structures in Computer Science 32, 4 (April 2022), 511--541. doi: 10.1017/s0960129522000317].

As usual, let us consider a minIR graph $G$ with values $V$, linear values $V_L \subseteq V$, edges $E$, the incidence maps $\textit{src}_i$ and $\textit{tgt}_j$ as well as their inverses $\textit{use}$ and $\textit{def}$. Consider further a subgraph of $G$ that we will now call $P = (V_P, E_P) \subseteq G$, to distinguish it from $H$.

Let us further define the partial $\mathit{parent}$ morphism that maps a value $v \in V$ to the parent of the region of $v$.

Definition 3.12 (Convex minIR subgraph)

A minIR subgraph $P \subseteq G$ is convex if the following conditions hold:

  • for all $v_1, v_2 \in V_P$, any path along $\leadsto$ from $v_1$ to $v_2$ contains only vertices in $V_P$,
  • parent-child relations are contained within the subgraph, i.e. $v \in V_P \cap dom(\mathit{parent}) \Leftrightarrow \mathit{parent}(v) \in V_P.$
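The first convexity condition can be checked with two reachability searches: a vertex lies on a path between two vertices of $V_P$ exactly when it is both a descendant and an ancestor of $V_P$. The sketch below covers only this path condition and leaves out the parent-child condition; the representation of $\leadsto$ as a successor dictionary and the function names are assumptions made for the example.

```python
def reachable(starts, adjacency):
    """Return all vertices reachable from `starts` in one or more steps."""
    seen = set()
    stack = list(starts)
    while stack:
        u = stack.pop()
        for w in adjacency.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def is_path_convex(V_P, succ):
    """Check that every path between vertices of V_P stays within V_P."""
    # Reverse the successor relation to search for ancestors.
    pred = {}
    for u, ws in succ.items():
        for w in ws:
            pred.setdefault(w, set()).add(u)
    # Vertices lying on some path from V_P to V_P.
    between = reachable(V_P, succ) & reachable(V_P, pred)
    return between <= set(V_P)


# On the chain a ~> b ~> c, the set {a, c} is not convex: b lies in between.
succ = {"a": {"b"}, "b": {"c"}}
print(is_path_convex({"a", "c"}, succ), is_path_convex({"a", "b"}, succ))
```

The check runs in time linear in the size of the graph, independently of the number of pairs of vertices in $V_P$.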

Define the sets of boundary values $B_U$, $B_D$ and $B = B_U \cup B_D$, as in (10); then fix the boundary orderings $\textit{src}(P)$ and $\textit{tgt}(P)$ as in (11). The subgraph $P$ implements the interface

$$I_P = (\tau(\textit{src}(P)), \tau(\textit{tgt}(P))).$$

Consider an interface graph $\bar{H}$ that implements $I_H$ such that $I_P \triangleleft I_H$. Instead of defining a glueing relation from values of an operation $o$ to values of $H$, we replace the interface $I(o)$ with $I_P$. This generalises the definition of $\mu_o$ from (12) to a glueing $\mu \subseteq B \times V_H$ defined as

$$\begin{aligned}\mu =\ & \{ \left(\textit{src}_{\rho(i)}(P), \textit{src}_{i}(H)\right) \mid i \in \mathrm{Idx}(\textit{src}(H)) \}\ \cup \\& \{ \left(\textit{tgt}_{i}(P), \textit{tgt}_{\sigma(i)}(H)\right) \mid i \in \mathrm{Idx}(\textit{tgt}(P)) \}.\end{aligned}$$

With the set of boundary operations defined as²

$$E_B = \left\{o \in E_P \mid \left(\textit{tgt}(o) \cup \textit{src}(o)\right) \subseteq B\right\},$$

we are able to define minIR rewrites in their most general form.

Proposition 3.6 (MinIR subgraph rewrite)

Let $P \subseteq G$ and $H$ be such that $I_P \triangleleft I_H$ and $P$ is convex, as defined above. Then,

$$\big((G \sqcup H) / \sim_{\mu}\!\big) \smallsetminus (V_P \smallsetminus B, E_B), \tag{16}$$

i.e. the graph obtained by removing the values $V_P \smallsetminus B$ and operations $E_B$ from the glueing of $G$ and $H$ along $\mu$, is a valid minIR graph.

There is a graph $G_R$ with values $V_R$ and a partial function $\mu': V_P \rightharpoonup V_R$ such that the graph (16) is the graph $r_P(G)$, obtained from the rewrite

$$r_P = (G_R, V_P, E_B, \mu').$$

We call $r_P$ the rewrite of $P$ into $H$.

Consider an operation $o$ that implements $I_P = (U_P, D_P)$. We can define the interface graph $\bar{H}_o$ given by three operations $o_{in}$, $o_{out}$ and $o$. Its associated subgraph $H_o \subseteq \bar{H}_o$ only includes $o$. Let $\tilde \mu$ be the glueing relation

$$\begin{aligned}\tilde \mu =\ &\{ (\textit{src}_i(P), \textit{src}_i(o)) \mid i \in \mathrm{Idx}(U_P) \}\ \cup \\& \{ (\textit{tgt}_i(P), \textit{tgt}_i(o)) \mid i \in \mathrm{Idx}(D_P) \}.\end{aligned}$$

Consider the rewrite $r = (H_o, V_P \smallsetminus B, E_B, \tilde\mu)$. If we write $G' = (V', E')$ for the subgraph of $G$ given by

$$\begin{aligned}V' &= V \smallsetminus (V_P \smallsetminus B) \\E' &= (E \smallsetminus E_B) \cap (V')^\ast,\end{aligned}$$

then according to (9), the graph resulting from applying $r$ to $G$ can be expressed as the glueing

$$G_o = r(G) = (G' \sqcup H_o) / \sim_{\tilde \mu}.$$

Our claim is that $G_o$ is a valid minIR graph.

The graph (16) is then obtained by applying the rewrite $r_o$ as given by (14) to $G_o$. Defining the rewrite $r_P$ as the composition of $r$ followed by $r_o$, the result follows from our claim and Proposition 3.5.

We now prove the claim, by showing the four properties of minIR graphs as per Definition 3.4. Property (i) requires showing that every value is defined exactly once. As $G'$ is obtained by removing values and operations from a valid minIR graph $G$, no value in $V'$ can be defined more than once. A value $v \in V'$ that is not defined in $G'$ must be in the boundary $B$ of $P$. By the boundary definitions of (10), $v$ cannot be in $B_U$ and thus must be in $B_D$. It follows by the definition of the glueing $\tilde \mu$ that in $G_o$, $v$ will be in the definitions of $o$: $\textit{def}(v) = o$. The glueing $\tilde \mu$ is bijective between the values of $P$ and $o$ and thus we can conclude that $v$ has a unique definition in $G_o$.

The same argument applies to property (ii). Property (iii) follows from the convexity requirement of $P$. Finally, property (iv) (every region has at most one parent) follows from two observations. First, by convexity of $P$, no deleted value or operation could be the parent of any value not in $P$, and thus the $\mathit{parent}$ relation is well-defined on $G'$: $im(\mathit{parent}) \subseteq E'$. Secondly, all new values and operations added to the boundary region of $G'$ are from the root region of $H$, and thus do not have a parent, ensuring that parent uniqueness is preserved.

This simple and limited graph transformation framework captures a remarkably large set of minIR program transformations. It may seem at first that the restriction to boundary values within a single region of Definition 3.11, as well as the convexity requirements of Definition 3.12 represent significant limitations on the expressivity of the rewrites. In practice, however, the semantics of minIR operations can be used to decompose more complex rewrites into a sequence of simple rewrites to which Proposition 3.6 applies.

Consider minIR graphs with a type system that includes regiondef and call operations as discussed in examples of the previous section – respectively defining a code block by a nested region and redirecting control flow to a code block defined using a regiondef. Then all constraints that we impose on rewriting can be effectively side-stepped using the region outlining and value hoisting transformations.

Region outlining moves a valid minIR subgraph into its own separate region, and replaces the hole left by the subgraph in the computation by a call operation to the newly outlined region.

Value hoisting moves a value definition within a region to its parent region and passes the value down to the nested region through an additional input. In case of linear values, we can similarly hoist the unique use of the value to the parent region.

Using these transformations, non-convex subgraphs can always be made convex by taking the convex hull and outlining any parts within it that are not part of the subgraph. Outlined regions can then be passed as additional inputs to the subgraph. Step 1 of the figure below illustrates this transformation. Similarly, a subgraph that includes operations without their parent can be extended to cover the entire region and its parent, outlining any parts of the region that are not part of the subgraph.

Finally, whenever a boundary value $v$ belongs to a region that is not the top-level region of the subgraph³, we can repeatedly hoist $v$ to its parent region until it is in the top-level region. The value is then recursively passed as an argument to descendant regions until it reaches the region in which it is required. Subgraphs can thus always be transformed to only have input and output boundary values at the top-level region. Step 2 of the figure below illustrates this transformation.

A non-convex minIR graph rewrite, obtained by decomposition into valid convex rewrites, using outlining and hoisting. For simplicity, regiondef operations were made implicit and represented by nested boxes: a region within an operation corresponds to a region definition that is passed as an argument to the operation. Edge colours correspond to value types. Step 1 outlines the ... operations into a dedicated region, which step 2 hoists outside of the region being rewritten. Steps 3 and 4 together correspond to a minIR subgraph rewrite. They have been split into two steps following the proof strategy. Step 4 is an instance of a minIR operation rewrite.


  1. To be precise, $\preccurlyeq$ is a partial order on the type strings up to isomorphism. ↩︎

  2. The set operations $\subseteq$ and $\cup$ are again understood to apply to the unordered set of elements contained in the lists $\mathit{tgt}(o)$ and $\mathit{src}(o)$. ↩︎

  3. We can always extend a subgraph to contain more ancestor regions, until there is indeed a unique top-level region in the subgraph. ↩︎


Chapter 4

Pattern Matching in large Graph Transformation Systems

To our knowledge, the first practical proposal for a GTS-based quantum compiler was presented in Xu, 2022 [Mingkuan Xu, Zikun Li, Oded Padon, Sina Lin, Jessica Pointing, Auguste Hirth, Henry Ma, Jens Palsberg, Alex Aiken, Umut A. Acar and Zhihao Jia. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433] and then refined in Xu, 2023 [Amanda Xu, Abtin Molavi, Lauren Pick, Swamit Tannu and Aws Albarghouthi. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (June 2023), 835--859. doi: 10.1145/3591254]. In these, the set of possible graph transformations is obtained by exhaustive enumeration. Using SAT solvers and fingerprinting techniques, the set of all small programs up to a certain size can be generated ahead of time and clustered into disjoint partitions of equivalent programs. This concisely expresses every possible peephole optimisation up to the specified size: for every small enough subset of operations of an input program, its equivalence class can be determined. Any replacement of that set of operations with another program in the same equivalence class is a valid transformation and, thus, a potential peephole optimisation. Transformation systems on minIR graphs based on equivalence classes were formalised in section 3.4.
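How a partition into equivalence classes induces transformation rules can be sketched in a few lines. This is a minimal illustration only, not the construction used in the cited works: the function name `rules_from_classes` and the encoding of programs as gate strings are assumptions made for the example.

```python
from itertools import permutations

def rules_from_classes(classes):
    """Return every ordered pair (lhs, rhs) of distinct programs from the same class."""
    return [(lhs, rhs) for cls in classes for lhs, rhs in permutations(cls, 2)]

# Hypothetical single-qubit equivalence classes; programs are written as gate strings.
classes = [
    ["H H", "X X", ""],  # all three act as the identity
    ["H X H", "Z"],      # both act as Pauli Z
]
rules = rules_from_classes(classes)
```

A class containing $k$ programs contributes $k(k-1)$ ordered rules, so the two classes above yield $6 + 2 = 8$ rules.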

First results of this approach are promising. Xu, 2022 demonstrated that optimisation performance improves markedly with larger sets of transformation rules. Such workloads, however, rely heavily on pattern matching, the computational task that identifies subgraphs on which transformation rules apply. In Xu, 2022 and Xu, 2023, pattern matching is carried out separately for each pattern. This becomes a significant bottleneck for large rule sets. In Xu, 2022, performance peaks at around 50,000 transformation rules, after which the additional overhead from pattern matching becomes dominant, deteriorating the compilation results.

In this chapter, we solve these scaling difficulties by presenting an algorithm for pattern matching on minIR graphs that uses a pre-computed data structure to return all pattern matches in a single query. The set of transformation rules is directly encoded in this data structure. After a one-time cost for construction, pattern-matching queries can be answered in running time independent of the number of rules in the transformation system.

The asymptotic complexity results presented in this chapter depend on some simplifying assumptions on the properties that the pattern graphs and embeddings must satisfy. These assumptions restrict the generality of minIR graphs, but we do not find that they restrict the usefulness of the result in practice. As discussed in section 4.7, none of these assumptions are required by the implementation. We have not observed any impact on performance when the imposed constraints are lifted, so we conjecture that at least some of these assumptions can be relaxed and our results generalised.

After a discussion of related work in section 4.1, section 4.2 presents the assumptions that we are making in detail, along with some relevant definitions for the rest of the chapter. Sections 4.3, 4.4, and 4.5 present the core ideas of our approach, respectively introducing: a reduction of minIR graphs to equivalent trees, a canonical construction for the tree reduction and an efficient way to enumerate all possible subtrees of a graph. We also prove bounds on the size and number of the resulting trees.

In section 4.6, we introduce a pre-computation step and show that the pattern-matching problem reduced to tree structures can be solved using a prefix tree-like automaton that is fixed and pre-computed for a given set of patterns. Combining the automaton construction with bounds from section 4.5 leads to the final result. We conclude in section 4.7 with benchmarks on a real-world dataset of 10,000 quantum circuits, obtaining a 20× speedup over a leading C++ implementation of pattern matching for quantum circuits.

4.1. Related work

Our proposed solution can be seen as a specialisation of RETE networks Forgy, 1982 [Charles L. Forgy. 1982. Rete: A fast algorithm for the many pattern/many object pattern match problem. Artificial Intelligence 19, 1 (September 1982), 17--37. doi: 10.1016/0004-3702(82)90020-0] Varró, 2013 [Gergely Varró and Frederik Deckwerth. 2013. A Rete Network Construction Algorithm for Incremental Pattern Matching] and derivatives Ian, 2003 [Wright Ian and James A. R. Marshall. 2003. The execution kernel of RC++: RETE*, a faster RETE with TREAT as a special case. International Journal of Intelligent Games and Simulation 2, 1 (Feb 2003), 36--48] Armstr., 2014 [Dylan Armstrong. 2014. Memory Efficient Stream Reasoning on Resource-Limited Devices. PhD Thesis. University of Dublin, Trinity College] Mirank., 1987 [D. P. Miranker. 1987. TREAT: a new and efficient match algorithm for AI production systems. PhD Thesis. Columbia University, 2960 Broadway New York, NY United States] to the case of graph pattern matching. The additional structure obtained from restricting our considerations to graphs results in a simplified network design that allows us to derive worst-case asymptotic runtime and space bounds that are polynomial in the parameters relevant to our use case¹, overcoming a key limitation of RETE.

Another well-studied application of large-scale pattern matching is in the context of stochastic biomolecular simulations Sneddon, 2010 [Michael W Sneddon, James R Faeder and Thierry Emonet. 2010. Efficient modeling, simulation and coarse-graining of biological complexity with NFsim. Nature Methods 8, 2 (December 2010), 177--183. doi: 10.1038/nmeth.1546] Bachman, 2011 [John A Bachman and Peter Sorger. 2011. New approaches to modeling complex biochemistry. Nature Methods 8, 2 (January 2011), 130--131. doi: 10.1038/nmeth0211-130], particularly the Kappa project Danos, 2004 [Vincent Danos and Cosimo Laneve. 2004. Formal molecular biology. Theoretical Computer Science 325, 1 (September 2004), 69--110. doi: 10.1016/j.tcs.2004.03.065]. Stochastic simulations depend on performing many rounds of fast pattern-matching for continuous Monte Carlo simulations Yang, 2008 [Jin Yang, Michael I. Monine, James R. Faeder and William S. Hlavacek. 2008. Kinetic Monte Carlo method for rule-based modeling of biochemical networks. Physical Review E 78, 3 (September 2008), 031910. doi: 10.1103/physreve.78.031910]. However, unlike our use case, the procedure typically does not need to scale well to a large number of patterns. In Danos, 2007 [Vincent Danos, Jérôme Feret, Walter Fontana and Jean Krivine. 2007. Scalable Simulation of Cellular Signaling Networks], Danos et al. introduced a pre-computation step to accelerate matching by establishing relations between patterns that activate or inhibit further patterns. This idea was later expanded upon and formalised in categorical language in Boutil., 2017 [Pierre Boutillier, Thomas Ehrhard and Jean Krivine. 2017. Incremental Update for Graph Rewriting]. The ideas presented in Boutil., 2017 are similar to ours; their formalism has the advantage of being more general but does not present any asymptotic complexity bounds and suffers from similar worst-case complexities as RETE.

A similar problem has also been studied in the context of multiple-query optimisation for database queries Sellis, 1988 [Timos K. Sellis. 1988. Multiple-query optimization. ACM Transactions on Database Systems 13, 1 (March 1988), 23--52. doi: 10.1145/42201.42203] Ren, 2016 [Xuguang Ren and Junhu Wang. 2016. Multi-Query Optimization for Subgraph Isomorphism Search. Proceedings of the VLDB Endowment 10, 3 (November 2016), 121--132. doi: 10.14778/3021924.3021929], but it has limited itself to developing caching strategies and search heuristics for specific use cases. Finally, using a pre-compiled data structure for pattern matching was already proposed in Messmer, 1999 [Bruno T. Messmer and Horst Bunke. 1999. A decision tree approach to graph and subgraph isomorphism detection. Pattern Recognition 32, 12 (December 1999), 1979--1998. doi: 10.1016/S0031-3203(98)90142-X]. However, with an $n^{\Theta(m)}$ space complexity, where $n$ is the input size and $m$ the pattern size, it does not scale to large input graphs, even for small patterns.


  1. RETE networks have been shown to have exponential worst-case space (and thus time) complexity Rakib, 2018 [Abdur Rakib and Ijaz Uddin. 2018. An Efficient Rule-Based Distributed Reasoning Framework for Resource-bounded Systems. Mobile Networks and Applications 24, 1 (October 2018), 82--99. doi: 10.1007/s11036-018-1140-x], although performance in practical use cases can vary widely Uddin, 2016 [Ijaz Uddin, Hafiz Mahfooz Ul Haque, Abdur Rakib and Mohamad Rafi Segi Rahmat. 2016. Resource-Bounded Context-Aware Applications: A Survey and Early Experiment]. ↩︎

4.2. Preliminaries and simplifying assumptions

For simplicity, we will throughout consider minIR graphs that admit a type system $\Sigma$, though most of the results can also be adapted to other graph domains. We will write $T$ for the types of $\Sigma$ (i.e. its values) and $\Gamma$ for the operation types (i.e. its edges).

Linear paths and operation splitting #

An operation type $\nu \in \Gamma$ in the type system $\Sigma$ is a hyperedge. Its endpoints

$$\textit{src}(\nu) = \textit{src}_1(\nu) \cdots \textit{src}_s(\nu) \textrm{ and } \textit{tgt}(\nu) = \textit{tgt}_1(\nu) \cdots \textit{tgt}_t(\nu)$$

are strings of data types that define the input and output signature of the operation $\nu$. We can refer to the set of all hyperedge endpoints of $\nu$ using the string indices $\mathrm{Idx}(\cdot)$ ($\sqcup$ denotes the disjoint set union):

$$P_\nu = \mathrm{Idx}(\textit{src}(\nu)) \sqcup \mathrm{Idx}(\textit{tgt}(\nu)) = \{1, \dots, s\} \sqcup \{1, \dots, t\}.$$

Fix a partition of $P_\nu$ into disjoint pairs

$$P_\nu = \{p_1, p_1'\} \,\sqcup\, \{p_2, p_2'\} \,\sqcup\, \cdots,$$

where the last set of the partition may be a singleton if $|P_\nu|$ is odd. For every $\nu \in \Gamma$, we then define $f = \lceil |P_\nu| / 2 \rceil$ new split operation types $\nu_1, \dots, \nu_f$, each with at most two endpoints: the $i$-th operation type $\nu_i$ has endpoints $p_i$ and $p_i'$ in $P_\nu$. For every operation $o \in E$ of type $\nu$, we can then split $o$ into $f$ operations $o_1, \dots, o_f$, each of arity 1 or 2 and of types $\nu_1, \dots, \nu_f$ respectively. We will refer to the graph transformation that replaces an operation $o$ in a minIR graph with the operations $o_i$ for $1 \leqslant i \leqslant f$ as operation splitting.
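The endpoint partition and the resulting split operations can be sketched as follows. This is a minimal illustration under stated assumptions: endpoints are encoded as `("src", i)` or `("tgt", i)` tuples, the partition pairs the $i$-th source with the $i$-th target and then pairs up any leftover endpoints, and the names `endpoint_pairs` and `split_operation` are hypothetical.

```python
from math import ceil

def endpoint_pairs(n_src, n_tgt):
    """Partition the endpoints of an operation type into pairs (last block may be a singleton)."""
    srcs = [("src", i) for i in range(1, n_src + 1)]
    tgts = [("tgt", i) for i in range(1, n_tgt + 1)]
    k = min(n_src, n_tgt)
    # Pair the i-th source with the i-th target.
    pairs = [(srcs[i], tgts[i]) for i in range(k)]
    # Pair up whatever is left; a single leftover endpoint forms a singleton block.
    rest = srcs[k:] + tgts[k:]
    pairs += [tuple(rest[i:i + 2]) for i in range(0, len(rest), 2)]
    return pairs

def split_operation(name, n_src, n_tgt):
    """Return the split operations o_1, ..., o_f of an operation, each with its endpoints."""
    return [(f"{name}_{i}", block)
            for i, block in enumerate(endpoint_pairs(n_src, n_tgt), start=1)]

# An operation with 3 sources and 2 targets splits into f = ceil(5 / 2) = 3 operations.
parts = split_operation("o", 3, 2)
```

Because the partition depends only on the operation type, two operations of the same type are always split in the same way.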

It is important to note that the splitting of an operation $o$ is unique and given by the type of $o$, and thus invariant under (typed) morphisms: there is a morphism $P \to G$ of a pattern into a graph if and only if there is a morphism $P' \to G'$ from the split pattern $P'$ into the split graph $G'$.

Operation splitting

A transformation rule splitting an operation with 3 sources and 2 targets. The choice of endpoint partition made here, obtained by pairing the $i$-th use with the $i$-th define, is arbitrary but convenient for quantum gates as they correspond to the input and output values of the same qubit.

The endpoint partitions $P_\nu$ also define linear paths. Two values $v, v'$ in a minIR graph are on the same linear path if there are values $u_1, \dots, u_k$ with $v = u_1$ and $v' = u_k$ such that $u_i$ is connected to $u_{i+1}$ through an operation $o$ and they correspond to the same pair of endpoints in the endpoint partition (i.e. the indices of $P_{type(o)}$ correspond to values $u_i$ and $u_{i+1}$ in $\textit{src}(o) \sqcup \textit{tgt}(o)$).

Linearity assumption and rigidity #

Recall that in Definition 3.2, $V_\textit{src}$ and $V_\textit{tgt}$ refer to the subset of values that are within the domain of definition of $\textit{src}$ and $\textit{tgt}$ respectively. For this chapter, we will assume $V_\textit{src} = V_\textit{tgt} = V$; in other words, minIR graphs are IO-free (this is w.l.o.g., see discussion after Proposition 3.4) and all values are linear¹ (this is definitely not w.l.o.g.!).

As a result of this assumption, the subcategory of minIR graphs that we consider forms a rigid category, as introduced by Danos et al. in Danos, 2014 [Vincent Danos, Reiko Heckel and Pawel Sobocinski. 2014. Transformation and Refinement of Rigid Structures]. The definition, which we reproduce here, is given in terms of morphisms that intersect all components of the codomain. We refer to Danos, 2014 for the precise definition of that notion: in the context of linear-valued minIR graphs, this is equivalent to requiring that the image of the graph homomorphism intersects every connected component of the codomain.

Definition 4.1 (Rigid Category)

A category $\mathbb C$ is rigid if for all morphisms $A \xrightarrow{h} B$ in $\mathbb C$ that intersect all components of $B$ and for all $A \xrightarrow{g} C$ that factorise as $g = f \circ h$, the morphism $B \xrightarrow{f} C$ is unique.

In other words, there is a unique way to extend a morphism $A \xrightarrow{g} C$ to a morphism $B \xrightarrow{f} C$, if it exists. If we interpret $A$ and $B$ as graph patterns that we are interested in matching with $A \subseteq B$, then rigidity guarantees that there is (at most) a unique way to extend a match morphism $A \to G$ into a match morphism on the larger pattern $B \to G$.

The linearity assumption also has other useful consequences. Every linear value has exactly one use and one definition. As a result, all linear paths are disjoint and form a partition of the values of the graph. They correspond to the paths that form the connected components of the fully split graph, i.e. the graph obtained by splitting every operation. We call the number of linear paths (and hence the number of connected components in the fully split graph) the circuit width, written $width(G)$. We also use the linear path decomposition to define circuit depth, written $depth(G)$, as the longest linear path in $G$.
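The correspondence between linear paths and connected components of the fully split graph can be sketched with a union-find structure. This is an illustrative sketch, not the thesis implementation: split operations are assumed to be given as 1- or 2-tuples of the values they touch, the function names are hypothetical, and depth is measured here as the number of values on the longest path, which is one possible convention.

```python
def linear_paths(values, split_ops):
    """Group values into linear paths, i.e. components of the fully split graph."""
    parent = {v: v for v in values}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # A split operation with two endpoints joins the two values it touches.
    for endpoints in split_ops:
        if len(endpoints) == 2:
            a, b = endpoints
            parent[find(a)] = find(b)

    paths = {}
    for v in values:
        paths.setdefault(find(v), set()).add(v)
    return list(paths.values())

def width(values, split_ops):
    return len(linear_paths(values, split_ops))

def depth(values, split_ops):
    return max(len(path) for path in linear_paths(values, split_ops))

# A one-qubit gate on qubit a followed by a two-qubit gate on qubits a and b.
values = ["a0", "a1", "a2", "b0", "b1"]
split_ops = [("a0", "a1"),  # one-qubit gate
             ("a1", "a2"),  # two-qubit gate, half acting on a
             ("b0", "b1")]  # two-qubit gate, half acting on b
```

In this example the two linear paths are the values of qubit `a` and the values of qubit `b`, so the width is 2.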

As discussed in section 3.4, minIR rewrites are instantiated from transformation rules by minIR match morphisms $m: P \to G$. Restricting our considerations to linear-valued minIR graphs has the further implication that all such match morphisms $m$ will be injective. We call $m$ an embedding and write it using Greek letters and a hooked arrow $m = \varphi: P \hookrightarrow G$.

Finding such embeddings $P \hookrightarrow G$ is the pattern matching problem that we are solving. This problem is equivalent to finding minIR subgraphs $H \subseteq G$ of $G$ such that $H$ is isomorphic to the pattern, $P \simeq H$.

Convexity #

According to Proposition 3.6, a necessary condition for a subgraph HH to define a valid minIR rewrite is convexity. In this chapter we weaken this requirement and propose a condition based on circuit width:

Proposition 4.1 (Necessary condition for convexity)

Let $\varphi: P \hookrightarrow G$ be an embedding of a pattern $P$ into a linear-valued minIR graph $G$ such that $\varphi(P)$ is a convex subgraph of $G$. Then for every subgraph $H \subseteq G$ such that $\varphi(P) \subseteq H$, it holds that $width(P) \leq width(H)$.

Up to isomorphism, we can assume $P \subseteq G$. Suppose there is $H \subseteq G$ such that $P \subseteq H$ and $width(P) > width(H)$. Let $\mathcal{L}_P, \mathcal{L}_H \subseteq \mathcal{P}(V)$ be partitions of $V_P$ and $V_H$ into sets of values that are on the same linear path of $P$ and $H$ respectively. It must hold that for all $\ell \in \mathcal{L}_P$ there is $\ell' \in \mathcal{L}_H$ such that $\ell \subseteq \ell'$, because $P \subseteq H$ and operation splitting is preserved under embeddings. As the map from $\mathcal{L}_P$ to $\mathcal{L}_H$ cannot be injective, there must be $\ell_1, \ell_2 \in \mathcal{L}_P$ and $\ell' \in \mathcal{L}_H$, such that $\ell_1 \neq \ell_2$ and $\ell_1, \ell_2 \subseteq \ell'$. We conclude that there must be a path in the fully split graph of $H$ between a value of $\ell_1$ and a value of $\ell_2$ that is not in the fully split graph of $P$. Given that $P$ is convex, this path must be in $P$, which contradicts the preservation of operation splitting under embeddings.

In this chapter, whenever we define a subgraph $H \subseteq G$ of a graph $G$, we will assume that $H$ satisfies the above weakened convexity condition.

The converse of Proposition 4.1, however, is not true. The pattern-matching technique presented below will find a strict superset of convex embeddings. To restrict considerations to convex embeddings, it suffices to filter out the non-convex ones in a post-processing step.

Ignoring minIR Hierarchy #

So far, we have omitted discussing one part of the minIR structure: the nested hierarchy of operations. Syntactically, the hierarchy formed by $parent$ relations between minIR operations can be viewed as just another value type that operations are incident to: parent operations define an additional output that children operations consume as additional input. Because of the bijectivity requirement of minIR morphisms on parent-child relations of Definition ., these parent-child relations behave, in fact, like linear values, and hence do not violate the linearity assumption we have imposed.

However, by treating them as such, we have further weakened the constraints on pattern embeddings. We do not enforce that boundary values must be in the same regions or that parent-child relations cannot be boundary values. Similarly to convexity, we defer checking these properties to a post-processing step.

Further assumptions (harmless) #

We will further simplify the problem by making presentation choices that do not imply any loss of generality. First of all, we assume that all patterns have the same width $w$ and depth $d$, are connected graphs and have at least 2 operations. These conditions can always be fulfilled by adding “dummy” operations if necessary. Embeddings of disconnected patterns can be computed one connected component at a time.

We will further assume that all operations are on at most two linear paths (and thus in particular, have at most 4 endpoints). Operations on $\Delta > 2$ linear paths can always be broken up into a composition of $\Delta - 1$ operations, each on two linear paths as follows:

Gate decomposition

Expressing an operation on $\Delta = 3$ linear paths as a composition of two operations on 2 linear paths.

This transformation leaves circuit width unchanged but may multiply the graph depth by up to a factor $\Delta$.

We furthermore define the set of all port labels

$$P_{all} = \bigcup_{\nu \in \Gamma} P_\nu$$

so that we can associate every operation endpoint in a minIR graph $G$ with a port label from the set $P_{all}$. We further endow the labels $P_{all}$ with a total order (for instance, based on the string index values). The total order on $P_{all}$ then induces a total order on the paths $v_1 \cdots v_k \in V^\ast$ in $G$ that start in the same value $v_1$: the paths are equivalently described by the sequence of port labels of the operations traversed. These form strings in $P_{all}^\ast$, which we order lexicographically. Given a root value $r$, for every value $v$ in $G$ there is thus a unique smallest path from $r$ to $v$ in $G$². This path is invariant under isomorphism of the underlying graph (i.e. relabelling of the values and operations but preserving the port labels). With this we conclude the discussions of the specificities of minIR graphs related to typing, linearity and hierarchy, and the related assumptions that we are making.

To summarise, minIR graphs as they are considered in this chapter are hypergraphs (Definition 3.1) that satisfy the following properties:

  • every vertex (value) is incident to exactly two hyperedges (operations). It is the target of one hyperedge (its definition) and the source of another one (its use),
  • every hyperedge is incident to at most four vertices,
  • every hyperedge can be split in a unique way (and invariant under isomorphism) into at most two split operations, each with at most two endpoints.

When modelling subgraphs of IO-free minIR graphs (typically patterns for pattern matching), some hyperedge connections at the boundary of the subgraph will be missing. We say a value is open if a use or define operation is missing (i.e. it is a boundary value in a minIR subgraph).

We will simply refer to hypergraphs that satisfy the above assumptions as graphs. In the unique instance of this chapter where a graph that does not satisfy this construction is referred to, we will specifically call it a simple graph.

We conclude with the following notable bound on circuit width.

Proposition 4.2 (Bound on circuit width)

Let $G$ be a graph with $n_\textrm{odd}$ operations of odd arity (i.e. $|def(o) + use(o)|$ is odd) and $n_\omega$ open values. Then, the circuit width of $G$ is

$$width(G) = \lfloor (n_\textrm{odd} + n_\omega) / 2 \rfloor.$$

For any linear path $P \subseteq V^\ast$ in $G$ consider its two ends $v_1$ and $v_2$, i.e. the two values in $P$ with only one neighbouring value in $P$ (by definition linear paths cannot be empty). In the fully split graph of $G$, these values are either open or must be connected to two operations. In the latter case, at least one of the operations must have a single endpoint (otherwise by acyclicity, the operation would have two neighbours).

In a fully split graph, operations with a single endpoint result from a split operation with an odd number of endpoints. We conclude that for every linear path, there are either two operations with an odd number of endpoints in GG, or one such operation and one open value, or two open values. The result follows.
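As a quick sanity check of Proposition 4.2, consider a pattern made of a single operation whose values are all open. The two examples are chosen here for illustration: an operation with 2 sources and 2 targets ($n_\textrm{odd} = 0$, $n_\omega = 4$), and the operation with 3 sources and 2 targets from the operation splitting figure ($n_\textrm{odd} = 1$, $n_\omega = 5$).

```latex
width = \lfloor (0 + 4) / 2 \rfloor = 2,
\qquad
width = \lfloor (1 + 5) / 2 \rfloor = 3.
```

In the second case, the three linear paths are the two source-target pairs and the single unpaired source, whose path ends at the split operation with one endpoint.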


  1. This restriction is necessary for our results: copyable values may admit an arbitrary number of adjacent hyperedges. As a result, minIR graph pattern matching with copyable values is a generalisation of the subgraph isomorphism problem, a well-known NP-complete problem Cook, 1971 [Stephen A. Cook. 1971. The complexity of theorem-proving procedures. In Proceedings of the third annual ACM symposium on Theory of computing - STOC ’71. ACM Press, 151--158. doi: 10.1145/800157.805047]. The approach generalises to non-linear types, but the complexity analysis no longer holds (we pay a computational price for every non-linear value matched). ↩︎

  2. Remark that the ordering of the operations thus defined is a particular case of a depth-first search (DFS) ordering of the graph: given an operation $o$ that has been visited, all its descendants will be visited before proceeding to any other operation. ↩︎

4.3. Tree reduction

We reduce the problem of graph pattern matching to matching on rooted trees which, as we will see in section 4.6, is a much simpler problem to solve. The map between graphs and (rooted) trees is given by rooted dual trees. Call $G$ tree-like if $G$ is connected and the underlying undirected graph $G_U$ of $G$ is acyclic.

Definition 4.2 (Rooted dual tree)

Let $G$ be a tree-like graph with operations $O$. Then given a root operation $r \in O$, the rooted dual tree of $G$ rooted at $r$, written $\tau_r(G)$, is the tree given by

  • the nodes of the tree are the operations $O$ of $G$,
  • the parent and children of $o \in O$ are the operations that share a value with $o$ in $G$; the parent is the unique operation on the path from $o$ to $r$,
  • the children of an operation are ordered according to the port labels.

Unlike graphs, tree nodes are identified uniquely by their path from the root. Trees isomorphic as graphs with identical root are thus considered equal.
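The rooted dual tree construction of Definition 4.2 can be sketched as a traversal from the root operation. This is a minimal sketch under assumptions of its own: the tree-like graph is given as a hypothetical `ports` dictionary mapping each operation to its incident values listed in port-label order, and trees are represented as nested `(operation, children)` tuples.

```python
def rooted_dual_tree(ports, root):
    """Build the rooted dual tree of a tree-like graph as nested (op, children) tuples."""
    # For each value, collect the operations incident to it (at most two under linearity).
    incident = {}
    for op, vals in ports.items():
        for v in vals:
            incident.setdefault(v, []).append(op)

    def build(op, parent):
        children = []
        for v in ports[op]:  # iterate in port-label order
            for other in incident[v]:
                if other != op and other != parent:
                    children.append(build(other, op))
        return (op, children)

    return build(root, None)

# Operation b shares one value with each of a, c and d.
ports = {"a": ["v1"], "b": ["v1", "v2", "v3"], "c": ["v2"], "d": ["v3"]}
tree_from_a = rooted_dual_tree(ports, "a")
tree_from_b = rooted_dual_tree(ports, "b")
```

Choosing the leaf `a` as root gives a chain into `b` with two children, whereas choosing `b` gives a root with three children, which illustrates that the tree depends on the choice of root.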

A tree reduction using path splitting #

To reduce a graph $G$ to a tree using the rooted dual tree construction, it suffices to reduce $G$ to a tree-like graph. The following result shows that this can always be achieved by repeatedly applying operation splitting transformations.

Proposition 4.6 (Path splitting)

A tree-like graph can be obtained from any connected graph $G$ by applying operation splittings. The resulting graph is a path-split graph (PSG) of $G$.

Consider the undirected simple graph $\mathcal{I}$, where vertices are linear paths, and there is an edge between two linear paths for every operation that belongs to both paths. We call $\mathcal{I}$ the interface graph of $G$.

Splitting an operation $o$ in a graph $G$ corresponds to removing the corresponding edge in $\mathcal{I}$. On the other hand, the underlying undirected graph $G_U$ of $G$ has a cycle if and only if there is a cycle in $\mathcal{I}$. Indeed, a cycle in $G_U$ cannot belong to a single linear path in $G$, by acyclicity of minIR graphs. There is, therefore, a cycle of operations that span multiple linear paths, thus forming a cycle in $\mathcal{I}$.

Hence, the operations to be split to turn $G$ into a tree-like graph are given by the set of edges $E^-$ in $\mathcal{I}$ that must be removed to obtain a spanning tree of $\mathcal{I}$.¹

As we consider typed graphs, the splitting of an operation is unique; however the choice of spanning tree of $\mathcal{I}$ is not unique, and thus multiple PSGs exist for a given graph $G$.

If $G'$ is a PSG of some graph $G$, then call an operation $o$ of $G$ an anchor operation if it is on two linear paths and it is not split in $G'$. The set of all anchor operations $\pi \subseteq O$ fully determines the path-split graph. We write $G^\pi = G'$ for the PSG of $G$ obtained by anchors $\pi$.
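One way to pick a set of anchors is to grow a spanning forest of the interface graph greedily, in the style of Kruskal's algorithm, and split every operation that would close a cycle. The sketch below is an illustration and not the canonical choice of anchors developed later in the chapter; the input format (triples of an operation name and the two linear paths it lies on) and the function name `choose_anchors` are assumptions.

```python
def choose_anchors(two_path_ops):
    """Split operations on two linear paths into anchors (kept) and operations to split."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    anchors, to_split = [], []
    for op, p, q in two_path_ops:
        rp, rq = find(p), find(q)
        if rp == rq:
            # The two linear paths are already connected: keeping op would close a cycle.
            to_split.append(op)
        else:
            parent[rp] = rq
            anchors.append(op)
    return anchors, to_split

# Four two-qubit operations on three linear paths q0, q1, q2.
ops = [("cx1", "q0", "q1"), ("cx2", "q1", "q2"), ("cx3", "q0", "q2"), ("cx4", "q0", "q1")]
anchors, to_split = choose_anchors(ops)
```

Processing the operations in a different order generally yields a different spanning tree and hence a different PSG, in line with the non-uniqueness noted above.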

Proposition 4.3 (Rooted dual trees are ternary)
Consider a PSG $G^\pi$ of a graph $G$. There is a root operation $r \in O$ such that the rooted dual tree $\tau_r(G^\pi)$ is a ternary tree, i.e. every node of $\tau_r(G^\pi)$ has at most three children.

We have assumed in section 4.2 that every operation in $G$ is on at most two linear paths and thus can be connected to at most four values. Each value is linear and hence connected to at most one other operation. It follows that every operation in $\tau_r(G^\pi)$ has at most four neighbouring operations: one parent and three children. A tree leaf can be chosen as the root operation to ensure the root does not have four children.

We can make the path splitting transformation $G \to G^\pi$ reversible by separately storing the set of split operations in $G^\pi$ that correspond to a single operation in $G$. As every operation of $G$ can get split in at most two split operations, we can store the pairs of split operations in $G^\pi$ that correspond to an operation in $G$ in a partial map that defines weights for (a subset of) the operations $O^\pi$ of $G^\pi$:

$$split: O^\pi \rightharpoonup P^\ast.$$

This maps a split operation $o$ to the unique undirected path in $G^\pi$ from $o$ to the other half of the split operation.

This defines a map $\sigma_1: (G^\pi, split) \mapsto G$, the inverse of the path splitting transformation $G \to G^\pi$.

Contracted path-split graphs #

We can further simplify the structure of the data of a PSG by contracting all operations of $G^\pi$ that are on a single linear path. The result is the contracted path-split graph (cPSG) of $G$, written $c(G^\pi)$.

We employ a similar trick as above to make this transformation reversible, this time by introducing weights on the values of $c(G^\pi)$ that store the string of operations that were contracted²

$$contract: V_C \rightharpoonup (\Gamma^\pi)^\ast$$

where $V_C$ are the values of $c(G^\pi)$ and $\Gamma^\pi$ are the optypes of operations in $G^\pi$, i.e. the optypes of the minIR graph $G$ along with the optypes of the split operations. This defines a second map $\sigma_2: (c(G^\pi), contract) \mapsto G^\pi$ that is the inverse of the path-split graph contraction transformation $c(\cdot)$. In summary, we have the composition $$(c(G^\pi), contract, split) \xrightarrow{\sigma_2 \times Id} (G^\pi, split) \xrightarrow{\sigma_1} G.$$

Contracted PSGs are particularly useful for the study of the asymptotic complexity of the pattern matching algorithm we propose, as they have a very regular structure. This is expressed by the following proposition that further extends the statement of Proposition 4.3:

Proposition 4.4 (Contracted PSG)
Consider a PSG $G^\pi$ of a graph $G$. There is a root operation $r \in O$ such that the rooted dual tree of the contracted PSG $\tau_r(c(G^\pi))$ is a ternary tree with $width(G) - 1$ nodes.

That the tree is ternary follows from Proposition 4.3. Every node of the tree corresponds to an operation in $c(G^\pi)$, which is on exactly two linear paths. As a result of acyclicity of the tree, a tree of $k$ nodes spans $k+1$ linear paths, and hence we conclude $k = width(G) - 1$.

We conclude the construction presented in this section with the following result, expressing graph pattern matching in terms of tree equality:

Proposition 4.5 (Reduction to tree pattern matching)

Let $P$ be a pattern graph and $G$ a graph. Let $P^\pi$ be a PSG of $P$. There is an embedding $P \hookrightarrow G$ if and only if there is $H \subseteq G$ and a PSG $H^{\pi'}$ of $H$ such that

$$\tau(c(P^\pi)) = \tau(c(H^{\pi'}))$$

and the trees have equal weight maps $split$ and $contract$.

The proof of this follows directly from our construction, the uniqueness of trees up to isomorphism and the bijection between the graphs $P, H$ and their cPSGs.

We have thus successfully reduced the problem of pattern matching to the problem of matching on trees. Given that the ordering of children of a node in a tree is fixed, checking trees for equality is a simple matter of checking node and weight equality, one node (and edge) at a time.
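The node-by-node equality check can be illustrated with a minimal sketch. The `TreeNode` representation below is an assumption made for illustration only (it is not the data structure used in this thesis): each node carries a weight, and its children are stored in fixed slots together with the weight of the connecting edge.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Tuple


@dataclass
class TreeNode:
    """Hypothetical node of a rooted tree with a fixed child ordering."""

    weight: Hashable  # node weight, e.g. an optype
    # slot index -> (edge weight, child); slots encode the fixed child order
    children: Dict[int, Tuple[Hashable, "TreeNode"]] = field(default_factory=dict)


def trees_equal(a: TreeNode, b: TreeNode) -> bool:
    """Compare two trees one node (and edge) at a time."""
    if a.weight != b.weight:
        return False
    # Children are identified by their slot, so no matching of children
    # across slots is ever attempted.
    if a.children.keys() != b.children.keys():
        return False
    for slot, (edge_weight, child) in a.children.items():
        other_edge_weight, other_child = b.children[slot]
        if edge_weight != other_edge_weight:
            return False
        if not trees_equal(child, other_child):
            return False
    return True
```

Because children are compared slot by slot, the check visits each node at most once and never has to search over permutations of children.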

We conclude this section with a figure summarising the constructions we have presented.

Figure: A graph $G$, along with the path-split graph $G^\pi$, the contracted path-split graph $c(G^\pi)$ and their rooted dual trees. The anchor operations are $d$ (grey) and $e$ (red). The root of the rooted dual trees is $e$.


  1. It is a simple result from graph theory that such a set of edges always exists – it suffices to remove one edge from every cycle in the graph.

  2. Because all contracted operations apply on a single, shared, linear path, they indeed form a string of operations.

4.4. Canonicalising the tree reduction

The reduction of graph matching to ternary trees from the previous section is a big step towards an algorithm for graph matching. However, Proposition 4.5 is expressed in terms of existence of PSGs – it is as yet unclear how the trees can be constructed. This is the purpose of this section.

We introduce for this purpose a canonical, that is, invariant under isomorphism, choice of PSG $G^\pi$ of $G$. The result is a unique canonical transformation $G \mapsto G^\pi \mapsto c(G^\pi)$ from $G$ to a cPSG that we can use for pattern matching.

We proceed by using the total order that we have defined on port labels, which can be extended lexicographically to paths outgoing from a shared root operation (see section 4.2 for more details). Whenever more than one path from $r$ to $o$ exists in $G$, it suffices to consider the smallest one. For a choice of root operation $r \in O$, we thus obtain a total order of all operations $O$ in $G$.

We then restrict our attention to operations on two linear paths and consider them in order. We keep track of linear paths that have been visited and proceed as follows to determine whether $o \in O$ must be split:

  • if $o$ is on a linear path that was not seen before, it is left unchanged and the set of visited linear paths is updated;
  • otherwise, i.e. $o$ is on two linear paths that have already been visited, the operation is split, resulting in two operations on a single linear path.

The pseudocode CanonicalPathSplit implements this algorithm. We use Operations(G) to retrieve all the operations on the graph G and LinearPaths(G, op) to retrieve the linear paths of the operation op. The linear paths are identified using integer indices that can be pre-computed and stored in linear time in the graph size. SplitOperation(G, op) returns the graph resulting from splitting op into two operations on a single linear path. Finally, PathAsPortLabels(G, root, v) returns the string of the port labels that encode the path from root to v in the graph G. The strings are ordered lexicographically. The non-capitalized functions set, union, sort¹, len, and issubset have their standard meanings.

 1def CanonicalPathSplit(G: Graph, root: Operation) -> Graph:
 2  new_G := G
 3  all_operations := Operations(G)
 4  sorted_operations := sort(
 5      all_operations,
 6      sort_key= lambda v: PathAsPortLabels(G, root, v)
 7  )
 8
 9  # keep track of the visited linear paths
10  seen_paths := set()
11  for op in sorted_operations:
12    # Get the (pre-computed) indices of the linear paths
13    op_linear_paths := LinearPaths(G, op)
14    if len(op_linear_paths) == 2:
15      if issubset(op_linear_paths, seen_paths):
16        # The two linear paths of `op` are already visited
17        new_G = SplitOperation(new_G, op)
18      else:
19        # Mark the new linear paths as visited
20        seen_paths = union(seen_paths, op_linear_paths)
21  return new_G
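The core loop of the pseudocode above can be made concrete with a minimal runnable sketch. It rests on two simplifying assumptions that are not part of the pseudocode: the operations are passed in already sorted by PathAsPortLabels, and instead of building the split graph we only report which operations would be kept (the anchors) and which would be split. The names `canonical_split_set`, `ordered_ops` and `linear_paths` are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Set, Tuple


def canonical_split_set(
    ordered_ops: Sequence[Hashable],
    linear_paths: Dict[Hashable, Tuple[int, ...]],
) -> Tuple[List[Hashable], List[Hashable]]:
    """Return (anchors, split_ops) for operations given in canonical order.

    Assumption: `ordered_ops` is already sorted by the lexicographic order
    of port-label paths from the root, and `linear_paths[op]` holds the
    pre-computed integer indices of the linear paths of `op`.
    """
    seen_paths: Set[int] = set()
    anchors: List[Hashable] = []
    split_ops: List[Hashable] = []
    for op in ordered_ops:
        paths = set(linear_paths[op])
        if len(paths) != 2:
            continue  # only operations on two linear paths are considered
        if paths <= seen_paths:
            split_ops.append(op)  # both linear paths already visited
        else:
            seen_paths |= paths  # mark the new linear paths as visited
            anchors.append(op)
    return anchors, split_ops


# Toy input with three linear paths (0, 1, 2).
ops = ["a", "b", "c", "d", "e"]
paths = {"a": (0, 1), "b": (1, 2), "c": (0, 2), "d": (0,), "e": (0, 1)}
anchors, split_ops = canonical_split_set(ops, paths)
# anchors == ["a", "b"] and split_ops == ["c", "e"]
```

In this toy input the graph has width 3 and two operations are kept unsplit, in line with the $width(G) - 1$ node count of Proposition 4.4.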

The following figure shows an example of splitting a graph into its canonical PSG using CanonicalPathSplit.

Figure: Splitting a graph into its canonical PSG. Ports are ordered counter-clockwise on each edge, and numbered according to the lexicographic order of the paths from root to the port, as returned by PathAsPortLabels. This induces an order on the hyperedges, reflected in the alphabetic order of the edge labels. Linear paths are formed by ports in a horizontal line (as marked by the dotted lines). Vertex root is chosen as the root of the canonical splitting. Vertices d and g are not split because they are the smallest edges that contain the fourth, respectively first linear path.

Proposition 4.7 (Correctness of CanonicalPathSplit)
For a graph $G$, the graph returned by CanonicalPathSplit(G) is a valid PSG of $G$. The procedure is deterministic and invariant under isomorphism of $G$. The runtime of CanonicalPathSplit is $O(|G|)$, where $|G|$ is the number of operations in the graph $G$.

Let $G^\pi$ be the graph returned by CanonicalPathSplit(G). From the discussion in the proof of Proposition 4.6, we know it is sufficient to show that the interaction graph $\mathcal{I}$ of $G^\pi$ is acyclic and connected.

$\mathcal{I}$ is acyclic. If there was a cycle in $\mathcal{I}$, then there would be operations $o_0, \dots, o_{k-1}$ in $G$ that pairwise $(o_i, o_{i+1 \bmod k})$ share a linear path. One of these operations must be considered last in the for loop of lines 11–20; suppose it is $o_{k-1}$. But every linear path of $o_{k-1}$ is either also a linear path of $o_{k-2}$ or a linear path of $o_0$: both linear paths of $o_{k-1}$ are thus already in seen_paths when it is considered, so the condition on line 15 holds and $o_{k-1}$ is split. It therefore cannot be in $\mathcal{I}$, a contradiction. Hence $\mathcal{I}$ is acyclic.

$\mathcal{I}$ is connected. We proceed inductively to show the following invariant for the main for loop (lines 11–20): for all linear paths in seen_paths, there is a path in $\mathcal{I}$ to a linear path of the root operation. seen_paths is only modified on line 20. If op is the root operation, then trivially there is a path from the linear paths op_linear_paths to a linear path of the root operation. Otherwise, we claim that there must be one of the paths in op_linear_paths that is already in seen_paths. From there it follows that there is a path in $\mathcal{I}$ from the root path to the unseen linear path, given by the path to the linear path in seen_paths followed by the edge in $\mathcal{I}$ that corresponds to op.

By connectedness of $G$, there is a path from the root operation to op. The path is not empty because op is not the root operation, so we can consider the prefix of the path of all operations excluding op. Call op' the last operation preceding op and op_linear_paths' its linear paths. Two successive operations on a path must share a linear path: op_linear_paths $\cap$ op_linear_paths' cannot be empty. According to the sort on lines 4–7, op' must have been visited before op, thus op_linear_paths' $\subseteq$ seen_paths. It follows that at least one element of op_linear_paths must be in seen_paths.

Deterministic and isomorphism invariant. The pseudocode above is deterministic and only depends on paths in $G$ encoded as strings of port labels, which are invariant under isomorphism.

Runtime complexity. Lines 2 and 3 run in $O(|G|)$ time. With the exception of the sort function on lines 4–7, every other line can be run in $O(1)$ time:

  • lines 13 and 15 run in constant time because the size of op_linear_paths is always at most 2;
  • line 20 (and the in check on line 15) can be run in constant time by representing the seen_paths set as a fixed-size boolean array of size $w$, with the $i$-th bit indicating whether the $i$-th linear path has been seen;
  • line 17 is a constant time transformation if we allow in-place modification of new_G.

The for loop will run $|G|$ iterations, for a total of $O(|G|)$ runtime. Finally, the sorting operation would naively take time $O(|G| \log |G|)$. However, given that the ordering is obtained lexicographically from the paths starting at the root, we can obtain the sorted list of operations by depth-first traversal of the graph starting at the root. The result follows.

Using CanonicalPathSplit, we can now sketch what the pattern matching algorithm should look like. For each pattern, we first compute its canonical PSG for an arbitrary choice of pattern root operation; then, given a graph $G$, we can find all embeddings of patterns into $G$ by iterating over all possible PSGs within $G$. Naively, this involves enumerating all possible subgraphs of $G$, and then for each of them, iterating over all possible root choices.

This can be significantly sped up by realising that many of the PSGs that are computed when iterating over all possible subgraphs and root choices are redundant². We will see in the next section that we can i) iterate once over all possible root choices $r$ in $G$ and ii) introduce a new procedure AllPathSplits that will efficiently enumerate all possible rooted dual trees of PSGs that are rooted in $r$ for subgraphs within $G$. In the process, we will also see that we can replace the tree equality check of line 12 with a subtree inclusion check, further reducing the number of PSGs that must be considered.

Naive pattern matching.

 1# Precompute all PSGs
 2allT = [CanonicalPathSplit(
 3    P, root_P
 4) for (P, root_P) in patterns]
 5
 6for S in Subgraphs(G):
 7  for root_S in Operations(S):
 8    TG = CanonicalPathSplit(
 9        S, root_S
10    )
11    for T in allT:
12      if T == TG:
13        yield T

Improved using AllPathSplits (section 4.5).

 1# Precompute all PSGs
 2allT = [CanonicalPathSplit(
 3    P, root_P
 4) for (P, root_P) in patterns]
 5
 6for root_G in Operations(G):
 7  for TG in AllPathSplits(
 8      G, root_G
 9  ):
10    for T in allT:
11      # Replace == with subtree
12      if IsSubTree(T, TG):
13        yield T

  1. The sort_key parameter of the sort function defines the total order according to which the elements are sorted, from smallest to largest.

  2. Think for example of the same root operation $r$ that is considered repeatedly for every overlapping subgraph of $G$ that contains $r$.

4.5. Enumerating all path-split graphs

The CanonicalPathSplit procedure in the previous section defines for all graphs $G$ and choice of root operation $r$ a canonical PSG $G^\pi$, and thus a canonical set of anchors $\pi$ that we write as $\pi_r(G) = \pi$.

Instead of CanonicalPathSplit, we can equivalently consider a CanonicalAnchors procedure, which computes $\pi_r(G)$ directly instead of the graph $G^{\pi_r(G)}$.

We formulate this computation below, using recursion instead of a for loop. This form generalises better to the AllAnchors procedure that we will introduce next.

The equivalence of the CanonicalAnchors procedure with CanonicalPathSplit follows from the observation made in section 4.2 that ordering operations in lexicographic order of the port labels is equivalent to a depth-first traversal of the graph.

CanonicalAnchors implements a recursive depth-first traversal (DFS), with the twist that the recursion is explicit only on the anchor nodes and otherwise relies on the lexicographic ordering, just like in CanonicalPathSplit: lines 5–15 of CanonicalAnchors correspond to the iterations of the for loop (lines 11–20) of CanonicalPathSplit until an anchor operation is found (i.e. the else branch on lines 18–20 is executed). From there, the graph traversal proceeds recursively.

We introduce the ConnectedComponent, Neighbours and RemoveOperation procedures; the first returns the connected component of the current operation, whereas the other two procedures are used to traverse, respectively modify, the graph $G$. Importantly, Neighbours(G, op) returns the neighbours of op ordered by port label order.

To ensure that the recursive DFS does not visit the same operation twice, we modify the graph with RemoveOperation on lines 11 and 15, ensuring that no visited operation remains in G. As a consequence, CanonicalAnchors may be called on disconnected graphs, which explains why an additional call to ConnectedComponent (line 4) is required.

Proposition 4.8 (Equivalence of CanonicalPathSplit and CanonicalAnchors)

Let $G$ be a connected graph and let $r$ be a root operation in $G$. Then CanonicalAnchors maps the graph to the canonical anchor set:

$$(G, r, \{\}) \mapsto (\pi_r(G), L, \varnothing),$$

where $L$ is the set of all linear paths in $G$ and $\varnothing$ designates the empty graph.

The proof follows directly from the previous paragraphs.

 1def CanonicalAnchors(
 2    G: Graph, root: Operation, seen_paths: Set[int]
 3) -> (Set[Operation], Set[int], Graph):
 4  operations = Operations(ConnectedComponent(G, root))
 5  # sort by PathAsPortLabels, as previously
 6  sorted_operations := sort(operations)
 7  operations_queue := queue(sorted_operations)
 8
 9  # Skip all operations that are not anchors
10  op := operations_queue.pop() # never empty, contains root
11  G = RemoveOperation(G, op)
12  while len(LinearPaths(G, op)) == 1 or
13        issubset(LinearPaths(G, op), seen_paths):
14    op = operations_queue.pop() or return ({}, {}, G)
15    G = RemoveOperation(G, op)
16
17  # op is anchor, update seen_paths and recurse
18  seen_paths = union(seen_paths, LinearPaths(G, op))
19  anchors := [op]
20  # sort by port labels
21  for child in Neighbours(G, op):
22    (new_anchors, seen_paths, G) = CanonicalAnchors(
23        G, child, seen_paths
24    )
25    anchors += new_anchors
26
27  return (anchors, seen_paths, G)

Maximal PSGs

In addition to “simplifying” the data required to define path splitting, the definition of PSGs using anchor operations has another advantage that is fundamental to the pattern matching algorithm.

Consider the rooted dual tree $\tau_r(G^\pi)$ of a PSG with root operation $r$ in $G^\pi$. Recall that tree nodes are uniquely identified by their path from the root; two trees are thus considered equal if they are isomorphic as graphs. We can in the same way define a tree inclusion relation $\subseteq$ on rooted dual trees that corresponds to checking that the trees have the same root and that the left-hand side is isomorphic to a subtree of the right-hand side. We also require that the operation weights given by the $split$ map coincide on the common subtree.
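As an illustration of the inclusion relation, the following is a minimal sketch under an assumed representation that is not taken from this thesis: a tree is a dictionary with a `weight` and a `children` map from a fixed slot index to a subtree. Both trees are compared starting from their roots, and weights must agree on every node of the smaller tree.

```python
from typing import Any, Dict, Optional

Tree = Dict[str, Any]  # {"weight": ..., "children": {slot: Tree}}


def node(weight: Any, children: Optional[Dict[int, Tree]] = None) -> Tree:
    """Build a tree node (hypothetical helper for this sketch)."""
    return {"weight": weight, "children": dict(children or {})}


def is_subtree(small: Tree, big: Tree) -> bool:
    """Check that `small` is included in `big`, with both roots aligned."""
    if small["weight"] != big["weight"]:
        return False  # weights must coincide on the common subtree
    for slot, child in small["children"].items():
        if slot not in big["children"]:
            return False  # `small` has a child where `big` has none
        if not is_subtree(child, big["children"][slot]):
            return False
    return True
```

Unlike an equality check, children of `big` that have no counterpart in `small` are simply ignored, which is what allows a single large tree to contain many patterns.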

Proposition 4.9 (Maximal PSG)

Let $G$ be a connected graph, $\pi$ a set of operations in $G$ and $r \in \pi$ a root operation. Consider the set $$\mathcal{G}_\pi = \{H \subseteq G \mid \pi_r(H) = \pi \}.$$

There is a subgraph $M \subseteq G$ such that for all subgraphs $H \in \mathcal{G}_\pi$: $H \subseteq M$. Furthermore, for every graph $P$, there is $r'$ and $\pi' = \pi_{r'}(P)$ such that

$$P \simeq H \in \mathcal{G}_\pi \quad\Leftrightarrow\quad \tau_{r'}(P^{\pi'}) \subseteq \tau_r(M^\pi).$$

We call $M^\pi$ the maximal PSG with anchors $\pi$ in $G$.

The proof gives an explicit construction for $M$.

Assume $\mathcal{G}_\pi \neq \varnothing$, otherwise the statement is trivial.

Construction of $M$. Let $L$ be the set of linear paths in $G$ that go through at least one operation in $\pi$. Consider the set of operations $O_L$ in $G$ given by the operations whose linear paths are contained in $L$. This defines a subgraph $G|_{O_L}$ of $G$. Since $\mathcal{G}_\pi \neq \varnothing$, there exists $H \in \mathcal{G}_\pi$. By assumption, $H$ is connected, and thus the anchors $\pi$ of $H$ are connected in $H$. There is therefore a connected component $M \subseteq G|_{O_L}$ that contains the set $\pi$.

Well-definedness of $M^\pi$. Consider the PSG $M^\pi$ of $M$. We must show that $M^\pi$ is a tree-like graph for the proposition statement to be well-defined. In other words, we must show that the interaction graph $\mathcal{I}$ of $M^\pi$ is acyclic and connected. $M$ is connected by construction, which implies connectedness of $M^\pi$ and thus of $\mathcal{I}$. It is acyclic because $width(M) = |\pi| + 1$ and $M$ has exactly $|\pi|$ operations on more than one linear path. $\mathcal{I}$ is thus a tree.

$H \subseteq M$. For any subgraph $H \in \mathcal{G}_\pi$, its operations must be contained in $O_L$. Since any $H \in \mathcal{G}_\pi$ is connected and contains $\pi$, it must further hold that $H \subseteq M$.

We can now prove the $\Leftrightarrow$ equivalence of (2).

$\Leftarrow$: If $\tau_{r'}(P^{\pi'}) \subseteq \tau_r(M^\pi)$, then there exists $H' \subseteq M^\pi$ with rooted dual tree

$$\tau_r(H^\pi) = \tau_r(P^{\pi'}).$$

Furthermore, by definition of $\subseteq$ on rooted trees, a $split$ map is defined on $H'$, given by the $split$ map of $M^\pi$ on the domain $H'$. Recall from section 4.3 that there is a map $\sigma$ that maps $$(H', split) \overset{\sigma}{\longmapsto} H \quad\textrm{and}\quad (M^\pi, split) \overset{\sigma}{\longmapsto} M.$$ It merges split operations pairwise, and thus it is immediate that $H' \subseteq M^\pi$ implies $H \subseteq M$. Thus $H \in \mathcal{G}_\pi$ and $H' = H^\pi$. By construction, one can also derive that $P \simeq H$. The statement follows.

$\Rightarrow$: Since $H \in \mathcal{G}_\pi$, we know from point 1 that $H \subseteq M$. Thus we can define an injective embedding $\varphi: P \to M$.

Operation splitting leaves the set of values from $H$ to $H^\pi$, as well as from $M$ to $M^\pi$, unchanged. Similarly, there is a bijection between values in $H^\pi$ and $M^\pi$ and thus between edges in $\tau_r(H^\pi)$ and $\tau_r(M^\pi)$. The pattern embedding $\varphi$ hence defines an injective map $\phi_E$ from tree edges in $\tau_r(H^\pi)$ to tree edges in $\tau_r(M^\pi)$. We extend this map to a map on the trees $\phi: \tau_r(H^\pi) \to \tau_r(M^\pi)$ by induction over the node set of $\tau_r(H^\pi)$. We start with the root map $\phi(r) = r$. Using $\phi_E$, we can then uniquely define the image of any child node of $r$ in $\tau_r(H^\pi)$, and so forth inductively.

We show now that the map $\phi$ thus defined is injective. Suppose $v, v'$ are nodes in $\tau_r(H^\pi)$ such that $\phi(v) = \phi(v')$. By the inductive construction there are paths from the root $r$ to $v$ and $v'$ respectively such that their images under $\phi_E$ are two paths from $r$ to $\phi(v) = \phi(v')$. But $\tau_r(M^\pi)$ is a tree, so both paths must be equal. By bijectivity of $\phi_E$, it follows $v = v'$, and thus $\phi$ is injective. Finally, the value and operation weights are invariant under pattern embedding and thus are preserved by definition.

This result means that instead of listing all PSGs for every possible subgraph of $G$, it is sufficient to proceed as follows:

  1. for every pattern $P$, fix a root operation $r_P$ and construct the rooted tree dual of the canonical PSG $\tau_P := \tau_{r_P}(P^{\pi_{r_P}(P)})$,
  2. enumerate every possible root operation $r$ in $G$,
  3. enumerate every possible set of anchors $\pi$ in $G$ with root $r$,
  4. for each set $\pi$, find the maximal PSG $M$ with anchors $\pi$ in $G$, and take its rooted tree dual $\tau_M := \tau_r(M^\pi)$,
  5. find all patterns $P$ such that $\tau_P \subseteq \tau_M$.

In other words, if AllAnchors is a procedure that enumerates all possible sets of anchors $\pi$ in $G$ and MaximalPathSplit computes the maximal PSG $M$ as presented in the proof of Proposition 4.9, then AllPathSplits(G) can simply be obtained by calling AllAnchors and then returning their respective maximal PSGs in $G$:

def AllPathSplits(G: Graph, root: Operation) -> Set[Graph]:
  all_anchors = AllAnchors(G, root)
  return {MaximalPathSplit(G, pi) for pi in all_anchors}
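To make the construction of $M$ from the proof of Proposition 4.9 more tangible, here is a minimal sketch of the subgraph selection step only (the subsequent path splitting is omitted). The graph encoding is an assumption for illustration: `linear_paths` maps each operation to the indices of its linear paths and `neighbours` gives the adjacency between operations. The function name `maximal_subgraph_ops` is hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set


def maximal_subgraph_ops(
    linear_paths: Dict[Hashable, Iterable[int]],
    neighbours: Dict[Hashable, Iterable[Hashable]],
    anchors: Iterable[Hashable],
    root: Hashable,
) -> Set[Hashable]:
    """Operations of the maximal subgraph M for a given anchor set.

    Assumption: `root` is one of the anchors.
    """
    # L: linear paths that go through at least one anchor
    anchor_paths: Set[int] = set()
    for anchor in anchors:
        anchor_paths |= set(linear_paths[anchor])
    # O_L: operations whose linear paths are all contained in L
    allowed = {op for op, ps in linear_paths.items() if set(ps) <= anchor_paths}
    # M: connected component of the root within the subgraph induced by O_L
    component = {root}
    queue = deque([root])
    while queue:
        op = queue.popleft()
        for nb in neighbours[op]:
            if nb in allowed and nb not in component:
                component.add(nb)
                queue.append(nb)
    return component
```

If the anchor set is valid, the component reached from the root contains every anchor; the sketch does not verify this and leaves such a check to the caller.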

The missing piece: AllAnchors

We can now complete the definition of AllPathSplits by defining the AllAnchors procedure, which enumerates all possible sets of anchors in $G$ given a root operation $r$.

The procedure is similar to CanonicalAnchors, described in detail in the previous paragraphs. In addition to the arguments of CanonicalAnchors, AllAnchors requires a width $w \geq 1$ argument. It then returns all sets of at most $w$ operations¹ that form the canonical anchors of some width-$w$ subgraph of $G$ with root $r$. The main difference between CanonicalAnchors and AllAnchors is that the successive recursive calls (line 22 in CanonicalAnchors) are replaced by a series of nested loops (lines 27–36 in AllAnchors) that exhaustively iterate over the possible outcomes for different subgraphs of $G$. The results of every possible combination of recursive calls are then collected into a list of anchor sets, which is returned.

The part of the pseudocode that is without comments is unchanged from CanonicalAnchors. Using Proposition 4.3, we know that we can assume that every operation has at most 3 children, and thus 3 neighbours in G, given that the operations equivalent to parent nodes were removed.

 1def AllAnchors(
 2    G: Graph, root: Operation, w: int,
 3    seen_paths: Set[int] = {}
 4) -> List[(Set[Operation], Set[int], Graph)]:
 5  # Base case: return one empty anchor list
 6  if w == 0:
 7    return [({}, {}, G)]
 8
 9  operations = Operations(ConnectedComponent(G, root))
10  sorted_operations := sort(operations)
11  operations_queue := queue(sorted_operations)
12
13  op := operations_queue.pop()
14  G = RemoveOperation(G, op)
15  while len(LinearPaths(G, op)) == 1 or
16        issubset(LinearPaths(G, op), seen_paths):
17    op = operations_queue.pop() or return [({}, {}, G)]
18    G = RemoveOperation(G, op)
19
20  seen0 = union(seen_paths, LinearPaths(G, op))
21  # There are always at most three neighbours: we
22  # unroll the for loop of CanonicalAnchors.
23  [child1, child2, child3] = Neighbours(G, op)
24  # Iterate over all ways to split w-1 anchors over
25  # the three children and solve recursively
26  all_anchors = []
27  for 0 <= w1, w2, w3 < w with w1 + w2 + w3 == w - 1:
28    for (anchors1, seen1, G1) in
29        AllAnchors(G, child1, w1, seen0):
30      for (anchors2, seen2, G2) in
31          AllAnchors(G1, child2, w2, seen1):
32        for (anchors3, seen3, G3) in
33            AllAnchors(G2, child3, w3, seen2):
34          # Concatenate new anchor with anchors from all paths
35          anchors = union([op], anchors1, anchors2, anchors3)
36          all_anchors.push((anchors, seen3, G3))
37  return all_anchors
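The loop header on line 27 is written in mathematical shorthand. One possible way to enumerate the triples it ranges over is sketched below; the generator name `width_splits` is hypothetical and not part of the pseudocode.

```python
from typing import Iterator, Tuple


def width_splits(w: int) -> Iterator[Tuple[int, int, int]]:
    """Yield all (w1, w2, w3) with 0 <= wi < w and w1 + w2 + w3 == w - 1."""
    for w1 in range(w):
        for w2 in range(w - w1):
            # w3 is determined by the other two and is always in [0, w)
            yield (w1, w2, w - 1 - w1 - w2)
```

For `w = 1` the only triple is `(0, 0, 0)`, which is the case used in the base case of the correctness proof; in general there are `w * (w + 1) // 2` triples.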

We can represent the sequence of recursive calls to AllAnchors as a tree. The call tree for the graph used as example to illustrate CanonicalAnchors earlier is given on the next page.

We now show correctness of the procedure. Let us write $\Pi_r^w(G)$ for the set of sets of anchors returned by AllAnchors(G, r, w, {}).

Proposition 4.11 (Correctness of AllAnchors)

Let $G$ be a graph and $H \subseteq G$ be a subgraph of $G$ of width $w$. Let $r$ be a choice of root operation in $H$. We have $\pi_r(H) \in \Pi_r^w(G)$.

Figure: A call tree for an execution of AllAnchors on the example graph of the previous figure with $w = 3$. Starting from the root, each node in the tree corresponds to either picking an operation as anchor or not (thus splitting it). Edges are labelled by the values assigned to $w$ for the respective children of the source node. One path from root to leaf leads to no solution (it is impossible to find an unseen linear path from operation $b$). The other paths each lead to a valid set of three anchors.

The proof is by induction over the width $w$ of the subgraph $H$. The idea is to map every recursive call in CanonicalAnchors to one of the calls to AllAnchors on lines 29, 31 or 33. All recursive results are combined and collected on lines 35–36, and thus the value returned by CanonicalAnchors will be one of the anchor sets in the list returned by AllAnchors.

Let $H \subseteq G$ be a connected subgraph of $G$ of width $w$. We prove inductively over $w$ that if $(X, S', H') =$ CanonicalAnchors$(H, r, S)$, then there is a graph $G'$ with $H' \subseteq G' \subseteq G$ such that

$(X, S', G') \in$ AllAnchors$(G, r, w, S)$

for all valid root operations $r$ of $H$ and all subsets of the linear paths of $H$ in seen_paths. The statement in the proposition directly follows from this claim.

For the base case $w = 1$, CanonicalAnchors will return the anchors anchors = [op] as defined on line 19: there is only one linear path, and it is already in seen_paths, thus for every recursive call to CanonicalAnchors, the while condition on line 12 will always be satisfied until all operations have been exhausted and empty sets are returned. In AllAnchors, on the other hand, the only values of w1, w2 and w3 that satisfy the loop condition on line 27 for $w = 1$ are w1 = w2 = w3 $= 0$. As a result, given the w $= 0$ base case on lines 6–7, lines 35 and 36 of AllAnchors are only executed once, and the definition of anchors on line 35 is equivalent to its definition in CanonicalAnchors.

We now prove the claim for $w > 1$ by induction. As documented in AllAnchors, we can assume that every operation has at most 3 children. This simplifies the loop on lines 21–25 of CanonicalAnchors to, at most, three calls to CanonicalAnchors.

Consider a call to CanonicalAnchors for a graph $H \subseteq G$, a root operation $r$ in $H$ and a set $S$ of linear paths. Let $w_a$, $w_b$ and $w_c$ be the lengths of the values returned by the three recursive calls to CanonicalAnchors of line 22 for the execution of CanonicalAnchors with arguments $H$, $r$ and $S$. Let $c_a, c_b$ and $c_c$ be the three neighbours of $r$ in $H$. If the child $c_x$ does not exist, then one can set $w_x = 0$ and it can be ignored; the argument below still holds in that case. The definition of seen0 on line 20 in AllAnchors coincides with the update to the variable seen_paths on line 18 of CanonicalAnchors; similarly, the updates to G on lines 14 and 18 of AllAnchors are identical to the lines 11 and 15 of CanonicalAnchors that update H. Let the updated seen_paths be the set $S_a$, the updated G be $G_a$ and the updated $H$ be $H_a$, with $H_a \subseteq G_a$.

As every anchor operation reduces the number of unseen linear paths by exactly one (using the simplifying assumptions of section 4.2), it must hold that $w_a + w_b + w_c + 1 = w$. Thus, for a call to AllAnchors with the arguments $G$, $r$, $w$ and $S$, there is an iteration of the for loop on line 27 of AllAnchors such that w1 $= w_a$, w2 $= w_b$ and w3 $= w_c$. It follows that on line 29 of AllAnchors, the procedure is called recursively with arguments $(G_a, c_a, w_a, S_a)$. From the induction hypothesis, we obtain that there is an iteration of the for loop on lines 28–29 in which the values of anchors1 and seen1 coincide with the values of the new_anchors and seen_paths variables after the first iteration of the for loop on line 21 of CanonicalAnchors. Call the value of seen1 (and seen_paths) $S_b$. Similarly, call the updated value of G in AllAnchors $G_b$ and the updated value of G in CanonicalAnchors $H_b$. We have, by the induction hypothesis, that $H_b \subseteq G_b$.

Repeating the argument, we obtain that there are iterations of the for loops on lines 30 and 32 of AllAnchors that correspond to the second and third recursive calls to CanonicalAnchors on line 22 of the procedure. Finally, the concatenation of anchor lists on line 35 of AllAnchors is equivalent to the repeated concatenations on line 25 of CanonicalAnchors, and so we conclude that the induction hypothesis holds for $w$.

We will see that the overall runtime complexity of AllAnchors can be easily derived from a bound on the size of the returned list. For this, we use the following result:

Proposition 4.10 (Number of anchor sets in AllAnchors)
For a graph $G$, a root operation $r$ in $G$ and $1 \leq w \leq width(G)$, the length of the list AllAnchors$(G, r, w)$ is in $O(c^w \cdot w^{-3/2})$, where $c = 6.75$ is a constant.

Let $C_w$ be an upper bound for the length of the list returned by a call to AllAnchors for width $w$. For the base case $w = 0$, $C_0 = 1$. The returned all_anchors list is obtained by pushing anchor lists one by one on line 36. We can count the number of times this line is executed by multiplying the length of the lists returned by the recursive calls on lines 28–32, giving us the recursion relation

$$C_w \leq \sum_{\substack{0 \leq w_1, w_2, w_3 < w\\ w_1 + w_2 + w_3 = w - 1}} C_{w_1} \cdot C_{w_2} \cdot C_{w_3}.$$

Since $C_w$ is meant to be an upper bound, we replace $\leq$ with equality above to obtain a recurrence relation for $C_w$. This recurrence relation is a generalisation of the well-known Catalan numbers (Stanley, 2015: Richard P. Stanley, Catalan Numbers, Cambridge University Press, doi: 10.1017/CBO9781139871495), equivalent to counting the number of ternary trees with $w$ internal nodes: a ternary tree with $w \geq 1$ internal nodes is made of a root along with three subtrees with $w_1, w_2$ and $w_3$ internal nodes respectively, with $w_1 + w_2 + w_3 = w - 1$. A closed form solution to this problem can be found in (Aval, 2008: Jean-Christophe Aval, Multivariate Fuss–Catalan numbers, Discrete Mathematics 308, 20 (October 2008), 4660–4669, doi: 10.1016/j.disc.2007.08.100):

$$C_w = \frac{{3w \choose w}}{2w + 1} = \Theta\left(\frac{c^w}{w^{3/2}}\right)$$

satisfying the above recurrence relation with equality, where $c = 27/4 = 6.75$ is a constant obtained from the Stirling approximation:

$$\begin{aligned}{3w \choose w} = \frac{(3w)!}{(2w)!\,w!} &= \Theta\left(\frac{1}{\sqrt{w}}\right) \Big(\frac{(3w)^3}{e^3}\Big)^{w}\Big(\frac{e^2}{(2w)^2}\Big)^{w}\Big(\frac{e}{w}\Big)^{w}\\ &= \Theta\left(\frac{(27/4)^w}{w^{1/2}}\right).\end{aligned}$$
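As a sanity check of the closed form, the recurrence (taken with equality) and the formula $C_w = {3w \choose w} / (2w + 1)$ can be compared numerically for small $w$. The sketch below uses only the Python standard library; the function names are ours.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_by_recurrence(w: int) -> int:
    """C_w from the recurrence, taken with equality and C_0 = 1."""
    if w == 0:
        return 1
    total = 0
    for w1 in range(w):
        for w2 in range(w - w1):
            w3 = w - 1 - w1 - w2
            total += (
                count_by_recurrence(w1)
                * count_by_recurrence(w2)
                * count_by_recurrence(w3)
            )
    return total


def count_closed_form(w: int) -> int:
    """C_w = binom(3w, w) / (2w + 1); the division is exact."""
    return comb(3 * w, w) // (2 * w + 1)


# The two definitions agree on the first few values: 1, 1, 3, 12, 55, 273, ...
for w in range(8):
    assert count_by_recurrence(w) == count_closed_form(w)
```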

To obtain a runtime bound for AllAnchors, it is useful to identify how much of $G$ needs to be traversed. If we suppose all patterns have at most depth $d$, then it immediately follows that any operation in $G$ that is in the image of a pattern embedding must be at most a distance $d$ away from an anchor operation. We can thus equivalently call AllAnchors on a subgraph of $G$ such that no linear path is longer than $2d$. We thus obtain the following runtime.

Proposition 4.12 (Runtime of AllAnchors)

For patterns with at most width $w$ and depth $d$, the total runtime of AllAnchors is in $O\left(\frac{c^w \cdot d}{w^{1/2}}\right)$.

We restrict Operations on line 9 to only return the first $d$ operations on the linear path in each direction, starting at the anchor operation: operations more than distance $d$ away from the anchor cannot be part of a pattern of depth $d$.

We use the bound on the length of the list returned by calls to AllAnchors of Proposition 4.10 to bound the runtime. We can ignore the non-constant runtime of the concatenation of the outputs of recursive calls on line 35, as the total size of the outputs is asymptotically at worst of the same complexity as the runtime of the recursive calls themselves. Excluding the recursive calls, the only remaining lines of AllAnchors that are not executed in constant time are the while loop on lines 15–18 and the Operations and sort calls on lines 9–11. Using the same argument as in CanonicalAnchors, we can ignore the latter two calls by replacing the queue of operations by a lazy iterator of operations. The next operation given op and the graph G can always be computed in $O(1)$ time using a depth-first traversal of G.

Consider the recursion tree of AllAnchors, i.e. the tree in which the nodes are the recursive calls to AllAnchors and the children are the executions spawned by the nested for loops on lines 28–32. This tree has at most

$$C_w = \Theta\left(\frac{c^w}{w^{3/2}}\right)$$

leaves. A path from the root to a leaf corresponds to a stack of recursive calls to AllAnchors. Along this recursion path, the seen_paths set is always strictly growing (line 35) and the operations removed from G on lines 14 and 18 are all distinct. For each linear path, at most $2d$ operations are traversed. Thus the total runtime of the while loop (lines 15–18) along a path from root to leaf in the recursion tree is in $O(w \cdot d)$. We can thus bound the overall complexity of executing the entire recursion tree by $O(C_w \cdot w \cdot d) = O\left(\frac{c^w \cdot d}{w^{1/2}}\right)$.


  1. Every anchor operation is on at least one previously unseen linear path, thus there can be at most $w$ operations in the set of anchors.

4.6. An automaton for multi-pattern matching

We have shown in the previous sections that graph pattern-matching can be reduced to a problem of tree inclusions, with trees of fixed width $w$. To complete the pattern-matching algorithm, we must provide a fast way to evaluate the subtree relation for many trees representing the set of all patterns we wish to match.

More precisely, for patterns $P_1, \dots, P_\ell$ with width $w$, fix a root operation $r_i$ in $P_i$ for each $1 \leqslant i \leqslant \ell$ and consider the rooted tree duals of the canonical PSGs $\tau_{r_i}(P_i^{\pi_i})$, with $\pi_i = \pi_{r_i}(P_i)$ the canonical anchors. Then given a subject graph $G$, we wish to compute the set

$$\{1 \leqslant i \leqslant \ell \mid \tau_{r_i}(P_i^{\pi_i}) \subseteq \tau_r(G^\pi)\},$$

for all anchor sets $\pi \in \Pi_r^w(G)$ and root operations $r$ in $G$. This corresponds to the IsSubTree predicate introduced in the sketch of the algorithm in section 4.4.

Instead of considering the trees of PSGs, it will prove easier to consider the contracted PSGs (cPSGs)

$$\tau_{r_i}(c(P_i^{\pi_i}))\quad\textrm{and}\quad \tau_r(c(G^\pi)).$$

Such tree inclusions are equivalent to finding embeddings in the subject graph itself, provided that we keep track of the $split$ and $contract$ weight maps (see section 4.3).

It will be useful to remind ourselves of the following properties of contracted PSGs. Every operation of a cPSG (and thus every node in its rooted dual tree) is an anchor operation of the PSG. Per Proposition 4.4, the rooted dual tree of a cPSG is a ternary tree and has exactly $width(G) - 1$ nodes. Finally, recall the concept of an open value of a graph, i.e. a value that is missing either a use or define operation (see section 4.2).

Reduction of tree inclusion to string prefix matching #

Now consider two contracted spanning tree reductions $c(G_1^{\pi_1})$ and $c(G_2^{\pi_2})$ with values $V_1$ and $V_2$. To simplify notation, define

$$\tau_1 = \tau_{r_1}(c(G_1^{\pi_1})) \quad\textrm{and}\quad \tau_2 = \tau_{r_2}(c(G_2^{\pi_2}))$$

for some choice of root operations $r_1$ and $r_2$ in $G_1$ and $G_2$, respectively. We lift the $\subseteq$ relation on rooted dual trees of PSGs introduced in section 4.5 to rooted dual trees of cPSGs in such a way that there is an inclusion relation between two rooted dual trees of PSGs if and only if the same relation holds on the rooted duals of cPSGs.

We say that $\tau_1 \subseteq \tau_2$ if and only if

  • the trees share the same root operation,
  • $\tau_1$ is a subtree of $\tau_2$,
  • the $split$ map coincides on the common subtree, and
  • the $contract$ map satisfies for all $v \in V_1$: $$\begin{cases}contract(v) \subseteq contract(f(v))\quad&\textrm{if }v\textrm{ is an open value},\\contract(v) = contract(f(v))\quad&\textrm{otherwise},\end{cases}$$

where $f: V_1 \hookrightarrow V_2$ designates the embedding of $V_1$ into $V_2$ given by the tree embedding.

The first three conditions are taken as-is from the $\subseteq$ relation on non-contracted trees, whilst the fourth condition on the $contract$ map is specific to contracted trees.

Using Proposition 4.2, there are at most 2 open values for each linear path in the graph, and thus at most $2 \cdot w$ open values in a rooted dual tree of a cPSG of width $w$. For each such contracted rooted dual, we can thus define a contracted string tuple $S = (s_1, \dots, s_{2w}) \in (O^\ast)^{2w}$ given by the values of the $contract$ map evaluated in the (up to) $2w$ open values¹.

If $contract|_C$ is the restriction of $contract$ to the non-open values of a cPSG, the fourth condition for the inclusion relation $\subseteq$ on rooted dual cPSGs given above becomes an equality condition when restricted to non-open values. A special case of this property of particular interest to us is stated as the following result. The $\subseteq$ relation on strings refers to prefix inclusion, i.e. $s \subseteq t$ if and only if $s$ is a prefix of $t$.

Proposition 4.14 (Inclusion of equal-width trees)

Let $S = (s_1, \dots, s_{2w})$ and $T = (t_1, \dots, t_{2w}) \in (O^\ast)^{2w}$ be the contracted string tuples of $\tau_1$ and $\tau_2$ respectively. Then $\tau_1 \subseteq \tau_2$ if and only if the trees share the same root, are isomorphic, have the same $split$ and $contract|_C$ maps and for all $i \in \{1, \dots, 2w\}$: $s_i \subseteq t_i$.

The proof of this follows directly from observing that rooted duals of cPSGs have the same set of nodes and that the restriction to non-open values $contract|_C$ must satisfy equality.

Why restrict ourselves to trees of the same width $w$? It is sufficient for our purposes: all patterns are of width $w$ by assumption and so are the rooted dual trees of the form $\tau_r(G^\pi)$, given that $\pi \in \Pi_r^w(G)$.

The string prefix matching problem is a simple computational task that can be generalised to check for multiple string patterns at the same time using a prefix tree. An overview of this problem can be found in appendix A. We can thus obtain a solution for the pattern matching problem for $\ell$ patterns:

Proposition 4.15 (Fixed anchor pattern matching)

As above, let

  • $G$ be a graph, with $\pi \in \Pi_r^w(G)$ a set of $w - 1$ operations and $r \in \pi$ a choice of root operation,
  • $P_1, \dots, P_\ell$ be patterns of width $w$ and depth $d$, with choices of root operations $r_1, \dots, r_\ell$ and canonical anchors $\pi_i = \pi_{r_i}(P_i)$.

The set of all pattern embeddings mapping the canonical anchor set $\pi_i$ to $\pi$ and root $r_i$ to $r$ for $1 \leq i \leq \ell$ can be computed in time $O(w \cdot d)$ using at most $\ell$ pre-computed prefix trees of size at most $(\ell \cdot d)^w + 1$, each constructed in time complexity $O((\ell \cdot d)^w)$.

For each pattern, we consider its canonical spanning tree reduction and construct a multi-dimensional prefix tree (see the appendix) for each group of patterns that share the same spanning tree reduction.

Given a graph $G$, we can compute the cPSG of $G$ for anchors $\pi$ and map its rooted dual tree to the corresponding prefix tree. This can be done in $O(|T_G|)$ time by using a search tree. We can restrict $G^\pi$ to a graph of size $O(w \cdot d)$ by truncating the linear paths to at most $2d$ length, as in the proof of Proposition 4.12. Thus we can assume $|G^\pi| \in O(w \cdot d)$.

The rest of the proof and the runtime follow from the multi-dimensional prefix tree construction detailed in the appendix.
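The prefix condition $s_i \subseteq t_i$ of Proposition 4.14 can be illustrated with ordinary prefix trees. The sketch below is a simplified stand-in and does not reproduce the multi-dimensional prefix tree of the appendix: it keeps one tree per tuple position and intersects the per-position results, so unlike the construction used in the proof its cost still depends on how many patterns match in each position. The class `PrefixTree`, the functions `build_trees` and `match`, and the use of single characters as operations are all assumptions made for illustration.

```python
class PrefixTree:
    """Prefix tree storing, at each node, the ids of the patterns ending there."""

    def __init__(self):
        self.children = {}
        self.pattern_ids = []

    def insert(self, string, pattern_id):
        node = self
        for symbol in string:
            node = node.children.setdefault(symbol, PrefixTree())
        node.pattern_ids.append(pattern_id)

    def prefixes_of(self, text):
        """Ids of all stored strings that are prefixes of `text`."""
        node, found = self, list(self.pattern_ids)
        for symbol in text:
            node = node.children.get(symbol)
            if node is None:
                break
            found.extend(node.pattern_ids)
        return found


def build_trees(pattern_tuples):
    """Pre-computation: one prefix tree per tuple position."""
    width = len(pattern_tuples[0])
    trees = [PrefixTree() for _ in range(width)]
    for pattern_id, pattern in enumerate(pattern_tuples):
        for position, string in enumerate(pattern):
            trees[position].insert(string, pattern_id)
    return trees


def match(trees, num_patterns, subject_tuple):
    """Ids of patterns whose every component is a prefix of the subject's."""
    matches = set(range(num_patterns))
    for tree, text in zip(trees, subject_tuple):
        matches &= set(tree.prefixes_of(text))
    return sorted(matches)


# Characters stand in for operations; each tuple has one string per open value.
patterns = [("HT", "X"), ("H", ""), ("T", "X")]
trees = build_trees(patterns)
print(match(trees, len(patterns), ("HTH", "XX")))
```

For the subject tuple `("HTH", "XX")`, patterns 0 and 1 are reported: both of their components are prefixes of the corresponding subject strings, whereas the first component of pattern 2 is not.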

Combining everything #

Finally, putting Proposition 4.15 and Proposition 4.12 together, we obtain our main result.

Proposition 4.13 (Pattern matching)

Let $P_1, \dots, P_\ell$ be patterns with width $w$ and depth $d$. The pre-computation runs in time and space complexity

$$O\left((d \cdot \ell)^w \cdot \ell + \ell \cdot w \cdot d\right).$$

For any subject graph $G$, the pre-computed prefix tree can be used to find all pattern embeddings $P_i \to G$ in time

$$O\left(|G| \cdot \frac{c^w}{w^{1/2}} \cdot d\right)$$

where $c = 6.75$ is a constant.

The pre-computation consists of running the CanonicalAnchors procedure on each of the $\ell$ patterns and then transforming them into a map of prefix trees using Proposition 4.15. By Proposition 4.7, CanonicalAnchors runs in $O(w \cdot d)$ for each pattern, where we used that $|P_i| \leqslant w \cdot d$ for all patterns. The total runtime of prefix construction is thus

$$O\left((d \cdot \ell)^w \cdot \ell + \ell \cdot w \cdot d\right).$$

The complexity of pattern matching itself on the other hand is composed of two parts: the computation of all possible anchor sets $\Pi_r^w(G)$, and the execution of the prefix string matcher for each of the trees resulting from these sets $\pi \in \Pi_r^w(G)$. As AllAnchors must be run for every choice of root vertex $r$ in $G$, the runtime is thus obtained by multiplying i) $|G|$ with ii) the runtime of the prefix tree matching (Proposition 4.15), and with iii) $|\Pi_r^w(G)|$, i.e. the number of anchor lists returned by AllAnchors (Proposition 4.10):

$$O(|G| \cdot w \cdot d \cdot C_w),$$

where $C_w$ is the bound for the number of anchor lists returned by AllAnchors. The result follows.


  1. The values can be ordered as usual by using the total lexicographic order on port labels of the tree.

4.7. Benchmarks

Proposition 4.13 shows that pattern-independent matching can scale to large datasets of patterns but imposes some restrictions on the patterns and embeddings that can be matched. In this section, we discuss these limitations and give empirical evidence that the pattern-matching approach we have presented can be used on a large scale and outperform existing solutions.

Pattern limitations #

In section 4.2, we imposed conditions on the pattern embeddings to obtain a complexity bound for pattern-independent matching. We argued how these restrictions are natural for applications in quantum computing, and most of the arguments will also hold for a much broader class of computation graphs.

In future work, it would nonetheless be of theoretical interest to explore the importance of these assumptions and their impact on the complexity of the problem. As a first step towards a generalisation, our implementation and all our benchmarks in this section do not make any of these simplifying assumptions. Our results below give empirical evidence that a significant performance advantage can be obtained regardless.

Implementation #

We provide an open-source implementation in Rust of pattern independent matching using the results of this chapter, available on GitHub. The code and datasets used for the benchmarks themselves are available in a dedicated repository.

The implementation works for weighted or unweighted port graphs – of which typed minIR graphs are a special case – and makes none of the simplifying assumptions employed in the theoretical analysis. Pattern matching proceeds in two phases: precomputation and runtime.

Precomputation.  In a first step, all graph patterns are processed and compiled into a single state automaton that will be used at runtime for fast pattern independent matching. The automaton in the implementation combines in one data structure two distinct computations of this chapter:

  • the recursive branching logic used in the AllAnchors procedure to enumerate all possible choices of anchors, and
  • the automaton described in section 4.6 that matches patterns for a fixed set of anchors.

The former is implemented with non-deterministic state transitions (each transition corresponding to choosing an additional anchor), whereas the latter is implemented deterministically.

Concretely, the automaton is constructed by following the construction of section 4.4 to decompose each pattern into its canonical path-split graph. We then order the nodes of the PSG and express each node as a condition that ensures the connectivity and node weight in the graph matches the pattern. We thus obtain a chain of conditions, with a transition between any two consecutive conditions; transitions are deterministic by default and marked as non-deterministic whenever they lead to a condition on an anchor node. The state automaton for all patterns is then obtained by joining all chains of conditions into a tree.

Runtime.  Pattern matching is then as simple as simulating the state automaton, evaluating all conditions on the graph $G$ passed as input. The states in the automaton corresponding to the last condition of a pattern are marked as end states, along with a label identifying the pattern that was matched. These labels are then used at runtime to report all patterns found.
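The construction and simulation just described can be sketched on a toy example. This is a hedged illustration and not the data structure of the actual implementation: conditions are encoded as hypothetical (port path, weight) pairs evaluated relative to a root node, the subject graph is a pair of dictionaries, and the choice of anchors (and hence any non-deterministic transition) is left out entirely. All names (`resolve`, `holds`, `build_automaton`, `run`) are assumptions.

```python
def resolve(subject, root, path):
    """Follow a sequence of ports from `root`; None if a link is missing."""
    node = root
    for port in path:
        node = subject["links"].get((node, port))
        if node is None:
            return None
    return node


def holds(condition, subject, root):
    """A condition checks connectivity (the path exists) and node weight."""
    path, weight = condition
    node = resolve(subject, root, path)
    return node is not None and subject["weights"][node] == weight


def build_automaton(patterns):
    """Join the chains of conditions of all patterns into a tree of states."""
    start = {"transitions": {}, "matched": []}
    for label, chain in patterns.items():
        state = start
        for condition in chain:
            state = state["transitions"].setdefault(
                condition, {"transitions": {}, "matched": []})
        state["matched"].append(label)  # end state for this pattern
    return start


def run(automaton, subject, root):
    """Simulate the automaton, following every transition whose condition holds."""
    found, stack = [], [automaton]
    while stack:
        state = stack.pop()
        found.extend(state["matched"])
        for condition, next_state in state["transitions"].items():
            if holds(condition, subject, root):
                stack.append(next_state)
    return sorted(found)


subject = {
    "weights": {0: "H", 1: "CX", 2: "T"},
    "links": {(0, "out"): 1, (1, "out0"): 2},
}
patterns = {
    "H-CX": [((), "H"), (("out",), "CX")],
    "H-CX-T": [((), "H"), (("out",), "CX"), (("out", "out0"), "T")],
    "H-T": [((), "H"), (("out",), "T")],
}
print(run(build_automaton(patterns), subject, root=0))
```

Conditions shared by several patterns, such as the check that the root carries weight `H`, label a single transition in the tree, so each such transition is followed at most once per run regardless of how many patterns start with it.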

Our implementation has been tested for correctness, i.e. on the one hand that all matches that are reported are correct, and on the other hand that all pattern matches are found. This was done by comparing the matches of our implementation with the results obtained from matching every pattern separately on millions of randomly generated graphs and edge cases. We also ensured during benchmarking that the number of matches reported by our implementation and by Quartz were always the same.

Benchmarks #

Baseline.  To assess practical use, we have benchmarked our implementation against a leading C++ implementation of pattern matching for quantum circuits from the Quartz superoptimiser project (Xu, 2022: Mingkuan Xu, Zikun Li, Oded Padon, Sina Lin, Jessica Pointing, Auguste Hirth, Henry Ma, Jens Palsberg, Alex Aiken, Umut A. Acar and Zhihao Jia. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433). This implementation is the principal component of an end-to-end quantum circuit optimisation pipeline. The results and speedups we obtain here thus transfer directly to this application.

Dataset.  We further ensure that our results apply in practice by using a real-world dataset of patterns. The Quartz optimiser finds opportunities for circuit optimisation by relying on precomputed equivalence classes of circuits (ECC). These are obtained exhaustively by enumerating all possible small quantum circuits, computing their unitaries and clustering them into classes of circuits with identical unitaries.

The generation of ECC sets is parametrised on the number of qubits, the maximum number of gates and the gate set in use. For these benchmarks we chose the minimal set of gates $T, H, CX$ and considered circuits with up to 6 gates and 2, 3 or 4 qubits. The size of these pattern circuits is typical for the application¹.

Thus, for our patterns, we have the bound $d \leq 6$ for the maximum depth and width $w = 2, 3, 4$. In all experiments, the graph $G$ subject to pattern matching was the barenco_tof_10 circuit, i.e. a 19-qubit circuit with 674 gates obtained by decomposing a 10-qubit Toffoli gate using the Barenco decomposition (Barenco, 1995: Adriano Barenco, Charles H. Bennett, Richard Cleve, David P. DiVincenzo, Norman Margolus, Peter Shor, Tycho Sleator, John A. Smolin and Harald Weinfurter. 1995. Elementary gates for quantum computation. Physical Review A 52, 5 (November 1995, 3457--3467). doi: 10.1103/PhysRevA.52.3457).

Results.  We study the runtime of our implementation as a function of the number $\ell$ of patterns being matched, up to $\ell = 10^4$ patterns. We expect the runtime of pattern matching algorithms that match one pattern at a time to scale linearly with $\ell$. On the other hand, Proposition 4.13 results in a complexity that is independent of $\ell$.

For each value of $\ell$, we select a subset of all patterns in the ECC sets at random. For $w = 2$, there are only a total of $\ell = 1954$ patterns, explaining why we do not report results beyond that number. For $\ell = 200$ patterns, our proposed algorithm is $3\times$ faster than Quartz. As expected, the advantage of our approach increases as we match more patterns, scaling up to a $20\times$ speedup for $\ell = 10^4$. The results are summarised in the following figure:

Figure: Runtime of pattern matching for $\ell = 0 \dots 10^4$ patterns on 2, 3 and 4 qubit quantum circuits from the Quartz ECC dataset, for our implementation (Portmatching) and the Quartz project. All $\ell = 1954$ two-qubit circuits were used, whereas for 3 and 4 qubit circuits, $\ell = 10^4$ random samples were drawn.

Dependency on $w$ and $\ell$.  We further study the runtime of our algorithm as a function of its two main parameters, the number of patterns $\ell$ and the pattern width $w$, on an expanded dataset. To this end, we generate random sets of 10,000 pattern circuits with 15 gates and between $w = 2$ and $w = 10$ qubits, using the same gate set as previously. The resulting pattern matching runtimes are shown in the figure below.

From Proposition 4.13, we expect that the pattern matching runtime is upper bounded by an $\ell$-independent constant. Our results support this prediction for $w = 2$ and $w = 3$ qubit patterns, where runtime indeed seems to saturate, reaching an observable plateau at large $\ell$.

We suspect on the other hand that the exponential $c^w$ dependency in the complexity bound of Proposition 4.13 makes it difficult to observe a similar plateau for $w \geq 4$, as we expect this upper bound on the runtime to increase rapidly with qubit count. A runtime ceiling is not directly observable at this experiment size, but the gradual decrease in the slope of the curve is consistent with the existence of the $\ell$-independent upper bound predicted in Proposition 4.13.

Figure: Runtime of our pattern matching for random quantum circuits with up to 10 qubits.


  1. Such small circuit sizes are imposed in part by the fact that ECCs of larger circuits quickly become infeasible to generate as their number grows exponentially. Large circuit transformations can often be expressed as the composition of smaller atomic transformations, hence the good performance of this approach in practice.


Chapter 5

Fully and Confluently Persistent Graph Rewriting

This chapter leverages another construction from graph rewriting theory that finds a direct application in quantum compilation: the unfolding of graph transformation systems (Baldan, 1999: Paolo Baldan, Andrea Corradini and Ugo Montanari. 1999. Unfolding and Event Structure Semantics for Graph Grammars. In Foundations of Software Science and Computation Structures, Berlin, Heidelberg. Springer Berlin Heidelberg, 73--89. doi: 10.1007/3-540-49019-1_6; Winskel, 1987: Glynn Winskel. 1987. Event structures). Whereas most applications to date have focused on using unfolding for model verification (see section 5.1 for a review), we will use the same techniques to instead speed up optimisation problems over the space of reachable graphs in a GTS.

In the unfolding, graph rewrites are expressed as persistent modifications of the graph. Mutable data structures are typically ephemeral: modifying the data structure overwrites information and invalidates any references to the old data. In contrast, a persistent data structure applies changes to the data so that both the old and new versions remain accessible – a famous example of this are version control systems such as git.

A data structure is fully persistent if modifications can be applied not only to the latest version but also to previous versions of the data structure. In that case, a version of the data may be used to create several new versions. Instead of a linear edit history of all mutations, the result is an edit history tree, with possibly many “most recent” versions – leaves in the edit history.

Finally, a fully persistent data structure is also confluently persistent if different versions of the data in the edit history can be joined together. As a result, the edit history forms a directed acyclic graph (DAG) of versions of the data, linked by data mutation and joining operations. Adopting terminology from git, we call a join of two or more versions a merge of multiple versions.
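The three notions can be made concrete with a small version store. The sketch below is an illustration under simplifying assumptions and not the data structure developed in this chapter: the stored data is a plain set, every version is kept as an immutable `frozenset`, and a merge is taken to be the union of the merged versions (so an element removed on one branch reappears if another branch kept it). The class name `History` and its methods are hypothetical.

```python
class History:
    """Versions of a set, linked into a DAG by update and merge operations."""

    def __init__(self, initial):
        self.data = [frozenset(initial)]  # data[i] is the content of version i
        self.parents = [()]               # parents[i] are the versions it came from

    def _new_version(self, data, parents):
        self.data.append(frozenset(data))
        self.parents.append(tuple(parents))
        return len(self.data) - 1

    def update(self, version, add=(), remove=()):
        """Persistent update: `version` stays accessible and unchanged.
        Full persistence: `version` may be any version, not only the latest."""
        data = (self.data[version] - set(remove)) | set(add)
        return self._new_version(data, [version])

    def merge(self, *versions):
        """Confluent persistence: join several versions into a new one."""
        data = frozenset().union(*(self.data[v] for v in versions))
        return self._new_version(data, versions)


history = History({"a"})
v1 = history.update(0, add={"b"})
v2 = history.update(0, add={"c"})  # a second branch from the same old version
v3 = history.merge(v1, v2)         # the edit history is now a DAG
print(sorted(history.data[0]), sorted(history.data[v3]), history.parents[v3])
```

Storing a full copy per version is the simplest way to obtain persistence and is only meant to show the shape of the edit history; it gives none of the sharing between versions that makes persistent data structures efficient.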

In this chapter, we will consider all graphs to be hypergraphs $(V, E)$ with vertex set $V$ and hyperedge set $E \subseteq V^\ast$. All results can easily be adapted to accommodate graph attributes, weights, and types as required by applications. This means the data structure and algorithms we present apply directly to minIR graphs and, more broadly, to most instances of graph rewriting.

The central object of study in this chapter is the graph rewrite. We restate a simplified version of Definition 3.9 here. For convenience, we opt for a rewrite definition that omits the edge deletion set $E^-$ of Definition 3.9. This is not a restriction of the general case, as can be seen by adding a “dummy” vertex $v_e$ for each edge $e$ in a graph: a rewrite that removes an edge $e \in E^-$ can equivalently be expressed by the rewrite that removes the dummy vertex $v_e \in V^-$¹.

As in previous chapters, $\sqcup$ denotes disjoint union, $f: A \rightharpoonup B$ denotes a partial function and $dom$ denotes the domain of definition of a (partial) function.

Definition 5.4 (Graph rewrite)

A rewrite $r$ on a graph $G = (V, E)$ is given by a tuple $r = (G_R, V^-, \mu)$, where

  • $G_R = (V_R, E_R)$ is a graph called the replacement graph,
  • $V^- \subseteq V$ is the vertex deletion set, and
  • $\mu: V^- \rightharpoonup V_R$ is the glueing relation, a partial function that maps a subset of the deleted vertices of $G$ to vertices in the replacement graph.

Define the context subgraph $G_C$ as the subgraph induced by the vertices $V_C = (V \smallsetminus V^-) \cup dom(\mu)$.

The rewritten graph resulting from applying $r$ to $G$ is the glueing $r(G) = (G_C \sqcup G_R) / \sim_\mu$, obtained from the union of $G_C$ and $G_R$ by merging all vertices within the same class in the equivalence relation $\sim_\mu$ that is the closure of $\mu$. We refer to section 3.5 for more details and an illustration of glueings and rewrites.

In this chapter, we will consider sequences of multiple rewrites. We will use the notation $V(G)$ and $E(G)$ to designate the vertices and the edges, respectively, of a graph $G$. It is further assumed that the vertices $V(G)$ and $V(G')$ for $G \neq G'$ are always disjoint, a fact that we underline by always writing unions of graphs and vertices with $\sqcup$.

We make use of the fact that for every rewrite $r = (G_R, V^-, \mu)$, the equivalence classes $\alpha$ of $\sim_\mu$ are of the form

$$\alpha = \{ m \} \sqcup \{ v \in V^- \mid \mu(v) = m \},$$

for some $m \in V(G_R)$. For every set of merged vertices $\alpha \subseteq V(G)$ in $r(G)$, there is thus a unique vertex not in $V^-$:

$$\alpha \smallsetminus V^- = \{ m \}.$$

We choose to always identify the merged vertex in $r(G)$ with $m$. Using this convention, the set of vertices of $r(G)$ is simply

$$V(r(G)) = (V(G) \smallsetminus V^-) \sqcup V(G_R).$$
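Definition 5.4 together with the convention of identifying each merged vertex with $m$ translates into a few lines of code. The following is a minimal sketch under assumptions of our own: a hypergraph is represented as a pair of a vertex set and a list of vertex tuples, the glueing relation $\mu$ is a dictionary, and the function name `apply_rewrite` is hypothetical.

```python
def apply_rewrite(graph, rewrite):
    """Apply r = (G_R, V^-, mu) to G = (V, E) and return r(G).

    graph:   (set of vertices, list of hyperedges as tuples of vertices)
    rewrite: (replacement graph, deletion set, dict mu from deleted
              vertices to vertices of the replacement graph)
    """
    vertices, edges = graph
    (repl_vertices, repl_edges), deleted, glue = rewrite

    assert deleted <= vertices               # V^- is a subset of V
    assert set(glue) <= deleted              # dom(mu) is a subset of V^-
    assert set(glue.values()) <= repl_vertices
    assert not (vertices & repl_vertices)    # vertex sets are disjoint

    # Context subgraph G_C: induced by V_C = (V \ V^-) united with dom(mu).
    context = (vertices - deleted) | set(glue)
    context_edges = [e for e in edges if all(v in context for v in e)]

    # Glueing: every v in dom(mu) is identified with m = mu(v).
    glued_edges = [tuple(glue.get(v, v) for v in e) for e in context_edges]

    new_vertices = (vertices - deleted) | repl_vertices
    return new_vertices, glued_edges + list(repl_edges)


graph = ({"a", "b", "c"}, [("a", "b"), ("b", "c")])
replacement = ({"x", "y"}, [("x", "y")])
# Delete b and c; b is glued to x, c is removed without replacement.
rewrite = (replacement, {"b", "c"}, {"b": "x"})
new_vertices, new_edges = apply_rewrite(graph, rewrite)
print(sorted(new_vertices), new_edges)
```

In this example the edge `("b", "c")` disappears because `c` is deleted and not glued, which is the implicit edge deletion that the definition permits, while `("a", "b")` survives as `("a", "x")`.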


  1. This makes use of the fact that unlike in DPO, our rewrite definition allows the (implicit) deletion of edges with one endvertex in $V^-$.

5.1. Related work

The unfolding of a graph transformation system (GTS) was first proposed in Baldan, 1999Paolo Baldan, Andrea Corradini and Ugo Montanari. 1999. Unfolding and Event Structure Semantics for Graph Grammars. In Foundations of Software Science and Computation Structures, Berlin, Heidelberg. Springer Berlin Heidelberg, 73--89. doi: 10.1007/3-540-49019-1_6 as a generalisation of a well-known construction on Petri nets Winskel, 1987Glynn Winskel. 1987. Event structures. Originally defined for DPO rewriting, the unfolding was later generalised to SPO Baldan, 2007Paolo Baldan, Andrea Corradini, Ugo Montanari and Leila Ribeiro. 2007. Unfolding semantics of graph transformation. Information and Computation 205, 5 (May 2007, 733--782). doi: 10.1016/j.ic.2006.11.004 Baldan, 2014Paolo Baldan, Andrea Corradini, Tobias Heindel, Barbara König and Paweł Sobociński. 2014. Processes and unfoldings: concurrent computations in adhesive categories. Mathematical Structures in Computer Science 24, 4 (June 2014). doi: 10.1017/s096012951200031x and SqPO Behr, 2019Nicolas Behr. 2019. Sesqui-Pushout Rewriting: Concurrency, Associativity and Rule Algebra Framework. Electronic Proceedings in Theoretical Computer Science 309 (December 2019, 23--52). doi: 10.4204/eptcs.309.2 in arbitrary adhesive categories. The unfolding is a powerful GTS technique that has found applications in model verification Baldan, 2008Paolo Baldan, Andrea Corradini and Barbara König. 2008. Unfolding Graph Transformation Systems: Theory and Applications to Verification Baldan, 2008Paolo Baldan, Andrea Corradini and Barbara König. 2008. A framework for the verification of infinite-state graph transformation systems. Information and Computation 206, 7 (July 2008, 869--907). doi: 10.1016/j.ic.2008.04.002 Costa, 2012Simone André da Costa and Leila Ribeiro. 2012. Verification of graph grammars using a logical approach. Science of Computer Programming 77, 4 (April 2012, 480--504). 
doi: 10.1016/j.scico.2010.02.006 and other formal analysis tools such as model-based diagnosis Baldan, 2008Paolo Baldan, Thomas Chatain, Stefan Haar and Barbara König. 2008. Unfolding-Based Diagnosis of Systems with an Evolving Topology and model transformation analysis Bisztr., 2009Dénes András Bisztray. 2009. Compositional verification of model-level refactorings based on graph transformations. PhD Thesis. University of Leicester.

Unfoldings of finite GTSs are often infinite. A lot of work has therefore concerned itself with finding sufficient conditions for finiteness or the existence of finite complete prefixes of unfoldings (Baldan, 2008: Paolo Baldan, Andrea Corradini and Barbara König. 2008. Unfolding Graph Transformation Systems: Theory and Applications to Verification; Baldan, 2004: Paolo Baldan, Andrea Corradini and Barbara König. 2004. Verifying Finite-State Graph Grammars: An Unfolding-Based Approach. In CONCUR 2004 - Concurrency Theory, Berlin, Heidelberg. Springer Berlin Heidelberg, 83--98. doi: 10.1007/978-3-540-28644-8_6; Baldan, 2008: Paolo Baldan, Andrea Corradini, Barbara König and Stefan Schwoon. 2008. McMillan's Complete Prefix for Contextual Nets; Baldan, 2010: Paolo Baldan, Alessandro Bruni, Andrea Corradini, Barbara König and Stefan Schwoon. 2010. On the Computation of McMillan's Prefix for Contextual Nets and Graph Grammars. In Graph Transformations, Berlin, Heidelberg. Springer Berlin Heidelberg, 91--106. doi: 10.1007/978-3-642-15928-2_7; Schwoon, 2013: Stefan Schwoon. 2013. Efficient verification of sequential and concurrent systems. PhD Thesis. École normale supérieure de Cachan-ENS Cachan). On the other hand, unfoldings of GTSs of quantum computation are expected to be intractably large (Yang, 2021: Yichen Yang, Mangpo Phitchaya Phothilimtha, Yisu Remy Wang, Max Willsey, Sudip Roy and Jacques Pienaar. 2021. Equality Saturation for Tensor Graph Superoptimization. CoRR abs/2101.01332. doi: 10.48550/ARXIV.2101.01332), with no complete prefixes in general. Rather, our interests lie in finding heuristics that determine the subspace of the unfolding of interest, combined with fast algorithms to expand finite unfolding prefixes into larger ones. This chapter is to our knowledge the first work in this direction.

Persistent data structures on the other hand have a rich history in computer science Drisco., 1989James R. Driscoll, Neil Sarnak, Daniel D. Sleator and Robert E. Tarjan. 1989. Making data structures persistent. Journal of Computer and System Sciences 38, 1 (February 1989, 86--124). doi: 10.1016/0022-0000(89)90034-2 Lagogi., 2005George Lagogiannis, Yannis Panagis, Spyros Sioutas and Athanasios Tsakalidis. 2005. A survey of persistent data structures. In Proceedings of the 9th WSEAS International Conference on Computers, Stevens Point, Wisconsin, USA. World Scientific and Engineering Academy and Society (WSEAS), and particularly within functional programming Okasaki, 1996Chris Okasaki and Peter Lee. 1996. Purely functional data structures. Carnegie Mellon University, USA Okasaki, 1998Chris Okasaki and Andy Gill. 1998. Fast Mergeable Integer Maps. In Workshop on ML, Septempter 1998, 77--86 Hinze, 2005Ralf Hinze and Ross Paterson. 2005. Finger trees: a simple general-purpose data structure. Journal of Functional Programming 16, 02 (November 2005, 197). doi: 10.1017/s0956796805005769. Confluently persistent data structures were first explored in Drisco., 1994James R. Driscoll, Daniel D. K. Sleator and Robert E. Tarjan. 1994. Fully persistent lists with catenation. Journal of the ACM 41, 5 (Septempter 1994, 943--959). doi: 10.1145/185675.185791. A general treatment of the approach was subsequently presented in Fiat, 2003Amos Fiat and Haim Kaplan. 2003. Making data structures confluently persistent. Journal of Algorithms 48, 1 (August 2003, 16--58). doi: 10.1016/s0196-6774(03)00044-0 and improved in Collet., 2012Sébastien Collette, John Iacono and Stefan Langerman. 2012. Confluent Persistence Revisited. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, January 2012. Society for Industrial and Applied Mathematics, 593--601. doi: 10.1137/1.9781611973099.50. 
Chaler., 2018Parinya Chalermsook, Mayank Goswami, László Kozma, Kurt Mehlhorn and Thatchaphol Saranurak. 2018. Multi-Finger Binary Search Trees. In International Symposium on Algorithms and Computation 2018. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi: 10.4230/LIPICS.ISAAC.2018.55 proposed a data structure for confluently persistent tries.

Within the field of graph transformations, there is a well-developed theory for persistent (and confluently persistent) transformations in the form of the concurrent graph transformation formalism of Corradini et al. Corrad., 1996A. Corradini, U. Montanari and F. Rossi. 1996. Graph Processes. Fundamenta Informaticae 26, 3,4 (241--265). doi: 10.3233/fi-1996-263402 and Baldan et al. Baldan, 1999Paolo Baldan, Andrea Corradini, Ugo Montanari, Francesca Rossi, Hartmut Ehrig and Michael Löwe. 1999. Concurrent Semantic of Algebraic Graph Transformations. In Handbook of Graph Grammars and Computing by Graph Transformation, August 1999. World Scientific, 107--187. doi: 10.1142/9789812814951_0003. This categorical formalism has also been extended to include support for overlapping transformations in Echahed, 2017Rachid Echahed and Aude Maignan. 2017. Parallel Graph Rewriting with Overlapping Rules. In Proceedings of the 21st International Conference on Logic for Programming, Artificial Intelligence and Reasoning, LPAR, 300--318. doi: 10.29007/576h.

The first practical application of persistent graph rewriting was the graph rewriting engine GRAPE Weber, 2017Jens H. Weber. 2017. GRAPE – A Graph Rewriting and Persistence Engine. In Graph Transformation. Springer International Publishing, 209--220. doi: 10.1007/978-3-319-61470-0_13, which was originally based on ephemeral data structures with transactional ACID semantics. Later, its successor, GrapeVine Weber, 2022Jens H. Weber. 2022. Tool Support for Functional Graph Rewriting with Persistent Data Structures - GrapeVine. In Graph Transformation. ICGT 2022. Springer International Publishing, 195--206. doi: 10.1007/978-3-031-09843-7_11, enhanced this with the first fully persistent data structure for graph rewriting. In this work, the vertices and edges that result from graph rewrites are stored individually in a specialised database. Depending on the requested data version, the graph can be retrieved from the database’s individual vertex and edge entities using the database’s graph query language. To our knowledge, no confluently persistent data structure has been proposed for graph rewriting.

As we will see in section 5.5, confluent persistence is a particularly valuable property in the absence of a rewriting strategy, i.e. a procedure to select and prioritise among possible graph transformations Echahed, 2008Rachid Echahed. 2008. Inductively Sequential Term-Graph Rewrite Systems. In Graph Transformations, ICGT. Springer Berlin Heidelberg, 84--98. doi: 10.1007/978-3-540-87405-8_7. This distinguishes the approach presented in this thesis from most previous work. Rewriting strategies feature prominently in PORGY, a tool for port graph rewriting Andrei, 2011Oana Andrei, Maribel Fernández, Hélène Kirchner, Guy Melançon, Olivier Namet and Bruno Pinaud. 2011. PORGY: Strategy-Driven Interactive Transformation of Graphs. Electronic Proceedings in Theoretical Computer Science 48 (February 2011, 54--68). doi: 10.4204/eptcs.48.7 Ferná., 2010Maribel Fernández and Olivier Namet. 2010. Strategic programming on graph rewriting systems. Electronic Proceedings in Theoretical Computer Science 44 (December 2010, 1--20). doi: 10.4204/eptcs.44.1; the graph rewriting software GROOVE provides the notion of a control program to govern the transformation order Rensink, 2004Arend Rensink. 2004. The GROOVE Simulator: A Tool for State Space Generation. In Applications of Graph Transformations with Industrial Relevance. Springer Berlin Heidelberg, 479--485. doi: 10.1007/978-3-540-25959-6_40; and finally, tools such as GrGen provide advanced control flow primitives to specify rewrite rule execution Geiß, 2006Rubino Geiß, Gernot Veit Batz, Daniel Grund, Sebastian Hack and Adam Szalkowski. 2006. GrGen: A Fast SPO-Based Graph Rewriting Tool. In Graph Transformations. ICGT 2006.. Springer Berlin Heidelberg, 383--397. doi: 10.1007/11841883_27.

Specifying rewriting strategies yields efficient graph transformation procedures and is particularly effective for systems with provable properties such as confluence and termination Verma, 1995Rakesh M. Verma. 1995. Transformations and confluence for rewrite systems. Theoretical Computer Science 152, 2 (December 1995, 269--283). doi: 10.1016/0304-3975(94)00255-0. As a result, rewriting strategies have also been used successfully within classical compiler optimisations Assmann, 2000Uwe Assmann. 2000. Graph rewrite systems for program optimization. ACM Transactions on Programming Languages and Systems 22, 4 (July 2000, 583--637). doi: 10.1145/363911.363914 and quantum circuit optimisation Fagan, 2018Andrew Fagan and Ross Duncan. 2018. Optimising Clifford Circuits with Quantomatic. In Proceedings 15th International Conference on Quantum Physics and Logic, QPL 2018, Halifax, Canada, 3-7th June 2018, 85--105. doi: 10.4204/EPTCS.287.5 Duncan, 2020Ross Duncan, Aleks Kissinger, Simon Perdrix and John van de Wetering. 2020. Graph-theoretic Simplification of Quantum Circuits with the ZX-calculus. Quantum 4 (June 2020, 279). doi: 10.22331/q-2020-06-04-279.

However, such properties of the transition system – or successful heuristic approximations for it – cannot always be derived. In these cases, the space of graphs reachable from an input graph within the transition system must be explored non-deterministically. In the absence of a control program, GROOVE will fall back to an exhaustive exploration of the search space – for an exploration up to depth $\Delta$, the result is a search tree of size $\mathcal{O}(\gamma^\Delta)$, where $\gamma$ is the number of possible rewrites at every graph in the search space (assuming $\gamma$ is constant for every reachable graph).
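To give a feel for this growth, the following minimal sketch counts the nodes of a search tree with a constant branching factor. The function name and the sample values are chosen purely for illustration and are not taken from GROOVE.

```python
def search_tree_size(gamma: int, depth: int) -> int:
    """Count the nodes of a search tree in which every graph admits
    `gamma` rewrites, explored up to and including `depth` rewrites."""
    # Level d of the tree holds gamma**d graphs (with repetitions).
    return sum(gamma ** d for d in range(depth + 1))


# Illustrative values only: a modest branching factor already
# produces very large trees at small depths.
for depth in (1, 2, 4, 8):
    print(depth, search_tree_size(10, depth))
```

The sum is dominated by its last term, which is where the $\mathcal{O}(\gamma^\Delta)$ bound comes from.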

Exhaustive exploration is used extensively in model checking, typically to verify properties that must hold for all reachable graphs Rensink, 2004Arend Rensink, Ákos Schmidt and Dániel Varró. 2004. Model Checking Graph Transformations: A Comparison of Two Approaches. In Graph Transformations. ICGT 2004. Springer Berlin Heidelberg, 226--241. doi: 10.1007/978-3-540-30203-2_17. It has also proven to be very useful for compiler optimisation, where the constantly evolving rewrite rules, instruction sets and complex, architecture-dependent cost functions render it challenging to fix a deterministic program transformation schedule.

Jia et al. showed in Jia, 2019Zhihao Jia, Oded Padon, James Thomas, Todd Warszawski, Matei Zaharia and Alex Aiken. 2019. TASO: optimizing deep learning computation with automatic generation of graph substitutions. In Proceedings of the 27th ACM Symposium on Operating Systems Principles, October 2019. ACM, 47--62. doi: 10.1145/3341301.3359630 that computation graph optimisation using graph transformations was achievable without predefined rewriting strategies. They discovered new state-of-the-art implementations for computation graphs of interest to the deep learning community using a simple exhaustive search of the space of possible rewrites with backtracking. This approach was then adapted to quantum circuit optimisation in Xu, 2022Mingkuan Xu, Zikun Li, Oded Padon, Sina Lin, Jessica Pointing, Auguste Hirth, Henry Ma, Jens Palsberg, Alex Aiken, Umut A. Acar and Zhihao Jia. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433 and Xu, 2023Amanda Xu, Abtin Molavi, Lauren Pick, Swamit Tannu and Aws Albarghouthi. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (June 2023, 835--859). doi: 10.1145/3591254.

These recent results fit within a long line of compiler research called superoptimisation Fraser, 1979Christopher W. Fraser. 1979. A compact, machine-independent peephole optimizer. In Proceedings of the 6th ACM SIGACT-SIGPLAN symposium on Principles of programming languages - POPL ’79. ACM Press, 1--6. doi: 10.1145/567752.567753 Massal., 1987Henry Massalin. 1987. Superoptimizer: a look at the smallest program. In Proceedings of the second international conference on Architectual support for programming languages and operating systems, October 1987. ACM, 122--126. doi: 10.1145/36206.36194 Sands, 2011Duncan Sands. 2011. Super-optimizing LLVM IR. (November 2011). Retrieved on 13/01/2025 (LLVM Developer's meeting) from http://llvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdf Bansal, 2006Sorav Bansal and Alex Aiken. 2006. Automatic generation of peephole superoptimizers. ACM SIGARCH Computer Architecture News 34, 5 (October 2006, 394--403). doi: 10.1145/1168919.1168906 Sasnau., 2017Raimondas Sasnauskas, Yang Chen, Peter Collingbourne, Jeroen Ketema, Jubi Taneja and John Regehr. 2017. Souper: A Synthesizing Superoptimizer. CoRR abs/1711.04422. On top of excellent optimisation performance, this approach to compiler optimisation using graph transformation systems (GTS) is exceptionally flexible, as rewrite rules can be generated and tailored on demand to the constraints and instruction set of the target hardware. For any supplied cost function, the compiler can explore all valid program transformations to find the rewrite sequence that minimises cost. This keeps the cost function-specific logic separate from the transformation semantics of the program, making it straightforward to replace or update the optimisation objective.

The adaptation of superoptimisation to quantum optimisation of Xu, 2022Mingkuan Xu, Zikun Li, Oded Padon, Sina Lin, Jessica Pointing, Auguste Hirth, Henry Ma, Jens Palsberg, Alex Aiken, Umut A. Acar and Zhihao Jia. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433 and Xu, 2023Amanda Xu, Abtin Molavi, Lauren Pick, Swamit Tannu and Aws Albarghouthi. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (June 2023, 835--859). doi: 10.1145/3591254 is, however, showing scaling difficulties: unlike classical superoptimisation, which is usually designed to optimise small subroutines within programs, e.g. focusing on arithmetic instructions, single instruction multiple data (SIMD) etc., the technique should in principle be able to optimise quantum programs in their entirety and requires tens of thousands of rewrite rules. This leads to immense search spaces that superoptimisation does not scale well to.

For the special case of term rewriting, i.e. rewriting of tree expressions, a technique known as equality saturation was introduced in Tate, 2009Ross Tate, Michael Stepp, Zachary Tatlock and Sorin Lerner. 2009. Equality saturation: a new approach to optimization. ACM SIGPLAN Notices 44, 1 (January 2009, 264--276). doi: 10.1145/1594834.1480915 to compress and reduce the size of the search space significantly. Equality saturation can be viewed as a twist on persistent data structures designed to optimise terms through term rewriting. It is persistent insofar as it preserves all data inserted into it, though, unlike persistent data structures, it does not retain the history of transformations. This introduces a new step in the optimisation process known as the extraction phase, where the best term stored within the data structure must be identified and recovered.

An efficient implementation was presented in Willsey, 2021Max Willsey. 2021. Practical and Flexible Equality Saturation. PhD Thesis. University of Washington, and it has recently been adopted in modern compiler optimisation pipelines Fallin, 2022Chris Fallin. 2022. Cranelift: Using E-Graphs for Verified, Cooperating Middle-End Optimizations. (August 2022). Retrieved on 14/01/2025 (RFC) from https://github.com/bytecodealliance/rfcs/blob/main/accepted/cranelift-egraph.md. Though the approach was extended to optimise computation graphs for deep learning in Yang, 2021Yichen Yang, Mangpo Phitchaya Phothilimtha, Yisu Remy Wang, Max Willsey, Sudip Roy and Jacques Pienaar. 2021. Equality Saturation for Tensor Graph Superoptimization. CoRR abs/2101.01332. doi: 10.48550/ARXIV.2101.01332, equality saturation does not generalise to graph rewriting. Equality saturation and the difficulties of adapting it to graph rewriting are interesting (and subtle!) enough to warrant their own section.

5.2. A closer look at equality saturation

Below, we provide a succinct introduction to equality saturation and discuss its shortcomings in the context of quantum computation and graph rewriting in general. For further details on equality saturation, we recommend the presentation of Willsey, 2021Max Willsey. 2021. Practical and Flexible Equality Saturation. PhD Thesis. University of Washington, its implementation Willsey, 2025Max Willsey. 2025. egg: egraphs good. Retrieved on 14/01/2025 (code repository) from https://github.com/egraphs-good/egg, and this blog discussion Bernst., 2024Max Bernstein. 2024. What's in an e-graph?. (September 2024). Retrieved on 14/01/2025 (blog post) from https://bernsteinbear.com/blog/whats-in-an-egraph/.

Unlike a general-purpose compiler utility, equality saturation is specifically a technique for term rewriting. Terms¹ are algebraic expressions represented as trees, in which tree nodes correspond to operations, the children of an operation are the subterms passed as arguments to the operation, and leaf nodes are either constants or unbound variables. For instance, the term $f(x \times 2, 3)$ would be represented as the tree:

This representation is particularly suited for any pure functional (i.e. side-effect-free) classical computation. Every node of a term is identified with its own term: the subterm given by the subtree the node is the root of. Given term transformation rules, term rewriting consists of finding subterms that match known transformation patterns. The matching subtrees can then be replaced with the new equivalent trees.
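The match-and-replace step described above can be sketched in a few lines. This is a minimal illustration, assuming terms are immutable trees and using the single hard-coded rule "x * 2 becomes x + x" as an example; the names `Term` and `rewrite_mul2` are hypothetical and not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A term tree: an operation (or leaf symbol) and its argument subterms."""
    op: str
    args: tuple = ()


def rewrite_mul2(term: Term) -> Term:
    """Replace every subterm matching `x * 2` by `x + x`, bottom-up."""
    args = tuple(rewrite_mul2(arg) for arg in term.args)
    if term.op == "*" and len(args) == 2 and args[1] == Term("2"):
        # The pattern variable x is bound to the first argument.
        return Term("+", (args[0], args[0]))
    return Term(term.op, args)


x = Term("x")
term = Term("f", (Term("*", (x, Term("2"))), Term("3")))  # f(x * 2, 3)
rewritten = rewrite_mul2(term)                            # f(x + x, 3)
print(rewritten)
```

Note that this destructive style keeps only the rewritten term; the original `x * 2` subterm is lost unless the caller retains it, which is the limitation that equality saturation addresses.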

In equality saturation, all terms obtained through term rewriting are stored within a single persistent data structure. Term optimisation proceeds in two stages. First, an exploration phase adds progressively more terms to the data structure to discover and capture all possible terms that the input can be rewritten to until saturation (see below), or a timeout, is reached. In the second phase, the saturated data structure is passed to an extraction algorithm tasked with finding the term that minimises the cost function of interest among all terms discovered during exploration.

The data structure that enables this is a generalisation of term trees. Just as in terms, nodes correspond to operations and have children subterms corresponding to the operation’s arguments. To record that a new term obtained through a rewrite is equivalent to an existing subterm, we extend the data structure to also store equivalence classes of nodes, typically implemented as Union-Find data structures Galler, 1964Bernard A. Galler and Michael J. Fisher. 1964. An improved equivalence algorithm. Communications of the ACM 7, 5 (May 1964, 301--303). doi: 10.1145/364099.364331 Cormen, 2009Thomas H Cormen, Charles E Leiserson, Ronald L Rivest and Clifford Stein. 2009. Introduction to algorithms (Third edition ed.). MIT Press, Cambridge, Massachusetts. If we, for instance, applied the rewrite $x * 2 \mapsto x + x$ to the term above, we would obtain

Nodes within a grey box indicate equivalent subterms. This diagram encodes that any occurrence of the x * 2 term can equivalently be expressed by the x + x term. Henceforth, when matching terms for rewriting, both the * term and the + term are valid choices for the first argument of the f operation. Suppose, for example, that there is a rewrite $f(x + y, z) \mapsto f(x, y * z)$; this would match the above data structure, resulting in

A consequence of using equivalence relations in the data structure is that the ordering in which the rewrites are considered and applied becomes irrelevant!

As presented, the exploration process would never terminate, and the data structure size would grow indefinitely: as more rewrites are applied, more and more terms are created, resulting in an ever-increasing set of possible rewrites to be considered and processed. Equality saturation resolves this by enforcing a term uniqueness invariant: every term or subterm explored is expressed by exactly one node in the data structure. We can see in the above example that this is currently not the case: the term $x$, for instance, is present multiple times – so is $3$. Once the invariant is enforced, the nodes no longer form a forest of trees but instead directed acyclic graphs:

This is commonly known as term sharing, and the resulting data structure is known as a term graph Willsey, 2021Max Willsey. 2021. Practical and Flexible Equality Saturation. PhD Thesis. University of Washington. Maintaining this invariant is not hard in practice: whenever a new term is about to be added by applying a rewrite, it must first be checked whether the term exists already – something that can be done cheaply by keeping track of all hashes of existing terms. In the affirmative case, rather than adding a new term to the matched term’s equivalence class, both terms’ classes must be merged.

It might be that equivalence classes must be merged recursively: given the terms $f(x, 3)$ and $f(y, 3)$, if the classes of $x$ and $y$ are merged (and thus $x$ and $y$ have been proven equivalent), then the classes of their respective parents $f(x, 3)$ and $f(y, 3)$ must also be merged. Doing so efficiently is non-trivial, so we will not go into details here and refer again to Willsey, 2021Max Willsey. 2021. Practical and Flexible Equality Saturation. PhD Thesis. University of Washington.
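The interplay of term sharing, equivalence classes and recursive merging can be illustrated with a deliberately naive sketch. It assumes a union-find over class identifiers and a dictionary acting as the hash lookup; the class `EGraph` and its methods are hypothetical names, and the `rebuild` step below simply re-canonicalises everything until nothing changes, which is far less efficient than the approach of the cited work.

```python
class EGraph:
    """Naive term graph: hash-consed nodes plus a union-find of classes."""

    def __init__(self):
        self.parent = []    # union-find forest over class ids
        self.hashcons = {}  # canonical node (op, child class ids) -> class id

    def find(self, cid):
        while self.parent[cid] != cid:
            self.parent[cid] = self.parent[self.parent[cid]]  # path halving
            cid = self.parent[cid]
        return cid

    def canonical(self, node):
        op, children = node
        return (op, tuple(self.find(child) for child in children))

    def add(self, op, *children):
        """Add a node, reusing the existing class if the term is known."""
        node = self.canonical((op, children))
        if node in self.hashcons:
            return self.find(self.hashcons[node])
        cid = len(self.parent)
        self.parent.append(cid)
        self.hashcons[node] = cid
        return cid

    def union(self, a, b):
        """Record that classes a and b are equivalent."""
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
            self.rebuild()
        return self.find(a)

    def rebuild(self):
        """Merge the classes of nodes that became identical, to a fixpoint."""
        changed = True
        while changed:
            changed = False
            updated = {}
            for node, cid in self.hashcons.items():
                node, cid = self.canonical(node), self.find(cid)
                if node in updated and self.find(updated[node]) != cid:
                    self.parent[cid] = self.find(updated[node])
                    changed = True
                updated[node] = self.find(cid)
            self.hashcons = updated


eg = EGraph()
x, y, three = eg.add("x"), eg.add("y"), eg.add("3")
fx, fy = eg.add("f", x, three), eg.add("f", y, three)
eg.union(x, y)  # proving x = y forces f(x, 3) = f(y, 3)
print(eg.find(fx) == eg.find(fy))
```

Calling `eg.add("x")` a second time returns the existing class rather than creating a new node, which is the term uniqueness invariant in action.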

In the absence of terms of unbounded size, the uniqueness invariant guarantees that the exploration will eventually saturate: as rewrites are applied, there will come a point where all equivalent terms have been discovered, i.e. every applicable rewrite will correspond to an equivalence within an already known class, thereby not providing any new information. This marks the end of the exploration phase².

Term optimisation then proceeds to the extraction phase. Reading an optimised term out of the saturated term data structure is not trivial. For every equivalence class in the data structure, a representative node must be chosen so that the final term, extracted recursively from the root by always selecting the representative node of every term class, minimises the desired cost function³.

The strategy for choosing representative terms depends heavily on the cost function. In simple cases, such as minimising the total size of the extracted term, this can be done greedily in reverse topological order, i.e. proceeding from the leaves towards the root Willsey, 2021Max Willsey. 2021. Practical and Flexible Equality Saturation. PhD Thesis. University of Washington. There are also more complex cases, however: if the cost function allows for the sharing of subexpressions that may be used more than once in the computation, for instance, then finding the optimal solution will require more expensive computations such as solving Boolean satisfiability (SAT) or Satisfiability Modulo Theories (SMT) problem instances Biere, 2021Armin Biere, Marijn J. H. Heule, Hans Maaren and Toby Walsh. 2021. Handbook of satisfiability (Second edition ed.). IOS Press, Amsterdam.
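For the simple case of minimising term size, the greedy choice of representatives can be sketched as follows. This is one possible realisation under stated assumptions, not necessarily that of the cited work: the term graph is given as a plain dictionary from class names to lists of nodes, costs are propagated by iterating to a fixpoint (which also copes with classes that refer to each other cyclically), and the rewrite `(x * 2) / 2 = x` used to build the example is hypothetical.

```python
import math


def extract_smallest(classes, root):
    """Pick, for each class, the node yielding the smallest term (tree size).

    `classes` maps a class id to a list of nodes `(op, child_class_ids)`.
    Returns the cost of the root class and the extracted term as nested tuples.
    """
    best = {cid: (math.inf, None) for cid in classes}
    changed = True
    while changed:  # costs only ever decrease, so this terminates
        changed = False
        for cid, nodes in classes.items():
            for op, children in nodes:
                cost = 1 + sum(best[child][0] for child in children)
                if cost < best[cid][0]:
                    best[cid] = (cost, (op, children))
                    changed = True

    def build(cid):
        op, children = best[cid][1]
        return (op, *(build(child) for child in children))

    return best[root][0], build(root)


# Term graph for f((x * 2) / 2, 3) after recording that (x * 2) / 2 = x:
# the class "x" contains both the leaf x and the division node.
classes = {
    "x": [("x", ()), ("/", ("mul", "two"))],
    "two": [("2", ())],
    "mul": [("*", ("x", "two"))],
    "three": [("3", ())],
    "root": [("f", ("x", "three"))],
}
cost, term = extract_smallest(classes, "root")
print(cost, term)
```

Because each class is costed independently, this sketch counts a shared subterm once per use; cost functions that reward sharing need the more expensive formulations mentioned above.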

Equality saturation on graphs?

Equality saturation is a fast-developing subfield of compilation with a growing list of applications. Unfortunately for us⁴, adapting these ideas to quantum computation (and graph rewriting more generally) presents several unsolved challenges.

The root of the problem lies in the program representation. The minIR representation we presented in section 3.3 – but also the quantum circuit representation – captures quantum computations, not as a term, but in a directed acyclic graph (DAG) structure.

A generalisation of equality saturation to computation DAGs was studied in Yang, 2021Yichen Yang, Mangpo Phitchaya Phothilimtha, Yisu Remy Wang, Max Willsey, Sudip Roy and Jacques Pienaar. 2021. Equality Saturation for Tensor Graph Superoptimization. CoRR abs/2101.01332. doi: 10.48550/ARXIV.2101.01332 in the context of optimisation of computation graphs for deep learning. Their approach is based on the observation that the computation of a (classical) computation DAG can always be expressed by a term for each output of the computation. Consider, for example, the simple computation that takes two inputs (x, y) representing 2D Cartesian coordinates and returns their equivalent in polar coordinates (r, θ).

By introducing two operations $\textit{polar}_r$ and $\textit{polar}_\theta$ that compute $\textit{polar}$ and subsequently discard one of the two outputs, the DAG can equivalently be formulated as two terms

corresponding to the two outputs r and θ of the computation. This involves temporarily duplicating some of the data and computations in the DAG – though all duplicates will be merged again in the term graph due to the term sharing invariant.
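The splitting of a two-output operation into one term per output can be made concrete with a short sketch. The function names mirror the operations named in the text, while their implementation via `math.hypot` and `math.atan2` is an assumption made for illustration.

```python
import math


def polar(x, y):
    """The original two-output operation: Cartesian to polar coordinates."""
    return math.hypot(x, y), math.atan2(y, x)


def polar_r(x, y):
    """Compute `polar` and discard the angle."""
    return polar(x, y)[0]


def polar_theta(x, y):
    """Compute `polar` and discard the radius."""
    return polar(x, y)[1]


# Each output is now a term of its own. Evaluated naively, `polar` runs
# twice on the same inputs: this is the temporary duplication that term
# sharing later merges back together.
r, theta = polar_r(3.0, 4.0), polar_theta(3.0, 4.0)
print(r, theta)
```

For classical data this duplication is harmless, since inputs can be copied and unused outputs dropped freely.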

This duplicating and merging of data is fundamentally at odds with the constraints we must enforce on linear data, such as quantum resources. Each operation (or data) of a DAG that is split into multiple terms introduces a new constraint that must be imposed on the extraction algorithm: a computation DAG will only satisfy the no-discarding theorem (section 2.1) for linear values if, for each split operation it contains, it either contains all or none of its split components.

To illustrate this point, consider the following simple rewrite on quantum circuits that pushes X gates ($\oplus$) from the right of a CX gate to the left:

Both the left and right hand sides would be decomposed into two terms, one for each output qubit. The left terms could be written as

$$X(CX0(0, 1)) \quad\textrm{and}\quad CX1(0, 1)$$

whereas the right terms would be

$$CX0(X(0), X(1)) \quad\textrm{and}\quad CX1(X(0), X(1)).$$

We introduced the term $X(\cdot)$ for the single-qubit X gate and two terms $CX0(\cdot, \cdot)$ and $CX1(\cdot, \cdot)$ for the terms that produce the first, respectively second, output of the two-qubit CX gate. $0$ and $1$ denote the input qubits of the computation. This would be interpreted as two different rewrites

$$\begin{aligned}X(CX0(0, 1)) &\mapsto CX0(X(0), X(1))\\\textrm{and}\quad CX1(0, 1) &\mapsto CX1(X(0), X(1)).\end{aligned}$$

Unlike classical computations, however, either of these rewrites on its own would be unphysical: there is no implementation of either split operation $CX0$ or $CX1$ on its own. We would thus have to enforce at extraction time that for every application of this pair of rewrite rules, either both or none of the rewrites are applied.
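The circuit identity behind this rewrite can be checked with a small sketch. It assumes that qubit 0 is the control of the CX gate and that the X gate on the left-hand side acts on output 0, which is the reading under which the identity holds. Since X and CX merely permute computational basis states without introducing phases, agreement on all four basis states implies that the two circuits are equal as unitaries. The function names are illustrative.

```python
from itertools import product


def x_gate(bits, qubit):
    """Apply X to one qubit of a computational basis state."""
    flipped = list(bits)
    flipped[qubit] ^= 1
    return tuple(flipped)


def cx_gate(bits, control=0, target=1):
    """Apply CX: flip the target bit if the control bit is set."""
    out = list(bits)
    out[target] ^= out[control]
    return tuple(out)


def left_hand_side(bits):
    # CX first, then X on output 0.
    return x_gate(cx_gate(bits), 0)


def right_hand_side(bits):
    # X on both inputs first, then CX.
    return cx_gate(x_gate(x_gate(bits, 0), 1))


basis_states = list(product((0, 1), repeat=2))
rewrite_is_valid = all(
    left_hand_side(bits) == right_hand_side(bits) for bits in basis_states
)
print(rewrite_is_valid)
```

The check only makes sense for the pair of outputs taken together, which mirrors the point above: neither half of the split rewrite has a meaning on its own.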

Conversely, satisfying the no-cloning theorem requires verification that during extraction, terms that share a subterm but correspond to distinct graph rewrites are never selected simultaneously – otherwise, the linear value corresponding to the shared subterm would require cloning to be used twice.

The no-discarding and no-cloning restrictions result in a complex web of AND respectively XOR relationships between individual terms in the term graph. These constraints could be ignored during the exploration phase and then be modelled in the extraction phase by an integer linear programming (ILP) problem. However, Yang, 2021Yichen Yang, Mangpo Phitchaya Phothilimtha, Yisu Remy Wang, Max Willsey, Sudip Roy and Jacques Pienaar. 2021. Equality Saturation for Tensor Graph Superoptimization. CoRR abs/2101.01332. doi: 10.48550/ARXIV.2101.01332 observed that this approach causes the term graph to encode a solution space that grows super-exponentially with rewrite depth (see Fig. 7 in Yang, 2021Yichen Yang, Mangpo Phitchaya Phothilimtha, Yisu Remy Wang, Max Willsey, Sudip Roy and Jacques Pienaar. 2021. Equality Saturation for Tensor Graph Superoptimization. CoRR abs/2101.01332. doi: 10.48550/ARXIV.2101.01332), rendering the ILP extraction problem computationally intractable beyond 3 subsequent rewrites. Recent work has attempted to tackle this issue using reinforcement learning Bărbu., 2024George-Octavian Bărbulescu, Taiyi Wang, Zak Singh and Eiko Yoneki. 2024. Learned Graph Rewriting with Equality Saturation: A New Paradigm in Relational Query Rewrite and Beyond. arXiv: 2407.12794 [cs.DB].

Linearity-preserving rewrites are an exponentially small subset

A simple calculation shows that in the case that all values in the computation graph are linear and only graphs up to a maximal size are considered, the number of possible rewrites only grows exponentially in the rewrite depth. In other words, for optimisation of quantum computations, the solution space of valid computations is much smaller⁵ than the space explored by the equality saturation approach of Yang, 2021Yichen Yang, Mangpo Phitchaya Phothilimtha, Yisu Remy Wang, Max Willsey, Sudip Roy and Jacques Pienaar. 2021. Equality Saturation for Tensor Graph Superoptimization. CoRR abs/2101.01332. doi: 10.48550/ARXIV.2101.01332.

Indeed, suppose there is a maximal graph size $|V(G)| \leqslant \Theta$ and suppose that all rewrite patterns, i.e. the subgraphs induced by the vertex deletion set $V^-$ of a rewrite, are connected. This is an assumption that was also made in chapter 4, see section 4.2 for a discussion.

In a computation graph of linear values $G$, every vertex (value in the computation) $v \in V(G)$ has a unique incoming and outgoing edge. This means that any pattern embedding $\varphi: P \hookrightarrow G$ is uniquely defined by the image $\varphi(v_P)$ of a single vertex $v_P \in V(P)$. Thus for a GTS with $m$ transformation rules, there can be at most a constant number

$$m \cdot |V(G)| \leqslant m \cdot \Theta =: \alpha$$

of possible rewrites that can be applied to any graph $G$. Let $\mathcal{G}_d$ be the set of all graphs that can be reached in the GTS in at most $d$ rewrites from some input graph $G_0$. Then $\mathcal{G}_{d+1}$ is the set of all graphs obtained by applying a rewrite to a graph $G \in \mathcal{G}_d$. Thus we have the relation:

$$|\mathcal{G}_{d+1}| \leqslant \alpha \cdot |\mathcal{G}_d|.$$

The total number of rewrites $R_d$ that can be applied on any graph in $\mathcal{G}_d$ is thus

$$R_d \leqslant \alpha \cdot |\mathcal{G}_d| = O(e^{\alpha \cdot d}).$$
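The step from the recurrence to the exponential bound can be spelled out as follows. This is a sketch that takes the recurrence above as given and assumes that $\mathcal{G}_0$ contains only the input graph, so that $|\mathcal{G}_0| = 1$ and $\alpha \geqslant 1$.

$$\begin{aligned}
|\mathcal{G}_d| &\leqslant \alpha \cdot |\mathcal{G}_{d-1}| \leqslant \dots \leqslant \alpha^d \cdot |\mathcal{G}_0| = \alpha^d,\\
R_d &\leqslant \alpha \cdot |\mathcal{G}_d| \leqslant \alpha^{d+1} = e^{(d+1) \ln \alpha}.
\end{aligned}$$

Since $2 \ln \alpha \leqslant \alpha$ for all $\alpha > 0$, we have $(d + 1) \ln \alpha \leqslant 2 d \ln \alpha \leqslant \alpha \cdot d$ for every $d \geqslant 1$, which gives the (rather generous) bound $R_d = O(e^{\alpha \cdot d})$.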


In summary, equality saturation is a specialisation of persistent data structures uniquely suited to the problem of term rewriting. It succinctly encodes the space of all equivalent terms and, using term sharing, does away with the need to apply equivalent rewrites on multiple copies of the same term, which inevitably occurs in more naive rewriting approaches.

However, equality saturation cannot model rewrites that require deleting parts of the data. This is not a problem for terms representing classical operations, as data can always be implicitly copied during exploration and discarded during extraction as required. This is not the case for quantum computations – and for graph rewriting in general, where explicit vertex (and edge) deletions are an integral part of graph transformation semantics.

As a result, numerous constraints would have to be imposed to restrict the solution space encoded by term graphs to valid outcomes of graph rewriting procedures. This would make extraction algorithms complex and cumbersome. More importantly, we showed that in the case of computation graphs on linear values, such as quantum computations, the solution space explored by equality saturation is super-exponentially larger than the space of valid computations, rendering the extraction algorithm and meaningful exploration of the relevant rewriting space computationally intractable.


  1. Depending on the context, computer scientists also call them abstract syntax trees (AST) – for our purposes, it’s the same thing.

  2. Of course, it is also practical to include a timeout parameter in implementations to guarantee timely termination even on large or ill-behaved systems.

  3. Note that we are omitting a subtle point here that arises due to term sharing: depending on the cost function, choosing different representative nodes for the same class could be favourable for the other occurrences of the term in the computation.

  4. but fortunately for this thesis

  5. Exponential is super-exponentially smaller than super-exponential! Or put mathematically, $e^{\Theta(n)}/e^{\omega(n)} = e^{\Theta(n) - \omega(n)} = e^{-\omega(n)}$.

5.3. The data structure

We now present a data structure that is closely related to equality saturation but supports arbitrary graph rewriting. It is modelled on the graph unfolding construction as presented in Baldan, 2008Paolo Baldan, Andrea Corradini and Barbara König. 2008. Unfolding Graph Transformation Systems: Theory and Applications to Verification.

Rather than maintaining equivalence relations between terms, as done in term graphs, we maintain equivalence relations between graph vertices. Our data structure stores the set of all applied rewrites – the main subject of this section is to show how all operations of interest on this data structure can be implemented efficiently.

The persistent graph rewriting data structure is given by a set $\mathcal{D}$ of events $\delta = (G_R, V^-, \mu) \in \mathcal{D}$, with

  • replacement graph $G_R$,
  • vertex deletion set $V^- \subseteq V(\mathcal{D} \smallsetminus \delta)$ and
  • glueing relation $\mu: V^- \rightharpoonup V(G_R)$.

We have extended the $V(\cdot)$ notation to $\mathcal{D}$ by defining it as the union of all vertex sets of replacement graphs in $\mathcal{D}$. We will similarly use $V(\delta)$ to denote the set of vertices in the replacement graph of an event $\delta$.

Events resemble rewrites as defined in Definition 5.4 but differ in that they do not apply to a single graph $G$, i.e. there is no graph $G$ such that $V^- \subseteq V(G)$. Instead,

$$V^-\ \subseteq\ \bigsqcup_{\delta \in \mathcal{D}} V(\delta) = V(\mathcal{D}).$$

We will see below how a graph $G$ can be constructed such that an event $\delta \in \mathcal{D}$ is indeed a valid rewrite on $G$.

Using the disjointness of the union in (2), for all $v \in V(\mathcal{D})$, there is a unique $\delta \in \mathcal{D}$ such that $v \in V(\delta)$, which we call the owner of $v$. The parents (or directed causes) $P(\delta)$ of an event $\delta$ are then the owners of the vertices in the deletion set $V^-$ of $\delta$:

$$P(\delta) = \left\{ \delta_p \in \mathcal{D} \mid V^- \cap V(\delta_p) \neq \varnothing \right\}.$$

Inversely, we define the children of $\delta$ as the set of events whose parents include $\delta$:

$$P^{-1}(\delta) = \left\{\delta_c \in \mathcal{D} \mid \delta \in P(\delta_c)\right\}.$$
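The definitions of owner, parents and children translate almost literally into code. The following is a minimal sketch under simplifying assumptions: an event only records the vertex set of its replacement graph and its deletion set (edges and the glueing relation are omitted), vertices are plain strings, and an initial graph is modelled as an event with an empty deletion set. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """An event, reduced to the data needed for the parent relation."""
    name: str
    vertices: frozenset              # V(delta): vertices of the replacement graph
    deleted: frozenset = frozenset() # V^-: vertices deleted by the event


def owner(events, vertex):
    """The unique event whose replacement graph contains `vertex`."""
    (found,) = [e for e in events if vertex in e.vertices]
    return found


def parents(events, event):
    """Owners of the vertices in the deletion set of `event`."""
    return {p for p in events if p.vertices & event.deleted}


def children(events, event):
    """Events whose parents include `event`."""
    return {c for c in events if event in parents(events, c)}


root = Event("root", frozenset({"a", "b", "c"}))
d1 = Event("d1", frozenset({"d"}), deleted=frozenset({"a"}))
# d2 deletes vertices owned by two different events, so it has two parents.
d2 = Event("d2", frozenset({"e"}), deleted=frozenset({"b", "d"}))
events = {root, d1, d2}

print(sorted(p.name for p in parents(events, d2)))
print(sorted(c.name for c in children(events, root)))
```

The tuple unpacking in `owner` doubles as a check of the disjointness assumption: it fails if a vertex belongs to zero or to several events.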

The following figure shows an example of a data structure $\mathcal{D}$ on undirected graphs.

Events on an undirected graph with their history. Coloured directed edges represent the parent-child relationship. The area that they rewrite in the parent event is represented by dashed regions of the same colour. The map between graphs is given by the vertex IDs.


Merges, confluent persistence and event creation

A rewrite $r = (G_R, V^-, \mu)$ that applies to a replacement graph of an event $\delta_p$, i.e. $V^- \subseteq V(\delta_p)$, immediately defines a valid event $\delta_r := r$. In that case, $\delta_r$ has a unique parent $P(\delta_r) = \{ \,\delta_p\, \}$.

Creating an event $\delta_r$ from a rewrite $r$ is the simplest type of data mutation that can be recorded in $\mathcal{D}$. For $\mathcal{D}$ to be a confluently persistent data structure, it must also be possible to merge multiple data mutations together. Rather than handling merges of versions of the data structure explicitly, an event $\delta \in \mathcal{D}$ can define graph mutation operations that apply on collections of events $P \subseteq \mathcal{D}$ – the resulting mutation is equivalent to explicitly creating a merged version of the versions in $P$, followed by the desired rewrite. In this case, the parents of $\delta$ are precisely the set $P = P(\delta)$.

In other words, the parent-child relationship of $\mathcal{D}$ is precisely the event history of $\mathcal{D}$: a directed graph with vertex set $\mathcal{D}$ and edges $\delta_1 \to \delta_2$ if $\delta_1 \in P(\delta_2)$. For $\mathcal{D}$ to define a valid confluently persistent data structure, we need to

  1. Ensure that the event history is acyclic, and
  2. Define conditions that guarantee that events correspond to valid data mutations.

We address both requirements at once by restricting how $\mathcal{D}$ can be constructed and modified in such a way that acyclicity is guaranteed. Specifically, we introduce two procedures:

  • CreateEmpty constructs an empty $\mathcal{D} = \varnothing$, and
  • AddEvent adds an event $\delta$ to $\mathcal{D}$.

The first is straightforward and, importantly, the only way to construct an instance of $\mathcal{D}$. AddEvent, on the other hand, enforces two conditions that $\delta$ must satisfy to be added to a set $\mathcal{D}$:

  • $P(\delta) \subseteq \mathcal{D}$, and
  • all parents of $\delta$ must be compatible.

We defer the discussion of the second condition, enforced by the AreCompatible procedure, to its dedicated section below. The restriction $P(\delta) \subseteq \mathcal{D}$ defines a partial order on events by guaranteeing that an event $\delta$ can only be defined and added to $\mathcal{D}$ after all its parents $P(\delta)$ have been added.

We say that $\mathcal{D}$ is valid if it can be constructed from a single call to CreateEmpty, followed by a sequence of calls to AddEvent. This is equivalent to requiring that

  1. the parent-child relationship is acyclic and
  2. the parents of every event satisfy AreCompatible.

For the remainder of this chapter, we will always assume that $\mathcal{D}$ is valid, and thus the event history of $\mathcal{D}$ is always well-defined and acyclic.

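def CreateEmpty() -> Set[Event]:
    # The only way to obtain a fresh instance of the data structure.
    return set()

def AddEvent(
    events: Set[Event],
    replacement_graph: Graph,
    deletion_set: Set[V],
    glueing_relation: EquivalenceRelation[V]
) -> Set[Event]:
    new_event = (
        replacement_graph,
        deletion_set,
        glueing_relation
    )
    # Parents are the owners of the vertices in the deletion set.
    event_parents = parents(new_event)
    # Condition 1: all parents must already be in the data structure.
    assert issubset(event_parents, events)
    # Condition 2: the parents must be compatible.
    assert AreCompatible(event_parents)

    return union(events, {new_event})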
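To make the two conditions concrete, the following is a minimal runnable sketch of the procedures. It is an illustration, not the thesis implementation: the `Event` class and the snake_case helper names are assumptions, only vertex sets are tracked (no edges or glueing relation), and compatibility is checked as disjointness of the deletion sets over all ancestors, the condition detailed in the next section.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    vertices: frozenset      # V(delta): vertices of the replacement graph
    deletion_set: frozenset  # V^-: vertices removed by the event

def parents(events, event):
    """Owners of the deleted vertices; fails if an owner is not in `events`."""
    found = set()
    for v in event.deletion_set:
        owners = [e for e in events if v in e.vertices]
        if len(owners) != 1:
            raise ValueError(f"vertex {v!r} has no unique owner")
        found.add(owners[0])
    return found

def ancestors(events, event):
    """The event itself together with all its (transitive) parents."""
    result = {event}
    for p in parents(events, event):
        result |= ancestors(events, p)
    return result

def are_compatible(events, subset):
    """True if no vertex is deleted twice among the ancestors of `subset`."""
    deleted = set()
    for e in set().union(*(ancestors(events, s) for s in subset)):
        if deleted & e.deletion_set:
            return False
        deleted |= e.deletion_set
    return True

def add_event(events, new_event):
    new_parents = parents(events, new_event)   # enforces P(delta) ⊆ D
    if not are_compatible(events, new_parents):
        raise ValueError("parents of the new event are not compatible")
    return events | {new_event}                # the input set is left untouched

# Hypothetical usage: `left` and `right` both delete vertex 1 of the root.
root = Event(frozenset({1, 2}), frozenset())
left = Event(frozenset({3}), frozenset({1}))
right = Event(frozenset({4}), frozenset({1}))
other = Event(frozenset({6}), frozenset({2}))

events = set()
for e in (root, left, right, other):
    events = add_event(events, e)

# Merging `left` and `other` is accepted: their ancestors delete disjoint vertices.
events = add_event(events, Event(frozenset({7}), frozenset({3, 6})))

# Merging `left` and `right` is rejected: both delete vertex 1.
try:
    add_event(events, Event(frozenset({5}), frozenset({3, 4})))
    rejected = False
except ValueError:
    rejected = True
```

Returning a new set rather than mutating the argument mirrors the persistent flavour of the data structure; this is a design choice of the sketch.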

Compatible events

Assuming the parent-child relationship is acyclic, we can define the ancestors (or causes) $\lfloor\delta\rfloor$ of an event $\delta$ recursively:

$$\lfloor\delta\rfloor = \{\delta\}\ \cup\ \bigcup_{\delta_p \in P(\delta)} \lfloor\delta_p\rfloor.$$

Events $D \subseteq \mathcal{D}$ are compatible (or form a configuration) if the vertex deletion sets $V^-$ of all ancestors of events $\delta \in D$ are pairwise disjoint. That is, writing

$$\lfloor D\rfloor = \bigcup_{\delta \in D} \lfloor \delta\rfloor,$$

we require that the sets $\{V^- \mid (G_R, V^-, \mu) \in \lfloor D \rfloor \}$ are pairwise disjoint. In the example above, events $\delta_5$ and $\delta_6$ are compatible, whereas $\delta_5$ and $\delta_4$ are not. As pseudocode, this is implemented by the following procedure.

def AreCompatible(events: Set[Event]) -> bool:
    all_ancestors = union([ancestors(d) for d in events])
    deleted_vertices = set()
    for d in all_ancestors:
        for v in deletion_set(d):
            if v in deleted_vertices:
                return False
            deleted_vertices.add(v)
    return True

Note that this definition of event compatibility is a strictly stronger version of parallel independence as is typically defined in DPO rewriting [Corradini et al., 2018: Andrea Corradini, Dominique Duval, Michael Löwe, Leila Ribeiro, Rodrigo Machado, Andrei Costa, Guilherme Grochau Azzi, Jonas Santos Bezerra and Leonardo Marques Rodrigues. On the Essence of Parallel Independence for the Double-Pushout and Sesqui-Pushout Approaches. In Graph Transformation, Specifications, and Nets, Cham. Springer International Publishing, 1--18. doi: 10.1007/978-3-319-75396-6_1]. It does not allow for events $\delta_1$ and $\delta_2$ such that a vertex $v$ is both

  • in the read-only context of $\delta_1$, i.e. $v \in V_1^- \cap dom(\mu_1)$, and thus present both before and after the application of $\delta_1$, and
  • in the deletion set of $\delta_2$, i.e. $v \in V^-_2$.

This excludes asymmetric conflicts as discussed in e.g. [Baldan et al., 2008: Paolo Baldan, Andrea Corradini and Barbara König. Unfolding Graph Transformation Systems: Theory and Applications to Verification], which arise in the more general definition. This restriction simplifies our considerations, as it makes the event history of any event unique.

The runtime of AreCompatible is $O(D_A \log D_A)$, where $D_A$ is the sum of the sizes of all vertex deletion sets of events in $\lfloor D \rfloor$, i.e. $D_A = \sum_{\delta \in \lfloor D \rfloor} |V^-_\delta|$.

The $\log$ factor can typically be removed if the vertices $v$ span a contiguous integer range or by using a hash function. Alternatively, the $\log$ factor can also be reduced by using separate sets to track deleted vertices of each event.

When talking about compatible sets of events $D \subseteq \mathcal{D}$, it simplifies considerations to always choose $D$ such that the ancestors of every $\delta \in D$ are also in $D$, i.e. $D = \lfloor D \rfloor$. We introduce the notation

$$\Gamma(\mathcal{D}) = \{ \lfloor D \rfloor \mid D \subseteq \mathcal{D} \text{ and } D \text{ is compatible}\} \subseteq \mathcal{P}(\mathcal{D})$$

for the set of all compatible sets of events of the form $\lfloor D \rfloor$.

Events are rewrites on the flattened history

We have so far explored how events can be added to $\mathcal{D}$, as well as when they are compatible. However, until we have established that adding events to $\mathcal{D}$ is in some sense equivalent to applying rewrites on a graph, it is hard to see how the data structure $\mathcal{D}$ would be usable for graph rewriting. This is precisely our next point.

In a valid non-empty $\mathcal{D}$, the events $\delta \in \mathcal{D}$ form a directed acyclic graph and therefore there must always be (at least) one “root” event $\delta_1 \in \mathcal{D}$ with no parents, $P(\delta_1) = \varnothing$. The event $\delta_1$ is thus a valid rewrite that can be applied to any graph.

For the applications of $\mathcal{D}$ that we consider, it will always be sufficient to have a unique root event $\delta_1$. Viewing $\delta_1$ as a rewrite $G_0 = \varnothing \to G_1$ that applies to the empty graph, we can understand it as injecting the input graph $G_1$ into $\mathcal{D}$.

Non-root events in $\mathcal{D}$, on the other hand, typically correspond to valid (semantics-preserving) rewrites in the GTS under consideration.

Consider a set of compatible events $D \in \Gamma(\mathcal{D})$. Define a topological ordering $\delta_1, \ldots, \delta_k$ of the events in $D$, i.e. if $\delta_j \in P(\delta_i)$ then $j < i$, so that every event appears after all of its parents.

Proposition 5.1 (Events as valid rewrites)

There are graphs $G_0, \ldots, G_k$ such that for all $1 \leqslant i \leqslant k$, the event $\delta_i$ defines a valid rewrite $r_i$ on $G_{i-1}$ and $G_i = r_i(G_{i-1})$.

Define the empty graph $G_0 = \varnothing$. The event $\delta_1$ has no parent and thus must have an empty vertex deletion set and glueing relation. It is thus a valid rewrite $r_1$ on $G_0$. Define $G_1 = r_1(G_0)$.

We can similarly define $G_i = r_i(G_{i-1})$ inductively for graphs $G_2, \ldots, G_k$ if we show for $2 \leqslant i \leqslant k$ that the $i$-th event $\delta_i$ defines a valid rewrite $r_i$ on $G_{i-1}$. The set of vertices in $G_{i-1}$ is the union of all vertices in the replacement graphs of $\delta_1, \ldots, \delta_{i-1}$ minus their vertex deletion sets:

$$V(G_{i-1}) = \left(\bigsqcup_{1 \leqslant j < i} V(\delta_j)\right) \setminus \left(\bigsqcup_{1 \leqslant j < i} V^-_j\right),$$

where $V^-_j$ is the vertex deletion set of $\delta_j$.

Now, by definition of the event $\delta_i$,

$$V^-_i \subseteq \bigsqcup_{1 \leqslant j < i} V(\delta_j).$$

On the other hand, because of the compatibility of all events in $D$, we know that $V^-_i \cap V^-_j = \varnothing$ for all $1 \leqslant j < i$. It thus follows that $V^-_i \subseteq V(G_{i-1})$. Hence $\delta_i$ is indeed a valid rewrite of $G_{i-1}$, and thus $r_i$ and $G_i$ are well-defined.

This construction is illustrated in the following figure for the compatible set $\lfloor \delta_5 \rfloor \cup \lfloor \delta_6 \rfloor$ of the previous example.

Applying events as rewrites in topological order. The result is a sequence of valid graph rewrites that start from the graph of $\delta_1$.

We now show that the graph $G_k$ is determined uniquely by $D \in \Gamma(\mathcal{D})$ and provide an explicit procedure to construct it.

Proposition 5.2 (Flat graph extraction)

The graph $G_k$ obtained by applying the set of compatible rewrites $D \in \Gamma(\mathcal{D})$ in topological order on the empty graph is independent of the topological ordering chosen.

Given the set of rewrites $D \subseteq \mathcal{D}$, the procedure FlattenHistory returns $G_k$ in time

$$O(m + n),$$

where $n$ and $m$ are the total number of vertices and edges across all replacement graphs in $D$.

Let us start with the definition of FlattenHistory:

 1def FlattenHistory(events: Set[Event]) -> Graph:
 2    all_ancestors = union([ancestors(d) for d in events])
 3    graph = Graph()
 4    for a in toposort(all_ancestors):
 5        add_graph(graph, replacement_graph(a))
 6        for (del_v, repl_v) in glueing_relation(a):
 7            move_edges(graph, repl_v, del_v)
 8        for v in deletion_set(a):
 9            remove_vertex(graph, v)
10    return graph

Here, toposort is a function that returns a topological ordering of the rewrites in $D$ according to the parent-child relation; add_graph inserts the graph passed as second argument into the graph passed as first argument; remove_vertex removes the vertex along with all incident edges from the graph; and move_edges moves all edges incident to its last argument (the deleted vertex) onto the preceding argument (the replacement vertex).

Correctness of FlattenHistory.  It is easy to see that if the graph $G_k$ that is obtained from applying the rewrites in order is independent of the choice of the topological ordering, then FlattenHistory is a correct implementation of the procedure, as it applies one rewrite at a time, in topological order.

Rewrite order invariance.  Consider two rewrites $\delta_1, \delta_2 \in D$ such that neither is an ancestor of the other. Let

$$D_{pre} = \lfloor \delta_1 \rfloor \cup \lfloor \delta_2 \rfloor \subseteq D$$

and proceed by induction over $D_{pre}$: assume the graph $G_{pre}$ obtained by applying the rewrites in $D_{pre}$ is invariant under the choice of the topological ordering of $D_{pre}$. Clearly this is true for $D_{pre} = \varnothing$. All that remains to be shown is that $G_{post}$, obtained by applying first $\delta_1$ then $\delta_2$ on $G_{pre}$, is equal to $G_{post}'$, obtained by applying the same rewrites in the reverse order on $G_{pre}$.

The vertex sets $V^-_1$ and $V^-_2$ of $\delta_1$ and $\delta_2$ must be disjoint because $\delta_1, \delta_2 \in D$ and hence are compatible. Furthermore, the replacement graphs (by definition of the rewrites) and the glueing relations of $\delta_1$ and $\delta_2$ (by rewrite compatibility) cannot contain vertices in $V^-_1 \sqcup V^-_2$. It follows that the order in which vertices of $V^-_1 \sqcup V^-_2$ are removed from $G_{pre}$ does not affect the graph $G_{post}$. Furthermore, vertex merging is a commutative operation, and so is disjoint graph addition. It follows that $G_{post} = G_{post}'$ and hence the result.

Runtime.  In total $n$ vertices and $m$ edges will be added to graph by add_graph on line 5. As a result, at most $n$ vertices can ever be deleted by line 9. Finally, while a naive implementation of move_edges on line 7 might result in the same edge being moved many times, all edge moves can be cached and only executed once at the end: notice that every time edges are moved away from a vertex, that vertex is subsequently removed from the graph. Instead of removing the vertex, keep it “hidden”, with a link to the vertex that the edges should be moved to. Once all graph operations are completed, traverse all hidden vertices and follow the links to the vertices that the edges should be moved to. This can be done in $O(n)$ time. Then move all edges to the correct vertex, in time $O(m)$, and delete the hidden vertices.
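As an illustration, here is a small runnable sketch of FlattenHistory for undirected graphs. It is deliberately the naive variant (edges are moved eagerly, so it does not achieve the $O(m + n)$ bound discussed above), and the encoding is an assumption of this sketch: events are tuples `(vertices, edges, deletion_set, glue)` keyed by hypothetical names, `parents_of` lists the parents of every event, and Python's standard `graphlib.TopologicalSorter` (Python 3.9+) plays the role of toposort.

```python
from graphlib import TopologicalSorter

def flatten_history(events, parents_of):
    """Apply all events in topological order, starting from the empty graph.

    events:     name -> (vertices, edges, deletion_set, glue), where glue is a
                list of (deleted_vertex, replacement_vertex) pairs
    parents_of: name -> set of parent names (must cover all ancestors)
    """
    vertices, edges = set(), set()
    for name in TopologicalSorter(parents_of).static_order():
        new_vertices, new_edges, deletion_set, glue = events[name]
        # add_graph: insert the replacement graph
        vertices |= set(new_vertices)
        edges |= {frozenset(e) for e in new_edges}
        # move_edges: redirect edges of deleted vertices to their replacement
        for del_v, repl_v in glue:
            moved = {e for e in edges if del_v in e}
            edges -= moved
            edges |= {frozenset(repl_v if x == del_v else x for x in e)
                      for e in moved}
        # remove_vertex: drop deleted vertices and any remaining incident edges
        vertices -= set(deletion_set)
        edges = {e for e in edges if not (e & set(deletion_set))}
    return vertices, edges

# Hypothetical example: a path a-b-c, then two independent rewrites.
events = {
    "d0": ({"a", "b", "c"}, {("a", "b"), ("b", "c")}, set(), []),
    "d1": ({"b2"}, set(), {"b"}, [("b", "b2")]),               # replace b by b2
    "d2": ({"c2", "d"}, {("c2", "d")}, {"c"}, [("c", "c2")]),  # replace c, grow path
}
parents_of = {"d0": set(), "d1": {"d0"}, "d2": {"d0"}}

vertices, edges = flatten_history(events, parents_of)
print(sorted(vertices))  # ['a', 'b2', 'c2', 'd']
```

The events `d1` and `d2` are not ancestors of one another, so by the order invariance argument above either topological order yields the same path a-b2-c2-d.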

Now instead of exploring the space of all graphs $\mathcal{G}$ reachable by repeatedly applying rewrites, we can explore the rewrite space by adding events to $\mathcal{D}$. Write $flat(D)$ for the graph returned by FlattenHistory on the set $D$. If $\mathcal{G}'$ is the set of all graphs returned by FlattenHistory on compatible events,

$$\mathcal{G}' = \{\ flat(D) \mid D \in \Gamma(\mathcal{D})\},$$

then Proposition 5.1 and Proposition 5.2 combined guarantee that $\mathcal{G}' \subseteq \mathcal{G}$. To conclude, we show that indeed any graph in $\mathcal{G}$ is in $\mathcal{G}'$, and hence $\mathcal{G} = \mathcal{G}'$.

Proposition 5.3 (Rewrites as valid events)

Let $D \in \Gamma(\mathcal{D})$ be a set of compatible events and $G = flat(D)$. Any rewrite $r$ that can be applied on $G$ defines an event $\delta = r$ that can be added to $\mathcal{D}$.

We recall that a rewrite $r = (G_R, V^-, \mu)$ defines an event $\delta = (G_R, V^-, \mu)$ that can be added to $\mathcal{D}$ if

  • $P(\delta) \subseteq \mathcal{D}$, and
  • all events in $P(\delta)$ are compatible.

By the rewrite definition, $V^- \subseteq V(G)$. It follows in particular that

$$V^- \subseteq \bigcup_{\delta' \in D} V(\delta'),$$

and thus $V^- \subseteq V(\mathcal{D})$, as well as $P(\delta) \subseteq D \subseteq \mathcal{D}$. Since $D$ is compatible, so is its subset $P(\delta)$. This proves both conditions.

Starting from the empty data structure $\mathcal{D} = \varnothing$, we can create a root event $\delta_0 = (G, \varnothing, \varnothing)$ with an empty vertex deletion set and glueing relation and add it to $\mathcal{D}$.

Clearly, $flat(\{\delta_0\}) = G$. We then apply Proposition 5.3 repeatedly. If we have a sequence $r_1, \ldots, r_k$ of valid rewrites that can be applied on $G$, then the sequence of events $\delta_1 = r_1, \ldots, \delta_k = r_k$ that it defines can also be added to $\mathcal{D}$ in this order. As we have further seen in Proposition 5.1 and Proposition 5.2, the graph $G_k$ that is obtained as a result of the rewrites is the same graph as the one returned by FlattenHistory called on $D = \mathcal{D}$.

In other words, we conclude that exploring the rewrite space on $G$ is fully equivalent to exploring the space of valid events starting from $\mathcal{D} = \{ \delta_0 \}$.

5.4. Exploration and extraction

In the previous section, we proposed a data structure $\mathcal{D}$ that is confluently persistent and can be used to explore the space of all possible transformations of a graph transformation system (GTS). We are now interested in using $\mathcal{D}$ to solve optimisation problems over the space of reachable graphs in the GTS. Following the blueprint of equality saturation (see section 5.2), we proceed in two phases:

  1. Exploration.  Given an input graph $G$, populate $\mathcal{D}$ with events that correspond to rewrites applicable to graphs reachable from $G$,
  2. Extraction.  Given a cost function $f$, extract the optimal graph in $\mathcal{D}$, i.e. the graph that is a flattening of a set of compatible events $D \subseteq \mathcal{D}$ and minimises $f$.

Each phase comes with its respective challenges, which we discuss in this section. We will first look at the exploration phase, which requires a way to find and construct new events $\delta$ that can be added to $\mathcal{D}$. We will consider the extraction phase in the second part of this section and see that the problem of optimisation over the power set $\mathcal{P}(\mathcal{D})$ can be reduced to a boolean satisfiability problem that admits simple cost functions in the use cases of interest.

There is an additional open question that we do not cover in this section and would merit a study of its own: the choice of heuristics that guide the exploration phase to ensure the “most interesting parts” of the GTS rewrite space are explored. We propose a very simple heuristic to this end in the benchmarks of section 5.5, but further investigations are called for.

Exploring the data structure with pattern matching

We established in the previous section that rewrites that apply on $G$ can equivalently be added as events to $\mathcal{D}$. In other words, a graph $G'$ is reachable from $G$ using the rewrites of a GTS if and only if there is a set of compatible events $D \subseteq \mathcal{D}$ such that $G'$ is the graph obtained from FlattenHistory on input $D$.

To expand $\mathcal{D}$ to a larger set $\mathcal{D}' \supseteq \mathcal{D}$, we must find all applicable rewrites on all graphs within $\mathcal{D}$. A naive solution would iterate over all subsets $D \subseteq \mathcal{D}$, check whether they form a compatible set of events, compute FlattenHistory if they do, and finally run pattern matching on the obtained graph to find the applicable rewrites. We can do better.

The idea is to traverse the set of events in $\mathcal{D}$ using the glueing relations $\mu$ that connect vertices between events. Define the function $\bar{\mu}: V(\mathcal{D}) \to \mathcal{P}(V(\mathcal{D}))$ that is the union of all glueing relations $\mu$ of events in $\mathcal{D}$:

$$\bar\mu(v) = \{\mu_c(v) \mid \delta_c = (G_c, V^-_c, \mu_c) \in P^{-1}(\delta_v)\},$$

where we write $\delta_v$ for the owner of $v$, i.e. the (unique) event $\delta_v \in \mathcal{D}$ such that $v \in V(\delta_v)$. We define the set $\mathcal{E}_D(v)$ of equivalent vertices of $v$ that are compatible with $D$ by applying $\bar\mu$ recursively, starting from $v$, and filtering out vertices whose owner is not compatible with $D$. It is easiest to formalise this definition using pseudocode for the EquivalentVertices procedure. Apart from $v$ itself, the vertices in $\mathcal{E}_D(v)$ are vertices of descendant events of $\delta_v$.

def EquivalentVertices(
    v: Vertex, events: Set[Event]
) -> Set[Vertex]:
    all_vertices = set({v})
    for w in mu_bar(v):
        new_events = union(events, {owner(w)})
        if AreCompatible(new_events):
            all_vertices = union(all_vertices,
                EquivalentVertices(w, new_events)
            )
    return all_vertices

Whilst it looks as though EquivalentVertices does not depend on $\mathcal{D}$, it does so through the calls to mu_bar.
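The following self-contained sketch makes that dependence explicit by passing the relevant parts of the data structure as arguments. The dictionary encoding (`owner_of`, `parents_of`, `deletion_set_of`, `glue_of`), the snake_case names and the four example events are assumptions made for illustration only.

```python
def ancestors(event, parents_of):
    """The event together with all its (transitive) parents."""
    result = {event}
    for p in parents_of[event]:
        result |= ancestors(p, parents_of)
    return result

def are_compatible(events, parents_of, deletion_set_of):
    """True if no vertex is deleted twice among the ancestors of `events`."""
    deleted = set()
    for e in set().union(*(ancestors(x, parents_of) for x in events)):
        if deleted & deletion_set_of[e]:
            return False
        deleted |= deletion_set_of[e]
    return True

def mu_bar(v, glue_of):
    # glue_of[c] maps vertices deleted by event c to their replacement in c.
    # Only children of the owner of v can have v as a key, so scanning all
    # events is equivalent to scanning the children of the owner.
    return {glue[v] for glue in glue_of.values() if v in glue}

def equivalent_vertices(v, events, owner_of, parents_of, deletion_set_of, glue_of):
    result = {v}
    for w in mu_bar(v, glue_of):
        new_events = events | {owner_of[w]}
        if are_compatible(new_events, parents_of, deletion_set_of):
            result |= equivalent_vertices(
                w, new_events, owner_of, parents_of, deletion_set_of, glue_of)
    return result

# Hypothetical history: d2 and d3 both replace vertex a, d4 replaces a2.
parents_of = {"d1": set(), "d2": {"d1"}, "d3": {"d1"}, "d4": {"d2"}}
deletion_set_of = {"d1": set(), "d2": {"a"}, "d3": {"a"}, "d4": {"a2"}}
glue_of = {"d1": {}, "d2": {"a": "a2"}, "d3": {"a": "a3"}, "d4": {"a2": "a4"}}
owner_of = {"a": "d1", "a2": "d2", "a3": "d3", "a4": "d4"}

args = (owner_of, parents_of, deletion_set_of, glue_of)
print(sorted(equivalent_vertices("a", {"d1"}, *args)))        # ['a', 'a2', 'a3', 'a4']
print(sorted(equivalent_vertices("a", {"d1", "d3"}, *args)))  # ['a', 'a3']
```

In the second call, committing to `d3` rules out `a2` and `a4`, since `d2` and `d3` both delete the vertex `a` and are therefore incompatible.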

We use EquivalentVertices to repeatedly extend a set of pinned vertices $\pi \subseteq V(\mathcal{D})$. A set of pinned vertices must satisfy two properties:

  • the set $D_\pi = \{\delta_v \mid v \in \pi \}$ is a set of compatible events,
  • there is no vertex $v \in \pi$ and event $\delta \in D$ such that $v \in V^-(\delta)$.

As a result, for the flattened graph $G = flat(D_\pi)$, it always holds that $\pi \subseteq V(G)$. Furthermore, if $G(\pi) \subseteq G$ is the subgraph of $G$ induced by $\pi$, then for any superset of pinned vertices $\pi' \supseteq \pi$, we have $G(\pi) \subseteq G'(\pi')$ where $G' = flat(D_{\pi'})$. In other words: extending a set of pinned vertices results in an extension of the flattened graph, a very useful property when pattern matching. This property follows from the second property above and the definition of FlattenHistory.

This gives us the following simple procedure for pattern matching:

  1. Start with a single pinned vertex $\pi = \{v\}$.
  2. Construct partial embeddings $P \rightharpoonup G(\pi)$ for patterns $P$.
  3. Pick a new vertex $v$ in $G = flat(D_\pi)$ but not in $G(\pi)$ (to which we would like to extend the domain of definition of our pattern embeddings).
  4. For all vertices $v' \in \mathcal{E}_D(v)$, build new pinned vertex sets $\pi' = \pi \cup \{v'\}$ and filter out the sets $\pi'$ that are not valid pinned vertex sets.
  5. Repeat steps 2–4 until all pattern embeddings have been found.

Step 1 is straightforward; notice that pattern matching must be started at a vertex in $V(\mathcal{D})$, so finding all patterns will require iterating over all choices of $v$. The pattern embeddings are constructed over iterations of step 2: each iteration can be seen as one step of the pattern matcher (for instance, as presented in chapter 4), extending the pattern embeddings that can be extended and discarding those that cannot. If all possible pattern embeddings have been discarded, then matching can be aborted for that set $\pi$.

How step 3 should be implemented depends on the types of graphs and patterns that are matched on. It is straightforward in the case of computation graphs with only linear values, i.e. hypergraphs with hyperedges that have directed, ordered endpoints and vertices that are incident to exactly one incoming and one outgoing edge. In that case, $v$ can always be chosen in such a way as to ensure progress on the next iteration of step 2, i.e. the domain of definition of at least one partial pattern embedding $P \rightharpoonup G(\pi)$ will be extended by one vertex. The text in the blue box below explains this case in more detail.

Step 4 produces all possible extensions of $\pi$ to pinned vertex sets $\pi'$ that include a descendant $v'$ of $v$ (or $v$ itself). All vertices in $\mathcal{E}_D(v)$ are in events compatible with $D$ by definition, so to check that $\pi'$ is a valid pinned vertex set, we only need to check the second property of pinned vertices. Let $P$ be a pattern and let $S$ be the set of all sets $\pi$ under consideration. Step 4 increments the sizes of all pinned vertex sets $\pi \in S$ whilst maintaining the following invariant.

Invariant for step 4.  If there is a superset $D' \supseteq D_\pi$ of compatible events such that $P$ embeds in $G' = flat(D')$, then there is a superset $\pi' \supseteq \pi$ of vertices such that $P$ embeds in $flat(D_{\pi'})$.

Finally, step 5 ensures the process is repeated until, for all partial pattern embeddings, either the domain of definition is complete, or the embedding of $P$ is not possible. Given that step 4 increments the size of the sets $\pi$ at each iteration, this will terminate as long as the vertex picking strategy of step 3 selects vertices that allow the partial pattern embeddings constructed in step 2 to be extended (or refuted). This is satisfied, for example, in the case of linear minIR graphs, as explained in the box.

Choosing the next vertex to pin in linear minIR (step 3).  Assuming patterns are connected, for any partial pattern embedding $P \rightharpoonup G(\pi)$ there is an edge $e_P \in E(P)$ with no image in $G(\pi)$ but such that at least one endvertex $v_P$ of $e_P$ has an image $v_G$ in $\pi$; say, $e_P$ is the outgoing edge of $v_P$. Let $v'_P$ be an endvertex of $e_P$ in $P$ that has no image in $G(\pi)$; say, it is the $i$-th outgoing endvertex of $e_P$ in $P$.

Then $v_P$ uniquely identifies an edge $e_G$ in $G = flat(D_\pi)$, the unique outgoing edge of $v_G$, which in turn uniquely identifies a vertex $v'_G \in V(G)$, the $i$-th outgoing endvertex of $e_G$. By choosing $v'_G$ in step 3, step 4 will create pinned vertex sets that include all possible vertices equivalent to $v'_G$, which are all vertices that $v_G$ might be connected to through its outgoing edge[1]. The next iteration of step 2 will then either extend the partial pattern embedding to $v'_P$ or conclude that an embedding of $P$ is not possible.

Using the approach just sketched, pattern matching can be performed on the persistent data structure $\mathcal{D}$. The runtime of steps 2 and 3 depends on the type of graphs and patterns that are matched on; these are, however, typical problems that appear in most instances of pattern matching, independently of the data structure $\mathcal{D}$ used here. A concrete approach to pattern matching and results for the graph types of interest to quantum compilation were presented in chapter 4.

The runtime of step 4 and the number of overall iterations of steps 2–4 required for pattern matching will depend on the number of events in $\mathcal{D}$ (AreCompatible runs in time linear in the number of ancestors), the number of equivalent vertices that successive rounds of step 4 return, and the types of patterns and pattern matching strategies.

Extraction using SAT

Moving on to the extraction phase, we are now interested in extracting the optimal graph from $\mathcal{D}$, according to some cost function of interest. Unlike exploring the “naive” search space of all graphs reachable in the GTS, the optimal solution within the persistent data structure $\mathcal{D}$ cannot simply be read out.

We showed in section 5.3 that finding an optimal graph $G'$ that is the result of a sequence of rewrites on an input graph $G$ is equivalent to finding an optimal set of compatible events $D \in \Gamma(\mathcal{D}) \subseteq \mathcal{P}(\mathcal{D})$; the optimal graph $G'$ is then recovered by taking $G' = flat(D)$.

There are $2^{|\mathcal{D}|}$ elements in $\mathcal{P}(\mathcal{D})$, which we encode as a boolean assignment problem by introducing a boolean variable $x_\delta$ for every event $\delta \in \mathcal{D}$. The set of events $D$ is then given by

$$D = \{\delta \in \mathcal{D} \mid x_\delta\} \in \mathcal{P}(\mathcal{D}).$$

We can constrain the boolean assignments to compatible sets $D$ by introducing a boolean formula

$$\neg (x_\delta \land x_{\delta'}) \tag{3}$$

for all distinct $\delta,\delta' \in \mathcal{D}$ whose vertex deletion sets intersect, i.e. $V^-(\delta) \cap V^-(\delta') \neq \varnothing$. Any assignment of $\{x_\delta \mid \delta \in \mathcal{D}\}$ that satisfies all constraints of this format defines a compatible set of events.

How many such pairs of events $(\delta,\delta')$ are there? By definition of parents, two events $\delta$ and $\delta'$ can only have overlapping vertex deletion sets if they share a parent. Assuming all events have at most $s$ children, ensuring $D$ is a set of compatible events requires at most $O(s^2 \cdot |\mathcal{D}|)$ constraints.

To further restrict to $D \in \Gamma(\mathcal{D})$, i.e. to sets of compatible events $D = \lfloor D \rfloor$ that contain all ancestors, we can add the further constraints: $\delta \in D$ implies $P(\delta) \subseteq D$. This introduces up to $s \cdot |\mathcal{D}|$ implication constraints

$$x_\delta \lor (\neg x_{\delta'}), \tag{4}$$

for all $\delta,\delta' \in \mathcal{D}$ such that $\delta' \in children(\delta)$.

For any set of events $\mathcal{D}$, the conjunction of all constraints presented above, i.e. the event compatibility constraints (3) and the parent-child relation constraints (4), defines a boolean satisfiability problem (SAT) with variables $x_\delta$. We have shown:

Proposition 5.4 (Extraction as SAT problem)

Consider a GTS with a constant upper bound $s$ on the number of rewrites that may overlap any previous rewrite.

The set of valid sequences of rewrites that can be extracted from a set of events $\mathcal{D}$ in the GTS is given by the set of satisfying assignments of a SAT problem [Cook, 1971: Stephen A. Cook. The complexity of theorem-proving procedures. In Proceedings of the third annual ACM symposium on Theory of computing - STOC ’71. ACM Press, 151--158. doi: 10.1145/800157.805047] [Moskewicz et al., 2001: Matthew W. Moskewicz, Conor F. Madigan, Ying Zhao, Lintao Zhang and Sharad Malik. Chaff: engineering an efficient SAT solver. In Proceedings of the 38th conference on Design automation - DAC ’01. ACM Press, 530--535. doi: 10.1145/378239.379017] with $|\mathcal{D}|$ variables and of size $O(|\mathcal{D}|)$.
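The encoding can be sketched as follows. This is an illustrative sketch only: the dictionary inputs, the DIMACS-style integer literals and the brute-force enumeration (standing in for a real SAT solver) are assumptions, and the pairwise scan over all events is the naive version rather than the sibling-only enumeration that gives the $O(s^2 \cdot |\mathcal{D}|)$ bound.

```python
from itertools import combinations, product

def build_cnf(parents_of, deletion_set_of):
    """Return (variable numbering, CNF clauses) for the extraction problem."""
    events = sorted(parents_of)
    var = {e: i + 1 for i, e in enumerate(events)}  # positive integer per event
    clauses = []
    # Compatibility constraints (3): not (x_a and x_b) if deletion sets overlap.
    for a, b in combinations(events, 2):
        if deletion_set_of[a] & deletion_set_of[b]:
            clauses.append([-var[a], -var[b]])
    # Parent-child constraints (4): x_parent or not x_child.
    for child, ps in parents_of.items():
        for p in ps:
            clauses.append([var[p], -var[child]])
    return var, clauses

def satisfying_sets(var, clauses):
    """Enumerate all satisfying assignments by brute force (tiny inputs only)."""
    names = list(var)
    for bits in product([False, True], repeat=len(names)):
        value = {var[n]: b for n, b in zip(names, bits)}
        if all(any(value[abs(lit)] == (lit > 0) for lit in c) for c in clauses):
            yield frozenset(n for n, b in zip(names, bits) if b)

# Hypothetical history: d2 and d3 both delete vertex a, d4 deletes b.
parents_of = {"d1": set(), "d2": {"d1"}, "d3": {"d1"}, "d4": {"d1"}}
deletion_set_of = {"d1": set(), "d2": {"a"}, "d3": {"a"}, "d4": {"b"}}

var, clauses = build_cnf(parents_of, deletion_set_of)
solutions = set(satisfying_sets(var, clauses))
print(len(clauses), len(solutions))  # 4 7
```

The seven solutions are the empty set and the six ancestor-closed sets containing `d1` that do not contain both `d2` and `d3`. The clause lists use the same literal convention as the DIMACS CNF format, so they could presumably be handed to an off-the-shelf solver instead of the enumeration.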

Finding the optimal assignment

We now have to find the optimal assignment among all satisfying assignments for the SAT problem given above. In the most general case where the cost function $f$ to be minimised is given as a black box oracle on the graph $G'$, i.e. on the flattened history of the solution set $D \subseteq \mathcal{D}$, this optimisation problem is hard[2].

However, if $f$ can be expressed as a function of $x_\delta$ instead of the flattened history $G' = flat(D)$, then the ‘hardness’ can be encapsulated within an instance of an SMT problem (satisfiability modulo theories [Nieuwenhuis and Oliveras, 2006: Robert Nieuwenhuis and Albert Oliveras. On SAT Modulo Theories and Optimization Problems. In Theory and Applications of Satisfiability Testing - SAT 2006. Springer Berlin Heidelberg, 156--169. doi: 10.1007/11814948_18] [Barrett and Tinelli, 2018: Clark Barrett and Cesare Tinelli. Satisfiability Modulo Theories]), a well-studied generalisation of SAT problems for which highly optimised solvers exist [de Moura and Bjørner, 2008: Leonardo de Moura and Nikolaj Bjørner. Z3: An Efficient SMT Solver. In Tools and Algorithms for the Construction and Analysis of Systems. Springer Berlin Heidelberg, 337--340. doi: 10.1007/978-3-540-78800-3_24] [Sebastiani and Trentin, 2015: Roberto Sebastiani and Patrick Trentin. OptiMathSAT: A Tool for Optimization Modulo Theories. In Computer Aided Verification. Springer International Publishing, 447--454. doi: 10.1007/978-3-319-21690-4_27]. A class of cost functions for which the SMT encoding of the optimisation problem becomes particularly simple are local cost functions:

Definition 5.1 (Local cost function)

A cost function $f$ on graphs is local if for all rewrites $r$ there is a cost $\Delta f_r$ such that for all graphs $G$ that $r$ applies to

$$f(r(G)) = f(G) + \Delta f_r.$$

The cost $\Delta f_r$ of a rewrite $r$ also immediately defines a cost for the event $\delta = r$ that $r$ defines. We can thus associate a cost $\Delta f_\delta$ with each event $\delta \in \mathcal{D}$, given by the cost of any of the rewrites that $\delta$ defines.
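With a local cost function, the cost of a flattened graph is the sum of the costs $\Delta f_\delta$ of the selected events, so extraction becomes a linear objective over the variables $x_\delta$. The sketch below illustrates this on a toy history by exhaustive search rather than with an SMT solver. All names and the example data are hypothetical, and requiring the root event to be selected is an assumption of this sketch (otherwise the empty graph would trivially win).

```python
from itertools import combinations

def is_valid(selection, parents_of, deletion_set_of):
    """Ancestor-closed and with pairwise disjoint deletion sets."""
    closed = all(parents_of[e] <= selection for e in selection)
    disjoint = all(not (deletion_set_of[a] & deletion_set_of[b])
                   for a, b in combinations(sorted(selection), 2))
    return closed and disjoint

def extract_best(root, parents_of, deletion_set_of, delta_f):
    """Exhaustively search valid selections containing `root` (tiny inputs only)."""
    best, best_cost = None, None
    others = sorted(e for e in parents_of if e != root)
    for k in range(len(others) + 1):
        for chosen in combinations(others, k):
            selection = {root, *chosen}
            if not is_valid(selection, parents_of, deletion_set_of):
                continue
            total = sum(delta_f[e] for e in selection)  # sum of local costs
            if best_cost is None or total < best_cost:
                best, best_cost = selection, total
    return best, best_cost

# Hypothetical history: d2 and d3 conflict on vertex a; d5 builds on d2.
parents_of = {"d1": set(), "d2": {"d1"}, "d3": {"d1"}, "d5": {"d2"}}
deletion_set_of = {"d1": set(), "d2": {"a"}, "d3": {"a"}, "d5": {"x"}}
delta_f = {"d1": 3, "d2": -1, "d3": -2, "d5": -2}  # root cost is f of the input graph

best, best_cost = extract_best("d1", parents_of, deletion_set_of, delta_f)
print(sorted(best), best_cost)  # ['d1', 'd2', 'd5'] 0
```

Note that the greedy choice `d3` (the single best rewrite) is not part of the optimum here, which is why extraction is phrased as a global optimisation problem.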

A class of local cost functions often used in the context of the optimisation of computation graphs are functions of the type

$$f(G) = \sum_{v \in V(G)} w(v)$$

for some vertex weight function $w$, i.e. cost functions that can be expressed as sums over the costs $w(\cdot)$ associated to individual vertices in $G$[3]. Indeed, it is easy to see that in this case we can write

$$\begin{aligned}f(r(G)) &= \sum_{v \in V(r(G))} w(v)\\&= \sum_{v\in V(G)} w(v) + \underbrace{\sum_{v \in V_R} w(v) - \sum_{v \in V^-} w(v)}_{:= \Delta f_r}\\&= f(G) + \Delta f_r,\end{aligned}$$

where $V^-$ and $V_R$ are the vertex deletion set and the vertex set of the replacement graph of $r$ respectively.
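As a quick numerical check of this identity, consider a gate-count cost in which only `CX` vertices have weight 1. The vertex encoding as `(id, gate)` pairs, the helper names and the example rewrite are hypothetical and serve only to illustrate the arithmetic.

```python
def cost(vertices, w):
    """f(G) as a sum of vertex weights."""
    return sum(w(v) for v in vertices)

def delta_cost(replacement_vertices, deletion_set, w):
    """Delta f_r: weight added by the replacement minus weight deleted."""
    return cost(replacement_vertices, w) - cost(deletion_set, w)

def apply_to_vertices(vertices, replacement_vertices, deletion_set):
    """Vertex set of r(G): remove V^-, add the replacement vertices."""
    return (set(vertices) - set(deletion_set)) | set(replacement_vertices)

def w(v):
    # Weight 1 for CX vertices and 0 otherwise (a two-qubit gate count).
    return 1 if v[1] == "CX" else 0

# Vertices are (id, gate) pairs; the rewrite replaces two CX vertices by an H.
G = {(0, "CX"), (1, "H"), (2, "CX"), (3, "T")}
V_minus = {(0, "CX"), (2, "CX")}
V_R = {(4, "H")}

before = cost(G, w)
after = cost(apply_to_vertices(G, V_R, V_minus), w)
delta = delta_cost(V_R, V_minus, w)
print(before, delta, after)  # 2 -2 0
```

The value of `delta` depends only on the rewrite, not on the graph it is applied to, which is exactly the locality property of Definition 5.1.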

As discussed in section 2.2, many of the most widely used cost functions in quantum compilation are local, as the cost of a quantum computation is often estimated by the required number of instances of the most expensive gate type (such as \texttt{CX} gates on noisy devices, or \texttt{T} gates for hardware with built-in fault tolerance protocols).

In these cases, the cost function is integer valued and the extraction problem is indeed often sparse:

Definition 5.2 (Sparse cost function)

The local cost function $f$ is said to be $\varepsilon$-sparse on $\mathcal{D}$ if $\big|\{\delta \in \mathcal{D}\,|\,\Delta f_\delta = 0 \}\big| \geq (1 - \varepsilon) |\mathcal{D}|.$

In the case of $\varepsilon$-sparse local cost functions, the SAT problem on $\mathcal{D}$ can be simplified to only include

$$\mathcal{D}_{\neq 0} = \{\delta \in \mathcal{D} \mid \Delta f_\delta \neq 0\}$$

by repeatedly applying the following constraint simplification rules on any $\delta_0 \in \mathcal{D}$ such that $\Delta f_{\delta_0} = 0$:

  • for every parent $\delta_p \in parents(\delta_0)$ and child $\delta_c \in children(\delta_0)$, remove the parent-child constraints between $\delta_p$ and $\delta_0$ and between $\delta_0$ and $\delta_c$. Insert in their place a parent-child constraint between $\delta_p$ and $\delta_c$.
  • for every non-compatible sibling event $\delta_s \in \mathcal{D}, \delta_s \neq \delta_0$, remove the compatibility constraint between $\delta_0$ and $\delta_s$. Insert in its place a compatibility constraint between $\delta_s$ and $\delta_c$ for all $\delta_c \in children(\delta_0)$.

This reduces the SAT or SMT problem to a problem with $|\mathcal D_{\neq 0}| \leqslant \varepsilon |\mathcal{D}|$ variables and at most $O(\min(|\mathcal{D}|, \varepsilon^2|\mathcal{D}|^2))$ constraints.
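The two simplification rules can be sketched on a toy constraint set as follows. This is an illustration under stated assumptions, not the thesis implementation: parent-child constraints are modelled as `(parent, child)` pairs, compatibility constraints as two-element frozensets of mutually exclusive events, and exclusions are propagated to the children of the eliminated event, which is our reading of the second rule. All names are hypothetical.

```python
def eliminate_zero_cost(events, delta_f, parent_of, exclusive):
    """Remove all zero-cost events from a constraint set.

    events:    list of event identifiers
    delta_f:   dict mapping each event to its cost Delta f
    parent_of: set of (parent, child) pairs
    exclusive: set of frozenset({a, b}) for mutually exclusive events a, b
    """
    parent_of, exclusive = set(parent_of), set(exclusive)
    for d0 in [e for e in events if delta_f[e] == 0]:
        parents = {p for (p, c) in parent_of if c == d0}
        children = {c for (p, c) in parent_of if p == d0}
        siblings = {next(iter(pair - {d0})) for pair in exclusive if d0 in pair}
        # Rule 1: bypass d0 by linking each of its parents to each of its children.
        parent_of = {(p, c) for (p, c) in parent_of if d0 not in (p, c)}
        parent_of |= {(p, c) for p in parents for c in children}
        # Rule 2: events excluded by d0 are now excluded by the children of d0.
        exclusive = {pair for pair in exclusive if d0 not in pair}
        exclusive |= {frozenset({s, c}) for s in siblings for c in children}
    kept = [e for e in events if delta_f[e] != 0]
    return kept, parent_of, exclusive

# Toy history: a -> z -> b, where z has zero cost and is incompatible with s.
events = ["a", "z", "b", "s"]
delta_f = {"a": -1, "z": 0, "b": -2, "s": -1}
kept, parents_after, exclusive_after = eliminate_zero_cost(
    events, delta_f, {("a", "z"), ("z", "b")}, {frozenset({"z", "s"})}
)
```

After elimination, the remaining problem mentions only events with non-zero cost, at the price of potentially more constraints per remaining event.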

With the completion of this section, we have described an equivalent computation on $\mathcal{D}$ for every step of a GTS-based optimisation problem:

  1. a rewrite that can be applied on a graph $G$ can be added as an event to $\mathcal{D}$,
  2. a graph $G'$ that results from a sequence of rewrites can be recovered from $\mathcal{D}$ using FlattenHistory,
  3. the set of all graphs reachable from events in $\mathcal{D}$ can be expressed as a SAT problem; depending on the cost function, the optimisation over that space can then take the form of an SMT problem.

In essence, using the confluently persistent data structure $\mathcal{D}$ we replace a naive, exhaustive search over the space $\mathcal{G}$ of all graphs reachable in the GTS with a SAT (or SMT) problem – solvable using highly optimised dedicated solvers that could in principle handle search spaces with up to millions of possible rewrites Zulkos., 2018Edward Zulkoski. 2018. Understanding and Enhancing CDCL-based SAT Solvers. PhD Thesis. University of Waterloo.


  1. To realise this, notice that all vertices equivalent to $v_G'$ are vertices that will be merged with $v_G'$. Hence, they will all be attached to the outgoing edge of $v_G$ at its $i$-th outgoing endvertex. ↩︎

  2. Hardness can be seen by considering the special case of the extraction problem in which all events are compatible and no two events have a parent-child relation: then there are no constraints on the solution space and the optimisation problem requires finding the minimum of an arbitrary oracle over $2^{|\mathcal{D}|}$ inputs. ↩︎

  3. A similar argument also applies to cost functions that sum over graph edges, as would be the case in minIR, where operations are modelled as hyperedges. ↩︎

5.5. Bounding the search space size

We show in this section that under some assumptions on the GTSs that hold in the use cases of interest to quantum compilation, there is a provable gap between the size of the search space of all reachable graphs in the GTS and the size of the corresponding confluently persistent data structure $\mathcal{D}$.

Let us introduce first the notion of overwriting rewrites.

Definition 5.3 (Overwriting rewrite)

For two rewrites $r_1$ and $r_2$, we say that $r_2$ overwrites $r_1$, written $r_1 \twoheadrightarrow r_2$, if the deletion set $V^-_2$ of $r_2$ includes a vertex of the vertex set $V_1$ of the replacement graph of $r_1$:

$$V^-_2 \cap V_1 \neq \varnothing.$$

The definition can identically be applied to events. In this case, the overwriting relation is precisely the parent-child relation: the set of all events that $\delta \in \mathcal{D}$ overwrites is by definition the set of parents $P(\delta)$ of $\delta$.

Our argument relies on the comparison of asymptotic bounds for the sizes of two sets $\mathcal{G}_\Delta$ and $\mathcal{D}_\Delta$, which we now define. Consider a GTS and an input graph $G$. A graph $G'$ is reachable from $G$ within depth $\Delta \geqslant 0$ if there is a sequence of rewrites $r_1, \dots, r_\ell$ in the GTS from $G$ to $G'$ such that all subsequences $r_{\beta_1}, \dots, r_{\beta_k}$ formed of overwriting rewrites $r_{\beta_i} \twoheadrightarrow r_{\beta_{i + 1}}$ have length $k \leqslant \Delta$.

The set $\mathcal{G}_\Delta$ is the set of all graphs reachable from the input $G$ within depth $\Delta$. We derive:

  1. a lower bound on the size $|\mathcal{G}_\Delta|$ of this space of reachable graphs, and
  2. an upper bound on the size of the equivalent confluently persistent data structure $\mathcal{D}_\Delta$, i.e. such that $\mathcal{G}_\Delta = \{ flat(D) \mid D \in \Gamma(\mathcal{D}_\Delta) \}.$
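To make the notion of depth concrete, the sketch below computes the length of the longest chain of overwriting rewrites in a given rewrite sequence. It assumes, for illustration only, that each rewrite is described by the set of vertex identifiers it deletes and the set it adds, and that vertex identifiers are never reused; the dictionary keys are hypothetical.

```python
def overwrites(r1, r2):
    """True if r2 overwrites r1, i.e. r2 deletes a vertex added by r1."""
    return bool(r2["deleted"] & r1["added"])

def overwrite_depth(rewrites):
    """Length of the longest subsequence in which each rewrite overwrites the previous one."""
    longest = [1] * len(rewrites)  # longest[j]: longest chain ending at rewrite j
    for j in range(len(rewrites)):
        for i in range(j):
            if overwrites(rewrites[i], rewrites[j]):
                longest[j] = max(longest[j], longest[i] + 1)
    return max(longest, default=0)

sequence = [
    {"deleted": {"a"}, "added": {"b"}},  # r1
    {"deleted": {"c"}, "added": {"d"}},  # r2, independent of r1
    {"deleted": {"b"}, "added": {"e"}},  # r3, overwrites r1
]
depth = overwrite_depth(sequence)  # the chain r1, r3 has length 2
```

In this example the sequence contains three rewrites, yet its result is reachable within depth 2, because `r2` does not take part in any overwriting chain.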

In order to obtain bounds, we will introduce hypotheses that the GTSs must satisfy. Throughout this section, we will illustrate and motivate the restrictions that they impose in the following two use cases.

Use case 1: $\zeta$-complete GTS #

The first GTS we consider can be defined on any graph domain that has a notion of graph size (e.g. based on number of nodes, number of edges, etc.¹) and for any graph size $\zeta$. The GTS is such that for any subgraph $H \subseteq G$ of size $|H| = \zeta$, there is at least one transformation rule in the GTS that matches $H$. We will call this case CompleteGTS.

This is the use case of quantum superoptimisation discussed in section 3.1 and used for benchmarking in section 4.7. In those cases, the transformation rules are obtained by enumerating all small circuits up to a certain size $\zeta$, thus guaranteeing that any subcircuit of size $\zeta$ will be matched by the GTS.

Note that there is also an (obvious) upper bound on the number of transformation rules that can match on any given subgraph: the total number of transformation rules in the GTS.

Use case 2: single-rule GTS in a uniform domain #

At the other extreme of the GTS spectrum, we can consider a GTS made of a single (arbitrary) transformation rule. In this case, we require that graphs are drawn from a domain uniformly at random, so that for any subgraph $H \subseteq G$, all patterns of size $|H|$ are equally likely. We will call this case SingleRuleGTS.

In this case, we will not show that our hypotheses hold for all inputs, but rather that they hold with high probability. We will phrase our statements as a function of $\epsilon > 0$ and will require that they hold with probability $1 - \epsilon$ for randomly drawn $G$.

This regime is interesting as it is the simplest instance of problem domains for which few assumptions can be made about the GTS themselves, but all inputs are expected to be equally likely.

Lower bound on the naive search tree #

The event history of the set of graphs $\mathcal{G}_\Delta$ defines a tree $T_\Delta$, whose nodes are the graphs in $\mathcal{G}_\Delta$ and where $G_p \in \mathcal{G}_\Delta$ is the parent of $G_c \in \mathcal{G}_\Delta$ if there is a rewrite $r$ rewriting $G_p$ to $G_c$ in the GTS. Paths in $T_\Delta$ are sequences of rewrites. We call $T_\Delta$ the naive search tree of the GTS. We wish to derive a lower bound for $|T_\Delta|$.

Graph partitioning #

For fixed search depth $\Delta$, let $n > 0$ be the largest integer such that for all graphs $G' \in T_\Delta$, there exist disjoint subgraphs $\Pi_1, \ldots, \Pi_n$ of $G'$ that satisfy the following property for all $1 \leq i \leq n$:

there exists a rewrite in the GTS that can be applied to $\Pi_i$.

Hypothesis 1 (Linear scaling of $n$)

For a fixed GTS, a fixed depth $\Delta$ and a family of input graphs $G$, we have the scaling $n = \Theta(|G|)$.

We conjecture that this scaling holds for many GTSs of interest:

Proposition 5.5
Hypothesis 1 holds for CompleteGTS and for SingleRuleGTS probabilistically.

In CompleteGTS, it suffices to partition any input $G$ into $n = \lfloor \frac{|G|}\zeta \rfloor = \Theta(|G|)$ disjoint subgraphs of size at least $\zeta$. Each subgraph will match a rule of the GTS by definition.

For SingleRuleGTS, let $\epsilon > 0$. Let $|L|$ be the size of the left hand side $L$ of the GTS rule. By assumption, for any subgraph $H$ of size $|L|$ of an input $G$, there is a constant probability $p$ that the rule matches $H$. For a subgraph $H$ of size $k |L|$, the probability of the rule not matching in $H$ is $(1 - p)^k$. Picking

$$k > \frac1{1 + \delta}\frac{\ln \epsilon + \ln {\frac{|L|}{|G|}}}{\ln (1 - p)}$$

for some $\delta > 0$ ensures that whenever $\delta > - \frac{\ln k}{k \ln (1 - p)}$ (i.e. for $k$ large enough),

$$\begin{aligned} &k - \frac {\ln k}{\ln (1 - p)} > (1 + \delta)k > \frac{\ln \epsilon + \ln {\frac{|L|}{|G|}}}{\ln (1 - p)}\\ \Leftrightarrow\quad &k > \frac{\ln{\frac \epsilon n}}{\ln (1 - p)}, \end{aligned}$$

where $n = \frac{|G|}{k |L|}$ was chosen. It follows that a partition of $G$ into $\lfloor n \rfloor$ disjoint subgraphs of size at least $k |L|$ satisfies the hypothesis with probability $1 - \epsilon$.
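The final condition $k > \ln(\epsilon / n) / \ln(1 - p)$ is equivalent to $n (1 - p)^k < \epsilon$, i.e. a union bound over the $n$ blocks of the partition. The sketch below searches numerically for the smallest integer $k$ satisfying the non-strict version of this inequality, treating each block of size $k|L|$ as $k$ independent trials as in the argument above. The function name and the example parameters are hypothetical.

```python
def smallest_block_factor(graph_size, lhs_size, p, eps):
    """Smallest integer k >= 1 such that n * (1 - p)**k <= eps, with n = |G| / (k |L|)."""
    if not 0 < p < 1:
        raise ValueError("the match probability p must lie strictly between 0 and 1")
    k = 1
    # The left hand side decreases strictly in k, so the loop terminates.
    while (graph_size / (k * lhs_size)) * (1 - p) ** k > eps:
        k += 1
    return k

# Example values, assumed for illustration only.
graph_size, lhs_size, p, eps = 10_000, 4, 0.1, 0.01
k = smallest_block_factor(graph_size, lhs_size, p, eps)
n = graph_size / (k * lhs_size)
print(f"k = {k}, n = {n:.1f}, failure bound = {n * (1 - p) ** k:.2e}")
```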

Lower bound on $|T_\Delta|$ #

Fix the tree depth $\Delta$ and the GTS. Any rewrite from the GTS removes at most a constant number $K$ of vertices from the graph it applies on. Thus, any graph in $T_\Delta$ is at least of size $(1 - K\Delta)|G| = \Theta(|G|)$. Let $n'$ be the smallest value of $n$ for a graph $G_{min} \in \mathcal G_\Delta$. Whenever hypothesis 1 applies, we have $n' = \Theta(|G_{min}|) = \Theta(|G|)$.

For each $G' \in T_\Delta$ and $\Pi_i$, pick a rewrite $r_i(G')$ that applies to $\Pi_i$ and let $R_\Delta$ be the set of all such rewrites

$$R_\Delta = \{ r_i(G') \mid 1 \leq i \leq n', G' \in T_\Delta \}.$$

We can consider the subtree $T'_\Delta \subseteq T_\Delta$ that only contains graphs obtained by applying rewrites in $R_\Delta$.

For $\Delta = 1$, the search tree $T'_\Delta$ will contain $2^{n'}$ graphs: for each subgraph $\Pi_i$ of the input graph $G$, we can choose to either apply $r_i(G)$ or not².

By repeating $\Delta$ times the search tree of size $2^{n'}$ for $\Delta = 1$, we obtain a lower bound

$$|\mathcal{G}_\Delta| = |T_\Delta| \geqslant |T'_\Delta| = (2^{n'})^\Delta.$$

We frame this result as the following proposition.

Proposition 5.6 (Lower bound for $|\mathcal{G}_\Delta|$)
The naive search tree size $|\mathcal{G}_\Delta|$ is in $\Omega(2^{\Delta \cdot n'})$.

As a result, for any GTS satisfying Hypothesis 1 the size of $\mathcal G_\Delta$ grows at least exponentially with input graph size $|G|$ and search depth $\Delta$.
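The base case $\Delta = 1$ of this counting argument can be checked by brute force on a toy model. In the sketch below, a graph is abstracted to a tuple recording, for each of its disjoint blocks $\Pi_i$, whether the chosen rewrite was applied; this abstraction and the function name are assumptions made for illustration.

```python
from itertools import product

def depth_one_graphs(n_blocks):
    """All outcomes of independently rewriting or keeping each of n disjoint blocks."""
    return set(product(("original", "rewritten"), repeat=n_blocks))

# Every subset of the n' disjoint rewrites yields a distinct graph: 2**n' in total.
for n_blocks in range(1, 8):
    assert len(depth_one_graphs(n_blocks)) == 2 ** n_blocks
```

Already for a few dozen disjoint rewrite sites, enumerating these outcomes explicitly becomes impractical.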

Upper bound on the factorised search space #

Consider two rewrites $r_1$ and $r_2$. If neither overwrites the other, i.e. $r_1 \not\twoheadrightarrow r_2$ and $r_2 \not\twoheadrightarrow r_1$, then the order in which they are applied is irrelevant (see also Proposition 5.2). The persistent data structure $\mathcal{D}$ uses this symmetry explicitly when exploring the set of reachable graphs.

This drastically reduces the size of the event history of $\mathcal{D}_\Delta$. The event history defines a directed acyclic graph $F_\Delta$ that is the equivalent for $\mathcal{D}_\Delta$ to the naive search tree $T_\Delta$ of $\mathcal{G}_\Delta$. The vertices of $F_\Delta$ are the flattened histories of events in $\mathcal{D}_\Delta$:

$$V(F_\Delta) = \{flat(\{\,\delta\,\}) \mid \delta \in \mathcal{D}_\Delta\},$$

with an edge from $flat(\{\,\delta_p\,\})$ to $flat(\{\,\delta_c\,\})$ if there is a parent-child relation $\delta_p \in P(\delta_c)$. We call $F_\Delta$ the factorised search space of $\mathcal{D}_\Delta$.

By construction, any graph $G' \in V(T_\Delta)$ in the naive search tree maps injectively to a subgraph $S \subseteq F_\Delta$ of the factorised search space, given by the subgraph of $F_\Delta$ induced by the rewrites on the path from $G$ to $G'$ in $T_\Delta$.

Whereas our earlier discussion was focused on proving a lower bound for the size of the search tree, we now show an upper bound on the number of graphs $|F_\Delta|$ in the factorised search space.

Graph covering #

Instead of considering partitions of graphs in $T_\Delta$ as we did above, we now consider coverings of graphs $G'$ in $F_\Delta$, i.e. a set of subgraphs $\Gamma_1, \ldots, \Gamma_m$ whose union is $G'$ but that need not be disjoint.

Let $m$, $s$ and $\gamma$ be parameters and fix a covering $\Gamma_1, \ldots, \Gamma_m$ for each graph $G' \in V(F_\Delta)$ such that:

  • for all $G'$ and all $1 \leqslant i \leqslant m$, there are at most $s$ applicable rewrites to the $i$-th covering set $\Gamma_i$ of $G'$. Furthermore, all rewrites within $\Gamma_i$ are mutually exclusive, i.e. they modify a shared subgraph so that it is never possible to apply more than one rewrite among the (up to $s$) applicable ones³;
  • for all $G'$ and for all rewrites $r$ that apply to $G'$, there is $1 \leqslant i \leqslant m$ such that $r$ applies to $\Gamma_i$ (i.e. the matching subgraph of $r$ is fully contained within one of the covering sets). Furthermore, the matching subgraph of $r$ overlaps with at most $\gamma$ distinct $\Gamma_i$;
  • for all rewrites $r$ that apply to the covering set $\Gamma_i$ of a graph $G'$, the image $r(\Gamma_i) \subseteq \Gamma_i'$ must be a subgraph of the $i$-th covering subgraph $\Gamma_i'$ of $r(G')$.

The first condition is satisfied whenever the size of the covering sets can be bounded: in that case $s$ can be chosen based on the number of distinct subgraphs that can be contained in a covering set and the number of rules that can apply to each. The second condition is related to the connectivity between the covering sets: $\gamma$ can thus often be derived by considering how many neighbours a covering set has, and how many of those neighbours a match of a GTS rule can span.

The third condition above can be understood as “rewrites must preserve the coverings”. In other words, the coverings are chosen such that a graph mutation produced by the application of a rewrite $r \in E(F_\Delta)$ on $G'$ is always contained within a single covering subgraph of $r(G')$.

Hypothesis 2 (Linear scaling of $m$ and constant $s, \gamma$)

For a fixed GTS, a fixed depth $\Delta$ and a family of input graphs $G$, we have the scaling $m = \Theta(|G|)$ and $s = \Theta(1)$ and $\gamma = \Theta(1)$.

These conditions along with Hypothesis 2 are somewhat restrictive and future work should explore how to relax them. For our use cases CompleteGTS and SingleRuleGTS, we restrict our considerations to the special case where the graph domain is quantum circuits. We make the following further simplifying assumptions (similar results can be obtained with variations on these assumptions):

  • transformation rules have two-qubit circuits as left and right hand sides, and
  • the number of qubits on the inputs is fixed (i.e. the number of gates on each qubit scales with circuit size).

Define $\ell$ to be the largest number of gates on any one qubit in the left hand sides of the GTS transformation rules. Consider a partition of the gates on each qubit into sequences of $\ell$ gates. We can obtain a covering of a quantum circuit $G$ by considering covering sets $\Gamma_1, \ldots, \Gamma_m$ defined for all $1 \leq i \leq m$ such that for all matches $H$ of a left hand side of a transformation rule of the GTS, $H \subseteq \Gamma_i$ if and only if there is a gate $v$ in $H$ such that $v$ is in the $i$-th sequence of $\ell$ gates on some qubit in $G$. Imposing the condition that rewrites must preserve coverings, the covering of the input fixes the covering of all reachable graphs $G'$ in the GTS.

Proposition 5.7
Restrict the graph domain to quantum circuits, the GTS to two-qubit rules and the inputs to a fixed number of qubits. Then hypothesis 2 holds for CompleteGTS and for SingleRuleGTS.

Let $G$ be the input circuit with $q$ qubits. Consider the covering $\Gamma_1, \dots, \Gamma_m$ of $G$ as constructed above. By construction $m = \lceil N / \ell \rceil = \Theta(|G|)$, where $N$ is the maximum number of gates on a qubit of $G$.

The covering set $\Gamma_i$ contains the set $V_i$ of gates composed of the $i$-th sequence of $\ell$ gates for each qubit in $G$. Furthermore, for all $v \in V_i$, if $H$ is a match of a two-qubit rule that contains $v$, then $H$ may contain at most $2\ell - 1$ other gates. Hence by construction, $|\Gamma_i| \leqslant 2\ell|V_i| = 2q\ell^2 = \Theta(1)$. This is a constant and thus there is a constant $s = \Theta(1)$ such that for all $1 \leqslant i \leqslant m$ there are at most $s$ matches of a two-qubit rule that intersect $\Gamma_i$.

Finally, any match $H$ spans two qubits; the gates on each qubit (at most $\ell$) may belong to at most two distinct sequences of $\ell$ gates of that qubit. Thus, any match $H$ spans at most $\gamma = 4$ distinct covering sets. These arguments made no assumption on properties of the rule set and thus apply equally to CompleteGTS and SingleRuleGTS.

Upper bound on $|F_\Delta|$ #

The preservation of coverings under rewrites allows us to consider a covering of $V(F_\Delta)$: for each $1 \leqslant i \leqslant m$, let $F_\Delta^{(i)} \subseteq V(F_\Delta)$ be the set of graphs in $V(F_\Delta)$ that are the result of a rewrite of its $i$-th covering subgraph. Every graph in $V(F_\Delta)$ is the result of a rewrite on some covering subgraph $i$, or is the input graph $G$. So, from a bound $U_\Delta \geqslant |F_\Delta^{(i)}|$ for all $1 \leqslant i \leqslant m$, we can obtain a bound

$$|V(F_\Delta)| \leqslant 1 + m \cdot U_\Delta.$$

The bound $U_\Delta$ can be obtained recursively: $U_1 = s$ upper bounds by definition the number of rewrites in any covering subgraph of the root graph $G$, and thus the number of graphs in $F_1^{(i)}$. We then proceed by induction for $1 < \delta \leqslant \Delta$.

A rewrite $r \in E(F_\Delta)$ overlaps with at most $\gamma$ other covering subgraphs. It can overwrite at most one previous rewrite for each subgraph, and thus will have at most $\gamma$ parent graphs in sets $F_{\delta-1}^{(i_1)}, \cdots, F_{\delta-1}^{(i_\gamma)}$. Each of the $\gamma$ sets is of size at most $U_{\delta - 1}$. Furthermore, there are at most $s$ rewrites in any covering subgraph. We thus obtain the recursion:

$$U_\delta \leqslant s \cdot U_{\delta - 1}^\gamma.$$

Unrolling the recursion, we can write this as

$$U_\delta \leqslant s^{1 + \gamma + \gamma^2 + \dots + \gamma^{\delta - 1}} = s^{\frac{\gamma^\delta - 1}{\gamma - 1}} = s^{\Theta(\gamma^{\delta - 1})}.$$

Recalling that by construction $|F_\Delta| = |\mathcal{D}_\Delta|$, we obtain:

Proposition 5.8 (Upper bound for $|\mathcal{D}_\Delta|$)

The factorised search space size $|\mathcal{D}_\Delta|$ is in $m \cdot s^{\Theta(\gamma^{\Delta - 1})}$.
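The unrolled form of the recursion can be checked numerically. The sketch below iterates $U_\delta = s \cdot U_{\delta-1}^\gamma$ with equality (the worst case) and compares it with the closed form $s^{1 + \gamma + \dots + \gamma^{\delta - 1}}$; the function names and the values of $s$, $\gamma$ and $m$ are arbitrary choices for illustration.

```python
def recursive_bound(s, gamma, depth):
    """Worst case of the recursion: U_1 = s and U_d = s * U_{d-1} ** gamma."""
    u = s
    for _ in range(depth - 1):
        u = s * u ** gamma
    return u

def closed_form_bound(s, gamma, depth):
    """Unrolled recursion: s ** (1 + gamma + ... + gamma ** (depth - 1))."""
    return s ** sum(gamma ** j for j in range(depth))

def factorised_size_bound(m, s, gamma, depth):
    """Bound 1 + m * U_depth on the number of vertices of the factorised search space."""
    return 1 + m * closed_form_bound(s, gamma, depth)

for depth in range(1, 5):
    assert recursive_bound(3, 4, depth) == closed_form_bound(3, 4, depth)

# For fixed s, gamma and depth, the bound is linear in the number m of covering sets.
assert factorised_size_bound(200, 3, 4, 2) - 1 == 2 * (factorised_size_bound(100, 3, 4, 2) - 1)
```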

Discussion and empirical exploratory analysis #

We have derived bounds on the size of the search spaces and shown that under some assumptions on the properties of the GTS, the factorised search space grows linearly in the size of the input graph $G$. This stands in stark contrast to the lower bound of the naive search tree, which scales exponentially with the size of the input graph.

However, when considering the overall optimisation problem of finding the optimal solution over the set of reachable graphs in a GTS, the exponential overhead does not disappear: it is rather shifted to the extraction phase that relies on a SAT solver. It is therefore an open question whether the factorised search space can be used to improve optimisation problems on GTSs.

To this end, we devise a simple numerical experiment that assesses the potential of using the unfolding construction as presented in this chapter in the context of quantum computation optimisation.

The toy problem.  We consider a very simple circuit optimisation problem that is designed to require a deep search space (i.e. a large number of rewrites) to be solved. This will exacerbate the scaling difference between an optimiser that must traverse the naive search space and another that relies on the factorised representation instead.

The inputs are quantum circuits composed of two-qubit $\textit{CX}$ and single-qubit $\textit{Rz}$ rotation gates. The angles of the rotations are not relevant and set randomly. They are of the following form:

i.e. each pair of subsequent qubits has 2 $\textit{CX}$ gates at either end and 10 $\textit{Rz}$ rotations in-between, on the control qubits of the $\textit{CX}$ gates. These circuits admit a very simple optimisation that can be expressed by the following two transformation rules:

Given the objective of minimising the number of $\textit{CX}$ gates, the optimiser must commute the leftmost $\textit{CX}$ gates through all of the rotation gates, until the two $\textit{CX}$ on each qubit are adjacent and cancel out. We study the performance of the optimisers as we increase the number $2q$ of qubits in the circuit.
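The input family can be sketched as a plain gate list, under our reading of the description above: one CX, ten Rz rotations on the control qubit and a second CX for every pair of qubits. Rotation angles are omitted since they are irrelevant to the optimisation, and the function names are hypothetical; this is not the benchmark generator used for the experiments.

```python
def toy_circuit(n_qubits, n_rz=10):
    """Gate list with, per qubit pair: CX, n_rz Rz gates on the control qubit, CX."""
    if n_qubits % 2 != 0:
        raise ValueError("the toy circuits are defined on an even number of qubits")
    gates = []
    for control in range(0, n_qubits, 2):
        target = control + 1
        gates.append(("CX", control, target))
        gates.extend(("Rz", control) for _ in range(n_rz))
        gates.append(("CX", control, target))
    return gates

def cx_count(gates):
    """Number of CX gates in a gate list."""
    return sum(1 for gate in gates if gate[0] == "CX")

def relative_cx_reduction(cx_init, cx_final):
    """Fraction of the initial CX gates removed by an optimiser."""
    return (cx_init - cx_final) / cx_init

circuit = toy_circuit(78)
print(len(circuit), cx_count(circuit))  # 468 gates, 78 of which are CX gates
```

An optimiser that cancels every pair of CX gates reaches `relative_cx_reduction(78, 0) == 1.0`, i.e. a 100% reduction.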

Optimisers.  We define two optimisers. Badger is a backtracking search through the naive search space of reachable graphs in the GTS: starting from the input, the search space is expanded by computing all possible rewrites at a given state. States with the lowest cost function are processed first. This is similar to an A* search Hart, 1968Peter Hart, Nils Nilsson and Bertram Raphael. 1968. A Formal Basis for the Heuristic Determination of Minimum Cost Paths. IEEE Transactions on Systems Science and Cybernetics 4, 2 (100--107). doi: 10.1109/tssc.1968.300136.

Seadog on the other hand performs the backtracking search on the factorised search space instead: when expanding a state of the search space, only rewrites that overlap with the last rewrite are considered and added to the search space, as discussed in section 5.4. In a second phase, the search space is encoded as a SAT problem that is solved using Z3 Moura, 2008Leonardo de Moura and Nikolaj Bjørner. 2008. Z3: An Efficient SMT Solver. In Tools and Algorithms for the Construction and Analysis of Systems. Springer Berlin Heidelberg, 337--340. doi: 10.1007/978-3-540-78800-3_24.

The Badger optimiser is released and publicly available as part of the open-source TKET package4. The Seadog optimiser on the other hand is still in early development; more benchmarks and a release will follow.

Results.  We ran the experiment on an Apple M3 Max CPU (4.05GHz) for inputs between $2$ ($12$ gates) and $78$ qubits ($468$ gates). Both optimisers ran on a single core. For each instance, we set a timeout of $2$ seconds and report the relative $\textit{CX}$ gate reduction, i.e. $\frac{\textit{CX}_\text{init} - \textit{CX}_\text{final}}{\textit{CX}_\text{init}}.$

The results are shown in the figures on the right.

Discussion.  On the left, we observe that both optimisers are able to find the optimum for circuits with up to 30 CX gates. Beyond that point, the time limit starts impacting Badger performance, which drops continuously and reaches 0% for inputs of 50 CX gates and above. Seadog on the other hand does not time out and is able to explore the entire (factorised) search space exhaustively up until 70 CX gates.

Observe that the Badger optimiser reaches the time limit for as few as 10 CX. Indeed, the complete naive search space size can be calculated to have $12^q$ states (each pair of qubits can be in one of 12 states). For $2q = 6$ we get $1728$ states, but this already reaches over $20\,000$ states for $2q = 8$.

CX gate count reduction (left) and runtime (right) for the Badger and Seadog optimisers. 100% gate count reduction is optimal. A timeout was set to 2 seconds.

Size of factorised search space for Seadog.

On the other hand, the factorised search space will only contain $12$ states for each qubit pair. This results in a linear scaling of the search space size, as can clearly be seen in the second figure.

Where the runtime exceeds the 2 second timeout, this is due to pre- and post-optimisation steps such as memory allocation/deallocation, I/O, file parsing, etc. that are included in the measurements. The quadratic runtime scaling that we observe in Seadog is due to a hash function that is run on every state of the search space to detect and discard duplicates: as the number of states in the search space grows linearly with input size and each state requires a hash in linear time, the overall runtime grows quadratically. Future work may be able to address this issue by designing updateable hash functions that do not require the full graph to be rehashed when applying a local rewrite.

Future work should also investigate how to scale Seadog to larger input sizes on a broader class of problems. We have observed that the SAT-based extraction phase of Seadog corresponds to less than 1% of the runtime budget (under 15ms for all input sizes). Whilst being asymptotically exponential in the worst case, it is thus not currently a bottleneck. On the other hand, the number of states visited per second in the exploration phase is currently up to $10\times$ lower for Seadog than for Badger. Further investigations into the causes of this are still required, but we expect that large performance improvements can be realised on the current implementation, which as a result could scale to larger inputs.
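The two scalings quoted above can be tabulated directly from the figure of 12 states per qubit pair. The helper names below are hypothetical.

```python
def naive_states(pairs, states_per_pair=12):
    """Naive search space: every combination of per-pair states is a distinct graph."""
    return states_per_pair ** pairs

def factorised_states(pairs, states_per_pair=12):
    """Factorised search space: the states of each pair are stored once."""
    return states_per_pair * pairs

for n_qubits in (6, 8, 78):
    pairs = n_qubits // 2
    print(n_qubits, naive_states(pairs), factorised_states(pairs))
```

For $2q = 6$ this gives 1728 naive states against 36 factorised ones, and for $2q = 8$ it gives 20736 against 48.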


  1. The only constraint on the notion of graph size $|\cdot|$ is that it must be compatible with the subgraph relation: if $G \subseteq G'$, then $|G| \leqslant |G'|.$ ↩︎

  2. Note that this counting is already an act of clemency: we are not counting all permutations of the rewrites, which would be considered separately by a naive exploration that applies one graph rewrite at a time. In this case, the search tree for $\Delta = 1$ would contain $n! \cdot 2^n$ graphs. ↩︎

  3. This can always be made to hold by replacing any set of mutually disjoint rewrites with their cartesian product, in effect viewing the application of multiple disjoint rewrites as one large rewrite. This comes at the cost of a larger value for $s.$ ↩︎

  4. As a Python package on PyPI and a Rust crate on crates.io. ↩︎


Chapter 6

Future Work and Conclusions

The time has now come to conclude this thesis. In summary, our claim is that given

  1. the modularity and expressiveness that quantum compilers will require to simultaneously express higher level abstractions, hardware primitives and interleaved quantum classical computation (cf. sections 3.3, 2.3, and 2.4),
  2. the challenge of scaling up quantum program sizes to make the most of the computational capabilities of upcoming hardware (cf. sections 1.1 and 2.2),
  3. the linearity restrictions that quantum data imposes on the compiler’s intermediate representation (IR) of the computation (cf. sections 2.1, 3.3, and 3.4),

graph transformation systems (GTS) are uniquely positioned to serve as the backbone of a quantum compilation framework.

To this aim, chapter 3 presented minIR, a graph-based compiler IR with explicit support for linear types. To go along with it, we proposed the first formalisation of graph transformation semantics that preserve linearity.

Chapters 4 and 5 built on this foundation and solved two critical scaling problems for the adoption of GTS techniques in quantum compilers.

Pattern matching. Successful implementations of GTSs for quantum circuit optimisation rely on thousands to hundreds of thousands of transformation rules Xu, 2022Mingkuan Xu, Zikun Li, Oded Padon, Sina Lin, Jessica Pointing, Auguste Hirth, Henry Ma, Jens Palsberg, Alex Aiken, Umut A. Acar and Zhihao Jia. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433 Xu, 2023Amanda Xu, Abtin Molavi, Lauren Pick, Swamit Tannu and Aws Albarghouthi. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (June 2023, 835--859). doi: 10.1145/3591254, for which techniques matching one pattern at a time become a significant bottleneck. Chapter 4 presented an approach based on state automata with an asymptotic runtime that is independent of the number of patterns. This resulted in a 20x speedup for a real-world pattern matching task of direct utility to quantum compilers.

Efficient rewrite space exploration. Applications of GTS to quantum compilation distinguish themselves – unfortunately – by a lack of successful rewriting strategies or other rule control mechanisms. Consequently, the optimisation of quantum computations is framed as a search problem over the space of all reachable graphs in the GTS. Chapter 5 introduced a novel confluently persistent data structure that uses the structure of the rewrite search space to speed up its exploration. In typical applications, the factorised search space thus obtained is conjectured to grow linearly with the size of the input – an exponential improvement over the naive search strategy, without which GTS-based compiler optimisations on real-world computations with thousands of gates will be infeasible.

In both cases, the guarantees that linear values provide and that minIR enforces translate into asymptotic runtime guarantees that cannot be derived otherwise. In the absence of linearity, the pattern matching of chapter 4 becomes an NP-hard problem; meanwhile, the graph rewriting space considered in chapter 5 would grow super-exponentially and require pruning heuristics for the extraction problem, as studied in Yang, 2021Yichen Yang, Mangpo Phitchaya Phothilimtha, Yisu Remy Wang, Max Willsey, Sudip Roy and Jacques Pienaar. 2021. Equality Saturation for Tensor Graph Superoptimization. CoRR abs/2101.01332. doi: 10.48550/ARXIV.2101.01332 and Bărbu., 2024George-Octavian Bărbulescu, Taiyi Wang, Zak Singh and Eiko Yoneki. 2024. Learned Graph Rewriting with Equality Saturation: A New Paradigm in Relational Query Rewrite and Beyond. arXiv: 2407.12794 [cs.DB].

Combined, these contributions lay the groundwork for a quantum compiler platform that is modular in the hardware primitives, high-level programming abstractions and transformation rules that it can model, and scalable in the size of the computation and number of rules that it can match and optimise over. Work on such a platform is well underway within the TKET2 open-source compiler, available on GitHub.

Further work could take many directions. The graph transformation semantics of chapter 3 that are presented operationally could for example be categorified and generalised. This would open many promising bridges and parallels to work in related domains, such as string diagrams, DPO-based GTSs and even the family of ZX calculi.

There are also immediate opportunities in extending the work of chapters 4 and 5, in particular around weakening the assumptions that had to be made on the structure of the graph, respectively on the properties of the GTS and graph domain. In both cases, a more in-depth study of how the runtime of actual implementations depends on properties of the inputs would be very informative. We suspect from anecdotal observations that many assumptions we have imposed can be relaxed with little impact on performance – conversely, there may be large variations in runtimes for different regimes within the asymptotic guarantees of our results.

Another crucially important question that this thesis has not addressed is the choice of transformation rules. Beyond the results of Xu, 2022 [Mingkuan Xu, Zikun Li, Oded Padon, Sina Lin, Jessica Pointing, Auguste Hirth, Henry Ma, Jens Palsberg, Alex Aiken, Umut A. Acar and Zhihao Jia. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433] and Xu, 2023 [Amanda Xu, Abtin Molavi, Lauren Pick, Swamit Tannu and Aws Albarghouthi. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (June 2023, 835--859). doi: 10.1145/3591254] that we have referred to repeatedly throughout this corpus, very recent work by Amy and Lunderville, Amy, 2025 [Matthew Amy and Joseph Lunderville. 2025. Linear and Non-linear Relational Analyses for Quantum Program Optimization. Proceedings of the ACM on Programming Languages 9, POPL (January 2025, 1072--1103). doi: 10.1145/3704873], has presented what amounts to the first inroads into hybrid classical-quantum optimisations. Developing comprehensive transformation rules for hybrid computations would present a significant advance for the field.

Among the myriad of options, we opt to conclude this thesis with the discussion of two particularly promising avenues for future work. The first (section 6.1) relates to increasing the expressivity of the pattern matching language; such an extended framework would also enable fast pattern matching directly on the persistent data structure $\mathcal{D}$ of chapter 5, rather than having to match patterns in each graph of $D \subseteq \mathcal{D}$ separately.

The second (section 6.2) is a proposal to use the persistent data structure of chapter 5 for large scale distributed graph rewriting. With this, the optimisation of quantum computations could be distributed across multiple machines, potentially scaling up to high-performance computing (HPC) clusters and opening the door to optimisation capabilities that could significantly advance the state of the art of quantum circuit optimisation.

6.1. More expressive pattern matching

Pattern matching as defined in chapter 4 is the problem of finding pattern embeddings $P \hookrightarrow G$ for patterns from a fixed set of patterns $P \in \mathcal{P}$. We are interested in lifting two limitations of this definition.

Firstly, it would be desirable to be able to define patterns that are not a concrete graph instance, but instead a (potentially infinite) family of graphs. Examples of such pattern families that could be useful in quantum computing are

  • “a sequence of gates that commute with each other”, or
  • “a subgraph that only contains Clifford gates”, or
  • “all operations within the body of a loop”.

To express these patterns as concrete graph instances would require an infinite number of graphs. The study of pattern languages that allow the expression of such higher-level graphs is a mature field of graph transformations, with tools such as GrGen.NET, Geiß, 2006 [Rubino Geiß, Gernot Veit Batz, Daniel Grund, Sebastian Hack and Adam Szalkowski. 2006. GrGen: A Fast SPO-Based Graph Rewriting Tool. In Graph Transformations. ICGT 2006. Springer Berlin Heidelberg, 383--397. doi: 10.1007/11841883_27], offering many advanced capabilities. It would be of great interest to establish what classes of pattern languages could be supported by generalisations of the state automaton approach presented in chapter 4.

Secondly, our approach currently only supports linear values, and thus in its current form is unsuitable for hybrid quantum-classical computations. Coincidentally, supporting non-linear values is very similar to finding embeddings $P \hookrightarrow \mathcal{D}$ of patterns into the confluently persistent data structure $\mathcal{D}$ of chapter 5. The case of a non-linear value that is used multiple times in a computation is syntactically very similar to having to consider a value in $\mathcal{D}$ that may be connected to operations in different ways, depending on the variant of the “multiverse” of equivalent graphs that are stored simultaneously in $\mathcal{D}$.

Pattern matching generalisation

The following generalisation of pattern matching might be able to achieve these two goals whilst still being compatible with the state automaton approach that we presented. We suggest defining patterns and how they match using three concepts:

Constraints. A pattern is given by a set of constraints $C_1, \ldots, C_n$. They encode the conditions under which a pattern matches. They would for instance assert that two vertices are connected by an edge, or that a vertex is of a certain type. A pattern that is a concrete graph $P$ would then at a minimum have a constraint for each edge in $P$.

Constraints correspond to edges (transitions) in the state automaton. Pattern matching evaluates all outgoing constraints of the current state and moves to the states whose constraints are satisfied.

Indexing schemes. An indexing scheme assigns each object (e.g. vertex) in the patterns a unique key in $\mathcal{K}$, and each object (e.g. vertex) in the input domain $G$ a unique value in $\mathcal{V}$. Embeddings of patterns into $G$ are then given by key-value maps $\mathcal{K} \to \mathcal{V}$, mapping keyed objects from patterns to objects in the input domain. Each constraint $C$ has a set of keys associated with it; $C$ can then be evaluated by passing it all the values in $\mathcal{V}$ bound to its keys.

Indexing schemes are designed to give overlapping patterns the same key on their overlap, so that the overlap must only be matched once. This models how in chapter 4, patterns are clustered into patterns that share the same contracted tree and are differentiated by their contracted string tuples only.

Key-value map expansion. Indexing schemes abstract away the pattern and input data in such a way that the pattern matcher only needs to keep track of key-value maps $\mathcal{K} \to \mathcal{V}$. These maps can be created recursively using an expansion function

$$\textrm{expand}(\varphi, D) = \{ \varphi_1', \ldots, \varphi_n' \}.$$

This provides all the ways in which the domain of definition $dom(\varphi)$ of an index map $\varphi$ can be extended. The returned set of new index maps should coincide with $\varphi$ on $dom(\varphi)$ but expand their domain of definition to include new keys in $\mathcal{K}$. By making it possible to extend $\varphi$ in more than one way, we can model the existence of non-linear values (i.e. the index map could be extended to any of the operations that use a certain value $v$), as well as the fact that a persistent data structure such as $\mathcal{D}$ may be keeping track of multiple versions of the graph, and thus expand a key in multiple ways.

Execution of the pattern matcher

Starting from an empty key-value map $\varphi_\varnothing$ at the root state of the state automaton, the pattern matcher keeps track of a set of key-value maps, along with for each map the state it is in. It then proceeds by repeatedly performing the following two actions:

  1. Expand the domain of definition of a key-value map $\varphi$ by calling $\textrm{expand}$;

  2. Evaluate the constraints for a key-value map $\varphi$; if the constraint is satisfied, move $\varphi$ to the next state, otherwise try another constraint. If no constraint is satisfied, delete $\varphi$.

The performance of the pattern matcher will be highly dependent on choosing a smart ordering of these two actions, as well as prioritising the right key-value maps to be expanded and evaluated.
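The two actions above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the portmatching implementation: the names `Constraint`, `State`, `run_matcher`, the keys `"root"` and `"next"`, and the dictionary-based input domain are all hypothetical, and `expand` takes the key to bind as an extra argument for simplicity. The input domain contains a value produced by `x` that is used by two operations, so the map is expanded in two ways.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Constraint:
    keys: tuple          # keys whose bound values are passed to the predicate
    predicate: Callable  # returns True when the constraint is satisfied

@dataclass
class State:
    transitions: list = field(default_factory=list)  # (Constraint, State) pairs
    matches: list = field(default_factory=list)      # patterns matched on arrival

def run_matcher(root, domain, expand):
    """Return (pattern name, key-value map) pairs reachable from the root state."""
    results = []
    worklist = [({}, root)]  # start with the empty map at the root state
    while worklist:
        phi, state = worklist.pop()
        results.extend((name, phi) for name in state.matches)
        for constraint, target in state.transitions:
            # Action 1: expand phi until every key of the constraint is bound.
            candidates = [phi]
            for key in (k for k in constraint.keys if k not in phi):
                candidates = [e for c in candidates for e in expand(c, key, domain)]
            # Action 2: evaluate the constraint; maps that fail are dropped.
            for cand in candidates:
                if constraint.predicate(*(cand[k] for k in constraint.keys)):
                    worklist.append((cand, target))
    return results

# Hypothetical input domain: operation x feeds both y and z (a non-linear value).
domain = {
    "type": {"x": "H", "y": "CX", "z": "CX"},
    "users": {"x": ["y", "z"], "y": [], "z": []},
}

def expand(phi, key, domain):
    """Hypothetical indexing scheme: 'root' is any vertex, 'next' any user of root."""
    if key == "root":
        return [{**phi, key: v} for v in domain["type"]]
    if key == "next":
        return [{**phi, key: v} for v in domain["users"][phi["root"]]]
    return []

is_h = Constraint(("root",), lambda v: domain["type"][v] == "H")
then_cx = Constraint(
    ("root", "next"),
    lambda u, v: v in domain["users"][u] and domain["type"][v] == "CX",
)
done = State(matches=["H-then-CX"])
root = State(transitions=[(is_h, State(transitions=[(then_cx, done)]))])

results = run_matcher(root, domain, expand)
```

In this toy run the pattern is reported twice, once for each user of the value produced by `x`. The worklist here is a plain stack; as noted above, a practical matcher would need a more careful ordering of expansions and evaluations.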

With this proposal, it would appear possible to combine the fast state automaton-based approach of chapter 4 and its scaling to a very large number of patterns, with a more expressive pattern language and support for non-linear types as well as persistent graph rewriting. An implementation of this is currently being worked on in the open-source portmatching project, available on GitHub.

6.2. Massively parallel graph rewriting

Persistent data structures – and particularly fully and confluently persistent ones – are well-suited for distributed applications. In persistent data structures, data can always be added but never deleted, and is thus immutable. This removes the need for locks and synchronisation primitives across processes. Furthermore, using confluence, edits can be made concurrently in different processes and then eventually merged asynchronously, as follows:

The contributions presented in chapter 5 thus translate directly into a proposal for a massively parallel graph rewriting system. In summary, we have shown that graph rewrites can be tracked in a persistent data structure $\mathcal{D}$ in the form of edits $\delta$. New edits added to $\mathcal{D}$ can refer to previous edits, and thus create an acyclic edit history. Sets of edits $\mathcal{D}$ and $\mathcal{D}'$ can also be merged (confluence) and as a result, new edits that build on top of edits from both $\mathcal{D}$ and $\mathcal{D}'$ can be defined.
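As a rough illustration of this summary, the following Python sketch models an append-only set of edits with a merge operation. The names `Edit` and `EditStore` are hypothetical, the edits carry only a label rather than an actual graph rewrite, and the sketch ignores any conditions that chapter 5 places on which edits may be combined; it only shows why immutability and set union make concurrent additions easy to reconcile.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edit:
    name: str                         # hypothetical label standing in for a rewrite
    parents: frozenset = frozenset()  # earlier edits this edit builds upon

class EditStore:
    """Append-only set of edits: edits can be added but never removed."""

    def __init__(self, edits=()):
        self.edits = set(edits)

    def add(self, name, parents=()):
        parents = frozenset(parents)
        # Parents must already exist, so the edit history stays acyclic.
        if not parents <= self.edits:
            raise ValueError("parents must already be in the store")
        edit = Edit(name, parents)
        self.edits.add(edit)
        return edit

    def merge(self, other):
        """Merging is a set union, so it does not depend on the order of merges."""
        return EditStore(self.edits | other.edits)

    def __len__(self):
        return len(self.edits)

# Two processes start from the same history and add edits independently.
shared = EditStore()
e0 = shared.add("e0")
left, right = EditStore(shared.edits), EditStore(shared.edits)
e1 = left.add("e1", [e0])
e2 = right.add("e2", [e0])

# After merging, a new edit can build on edits from both processes.
merged = left.merge(right)
e3 = merged.add("e3", [e1, e2])
```

Because edits are immutable values, identical edits added on both sides collapse to a single element of the merged set.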

We describe in slightly more detail what a massively parallel graph rewriting architecture might look like.

Inter-process communication

During the rewriting process, the processes involved must regularly broadcast the edits they have added to (their copy of) the data $\mathcal{D}$. Such broadcast edits must then be merged by the other processes into their respective local copies. This is required so that progress made by one process can be shared with, and built upon by, other processes.

Technologies such as the Message Passing Interface (MPI), Dongar., 1993 [Hempel, R., Hey, A., Dongarra. 1993. MPI: a message passing interface. In Proceedings of the 1993 ACM/IEEE conference on Supercomputing. ACM Press, 878--883. doi: 10.1145/169627.169855], would be well-suited to such inter-process communications. To reduce the number of messages that senders and receivers must process, edits should not be broadcast one-by-one, but rather grouped together. For this, we propose the notion of a salient edit, reflecting that an edit is deemed of importance.

Non-salient edits are not broadcast as they are added to $\mathcal{D}$. When, on the other hand, an edit $\delta$ is deemed salient, it is broadcast along with all its ancestors $A(\delta)$ (i.e. all edits that $\delta$ depends on). As the edit history deepens, it might become inefficient to broadcast all the ancestry of an edit, in which case more advanced communication protocols would have to be devised.

Finally, a procedure must be put in place to identify identical edits that may be added and/or broadcast by different processes, to avoid duplication. Hashing techniques and hash tables are well-suited for this kind of problem.
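A minimal sketch of how salient edits, ancestor sets and hash-based identification could fit together is given below. It is an assumption-laden illustration rather than a protocol design: the `Process` class, the string payloads and the use of SHA-256 content hashes as edit identifiers are all hypothetical choices, and message passing is simulated by handing a dictionary from one object to another instead of using MPI.

```python
import hashlib

def edit_id(payload, parent_ids):
    """Content hash: identical edits built by different processes share an id."""
    h = hashlib.sha256()
    h.update(payload.encode())
    for pid in sorted(parent_ids):
        h.update(b"\0" + pid.encode())  # separator keeps the encoding unambiguous
    return h.hexdigest()

class Process:
    def __init__(self):
        self.edits = {}  # id -> (payload, frozenset of parent ids)

    def add(self, payload, parents=()):
        eid = edit_id(payload, parents)
        self.edits.setdefault(eid, (payload, frozenset(parents)))
        return eid

    def ancestors(self, eid):
        """A(delta): every edit that the given edit transitively depends on."""
        seen, stack = set(), list(self.edits[eid][1])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.edits[current][1])
        return seen

    def message_for(self, salient_id):
        """Group a salient edit with its ancestors into a single message."""
        ids = self.ancestors(salient_id) | {salient_id}
        return {i: self.edits[i] for i in ids}

    def receive(self, message):
        """Merge a message, skipping edits already known; return how many were new."""
        new = 0
        for i, edit in message.items():
            if i not in self.edits:
                self.edits[i] = edit
                new += 1
        return new

p, q = Process(), Process()
a_p, a_q = p.add("a"), q.add("a")  # both processes discover the same edit
b = p.add("b", [a_p])              # not salient: kept local for now
c = p.add("c", [b])                # deemed salient: broadcast with ancestors
message = p.message_for(c)
added = q.receive(message)
```

Here `q` already holds the edit `a`, so only two of the three edits in the message are new to it, and receiving the same message again adds nothing.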

Process types

At a minimum, the distributed graph rewriting system should distinguish between two types of processes.

The vast majority of processes would be rewrite factories. Their purpose is to create new edits, add them to $\mathcal{D}$ and broadcast them whenever they are deemed salient. These processes will be responsible for driving forward the search space exploration and, in the end, the optimisation. A good candidate for a rewrite factory is the pattern matching automaton of chapter 4 and its generalisation just described in section 6.1. Different processes may specialise in different transformation rule sets; others still could implement dedicated optimisations such as ZX-based optimisations or optimal Clifford synthesis (see discussion in section 2.2).

The other type of process would be a result extractor: a read-only process that runs the SAT-based optimisation and graph extraction algorithm of section 5.4. Such a process would run the computation at regular intervals to track the optimisation progress.

As the distributed architecture grows in complexity, more tasks and more process types may be required. It might for instance be desirable to have a process that identifies under-explored parts of the search space to direct rewrite factories in that direction.

Using such an architecture, it might be possible for the first time to scale quantum compilation workloads to large clusters of machines. This could significantly advance compilation performance of quantum programs, a particularly valuable contribution at a time when quantum computers are on the edge of utility. Nevertheless, such distributed systems often prove difficult to design and run successfully. Open questions include how to coordinate the search across processes in such a way that the most promising parts of the search space are explored whilst avoiding work duplication; whether communication will become the bottleneck in the computation; what the most effective transformation rules and cost functions are; and what the limits of modern SAT solvers are on our problem of interest.


Appendix

A. Prefix trees

Our main result is achieved by reducing a tree inclusion problem to the following problem.

String prefix matching.  Consider the following computational problem over strings. Let $\Sigma$ be a finite alphabet and consider $\mathcal{W} = (\Sigma^*)^w$ the set of $w$-tuples of strings over $\Sigma$. For a string tuple $(s_1, \dots, s_w) \in \mathcal{W}$ and a set of string tuples $\mathcal{D} \subseteq \mathcal{W}$, the $w$-dimensional string prefix matching problem consists in finding the set

$$\{ (p_1, \dots, p_w) \in \mathcal{D} \ | \ \text{for all }1 \leq i \leq w: p_i\text{ is a prefix of }s_i \}.$$
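To make the problem statement concrete, the following brute-force Python function computes this set directly from the definition. It is a reference specification only (the function name `prefix_matches` and the example tuples are made up for illustration) and it inspects every tuple of the set, which is precisely the cost that a precomputed data structure is meant to avoid.

```python
def prefix_matches(query, patterns):
    """Return the tuples in `patterns` that are component-wise prefixes of `query`.

    All tuples are assumed to have the same length w as `query`.
    """
    return [
        p for p in patterns
        if all(s.startswith(p_i) for p_i, s in zip(p, query))
    ]

# Example with w = 2.
D = [("ab", "c"), ("a", "cd"), ("b", "c"), ("", "")]
found = prefix_matches(("abx", "cde"), D)
```

The tuple `("b", "c")` is rejected because `"b"` is not a prefix of `"abx"`, while the tuple of empty strings matches any input.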

This string problem can be solved using a $w$-dimensional prefix tree. We give a short introduction to prefix trees for the string case but refer to standard literature for more details, Knuth, 1999 [Donald Knuth. 1999. The Art of Computer Programming: Sorting and Searching, Volume 3. Addison-Wesley, Reading MA].

One-dimensional prefix tree.  Let $P_1, \dots, P_\ell \in \mathcal{A}^\ast$ be strings on some alphabet $\mathcal{A}$. Given an input string $s\in\mathcal{A}^\ast$, we wish to find the set of patterns $\{ P_{1 \leq i \leq \ell} | P_i \subseteq s\}$, where $P_i \subseteq s$ denotes that $P_i$ is a prefix of $s$.

The prefix tree of $P_1, \dots, P_\ell$ is a tree with a tree node for each prefix of a pattern. The children of an internal node are the strings that extend the prefix by one character. The root of the tree is the empty string, which is present in every prefix tree. Each tree node also stores a list of matching patterns, with each pattern stored in the unique corresponding node. For every inserted pattern of length at most $L$, at most $L$ nodes are inserted, one for every non-empty prefix of the pattern. Thus a one-dimensional prefix tree has at most $\ell \cdot L + 1$ nodes and can be constructed in time $O(\ell \cdot L)$.

Given an input $s \in \mathcal{A}^\ast$, we can find the set of matching patterns by traversing the prefix tree of $P_1, \dots, P_\ell$ starting from the root. We report the list of matching patterns at the current node and move to the child node that is still a prefix of $s$, if it exists. This procedure continues until no more such child exists. In total the traversal takes time $O(|s|)$, as every character of $s$ is visited at most once.

Note that in theory the number of reported pattern matches can dominate the runtime of the algorithm. We can avoid this by returning the list of matches as an iterator, stored as a list of pointers to the tree nodes' matching lists.
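The construction and traversal just described can be sketched as follows in Python. The class name `PrefixTree`, the dictionary-based node layout and the `size` counter are illustrative choices, not taken from any particular implementation; matches are yielded lazily by a generator, in the spirit of the iterator mentioned above.

```python
class PrefixTree:
    """One-dimensional prefix tree over a list of pattern strings."""

    def __init__(self, patterns):
        self.root = {"children": {}, "matches": []}  # root is the empty string
        self.size = 1                                # number of tree nodes
        for index, pattern in enumerate(patterns):
            node = self.root
            for ch in pattern:
                if ch not in node["children"]:
                    node["children"][ch] = {"children": {}, "matches": []}
                    self.size += 1
                node = node["children"][ch]
            node["matches"].append(index)  # pattern stored at its unique node

    def matches(self, s):
        """Lazily yield the indices of all patterns that are prefixes of s."""
        node = self.root
        yield from node["matches"]
        for ch in s:
            node = node["children"].get(ch)
            if node is None:  # no child is still a prefix of s
                return
            yield from node["matches"]

patterns = ["a", "ab", "abc", "b", ""]
tree = PrefixTree(patterns)
found = list(tree.matches("abd"))
```

In this example the tree has five nodes (the root plus `a`, `ab`, `abc` and `b`), within the bound $\ell \cdot L + 1 = 16$ for $\ell = 5$ patterns of length at most $L = 3$.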

Multi-dimensional prefix tree.  A $w$-dimensional prefix tree for $w > 1$ is defined recursively as a one-dimensional prefix tree that at each node stores a $(w-1)$-dimensional prefix tree. Given an input $w$-tuple $(s_1, \dots, s_w) \in (\mathcal{A}^\ast)^w$, the traversal of the $w$-dimensional prefix tree is done by traversing the one-dimensional prefix tree on the input $s_1$ until no child is a prefix of the input, and then recursively traversing the $(w-1)$-dimensional prefix tree on $(s_2, \dots, s_w)$. Similarly to the one-dimensional case, the list of matching patterns is stored at prefix tree nodes and reported during traversal. The traversal thus takes time $O(|s_1| + \cdots + |s_w|)$, as every character of the input tuple is visited at most once.

For $\ell$ tuples of size $w$ of words of maximum length $L$, we can bound the number of nodes of the $w$-dimensional prefix tree by $1 + (\ell \cdot L)^w$. The runtime and space complexity of the construction of the $w$-dimensional prefix tree is thus in $O((\ell \cdot L)^w)$, summarised in the result:

Proposition 99.1 (Multi-dimensional string prefix matching)
Let $\mathcal{D} \subseteq \mathcal{W}$ be a set of $\ell$ string tuples and $L$ the maximum length of a string in a tuple of $\mathcal{D}$. There is a prefix tree with at most $(\ell \cdot L)^w + 1$ nodes that encodes $\mathcal{D}$ and that can be used to solve the $w$-dimensional string prefix matching problem in time $O(|s_1| + \cdots + |s_w|)$.
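The recursive structure can be illustrated with the Python sketch below. It follows one possible reading of the construction, which is an assumption on our part: the nested tree stored at a node also contains the tuples inherited from that node's ancestors, so that a single descent per dimension reports all matches. The function names `build` and `query` and the dictionary-based node layout are hypothetical.

```python
def build(patterns, w):
    """Build a w-dimensional prefix tree.

    `patterns` is a list of (tuple of w strings, identifier) pairs.
    """
    if w == 0:
        return [ident for _, ident in patterns]  # leaf: list of matching ids
    root = {"children": {}, "ends": [], "nested": None}
    for strings, ident in patterns:
        node = root
        for ch in strings[0]:
            node = node["children"].setdefault(
                ch, {"children": {}, "ends": [], "nested": None}
            )
        node["ends"].append((strings[1:], ident))

    def fill(node, inherited):
        # Assumption: a node's nested tree also holds tuples from its ancestors.
        here = inherited + node["ends"]
        node["nested"] = build(here, w - 1)
        for child in node["children"].values():
            fill(child, here)

    fill(root, [])
    return root

def query(tree, strings):
    """Return identifiers of tuples that are component-wise prefixes of `strings`."""
    if not strings:
        return tree
    node = tree
    for ch in strings[0]:
        child = node["children"].get(ch)
        if child is None:  # no child is a prefix of the input any more
            break
        node = child
    return query(node["nested"], strings[1:])

tuples = [(("ab", "c"), 0), (("a", "cd"), 1), (("b", "c"), 2), (("", ""), 3)]
tree = build(tuples, 2)
found = sorted(query(tree, ("abx", "cde")))
```

Each query walks down one path per dimension, so its cost is linear in the total input length, at the price of duplicating inherited tuples in the nested trees during construction.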

B. Lower bound on the number of patterns

Proposition 99.2

Let $N_{w,d}$ be the number of port graphs of width $w$, depth $d$ and maximum degree $\Delta \geq 4$. We can lower bound

$$N_{w,d} > \left(\frac{w}{2e}\right)^{\Theta(wd)},$$

assuming $w \leq o(2^d)$.

In the regime of interest, $w$ is small, so the assumption $w \leq o(2^d)$ is not a restriction.

Let $w, d > 0$ and $\Delta \geq 4$ be integers. We wish to lower bound the number of port graphs of depth $d$, width $w$ and maximum degree $\Delta$. It is sufficient to consider a restricted subset of such port graphs, whose size can be easily lower bounded. We will count a subset of CX quantum circuits, i.e. circuits with only $CX$ gates, a two-qubit non-symmetric gate. Because we are using a single gate type, this is equivalent to counting a subset of port graphs with vertices of degree 4. Assume w.l.o.g. that $w$ is a power of two. We consider CX circuits constructed from two circuits with $w$ qubits composed in sequence:

  • Fixed tree circuit: A $\log_2(w)$-depth circuit that connects qubits pairwise in such a way that the resulting port graph is connected. We fix such a tree-like circuit and use the same circuit for all CX circuits. We can use this common structure to fix an ordering of the $w$ qubits, which we refer to as qubits $1,\dots,w$.
  • Bipartite circuit: A CX circuit of depth $D = d - \log_2(w)$ with exactly $w/2 \cdot D$ CX gates, each gate acting on a qubit $1 \leq q_1 \leq w/2$ and a qubit $w/2 < q_2 \leq w$.

The following circuit illustrates the construction:

All that remains is to count the number of such bipartite circuits. Every slice of depth 1 must have $w / 2$ CX gates acting on distinct qubits. Every qubit $1$ to $w/2$ must interact with one of the qubits $w/2+1$ to $w$, so there are $(w/2)!$ such depth 1 slices. Repeating this depth 1 construction $D$ times and using Stirling’s approximation, we obtain a lower bound for the number of port graphs of depth $d$, width $w$ and maximum degree at least 4:

$$\left(\left(\frac{w}2\right)!\right)^D > \sqrt{w\pi}\left(\frac{w}{2e}\right)^{wD/2} = \left(\frac{w}{2e}\right)^{\Theta(w\cdot d)}$$

where we used $w = o(2^d)$ to obtain $\Theta(D) = \Theta(d)$ in the last step.
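The counting argument can be checked numerically for small parameters. The Python sketch below enumerates the depth 1 slices explicitly, confirming that there are $(w/2)!$ of them, and compares $((w/2)!)^D$ against the Stirling-based bound $\sqrt{w\pi}\,(w/2e)^{wD/2}$. The function names and the chosen ranges of $w$ and $D$ are arbitrary, and a finite check of this kind is of course an illustration rather than a proof.

```python
import math
from itertools import permutations

def depth1_slices(w):
    """Enumerate depth 1 slices: qubit i <= w/2 paired with a distinct qubit > w/2."""
    half = w // 2
    first_half = range(1, half + 1)
    second_half = range(half + 1, w + 1)
    return [tuple(zip(first_half, perm)) for perm in permutations(second_half)]

def bipartite_count(w, D):
    """Number of bipartite circuits of depth D: one independent slice per layer."""
    return math.factorial(w // 2) ** D

def stirling_bound(w, D):
    """Right-hand side of the inequality: sqrt(w*pi) * (w / (2e))^(w*D/2)."""
    return math.sqrt(w * math.pi) * (w / (2 * math.e)) ** (w * D / 2)

slice_count = len(depth1_slices(8))
checks = [
    (w, D, bipartite_count(w, D) > stirling_bound(w, D))
    for w in (2, 4, 8, 16)
    for D in (1, 2, 5)
]
```

For instance, with $w = 8$ there are $4! = 24$ depth 1 slices, and the inequality holds for every pair $(w, D)$ in the tested range.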