Selected techniques in modern classical compilation

3.5. Selected techniques in modern classical compilation

Optimisations in compilers are typically built as passes. Compiler authors then design sequences of these passes to create the default optimisation pipelines that most end-users rely on. This is not easy. For a large project like LLVM, the end result looks something like this – a thousand lines of meticulously commented code, carefully crafted and hand tuned to handle every optimisation edge case and perform well on thousands of benchmarks. This is the phase-ordering problem Click, 1995Cliff Click and Keith D. Cooper. 1995. Combining analyses, combining optimizations. ACM Transactions on Programming Languages and Systems 17, 2 (March 1995, 181--196). doi: 10.1145/201059.201061​ – one of the hardest problems in classical compilation.

Quantum compilers are not immune to this, either. The same pass ordering sorcery can be found just as well in TKET, a leading quantum compiler Sivara., 2020Seyon Sivarajah, Silas Dilkes, Alexander Cowtan, Will Simmons, Alec Edgington and Ross Duncan. 2020. t|ket⟩: a retargetable compiler for NISQ devices. Quantum Science and Technology 6, 1 (November 2020, 014003). doi: 10.1088/2058-9565/ab8e92. It does not look quite as scary yet, but the code is growing and some passes are already being called three times within the default sequence. This is hard to maintain and any change and new feature comes at the risk of degrading the compiler performance on established use cases.

If we are serious about adopting classical compiler tooling for quantum computations, we will need to find a more convincing solution to this problem. Fortunately, classical compilation is a mature field that has already experienced (and solved!) most of the challenges that quantum compilers have faced, and will ever face, in peephole optimisation. Our proposal, described in chapter 4, combines two modern compilation techniques that mitigate this and which we will review now: Superoptimisation and Equality saturation.

Superoptimisation #

Large compiler projects support a multitude of program representations and large sets of operations/instructions to be able to generate code for a wide array of devices and architectures. Designers of optimising compiler passes must thus necessarily put constraints on the program input format that the pass will accept. In all likelihood, the pass will furthermore only be applicable for certain cost functions or for a limited set of hardware target. This all leads to a proliferation of special purpose compiler passes that are bug-prone and must be carefully ordered to perform well across all the use cases of interest. On top of this, new instruction sets, architectures or new cost functions require new sets of compiler passes, in effect rebuilding the entire compilation pipeline.

As early as 1979, Fraser suggested deriving peephole optimisations automatically Fraser, 1979Christopher W. Fraser. 1979. A compact, machine-independent peephole optimizer. In Proceedings of the 6th ACM SIGACT-SIGPLAN symposium on Principles of programming languages - POPL ’79. ACM Press, 1--6. doi: 10.1145/567752.567753, an idea Massalin coined superoptimisation Massal., 1987Henry Massalin. 1987. Superoptimizer: a look at the smallest program. In Proceedings of the second international conference on Architectual support for programming languages and operating systems, October 1987. ACM, 122--126. doi: 10.1145/36206.36194. Instead of hard-coding the peephole optimisations as passes to be ordered and executed, useful valid program transformations are synthesised automatically. Early work used probabilistic verification, which meant that program transformations were generated ahead of time, and once verified manually, added to the compiler for use in optimisation Massal., 1987Henry Massalin. 1987. Superoptimizer: a look at the smallest program. In Proceedings of the second international conference on Architectual support for programming languages and operating systems, October 1987. ACM, 122--126. doi: 10.1145/36206.36194 Sands, 2011Duncan Sands. 2011. Super-optimizing LLVM IR. (November 2011). Retrieved on 13/01/2025 (LLVM Developer's meeting) from http://llvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdf. With advances in theorem proving techniques, program transformation synthesis and verification was further automated, resulting in end-to-end automatically generated and proven compiler transformations Bansal, 2006Sorav Bansal and Alex Aiken. 2006. Automatic generation of peephole superoptimizers. ACM SIGARCH Computer Architecture News 34, 5 (October 2006, 394--403). doi: 10.1145/1168919.1168906 Sasnau., 2017Raimondas Sasnauskas, Yang Chen, Peter Collingbourne, Jeroen Ketema, Jubi Taneja and John Regehr. 2017. Souper: A Synthesizing Superoptimizer. CoRR abs/1711.04422.

A particularly simple superoptimising compiler design was proposed in Jia, 2019Zhihao Jia, Oded Padon, James Thomas, Todd Warszawski, Matei Zaharia and Alex Aiken. 2019. TASO: optimizing deep learning computation with automatic generation of graph substitutions. In Proceedings of the 27th ACM Symposium on Operating Systems Principles, October 2019. ACM, 47--62. doi: 10.1145/3341301.3359630, specially for the purposes of computation graph optimisation Fang, 2020Jingzhi Fang, Yanyan Shen, Yue Wang and Lei Chen. 2020. Optimizing DNN computation graph using graph substitutions. Proceedings of the VLDB Endowment 13, 12 (August 2020, 2734--2746). doi: 10.14778/3407790.3407857 in deep learning. The set of all small programs, up to a threshold, is generated ahead of time and partitioned into disjoint classes of equivalent programs. This concisely expresses every possible peephole optimisation up to the specified size: for every small enough subset of instructions of an input program, its equivalence class can be determined. Any replacement of that set of instructions with another program in the same equivalence class is a valid transformation, and thus a potential peephole optimisation.

This framework was adapted to quantum circuit optimisation in Xu, 2022Mingkuan Xu, Zikun Li, Oded Padon, Sina Lin, Jessica Pointing, Auguste Hirth, Henry Ma, Jens Palsberg, Alex Aiken, Umut A. Acar and Zhihao Jia. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433 and separately in Xu, 2023Amanda Xu, Abtin Molavi, Lauren Pick, Swamit Tannu and Aws Albarghouthi. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (June 2023, 835--859). doi: 10.1145/3591254. On top of excellent performance, this approach is extremely flexible. For any supplied cost function, the compiler can explore all valid program transformations to find the sequence of transforms that minimise cost. This keeps the cost function-specific logic separate from the transformation semantics of the program, making it straightforward to replace or update the optimisation objective.

Furthermore, the set of supported primitives and architectures can easily be extended at any time by supplementing the equivalence classes with new programs. Any input using this extended set of primitives can thus be successfully compiled and any output program format constraints can be enforced by restricting the set of valid transformations.

The adaptation of superoptimisation to quantum optimisation of Xu, 2022Mingkuan Xu, Zikun Li, Oded Padon, Sina Lin, Jessica Pointing, Auguste Hirth, Henry Ma, Jens Palsberg, Alex Aiken, Umut A. Acar and Zhihao Jia. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433 and Xu, 2023Amanda Xu, Abtin Molavi, Lauren Pick, Swamit Tannu and Aws Albarghouthi. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (June 2023, 835--859). doi: 10.1145/3591254 is however showing scaling difficulties: unlike classical superoptimisation which is usually designed to optimise small subroutines within programs, e.g. focusing on arithmetic instructions, single instruction multiple data (SIMD) etc., the technique should in principle be able to optimise quantum programs in their entirety. This leads to much larger programs that superoptimisation does not scale well to. An orthogonal scaling challenge has also been observed: in Xu, 2022Mingkuan Xu, Zikun Li, Oded Padon, Sina Lin, Jessica Pointing, Auguste Hirth, Henry Ma, Jens Palsberg, Alex Aiken, Umut A. Acar and Zhihao Jia. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433, optimisation performance was observed to improve markedly with larger sets of rewrite rules. However, each rewrite rule requires a separate run of pattern matching to find all possible applications. As a result, performance peaks at around 50 000 rewrite rules, after which the additional overhead from pattern matching becomes dominant, deteriorating the compilation results.

Equality saturation #

Superoptimisation resolves half of the phase ordering problem: it stops the proliferation of a multitude of hardware or input specific passes, replacing it instead with one, extensible and unified local rewrite-based compiler platform. However, in a sense it makes the other half of the problem worse: instead of building a compilation pipeline from a library of manually written passes, superoptimising compilers must discover sequences of rewrites out of a much larger pool of automatically generated transformations!

Equality saturation removes the phase ordering problem altogether. The technique was first proposed in Tate, 2009Ross Tate, Michael Stepp, Zachary Tatlock and Sorin Lerner. 2009. Equality saturation: a new approach to optimization. ACM SIGPLAN Notices 44, 1 (January 2009, 264--276). doi: 10.1145/1594834.1480915. A modern implementation of it was presented in Willsey, 2021Max Willsey. 2021. Practical and Flexible Equality Saturation. We will provide a succinct introduction to equality saturation below, but recommend the presentation of Willsey, 2021Max Willsey. 2021. Practical and Flexible Equality Saturation to the interested reader, as well as its implementation Willsey, 2025Max Willsey. 2025. egg: egraphs good. Retrieved on 14/01/2025 (code repository) from https://github.com/egraphs-good/egg and this blog discussion Bernst., 2024Max Bernstein. 2024. What's in an e-graph?. (Septempter 2024). Retrieved on 14/01/2025 (blog post) from https://bernsteinbear.com/blog/whats-in-an-egraph/. Finally, in spite its theoretical origins, equality saturation is also (slowly) making it into production compilers Fallin, 2022Chris Fallin. 2022. Cranelift: Using E-Graphs for Verified, Cooperating Middle-End Optimizations. (August 2022). Retrieved on 14/01/2025 (RFC) from https://github.com/bytecodealliance/rfcs/blob/main/accepted/cranelift-egraph.md.

Unlike a general purpose compiler utility, equality saturation is specifically a technique for term rewriting. Terms1, are algebraic expressions represented as trees, in which tree nodes correspond to operations, the children of an operation are the subterms passed as arguments to the operation and leaf nodes are either constants or unbound variables

x f 2 3 T h e t e r m f ( x 2 , 3 )

This representation is particularly suited representation for any functional (i.e. side effect free) classical computation. Every node of a term is identified with a term of its own: the subterm given by the subtree the node is the root of. Given equivalences between terms, term rewriting consists of finding subterms that match known equivalences. The matching subtrees are then be replaced with the new equivalent trees.

The core of equality saturation is a data structure that stores the term being compiled, along with the result of every possible rewrite that could be applied to it or to any of its subterms. Term optimisation proceeds in two stages. First, an exploration phase adds progressively more terms to the data structure in order to discover and capture all possible terms that the input can be rewritten to, until saturation (see below), or a timeout, is reached. In the second phase, the saturated data structure is passed to an extraction algorithm, tasked with finding the term that minimises the cost function of interest among all terms discovered during exploration.

The data structure that enables this is a generalisation of term trees. Just as in terms, nodes correspond to operations and have children subterms that correspond to the arguments of the operation. However, the data structure is persistent: when applying a rewrite, existing terms are never removed and thus information in the data structure is never lost. Instead, the new term introduced by a term rewrite is added to the data structure while the term that was matched remains unchanged. To record that both terms are equivalent, we extend the data structure we employ to also store equivalence classes of nodes, typically implemented as Union-Find data structures Galler, 1964Bernard A. Galler and Michael J. Fisher. 1964. An improved equivalence algorithm. Communications of the ACM 7, 5 (May 1964, 301--303). doi: 10.1145/364099.364331 Cormen, 2009Thomas H Cormen, Charles E Leiserson, Ronald L Rivest and Clifford Stein. 2009. Introduction to algorithms (Third edition ed.). MIT Press, Cambridge, Massachusetts. If we for instance applied the rewrite x2x+xx * 2 \mapsto x + x to the term above, we would obtain

x x x f 2 3

This diagram encodes that any occurence of the x * 2 term can equivalently be expressed by the x + x term. Henceforth, when matching terms for rewriting, both the * term and the + term are valid choices for the first argument of the f operation. Suppose for example the existence of a rewrite f(x+y,z)f(x,yz)f(x + y, z) \mapsto f(x, y * z), then this would match in the above data structure resulting in

x x x f 2 3 x f x 3

A consequence of persistency and the use of equivalence relations is that the ordering in which the rewrites are considered and applied becomes totally irrelevant! As presented, the exploration process would however never terminate and the data structure size would grow indefinitely: as more rewrites are applied, more and more terms are created, resulting in an ever increasing set of possible rewrites to be considered and processed. Equality saturation resolves this by enforcing a term uniqueness invariant: every term or subterm that is explored is expressed by exactly one node in the data structure. We can see in the above example that this is currently not the case: the term xx for instance, is present multiple times – so is 33. As a result, the nodes no longer form a forest of trees, but instead a directed acyclic graphs:

x f 2 3 f

This is commonly known as term sharing, or term graphs Willsey, 2021Max Willsey. 2021. Practical and Flexible Equality Saturation. Maintaining this invariance is not hard in practice: whenever a new term is about to be added by the application of a rewrite, it must first be checked whether the term exist already – something that can be done cheaply by keeping track of all hashes of existing terms. In the affirmative case, rather than adding a new term to the equivalence class of the matched term, the entire class of the existing node must be merged into it.

Note that when term sharing, it might be that not only the equivalence classes of the two terms must be merged, but also that merging of classes must be done recursively: given the terms f(x,3)f(x, 3) and f(y,3)f(y, 3), if the classes of xx and yy are merged (and thus xx and yy have been proven equivalent), then the classes of their respective parent f(x,3)f(x, 3) and f(y,3)f(y, 3) must also be merged. Doing so efficiently is non-trivial, so we will not go into details here and refer again to Willsey, 2021Max Willsey. 2021. Practical and Flexible Equality Saturation.

In the absence of terms of unbounded size, the uniqueness invariant guarantees that the exploration will eventually saturate: as rewrites are applied, there will come a point where all equivalent terms have been discovered, i.e. every applicable rewrite will correspond to an equivalence within an already known class, thereby not providing any new information. This marks the end of the exploration phase2.

The optimiser may then proceed to the extraction phase. It is not trivial to read out an optimised term out of the saturated term data structure. For every equivalence class in the data structure, a representative node must be chosen in such a way that the final term, extracted recursively from the root by always choosing the representative node of every term class, minimises the desired cost function3.

The strategy for choosing representative terms depends heavily on the cost function. In simple cases, such as minimising the total size of the extracted term, this can be done greedily in a reverse topological order, i.e. proceeding from the leaves towards the root Willsey, 2021Max Willsey. 2021. Practical and Flexible Equality Saturation. There are also more complex cases, however: if the cost function allows for sharing of subexpressions that may be used more than once in the computation, for instance, then finding the optimal solution will require more expensive computations such as solving boolean satisfiability (SAT) or Satisfiability Modulo Theories (SMT) problem instances Biere, 2021Armin Biere, Marijn J. H. Heule, Hans Maaren and Toby Walsh. 2021. Handbook of satisfiability (Second edition ed.). IOS Press, Amsterdam.

Quantum programs: trouble in paradise? #

Equality saturation is a fast developing subfield of compilation sciences with a growing list of applications. Unfortunately for us4, adapting the ideas to quantum computation presents several unsolved challenges.

The root of the problem lies in the program representation. Fundamentally, the minIR representation we sketched out in section 3.3 – but also the quantum circuit representation – capture quantum computations, not as a term, but in a directed acyclic graph (DAG) structure.

This issue was studied by Yang, 2021Yichen Yang, Mangpo Phitchaya Phothilimtha, Yisu Remy Wang, Max Willsey, Sudip Roy and Jacques Pienaar. 2021. Equality Saturation for Tensor Graph Superoptimization. CoRR abs/2101.01332. doi: 10.48550/ARXIV.2101.01332 in the context of tensor graphs optimisation. They propose decomposing computation DAGs into a tuple of (overlapping) terms, one for each output of the computation. In theory, this should allow for the full use of the equality saturation workhorse with very minimal adjustements. In their application, however, they found exponential size increases in the term data structure used as a result of the cartesian product between the decomposed terms of DAG rewrite rules and had to limit the number of applications of multi-term rewrite rules to 2 for performance reasons.

DAG rewriting also introduces new issues in the extraction phase of equality saturation. The authors of Yang, 2021Yichen Yang, Mangpo Phitchaya Phothilimtha, Yisu Remy Wang, Max Willsey, Sudip Roy and Jacques Pienaar. 2021. Equality Saturation for Tensor Graph Superoptimization. CoRR abs/2101.01332. doi: 10.48550/ARXIV.2101.01332 show that multi-term rewrites can introduce cycles in the term data structure, which are expensive to detect and account for in the extraction algorithm. Finally, such rewrites result in large overlaps between terms. This means that greedy divide-and-conquer extraction heuristics that do not model cost savings from subexpression sharing perform poorly, necessitating the use of more compute intensive SMT techniques for competitive results.

To make matters worse, there are also quantum-specific difficulties in using equality saturation. Term sharing and the hash-based uniqueness invariant enforced by the term data structure is fundamentally at odds with the linearity of quantum resources. At extraction time, all linear resources would have to be carefully tracked. Additional constraints would have to be added to ensure that expressions that depend on the same linear resources are mutually exclusive. Conversely, we would have to guarantee that for all terms that make up a multi-term rewrite, the linear resources used on their overlaps coincide.

Consider for instance the following simple rewrite that pushes X gates (\oplus) from the right of a CX gate to the left:

{ "qubits": [{ "id": 0 }, { "id": 1 }], "operations": [ { "gate": "X", "isControlled": true, "controls": [{ "qId": 0 }], "targets": [{ "qId": 1 }] }, { "gate": "X", "targets": [{ "qId": 0 }] } ] }
{ "qubits": [{ "id": 0 }, { "id": 1 }], "operations": [ { "gate": "X", "targets": [{ "qId": 0 }] }, { "gate": "X", "targets": [{ "qId": 1 }] }, { "gate": "X", "isControlled": true, "controls": [{ "qId": 0 }], "targets": [{ "qId": 1 }] } ] }

Both the left and right hand sides would be decomposed into two terms, one for each output qubit. The left terms could be written as X(CX0(0,1))andCX1(0,1)X(CX0(0, 1)) \quad\textrm{and}\quad CX1(0, 1) whereas the right terms would be CX0(X(0),X(1))andCX1(X(0),X(1))CX0(X(0), X(1)) \quad\textrm{and}\quad CX1(X(0), X(1)) where we introduced the term X()X(\cdot) for the single-qubit X gate and two terms CX1(,)CX1(\cdot, \cdot) and CX2(,)CX2(\cdot, \cdot) for the terms that produce the first, repsectively second, output of the two-qubit CX gate. 11 and 00 denote the input qubits of the computation. This would be interpreted as two different rewrites X(CX0(0,1))CX0(X(0),X(1))andCX1(0,1)CX1(X(0),X(1))\begin{aligned}X(CX0(0, 1)) &\mapsto CX0(X(0), X(1))\\\textrm{and}\quad CX1(0, 1) &\mapsto CX1(X(0), X(1))\end{aligned} but the crucial point is: unlike classical computations, we would have to enforce at extraction time that for every set of applicable gates, either both or none of these rewrites are applied. Otherwise, the extracted program would be unphysical.

This problem is further compounded by quantum entanglement. Indeed, consider the rewrite rule above applied to the second and third gate of the following circuit:

{ "qubits": [{ "id": 0 }, { "id": 1 }], "operations": [ { "gate": "X", "isControlled": true, "controls": [{ "qId": 0 }], "targets": [{ "qId": 1 }] }, { "gate": "X", "isControlled": true, "controls": [{ "qId": 0 }], "targets": [{ "qId": 1 }] }, { "gate": "X", "targets": [{ "qId": 0 }] } ] }
With some creative drawing, we can represent the resulting equality saturation data structure containing both the circuit before and after the rewrite as follows.
A superposed circuit

A sketch of an equality saturation data structure containing two versions of a circuit, before and after a rewrite. The two alternatives are represented by “splitting” each qubit wire. The top-most split wires correspond to the original circuit, the bottom-most split wires correspond to the circuit after the rewrite.

Now, suppose the existence of another rewrite rule, given by

{ "qubits": [{ "id": 0 }, { "id": 1 }, { "id": 2 }], "operations": [ { "gate": "X", "isControlled": true, "controls": [{ "qId": 0 }], "targets": [{ "qId": 2 }] }, { "gate": "X", "isControlled": true, "controls": [{ "qId": 0 }], "targets": [{ "qId": 1 }] }, { "gate": "X", "targets": [{ "qId": 0 }] }, { "gate": "X", "targets": [{ "qId": 2 }] } ] }
{ "qubits": [{ "id": 0 }, { "id": 1 }, { "id": 2 }], "operations": [ { "gate": "X", "targets": [{ "qId": 0 }] }, { "gate": "X", "targets": [{ "qId": 1 }] }, { "gate": "X", "isControlled": true, "controls": [{ "qId": 2 }], "targets": [{ "qId": 1 }] }, { "gate": "X", "isControlled": true, "controls": [{ "qId": 0 }], "targets": [{ "qId": 2 }] }, { "gate": "X", "isControlled": true, "controls": [{ "qId": 2 }], "targets": [{ "qId": 1 }] } ] }

If during rewriting we ignore the linearity constraint that we mentioned above in the context of the extraction procedure, the part of the diagram highlighted in red would be a valid match of the left hand side. Applying the rewrite to this match is not only unphysical, it is plain nonsensical:

A superposed circuit

The same equality saturation data structure after a second rewrite. For simplicity, we are not overlaying in the illustration the circuit between the first and second rewrite.

This applies entangling gates between different versions of the same qubit! In other words, linearity constraints would not only have to be taken into account during extraction, but also to restrict pattern matching and rewriting during exploration.


  1. Depending on the context, computer scientists also call them abstract syntax trees (AST) – it’s the same thing. ↩︎

  2. Of course, it is practical to also include a timeout parameter in implementations to guarantee timely termination even on large or ill-behaved systems. ↩︎

  3. Note that we are omitting a subtle point here that arises due to term sharing: depending on the cost function, it could be favourable to choose different representative nodes for the same class, for the different occurences of the term in the computation. ↩︎

  4. but fortunately for this thesis ↩︎