Inspiration: Superoptimisation

6.1. Inspiration: Superoptimisation

This needs to be reorganised

Optimisations in compilers are typically built as passes. Compiler authors then design sequences of these passes to create the default optimisation pipelines that most end-users rely on. This is not easy. For a large project like LLVM, the end result looks something like this – a thousand lines of meticulously commented code, carefully crafted and hand tuned to handle every optimisation edge case and perform well on thousands of benchmarks. This is the phase-ordering problem Click, 1995Cliff Click and Keith D. Cooper. 1995. Combining analyses, combining optimizations. ACM Transactions on Programming Languages and Systems 17, 2 (March 1995, 181--196). doi: 10.1145/201059.201061​ – one of the hardest problems in classical compilation.

Quantum compilers are not immune to this, either. The same pass ordering sorcery can be found just as well in TKET, a leading quantum compiler Sivara., 2020Seyon Sivarajah, Silas Dilkes, Alexander Cowtan, Will Simmons, Alec Edgington and Ross Duncan. 2020. t|ket⟩: a retargetable compiler for NISQ devices. Quantum Science and Technology 6, 1 (November 2020, 014003). doi: 10.1088/2058-9565/ab8e92. It does not look quite as scary yet, but the code is growing and some passes are already being called three times within the default sequence. This is hard to maintain and any change and new feature comes at the risk of degrading the compiler performance on established use cases.

If we are serious about adopting classical compiler tooling for quantum computations, we will need to find a more convincing solution to this problem. Fortunately, classical compilation is a mature field that has already experienced (and solved!) most of the challenges that quantum compilers have faced, and will ever face, in peephole optimisation. Our proposal, described in chapter 6, combines two modern compilation techniques that mitigate this and which we will review now: Superoptimisation and Equality saturation.


Large compiler projects support a multitude of program representations and large sets of operations/instructions to be able to generate code for a wide array of devices and architectures. Designers of optimising compiler passes must thus necessarily put constraints on the program input format that the pass will accept. In all likelihood, the pass will furthermore only be applicable for certain cost functions or for a limited set of hardware target. This all leads to a proliferation of special purpose compiler passes that are bug-prone and must be carefully ordered to perform well across all the use cases of interest. On top of this, new instruction sets, architectures or new cost functions require new sets of compiler passes, in effect rebuilding the entire compilation pipeline.

As early as 1979, Fraser suggested deriving peephole optimisations automatically Fraser, 1979Christopher W. Fraser. 1979. A compact, machine-independent peephole optimizer. In Proceedings of the 6th ACM SIGACT-SIGPLAN symposium on Principles of programming languages - POPL ’79. ACM Press, 1--6. doi: 10.1145/567752.567753, an idea Massalin coined superoptimisation Massal., 1987Henry Massalin. 1987. Superoptimizer: a look at the smallest program. In Proceedings of the second international conference on Architectual support for programming languages and operating systems, October 1987. ACM, 122--126. doi: 10.1145/36206.36194. Instead of hard-coding the peephole optimisations as passes to be ordered and executed, useful valid program transformations are synthesised automatically. Early work used probabilistic verification, which meant that program transformations were generated ahead of time, and once verified manually, added to the compiler for use in optimisation Massal., 1987Henry Massalin. 1987. Superoptimizer: a look at the smallest program. In Proceedings of the second international conference on Architectual support for programming languages and operating systems, October 1987. ACM, 122--126. doi: 10.1145/36206.36194 Sands, 2011Duncan Sands. 2011. Super-optimizing LLVM IR. (November 2011). Retrieved on 13/01/2025 (LLVM Developer's meeting) from http://llvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdf. With advances in theorem proving techniques, program transformation synthesis and verification was further automated, resulting in end-to-end automatically generated and proven compiler transformations Bansal, 2006Sorav Bansal and Alex Aiken. 2006. Automatic generation of peephole superoptimizers. ACM SIGARCH Computer Architecture News 34, 5 (October 2006, 394--403). doi: 10.1145/1168919.1168906 Sasnau., 2017Raimondas Sasnauskas, Yang Chen, Peter Collingbourne, Jeroen Ketema, Jubi Taneja and John Regehr. 2017. Souper: A Synthesizing Superoptimizer. CoRR abs/1711.04422.

A particularly simple superoptimising compiler design was proposed in Jia, 2019Zhihao Jia, Oded Padon, James Thomas, Todd Warszawski, Matei Zaharia and Alex Aiken. 2019. TASO: optimizing deep learning computation with automatic generation of graph substitutions. In Proceedings of the 27th ACM Symposium on Operating Systems Principles, October 2019. ACM, 47--62. doi: 10.1145/3341301.3359630, specially for the purposes of computation graph optimisation Fang, 2020Jingzhi Fang, Yanyan Shen, Yue Wang and Lei Chen. 2020. Optimizing DNN computation graph using graph substitutions. Proceedings of the VLDB Endowment 13, 12 (August 2020, 2734--2746). doi: 10.14778/3407790.3407857 in deep learning. The set of all small programs, up to a threshold, is generated ahead of time and partitioned into disjoint classes of equivalent programs. This concisely expresses every possible peephole optimisation up to the specified size: for every small enough subset of instructions of an input program, its equivalence class can be determined. Any replacement of that set of instructions with another program in the same equivalence class is a valid transformation, and thus a potential peephole optimisation.

This framework was adapted to quantum circuit optimisation in Xu, 2022Mingkuan Xu, Zikun Li, Oded Padon, Sina Lin, Jessica Pointing, Auguste Hirth, Henry Ma, Jens Palsberg, Alex Aiken, Umut A. Acar and Zhihao Jia. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433 and separately in Xu, 2023Amanda Xu, Abtin Molavi, Lauren Pick, Swamit Tannu and Aws Albarghouthi. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (June 2023, 835--859). doi: 10.1145/3591254. On top of excellent performance, this approach is extremely flexible. For any supplied cost function, the compiler can explore all valid program transformations to find the sequence of transforms that minimise cost. This keeps the cost function-specific logic separate from the transformation semantics of the program, making it straightforward to replace or update the optimisation objective.

Furthermore, the set of supported primitives and architectures can easily be extended at any time by supplementing the equivalence classes with new programs. Any input using this extended set of primitives can thus be successfully compiled and any output program format constraints can be enforced by restricting the set of valid transformations.

The adaptation of superoptimisation to quantum optimisation of Xu, 2022Mingkuan Xu, Zikun Li, Oded Padon, Sina Lin, Jessica Pointing, Auguste Hirth, Henry Ma, Jens Palsberg, Alex Aiken, Umut A. Acar and Zhihao Jia. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433 and Xu, 2023Amanda Xu, Abtin Molavi, Lauren Pick, Swamit Tannu and Aws Albarghouthi. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (June 2023, 835--859). doi: 10.1145/3591254 is however showing scaling difficulties: unlike classical superoptimisation which is usually designed to optimise small subroutines within programs, e.g. focusing on arithmetic instructions, single instruction multiple data (SIMD) etc., the technique should in principle be able to optimise quantum programs in their entirety. This leads to much larger programs that superoptimisation does not scale well to. An orthogonal scaling challenge has also been observed: in Xu, 2022Mingkuan Xu, Zikun Li, Oded Padon, Sina Lin, Jessica Pointing, Auguste Hirth, Henry Ma, Jens Palsberg, Alex Aiken, Umut A. Acar and Zhihao Jia. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433, optimisation performance was observed to improve markedly with larger sets of rewrite rules. However, each rewrite rule requires a separate run of pattern matching to find all possible applications. As a result, performance peaks at around 50 000 rewrite rules, after which the additional overhead from pattern matching becomes dominant, deteriorating the compilation results.