6.1. Inspiration: Superoptimisation
This needs to be reorganised
Optimisations in compilers are typically built as passes. Compiler authors then design sequences of these passes to create the default optimisation pipelines that most end-users rely on. This is not easy. For a large project like LLVM, the end result looks something like this – a thousand lines of meticulously commented code, carefully crafted and hand tuned to handle every optimisation edge case and perform well on thousands of benchmarks. This is the phase-ordering problem Click, 1995. 1995. Combining analyses, combining optimizations. ACM Transactions on Programming Languages and Systems 17, 2 (March 1995, 181--196). doi: 10.1145/201059.201061 – one of the hardest problems in classical compilation.
Quantum compilers are not immune to this, either. The same pass ordering sorcery can be found just as well in TKET, a leading quantum compiler Sivara., 2020. 2020. t|ket⟩: a retargetable compiler for NISQ devices. Quantum Science and Technology 6, 1 (November 2020, 014003). doi: 10.1088/2058-9565/ab8e92. It does not look quite as scary yet, but the code is growing and some passes are already being called three times within the default sequence. This is hard to maintain and any change and new feature comes at the risk of degrading the compiler performance on established use cases.
If we are serious about adopting classical compiler tooling for quantum computations, we will need to find a more convincing solution to this problem. Fortunately, classical compilation is a mature field that has already experienced (and solved!) most of the challenges that quantum compilers have faced, and will ever face, in peephole optimisation. Our proposal, described in chapter 6, combines two modern compilation techniques that mitigate this and which we will review now: Superoptimisation and Equality saturation.
Large compiler projects support a multitude of program representations and large sets of operations/instructions to be able to generate code for a wide array of devices and architectures. Designers of optimising compiler passes must thus necessarily put constraints on the program input format that the pass will accept. In all likelihood, the pass will furthermore only be applicable for certain cost functions or for a limited set of hardware target. This all leads to a proliferation of special purpose compiler passes that are bug-prone and must be carefully ordered to perform well across all the use cases of interest. On top of this, new instruction sets, architectures or new cost functions require new sets of compiler passes, in effect rebuilding the entire compilation pipeline.
As early as 1979, Fraser suggested deriving peephole optimisations automatically Fraser, 1979. 1979. A compact, machine-independent peephole optimizer. In Proceedings of the 6th ACM SIGACT-SIGPLAN symposium on Principles of programming languages - POPL ’79. ACM Press, 1--6. doi: 10.1145/567752.567753, an idea Massalin coined superoptimisation Massal., 1987. 1987. Superoptimizer: a look at the smallest program. In Proceedings of the second international conference on Architectual support for programming languages and operating systems, October 1987. ACM, 122--126. doi: 10.1145/36206.36194. Instead of hard-coding the peephole optimisations as passes to be ordered and executed, useful valid program transformations are synthesised automatically. Early work used probabilistic verification, which meant that program transformations were generated ahead of time, and once verified manually, added to the compiler for use in optimisation Massal., 1987. 1987. Superoptimizer: a look at the smallest program. In Proceedings of the second international conference on Architectual support for programming languages and operating systems, October 1987. ACM, 122--126. doi: 10.1145/36206.36194 Sands, 2011. 2011. Super-optimizing LLVM IR. (November 2011). Retrieved on 13/01/2025 (LLVM Developer's meeting) from http://llvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdf. With advances in theorem proving techniques, program transformation synthesis and verification was further automated, resulting in end-to-end automatically generated and proven compiler transformations Bansal, 2006. 2006. Automatic generation of peephole superoptimizers. ACM SIGARCH Computer Architecture News 34, 5 (October 2006, 394--403). doi: 10.1145/1168919.1168906 Sasnau., 2017. 2017. Souper: A Synthesizing Superoptimizer. CoRR abs/1711.04422.
A particularly simple superoptimising compiler design was proposed in Jia, 2019. 2019. TASO: optimizing deep learning computation with automatic generation of graph substitutions. In Proceedings of the 27th ACM Symposium on Operating Systems Principles, October 2019. ACM, 47--62. doi: 10.1145/3341301.3359630, specially for the purposes of computation graph optimisation Fang, 2020. 2020. Optimizing DNN computation graph using graph substitutions. Proceedings of the VLDB Endowment 13, 12 (August 2020, 2734--2746). doi: 10.14778/3407790.3407857 in deep learning. The set of all small programs, up to a threshold, is generated ahead of time and partitioned into disjoint classes of equivalent programs. This concisely expresses every possible peephole optimisation up to the specified size: for every small enough subset of instructions of an input program, its equivalence class can be determined. Any replacement of that set of instructions with another program in the same equivalence class is a valid transformation, and thus a potential peephole optimisation.
This framework was adapted to quantum circuit optimisation in Xu, 2022. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433 and separately in Xu, 2023. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (June 2023, 835--859). doi: 10.1145/3591254. On top of excellent performance, this approach is extremely flexible. For any supplied cost function, the compiler can explore all valid program transformations to find the sequence of transforms that minimise cost. This keeps the cost function-specific logic separate from the transformation semantics of the program, making it straightforward to replace or update the optimisation objective.
Furthermore, the set of supported primitives and architectures can easily be extended at any time by supplementing the equivalence classes with new programs. Any input using this extended set of primitives can thus be successfully compiled and any output program format constraints can be enforced by restricting the set of valid transformations.
The adaptation of superoptimisation to quantum optimisation of Xu, 2022. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433 and Xu, 2023. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (June 2023, 835--859). doi: 10.1145/3591254 is however showing scaling difficulties: unlike classical superoptimisation which is usually designed to optimise small subroutines within programs, e.g. focusing on arithmetic instructions, single instruction multiple data (SIMD) etc., the technique should in principle be able to optimise quantum programs in their entirety. This leads to much larger programs that superoptimisation does not scale well to. An orthogonal scaling challenge has also been observed: in Xu, 2022. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433, optimisation performance was observed to improve markedly with larger sets of rewrite rules. However, each rewrite rule requires a separate run of pattern matching to find all possible applications. As a result, performance peaks at around 50 000 rewrite rules, after which the additional overhead from pattern matching becomes dominant, deteriorating the compilation results.