Future Work and Conclusions

Chapter 6

Future Work and Conclusions

The time has now come to conclude this thesis. In summary, our claim is that given

  1. the modularity and expressiveness that quantum compilers will require to simultaneously express higher level abstractions, hardware primitives and interleaved quantum classical computation (cf. sections 3.3, 2.3, and 2.4),
  2. the challenge of scaling up quantum programs sizes to make the most of the computational capabilities of upcoming hardware (cf. sections 1.1 and 2.2),
  3. the linearity restrictions that quantum data imposes on the compiler’s intermediate representation (IR) of the computation (cf. sections 2.1, 3.3, and 3.4),

graph transformation systems (GTS) are uniquely positioned to serve as the backbone of a quantum compilation framework.

To this aim, chapter 3 presented minIR, a graph-based compiler IR with explicit support for linear types. To go along with it, we proposed the first formalisation of graph transformation semantics that preserve linearity.

Chapters 4 and 5 built on this foundation and solved two critical scaling problems for the adoption of GTS techniques in quantum compilers.

Pattern matching. Successful implementations of GTSs for quantum circuit optimisation rely on thousands to hundreds of thousands of transformation rules Xu, 2022Mingkuan Xu, Zikun Li, Oded Padon, Sina Lin, Jessica Pointing, Auguste Hirth, Henry Ma, Jens Palsberg, Alex Aiken, Umut A. Acar and Zhihao Jia. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433 Xu, 2023Amanda Xu, Abtin Molavi, Lauren Pick, Swamit Tannu and Aws Albarghouthi. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (June 2023, 835--859). doi: 10.1145/3591254, for which techniques matching one pattern at a time become a significant bottleneck. Chapter 4 presented an approach based on state automata with an asymptotic runtime that is independent on the number of patterns. This resulted in a 20x speedup for a real-world pattern matching task of direct utility to quantum compilers.

Efficient rewrite space exploration. Applications of GTS to quantum compilation distinguish themselves – unfortunately – by a lack of successful rewriting strategies or other rule control mechanisms. Consequently, the optimisation of quantum computations is framed as a search problem over the space of all reachable graphs in the GTS. Chapter 5 introduced a novel confluently persistent data structure that uses the structure of the rewrite search space to speed up its exploration. In typical applications, the factorised search space thus obtained is conjectured to grow linearly with the size of the input – an exponential improvement over the naive search strategy, without which GTS-based compiler optimisations on real-world computations with thousands of gates will be infeasible.

In both cases, the guarantees that linear values provide and that minIR enforces translate into asymptotic runtime guarantees that cannot be derived otherwise. In the absence of linearity, the pattern matching of chapter 4 becomes an NP-hard problem; meanwhile, the graph rewriting space considered in chapter 5 would grow super-exponentially and require pruning heuristics for the extraction problem, as studied in Yang, 2021Yichen Yang, Mangpo Phitchaya Phothilimtha, Yisu Remy Wang, Max Willsey, Sudip Roy and Jacques Pienaar. 2021. Equality Saturation for Tensor Graph Superoptimization. CoRR abs/2101.01332. doi: 10.48550/ARXIV.2101.01332 and Bărbu., 2024George-Octavian Bărbulescu, Taiyi Wang, Zak Singh and Eiko Yoneki. 2024. Learned Graph Rewriting with Equality Saturation: A New Paradigm in Relational Query Rewrite and Beyond. arXiv: 2407.12794 [cs.DB].

Combined, these contributions lay the groundwork for a quantum compiler platform that is modular in the hardware primitives, high-level programming abstractions and transformation rules that it can model, and scalable in the size of the computation and number of rules that it can match and optimise over. Work on such a platform is well underway within the TKET2 open-source compiler, available on GitHub.

Further work could take many directions. The graph transformation semantics of chapter 3 that are presented operationally could for example be categorified and generalised. This would open many promising bridges and parallels to work in related domains, such as string diagrams, DPO-based GTSs and even the family of ZX calculi.

There are also immediate opportunities in extending the work of chapters 4 and 5, in particular around weakening the assumptions that had to be made on the structure of the graph, respectively on the properties of the GTS and graph domain. In both cases, a more in-depth study of how the runtime of actual implementations depend on properties of the inputs would be very informative. We suspect from anecdotal observations that many assumptions we have imposed can be relaxed with little impact on performance – conversely, there may be large variations in runtimes for different regimes within the asymptotic guarantees of our results.

Another crucially important question that this thesis has not addressed is the choice of transformation rules. Beyond the results of Xu, 2022Mingkuan Xu, Zikun Li, Oded Padon, Sina Lin, Jessica Pointing, Auguste Hirth, Henry Ma, Jens Palsberg, Alex Aiken, Umut A. Acar and Zhihao Jia. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433 and Xu, 2023Amanda Xu, Abtin Molavi, Lauren Pick, Swamit Tannu and Aws Albarghouthi. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (June 2023, 835--859). doi: 10.1145/3591254 that we have referred to repeatedly throughout this corpus, very recent work by Amy and Lunderville Amy, 2025Matthew Amy and Joseph Lunderville. 2025. Linear and Non-linear Relational Analyses for Quantum Program Optimization. Proceedings of the ACM on Programming Languages 9, POPL (January 2025, 1072--1103). doi: 10.1145/3704873 has present what amounts to the first inroads into hybrid classical-quantum optimisations. Developing comprehensive transformation rules for hybrid computations would present significant a significant advance for the field.

Among the myriad of options, we opt to conclude this thesis with the discussion of two particularly promising avenues for future work. The first (section 6.1) relates to increasing the expressivity of the pattern matching language; such an extended framework would also enable fast pattern matching directly on the persistent data structure D\mathcal{D} of chapter 5, rather than having to match patterns in each graph of DDD \subseteq \mathcal{D} separately.

The second (section 6.2) is a proposal to use the persistent data structure of chapter 5 for large scale distributed graph rewriting. With this, the optimisation of quantum computations could be distributed across multiple machines, potentially scaling up to high-performance computing (HPC) clusters and opening the door to optimisation capabilities that could significantly advance the state of the art of quantum circuit optimisation.

6.1. More expressive pattern matching

Pattern matching as defined in chapter 4 is the problem of finding pattern embeddings PGP \hookrightarrow G for patterns from a fixed set of patterns PPP \in \mathcal{P}. We are interested in lifting two limitations of this definition.

Firstly, it would be desirable to be able to define patterns that are not a concrete graph instance, but instead a (potentially infinite) family of graphs. Examples of such pattern families that could be useful in quantum computing are

  • “a sequence of gates that commute with each other”, or
  • “a subgraph that only contains Clifford gates”, or
  • “all operations within the body of a loop”.

To express these patterns as concrete graph instances would require an infinite number of graphs. The study of pattern languages that allow the expression of such higher-level graphs is a mature field of graph transformations, with tools such as GrGen.NET Geiß, 2006Rubino Geiß, Gernot Veit Batz, Daniel Grund, Sebastian Hack and Adam Szalkowski. 2006. GrGen: A Fast SPO-Based Graph Rewriting Tool. In Graph Transformations. ICGT 2006.. Springer Berlin Heidelberg, 383--397. doi: 10.1007/11841883_27 offering many advanced capabilities. It would be of great interest to establish what classes of pattern languages could be supported by generalisations of the state automaton approach presented in chapter 4.

Secondly, our approach currently only supports linear values, and thus in its current form is unsuitable for hybrid quantum-classical computations. Coincidentally, supporting non-linear values is very similar to finding embeddings PDP \hookrightarrow \mathcal{D} of patterns into the confluently persistent data structure D\mathcal{D} of chapter 5. The case of a non-linear value that is used multiple times in a computation is syntactically very similar to having to consider a value in D\mathcal{D} that may be connected to operations in different ways, depending on the variant of the “multiverse” of equivalent graphs that are stored simultaneously in D\mathcal{D}.

Pattern matching generalisation #

The following generalisation of pattern matching might be able to achieve these two goals whilst still being compatible with the state automaton approach that we presented. We suggest defining patterns and how they match using three concepts:

Constraints. A pattern is given by a set of constraints C1,,CnC_1, \ldots, C_n. They encode the conditions under which a pattern matches. They would for instance assert that two vertices are connected by an edge, or that a vertex is of a certain type. A pattern that is a concrete graph PP would then at a minimum have a constraint for each edge in PP.

Constraints correspond to edges (transitions) in the state automaton. Pattern matching proceeds by evaluating all outgoing constraints from the current state, and proceeds to the states for which the respective constraint is satisfied.

Indexing schemes. An indexing scheme assigns each object (e.g. vertex) in the patterns a unique key in K\mathcal{K}, and each object (e.g. vertex) in the input domain GG a unique value V\mathcal{V}. Embeddings of patterns into GG are then given by key-value maps KV\mathcal{K} \to \mathcal{V}, mapping keyed objects from patterns to objects in the input domain. Each constraint CC has a set of keys associated with it; CC can then be evaluated by passing it all the values in V\mathcal{V} bound to its keys.

Indexing schemes are designed to give overlapping patterns the same key on their overlap, so that the overlap must only be matched once. This models how in chapter 4, patterns are clustered into patterns that share the same contracted tree and are differentiated by their contracted string tuples only.

Key-value map expansion. Indexing schemes abstract away the pattern and input data in such a way that the pattern matcher only needs to keep track of key-value maps KV\mathcal{K} \to \mathcal{V}. These maps can be created recursively using an expansion function

expand(φ,D)={φ1,,φn}.\textrm{expand}(\varphi, D) = \{ \varphi_1', \ldots, \varphi_n' \}.

This provides all the ways in which the domain of definition dom(φ)dom(\varphi) of an index map φ\varphi can be extended. The returned set of new index maps should coincide with φ\varphi on dom(φ)dom(\varphi) but expand their domain of definition to include new keys K\mathcal{K}. By making it possible to extend φ\varphi in more than one way, we can model the existence of non-linear values (i.e. the index map could be extended to any of the operations that uses a certain value vv), as well as the fact that a persistent data structure such as D\mathcal{D} may be keeping track of multiple versions of the graph, and thus expand a key in multiple ways.

Execution of the pattern matcher #

Starting from an empty key-value map φ\varphi_\varnothing at the root state of the state automaton, the pattern matcher keeps track of a set of key-value maps, along with for each map the state it is in. It then proceeds by repeatedly performing the following two actions:

  1. Expand the domain of definition of a key-value map φ\varphi by calling expand\textrm{expand};

  2. Evaluate the constraints for a key-value map φ\varphi; if the constraint is satisfied, move φ\varphi to the next state, otherwise try another constraint. If no constraint is satisfied, delete φ\varphi.

The performance of the pattern matcher will be highly dependent on choosing a smart ordering of these two actions, as well as prioritising the right key-value maps to be expanded and evaluated.

With this proposal, it would appear possible to combine the fast state automaton-based approach of chapter 4 and its scaling to a very large number of patterns, with a more expressive pattern language and support for non-linear types as well as persistent graph rewriting. An implementation of this is currently being worked on in the open-source portmatching project, available on GitHub.

6.2. Massively parallel graph rewriting

Persistent data structures – and particularly fully and confluently persistent ones – are well-suited for distributed applications. In persistent data structures, data can always be added but never deleted, and is thus immutable. This removes the need for locks and synchronisation primitives across processes. Furthermore, using confluence, edits can be made concurrently in different processes and then eventually merged asynchronously, as follows:

The contributions presented in chapter 5 thus translate directly into a proposal for a massively parallel graph rewriting system. In summary, we have shown that graph rewrites can be tracked in a persistent data structure D\mathcal{D} in the form of edits δ\delta. New edits added to D\mathcal{D} can refer to previous edits, and thus create an acyclic edit history. Sets of edits D\mathcal{D} and D\mathcal{D}' can also be merged (confluence) and as a result, new edits that build on top of edits from both D\mathcal{D} and D\mathcal{D}' can be defined.

We describe in slightly more detail what a massively parallel graph rewriting architecture might look like.

Inter-process communication #

During the rewriting process, the set of processes that are involved must regularly broadcast the edits they have added to (their copy of) the data D\mathcal{D}. Such broadcasted edits must then be merged by the other processes into their respective local copies. This is required so that progress that is made by one process can be shared and expanded on top of by other processes.

Technologies such as message-passing interface (MPI) Dongar., 1993Hempel, R., Hey, A., Dongarra. 1993. MPI: a message passing interface. In Proceedings of the 1993 ACM/IEEE conference on Supercomputing. ACM Press, 878--883. doi: 10.1145/169627.169855 would be well-suited to such inter-process communications. To reduce the number of messages that senders and receivers must process, edits should not be broadcasted one-by-one, but rather grouped together. For this, we propose the notion of a salient edit, reflecting that an edit is deemed of importance.

Non-salient edits are not broadcasted as they are added to D\mathcal{D}. When, on the other hand, an edit δ\delta is deemed salient, it is broadcasted along with all its ancestors A(δ)A(\delta) (i.e. all edits that δ\delta depends on). As the edit history deepens, it might become inefficient to broadcast all the ancestry of an edit, in which case more advanced communication protocols would have to be devised.

Finally, a procedure must be put in place to identify identical edits that may be added and/or broadcasted by different processes to avoid deduplication. Hashing techniques and hash tables are well-suited for this kind of problem.

Process types #

At a minimum, the distributed graph rewriting system should distinguish between two types of processes.

The vast majority of processes would be rewrite factories. Their purpose is to create new edits, add them to D\mathcal{D} and broadcast them whenever they are deemed salient. These processes will be responsible for driving forward the search space exploration and, in the end, the optimisation. A good candidate for a rewrite factory is the pattern matching automaton of chapter 4 and its generalisation just described in section 6.1. Different processes may specialise in different transformation rule sets; others still could implement dedicated optimisations such as ZX-based optimisations or optimal Clifford synthesis (see discussion in section 2.2).

The other type of process would be a result extractor; a read-only process that runs the SAT-based optimisation and graph extraction algorithm of section 5.4. Such a process would run the computation at regular intervals to track the optimisation progress.

As the distributed architecture grows in complexity, more tasks and more process types may be required. It might for instance be desirable to have a process that identifies under-explored parts of the search space to direct rewrite factories in that direction.

Using such an architecture, it might be possible for the first time to scale quantum compilation workloads to large clusters of machines. This could significantly advance compilation performance of quantum programs, a particularly valuable contribution at a time where quantum computers are on the edge of utility. Nevertheless, such distributed systems often prove difficult to design and run successfully. Open questions include how to coordinate the search across processes in such a way that the most promising parts of the search space are explored whilst avoid work duplication; will communication become the bottleneck in the computation; what are the most effective transformation rules and cost functions to use; and what are the limits of modern SAT solvers on our problem of interest.