A Framework for Scalable Graph Rewriting

Chapter 6

A Framework for Scalable Graph Rewriting

With this chapter we have arrived at the core of this thesis: a proposal for a new quantum compilation framework. In summary, our claim is that given

  1. the challenge of scaling up quantum programs sizes to make the most of the computational capabilities of upcoming hardware (cf. sections 2.1 and 2.3) and
  2. the modularity and expressiveness that quantum compilers will require to simultaneoulsy express higher level abstractions, hardware primitives and interleaved quantum classical computation (cf. sections 3.3, 2.4, and 2.5),

graph rewriting is uniquely positioned to serve as the backbone of a quantum compilation framework.

Our proposal draws much from the design and techniques of classical compilers (cf. sections 5.1 and 3.3). Quantum, however, distinguishes itself in two ways, forming the cornerstones of our design. The focus on small, local graph transformations for quantum optimisation is justified by the groundbreaking work by Clément et al Cléme., 2023Alexandre Clément, Nicolas Heurtel, Shane Mansfield, Simon Perdrix and Benoît Valiron. 2023. A Complete Equational Theory for Quantum Circuits. In 2023 38th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), June 2023. IEEE, 1--13. doi: 10.1109/lics56636.2023.10175801 Cléme., 2024Alexandre Clément, Noé Delorme and Simon Perdrix. 2024. Minimal Equational Theories for Quantum Circuits. In Proceedings of the 39th Annual ACM/IEEE Symposium on Logic in Computer Science, July 2024. ACM, 1--14. doi: 10.1145/3661814.3662088. They showed that the rich algebraic structure of quantum circuits can be fully captured and expressed using local rewrite rules1. Our compiler can therefore restrict program manipulation to local transformations without losing expressiveness. This design choice in turn opens the door for large scale optimisation and compilation on parallel or even distributed hardware.

Equally important, the linear types of quantum computing (cf. section 3.3) significantly constrain the space of possible program transformations. Our contributions in this thesis highlight how these restrictions can be leveraged to create quantum-specific variants of classical compilation techniques that scale much more favourably. This makes approaches that are too expensive for classical compilers (cf. section 5.1) perfectly feasible2 in the context of quantum compilation.


  1. More precisely, they show that any two equivalent quantum circuits can be transformed into each other using a finite number of local rewriting rules. ↩︎

  2. Or at least, less unfeasible. ↩︎

6.1. Inspiration: Superoptimisation

This needs to be reorganised

Optimisations in compilers are typically built as passes. Compiler authors then design sequences of these passes to create the default optimisation pipelines that most end-users rely on. This is not easy. For a large project like LLVM, the end result looks something like this – a thousand lines of meticulously commented code, carefully crafted and hand tuned to handle every optimisation edge case and perform well on thousands of benchmarks. This is the phase-ordering problem Click, 1995Cliff Click and Keith D. Cooper. 1995. Combining analyses, combining optimizations. ACM Transactions on Programming Languages and Systems 17, 2 (March 1995, 181--196). doi: 10.1145/201059.201061​ – one of the hardest problems in classical compilation.

Quantum compilers are not immune to this, either. The same pass ordering sorcery can be found just as well in TKET, a leading quantum compiler Sivara., 2020Seyon Sivarajah, Silas Dilkes, Alexander Cowtan, Will Simmons, Alec Edgington and Ross Duncan. 2020. t|ket⟩: a retargetable compiler for NISQ devices. Quantum Science and Technology 6, 1 (November 2020, 014003). doi: 10.1088/2058-9565/ab8e92. It does not look quite as scary yet, but the code is growing and some passes are already being called three times within the default sequence. This is hard to maintain and any change and new feature comes at the risk of degrading the compiler performance on established use cases.

If we are serious about adopting classical compiler tooling for quantum computations, we will need to find a more convincing solution to this problem. Fortunately, classical compilation is a mature field that has already experienced (and solved!) most of the challenges that quantum compilers have faced, and will ever face, in peephole optimisation. Our proposal, described in chapter 6, combines two modern compilation techniques that mitigate this and which we will review now: Superoptimisation and Equality saturation.


Large compiler projects support a multitude of program representations and large sets of operations/instructions to be able to generate code for a wide array of devices and architectures. Designers of optimising compiler passes must thus necessarily put constraints on the program input format that the pass will accept. In all likelihood, the pass will furthermore only be applicable for certain cost functions or for a limited set of hardware target. This all leads to a proliferation of special purpose compiler passes that are bug-prone and must be carefully ordered to perform well across all the use cases of interest. On top of this, new instruction sets, architectures or new cost functions require new sets of compiler passes, in effect rebuilding the entire compilation pipeline.

As early as 1979, Fraser suggested deriving peephole optimisations automatically Fraser, 1979Christopher W. Fraser. 1979. A compact, machine-independent peephole optimizer. In Proceedings of the 6th ACM SIGACT-SIGPLAN symposium on Principles of programming languages - POPL ’79. ACM Press, 1--6. doi: 10.1145/567752.567753, an idea Massalin coined superoptimisation Massal., 1987Henry Massalin. 1987. Superoptimizer: a look at the smallest program. In Proceedings of the second international conference on Architectual support for programming languages and operating systems, October 1987. ACM, 122--126. doi: 10.1145/36206.36194. Instead of hard-coding the peephole optimisations as passes to be ordered and executed, useful valid program transformations are synthesised automatically. Early work used probabilistic verification, which meant that program transformations were generated ahead of time, and once verified manually, added to the compiler for use in optimisation Massal., 1987Henry Massalin. 1987. Superoptimizer: a look at the smallest program. In Proceedings of the second international conference on Architectual support for programming languages and operating systems, October 1987. ACM, 122--126. doi: 10.1145/36206.36194 Sands, 2011Duncan Sands. 2011. Super-optimizing LLVM IR. (November 2011). Retrieved on 13/01/2025 (LLVM Developer's meeting) from http://llvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdf. With advances in theorem proving techniques, program transformation synthesis and verification was further automated, resulting in end-to-end automatically generated and proven compiler transformations Bansal, 2006Sorav Bansal and Alex Aiken. 2006. Automatic generation of peephole superoptimizers. ACM SIGARCH Computer Architecture News 34, 5 (October 2006, 394--403). doi: 10.1145/1168919.1168906 Sasnau., 2017Raimondas Sasnauskas, Yang Chen, Peter Collingbourne, Jeroen Ketema, Jubi Taneja and John Regehr. 2017. Souper: A Synthesizing Superoptimizer. CoRR abs/1711.04422.

A particularly simple superoptimising compiler design was proposed in Jia, 2019Zhihao Jia, Oded Padon, James Thomas, Todd Warszawski, Matei Zaharia and Alex Aiken. 2019. TASO: optimizing deep learning computation with automatic generation of graph substitutions. In Proceedings of the 27th ACM Symposium on Operating Systems Principles, October 2019. ACM, 47--62. doi: 10.1145/3341301.3359630, specially for the purposes of computation graph optimisation Fang, 2020Jingzhi Fang, Yanyan Shen, Yue Wang and Lei Chen. 2020. Optimizing DNN computation graph using graph substitutions. Proceedings of the VLDB Endowment 13, 12 (August 2020, 2734--2746). doi: 10.14778/3407790.3407857 in deep learning. The set of all small programs, up to a threshold, is generated ahead of time and partitioned into disjoint classes of equivalent programs. This concisely expresses every possible peephole optimisation up to the specified size: for every small enough subset of instructions of an input program, its equivalence class can be determined. Any replacement of that set of instructions with another program in the same equivalence class is a valid transformation, and thus a potential peephole optimisation.

This framework was adapted to quantum circuit optimisation in Xu, 2022Mingkuan Xu, Zikun Li, Oded Padon, Sina Lin, Jessica Pointing, Auguste Hirth, Henry Ma, Jens Palsberg, Alex Aiken, Umut A. Acar and Zhihao Jia. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433 and separately in Xu, 2023Amanda Xu, Abtin Molavi, Lauren Pick, Swamit Tannu and Aws Albarghouthi. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (June 2023, 835--859). doi: 10.1145/3591254. On top of excellent performance, this approach is extremely flexible. For any supplied cost function, the compiler can explore all valid program transformations to find the sequence of transforms that minimise cost. This keeps the cost function-specific logic separate from the transformation semantics of the program, making it straightforward to replace or update the optimisation objective.

Furthermore, the set of supported primitives and architectures can easily be extended at any time by supplementing the equivalence classes with new programs. Any input using this extended set of primitives can thus be successfully compiled and any output program format constraints can be enforced by restricting the set of valid transformations.

The adaptation of superoptimisation to quantum optimisation of Xu, 2022Mingkuan Xu, Zikun Li, Oded Padon, Sina Lin, Jessica Pointing, Auguste Hirth, Henry Ma, Jens Palsberg, Alex Aiken, Umut A. Acar and Zhihao Jia. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433 and Xu, 2023Amanda Xu, Abtin Molavi, Lauren Pick, Swamit Tannu and Aws Albarghouthi. 2023. Synthesizing Quantum-Circuit Optimizers. Proceedings of the ACM on Programming Languages 7, PLDI (June 2023, 835--859). doi: 10.1145/3591254 is however showing scaling difficulties: unlike classical superoptimisation which is usually designed to optimise small subroutines within programs, e.g. focusing on arithmetic instructions, single instruction multiple data (SIMD) etc., the technique should in principle be able to optimise quantum programs in their entirety. This leads to much larger programs that superoptimisation does not scale well to. An orthogonal scaling challenge has also been observed: in Xu, 2022Mingkuan Xu, Zikun Li, Oded Padon, Sina Lin, Jessica Pointing, Auguste Hirth, Henry Ma, Jens Palsberg, Alex Aiken, Umut A. Acar and Zhihao Jia. 2022. Quartz: Superoptimization of Quantum Circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022. Association for Computing Machinery, 625--640. doi: 10.1145/3519939.3523433, optimisation performance was observed to improve markedly with larger sets of rewrite rules. However, each rewrite rule requires a separate run of pattern matching to find all possible applications. As a result, performance peaks at around 50 000 rewrite rules, after which the additional overhead from pattern matching becomes dominant, deteriorating the compilation results.

6.2. Overview

The framework we propose can be summarised by the diagram below. It is composed of two main components: pattern matching is tasked with finding all possible applications of rewrite rules from the set of pre-defined rules. Importantly, for this computation, we introduce an offline pre-processing step that builds a data structure that is amenable to pattern matching from the supplied set of rewrite rules. This allows us to obtain significant speedups in the exploration phase of the framework. The design of the pattern matching engine is described in more detail in section 6.3, while the original contributions with a detailed presentation of the pre-processing computation and an analysis of the asymptotic speedup are presented in chapter 4.

The second component is charged with the application and prioritisation of the discovered rewrites. Depending on the cost function and heuristics employed, it may select a subset of rewrites or any combination of multiple rewrites to produce zero, one or many new candidate programs that are equivalent to the input program. We discuss several rewriting strategies in section 6.4. Beyond that, we argue that optimisation based on local rewriting must be done in parallel to scale to large input sizes. For this purpose, section 6.4 presents a refinement of our framework for the distributed setting. To enable this, a new distributed data structure inspired by equality saturation is presented in chapter 5.

The simple heuristic driving our optimisation loop, consisting of pattern matching and rewrite application, relies on a priority queue, which orders all candidate programs by increasing cost function. This allows us to keep track of all the best candidates seen so far, with the top of the queue always containing the current best optimised program. Optimisation can thus be terminated at any time based on a timeout. This corresponds to a simple backtracking search, which greedily chooses the best candidate at each step, and retreats to optimising previous candidates once all better candidates have been exhausted. Improvements to the search heuristic are an obvious area for future work, which we do not explore in this thesis but briefly discuss in our concluding chapter, chapter 7.

Proposed quantum compiler optimisation framework

Overview of the proposed quantum compiler optimisation framework. The two main components (white square boxes in the figure) are discussed in details in sec. 4.2 and 4.3 respectively.

6.3. Pattern matching engine

As discussed in the review of superoptimisation (sec. 3.5), the adaptation of the technique to quantum computing benefits from the use of thousands of rewrite rules. However, the performance reaches a ceiling as the number of rewrite rules grows, when pattern matching becomes the bottleneck. This was the main motivation for the work in Mondada, 2024Luca Mondada and Pablo Andrés-Martínez. 2024. Scalable Pattern Matching in Computation Graphs. In 15th International Workshop on Graph Computation Models, which speeds up pattern matching by introducing a pre-computed state automaton data structure. The result is pattern matching in a runtime that is independent on the number of rewrite rules being considered – reaching a 20x speedup on datasets of practical interest.

We are intersted in a fast pattern matching engine because it sits at the center of the “rewriter” component that appears in the diagram above – and in particular on which the offline pre-processing phase is based. Let us zoom in a bit on the rewriter component to understand. The workings of the “Rewriter” (offline) and “Pattern matching” (online) components of the overview diagram presented earlier can be refined into the following schema.

Rewriter component of the proposed quantum compiler optimisation framework

Detailed view of the “Rewriter” and “Pattern matching” components of the optimisation framework.

Instead of taking ready-to-apply rewrite rules as input to the offline pre-computation phase, we now take a large set of quantum programs, typically enumerated automatically. To construct rewrite rules, we cluster the programs into sets of equivalent programs. This is particularly simple in the case of small quantum circuits, as the program can be equivalently represented by a unitary, which can for instance be hashed and put into buckets to gather semantically equivalent programs together.

The valid rewrite rules correspond to transformations between programs within the same equivalence class. One may consider all n2n^2 possible rewrite rules within a class of size nn, or alternatively pick a representative CC and only consider 2n22n - 2 rules to and from CC. An arbitrary rewrite between two elements of a class can then be expressed by composing two rewrite rules to and from CC1.

The set of all patterns of interest are then all left hand sides of the rewrite rules – i.e. every program in any of the equivalence classes. These are passed off to the pattern matching engine, that will first pre-process the patterns offline to build a pattern matcher, and then use the matcher to find all matches in the data. With a simple map from the left hand sides to the right hand sides of the rewrite rules, we can then convert every pattern match into a rewrite to be considered during optimisation.

The inner workings of the pattern matching algorithm are discussed in-depth for the use case of portgraph matching in chapter 4. In this section, we instead present pattern matching abstractions and a novel user interface that surfaces the capabilities of the pattern matching engine in a declarative and extensible way. This allows the user to define pattern matching languages that are tailored to their domain of application and the rewrite rules of interest, without going into the details of the pattern matching engine.

Indexing schemes #

We abstract away the domain of definition of patterns and the host data, i.e. the data being matched, using a structure that we call an indexing scheme. The main object of interest within indexing schemes are index maps, partial maps from a key set K\mathcal K to a value set V\mathcal V φ:KV.\varphi: \mathcal K \rightharpoonup \mathcal V. Pattern matches will be expressed as instances of index maps – K\mathcal K and V\mathcal V are thus typically chosen so that there are (obvious) injective maps PKP \hookrightarrow \mathcal K and DVD \hookrightarrow \mathcal V for any pattern PPP \in \mathcal P and data DDD \in \mathcal D in the application domain. For simplicity of presentation – and in a blatant abuse of notation – we will assume in this section that these injective maps are the identity, i.e. that always PKP \subseteq \mathcal K and DVD \subseteq \mathcal V. As a result, whenever the domain of defintion dom(φ)\textrm{dom}(\varphi) of φ\varphi contains PP and is injective on it, the index map restircted to PP can be viewed as a pattern embedding of PP into DD: φP:PD.\varphi|_P: P \hookrightarrow D.

Note that P=D\mathcal P = \mathcal D and K=V\mathcal K = \mathcal V is often a valid choice – this corresponds to concrete patterns that will match data exactly. Keeping the interface more general, however, allows for more expressive pattern languages that may capture entire subsets of D\mathcal D as a single pattern.

Index maps enable pattern matching through the repeated expansion of their domain of definition: starting from the empty indexing map φ:KV\varphi_\varnothing: \mathcal K \to \mathcal V with dom(φ)\textrm{dom}(\varphi_\varnothing), new index maps are created using an user-provided expansion function: expand(φ,D)={φ1,,φn}.\textrm{expand}(\varphi, D) = \{ \varphi_1', \ldots, \varphi_n' \}. Informally, this provides all the ways in which dom(φ)\textrm{dom}(\varphi) can be extended. This can be specified in terms of the following three properties:

  1. φφi\varphi \subseteq \varphi_i' for all 1in1 \leqslant i \leqslant n  (valid extensions).

  2. For all valid embedding f:PDf: P \hookrightarrow D such that φPf\varphi|_P \subseteq f, there exists 1in1 \leqslant i \leqslant n such that φiPf\varphi_i'|_P \subseteq f  (preserve all embeddings).

  3. dom(φi)dom(φ)\textrm{dom}(\varphi_i') \neq \textrm{dom}(\varphi) for all 1in1 \leqslant i \leqslant n  (progress must be made).

where we introduced φφ\varphi \subseteq \varphi' as a shorthand to mean that φ\varphi' is equal to φ\varphi when restricted to dom(φ)\textrm{dom}(\varphi): φdom(φ)=φdom(φ).\varphi'|_{\textrm{dom}(\varphi)} = \varphi|_{\textrm{dom}(\varphi)}.

Using this, the pattern matcher can proceed by keeping track of a set of index maps F\mathcal F that satisfy the invariant for all f:PD, there exists φF such that φPf\textrm{for all }f: P \hookrightarrow D, \textrm{ there exists } \varphi \in \mathcal F \textrm{ such that } \varphi|_P \subseteq f This holds trivially for F={φ}\mathcal F = \{\varphi_\varnothing\}. The expansion step FFφFexpand(φ,D)\mathcal F \mapsto \mathcal F \cup \bigcup_{\varphi \in \mathcal F} \textrm{expand}(\varphi, D) maintains this invariant, as a direct corollary of property 2. Inductive application of property 3 guarantees that all index maps with at most nn elements will be discovered after nn iterations. Writing F(i)\mathcal F^{(i)} for the index maps after the ii-th iteration and Δ=maxPPP\Delta = \max_{P \in \mathcal P} |P| for the maximal cardinality of the patterns, we can thus formulate the following “completeness” property for our pattern matcher: for all f:PD, there exists 1iΔ such that φF(i) and φP=f.\textrm{for all }f: P \hookrightarrow D, \textrm{ there exists }1 \leqslant i \leqslant \Delta \textrm{ such that }\varphi \in \mathcal F^{(i)}\textrm{ and }\varphi|_P = f.

To make sure, on the other hand, that only valid embeddings2 are present in F(i)\mathcal F^{(i)}, we introduce constraints.

Constraints and Predicates #

At its simplest, a constraint CC is a boolean-valued (higher-order) function that is used to filter out invalid index maps: C:(KV){0,1}C: (\mathcal K \rightharpoonup \mathcal V) \to \{0, 1\} Given a set of constraints C1,,CnC_1, \ldots, C_n, the idea is then to interleave index map expansions with pruning using the constraints F{fF  C1(f)Cn(f)}\mathcal F \mapsto \{ f \in \mathcal F\ |\ C_1(f) \wedge \cdots \wedge C_n(f) \} To obtain and efficient (and correct) implementation of pattern matching based on such constraints, it is however useful to specify some properties that the constraints should satisfy, in particular with respect to the extension of index maps3: is it valid to evaluate a constraint on index maps that are only partial embeddings? If so, on which partial maps can the constraint be successfully evaluated? And do index map extensions preserve constraint validity, or must constraints be evaluated repeatedly after each extension?

We constrain the expressivity of constraints with the introduction of predicates. A predicate Π\Pi is also a boolean-valued function4, but it is instead evaluated on a tuple of values in V\mathcal V: Π:Vp{0,1}.\Pi: \mathcal V^p \to \{0, 1\}. We refer to pp as the arity of the predicate. We then (re-)define constraints as tuples of a predicate Π\Pi (of arity pp), along with pp keys k1,,kpKk_1, \ldots, k_p \in \mathcal K. A constraint C=(Π,k1,,kp)C = (\Pi, k_1, \ldots, k_p) can then only be evaluated on an index map φ\varphi if {k1,,kp}dom(φ)\{k_1, \ldots, k_p\} \subseteq \textrm{dom}(\varphi). It is then given by C(φ)=Π(φ(k1),,φ(kp)).C(\varphi) = \Pi(\varphi(k_1), \ldots, \varphi(k_p)). Not only does this define a clear condition of when it is valid to evaluate a constraint on a partial index map, it also ensures that for all index maps φ1,φ2:KV\varphi_1, \varphi_2: \mathcal K \rightharpoonup \mathcal V such that {k1,,kp}dom(φ1)\{k_1, \ldots, k_p\} \subseteq \textrm{dom}(\varphi_1) and {k1,,kp}dom(φ2)\{k_1, \ldots, k_p\} \subseteq \textrm{dom}(\varphi_2), we have C(φ1)=C(φ2).C(\varphi_1) = C(\varphi_2).

This completes a bird’s eye view on how pattern embeddings are constructed and matched by the pattern matching engine. The key point is that this interface will allow us to match multiple patterns simultaneously by grouping patterns that share constraints together. Where necessary, pattern embeddings that satisfy conflicting constraints will be tracked separately using more than one set of index maps.

Making pattern matching efficient (aka. “implementation details”) #

The title of this section may tempt you to skip the next few paragraphs – don’t! There is more consideration sthat go into this than first meets the eye.

A reasonable proxy for the runtime of the pattern matcher is the total number of constraints that must be evaluated. This is the accumulation5 of the total number of constraints that the patterns being matched require, aggregated over all the index maps that we must keep track of throughout the computation. Our overall strategy to minimise this is summarised as the combination of the following three principles

  1. Minimise the number of index maps to keep track of by maximising index map overlaps between patterns.

  2. Express patterns using constraints that are shared or can be evaluated jointly across multiple patterns.

  3. Prune away candidate index maps that do not extend to valid embeddings as early as possible.

To an extent, these are of course application domain dependent and can only be achieved with the willing cooperation of a dedicated user6 – principle 1 is for instance best achieved through some form of canonicalisation of the patterns, such that similar patterns can share the same keys. However, it remains up to the interface designer to encourage this behaviour and ensure that the pattern matching engine can make the most of a well thought out implementation. Continuing our example, we have already seen that our interface requires each pattern and data to be mapped explicitly to common key K\mathcal K and value sets V\mathcal V, which is the first step towards canonical identifiers.

We further restrict the number of index maps that must be kept track of by refining the expansion function we introduced earlier to be able to specify the key kdom(φ)k \notin \textrm{dom}(\varphi) that an index map φ:KV\varphi: \mathcal K \rightharpoonup \mathcal V should be extended to. The set expandk(φ,D)={φ1,,φn}.\textrm{expand}_k(\varphi, D) = \{ \varphi_1', \ldots, \varphi_n' \}. must satisfy the following modified properties:

  1. φφi\varphi \subseteq \varphi_i' for all 1in1 \leqslant i \leqslant n  (valid extensions).

  2. for all valid embedding f:PDf: P \hookrightarrow D such that φPf\varphi|_P \subseteq f and kPk \in P, there exists 1in1 \leqslant i \leqslant n such that φiPf\varphi_i'|_P \subseteq f  (preserve all embeddings of kk).

  3. kdom(φi)k \in \textrm{dom}(\varphi_i') for all 1in1 \leqslant i \leqslant n.  (include kk).

The most obvious change is in the third property, where instead of requiring that the domain of definition is enlarged in some way, we now require explicitly that kk be added to the domain of definition. The other change is a weakening of property 2. Unlike the original expansion function, in this case the expansion is no longer “complete”, in the sense that for some f:PDf: P \hookrightarrow D with kPk \notin P, it may be that φiP⊈f\varphi_i'|_P \not\subseteq f for any φiexpandk(φ,D)\varphi_i' \in \textrm{expand}_k(\varphi, D), even though φPf\varphi|_P \subseteq f.

With the expandk\textrm{expand}_k function, we can extend index maps to cover exactly the subset of K\mathcal K required to evaluate a constraint of interest, thus avoiding to extend index map beyond the necessary keys (and accumulate multiple versions of them in F\mathcal F).

Now, how do we choose which constraints we consider, and in which order? This is key not only to minimising the number of index maps when using the expandk\textrm{expand}_k function, but also to maximise the number of constraints that can be shared across patterns.

Rather than relying on rigid decompositions of patterns into sets of constraints, we achieve this by selecting and applying one constraint at at time from the patterns. As we progress, we can pick constraints that will result in the best performance, dynamically and specially for the patterns under consideration.

Concretely, we establish a protocol whereby patterns are asked to nominate constraints that they would like to evaluate. The results are pooled together and the constraint that is shared by the most patterns is applied. The patterns can then be updated, conditioned on the application of the constraint and the process can be repeated until all patterns have been reduced to tautologies. Further benefitting performance, it suffices to run this once during the pre-processing phase ahead of any pattern matching, so that this optimisation implements principle 2 without incurring any runtime pattern matching cost.

To further maximise constraint overlaps, we introduce constraint classes. They are sets of constraints that the user can specify to indicate that the constraints share a logical relation – in statistics words, that the constraints are not independent. A particularly useful property of constraint classes would be a relation of pairwise mutual exclusion between elements of the class: the satisfaction of any of the constraints in the set precludes any other from being satisfied. More trivially, a constraint class can also group together constraints that should be evaluated jointly, for a more efficient implementation.

Patterns having an overlapping constraint class is thus a useful generalisation of shared constraints that is much more broadly applicable. This undoubtedly begs the question of how various constraints and constraint classes, possibly of greatly differing “value”, should be compared with and prioritised over one another – besides by counting the number of patterns that share them.

We propose a measure to quantify the “utility” of a constraint for matching a given pattern. Using this we can prioritise highly valuable constraints, which, when their evaluation is negative, result in early prunings of irrelevant index maps (principle 3). We define the expansion factor αC\alpha_C of a constraint CC as the expected relative change in the number of the index maps that results from conditioning on CC: αC=EF[{fFC(f)}F].\alpha_C = \mathbb{E}_{\mathcal F}\left[\,\frac{|\{f \in \mathcal F \mid C(f)\}|}{|\mathcal F|}\,\right]. This value is of course hair rippingly hard to compute precisely in most cases – presumably, it would require a model of the precise distribution of the index maps during pattern matching that would have to depend on the distribution that the data being matched is drawn from, along with all the previous constraints that have been applied.

We could spend the rest of this thesis (and then some) doing these computations, or we can choose to ignore all these intricacies, fix F\mathcal F to be the subset of the index maps on which CC is defined, and just take the frequentist’s interpretation of the ratio within the expectation value as the probability that the constraint is satisfied: αCPfF[C(f)]\alpha_C \approx \mathbb{P}_{f \sim \mathcal F}\left[\,C(f)\,\right] with fFf \in \mathcal{F} drawn uniformly at random. Much easier! From here on, we take the latter as the definition of αC\alpha_C.

By asking the user to provide some heuristic approximation of αC\alpha_C, we can gauge the effect on the number of index maps of picking a constraint and conditioning on it as roughly FαCF|\mathcal F| \to \alpha_C \cdot |\mathcal F|

We propose a further generalisation of this metric to quantify the value of any subset of constraints in a constraint class. For a constraint class B\mathcal B, we define the expansion factor of SB\mathcal S \subseteq \mathcal B as αS=EfF[{CSC(f)}].\alpha_{\mathcal S} = \mathbb{E}_{f \sim \mathcal F}\left[\, |\{ C \in \mathcal{S} \mid C(f)\}| \,\right]. In the (uninteresting) case of pairwise independent constraints in S\mathcal S, this simplifies to CSαC\sum_{C \in \mathcal S} \alpha_C, as one would expect.

Being a strict generalisation of the single constraint case, in the following (and in our implementation), we always consider expansion factors over sets of constraints. Importantly, however, we restrict expansion factors to only be defined over sets of constraints within a shared constraint class – this simplifies significantly the implementation as it does not require to handle any arbitrary combination of constraints, and makes constraint classes an important tool to express constraint dependencies in a way that the pattern matcher can reason about.

Lower values of α\alpha will result in a smaller number of index maps. For every constraint we are considering applying, we can query the constraint classes that it is part of and group constraints by class. We can then obtain the expansion factor associated with the constraints in each of the constraint classes and pick the class with the smallest expansion factor. The following pseudo-code summarises the constraint selection procedure. Given a list of patterns, it returns a list of constraints from one constraint class such that the expansion factor is minimised.

fn best_constraints(patterns: [Pattern]) -> [Constraint] {
    // For each pattern, collect all constraints that it nominates
    // and group by constraint class
    class_to_constraints = {}
    for p in patterns {
        for c in nominate_constraints(p) {
            for cls in constraint_classes(c) {
                constraints = class_to_constraints[cls] or []
                constraints.append(c)
                class_to_constraints[cls] = constraints
            }
        }
    }

    // Find class with smallest expansion factor
    best_constraints = None
    best_ef = None
    for (cls, constraints) in class_to_constraints {
        ef = expansion_factor(cls, constraints)
        if best_ef == None || ef < best_ef {
            best_constraints = constraints
            best_ef = ef
        }
    }
    return best_constraints
}

Finally, after selecting the constraints to be applied, the patterns are updated by applying the constraints one-by-one and constructing the pattern matcher for the (now simplified) patterns recursively:

fn build_matcher(patterns: [Pattern], root = None) -> Tree {
    if root == None {
        root = new_tree()
    }
    for c in best_constraints {
        new_patterns = [condition_on(p, c) for p in patterns]
        child = root.add_child(c)
        build_matcher(new_patterns, child)
    }
    return root
}

Of course, the real implementation must keep track of additional information, such as the set of bound keys and the pattern matching at each node. Nodes that represent the same pattern matching state are also merged to reduce the size of the data structure.

This has certainly all been very abstract – and probably rather confusing. To illustrate the ideas we just presented, the next section presents an implementation of this interface for string pattern matching. Among others this will include concrete implementations for the various functions (nominate_constraints, constraint_classes, expansion_factor, condition_on etc.) called in the pseudo-code.

An example #

The pattern matching framework we are proposing can of course model the matching problems on graphs and compiler IRs that we are mostly interested in. It can however be applied just as well to other domains.

Among the simplest possible use cases of this is pattern matching on strings. Over the course of the previous paragraphs, we introduced all the main ideas and concepts using a mathematical language. In this section, we will use string matching as an excuse to revisit the same ideas, this time staying much closer to the perspective of the user and the actual programming interface as implemented in the Rust library portmatching7. I suspect the reader will fall into one of two camps – either excited that the rumblings in this thesis might at last show some early signs of coherence, or utter disbelief that the presentation is about to become more applied still. In either case, rest assured that no knowledge of Rust is expected – we will mimick the real interface as much as possible, but the “code” we will write will be straight-forward pseudo-code8

Let us start with the indexing scheme. For this we propose integer valued key and value sets KN\mathcal K \simeq \mathbb N, VN\mathcal V \simeq \mathbb N. To distinguish between K\mathcal K and V\mathcal V, we will write their elements as PatternPos(i) and DataPos(i) respectively. We use the indices to refer to the position of characters within the pattern and data strings.

For a pattern pp and a data string DD with P|P| and D|D| characters respectively, a pattern embedding is thus a map {PatternPos(i)0i<P}{PatternPos(i+s)0i<P}\{\,\texttt{PatternPos}(i) \mid 0 \leqslant i < |P| \}\to\{\texttt{PatternPos}(i + s) \mid 0 \leqslant i < |P|\} that maps PatternPos(i) to DataPos(s + i) for some sNs \in \mathbb N and such that the pattern matches the characters in DD between positions ss and s+Ps+|P|, i.e. P=D[s:s+i]P = D[s:s+i].

To define the indexing scheme, we provide the following two functions:

fn required_bindings(key: PatternPos) -> [PatternPos] {
    if key == PatternPos(0) {
        return []
    } else {
        return [PatternPos(0)]
    }
}

fn list_bind_options(
    key: PatternPos, data: String, known_bindings: Map<PatternPos, DataPos>
) -> [DataPos] {
    if key == PatternPos(0) {
        // Every position in data is valid
        return [DataPos(i) for i in [0, 1, .. data.len() - 1] ]
    } else {
        PatternPos(pattern_offset) = key
        DataPos(root_binding) = map.get(PatternPos(0))
        if root_binding + key < data.len() {
            // Valid position is obtained by adding offset to root binding
            return [DataPos(root_binding + pattern_offset)]
        } else {
            // The only valid position would be beyond the end of data
            return []
        }
    }
}

The first is independent of the data string being matched. It is thus sufficient to call it once for each pattern key in the offline precomputation phase. The keys returned are the dependency of the key passed as argument: they must be bound before key can itself be bound. This defines a dependency graph of all key bindings, from which a total order of key bindings can be determined (assuming the dependency graph is acyclic). In our case, we declare that before any index i>0i > 0 can be bound, first a binding for the index 00 must be provided. The latter index, on the other hand, can be bound without requiring any prior bindings.

The second function captures the binding logic proper and corresponds to the expandk\textrm{expand}_k function introduced in the previous section. The user implementing this function may safely assume that all required bindings, as returned by required_bindings, have already been bound and can be accessed using the binding map passed as third argument. We see in the implementation why we made every other binding depend on PatternPos(0): once the latter is bound (to any position in the data string), all other pattern posistions resolve to a unique DataPos9, making the implementation of list_bind_options very simple.

We now turn to the set of valid predicates, from which pattern constraints are defined. To add a bit of spice here (and incidentally support string matching capabilities more powerful than regular expressions), we consider a somewhat extended pattern language. Suppose our data strings are drawn from an alphabet Σ\Sigma, i.e. DΣD \in \Sigma^*. Instead of taking pattern strings in Σ\Sigma^*, we instead extend the alphabet with a (disjoint) set of symbols XX and take patterns in P(ΣX)P \in (\Sigma \cup X)^*. Characters of a pattern in Σ\Sigma must match the character they are mapped to in DD exactly. Characters in X\mathcal X, meanwhile, can match any character; however, any two occurences of the same symbol in a pattern must map to identical characters in DD.

In our implementation, we fix Σ\Sigma to be lowercase letters 'a''z' and identify symbols in X\mathcal X by prefixing a character with $, e.g. $x and $y. For instance, the pattern ab$xc$x will match fooabccc starting from position 3 (i.e. mapping PatternPos(0) to DataPos(3)), but will not match abccd, as $x would have to be assigned once to c and once to d.

To capture these matching semantics we introduce two predicates10

  • ConstVal(c) for all c in Σ\Sigma
  • BindingEq

The first predicate is of arity 1: given an index DataPos(i), it will evaluate to true if d[i] == c is satisfied. This corresponds to matching characters of the pattern string within Σ\Sigma. On the other hand, BindingEq, of arity 2, compares the characters at two positions DataPos(i) and DataPos(j) for equality: d[i] == d[j]. Such a predicate is used for every occurence of a symbol of X\mathcal X in the pattern beyond the first one.

We provide this predicate evaluation in the simple function check.

fn check(predicate, args: [DataPos], data: String) -> bool {
    if predicate == ConstVal(c) {
        [DataPos(i)] = args
        return d[i] == c
    } else {  // predicate == BindingEq
        [DataPos(i), DataPos(j)] = args
        return d[i] == d[j]
    }
}

We then define our pattenrs by a set of constraints that must be satisfied. Each constraint is expressed by a predicate along with one or two pattern keys PatternPos(i), indicating which characters the predicate applies to. Continuing the example of the pattern, ab$xc$x, we can express it by the set of constraints

[
  (ConstVal('a'), PatternPos(0)),
  (ConstVal('b'), PatternPos(1)),
  (ConstVal('c'), PatternPos(3)),
  (BindingEq(PatternPos(2), PatternPos(4))
]

Note that in this latter representation the name of the variable `xx is no longer specified – this simplifies the pattern. It will also increase the overlap of contraints between patterns by removing duplicates that differ in variable name only.

The task of the pattern matcher is then, given some data string, to find a binding of the positions between PatternPos(0) and PatternPos(4) to some data positions DataPos(i)DataPos(i + 4) for some i such that all constraints above are satisfied11.

A concrete decomposition of the pattern would include the ordering in which the constraints should be evaluated. For all but the simplest cases, however, fixing such a constraint decomposition and the evaluation order is likely to yield inefficient pattern matchers. As we discussed, the API instead expects the pattern decomposition to be provided through the functions nominate_constraints, constraint_classes, expansion_factor and condition_on.

For the implementations of nominate_constraints and condition_on, we rely on the ConstraintPattern type provided by portmatching. Provided with the set of constraints above, an instance of ConstraintPattern will keep track of which constraints have already been satisfied through successive calls to condition_on. Calls to nominate_constraints will return the set of constraints that are still left to be satisfied.

To conclude, we must indicate which constraints should be grouped together and how to estimate their value. We note that when evaluated on the same DataPos(i), any two ConstVal constraints are mutually exclusive. We can interpret a BindingEq arg1 arg2 constraint similarly, by viewing it as identical to a ConstVal(c) arg2 constraint, where c is the character at the position of arg1. We hence choose to group together constraints that apply to the same position. We introduce for this the classes AtPositionClass(i), for integer i.

fn constraint_classes(predicate, args: [DataPos]) -> [ConstraintClass] {
    if predicate == ConstVal {
        [DataPos(i)] = args
        return [AtPositionClass(i)]
    } else {  // predicate == BindingEq
        [DataPos(i), DataPos(j)] = args
        // assuming i < j, and that DataPos(i) is bound before DataPos(j)
        return [AtPositionClass(j)]
    }
}

To simplify here, we chose to only assign BindingEq constraints to a single class – this allows us to assume that the first key was already bound, and therefore treat the constraint in the same way as ConstVal constraints. We could also assign BindingEq to the two classes corresponding to the two positions it applies to, but it then becomes harder to give a good estimate for the expansion_factor. Assuming every character in Σ\Sigma is equally likely to occur, this gives the straight-forward approximation of the expansion factor α\alpha12

fn expansion_factor(constraints: [Constraint]) -> f64 {
    return 1.0 / 26.0 * constraints.len()
}

There is still the branching factor!

Rewriting rule used in practice #


  1. A bit like travelling by TGV↩︎

  2. Given that we called the previous property completeness, we could call this soundness↩︎

  3. It would be possible in theory to delay all evaluations of the constraints until the end of the pattern matching process, but not without incurring a blow up of F|\mathcal F|↩︎

  4. …and will also be used to filter index maps, just like constraints. ↩︎

  5. Or an “integral”, as an engineer would call it. ↩︎

  6. Thankfully, this often happens to be ourselves. ↩︎

  7. Available on crates.io↩︎

  8. Call it pseudo-Rust if you will – pseudo-code with a Rust-inspired syntax, but simple enough for anyone to follow. The complete, working implementation of this example is available within the portmatching crate in the module portmatching::concrete::string↩︎

  9. That may or may not exist in DD↩︎

  10. To be precise, ConstVal is an entire family of predicates. ↩︎

  11. Note: that the bound DataPos indices will form a contiguous interval is already guaranteed by our design of the indexing scheme – replacing with an indexing scheme allowing non-contiguous but monotonic indices, we would obtain pattern matching on string subsequences. ↩︎

  12. A better approximation in expansion_factor should take into account that multiple BindingEq constraints can result in the same constraint, if the data contains multiple repetitions of the same character. One could for example treat the two constraint types independently, and assume that BindingEq constraints are satisfied with probability 1/261/26, independently of one another. ↩︎

6.4. Rewriting Strategies and Distributed Architecture