Bounding the search space size

5.5. Bounding the search space size

We show in this section that under some assumptions on the GTSs that hold in the use cases of interest to quantum compilation, there is a provable gap between the size of the search space of all reachable graphs in the GTS and the size of the corresponding confluently persistent data structure D\mathcal{D}.

Let us introduce first the notion of overwriting rewrites.

Definition 5.3Overwriting rewrite

For two rewrites r1r_1 and r2r_2, we say that r2r_2 overwrites r1r_1, written r1r2r_1 \twoheadrightarrow r_2, if the deletion set V2V^-_2 of δ2\delta_2 includes a vertex of the vertex set V1V_1 of the replacement graph of r1r_1

V2V1.V^-_2 \cap V_1 \neq \varnothing.

The definition can identically be applied to events. In this case, the overwriting events are precisely given by the parent-child relation: the set of all overwriting events of δD\delta \in \mathcal{D} is by definition the set of parents P(δ)P(\delta) of δ\delta.

Our argument relies on the comparison of asymptotic bounds for the sizes of two sets GΔ\mathcal{G}_\Delta and DΔ\mathcal{D}_\Delta, which we now define. Consider a GTS and an input graph GG. A graph GG' is reachable from GG within depth Δ0\Delta \geqslant 0 if there is a sequence of rewrites r1,,rr_1, \dots, r_\ell in the GTS from GG to GG' such that all subsequences rβ1,,rβkr_{\beta_1}, \dots, r_{\beta_k} formed of overwriting rewrites rβirβi+1r_{\beta_i} \twoheadrightarrow r_{\beta_{i + 1}} have length at most kΔk \leqslant \Delta.

The set GΔ\mathcal{G}_\Delta is the set of all graphs reachable within depth Δ\Delta. We derive:

  1. a lower bound on the size of GΔ|\mathcal{G}_\Delta|, the space of all graphs reachable in at most Δ\Delta rewrites from some input GG, and
  2. an upper bound on the size of the equivalent confluently persistent data structure DΔ\mathcal{D}_\Delta, i.e. such that
    GΔ={flat(D)DΓ(DΔ)}.\mathcal{G}_\Delta = \{ flat(D) \mid D \in \Gamma(\mathcal{D}_\Delta) \}.

In order to obtain bounds, we will introduce hypotheses that the GTSs must satisfy. Throughout this section, we will illustrate and motivate the restrictions that they impose in the following two use cases.

Use case 1: ζ\zeta-complete GTS #

The first GTS we consider can be defined on any graph domain that has a notion of graph size (e.g. based on number of nodes, number of edges, etc.1) and for any graph size ζ\zeta. The GTS is such that for any subgraph HGH \subseteq G of size H=ζ|H| = \zeta, there is at least one transformation rule in the GTS that matches HH. We will call this case CompleteGTS.

This is the use case of quantum superoptimisation discussed in section 3.1 and used for benchmarking in section 4.7. In those cases, the transformation rules are obtained by enumerating all small circuits up to a certain size ζ\zeta, thus guaranteeing that any subcircuit of size ζ\zeta will be matched by the GTS.

Note that there is also an (obvious) upper bound on the number of transformation rules that can match on any given subgraph: the total number of transformation rules in the GTS.

Use case 2: single-rule GTS in a uniform domain #

At the other extreme of the GTS spectrum, we can consider a GTS made of a single (arbitrary) transformation rule. In this case, we require that graphs are drawn from a domain uniformly at random, so that for any subgraph HGH \subseteq G, all patterns of size H|H| are equally likely. We will call this case SingleRuleGTS.

In this case, we will not show that our hypotheses hold for all inputs, but rather that they hold with a high probability. We will phrase our statements as a function of ϵ>0\epsilon > 0 and will require that they hold with probability 1ϵ1 - \epsilon for randomly drawn GG.

This regime is interesting as it is the simplest instance of problem domains for which few assumptions can be made about the GTS themselves, but all inputs are expected to be equally likely.

Lower bound on the naive search tree #

The event history of the set of graphs GΔ\mathcal{G}_\Delta defines a tree TΔT_\Delta, where GΔ\mathcal{G}_\Delta are the nodes and GpGΔG_p \in \mathcal{G}_\Delta is the parent of GcGΔG_c \in \mathcal{G}_\Delta if there is a rewrite rr rewriting GpG_p to GcG_c in the GTS. Paths in TΔT_\Delta are sequences of rewrites. We call TΔT_\Delta the naive search tree of the GTS. We wish to derive a lower bound for TΔ|T_\Delta|.

Graph partitioning #

For fixed search depth Δ\Delta, let n>0n > 0 be the largest integer such that for all graphs GTΔG' \in T_\Delta, there exists disjoint subgraphs Π1,,Πn\Pi_1, \ldots, \Pi_n of GG' that satisfies the following property, for all 1in:1 \leq i \leq n:

there exists a rewrite in the GTS that can be applied to Πi\Pi_i.

Hypothesis 1Linear scaling of nn

For a fixed GTS, a fixed depth Δ\Delta and a family of input graphs GG, we have the scaling n=Θ(G)n = \Theta(|G|).

We conjecture that this scaling holds for many GTSs of interest:

Proposition 5.5
Hypothesis 1 holds for CompleteGTS and for SingleRuleGTS probabilistically.

In CompleteGTS, it suffices to partition any input GG into n=Gζ=Θ(G)n = \lfloor \frac{|G|}\zeta \rfloor = \Theta(|G|) disjoint subgraphs of size at least ζ\zeta. Each subgraph will match a rule of the GTS by definition.

For SingleRuleGTS, let ϵ>0\epsilon > 0. Let L|L| be the size of the left hand side LL of the GTS rule. By assumption, for any subgraph HH of size L|L| of an input GG, there is a constant probability pp that the rule matches HH. For a subgraph HH of size kLk |L|, the probability of the rule not matching in HH is (1p)k(1 - p)^k. Picking

k>11+δlnϵ+lnLGln(1p)k > \frac1{1 + \delta}\frac{\ln \epsilon + \ln {\frac{|L|}{|G|}}}{\ln (1 - p)}

for some δ>0\delta > 0 ensures that whenever δ>lnkkln(1p)\delta > - \frac{\ln k}{k \ln (1 - p)} (i.e. for kk large enough),

klnkln(1p)>(1+δ)k>lnϵ+lnLGln(1p)k>lnϵnln(1p), \begin{aligned} &k - \frac {\ln k}{\ln (1 - p)} > (1 + \delta)k > \frac{\ln \epsilon + \ln {\frac{|L|}{|G|}}}{\ln (1 - p)}\\ \Leftrightarrow\quad &k > \frac{\ln{\frac \epsilon n}}{\ln (1 - p)}, \end{aligned}

where n=GkLn = \frac{|G|}{k |L|} was chosen. It follows that a partition of GG into n\lfloor n \rfloor disjoint subgraphs of size at least kLk |L| satisfies the hypothesis with probability 1ϵ1 - \epsilon.

Lower bound on TΔ|T_\Delta| #

Fix the tree depth Δ\Delta and the GTS. Any rewrite from the GTS removes at most a constant KK number of vertices from the graph it applies on. Thus, Any graph in TΔT_\Delta is at least of size (1KΔ)G=Θ(G)(1 - K\Delta)|G| = \Theta(|G|). Let nn' be the smallest value of nn for a graph GminGΔG_{min} \in \mathcal G_\Delta. Whenever hypothesis 1 applies, we have n=Θ(Gmin)=Θ(G)n' = \Theta(|G_{min}|) = \Theta(|G|).

For each GTΔG' \in T_\Delta and Πi\Pi_i, pick a rewrite ri(G)r_i(G') that applies to Πi\Pi_i and let RR be the set of all such rewrites

RΔ={ri(G)1in,GTΔ}.R_\Delta = \{ r_i(G') \mid 1 \leq i \leq n', G' \in T_\Delta \}.

We can consider the subtree TΔTΔT'_\Delta \subseteq T_\Delta that only contains graphs obtained by applying rewrites in RΔR_\Delta.

For Δ=1\Delta = 1, the search tree TΔT'_\Delta will contain 2n2^{n'} graphs: for each subgraph Πi\Pi_i of the input graph GG, we can choose to either apply ri(G)r_i(G) or not2.

By repeating Δ\Delta times the search tree of size 2n2^{n'} for Δ=1\Delta = 1, we obtain a lower bound

GΔ=TΔTΔ=(2n)Δ.|\mathcal{G}_\Delta| = |T_\Delta| \geqslant |T'_\Delta| = (2^{n'})^\Delta.

We frame this result as the following proposition.

Proposition 5.6Lower bound for GΔ|\mathcal{G}_\Delta|
The naive search tree size GΔ|\mathcal{G}_\Delta| is in Ω(2Δn)\Omega(2^{\Delta \cdot n'}).

As a result, for any GTS satisfying Hypothesis 1 the size of GΔ\mathcal G_\Delta grows at least exponentially with input graph size GG and search depth Δ\Delta.

Upper bound on the factorised search space #

Consider two rewrites r1r_1 and r2r_2. If neither overwrites the other, i.e. r1↠̸r2r_1 \not\twoheadrightarrow r_2 and r2↠̸r1r_2 \not\twoheadrightarrow r_1, then the order in which they are applied is irrelevant (see also Proposition 5.2). The persistent data structure D\mathcal{D} uses this symmetry explicitly when exploring the set of reachable graphs.

This drastically reduces the size of the event history of DΔ\mathcal{D}_\Delta. The event history defines a directed acyclic graph FΔF_\Delta that is the equivalent for DΔ\mathcal{D}_\Delta to the naive search tree TΔT_\Delta of GΔ\mathcal{G}_\Delta. The vertices of FΔF_\Delta are the flattened histories of events in DΔ\mathcal{D}_\Delta:

V(FΔ)={flat({δ})δDΔ},V(F_\Delta) = \{flat(\{\,\delta\,\}) \mid \delta \in \mathcal{D}_\Delta\},

with an edge from flat({δp})flat(\{\,\delta_p\,\}) to flat({δc})flat(\{\,\delta_c\,\}) if there is a parent-child relation δpP(δc)\delta_p \in P(\delta_c). We call FΔF_\Delta the factorised search space of DΔ\mathcal{D}_\Delta.

By construction, any graph GV(T)G' \in V(T) in the naive search tree maps injectively to a subgraph SFΔS \subseteq F_\Delta of the factorised search space, given by the subgraph of FF induced by the rewrites on the path from GG to GG' in TT.

Whereas our earlier discussion was focused on proving a lower bound for the size of the search tree, we now show an upper bound on the number of graphs FΔ|F_\Delta| in the factorised search space.

Graph covering #

Instead of considering partitions of graphs in TΔT_\Delta as we did above, we now consider coverings of graphs GG' in FΔF_\Delta, i.e. a set of subgraphs Γ1,,Γn\Gamma_1, \ldots, \Gamma_n such that their union is GG' but that might not be disjoint.

Let mm, ss and γ\gamma be parameters and fix a covering Γ1,,Γm\Gamma_1, \ldots, \Gamma_m for each graph GV(FΔ)G' \in V(F_\Delta) such that:

  • for all GG' and all 1im1 \leqslant i \leqslant m, there are at most ss applicable rewrites to the ii-th covering set Γi\Gamma_i of GG'. Furthermore, all rewrites within Γi\Gamma_i are mutually exclusive, i.e. they modify a shared subgraph so that it is never possible to apply more than one rewrite among the (up to ss) applicable ones3;
  • for all GG' and for all rewrites rr that apply to GG', there is 1im1 \leqslant i \leqslant m such that rr applies to Γi\Gamma_i (i.e. the matching subgraph of rr is fully contained within one of the coverings). Furthermore, the matching subgraph of rr overlaps with at most γ\gamma distinct Γi\Gamma_i;
  • for all rewrites rr that apply to the covering set Γi\Gamma_i of a graph GG', the image r(Γi)Γir(\Gamma_i) \subseteq \Gamma_i' must be a subgraph of the ii-th covering subgraph Γi\Gamma_i' of r(G)r(G').

The first condition is satisfied whenever the size of the coverings can be bounded: in that case ss can be chosen based on the number of distinct subgraphs that can be contained in a covering set and the number of rules that can apply to each. The second condition is related to the connectivity between the covering sets: γ\gamma can thus often be derived by considering how many neighbours a covering set has, and how many of those neighbours can a match of a GTS rule span.

The third condition above can be understood as “rewrites must preserve the coverings”. In other words, the coverings are chosen such that a graph mutation produced by the application of a rewrite rE(FΔ)r \in E(F_\Delta) on GG is always contained within a single covering subgraph of r(G)r(G').

Hypothesis 2Linear scaling of mm and constant s,γs, \gamma

For a fixed GTS, a fixed depth Δ\Delta and a family of input graphs GG, we have the scaling m=Θ(G)m = \Theta(|G|) and s=Θ(1)s = \Theta(1) and γ=Θ(1)\gamma = \Theta(1).

These conditions along with Hypothesis 2 are somewhat restrictive and future work should explore how to relax them. For our use cases CompleteGTS and SingleRuleGTS, we restrict our considerations to the special case where the graph domain is quantum circuits. We make the further simplifying assumptions (similar results can be obtained with variations on these assumptions)

  • transformation rules have two-qubit circuits as left and right hand sides, and
  • the number of qubits on the inputs is fixed (i.e. the number of gates on each qubit scales with circuit size).

Define \ell to be the largest number of gates on any one qubit in the left hand sides of the GTS transformation rules. Consider a partition of the gates on each qubit into sequences of \ell gates. We can obtain a covering of a quantum circuit GG by considering covering sets Γ1,,Γm\Gamma_1, \ldots, \Gamma_m defined for all 1im1 \leq i \leq m such that for all matches HH of a left hand side of a transformation rule of the GTS, HΓiH \subseteq \Gamma_i if and only if there is a gate vv in HH such that vv is in the ii-th sequence of \ell gate on some qubit in GG. Imposing the condition that rewrites must preserve coverings, the covering of the input fixes the covering of all reachable graphs GG' in the GTS.

Proposition 5.7
Restrict the graph domain to quantum circuits, the GTS to two-qubit rules and the inputs to a fixed number of qubits. Then hypothesis 2 holds for CompleteGTS and for SingleRuleGTS.

Let GG be the input circuit with qq qubits. Consider the covering Γ1,,Γm\Gamma_1, \dots, \Gamma_m of GG as constructed above. By construction m=N/=Θ(G)m = \lceil N / \ell \rceil = \Theta(|G|), where NN is the maximum number of gates on a qubit of GG.

The covering set Γi\Gamma_i contains the set ViV_i of gates composed of the ii-th sequence of \ell gates for each qubit in GG. Furthermore, for all vViv \in V_i, if HH is a match of a two-qubit rule that contains vv, then HH may contain at most 212\ell - 1 other gates. Hence by construction, Γi2V=2q2=Θ(1)|\Gamma_i| \leqslant 2\ell|V| = 2q\ell^2 = \Theta(1). This is a constant and thus there is a constant s=Θ(1)s = \Theta(1) such that for all 1im1 \leqslant i \leqslant m there are most ss matches of a two-qubit rule that intersect Γi\Gamma_i.

Finally, any match HH spans two qubits; the gates on each qubit (at most \ell) may belong to at most two distinct sequences of \ell gates of that qubit. Thus, any match HH spans at most γ=4\gamma = 4 distinct covering sets. These arguments made no assumption on properties of the rule set and thus apply equally to CompleteGTS and SingleRuleGTS.

Upper bound on FΔ|F_\Delta| #

The preservation of coverings under rewrites allows us to consider a covering of V(FΔ)V(F_\Delta): for each 1im1 \leqslant i \leqslant m, let FΔ(i)V(FΔ)F_\Delta^{(i)} \subseteq V(F_\Delta) be the set of graphs in V(FΔ)V(F_\Delta) that are the result of a rewrite of its ii-th covering subgraph. Every graph in V(FΔ)V(F_\Delta) is the result of a rewrite on some covering subgraph ii, or is the input graph GG. So, from a bound UΔFΔ(i)U_\Delta \geqslant |F_\Delta^{(i)}| for all 1im1 \leqslant i \leqslant m, we can obtain a bound

V(FΔ)1+mUΔ.|V(F_\Delta)| \leqslant 1 + m \cdot U_\Delta.

The bound UΔU_\Delta can be obtained recursively: U1=sU_1 = s upper bounds by definition the number of rewrites in any covering subgraph of the root graph GG, and thus the number of graphs in F1(i)F_1^{(i)}. We then proceed by induction for 1<δΔ1 < \delta \leqslant \Delta.

A rewrite rE(FΔ)r \in E(F_\Delta) overlaps with at most γ\gamma other covering subgraphs. It can overwrite at most one previous rewrite for each subgraph, and thus will have at most γ\gamma parent graphs in sets FΔ1(i1),,FΔ1(iγ)F_{\Delta-1}^{(i_1)}, \cdots, F_{\Delta-1}^{(i_\gamma)}. Each of the γ\gamma sets is of size at most Uδ1U_{\delta - 1}. Furthermore, there are at most ss rewrites in any covering subgraph. We thus obtain the recursion:

UδsUδ1γ,U_\delta \leqslant s \cdot U_{\delta - 1}^\gamma,

Unrolling the recursion, we can write this as

Uδs1+γ+γ2++γδ1=sγδ1γ1=sΘ(γδ1).U_\delta \leqslant s^{1 + \gamma + \gamma^2 + \dots + \gamma^{\delta - 1}} = s^{\frac{\gamma^\delta - 1}{\gamma - 1}} = s^{\Theta(\gamma^{\delta - 1})}.

Recalling that by construction FΔ=DΔ|F_\Delta| = |\mathcal{D}_\Delta|, we obtain:

Proposition 5.8Upper bound for DΔ|\mathcal{D}_\Delta|

The factorised search space size DΔ|\mathcal{D}_\Delta| is in msΘ(γΔ1)m \cdot s^{\Theta(\gamma^{\Delta - 1})}.

Discussion and empirical exploratory analysis #

We have derived bounds on the size of the search spaces and shown that under some assumptions on the properties of the GTS, the factorised search space grows linearly in the size of the input graph GG. This stands in stark contrast to the lower bound of the naive search tree, which scales exponentially with the size of the input graph.

However, when considering the overall optimisation problem of finding the optimal solution over the set of reachable graphs in a GTS, the exponential overhead does not disappear: it is rather shifted to the extraction phase that relies on a SAT solver. It is therefore an open question whether the factorised search space can be used to improve optimisation problems on GTSs.

To this end, we devise a simple numerical experiment that assesses the potential of using the unfolding construction as presented in this chapter in the context of quantum computation optimisation.

The toy problem.  We consider a very simple circuit optimisation problem that is desiged to require a deep search space (i.e. a large number of rewrites) to be solved. This will exacerbate the scaling difference between an optimiser that must traverse the naive search space and another that relies on the factorised representation instead.

The inputs are quantum circuits composed of two-qubit CX\textit{CX} and single-qubit Rz\textit{Rz} rotation gates. The angles of the rotations are not relevant and set randomly. They are of the following form:

i.e. each pair of subsequent qubits have 2 CX\textit{CX} gates at either end and 10 Rz\textit{Rz} rotation in-between, on the control qubits of the CX\textit{CX} gates. These circuits admit a very simple optimisation that can be expressed by the following two transformation rules:

Given the objective of minimising the number of CX\textit{CX} gates, the optimiser must commute the leftmost CX\textit{CX} gates through all of the rotation gates, until the two CX\textit{CX} on each qubit are adjacent and cancel out. We study the performance of the optimisers as we increase the number 2q2q of qubits in the circuit.

Optimisers.  We define two optimisers. Badger is a backtracking search through the naive search space of reachable graphs in the GTS: starting from the input, the search space is expanded by computing all possible rewrites at a given state. States with the lowest cost function are processed first. This is similar to an A* search Hart, 1968Peter Hart, Nils Nilsson and Bertram Raphael. 1968. A Formal Basis for the Heuristic Determination of Minimum Cost Paths. IEEE Transactions on Systems Science and Cybernetics 4, 2 (100--107). doi: 10.1109/tssc.1968.300136.

Seadog on the other hand performs the backtracking search on the factorised search space instead: when expanding a state of the search space, only rewrites that overlap with the last rewrite are considered and added to the search space, as discussed in section 5.4. In a second phase, the search space is encoded as a SAT problem that is solved using Z3 Moura, 2008Leonardo de Moura and Nikolaj Bjørner. 2008. Z3: An Efficient SMT Solver. In Tools and Algorithms for the Construction and Analysis of Systems. Springer Berlin Heidelberg, 337--340. doi: 10.1007/978-3-540-78800-3_24.

The Badger optimiser is released and publicly available as part of the open-source TKET package4. The Seadog optimiser on the other hand is still in early development; more benchmarks and a release will follow.

Results.  We ran the experiment on an Apple M3 Max CPU (4.05GHz) for inputs between 22 (1212 gates) and 7878 qubits (468468 gates). Both optimisers ran on a single core. For each instance, we set a timeout of 22 seconds and report the relative CX\textit{CX} gate reduction, i.e.

CXinitCXfinalCXinit.\frac{\textit{CX}_\text{init} - \textit{CX}_\text{final}}{\textit{CX}_\text{init}}.

The results are shown in the figures on the right.

Discussion.  On the left, we observe that both optimisers are able to find the optimum for circuits with up to 30 CX gates. Beyond that point, the time limit starts impacting Badger performance, which drops continuously and reaches 0% for inputs of 50 CX gates and above. Seadog on the other hand does not time out and is able to explore the entire (factorised) search space exhaustively up until 70 CX gates.

Observe that the Badger optimiser reaches the time limit for as few as 10 CX. Indeed, the complete naive search space size can be calculated to have 12q12^q states (each pair of qubits can be in one of 12 states). For 2q=62q = 6 we get 17281728 states, but this already reaches over 2000020'000 states for 2q=82q = 8.

CX gate count reduction (left) and runtime (right) for the Badger and Seadog optimisers. 100% gate count reduction is optimal. A timeout was set to 2 seconds.

CX gate count reduction (left) and runtime (right) for the Badger and Seadog optimisers. 100% gate count reduction is optimal. A timeout was set to 2 seconds.

Size of factorised search space for Seadog.

Size of factorised search space for Seadog.

On the other hand, the factorised search space will only contain 1212 states for each qubit pair. This results in a linear scaling of the search space size, as can clearly be seen in the second figure.

Where the runtime exceeds the 2 second timeout, this is due to pre- and post-optimisation steps such as memory allocation/deallocation, I/O, file parsing etc that are included in the measurements. The quadratic runtime scaling that we observe in Seadog is due to a hash function that is run on every state of the search space to detect and discard duplicates: as the number of states in the search space grows linearly with input size and each state requires a hash in linear time, the overall runtime grows quadratically. Future work may be able to address this issue by designing updateable hash functions that do not require the full graph to be rehashed when applying a local rewrite.

Future work should also investigate how to scale Seadog to larger input sizes on a broader class of problems. We have observed that the SAT-based extraction phase of Seadog corresponds to less than 1% of the runtime budget (under 15ms for all input sizes). Whilst being asymptotically exponential in the worst case, it is thus not currently a bottleneck. On the other hand, the number of states visited per second in the exploration phase is currently up to 10×10\times slower for Seadog compared to Badger. Further investigations into the causes of this are still required, but we expect that large performance improvements can be realised on the current implementation and as a result could scale to larger inputs.


  1. The only constraint on the notion of graph size |\cdot| is that it must be compatible with the subgraph relation: if GGG \subseteq G', then GG.|G| \leqslant |G'|. ↩︎

  2. Note that this counting is already an act of clemency: we are not counting all permutations of the rewrites, which would be considered separately by a naive exploration that applies one graph rewrite at a time. In this case, the search tree for Δ=1\Delta = 1 would contain n!2nn! \cdot 2^n graphs. ↩︎

  3. This can always be made to hold by replacing any set of mutually disjoint rewrites with their cartesian product, in effect viewing the application of multiple disjoint rewrites as one large rewrite. Thi comes at the cost of a larger value for s.s. ↩︎

  4. As a Python package on PyPI and a rust crate on crates.io). ↩︎