Quantum algorithmic foundations

Download the slides for this lesson.

Introduction

Quantum algorithms provide provable advantages over classical algorithms in the query model — but what about more standard models of computation, where problem inputs are given explicitly rather than in the form of an oracle or black box? This turns out to be a much more difficult question to answer, and to address it we must first establish a solid foundation upon which to base our investigation. That's the purpose of this lesson.

We'll begin by discussing computational cost, for both classical and quantum computations, and how it can be measured. This is a general notion that can be applied to a wide range of computational problems — but to keep things simple we'll mainly examine it through the lens of computational number theory, which addresses problems including basic arithmetic, computing greatest common divisors, and integer factorization, all of which are computational tasks likely to be familiar to most readers. While computational number theory is a narrow application domain, these problems serve well to illustrate the basic issues — and as a bonus they happen to be highly relevant to the lesson following this one.

Our focus will be on algorithms, as opposed to the ever-improving hardware on which they're run — so we'll be more concerned with how the cost of running an algorithm scales as the specific problem instances it is run on grow in size, rather than how many seconds, minutes, or hours some particular computation requires. We focus on this aspect of computational cost in recognition of the fact that algorithms have fundamental importance, and will naturally be deployed against larger and larger problem instances using faster and more reliable hardware as technology develops.

Finally, we turn to a critically important task, which is running classical computations on quantum computers. The reason this task is important is not because we hope to replace classical computers with quantum computers (which seems extremely unlikely to happen any time soon, if ever), but rather because it opens up many interesting possibilities for quantum algorithms. Specifically, classical computations running on quantum computers become available as subroutines — effectively leveraging decades of research and development on classical algorithms in pursuit of quantum computational advantages.

Two examples: factoring and GCDs

The classical computers that exist today are incredibly fast, and their speed seems to be ever increasing. For this reason, some might be inclined to believe that computers are so fast that no computational problem is beyond their reach.

This belief is false. Some computational problems are so inherently difficult that, although there exist algorithms to solve them, no computer on the planet Earth today is fast enough to run these algorithms to completion on even moderately sized inputs within the lifetime of a human — or even within the lifetime of the Earth itself.

To explain further, let's introduce the integer factorization problem.

Integer factorization
Input: an integer $N \geq 2$
Output: the prime factorization of $N$

By the prime factorization of $N$, we mean a list of the prime factors of $N$ and the powers to which they must be raised to obtain $N$ by multiplication. For example, the prime factors of $12$ are $2$ and $3$, and to obtain $12$ we must take the product of $2$ to the power $2$ and $3$ to the power $1$:

$$12 = 2^2 \cdot 3$$

Up to the ordering of the prime factors, there is only one prime factorization for each positive integer $N \geq 2$, which is a fact known as the fundamental theorem of arithmetic.

The SymPy symbolic mathematics package for Python includes a function factorint that solves the integer factorization problem for whatever input $N$ we choose.
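The code cell itself isn't reproduced in this text version; a minimal cell along the following lines (assuming SymPy is installed) produces the output shown below.

```python
from sympy import factorint

# Prime factorization of 12, returned as a dictionary of prime: exponent pairs.
factorint(12)
```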


Output:

{2: 2, 3: 1}

Factoring small numbers like $12$ is easy, but when the number $N$ to be factored gets larger, the problem becomes more difficult. For example, running factorint on a significantly larger number will cause a noticeable delay.
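The specific integer used in the lesson's cell isn't shown directly, but it can be reconstructed by multiplying the prime factors reported in the output below back together; a sketch of such a cell follows.

```python
from sympy import factorint

# Rebuild the number from the prime factorization shown in the output below.
N = 3**2 * 74519450661011221 * 5073729280707932631243580787
factorint(N)
```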


Output:

{3: 2, 74519450661011221: 1, 5073729280707932631243580787: 1}

For even larger values of $N$, things become impossibly difficult, at least as far as we know. For example, the RSA Factoring Challenge, which was run by RSA Laboratories from 1991 to 2007, offered a cash prize of $100,000 to factor the following number, which has 309 decimal digits (and 1024 bits when written in binary). The prize for this number was never collected and its prime factors remain unknown.


Output:

135066410865995223349603216278805969938881475605667027524485143851526510604859533833940287150571909441798207282164471551373680419703964191743046496589274256239341020864383202110372958725762358509643110564073501508187510676594629205563685529475213500852879416377328533906109750544334999811150056977236890927563

We need not bother running factorint on RSA1024; it wouldn't finish within our lifetimes.

The fastest known algorithm for factoring large integers is known as the number field sieve. As an example of this algorithm's use, the RSA challenge number RSA250, which has 250 decimal digits (and 829 bits in its binary representation), was factored using the number field sieve in 2020. The computation required thousands of CPU core-years, distributed across tens of thousands of machines around the world.


Output:

True

The security of the RSA public-key cryptosystem is based on the computational difficulty of integer factoring; an efficient algorithm for integer factoring would break it.

Next let's consider a related but very different problem, which is computing the greatest common divisor (or GCD) of two integers.

Greatest common divisor (GCD)
Input: nonnegative integers $N$ and $M$, at least one of which is positive
Output: the greatest common divisor of $N$ and $M$

The greatest common divisor of two numbers is the largest integer that evenly divides both of them.

This problem is easy for computers — it has roughly the same computational cost as multiplying the two input numbers together. The following code cell runs the gcd function from the Python math module on two numbers that are both quite a bit larger than RSA1024 in the blink of an eye. (In fact, RSA1024 is the GCD of the two numbers in this example.)
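The lesson's cell applies math.gcd to two particular multiples of RSA1024 (which is why the output below is RSA1024 itself). As a self-contained stand-in, any two numbers sharing a large common divisor illustrate the same point; the divisor d used here is hypothetical.

```python
from math import gcd

d = 2**1024 + 643      # hypothetical stand-in for a large common divisor like RSA1024
a, b = 3 * d, 5 * d    # two numbers larger than d whose GCD is exactly d
print(gcd(a, b) == d)  # True, and the call returns essentially instantly
```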


Output:

135066410865995223349603216278805969938881475605667027524485143851526510604859533833940287150571909441798207282164471551373680419703964191743046496589274256239341020864383202110372958725762358509643110564073501508187510676594629205563685529475213500852879416377328533906109750544334999811150056977236890927563

We can push it even further and compute GCDs for numbers with thousands of digits in no time.
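The numbers used in the lesson's cell aren't reproduced here, but the following sketch makes the same point with two numbers of roughly five thousand digits whose greatest common divisor is 113.

```python
from math import gcd

# 10**5000 + 7 and 10**5000 + 9 are coprime (both odd, differing by 2),
# so the GCD of these two ~5000-digit numbers is exactly 113.
a = 113 * (10**5000 + 7)
b = 113 * (10**5000 + 9)
print(gcd(a, b))
```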


Output:

113

This is possible because we have very efficient algorithms for computing GCDs, the most well-known of which is Euclid's algorithm, discovered over 2,000 years ago.

Could there be a fast algorithm for integer factorization that we just haven't discovered yet, allowing large numbers like RSA1024 to be factored in the blink of an eye? The answer is yes. Although we might expect that an efficient algorithm for factoring as simple and elegant as Euclid's algorithm for computing GCDs would have been discovered by now, there is nothing that rules out the existence of a very fast classical algorithm for integer factoring, beyond the fact that we've failed to find one thus far. One could be discovered tomorrow — but don't hold your breath. Generations of mathematicians and computer scientists have searched, and factoring numbers like RSA1024 remains beyond our reach.

Measuring computational cost

Next we'll discuss a mathematical framework through which computational cost can be measured. It should be understood that this discussion is narrowly focused on the needs of this series — the analysis of algorithms and computational complexity are entire subjects unto themselves, and have much more to say about these notions.

As a starting point, consider the following figure (which first appeared in Lesson 5), which represents a very high level abstraction of a computation.

Illustration of a standard computation.

The computation itself could be modeled or described in a variety of ways, such as by a computer program written in Python, a Turing machine, a Boolean circuit, or a quantum circuit. Our focus will be on circuits (both Boolean and quantum).

Encodings and input length

Let's begin with the input and output of a computational problem, which we'll assume are binary strings. Different symbols could be used, but we'll keep things simple for the purposes of this discussion by restricting our attention to binary string inputs and outputs.

Through binary strings, we can encode a variety of interesting objects that the problems we're interested in solving might concern, such as numbers, vectors, matrices, and graphs, as well as lists of these and other objects. For example, to encode nonnegative integers, we can use binary notation. The following table lists the binary encoding of the first nine nonnegative integers, along with the length (meaning the total number of bits) of each encoding.

number   binary encoding   length
0        0                 1
1        1                 1
2        10                2
3        11                2
4        100               3
5        101               3
6        110               3
7        111               3
8        1000              4

We can easily extend this encoding to handle both positive and negative integers by appending a sign bit to the representations if we choose. Sometimes it's also convenient to allow binary representations of nonnegative integers to have leading zeros, which don't change the value being encoded but can allow representations to fill up a string or word of a fixed size.

Using binary notation to represent nonnegative integers is both common and efficient, but if we wanted to we could choose a different way to represent nonnegative integers using binary strings, such as the ones suggested in the following table. The specifics of these alternatives are not important to this discussion — the point is only to clarify that we do have choices for the encodings we use. (In this table, the symbol $\varepsilon$ represents the empty string, which has no symbols in it and length equal to zero. Naturally, to avoid an obvious source of confusion, we use a special symbol such as $\varepsilon$ to represent the empty string rather than literally writing nothing.)

number   unary encoding   lexicographic encoding
0        ε                ε
1        0                0
2        00               1
3        000              00
4        0000             01
5        00000            10
6        000000           11
7        0000000          000
8        00000000         001

Other types of inputs, such as vectors and matrices, or more complicated objects like descriptions of molecules, can also be encoded as binary strings. Just like we have for nonnegative integers, a variety of different encoding schemes can be selected or invented. For whatever scheme we come up with to encode inputs to a given problem, we interpret the length of an input string as representing the size of the problem instance being solved.

For example, the number of bits required to express a nonnegative integer $N$ in binary notation, which is sometimes denoted $\operatorname{lg}(N)$, is given by the following formula.

$$\operatorname{lg}(N) = \begin{cases} 1 & N = 0\\ 1 + \lfloor \log_2 (N) \rfloor & N \geq 1 \end{cases}$$

Assuming that we use binary notation to encode the input to the integer factoring problem, the input length for the number $N$ is therefore $\operatorname{lg}(N)$. Note, in particular, that the length (or size) of the input $N$ is not $N$ itself — when $N$ is large we don't need nearly this many bits to express $N$ in binary notation.
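For instance, $\operatorname{lg}(12) = 1 + \lfloor \log_2(12) \rfloor = 4$, matching the four bits of the binary string 1100. In Python, this quantity is available (for $N\geq 1$) through the built-in bit_length method.

```python
N = 12
print(N.bit_length())  # 4, the number of bits in the binary representation 1100
```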

From a strictly formal viewpoint, whenever we consider a computational problem or task, it should be understood that a specific scheme has been selected for encoding whatever objects are given as input or produced as output. This allows computations that solve interesting problems to be viewed abstractly as transformations from binary string inputs to binary string outputs — and naturally the details of how objects are encoded as binary strings must be important to computations at some level.

Usually, though, we don't worry all that much about these details when we're analyzing computational cost, so that we can avoid getting into details of secondary importance. The basic reasoning is that we expect the computational cost of converting back and forth between "reasonable" encoding schemes to be insignificant compared with the cost of solving the actual problem. In those situations in which this is not the case, the details can (and should) be clarified.

For example, a very simple computation with low cost converts between the binary representation of a nonnegative integer and its lexicographic encoding (which we have not explained in detail, but it can be inferred from the table). For this reason, the computational cost of integer factoring wouldn't differ significantly if we decided to switch from using one of these encodings to the other for the input N.N. On the other hand, encoding nonnegative integers in unary notation incurs an exponential blow-up in the total number of symbols required, and we would not consider it to be a "reasonable" encoding scheme.
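To make the first of these claims concrete: judging from the table above, the lexicographic encoding of $N$ is just the binary representation of $N+1$ with its leading 1 removed, so converting between the two encodings really is a low-cost computation. Here is a quick sketch of that observation.

```python
def lexicographic(N):
    # Binary representation of N + 1 with its leading 1 removed.
    return bin(N + 1)[3:]

print([lexicographic(N) for N in range(9)])
# ['', '0', '1', '00', '01', '10', '11', '000', '001']
```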

Elementary operations

Now let's consider the computation itself, which is represented by the blue rectangle in the figure above. The way that we'll measure computational cost is to count the number of elementary operations that each computation requires. Intuitively speaking, an elementary operation is one involving a small, fixed number of bits or qubits that can be performed quickly and easily — such as computing the AND of two bits. (In contrast, running the factorint function is not reasonably viewed as being an elementary operation.)

Formally speaking, there are different choices for what constitutes an elementary operation depending on the computational model being used. Our focus will be on circuit models, and specifically quantum and Boolean circuits.

Universal gate sets

For circuit-based models of computation, it is typical that each gate is viewed as an elementary operation. This leads to the question of precisely which gates we allow.

Focusing for the moment on quantum circuits, we've seen several gates thus far in the series, including $X$, $Y$, $Z$, $H$, $S$, and $T$ gates, swap gates, controlled versions of gates (including controlled-NOT, Toffoli, and Fredkin gates), as well as gates that represent standard basis measurements. In the context of the CHSH game we also saw a few additional rotation gates.

We also discussed query gates (within the query model), and saw that any unitary operation $U$, acting on any number of qubits, can be viewed as being a gate if we so choose. We'll disregard both of these options for the sake of this discussion. We won't be working in the query model (although the implementation of query gates using elementary operations is discussed later in the lesson), and viewing arbitrary unitary gates, potentially acting on millions of qubits, as being elementary operations does not lead to meaningful or realistic notions of computational cost.

Sticking with quantum gates that operate on small numbers of qubits, one approach to deciding which ones are to be considered elementary is to tease out a precise criterion — but this is not the standard approach or the one we'll take. Rather, we simply make a choice.

Here's one standard choice, which we shall adopt as the default gate set for quantum circuits:

  • Single-qubit unitary gates from this list: $X$, $Y$, $Z$, $H$, $S$, $S^{\dagger}$, $T$, and $T^{\dagger}$
  • Controlled-NOT gates
  • Single-qubit standard basis measurements

A common alternative is to view Toffoli, Hadamard, and $S$ gates as being elementary, in addition to standard basis measurements. (Sometimes all single-qubit gates are viewed as being elementary, although this does lead to an unrealistically powerful model when noise and inaccuracies are not properly accounted for.)

The unitary gates in our default collection form what's called a universal gate set. This means that we can approximate any unitary operation, on any number of qubits, and to any degree of accuracy we wish, with circuits composed of these gates alone. To be clear, the definition of universality places no requirements on the cost of such approximations, meaning the number of gates from our set that we need. Indeed, a fairly simple argument based on the mathematical notion of measure reveals that most unitary operations must have extremely high cost. Proving the universality of quantum gate sets is not a simple matter and won't be covered in this series.

For Boolean circuits, we'll take AND, OR, NOT, and FANOUT gates to be the ones representing elementary operations. We don't actually need both AND gates and OR gates — we can use De Morgan's laws to convert from either one to the other by placing NOT gates on all three input and output wires — but nevertheless it's both typical and convenient to allow both AND and OR gates. AND, OR, NOT, and FANOUT gates form a universal set for deterministic computations, meaning that any function from any fixed number of input bits to any fixed number of output bits can be implemented with these gates.

The principle of deferred measurement

Standard measurement gates can appear within quantum circuits — but sometimes it's convenient to delay them until the end, so that we can view quantum computations as consisting of a unitary part (representing the computation itself), followed by a simple read-out phase where qubits are measured and the results are output.

This can always be done, provided that we're willing to add an additional qubit for each standard basis measurement. In the following figure, the circuit on the right illustrates how this can be done for the gate on the left.

Deferring measurements

Specifically, the classical bit in the circuit on the left is replaced by a qubit on the right (initialized to the $\vert 0\rangle$ state), and the standard basis measurement is replaced by a controlled-NOT gate, followed by a standard basis measurement on the bottom qubit. The point is that the standard basis measurement in the right-hand circuit can be pushed all the way to the end of the circuit. If the classical bit in the circuit on the left is later used as a control bit, we can use the bottom qubit in the circuit on the right as a control instead, and the overall effect will be the same. (We are assuming that the classical bit in the circuit on the left doesn't get overwritten after the measurement takes place by another measurement — but if it did we could always just use a new classical bit rather than overwriting one that was used for a previous measurement.)

Circuit size and depth

Circuit size

The total number of gates in a circuit is referred to as that circuit's size. Thus, presuming that the gates in our circuits represent elementary operations, we have that a circuit's size represents the number of elementary operations it requires — or, in other words, its computational cost. We write $\operatorname{size}(C)$ to refer to the size of a given circuit $C$.

For example, consider the following Boolean circuit (from Lesson 3) for computing the XOR of two bits.

Boolean circuit for XOR

The size of this circuit is 7, because there are 7 gates in total. (FANOUT operations are not always counted as gates, but for the purposes of this lesson we will count them as gates.)

Time, cost, and circuit depth

In the previous lesson we discussed the importance of time as a resource, or limiting constraint, on computations. The examples above, such as the task of factoring RSA1024, reinforce this viewpoint. The factorint function doesn't fail to factor RSA1024 per se; it's just that we don't have enough time to let it finish.

The notion of computational cost, as the number of elementary operations required to perform a computation, is intended to be an abstract proxy for the time required to implement a computation. Each elementary operation requires a certain amount of time to perform, and the more of them a computation needs to perform the longer it's going to take, at least in general. In the interest of simplicity, we'll continue to make this association between computational cost and the time required to run algorithms as we continue on in the series.

But notice that the size of a circuit doesn't necessarily correspond directly to how long it takes to run. In our Boolean circuit for computing the XOR of two bits, for instance, the two FANOUT gates could be performed simultaneously, as could the two NOT gates, as well as the two AND gates.

A different way to measure efficiency for circuits, which takes this possibility of parallelization into account, is by their depth. This is the minimum number of layers of gates needed within the circuit, where the gates within each layer operate on different wires. Equivalently, the depth of a circuit is the maximum number of gates encountered on any path from an input wire to an output wire. For the circuit above, for instance, the depth is 4.

Circuit depth is one way to formalize the running time of parallel computations. It's an advanced topic, and there exist very sophisticated circuit constructions known to minimize the depth required for certain computations. There are also some fascinating unanswered questions concerning circuit depth. (For example, much remains unknown about the circuit depth required to compute GCDs.) We won't have too much more to say about circuit depth in this series, aside from including a few interesting facts concerning circuit depth as we go along, but it should be clearly acknowledged that parallelization is a potential source of computational advantages.

Assigning costs to different gates

One final note concerning circuit size and computational cost is that it is possible to assign different costs to different gates, rather than viewing every gate as contributing equally to cost.

For example, as was already mentioned, FANOUT gates are often viewed as being free for Boolean circuits — which is to say that we could choose that FANOUT gates have zero cost. As another example, when we're working in the query model and we count the number of queries that a circuit makes to an input function (in the form of a black box), we're effectively assigning unit cost to query gates and zero cost to other gates, such as Hadamard gates. A final example is that we sometimes weigh the costs of different gates depending on how difficult they are to implement, depending on the particular hardware implementation we have in mind.

While all of these options are sensible in different contexts, for this lesson we'll keep things simple and stick with circuit size as a representation of computational cost.

Cost as a function of input length

We're primarily interested in how computational cost scales as inputs become larger and larger. This leads us to represent the costs of algorithms as functions of the input length.

Families of circuits

Inputs to a given computational problem can vary in length, potentially becoming arbitrarily large. Every circuit, on the other hand, has a fixed number of gates and wires. For this reason, when we think about algorithms in terms of circuits, we generally need infinitely large families of circuits to represent algorithms. By a family of circuits, we mean a sequence of circuits that grow in size, allowing larger and larger inputs to be accommodated.

For example, imagine that we have a classical algorithm for integer factorization, such as the one used by factorint. Like all classical algorithms, it is possible to implement this algorithm using Boolean circuits — but to do it we'll need a separate circuit for each possible input length. If we looked at the resulting circuits for different input lengths, we would see that their sizes naturally grow as the input length grows — reflecting the fact that factoring 4-bit integers is much easier and requires far fewer elementary operations than factoring 1024-bit integers, for instance.

This leads us to represent the computational cost of an algorithm by a function $t$, defined so that $t(n)$ is the number of gates in the circuit that implements the algorithm for $n$-bit inputs. In more formal terms, an algorithm in the Boolean circuit model is described by a sequence of circuits $\{C_1, C_2, C_3, \ldots\}$, where $C_n$ solves whatever problem we're talking about for $n$-bit inputs (or, more generally, for inputs whose size is parameterized in some way by $n$), and the function $t$ representing the cost of this algorithm is defined as

$$t(n) = \operatorname{size}(C_n).$$

For quantum circuits the situation is similar, where larger and larger circuits are needed to accommodate longer and longer input strings.

Example: integer addition

To explain this further, let's take a moment to consider a much simpler problem than integer factoring, or even computing GCDs, which is the problem of addition.

Integer addition
Input: integers $N$ and $M$
Output: $N+M$

How might we design Boolean circuits for solving this problem?

Just to keep things simple, let's restrict our attention to the case where $N$ and $M$ are both nonnegative integers represented by $n$-bit registers using binary notation. We'll allow for any number of leading zeros in these encodings, so that $0\leq N,M\leq 2^n - 1$. The output will be an $(n+1)$-bit binary string representing the sum, which is the maximum number of bits we need to express the result.

We begin with an algorithm — the standard algorithm for addition of binary representations — which is the base $2$ analogue to the way addition is taught in elementary/primary schools around the world. This algorithm can be implemented with Boolean circuits as follows.

Starting from the least significant bits, we can compute their XOR to determine the least significant bit for the sum. Then we can compute the carry bit, which is the AND of the two least significant bits of $N$ and $M$. Sometimes these two operations together are known as a half adder. Using the XOR circuit we've now seen a few times together with an AND gate and two FANOUT gates, we can build a half adder with 10 gates. If for some reason we changed our minds and decided to include XOR gates in our set of elementary operations, we would need 1 AND gate, 1 XOR gate, and 2 FANOUT gates to build a half adder.

A half-adder

Moving on to the more significant bits we can use a similar procedure, but this time including the carry bit from each previous position into our calculation. By cascading 2 half adders and taking the OR of the carry bits they produce, we can create what's known as a full adder. (Fortunately both carry bits can't be $1$, so we don't need to worry about that case.) This requires 21 gates in total: 2 AND gates, 2 XOR gates (each requiring 7 gates to implement), one OR gate, and 4 FANOUT gates.

A full-adder

Finally, by cascading the full adders, we obtain a Boolean circuit for nonnegative integer addition having $21(n-1) + 10 = 21n - 11$ gates. Had we decided to include XOR gates in our set of elementary operations, we'd need $2n-1$ AND gates, $2n-1$ XOR gates, $n-1$ OR gates, and $4n-2$ FANOUT gates, for a total of $9n-5$ gates. If in addition we decide not to count FANOUT gates, it's $5n-3$ gates.
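For readers who prefer code to circuit diagrams, here is a brief bit-by-bit sketch of the same ripple-carry construction in Python, where the lists of bits stand in for the circuit's wires.

```python
def add_binary(a_bits, b_bits):
    """Add two n-bit numbers given as lists of bits (least significant first),
    returning the n+1 bits of their sum (least significant first)."""
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)            # sum bit: two cascaded half adders
        carry = (a & b) | ((a ^ b) & carry)  # OR of the two half adders' carry bits
    out.append(carry)
    return out

# Example: 11 + 6 = 17, i.e., 1011 + 0110 = 10001 in binary.
print(add_binary([1, 1, 0, 1], [0, 1, 1, 0]))  # [1, 0, 0, 0, 1]
```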

For example, the following circuit computes the sum of two 4-bit integers.

A circuit for addition of two 4-bit integers

Asymptotic notation

On the one hand, it is good to know precisely how many gates are needed to perform various computations, like in the example of integer addition above. These details are important for actually building the circuits.

On the other hand, if we perform analyses at this level of detail for all the computations we're interested in, including ones for tasks that are much more complicated than addition, we'll very quickly be buried in details. To keep things manageable, and to intentionally suppress details of secondary importance, we typically use Big-O notation when analyzing algorithms. Through this notation we can express the order at which functions grow.

Formally speaking, if we have two functions $g(n)$ and $h(n)$, we write that $g(n) = O(h(n))$ if there exists a positive real number $c > 0$ and a positive integer $n_0$ such that

$$g(n) \leq c\cdot h(n)$$

for all $n \geq n_0$. Typically $h(n)$ is chosen to be as simple an expression as possible, so that the notation can be used to reveal the limiting behavior of a function in simple terms. For example, $17 n^3 - 257 n^2 + 65537 = O(n^3)$.

This notation can be extended to functions having multiple arguments in a fairly straightforward way. For instance, if we have two functions $g(n,m)$ and $h(n,m)$ defined on positive integers $n$ and $m$, we write that $g(n,m) = O(h(n,m))$ if there exists a positive real number $c > 0$ and a positive integer $k_0$ such that

$$g(n,m) \leq c\cdot h(n,m)$$

whenever $n+m \geq k_0$.

Connecting this notation to the example of nonnegative integer addition, we conclude that there exists a family of Boolean circuits $\{C_1, C_2, \ldots\}$, where $C_n$ adds two $n$-bit nonnegative integers together, such that $\operatorname{size}(C_n) = O(n)$. This reveals the most essential feature of how the cost of addition scales with the input size: it scales linearly. Notice also that it doesn't depend on the specific detail of whether we consider XOR gates to have unit cost or cost $7$. In general, using Big-O notation allows us to make statements about computational costs that aren't sensitive to such low-level details.

More examples

Here are a few more examples of problems from computational number theory, beginning with multiplication.

Integer multiplication
Input: integers $N$ and $M$
Output: $NM$

Creating Boolean circuits for this problem is more difficult than creating circuits for addition — but by thinking about the standard multiplication algorithm, we can come up with circuits having size $O(n^2)$ for this problem (assuming $N$ and $M$ are both represented by $n$-bit binary representations). More generally, if $N$ has $n$ bits and $M$ has $m$ bits, there are Boolean circuits of size $O(nm)$ for multiplying $N$ and $M$.

There are, in fact, other ways to multiply that scale better. For instance, the Schönhage-Strassen multiplication algorithm can be used to create Boolean circuits for multiplying two $n$-bit integers having size $O(n \operatorname{lg}(n) \operatorname{lg}(\operatorname{lg}(n)))$. The intricacy of this method causes a lot of overhead, however, making it practical only for numbers having tens of thousands of bits.

Another basic problem is division, which we interpret to mean computing both the quotient and remainder given an integer divisor and dividend.

Integer division
Input: integers $N$ and $M \neq 0$
Output: integers $q$ and $r$ satisfying $0\leq r < |M|$ and $N = q M + r$

The cost of integer division is similar to multiplication: if $N$ has $n$ bits and $M$ has $m$ bits, there are Boolean circuits of size $O(nm)$ for solving this problem. Like multiplication, there are asymptotically superior methods.

We can now compare known algorithms for computing GCDs with those for addition and multiplication. Euclid's algorithm for computing the GCD of an $n$-bit number $N$ and an $m$-bit number $M$ requires Boolean circuits of size $O(nm)$, similar to the standard algorithms for multiplication and division. Also similar to multiplication and division, there are asymptotically faster GCD algorithms — including ones requiring $O(n(\operatorname{lg}(n))^2 \operatorname{lg}(\operatorname{lg}(n)))$ elementary operations to compute the GCD of two $n$-bit numbers.

A somewhat more expensive computation that arises in number theory is modular exponentiation.

Integer modular exponentiation
Input: integers $N$, $K$, and $M$ with $K\geq 0$ and $M\geq 1$
Output: $N^K \hspace{1mm} (\text{mod } M)$

By $N^K \hspace{1mm} (\text{mod } M)$ we mean the remainder when $N^K$ is divided by $M$ — meaning the unique integer $r$ satisfying $0\leq r < M$ and $N^K = q M + r$ for some integer $q$.

If $N$ has $n$ bits, $M$ has $m$ bits, and $K$ has $k$ bits, this problem can be solved by Boolean circuits having size $O(k m^2 + nm)$ — which is not at all obvious at first glance. The solution is not to first compute $N^K$ and then take the remainder — that would require exponentially many bits just to store the number $N^K$. Rather, we can use the power algorithm (known alternatively as the binary method and repeated squaring), where we use the binary representation of $K$ to perform the entire computation modulo $M$. Assuming $N$, $M$, and $K$ are all $n$-bit numbers, we obtain an $O(n^3)$ algorithm — or a cubic time algorithm. (Once again, there are asymptotically faster algorithms.)
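Here is a short sketch of the repeated squaring idea in Python; the built-in three-argument pow(N, K, M) performs the same computation.

```python
def mod_exp(N, K, M):
    """Compute N**K mod M by repeated squaring, using the bits of K."""
    result = 1 % M
    base = N % M
    while K > 0:
        if K & 1:                  # current bit of K is 1: multiply this power in
            result = (result * base) % M
        base = (base * base) % M   # square, reducing modulo M at every step
        K >>= 1
    return result

print(mod_exp(3, 1000, 101) == pow(3, 1000, 101))  # True
```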

Here's a code cell that computes a modular exponent for fairly large input numbers.


Output:

5420808908959774060288987536719770456217318912094898225713892936909049029205875268391016773496273016272919329382669533127141405381618319955871147811133071683113976741106560564348611362839209748910974112799894385464900617664468329271717655067495341858822704829649579551724483155264208314048742469387847020407987537873027379036515942355206425576104942394452390040081069041858525217985044712018028991394635458870225358476099232040714610377672598679720015462702833934580459151626243131178465271884695260294198503010478207039551370657556801980546328939683389782893101937977023160027020824612118350981172299997784862364914426784395705304625946981620876565903630321426713564601636615003526450559450988257054265576673783996389684225946503950767764911765912500478971120407712043832726706539999202935732936820400534970855338294653369081634048804115320256182211145544094816107629124112742249226319353536843143250675559791743355892993051078488285292927178023438402583054839832109503781471596500072959107707020065782555861325253891174468479828455070701800024212593144971400832140786063620665224339027746048978496589265602430286475229240500730395030244167373760773225724090156004049352594635867027935465962414764456373585290956999305794964617561548270565216080421238370652518997421499178722058617256564053290758153841574743341020230723665105568383923335524981262395314837292291351213579853233800664543959697570668968932545105391200894345122403901126855808339292892144618034647640594129638891809347433516954054255735374114229687467253661509404329081038467536976485314005457794642793240728907716748113582623047896927299525364803461784339899349011331842366119834985254767022797817541982842136038797346045524543446837609691961446346898034481960162369906547207851740703325595273724313509866798139791862738941412435088490790473992542655256187359052648995226579813780101763700632784177318998593960123977376065188642849714402071821332656099246595192273548124579308954369379166791375504366177963367167357734204250134523901012053728414505662083719483277149998297345600850207968594611610502058119754052458059616153460321784141539990056380808571117172677389938997159425776949065037133138197247161115127659585109470076784212846689843449874294528288829117435818727532583190741742686601090580534050289592016761135679603364904072305569448382741870366244397396582565453282641402195407023981530691751496898222633942198230270907098913253995651978611545066193950188718245653197837791596426329560941347753939650429611447245577078846931722372616367550006355620360276455568080512099035179175416756143233950934214736621950039902506644221443998435341460452476466171419972558395644689971465747907674896790822097660307109062931957694684033978412952567599593488687529011526359106169809555979667401866723277931201510177630302133847932158278990312682334549377652520626935835739547712983314320798573894229076437190704856364396327795151484278956526648597260084828697308427169704251546673832583609705456575281533722178029103192256765343504645112826902073960223407989624481672668703972408414665894403347181679160774217536153364422436840721557202572348021932511877708639899866042064925150997870115610130091923308716372491505600234638328560252998561255435872867079194898819838898311502430190295512215594677576275248220954675704199712880196887650126401810574270118295619216693001247689245264372564423355830421087279204264834276290790089692636984198176930764597788529620410821188522782016777910796700483449826

It's a little bit slower than computing GCDs, but it's still fast.

A different example, outside of the domain of computational number theory, arose in the previous lesson. For the post-processing step of Simon's algorithm, we need to compute the null space modulo $2$ of an $n\times m$ matrix of binary values (so here the input length is $nm$ bits in total). We can do this using Gaussian elimination with $O(n m \min\{n,m\})$ elementary operations, which is $O(n^3)$ elementary operations if $m=O(n)$. Even for a $1000\times 1000$ binary matrix, which is a million bits of input, the computation time is on the order of seconds.
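The lesson's cell (whose output appears below) uses an optimized finite-field array library for this computation. The following from-scratch sketch shows the underlying Gaussian elimination modulo 2, applied to a smaller random matrix so that the unoptimized Python loops stay quick.

```python
import numpy as np

def null_space_mod2(A):
    """Basis for the null space of a binary matrix A modulo 2, as rows of a matrix."""
    R = np.array(A, dtype=np.uint8) % 2
    n, m = R.shape
    pivot_cols, row = [], 0
    for col in range(m):
        pivot = next((r for r in range(row, n) if R[r, col]), None)
        if pivot is None:
            continue
        R[[row, pivot]] = R[[pivot, row]]   # move the pivot row into place
        for r in range(n):                  # eliminate this column from the other rows
            if r != row and R[r, col]:
                R[r] ^= R[row]
        pivot_cols.append(col)
        row += 1
    free_cols = [c for c in range(m) if c not in pivot_cols]
    basis = []
    for fc in free_cols:                    # one basis vector per free column
        v = np.zeros(m, dtype=np.uint8)
        v[fc] = 1
        for i, pc in enumerate(pivot_cols):
            v[pc] = R[i, fc]
        basis.append(v)
    return np.array(basis)

A = np.random.default_rng().integers(0, 2, size=(300, 300), dtype=np.uint8)
print(null_space_mod2(A).shape)
```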


Output:

GF([[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1,
     0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0,
     0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1,
     0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1,
     0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1,
     1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1,
     0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0,
     0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0,
     0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1,
     1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
     0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1,
     1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1,
     1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1,
     1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0,
     1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0,
     1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1,
     1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1,
     0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1,
     1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1,
     0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1,
     0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,
     0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0,
     0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0,
     1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0,
     1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1,
     0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0,
     1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0,
     0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0,
     1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1,
     0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0,
     1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1,
     1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0,
     0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1,
     1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1,
     0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0,
     1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0,
     0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1,
     0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0,
     0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0,
     1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1,
     0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1,
     0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1,
     1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0,
     1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1]], order=2)

Cost of integer factorization

In contrast to the algorithms just discussed, known algorithms for integer factorization are much more expensive — as we might expect from the examples in the first section of the lesson.

If we consider a simple trial division algorithm that searches through the list $2,\ldots,\sqrt{N}$ for prime factors of an input number $N$, then we have $\sqrt{N}$ iterations — which means $O(2^{n/2})$ iterations when $N$ is an $n$-bit number. Each iteration requires a trial division, which means $O(n^2)$ elementary operations for each iteration (using a standard algorithm for integer division). So we end up with circuits of size $O(n^2 2^{n/2})$, which is exponential in the input size $n$.
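A quick Python sketch of trial division shows where this cost comes from: in the worst case the outer loop below runs on the order of $\sqrt{N}$ times.

```python
def trial_division(N):
    """Factor N >= 2 by trial division, trying divisors up to sqrt(N)."""
    factors = {}
    d = 2
    while d * d <= N:              # roughly sqrt(N) iterations in the worst case
        while N % d == 0:          # each test costs one integer division
            factors[d] = factors.get(d, 0) + 1
            N //= d
        d += 1
    if N > 1:                      # whatever remains is prime
        factors[N] = factors.get(N, 0) + 1
    return factors

print(trial_division(12))  # {2: 2, 3: 1}
```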

There are algorithms for integer factorization having better scaling. The number field sieve mentioned earlier, for instance, which is an algorithm that makes use of randomness, is generally believed (but not rigorously proven) to require

$$2^{O\bigl(n^{1/3} \operatorname{lg}^{2/3}(n)\bigr)}$$

elementary operations to factor $n$-bit integers with high probability. While it is quite significant that $n$ is raised to the power $1/3$, the fact that $n^{1/3}$ appears in the exponent is still a problem that causes poor scaling — and explains in part why RSA1024 remains outside of its domain of applicability.

Polynomial versus exponential cost

As we have seen, classical algorithms for integer addition, multiplication, division, and computing greatest common divisors allow us to solve these problems in the blink of an eye for inputs with thousands of bits. Addition has linear cost while the other three problems have quadratic cost (or subquadratic cost using asymptotically fast algorithms). Modular exponentiation is more expensive but can still be done pretty efficiently, with cubic cost (or sub-cubic cost using asymptotically fast algorithms). Computing the null space modulo $2$ of an $n\times n$ binary matrix has cost $O(n^3)$ (bearing in mind that there are $n^2$ input bits for this problem).

All of these algorithms are examples having polynomial cost, meaning that they have cost $O(n^c)$ for some choice of a fixed constant $c > 0$. As a rough, first-order approximation, algorithms having polynomial cost are abstractly viewed as representing efficient algorithms.

In contrast, known classical algorithms for integer factoring have exponential cost. Sometimes the cost of the number field sieve is described as sub-exponential, but in complexity theory it is more typical to reserve this term for algorithms whose cost is

$$O\bigl(2^{n^{\varepsilon}}\bigr)$$

for every $\varepsilon > 0$.

The so-called NP-complete problems are a class of problems not known to (and widely conjectured not to) have polynomial-cost algorithms. A circuit-based formulation of the exponential-time hypothesis posits something even stronger, which is that no NP-complete problem can have a sub-exponential cost algorithm.

The association of polynomial-cost algorithms with efficient algorithms must be understood as being a loose abstraction. Of course, if an algorithm's cost scales as $n^{1000}$ or $n^{1000000}$ for inputs of size $n$, then it's a stretch to describe that algorithm as efficient.

However, even an algorithm having cost that scales as $n^{1000000}$ must be doing something clever to avoid having exponential cost, which is generally what we expect of algorithms based in some way on "brute force" or "exhaustive search." Even the sophisticated refinements of the number field sieve, for instance, fail to avoid this exponential scaling in cost. Polynomial-cost algorithms, on the other hand, manage to take advantage of the problem structure in some way that avoids an exponential scaling.

In practice, the identification of a polynomial cost algorithm for a problem is just a first step toward actual efficiency. Through algorithmic refinements, polynomial-cost algorithms with large exponents can sometimes be improved dramatically, lowering the cost to a more "reasonable" polynomial scaling. Sometimes things become easier when they're known to be possible — so the identification of a polynomial-cost algorithm for a problem can also have the effect of inspiring new, more efficient algorithms.

As we consider advantages of quantum computing over classical computing, our eyes are generally turned first toward exponential advantages, or at least super-polynomial advantages — ideally finding polynomial-cost quantum algorithms for problems not known to have polynomial-cost classical algorithms. Theoretical advantages on this order have the greatest chances to lead to actual practical advantages — but identifying such advantages is an extremely difficult challenge. Only a few examples are currently known, but the search continues.

Polynomial (but not super-polynomial) advantages in computational cost of quantum over classical are also interesting and should not be dismissed — but given the current gap between quantum and classical computing technology, they do seem rather less compelling at the present time. One day, though, they could become significant. Grover's algorithm, which is covered in Lesson 8, provides an example of a quadratic advantage of quantum over classical for so-called unstructured searching, which could potentially be broadly applicable.

A hidden cost of circuit computation

There is one final issue that is worth mentioning, although we will not concern ourselves with it further in this series. There's a "hidden" computational cost when we're working with circuits, and it concerns the specifications of the circuits themselves. As inputs get longer, we need larger and larger circuits — but somehow we're going to need to get our hands on the descriptions of these circuits if we're going to implement them.

For all of the examples we've discussed, or will discuss later in the series, there's some underlying algorithm from which the circuits are derived. Usually the circuits in a family follow some basic pattern that's easy to extrapolate to larger and larger inputs — such as cascading full adders to create Boolean circuits for addition or performing layers of Hadamard gates and other gates in some simple to describe pattern.

But what happens if there are prohibitive computational costs associated with the patterns of the circuits themselves? For instance, the description of each member $C_n$ in a circuit family could, in principle, be determined by some extremely difficult-to-compute function of $n$.

The answer is that this is a problem, and we must place additional restrictions on families of circuits beyond having polynomial cost in order for them to truly represent efficient algorithms. The property of uniformity for circuits does this by stipulating that it must (in various precise formulations) be computationally easy to obtain the description of each circuit in a family. All of the circuit families we'll discuss do indeed have this property, so we won't concern ourselves with it further — but this is an important issue to be aware of when studying circuit models of computation from a formal viewpoint.

Classical computations on quantum computers

We'll now turn our attention to implementing classical algorithms on quantum computers. We'll see that any computation that can be performed with a classical Boolean circuit can also be performed by a quantum circuit with asymptotically the same computational cost. Moreover, this can be done in a "clean" manner, which is an important requirement for using these computations as subroutines inside of larger quantum computations.

Simulating Boolean circuits with quantum circuits

Boolean circuits are composed of AND, OR, NOT, and FANOUT gates. To simulate a Boolean circuit with a quantum circuit, we'll begin by showing how each of these four gates can be simulated by quantum gates. Once that's done, converting a given Boolean circuit to a quantum circuit is a simple matter of simulating one gate at a time. We'll only need NOT gates, controlled-NOT gates, and Toffoli gates to do this, which are all deterministic operations in addition to being unitary.

Toffoli gates

Toffoli gates can alternatively be described as controlled-controlled-NOT gates, whose action on standard basis states is as the following figure suggests.

Toffoli gate

The matrix representation of this gate is as follows.

$$\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \end{pmatrix}$$

Another way to think about Toffoli gates is that they're essentially query gates for the AND function, in the sense that they follow exactly the same pattern we saw in the previous lesson for the unitary query gate implementation of arbitrary functions with binary string inputs and outputs.

Toffoli gates are not included in the default gate set discussed earlier in the lesson, but it is possible to construct a Toffoli gate from $H$, $T$, $T^{\dagger}$, and CNOT gates, as the following Qiskit code cell illustrates.
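The original cell isn't reproduced here, but the following sketch builds the standard decomposition of the Toffoli gate (controls on qubits 0 and 1, target on qubit 2) and checks that it matches the Toffoli unitary; the matrix shown below is what the lesson's cell displays.

```python
from qiskit import QuantumCircuit
from qiskit.circuit.library import CCXGate
from qiskit.quantum_info import Operator

qc = QuantumCircuit(3)
qc.h(2)
qc.cx(1, 2)
qc.tdg(2)
qc.cx(0, 2)
qc.t(2)
qc.cx(1, 2)
qc.tdg(2)
qc.cx(0, 2)
qc.t(1)
qc.t(2)
qc.h(2)
qc.cx(0, 1)
qc.t(0)
qc.tdg(1)
qc.cx(0, 1)

# Equivalent (up to global phase) to the built-in Toffoli gate.
print(Operator(qc).equiv(Operator(CCXGate())))  # True
```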


Output:

$$\begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \end{bmatrix}$$

Simulating Boolean gates with Toffoli, controlled-NOT, and NOT gates

A single Toffoli gate, used in conjunction with a few NOT gates, can implement an AND gate or an OR gate, as the following diagrams suggest.

Simulating AND and OR gates with Toffoli gates

FANOUT gates can easily be implemented using controlled-NOT gates like this:

Simulating FANOUT gates with CNOT gates

In all three cases, the bits that the AND, OR, and FANOUT gates act upon come in from the left as inputs, and in addition we require one workspace qubit initialized to the zero state. These workspace qubits appear within the dotted rectangles to illustrate that they're new, and one of the costs of implementing Boolean gates and circuits through the method being described is the inclusion of these new qubits.

For the AND and OR gates, in addition to the output bits we also have two additional qubits left over. For example, inside the dotted rectangle in the diagram for an AND gate, the top two qubits are left in the states $\vert a\rangle$ and $\vert b\rangle$. These bits appear inside the dotted rectangles to illustrate that they're no longer needed — they aren't output by the Boolean logic gates we're simulating, and we can simply ignore them (for now at least).

The remaining Boolean gate, the NOT gate, is also a gate in our default set of quantum gates, so we don't need to worry about simulating it in a special way.

Gate by gate simulation of Boolean circuits

Now suppose that we have an ordinary Boolean circuit named $C$, composed of AND, OR, NOT, and FANOUT gates, and having $n$ input bits and $m$ output bits. We'll let $t$ be the total number of gates in $C$, and we'll give the name $f$ to the function that $C$ computes — so this function takes the form $f:\Sigma^n\rightarrow\Sigma^m$ where $\Sigma = \{0,1\}$.

Now consider what happens when we go one at a time through the AND, OR, and FANOUT gates of $C$, replacing each one by the simulation described above, including the addition of a workspace qubit. As a result of performing these substitutions, we obtain a new circuit $R$ that operates as the following figure suggests, where the qubits are arranged so that the $n$ input bits of $C$ are on the top and the workspace qubits are on the bottom:

Reversible circuit simulation

Here, $k$ is the number of workspace qubits that need to be added, and $g$ is a function of the form $g:\Sigma^n \rightarrow \Sigma^{n+k-m}$ that describes the states of the leftover qubits after $R$ is executed.

In the figure, the qubits corresponding to the output $f(x)$ are on the top and the remaining, leftover qubits storing $g(x)$ are on the bottom. We can force this to happen if we wish by rearranging the qubits using swap gates. A swap gate can be implemented with three controlled-NOT gates like this:

Swapping with cNOT gates

As we'll see in the next section, it isn't essential to rearrange the output qubits like this, but it's easy enough to do it if we choose.

The function $g$ that describes the classical states of the leftover qubits is determined somehow by the circuit $C$, but we don't need to worry all that much about it, because we don't care specifically what state these qubits are in when the computation finishes. The letter $g$ comes after $f$, so it is a reasonable name for this function on that account, but there's a better reason to pick the name $g$; it's short for garbage — or stuff left over from the simulation that's cluttering up these qubits.

Cleaning up the garbage

If our only interest is in computing whatever function ff is computed by a given Boolean circuit CC with a quantum circuit, we don't need to proceed any further than the gate-by-gate simulation just described — and we can accept that in addition to the answer we'll have a bunch of garbage left over.

This, however, won't be good enough if we want to perform classical computations as subroutines within larger quantum computations, because those garbage qubits will cause problems. As we saw in the previous lesson, the phenomenon of interference is critically important to quantum algorithms, and garbage qubits can ruin the interference patterns needed to make quantum algorithms work.

Fortunately, it's not too difficult to clean up the garbage, so to speak. The key is to use the fact that because $R$ is a quantum circuit, we can run it in reverse — by simply replacing each gate with its inverse and applying them in the reverse order — thereby obtaining a quantum circuit for the operation $R^{\dagger}$. (Toffoli gates, CNOT gates, and NOT gates are their own inverses, so in this case running $R$ in reverse is just a matter of applying the gates in the reverse order — but more generally any quantum circuit can be reversed as just described.)
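In Qiskit terms, this reversal is what a circuit's inverse method produces; a small sketch with an arbitrary example circuit follows.

```python
from qiskit import QuantumCircuit
from qiskit.quantum_info import Operator

R = QuantumCircuit(2)
R.h(0)
R.t(0)
R.cx(0, 1)

# inverse() replaces each gate by its inverse and reverses the order.
R_dagger = R.inverse()
print(Operator(R_dagger).equiv(Operator(R).adjoint()))  # True
```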

So, what we can do is to add $m$ more qubits (recalling that $m$ is the number of output bits of the function $f$), use CNOT gates to copy the output of $R$ onto these qubits, and then reverse $R$ to clean up the garbage. The following figure illustrates the resulting circuit and describes its action on standard basis states.

Garbage-free computation

If we put a box around the entire circuit and call it $Q$, it looks like this:

Simulation as a query gate

If $C$ has $t$ gates, then $Q$ has $O(t)$ gates.

If we disregard the $k$ additional workspace qubits, what we have is a circuit $Q$ that functions exactly like a query gate for the function $f$. If we simply want to compute the function $f$ on some string $x$, we can set $y = 0^m$ and the resulting value $f(x)$ will appear on the bottom $m$ qubits — or we can feed in a different state to the bottom $m$ qubits if we wish (perhaps to make use of a phase kickback, like in Deutsch's or the Deutsch-Jozsa algorithm). This means that for any query algorithm, if we have a Boolean circuit that computes the input function, we can replace each query gate with a unitary implementation as just described and the query algorithm will function correctly.

Note that the workspace qubits are needed to make this process work — but they are, in fact, returned to their initial states once the combined circuit is executed, so they could be used again as workspace qubits for other purposes. There are strategies that reduce the number of workspace qubits required, but we won't discuss those strategies here.

Implementing invertible functions

The construction just described allows us to simulate any Boolean circuit with a quantum circuit in a garbage-free manner, following this basic pattern: if $C$ is a Boolean circuit implementing a function $f:\Sigma^n \rightarrow \Sigma^m$, then we obtain a quantum circuit $Q$ that operates as the next equation describes on standard basis states.

$$Q \bigl( \vert y \rangle \vert 0^k \rangle \vert x\rangle\bigr) = \vert y \oplus f(x) \rangle \vert 0^k \rangle \vert x\rangle$$

The number $k$ indicates how many workspace qubits are required, and depends on how many gates $C$ has and which ones they are.

This will be enough for the purposes of this series, but it is possible to take this one step further when the function $f$ itself is invertible. To be precise, suppose that the function $f$ takes the form $f:\Sigma^n \rightarrow \Sigma^n$, and suppose furthermore there exists a function $f^{-1}$ such that $f^{-1}(f(x)) = x$ for every $x\in\Sigma^n$. (The function $f^{-1}$ is necessarily unique when it exists.)

This means that the operation that transforms $\vert x \rangle$ into $\vert f(x) \rangle$ for every $x\in\Sigma^n$ is unitary — so we might hope to build a quantum circuit that implements the unitary operation defined by

$$U \vert x \rangle = \vert f(x) \rangle$$

for every $x\in\Sigma^n$. To be clear, the fact that this is a unitary operation relies on $f$ being invertible — it's not unitary when $f$ isn't invertible. Disregarding the workspace qubits, $U$ is different from the operation that the circuit $Q$ implements, because we're not keeping a copy of the input around and XORing it to an arbitrary string; we're replacing $x$ by $f(x)$.

So the question is: when $f$ is invertible, can we do this? The answer is yes — provided that in addition to having a Boolean circuit that computes $f$, we also have one that computes $f^{-1}$. (So this won't be a shortcut to inverting functions computationally if we don't already know how to do that.) The following diagram illustrates how it can be done by composing two quantum circuits, $Q_f$ and $Q_{f^{-1}}$, which are obtained individually for the functions $f$ and $f^{-1}$ through the method described above.

Simulation of an invertible function
