# Quantum query algorithms

Download the slides for this lesson.

## Introduction

This course investigates *computational* advantages offered by quantum information. That is, we will think about what we can do with *quantum computers*, and the advantages they can offer over ordinary classical computers. To be specific, our focus will be on what we can do with a *single* quantum computer — as opposed to a distributed setting where multiple quantum computers interact over a network, for instance. (There are, in fact, quantum advantages to be found in distributed settings, where communication and cryptography come into play, but this topic is outside of the scope of this unit.)

We will begin with a natural question: What are the advantages that a quantum computer might potentially offer?

The first potential advantage, which is paramount among all others, is that quantum computers might provide *faster* solutions to some computational problems. Time is a truly precious resource — and it is this potential, that quantum computers may allow us to solve certain computational problems that classical computers are too slow to solve, that has driven quantum computing research for the past few decades.

There are other computational resources besides time that can be considered. The amount *computer memory* required to perform computations — usually referred to as the *space* required for computations — is one alternative that is often studied. As it turns out, however, quantum computers have a limited potential to offer advantages in space usage over classical computers. Computer memory is also relatively inexpensive and, unlike time, can be reused. For these reasons, time is of greater concern, and will be our primary focus.

One thing that quantum computers cannot do is to provide computational solutions to problems that classical computers cannot solve — irrespective of the resources required — such as the famous *halting problem* formulated by Alan Turing in the 1930s. Quantum computers can be *simulated* by classical computers, so any computational problem that can be solved by a quantum computer can also be solved by a classical computer, though it might take the classical computer much, much longer to find a solution.

While the time required to solve problems is our main concern, we will deviate slightly from this focus for the purposes of this first lesson. What we will do is to formulate a simple algorithmic framework — known as the *query model* — and explore the advantages that quantum computers offer within this framework.

The query model of computation is like a petri dish for quantum algorithmic ideas. It is rigid and unnatural, in the sense that it does not accurately represent the sorts of computational problems we generally care about in practice. Nevertheless, it has proved to be incredibly useful as a tool for developing quantum algorithmic techniques, including ones that power the most well-known quantum algorithms (such as Shor's factoring algorithm). It also happens to be a very useful framework for *explaining* these techniques.

After introducing the query model, we will discuss the very first quantum algorithm discovered, which is *Deutsch's algorithm*, along with an extension of Deutsch's algorithm known as the *Deutsch-Jozsa algorithm*. These algorithms demonstrate quantifiable advantages of quantum over classical computers, and in fact the Deutsch-Jozsa algorithm can be used to solve multiple computational problems in the query model framework. We will then discuss a related quantum algorithm — known as *Simon's algorithm* — which, for reasons that will be explained, offers a more robust and satisfying advantage of quantum over classical computations.

## Pre-course Survey

Before we begin, please take a moment to complete our pre-course survey, which is important to help improve our content offerings and user experience.

## The query model of computation

When we model computations in mathematical terms, we typically have in mind the sort of process represented by the following figure, where information is provided as input, a computation takes place, and output is produced.

It is true that the computers we use today continuously receive input and produce output, essentially interacting with both us and with other computers in a way not reflected by the figure. Here, however, the intention is not to represent the ongoing operation of computers, but rather to create a simple abstraction of computation, focusing on isolated computational tasks.

For example, the input might encode a number, a vector, a matrix, a graph, a description of a molecule, or something more complicated, while the output encodes a solution to the computational task we have in mind. The key point is that the input is provided to the computation, usually in the form of a binary string, with no part of it being hidden.

### High-level description

In the *query model* of computation, on the other hand, the entire input is not provided to the computation. Rather, the input is made available in the form of a *function*, which the computation accesses by making *queries*. Alternatively, we may view that computations in the query model have to bits (or segments of bits) of the input.

We often refer to the input as being provided by an *oracle* or *black box* in the context of the query model. Both terms suggest that a complete description of the input is hidden from the computation, with the only way to access it being to ask questions. It is as if we're consulting the Oracle at Delphi about the input: she won't tell us everything she knows, she only answers specific questions. The term *black box* makes sense especially when we think about the input as being represented by a function: we cannot look inside the function and understand how it works, we can only evaluate it on arguments we select.

We're going to be working exclusively with binary strings throughout the lesson, so let's write $\Sigma = \{0,1\}$ to refer to the binary alphabet throughout for the sake of brevity. We'll be thinking about different computational problems, with some simple examples described shortly, but for all of them the input will be represented by a function taking the form

$f:\Sigma^n \rightarrow \Sigma^m$for two positive integers $n$ and $m.$ Naturally, we could choose a different name in place of $f,$ but we'll stick with $f$ throughout the lesson.

To say that a computation makes a *query* means that some string $x \in \Sigma^n$ is selected and then the string $f(x)\in\Sigma^m$ is made available. The precise way that this works for quantum algorithms will be discussed shortly — we need to make sure that this is possible to do with a unitary quantum operation allowing queries to be made in superposition — but for now we can think about it intuitively at a high level.

Finally, the way that we will measure efficiency of query algorithms is simple: we'll count the number of queries to the input they require. This is related to the time required to perform a computation, but it's not exactly the same because we're ignoring the time required for operations other than the queries, and we're treating the queries as if they each have unit cost. (We can also take the operations besides the queries into account, and that is sometimes done — but restricting our attention just to the number of queries helps to keep things simple.)

### Examples of query problems

Here are a few simple examples of query problems.

**OR.**The input function takes the form $f:\Sigma^n \rightarrow \Sigma$ (so $m=1$ for this problem). The task is to output $1$ if there exists a string $x\in\Sigma^n$ for which $f(x) = 1,$ and to output $0$ if there is no such string. If we think about the function $f$ as representing a sequence of $2^n$ bits to which we have random access, the problem is to compute the OR of these bits.**Parity.**The input function again takes the form $f:\Sigma^n \rightarrow \Sigma.$ The task is to determine whether the number of strings $x\in\Sigma^n$ for which $f(x) = 1$ is*even*or*odd*. To be precise, the required output is $0$ if the set $\{x\in\Sigma^n : f(x) = 1\}$ has an even number of elements and $1$ if it has an odd number of elements. If we think about the function $f$ as representing a sequence of $2^n$ bits to which we have random access, the problem is to compute the parity (or XOR) of these bits.**Minimum.**The input function takes the form $f:\Sigma^n \rightarrow \Sigma^m$ for any choices of positive integers $n$ and $m.$ The required output is the string $y \in \{f(x) : x\in\Sigma^n\}$ that comes first in the lexicographic (i.e., dictionary) ordering of $\Sigma^m.$ If we think about the function $f$ as representing a sequence of $2^n$ integers encoded as strings of length $m$ in binary notation to which we have random access, the problem is to compute the minimum of these integers.

Sometimes we also consider query problems where we have a *promise* on the input. What this means is that we're given some sort of guarantee on the input, and we're not responsible for what happens when this guarantee is not met. Another way to describe this type of problem is that some input functions (the ones for which the promise is not satisfied) are considered as "don't care" inputs. No requirements at all are placed on algorithms when they're given "don't care" inputs.

Here's one example of a problem with a promise:

**Unique search.**The input function takes the form $f:\Sigma^n \rightarrow \Sigma,$ and we are*promised*that there is exactly one string $z\in\Sigma^n$ for which $f(z) = 1,$ with $f(x) = 0$ for all strings $x\neq z.$ The task is to find this unique string $z.$

All four of the examples just described are natural in the sense that they're easy to describe and we can imagine a variety of situations or contexts in which they might arise — but some query problems of interest aren't like this at all. In the study of the query model we sometimes come up with very complicated and highly contrived problems where it's difficult to imagine that anyone would ever actually want to solve them in practice. This doesn't mean the problems aren't interesting, it's just part of the study of the model to look for extremes that reveal potential advantages of quantum computing. Sometimes things that seems contrived or unnatural can provide unexpected clues or inspire new ideas. Shor's quantum algorithm for factoring, which was directly inspired by Simon's algorithm, is a great example.

### Query gates

When we're working with circuit models of computation, queries are made by special gates called *query gates*. The simplest way to do this for classical Boolean circuits is to define query gates that compute the input function $f$ directly, as the following figure suggests.

When we create a Boolean circuit for a query problem, the input function $f$ is accessed through these gates, and the number of queries that the circuit makes is simply the number of query gates that appear in the circuit. In this case the wires of the circuit are initialized to fixed values, which should be considered as part of the algorithm rather than as part of the input.

For example, here is a Boolean circuit with classical query gates that solves the parity problem described above for a function of the form $f:\Sigma\rightarrow\Sigma$:

This algorithm makes two queries because there are two query gates. The way it works is that the function $f$ is queried on the two possible inputs, $0$ and $1,$ and the results are plugged into the Boolean circuit from Lesson 3 that computes the XOR.

For quantum circuits, this definition of query gates doesn't work very well because we'll get non-unitary gates for some functions $f,$ and we won't be able to apply them to quantum states. What we do instead is to define *unitary query gates* that operate as this figure suggests on standard basis states:

Here our assumption is that $x\in\Sigma^n$ and $y\in\Sigma^m$ are arbitrary strings. The notation $y\oplus f(x)$ refers to the *bitwise exclusive OR* of strings (which have length $m$ in this case). For example, $001 \oplus 101 = 100.$

Intuitively speaking, what the gate $U_f$ does (for any chosen function $f$) is to echo the top input string $x$ and XOR the function value $f(x)$ onto the bottom input string $y.$ This is a unitary operation for every function $f.$ To be more precise, as a matrix $U_f$ is always a *permutation matrix*, meaning a matrix with a single $1$ in each row and each column, with all other entries being $0.$ Applying a permutation matrix to a vector simply shuffles the entries of the vector (hence the term *permutation matrix*), and therefore does not change that vector's Euclidean norm — revealing that permutation matrices are always unitary.

Notice that when we analyze query algorithms by simply counting the number of queries that a query algorithm makes, we're completely ignoring the difficulty of physically constructing the query gates (for both the classical and quantum versions just described). That might seem unreasonable — but we must keep in mind that we're not trying to describe practical computing or fully account for the resources required. Rather, we're just defining a theoretical model that helps to shed light on the potential advantages of quantum computing. We will have more to say about this point in the lesson following this one when we turn our attention to a more standard model of computation where inputs are given explicitly to circuits as binary strings.

## Deutsch's algorithm

Deutsch's algorithm solves the parity problem (described above) for the special case that $n = 1.$ In the context of quantum computing this problem is sometimes referred to as *Deutsch's problem*, and we will follow that nomenclature in this lesson — but really it's just the simplest possible nontrivial version of the parity problem.

To be precise, the input is represented by a function $f:\Sigma \rightarrow \Sigma$ from one bit to one bit. As we observed in Lesson 1, there are 4 such functions:

$\rule[-10mm]{0mm}{15mm} \begin{array}{c|c} a & f_1(a)\\ \hline 0 & 0\\ 1 & 0 \end{array} \qquad \begin{array}{c|c} a & f_2(a)\\ \hline 0 & 0\\ 1 & 1 \end{array} \qquad \begin{array}{c|c} a & f_3(a)\\ \hline 0 & 1\\ 1 & 0 \end{array} \qquad \begin{array}{c|c} a & f_4(a)\\ \hline 0 & 1\\ 1 & 1 \end{array}$The first and last of these functions are *constant* and the middle two are *balanced*, meaning that the two possible output values for the function occur the same number of times as we range over the inputs. Deutsch's problem is to determine which of these two categories the input function belongs to: constant or balanced.

Deutsch's problem

Input: a function $f:\{0,1\}\rightarrow\{0,1\}$

Output: $0$ if $f$ is constant, $1$ if $f$ is balanced

If we view the input function $f$ in Deutsch's problem as representing random access to a string, we're thinking about a two-bit string: $f(0)f(1).$

$\begin{array}{cc} \mathsf{function} & \mathsf{string}\\ \hline f_1 & 00 \\ f_2 & 01 \\ f_3 & 10 \\ f_4 & 11 \end{array}$When viewed in this way, Deutsch's problem is to compute the parity (or, equivalently, the exclusive-OR) of the two bits.

Any classical query algorithm for this problem must query both bits, $f(0)$ and $f(1).$ If we learn that $f(1) = 1,$ for instance, the answer could still be $0$ (in case $f(0) = 1$) or $1$ (in case $f(0) = 0$). Every other case is similar: knowing just one of two bits doesn't provide any information at all about their parity. So, the Boolean circuit described in the previous section is the best we can do in terms of the number of queries required to solve this problem.

### Quantum circuit description

Deutsch's algorithm solves Deutsch's problem using a single query, therefore providing a quantifiable advantage of quantum over classical computations. This may be a modest advantage — one query as opposed to two — but we have to start somewhere. Scientific advancements often have seemingly humble origins.

Here is a quantum circuit that describes Deutsch's algorithm:

### Analysis

To analyze Deutsch's algorithm, we will trace through the action of the circuit above and identify the states of the qubits at the times suggested by this figure:

The initial state is $\vert 1\rangle \vert 0 \rangle,$ and the two Hadamard operations on the left-hand side of the circuit transform this state to

$\vert \pi_1 \rangle = \vert - \rangle \vert + \rangle = \frac{1}{2} \bigl( \vert 0\rangle - \vert 1\rangle \bigr) \vert 0\rangle + \frac{1}{2} \bigl( \vert 0\rangle - \vert 1\rangle \bigr) \vert 1\rangle.$Notice here that we've only partially expanded out the expression of this state (by expanding $\vert + \rangle$ but not $\vert - \rangle$). There is nothing *a priori* that tells us that we should express the state in this way, but it turns out to be convenient for the analysis.

Next, the $U_f$ gate is performed. According to the definition of the $U_f$ gate, the value of the function $f$ for the classical state of the top/rightmost qubit is XORed onto the bottom/leftmost qubit, which transforms $\vert \pi_1\rangle$ into the state

$\vert \pi_2 \rangle = \frac{1}{2} \bigl( \vert 0 \oplus f(0) \rangle - \vert 1 \oplus f(0) \rangle \bigr) \vert 0 \rangle + \frac{1}{2} \bigl( \vert 0 \oplus f(1) \rangle - \vert 1 \oplus f(1) \rangle \bigr) \vert 1 \rangle.$We can simplify this expression by observing that the formula

$\vert 0 \oplus a\rangle - \vert 1 \oplus a\rangle = (-1)^a \bigl( \vert 0\rangle - \vert 1\rangle \bigr)$works for both possible values $a\in\Sigma.$ More explicitly, the two cases are as follows.

$\begin{aligned} \vert 0 \oplus 0\rangle - \vert 1 \oplus 0\rangle & = \vert 0 \rangle - \vert 1 \rangle = (-1)^0 \bigl( \vert 0\rangle - \vert 1\rangle \bigr)\\ \vert 0 \oplus 1\rangle - \vert 1 \oplus 1\rangle & = \vert 1 \rangle - \vert 0\rangle = (-1)^1 \bigl( \vert 0\rangle - \vert 1\rangle \bigr) \end{aligned}$Thus, we can alternatively express $\vert\pi_2\rangle$ like this:

$\begin{aligned} \vert\pi_2\rangle & = \frac{1}{2} (-1)^{f(0)} \bigl( \vert 0 \rangle - \vert 1 \rangle \bigr) \vert 0 \rangle + \frac{1}{2} (-1)^{f(1)} \bigl( \vert 0 \rangle - \vert 1 \rangle \bigr) \vert 1 \rangle \\ & = \vert - \rangle \biggl( \frac{(-1)^{f(0)} \vert 0\rangle + (-1)^{f(1)} \vert 1\rangle}{\sqrt{2}}\biggr). \end{aligned}$Something interesting has happened here! Although the action of the $U_f$ gate on standard basis states leaves the top/rightmost qubit alone and XORs the function value onto the bottom/leftmost qubit, here we see that the state of the top/rightmost qubit has changed (in general) while the state of the bottom/leftmost qubit remains the same — specifically being in the $\vert - \rangle$ state before and after the $U_f$ gate is performed. This phenomenon is known as the *phase kickback*, and we will have more to say about it shortly.

With one final simplification, which is to pull the factor of $(-1)^{f(0)}$ outside of the sum, we obtain this expression of the state $\vert\pi_2\rangle$:

$\begin{aligned} \vert\pi_2\rangle & = (-1)^{f(0)} \vert - \rangle \biggl( \frac{\vert 0\rangle + (-1)^{f(0) \oplus f(1)} \vert 1\rangle}{\sqrt{2}}\biggr) \\ & = \begin{cases} (-1)^{f(0)} \vert - \rangle \vert + \rangle & \text{if $f(0) \oplus f(1) = 0$}\\[1mm] (-1)^{f(0)} \vert - \rangle \vert - \rangle & \text{if $f(0) \oplus f(1) = 1$}. \end{cases} \end{aligned}$Notice that in this expression, we have $f(0) \oplus f(1)$ in the exponent of $-1$ rather than $f(1) - f(0),$ but we obtain the same value after exponentiating either way. This is because the value $(-1)^k$ for any integer $k$ depends only on whether $k$ is even or odd.

Applying the final Hadamard gate to the top qubit leaves us with the state

$\vert \pi_3 \rangle = \begin{cases} (-1)^{f(0)} \vert - \rangle \vert 0 \rangle & \text{if $f(0) \oplus f(1) = 0$}\\[1mm] (-1)^{f(0)} \vert - \rangle \vert 1 \rangle & \text{if $f(0) \oplus f(1) = 1$}, \end{cases}$which leads to the correct outcome with probability $1$ when the right/topmost qubit is measured.

### Further remarks on the phase kickback

Before moving on, let's look at the analysis above from a slightly different angle that sheds some light on the phase kickback phenomenon.

First, notice the following formula works for all choices of bits $b,c\in\Sigma.$

$\vert b \oplus c\rangle = X^c \vert b \rangle$This can be verified by checking it for the two possible values $c = 0$ and $c = 1$:

$\begin{aligned} \vert b \oplus 0 \rangle & = \vert b\rangle = \mathbb{1} \vert b \rangle = X^0 \vert b \rangle\\ \vert b \oplus 1 \rangle & = \vert \neg b\rangle = X \vert b \rangle = X^1 \vert b \rangle. \end{aligned}$Using this formula, we see that

$U_f \bigl(\vert b\rangle \vert a \rangle\bigr) = \vert b \oplus f(a) \rangle \vert a \rangle = \bigl(X^{f(a)}\vert b \rangle\bigr) \vert a \rangle$for every choice of bits $a,b\in\Sigma.$ Because this formula is true for $b=0$ and $b=1,$ we see by linearity that

$U_f \bigl( \vert \psi \rangle \vert a \rangle \bigr) = \bigl(X^{f(a)}\vert \psi \rangle\bigr) \vert a \rangle$for all qubit state vectors $\vert \psi\rangle,$ and therefore

$U_f \bigl( \vert - \rangle \vert a \rangle \bigr) = \bigl(X^{f(a)} \vert - \rangle \bigr) \vert a \rangle = (-1)^{f(a)} \vert - \rangle \vert a \rangle.$The key that makes this work is that $X\vert - \rangle = - \vert - \rangle.$ In mathematical terms, the vector $\vert - \rangle$ is an *eigenvector* of the matrix $X$ having *eigenvalue* $-1.$ (We'll discuss eigenvectors and eigenvalues in greater detail in Lesson 7, where the phase kickback phenomenon is generalized to other unitary operations.)

Keeping in mind that scalars float freely through tensor products, we find an alternative way of reasoning how the operation $U_f$ transforms $\vert \pi_1\rangle$ into $\vert \pi_2\rangle$ in the analysis above:

$\begin{aligned} \vert \pi_2 \rangle & = U_f \bigl( \vert - \rangle \vert + \rangle \bigr)\\ & = \frac{1}{\sqrt{2}} U_f \bigl(\vert - \rangle \vert 0\rangle \bigr) + \frac{1}{\sqrt{2}} U_f \bigl(\vert - \rangle \vert 1\rangle \bigr)\\ & = \vert - \rangle \biggl( \frac{(-1)^{f(0)} \vert 0\rangle + (-1)^{f(1)} \vert 1\rangle}{\sqrt{2}}\biggr). \end{aligned}$### Implementation

First we'll define a quantum circuit that implements a query gate for one of the four functions $f_1,$ $f_2,$ $f_3,$ or $f_4$ from one bit to one bit described above (and in Lesson 1). As was discussed previously, the implementation of query gates is not really a part of Deutsch's algorithm itself — here we're essentially just showing one way to prepare the input (in the form of a circuit implementation of a query gate).

No output produced

We can see what each circuit looks like using the

Output:

Next we will create the actual quantum circuit for Deutsch's algorithm, substituting the query gate with a quantum circuit implementation given as an argument. Shortly we'll plug in one of the four circuits defined by the function

No output produced

Again we can see what the circuit looks like using the

Output:

Finally, we'll create a function that runs the circuit just defined one time and outputs the appropriate result: "constant" or "balanced."

No output produced

The following code cell runs Deutsch's algorithm on any one of the four functions defined above.

Output:

```
'balanced'
```

## The Deutsch-Jozsa algorithm

Deutsch's algorithm provides an example of a quantum algorithm for a query problem that outperforms classical algorithms, but the advantage is quite modest: one query versus two. The Deutsch-Josza algorithm extends this advantage, and it can in fact be used to solve a couple of different query problems.

Here is a quantum circuit description of the Deutsch-Jozsa algorithm. (There may also be a classical post-processing step depending on the specific problem being solved.)

Of course, we haven't actually discussed what problems this algorithm solves; that is done in the subsections that follow.

### The Deutsch-Jozsa problem

We will begin with the query problem for which the Deutsch-Josza algorithm was originally intended, which is known as the *Deutsch-Jozsa problem*.

For this problem, the input function takes the form $f:\Sigma^n \rightarrow \Sigma$ for an arbitrary positive integer $n.$ Like Deutsch's problem, the task is to output $0$ if $f$ is constant and $1$ if $f$ is balanced (which again means that the number of input strings on which the function takes the value $0$ is equal to the number of input strings on which the function takes the value $1$).

Notice that when $n$ is larger than $1,$ there are functions of the form $f:\Sigma^n \rightarrow \Sigma$ that are neither constant nor balanced. If $n = 2,$ for instance, the function defined as

$\begin{aligned} f(00) & = 0 \\ f(01) & = 0 \\ f(10) & = 0 \\ f(11) & = 1 \end{aligned}$falls into neither of these two categories. For the Deutsch-Jozsa problem, we simply don't worry about functions like this — they are considered to be "don't care" inputs. That is, for this problem we have a *promise* that $f$ is either constant or balanced.

Deutsch-Jozsa problem

Input: a function $f:\{0,1\}^n\rightarrow\{0,1\}$

Promise: $f$ is either constant or balanced

Output: $0$ if $f$ is constant, $1$ if $f$ is balanced

The Deutsch-Jozsa algorithm, with its single query, solves this problem in the following sense: if every one of the $n$ measurement outcomes is $0,$ then the function $f$ is constant; and otherwise, if at least one of the measurement outcomes is $1,$ then the function $f$ is balanced. Another way to say this is that there is a classical post-processing step, which is to compute the OR of the measurement outcomes.

#### Algorithm analysis

To analyze the performance of the Deutsch-Jozsa algorithm for the Deutsch-Jozsa problem, it's helpful to begin by thinking about the action of a single layer of Hadamard gates.

A Hadamard operation can be expressed as a matrix in the usual way,

$H = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\[2mm] \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix},$but we can also express this operation in terms of its action on standard basis states:

$\begin{aligned} H \vert 0\rangle & = \frac{1}{\sqrt{2}} \vert 0 \rangle + \frac{1}{\sqrt{2}} \vert 1 \rangle\\[1mm] H \vert 1\rangle & = \frac{1}{\sqrt{2}} \vert 0 \rangle - \frac{1}{\sqrt{2}} \vert 1 \rangle. \end{aligned}$These two equations can be combined into a single formula,

$H \vert a \rangle = \frac{1}{\sqrt{2}} \vert 0 \rangle + \frac{1}{\sqrt{2}} (-1)^a \vert 1 \rangle = \frac{1}{\sqrt{2}} \sum_{b\in\{0,1\}} (-1)^{ab} \vert b\rangle,$which is true for both choices of $a\in\Sigma.$

Now suppose that instead of just a single qubit we have $n$ qubits, and we perform a Hadamard operation on each. The combined operation on the $n$ qubits is described by the tensor product $H\otimes \cdots \otimes H$ ($n$ times), which we write as $H^{\otimes n}$ for succinctness and clarity. Using the formula from above, followed by expanding and then simplifying, we can express the action of this combined operation on the classical states of $n$ qubits like this:

$\begin{aligned} & H^{\otimes n} \vert x_{n-1} \cdots x_1 x_0 \rangle \\ & \qquad = \bigl(H \vert x_{n-1} \rangle \bigr) \otimes \cdots \otimes \bigl(H \vert x_{0} \rangle \bigr) \\ & \qquad = \Biggl( \frac{1}{\sqrt{2}} \sum_{y_{n-1}\in\Sigma} (-1)^{x_{n-1} y_{n-1}} \vert y_{n-1} \rangle \Biggr) \otimes \cdots \otimes \Biggl( \frac{1}{\sqrt{2}} \sum_{y_{0}\in\Sigma} (-1)^{x_{0} y_{0}} \vert y_{0} \rangle \Biggr) \\ & \qquad = \frac{1}{\sqrt{2^n}} \sum_{y_{n-1}\cdots y_0 \in \Sigma^n} (-1)^{x_{n-1}y_{n-1} + \cdots + x_0 y_0} \vert y_{n-1} \cdots y_0 \rangle \end{aligned}$Here, by the way, we're writing binary strings of length $n$ as $x_{n-1}\cdots x_0$ and $y_{n-1}\cdots y_0,$ following the convention for numbering the individual bits used in Qiskit.

This formula provides us with a useful tool for analyzing the quantum circuit above. After the first layer of Hadamard gates is performed, the state of the $n+1$ qubits (including the leftmost/bottom qubit, which is treated separately from the rest) is

$\bigl( H \vert 1 \rangle \bigr) \bigl( H^{\otimes n} \vert 0 \cdots 0 \rangle \bigr) = \vert - \rangle \otimes \frac{1}{\sqrt{2^n}} \sum_{x_{n-1}\cdots x_0 \in \Sigma^n} \vert x_{n-1} \cdots x_0 \rangle.$When the $U_f$ operation is performed, this state is transformed into

$\vert - \rangle \otimes \frac{1}{\sqrt{2^n}} \sum_{x_{n-1}\cdots x_0 \in \Sigma^n} (-1)^{f(x_{n-1}\cdots x_0)} \vert x_{n-1} \cdots x_0 \rangle$through exactly the same phase kick-back phenomenon that we saw in the analysis of Deutsch's algorithm.

Then the second layer of Hadamard gates is performed, which (by the same formula as above) transforms this state to

$\vert - \rangle \otimes \frac{1}{2^n} \sum_{x_{n-1}\cdots x_0 \in \Sigma^n} \sum_{y_{n-1}\cdots y_0 \in \Sigma^n} (-1)^{f(x_{n-1}\cdots x_0) + x_{n-1}y_{n-1} + \cdots + x_0 y_0} \vert y_{n-1} \cdots y_0 \rangle.$