
General measurements


Introduction

Measurements provide an interface between quantum and classical information. When a measurement is performed on a system in a quantum state, classical information is extracted, revealing something about that quantum state — and generally changing or destroying it in the process. In the simplified formulation of quantum information (as presented in the Basics of quantum information course), we typically limit our attention to projective measurements, including the simplest type of measurement: standard basis measurements. The concept of a measurement can, however, be generalized beyond projective measurements.

In this lesson we'll consider measurements in greater generality. We'll discuss a few different ways that general measurements can be described in mathematical terms, and we'll connect them to concepts discussed previously in the course.

We'll also take a look at a couple of notions connected with measurements, namely quantum state discrimination and quantum state tomography. Quantum state discrimination refers to a situation that arises commonly in quantum computing and cryptography, where a system is prepared in one of a known collection of states, and the goal is to determine, by means of a measurement, which state was prepared. For quantum state tomography, on the other hand, many independent copies of a single, unknown quantum state are made available, and the goal is to reconstruct a density matrix description of that state by performing measurements on the copies.

Mathematical formulations of measurements

The lesson begins with two equivalent mathematical descriptions of measurements:

  1. General measurements can be described by collections of matrices, one for each measurement outcome, in a way that generalizes the description of projective measurements.
  2. General measurements can be described as channels whose outputs are always classical states (represented by diagonal density matrices).

We'll restrict our attention to measurements having finitely many possible outcomes. Although it is possible to define measurements with infinitely many possible outcomes, they're much less typically encountered in the context of computation and information processing, and they also require some additional mathematics (namely measure theory) to be properly formalized.

Our initial focus will be on so-called destructive measurements, where the output of the measurement is a classical measurement outcome alone — with no specification of the post-measurement quantum state of whatever system was measured. Intuitively speaking, we can imagine that such a measurement destroys the quantum system itself, or that the system is immediately discarded once the measurement is made. Later in the lesson we'll broaden our view and consider non-destructive measurements, where there's both a classical measurement outcome and a post-measurement quantum state of the measured system.

Measurements as collections of matrices

Suppose X is a system that is to be measured, and assume for simplicity that the classical state set of X is {0, …, n − 1} for some positive integer n, so that density matrices representing quantum states of X are n × n matrices. We won't actually have much need to refer to the classical states of X, but it will be convenient to refer to n, the number of classical states of X. We'll also assume that the possible outcomes of the measurement are the integers 0, …, m − 1 for some positive integer m. Note that we're just using these names to keep things simple; it's straightforward to generalize everything that follows to other finite sets of classical states and measurement outcomes, renaming them as desired.

Recall that a projective measurement is described by a collection of projection matrices that sum to the identity matrix. In symbols, {Π_0, …, Π_{m−1}} describes a projective measurement of X if each Π_a is an n × n projection matrix and the following condition is met.

\Pi_0 + \cdots + \Pi_{m-1} = \mathbb{I}_{\mathsf{X}}

When such a measurement is performed on a system X while it's in a state described by some quantum state vector |ψ⟩, each outcome a ∈ {0, …, m − 1} is obtained with probability equal to ‖Π_a|ψ⟩‖². (We also have that the post-measurement state of X is obtained by normalizing the vector Π_a|ψ⟩, but we're ignoring the post-measurement state for now.)

If the state of X is described by a density matrix ρ rather than a quantum state vector |ψ⟩, then we can alternatively express the probability to obtain the outcome a as Tr(Π_a ρ). If ρ = |ψ⟩⟨ψ| is a pure state, then the two expressions are equal:

\operatorname{Tr}(\Pi_a \rho) = \operatorname{Tr}(\Pi_a \vert \psi\rangle\langle\psi \vert) = \langle \psi \vert \Pi_a \vert \psi \rangle = \langle \psi \vert \Pi_a \Pi_a \vert \psi \rangle = \|\Pi_a\vert\psi\rangle\|^2.

Here we're using the cyclic property of the trace for the second equality, and for the third equality we're using the fact that each Π_a is a projection matrix, and therefore satisfies Π_a² = Π_a. In general, if ρ is a convex combination

\rho = \sum_{k = 0}^{N-1} p_k \vert \psi_k\rangle\langle \psi_k \vert

of pure states, then the expression Tr(Π_a ρ) coincides with the average probability for the outcome a, owing to the fact that this expression is linear in ρ.

\operatorname{Tr}(\Pi_a \rho) = \sum_{k = 0}^{N-1} p_k \operatorname{Tr}(\Pi_a \vert \psi_k\rangle\langle\psi_k\vert) = \sum_{k = 0}^{N-1} p_k \|\Pi_a\vert\psi_k\rangle\|^2

A mathematical description for general measurements is obtained by relaxing the definition of projective measurements. Specifically, we allow the matrices in the collection describing the measurement to be arbitrary positive semidefinite matrices rather than projections. (Projections are always positive semidefinite; they can alternatively be defined as positive semidefinite matrices whose eigenvalues are all either 0 or 1.) In particular, a general measurement of a system X having outcomes 0, …, m − 1 is specified by a collection of positive semidefinite matrices {P_0, …, P_{m−1}} whose rows and columns correspond to the classical states of X and that meet the condition

P_0 + \cdots + P_{m-1} = \mathbb{I}_{\mathsf{X}}.

If the system X is measured while it is in a state described by the density matrix ρ, then each outcome a ∈ {0, …, m − 1} appears with probability Tr(P_a ρ).

As we must naturally demand, the vector of outcome probabilities

\bigl(\operatorname{Tr}(P_0 \rho),\ldots,\operatorname{Tr}(P_{m-1} \rho)\bigr)

of a general measurement always forms a probability vector, for any choice of a density matrix ρ. The following two observations establish that this is the case.

  1. Each value Tr(P_a ρ) must be nonnegative, owing to the fact that the trace of the product of any two positive semidefinite matrices is always nonnegative:

    Q, R \geq 0 \;\Rightarrow\; \operatorname{Tr}(QR) \geq 0.

    One way to argue this fact is to use spectral decompositions of Q and R together with the cyclic property of the trace to express the trace of the product QR as a sum of nonnegative real numbers, which must therefore be nonnegative.

  2. The condition P_0 + ⋯ + P_{m−1} = I_X together with the linearity of the trace ensures that the probabilities sum to 1.

    \sum_{a = 0}^{m-1} \operatorname{Tr}(P_a \rho) = \operatorname{Tr}\Biggl(\sum_{a = 0}^{m-1} P_a \rho\Biggr) = \operatorname{Tr}(\mathbb{I}\rho) = \operatorname{Tr}(\rho) = 1
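To make this concrete, here is a minimal NumPy sketch (the dimension, the number of outcomes, the random seed, and the trick of rescaling random positive semidefinite matrices are all arbitrary choices of ours, not anything fixed by the lesson) that builds a valid general measurement and checks that the values Tr(P_a ρ) form a probability vector.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 3, 4   # dimension of X and number of outcomes (arbitrary choices)

# Build an arbitrary measurement {P_0, ..., P_{m-1}}: start from random positive
# semidefinite matrices Q_a and rescale them so that they sum to the identity.
Q = []
for _ in range(m):
    B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    Q.append(B @ B.conj().T)                      # B B^dagger is positive semidefinite
S = sum(Q)
w, V = np.linalg.eigh(S)                          # S is positive definite
S_inv_sqrt = V @ np.diag(w ** -0.5) @ V.conj().T
P = [S_inv_sqrt @ Qa @ S_inv_sqrt for Qa in Q]    # now the P_a sum to the identity

# A random density matrix rho for X.
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
rho = A @ A.conj().T
rho /= np.trace(rho)

probs = np.array([np.trace(Pa @ rho).real for Pa in P])
print(np.allclose(sum(P), np.eye(n)))                          # True
print(np.all(probs >= -1e-12), np.isclose(probs.sum(), 1.0))   # True True
```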

Example 1: any projective measurement

Projections are always positive semidefinite, so every projective measurement is an example of a general measurement.

For example, a standard basis measurement of a qubit can be represented by {P_0, P_1} where

P_0 = \vert 0\rangle\langle 0\vert = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \quad\text{and}\quad P_1 = \vert 1\rangle\langle 1\vert = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.

Measuring a qubit in the state ρ results in outcome probabilities as follows.

\begin{aligned} \operatorname{Prob}(\text{outcome} = 0) & = \operatorname{Tr}(P_0 \rho) = \operatorname{Tr}\bigl(\vert 0\rangle\langle 0\vert \rho\bigr) = \langle 0\vert \rho \vert 0 \rangle \\[0.5mm] \operatorname{Prob}(\text{outcome} = 1) & = \operatorname{Tr}(P_1 \rho) = \operatorname{Tr}\bigl(\vert 1\rangle\langle 1\vert\rho\bigr) = \langle 1 \vert \rho \vert 1 \rangle \end{aligned}

Example 2: a non-projective qubit measurement

Suppose X is a qubit, and define two matrices as follows.

P_0 = \begin{pmatrix} \frac{2}{3} & \frac{1}{3}\\[2mm] \frac{1}{3} & \frac{1}{3} \end{pmatrix} \qquad P_1 = \begin{pmatrix} \frac{1}{3} & -\frac{1}{3}\\[2mm] -\frac{1}{3} & \frac{2}{3} \end{pmatrix}

These are both positive semidefinite matrices: they're Hermitian, and in both cases the eigenvalues happen to be 1/2 ± √5/6, which are both positive. We also have that P_0 + P_1 = I, and therefore {P_0, P_1} describes a measurement.

If the state of X is described by a density matrix ρ and we perform this measurement, then the probability of obtaining the outcome 0 is Tr(P_0 ρ) and the probability of obtaining the outcome 1 is Tr(P_1 ρ). For instance, if ρ = |+⟩⟨+|, then the probabilities for the two outcomes 0 and 1 are as follows.

\operatorname{Tr}(P_0 \rho) = \operatorname{Tr}\left( \begin{pmatrix} \frac{2}{3} & \frac{1}{3}\\[2mm] \frac{1}{3} & \frac{1}{3} \end{pmatrix} \begin{pmatrix} \frac{1}{2} & \frac{1}{2}\\[2mm] \frac{1}{2} & \frac{1}{2} \end{pmatrix} \right) = \biggl(\frac{2}{3} \cdot \frac{1}{2} + \frac{1}{3} \cdot \frac{1}{2}\biggr) + \biggl(\frac{1}{3}\cdot\frac{1}{2} + \frac{1}{3}\cdot\frac{1}{2}\biggr) = \frac{1}{2} + \frac{1}{3} = \frac{5}{6}

\operatorname{Tr}(P_1 \rho) = \operatorname{Tr}\left( \begin{pmatrix} \frac{1}{3} & -\frac{1}{3}\\[2mm] -\frac{1}{3} & \frac{2}{3} \end{pmatrix} \begin{pmatrix} \frac{1}{2} & \frac{1}{2}\\[2mm] \frac{1}{2} & \frac{1}{2} \end{pmatrix} \right) = \biggl(\frac{1}{3} \cdot \frac{1}{2} - \frac{1}{3} \cdot \frac{1}{2}\biggr) + \biggl(-\frac{1}{3}\cdot\frac{1}{2} + \frac{2}{3}\cdot\frac{1}{2}\biggr) = 0 + \frac{1}{6} = \frac{1}{6}
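The numbers in this example are easy to reproduce numerically; the following NumPy snippet is only an illustration of the calculation above.

```python
import numpy as np

P0 = np.array([[2/3, 1/3], [1/3, 1/3]])
P1 = np.array([[1/3, -1/3], [-1/3, 2/3]])

# Both matrices are positive semidefinite (eigenvalues 1/2 +/- sqrt(5)/6) and sum to I.
print(np.linalg.eigvalsh(P0))             # [0.1273..., 0.8727...]
print(np.linalg.eigvalsh(P1))             # the same two eigenvalues
print(np.allclose(P0 + P1, np.eye(2)))    # True

# Outcome probabilities for rho = |+><+|.
plus = np.array([1, 1]) / np.sqrt(2)
rho = np.outer(plus, plus)
print(np.trace(P0 @ rho), np.trace(P1 @ rho))   # 5/6 = 0.8333... and 1/6 = 0.1666...
```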

Example 3: tetrahedral measurement

Define four single-qubit quantum state vectors as follows.

\begin{aligned} \vert\phi_0\rangle & = \vert 0 \rangle\\ \vert\phi_1\rangle & = \frac{1}{\sqrt{3}}\vert 0 \rangle + \sqrt{\frac{2}{3}} \vert 1\rangle \\ \vert\phi_2\rangle & = \frac{1}{\sqrt{3}}\vert 0 \rangle + \sqrt{\frac{2}{3}} e^{2\pi i/3} \vert 1\rangle \\ \vert\phi_3\rangle & = \frac{1}{\sqrt{3}}\vert 0 \rangle + \sqrt{\frac{2}{3}} e^{-2\pi i/3} \vert 1\rangle \end{aligned}

These four states are sometimes known as tetrahedral states because they're vertices of a regular tetrahedron inscribed within the Bloch sphere.

Illustration of a tetrahedron inscribed in the Bloch sphere

The Cartesian coordinates of these four states on the Bloch sphere are

(0,0,1), \quad \left( \frac{2\sqrt{2}}{3} , 0 , -\frac{1}{3} \right), \quad \left( -\frac{\sqrt{2}}{3} , \sqrt{\frac{2}{3}} , -\frac{1}{3} \right), \quad \left( -\frac{\sqrt{2}}{3} , -\sqrt{\frac{2}{3}} , -\frac{1}{3} \right),

which can be verified by expressing the density matrix representations of these states as linear combinations of Pauli matrices.

\begin{aligned} \vert \phi_0 \rangle\langle \phi_0 \vert & = \begin{pmatrix} 1 & 0\\[1mm] 0 & 0 \end{pmatrix} = \frac{\mathbb{I} + \sigma_z}{2}\\[2mm] \vert \phi_1 \rangle\langle \phi_1 \vert & = \begin{pmatrix} \frac{1}{3} & \frac{\sqrt{2}}{3} \\[2mm] \frac{\sqrt{2}}{3} & \frac{2}{3} \end{pmatrix} = \frac{\mathbb{I} + \frac{2\sqrt{2}}{3} \sigma_x - \frac{1}{3}\sigma_z}{2}\\[2mm] \vert \phi_2 \rangle\langle \phi_2 \vert & = \begin{pmatrix} \frac{1}{3} & -\frac{1}{3\sqrt{2}} - \frac{i}{\sqrt{6}} \\[2mm] -\frac{1}{3\sqrt{2}} + \frac{i}{\sqrt{6}} & \frac{2}{3} \end{pmatrix} = \frac{\mathbb{I} - \frac{\sqrt{2}}{3} \sigma_x + \sqrt{\frac{2}{3}} \sigma_y - \frac{1}{3}\sigma_z}{2}\\[2mm] \vert \phi_3 \rangle\langle \phi_3 \vert & = \begin{pmatrix} \frac{1}{3} & -\frac{1}{3\sqrt{2}} + \frac{i}{\sqrt{6}} \\[2mm] -\frac{1}{3\sqrt{2}} - \frac{i}{\sqrt{6}} & \frac{2}{3} \end{pmatrix} = \frac{\mathbb{I} - \frac{\sqrt{2}}{3} \sigma_x - \sqrt{\frac{2}{3}} \sigma_y - \frac{1}{3}\sigma_z}{2} \end{aligned}

These four states are perfectly spread out on the Bloch sphere: each one is equidistant from the other three, and the angle between any two of them is always the same.

Now let us define a measurement {P_0, P_1, P_2, P_3} of a qubit by setting P_a as follows for each a = 0, …, 3.

P_a = \frac{\vert\phi_a\rangle\langle\phi_a\vert}{2}

We can verify that this is a valid measurement as follows.

  1. Each P_a is evidently positive semidefinite, being a pure state divided by two. That is, each one is a Hermitian matrix having one eigenvalue equal to 1/2 and all other eigenvalues equal to zero.
  2. The sum of these matrices is the identity matrix: P_0 + P_1 + P_2 + P_3 = I. The expressions of these matrices as linear combinations of Pauli matrices make this straightforward to verify, as does the short numerical check below.
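Both of these checks can also be carried out numerically; the following NumPy snippet is purely illustrative.

```python
import numpy as np

# The four tetrahedral states.
omega = np.exp(2j * np.pi / 3)
phi = [np.array([1, 0]),
       np.array([1, np.sqrt(2)]) / np.sqrt(3),
       np.array([1, np.sqrt(2) * omega]) / np.sqrt(3),
       np.array([1, np.sqrt(2) * omega.conj()]) / np.sqrt(3)]

# The tetrahedral measurement: P_a = |phi_a><phi_a| / 2.
P = [np.outer(v, v.conj()) / 2 for v in phi]

print(np.allclose(sum(P), np.eye(2)))                      # True: the matrices sum to I
print([np.round(np.linalg.eigvalsh(Pa), 3) for Pa in P])   # each P_a has eigenvalues 0 and 0.5
```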

Measurements as channels

A second way to describe measurements in mathematical terms is as channels.

Classical information can be viewed as a special case of quantum information, insofar as we can identify probabilistic states with diagonal density matrices. So, in operational terms, we can think about measurements as being channels whose inputs are matrices describing states of whatever system is being measured and whose outputs are diagonal density matrices describing the resulting distribution of measurement outcomes.

We'll see shortly that any channel having this property can always be written in a simple, canonical form that ties directly to the description of measurements as collections of positive semidefinite matrices. Conversely, given an arbitrary measurement as a collection of matrices, there's always a valid channel having the diagonal output property that describes the given measurement as suggested in the previous paragraph. Putting these observations together, we find that the two descriptions of general measurements are equivalent.

Before proceeding further, let's be more precise about the measurement, how we're viewing it as a channel, and what assumptions we're making about it. As before, we'll suppose that X is the system to be measured, and that the possible outcomes of the measurement are the integers 0, …, m − 1 for some positive integer m. We let Y be the system that stores measurement outcomes, so its classical state set is {0, …, m − 1}, and we represent the measurement as a channel named Φ from X to Y. Our assumption is that Y is classical — which is to say that no matter what state we start with for X, the state of Y we obtain is represented by a diagonal density matrix.

We can express in mathematical terms that the output of Φ is always diagonal in the following way. First define the completely dephasing channel Δ_m on Y.

\Delta_m(\sigma) = \sum_{a = 0}^{m-1} \langle a \vert \sigma \vert a\rangle \,\vert a\rangle\langle a\vert

This channel is analogous to the completely dephasing qubit channel Δ from the previous lesson. As a linear mapping, it zeros out all of the off-diagonal entries of an input matrix and leaves the diagonal alone. And now, a simple way to express that a given density matrix σ is diagonal is by the equation σ = Δ_m(σ). In words, zeroing out all of the off-diagonal entries of a density matrix has no effect if and only if the off-diagonal entries were all zero to begin with. The channel Φ therefore satisfies our assumption — that Y is classical — if and only if Φ(ρ) = Δ_m(Φ(ρ)) for every density matrix ρ representing a state of X.

Equivalence of the formulations

Channels to matrices

Suppose that we have a channel Φ from X to Y with the property that Φ(ρ) = Δ_m(Φ(ρ)) for every density matrix ρ. This may alternatively be expressed as follows.

\Phi(\rho) = \sum_{a = 0}^{m-1} \langle a \vert \Phi(\rho) \vert a\rangle\, \vert a\rangle\langle a \vert \tag{1}

Like all channels, we can express Φ in Kraus form, for some choice of Kraus matrices A_0, …, A_{N−1}.

\Phi(\rho) = \sum_{k = 0}^{N-1} A_k \rho A_k^{\dagger}

This provides us with an alternative expression for the diagonal entries of Φ(ρ):

\langle a \vert \Phi(\rho) \vert a\rangle = \sum_{k = 0}^{N-1} \langle a \vert A_k \rho A_k^{\dagger} \vert a\rangle = \sum_{k = 0}^{N-1} \operatorname{Tr}\bigl( A_k^{\dagger} \vert a\rangle\langle a \vert A_k \rho\bigr) = \operatorname{Tr}\bigl(P_a\rho\bigr)

for

P_a = \sum_{k = 0}^{N-1} A_k^{\dagger} \vert a\rangle\langle a \vert A_k.

Thus, for these same matrices P_0, …, P_{m−1} we can express the channel Φ as follows.

\Phi(\rho) = \sum_{a = 0}^{m-1} \operatorname{Tr}(P_a \rho)\, \vert a\rangle\langle a\vert

This expression is consistent with our description of general measurements in terms of matrices, as we see each measurement outcome a appearing with probability Tr(P_a ρ).

Now let's observe that the two properties required of the collection of matrices {P_0, …, P_{m−1}} to describe a general measurement are indeed satisfied. The first property is that they're all positive semidefinite matrices. One way to see this is to observe that, for every vector |ψ⟩ having entries in correspondence with the classical states of X, we have

\langle \psi \vert P_a \vert \psi\rangle = \sum_{k = 0}^{N-1} \langle \psi \vert A_k^{\dagger} \vert a\rangle\langle a \vert A_k\vert \psi\rangle = \sum_{k = 0}^{N-1} \bigl\vert\langle a \vert A_k\vert \psi\rangle\bigr\vert^2 \geq 0.

The second property is that if we sum these matrices we get the identity matrix.

\begin{aligned} \sum_{a = 0}^{m-1} P_a & = \sum_{a = 0}^{m-1} \sum_{k = 0}^{N-1} A_k^{\dagger} \vert a\rangle\langle a \vert A_k \\ & = \sum_{k = 0}^{N-1} A_k^{\dagger} \Biggl(\sum_{a = 0}^{m-1} \vert a\rangle\langle a \vert\Biggr) A_k \\ & = \sum_{k = 0}^{N-1} A_k^{\dagger} A_k \\ & = \mathbb{I}_{\mathsf{X}} \end{aligned}

The last equality follows from the fact that Φ is a channel, so its Kraus matrices must satisfy this condition.
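To illustrate the recipe P_a = Σ_k A_k† |a⟩⟨a| A_k, here is a short NumPy sketch. The particular Kraus matrices used (one per outcome of the tetrahedral measurement, A_k = |k⟩⟨φ_k|/√2) are an arbitrary choice of ours for the example; the lesson itself fixes no particular channel.

```python
import numpy as np

# One possible set of Kraus matrices for the tetrahedral measurement channel:
# A_k = |k><phi_k| / sqrt(2), mapping a qubit X to a four-dimensional classical system Y.
omega = np.exp(2j * np.pi / 3)
phi = [np.array([1, 0]),
       np.array([1, np.sqrt(2)]) / np.sqrt(3),
       np.array([1, np.sqrt(2) * omega]) / np.sqrt(3),
       np.array([1, np.sqrt(2) * omega.conj()]) / np.sqrt(3)]
m, n = 4, 2
E = np.eye(m)
A = [np.outer(E[k], phi[k].conj()) / np.sqrt(2) for k in range(m)]

# Recover the measurement matrices P_a = sum_k A_k^dagger |a><a| A_k.
P = [sum(Ak.conj().T @ np.outer(E[a], E[a]) @ Ak for Ak in A) for a in range(m)]

print(np.allclose(sum(P), np.eye(n)))                            # True: the P_a sum to I_X
print(np.allclose(P[1], np.outer(phi[1], phi[1].conj()) / 2))    # equals |phi_1><phi_1| / 2
```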

Matrices to channels

Now let's verify that for any collection {P_0, …, P_{m−1}} of positive semidefinite matrices satisfying P_0 + ⋯ + P_{m−1} = I_X, the mapping defined by

\Phi(\rho) = \sum_{a = 0}^{m-1} \operatorname{Tr}(P_a \rho)\, \vert a \rangle\langle a\vert

is indeed a valid channel from X to Y.

One way to do this is to compute the Choi representation of this mapping.

\begin{aligned} J(\Phi) & = \sum_{b,c = 0}^{n-1} \vert b \rangle \langle c \vert \otimes \Phi(\vert b \rangle \langle c \vert)\\[1mm] & = \sum_{b,c = 0}^{n-1} \sum_{a = 0}^{m-1} \vert b \rangle \langle c \vert \otimes \operatorname{Tr}(P_a \vert b \rangle \langle c \vert)\, \vert a \rangle\langle a\vert\\[1mm] & = \sum_{b,c = 0}^{n-1} \sum_{a = 0}^{m-1} \vert b \rangle \langle b \vert P_a^T \vert c \rangle \langle c \vert \otimes \vert a \rangle\langle a\vert\\[1mm] & = \sum_{a = 0}^{m-1} P_a^T \otimes \vert a \rangle\langle a\vert \end{aligned}

The transpose of each P_a is introduced for the third equality because ⟨c|P_a|b⟩ = ⟨b|P_a^T|c⟩. This allows for the expressions |b⟩⟨b| and |c⟩⟨c| to appear, which simplify to the identity matrix upon summing over b and c, respectively.

By the assumption that P_0, …, P_{m−1} are positive semidefinite, so too are P_0^T, …, P_{m−1}^T. (Transposing a Hermitian matrix results in another Hermitian matrix, and the eigenvalues of any square matrix and its transpose always agree.) It follows that J(Φ) is positive semidefinite. Tracing out the output system Y (which is the system on the right) yields

\operatorname{Tr}_{\mathsf{Y}} (J(\Phi)) = \sum_{a = 0}^{m-1} P_a^T = \mathbb{I}_{\mathsf{X}}^T = \mathbb{I}_{\mathsf{X}},

and so we conclude that Φ is a channel.
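A numerical version of this verification is straightforward. In the sketch below, the helper names measurement_channel and choi are our own, and the tetrahedral measurement is an arbitrary example; the code checks that the Choi matrix is positive semidefinite and that tracing out Y leaves the identity on X.

```python
import numpy as np

def measurement_channel(P):
    """The channel rho -> sum_a Tr(P_a rho) |a><a| defined by the measurement {P_a}."""
    m = len(P)
    E = np.eye(m)
    return lambda rho: sum(np.trace(Pa @ rho) * np.outer(E[a], E[a]) for a, Pa in enumerate(P))

def choi(Phi, n):
    """Choi matrix J(Phi) = sum_{b,c} |b><c| (x) Phi(|b><c|)."""
    E = np.eye(n)
    return sum(np.kron(np.outer(E[b], E[c]), Phi(np.outer(E[b], E[c])))
               for b in range(n) for c in range(n))

# Arbitrary example: the tetrahedral measurement of a qubit (n = 2, m = 4).
omega = np.exp(2j * np.pi / 3)
phi = [np.array([1, 0]), np.array([1, np.sqrt(2)]) / np.sqrt(3),
       np.array([1, np.sqrt(2) * omega]) / np.sqrt(3),
       np.array([1, np.sqrt(2) * omega.conj()]) / np.sqrt(3)]
P = [np.outer(v, v.conj()) / 2 for v in phi]

J = choi(measurement_channel(P), 2)
print(np.all(np.linalg.eigvalsh(J) > -1e-12))    # True: J(Phi) is positive semidefinite
J4 = J.reshape(2, 4, 2, 4)                       # indices ordered as (X, Y, X, Y)
print(np.allclose(np.trace(J4, axis1=1, axis2=3), np.eye(2)))   # tracing out Y gives I_X
```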

Partial measurements

Suppose that we have multiple systems that are collectively in a quantum state, and a general measurement is performed on one of the systems. This results in one of the measurement outcomes, selected at random according to probabilities determined by the measurement and the state of the system prior to the measurement. The resulting state of the remaining systems will then, in general, depend on which measurement outcome was obtained.

Let's examine how this works for a pair of systems (X, Z) when the system X is measured. (We're naming the system on the right Z because we'll take Y to be a system representing the classical output of the measurement when we view it as a channel.) We can then easily generalize to the situation in which the systems are swapped as well as to three or more systems.

Suppose the state of (X, Z) prior to the measurement is described by a density matrix ρ, which we can write as follows.

\rho = \sum_{b,c = 0}^{n-1} \vert b\rangle\langle c\vert \otimes \rho_{b,c}

In this expression we're assuming the classical states of X are 0, …, n − 1.

We'll assume that the measurement itself is described by the collection of matrices {P_0, …, P_{m−1}}. This measurement may alternatively be described as a channel Φ from X to Y, where Y is a new system having classical state set {0, …, m − 1}. Specifically, the action of this channel can be expressed as follows.

\Phi(\xi) = \sum_{a = 0}^{m-1} \operatorname{Tr}(P_a \xi)\, \vert a \rangle \langle a \vert

Outcome probabilities

We're considering a measurement of the system X, so the probabilities with which different measurement outcomes are obtained can depend only on ρ_X, the reduced state of X. In particular, the probability for each outcome a ∈ {0, …, m − 1} to appear can be expressed in three equivalent ways.

\operatorname{Tr}\bigl( P_a \rho_{\mathsf{X}}\bigr) = \operatorname{Tr}\bigl( P_a \operatorname{Tr}_{\mathsf{Z}}(\rho)\bigr) = \operatorname{Tr}\bigl( (P_a \otimes \mathbb{I}_{\mathsf{Z}}) \rho \bigr)

The first expression naturally represents the probability to obtain the outcome a based on what we already know about measurements of a single system. To get the second expression, we're simply using the definition ρ_X = Tr_Z(ρ). The third one requires more thought — and learners are encouraged to convince themselves that it is true. (Hint: the equivalence between the second and third expressions does not depend on ρ being a density matrix or on each P_a being positive semidefinite. Try showing it first for tensor products of the form ρ = M ⊗ N, and then conclude that it must be true in general by linearity.)

While the equivalence of the first and third expressions in the previous equation may not be immediate, it does make sense. Starting from a measurement on X, we're effectively defining a measurement of (X, Z), where we simply throw away Z and measure X. Like all measurements, this new measurement can be described by a collection of matrices, and it's not surprising that this measurement is described by the collection {P_0 ⊗ I_Z, …, P_{m−1} ⊗ I_Z}.

States conditioned on measurement outcomes

If we want to determine not only the probabilities for the different outcomes but also the resulting state of Z conditioned on each measurement outcome, we can look to the channel description of the measurement. In particular, let's examine the state we get when we apply Φ to X and do nothing to Z.

\begin{aligned} (\Phi\otimes\operatorname{Id}_{\mathsf{Z}})(\rho) & = \sum_{b,c = 0}^{n-1} \Phi(\vert b\rangle\langle c\vert) \otimes \rho_{b,c}\\ & = \sum_{a = 0}^{m-1} \sum_{b,c = 0}^{n-1} \operatorname{Tr}(P_a \vert b\rangle\langle c\vert) \,\vert a\rangle \langle a \vert \otimes \rho_{b,c}\\ & = \sum_{a = 0}^{m-1} \vert a\rangle \langle a \vert \otimes \sum_{b,c = 0}^{n-1} \operatorname{Tr}(P_a \vert b\rangle\langle c\vert)\, \rho_{b,c}\\ & = \sum_{a = 0}^{m-1} \vert a\rangle \langle a \vert \otimes \sum_{b,c = 0}^{n-1} \operatorname{Tr}_{\mathsf{X}}\bigl((P_a\otimes\mathbb{I}_{\mathsf{Z}}) (\vert b\rangle\langle c\vert\otimes\rho_{b,c})\bigr)\\ & = \sum_{a = 0}^{m-1} \vert a\rangle \langle a \vert \otimes \operatorname{Tr}_{\mathsf{X}}\bigl((P_a \otimes \mathbb{I}_{\mathsf{Z}}) \rho\bigr) \end{aligned}

Note that this is a density matrix by virtue of the fact that Φ is a channel, so each matrix Tr_X((P_a ⊗ I_Z) ρ) is necessarily positive semidefinite.

One final step transforms this expression into one that reveals what we're looking for.

\sum_{a = 0}^{m-1} \operatorname{Tr}\bigl((P_a \otimes \mathbb{I}_{\mathsf{Z}}) \rho\bigr)\, \vert a\rangle \langle a \vert \otimes \frac{\operatorname{Tr}_{\mathsf{X}}\bigl((P_a \otimes \mathbb{I}_{\mathsf{Z}}) \rho\bigr)}{\operatorname{Tr}\bigl((P_a \otimes \mathbb{I}_{\mathsf{Z}}) \rho\bigr)}

This is an example of a classical-quantum state,

\sum_{a = 0}^{m-1} p(a)\, \vert a\rangle\langle a\vert \otimes \sigma_a,

like we saw in the Density matrices lesson. For each measurement outcome a ∈ {0, …, m − 1}, we have with probability p(a) = Tr((P_a ⊗ I_Z) ρ) that Y is in the classical state |a⟩⟨a| and Z is in the state

\sigma_a = \frac{\operatorname{Tr}_{\mathsf{X}}\bigl((P_a \otimes \mathbb{I}_{\mathsf{Z}}) \rho\bigr)}{\operatorname{Tr}\bigl((P_a \otimes \mathbb{I}_{\mathsf{Z}}) \rho\bigr)}. \tag{2}

That is, this is the density matrix we obtain by normalizing Tr_X((P_a ⊗ I_Z) ρ), dividing it by its trace. (Formally speaking, the state σ_a is only defined when the probability p(a) is nonzero; when p(a) = 0, this state is irrelevant, for it refers to an event that occurs with probability zero.) Naturally, the outcome probabilities are consistent with our previous observations.

In summary, this is what happens when the measurement {P_0, …, P_{m−1}} is performed on X when (X, Z) is in the state ρ.

  1. Each outcome a appears with probability p(a) = Tr((P_a ⊗ I_Z) ρ).
  2. Conditioned on obtaining the outcome a, the state of Z is then represented by the density matrix σ_a shown in equation (2), which is obtained by normalizing Tr_X((P_a ⊗ I_Z) ρ).
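Here is a minimal NumPy sketch of the summary above (the helper name measure_left and the e-bit example are arbitrary choices of ours): it computes the probabilities p(a) and the conditional states σ_a when the left system of a two-system state is measured.

```python
import numpy as np

def measure_left(P, rho, n, d):
    """Measure {P_a} on X for a state rho of (X, Z), with dim(X) = n and dim(Z) = d.
    Returns the outcome probabilities p(a) and the conditional states sigma_a of Z."""
    probs, conds = [], []
    for Pa in P:
        M = np.kron(Pa, np.eye(d)) @ rho                           # (P_a (x) I_Z) rho
        p = np.trace(M).real                                       # p(a)
        red = np.trace(M.reshape(n, d, n, d), axis1=0, axis2=2)    # partial trace over X
        probs.append(p)
        conds.append(red / p if p > 1e-12 else None)
    return probs, conds

# Arbitrary example: an e-bit shared by X and Z, with X measured in the standard basis.
ebit = np.zeros((4, 4))
ebit[0, 0] = ebit[0, 3] = ebit[3, 0] = ebit[3, 3] = 0.5
P = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

probs, conds = measure_left(P, ebit, 2, 2)
print(probs)        # [0.5, 0.5]
print(conds[0])     # |0><0| : conditioned on outcome 0, the state of Z collapses to |0>
```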

Generalization

We can adapt this description to other situations, such as when the ordering of the systems is reversed or when there are three or more systems. Conceptually it is straightforward, although it can become cumbersome to write down the formulas.

In general, if we have r systems X_1, …, X_r, the state of the compound system (X_1, …, X_r) is ρ, and the measurement {P_0, …, P_{m−1}} is performed on X_k, the following happens.

  1. Each outcome a appears with probability

    p(a) = \operatorname{Tr}\bigl((\mathbb{I}_{\mathsf{X}_1}\otimes \cdots \otimes\mathbb{I}_{\mathsf{X}_{k-1}} \otimes P_a \otimes \mathbb{I}_{\mathsf{X}_{k+1}} \otimes \cdots \otimes\mathbb{I}_{\mathsf{X}_r}) \rho\bigr).
  2. Conditioned on obtaining the outcome a, the state of (X_1, …, X_{k−1}, X_{k+1}, …, X_r) is then represented by the following density matrix.

    \frac{\operatorname{Tr}_{\mathsf{X}_k}\bigl((\mathbb{I}_{\mathsf{X}_1}\otimes \cdots \otimes\mathbb{I}_{\mathsf{X}_{k-1}} \otimes P_a \otimes \mathbb{I}_{\mathsf{X}_{k+1}} \otimes \cdots \otimes\mathbb{I}_{\mathsf{X}_r}) \rho\bigr)}{\operatorname{Tr}\bigl((\mathbb{I}_{\mathsf{X}_1}\otimes \cdots \otimes\mathbb{I}_{\mathsf{X}_{k-1}} \otimes P_a \otimes \mathbb{I}_{\mathsf{X}_{k+1}} \otimes \cdots \otimes\mathbb{I}_{\mathsf{X}_r}) \rho\bigr)}

Naimark's theorem

Naimark's theorem is a fundamental fact concerning measurements. It states that every general measurement can be implemented in a simple way that's reminiscent of Stinespring representations of channels: the system to be measured is first combined with an initialized workspace system, forming a compound system; then a unitary operation is performed on the compound system; and finally the workspace system is measured with respect to a standard basis measurement, yielding the outcome of the original general measurement.

Theorem statement and proof

Let X be a system and let {P_0, …, P_{m−1}} be a collection of positive semidefinite matrices satisfying P_0 + ⋯ + P_{m−1} = I_X that describes a measurement of X. Also let Y be a system whose classical state set is {0, …, m − 1}, which is the set of possible outcomes of the given measurement.

Naimark's theorem states that there exists a unitary operation U on the compound system (Y, X) so that the implementation suggested by the following figure yields measurement outcomes that agree with the given measurement {P_0, …, P_{m−1}}, meaning that the probabilities for the different possible measurement outcomes are precisely in agreement.

An implementation of a general measurement as in Naimark's theorem

To be clear, the system X starts out in some arbitrary state ρ while Y is initialized to the |0⟩ state. The unitary operation U is applied to (Y, X) and then the system Y is measured with a standard basis measurement, yielding some outcome a ∈ {0, …, m − 1}. Note that, in the figure, the system X is pictured as part of the output of the circuit — but for now we won't concern ourselves with the state of X after U is performed, and we can alternatively imagine that it's traced out. An implementation of a measurement in this way is clearly reminiscent of a Stinespring representation of a channel, and the mathematical underpinnings are similar as well. The difference here is that the workspace system is measured rather than being traced out like in the case of a Stinespring representation.

The fact that every measurement can be implemented in this way is pretty simple to prove, but we're going to need a fact concerning positive semidefinite matrices first.

Theorem. Suppose P is an n × n positive semidefinite matrix. There exists a unique n × n positive semidefinite matrix Q for which Q² = P. This unique positive semidefinite matrix is called the square root of P and is denoted √P.

One way to find the square root of a positive semidefinite matrix is to first compute a spectral decomposition.

P = \sum_{k=0}^{n-1} \lambda_k \vert \psi_k \rangle \langle \psi_k \vert

Because P is positive semidefinite, its eigenvalues must be nonnegative real numbers, and by replacing them with their square roots we obtain an expression for the square root of P.

\sqrt{P} = \sum_{k=0}^{n-1} \sqrt{\lambda_k} \vert \psi_k \rangle \langle \psi_k \vert
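In code, this construction of the square root takes only a few lines; the sketch below uses NumPy's eigh, and the clipping of tiny negative eigenvalues is just a numerical safeguard, not part of the mathematical definition.

```python
import numpy as np

def psd_sqrt(P):
    """Square root of a positive semidefinite matrix via its spectral decomposition."""
    w, V = np.linalg.eigh(P)                 # nonnegative eigenvalues, orthonormal eigenvectors
    w = np.clip(w, 0, None)                  # guard against tiny negative rounding errors
    return V @ np.diag(np.sqrt(w)) @ V.conj().T

P = np.array([[2/3, 1/3], [1/3, 1/3]])       # an arbitrary positive semidefinite matrix
Q = psd_sqrt(P)
print(np.allclose(Q @ Q, P))                 # True
```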

With this concept in hand, we're ready to prove Naimark's theorem. Under the assumption that X has n classical states, a unitary operation U on the pair (Y, X) can be represented by an nm × nm matrix, which we can view as an m × m block matrix whose blocks are n × n. The key to the proof is to take U to be any unitary matrix that matches the following pattern.

U = \begin{pmatrix} \sqrt{P_0} & \fbox{?} & \cdots & \fbox{?} \\[1mm] \sqrt{P_1} & \fbox{?} & \cdots & \fbox{?} \\[1mm] \vdots & \vdots & \ddots & \vdots\\[1mm] \sqrt{P_{m-1}} & \fbox{?} & \cdots & \fbox{?} \end{pmatrix}

For it to be possible to fill in the blocks marked with a question mark so that U is unitary, it's both necessary and sufficient that the first n columns, which are formed by the blocks √P_0, …, √P_{m−1}, are orthonormal. We can then use the Gram-Schmidt orthogonalization process to fill in the remaining columns, as we've already seen a couple of times in this series.

The first n columns of U can be expressed as vectors in the following way, where c = 0, …, n − 1 refers to the column number starting from 0.

\vert\gamma_c\rangle = \sum_{a = 0}^{m-1} \vert a \rangle \otimes \sqrt{P_a}\, \vert c\rangle

We can compute the inner product between any two of them as follows.

\langle \gamma_c \vert \gamma_d \rangle = \sum_{a,b = 0}^{m-1} \langle a \vert b \rangle \cdot \langle c \vert \sqrt{P_a}\sqrt{P_b}\, \vert d\rangle = \langle c \vert \Biggl(\sum_{a = 0}^{m-1} P_a \Biggr) \vert d\rangle = \langle c \vert d\rangle

This shows that these columns are in fact orthonormal, so we can fill in the remaining columns of U in a way that guarantees the entire matrix is unitary.

It remains to check that the measurement outcome probabilities for the simulation are consistent with the original measurement. For a given initial state ρ of X, the measurement described by the collection {P_0, …, P_{m−1}} results in each outcome a ∈ {0, …, m − 1} with probability Tr(P_a ρ).

To obtain the outcome probabilities for the simulation, let's first give the name σ to the state of (Y, X) after U has been performed. This state can be expressed as follows.

\sigma = U \bigl(\vert 0\rangle \langle 0 \vert \otimes \rho\bigr) U^{\dagger} = \sum_{a,b=0}^{m-1} \vert a\rangle \langle b \vert \otimes \sqrt{P_a}\, \rho \sqrt{P_b}

Equivalently, in a block matrix form, we have the following equation.

\sigma = \begin{pmatrix} \sqrt{P_0} & \fbox{?} & \cdots & \fbox{?} \\[1mm] \sqrt{P_1} & \fbox{?} & \cdots & \fbox{?} \\[1mm] \vdots & \vdots & \ddots & \vdots\\[1mm] \sqrt{P_{m-1}} & \fbox{?} & \cdots & \fbox{?} \end{pmatrix} \begin{pmatrix} \rho & 0 & \cdots & 0 \\[1mm] 0 & 0 & \cdots & 0 \\[1mm] \vdots & \vdots & \ddots & \vdots\\[1mm] 0 & 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} \sqrt{P_0} & \sqrt{P_1} & \cdots & \sqrt{P_{m-1}} \\[1mm] \fbox{?} & \fbox{?} & \cdots & \fbox{?} \\[1mm] \vdots & \vdots & \ddots & \vdots\\[1mm] \fbox{?} & \fbox{?} & \cdots & \fbox{?} \end{pmatrix} = \begin{pmatrix} \sqrt{P_0}\rho\sqrt{P_0} & \cdots & \sqrt{P_0}\rho\sqrt{P_{m-1}} \\[1mm] \vdots & \ddots & \vdots\\[1mm] \sqrt{P_{m-1}}\rho\sqrt{P_0} & \cdots & \sqrt{P_{m-1}}\rho\sqrt{P_{m-1}} \end{pmatrix}

Notice that the entries of U falling into the blocks marked with a question mark have no influence on the outcome by virtue of the fact that we're conjugating a matrix of the form |0⟩⟨0| ⊗ ρ — the question mark entries are always multiplied by zero entries of |0⟩⟨0| ⊗ ρ when the matrix product is computed.

Now we can analyze what happens when a standard basis measurement is performed on Y. The probabilities of the possible outcomes are given by the diagonal entries of the reduced state σ_Y of Y.

\sigma_{\mathsf{Y}} = \sum_{a,b=0}^{m-1} \operatorname{Tr}\Bigl(\sqrt{P_a}\, \rho \sqrt{P_b}\Bigr) \vert a\rangle \langle b \vert

In particular, using the cyclic property of the trace, we see that the probability to obtain a given outcome a ∈ {0, …, m − 1} is as follows.

\langle a \vert \sigma_{\mathsf{Y}} \vert a \rangle = \operatorname{Tr}\Bigl(\sqrt{P_a}\, \rho \sqrt{P_a}\Bigr) = \operatorname{Tr}(P_a \rho)

This matches with the original measurement, establishing the correctness of the simulation.
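The entire construction can be carried out numerically. In the sketch below (the helper names are ours, and the tetrahedral measurement together with the state |+⟩⟨+| is an arbitrary example), the blocks √P_a are stacked to form the first columns of U, the remaining columns are filled in from an orthonormal basis of the complement obtained by a QR decomposition (playing the role of Gram-Schmidt), and the standard basis measurement statistics of Y are checked against Tr(P_a ρ).

```python
import numpy as np

def psd_sqrt(P):
    w, V = np.linalg.eigh(P)
    return V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.conj().T

def naimark_unitary(P):
    """A unitary on (Y, X) whose first block column is sqrt(P_0), ..., sqrt(P_{m-1})."""
    m, n = len(P), P[0].shape[0]
    V = np.vstack([psd_sqrt(Pa) for Pa in P])      # first n columns of U (an nm x n isometry)
    Q, _ = np.linalg.qr(V, mode="complete")        # full orthonormal basis of the nm-dim space
    return np.hstack([V, Q[:, n:]])                # keep V; append a basis of its complement

# Arbitrary example: the tetrahedral measurement applied to the state |+><+|.
omega = np.exp(2j * np.pi / 3)
phi = [np.array([1, 0]), np.array([1, np.sqrt(2)]) / np.sqrt(3),
       np.array([1, np.sqrt(2) * omega]) / np.sqrt(3),
       np.array([1, np.sqrt(2) * omega.conj()]) / np.sqrt(3)]
P = [np.outer(v, v.conj()) / 2 for v in phi]
m, n = 4, 2

U = naimark_unitary(P)
print(np.allclose(U.conj().T @ U, np.eye(m * n)))       # True: U is unitary

rho = np.full((2, 2), 0.5)                              # |+><+|
init = np.zeros((m, m)); init[0, 0] = 1.0               # Y starts in |0><0|
sigma = U @ np.kron(init, rho) @ U.conj().T             # state of (Y, X) after U
sigma_Y = np.trace(sigma.reshape(m, n, m, n), axis1=1, axis2=3)
print(np.round(np.diag(sigma_Y).real, 4))                    # standard basis probabilities on Y
print(np.round([np.trace(Pa @ rho).real for Pa in P], 4))    # Tr(P_a rho): the same numbers
```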

Non-destructive measurements

So far in the lesson, we've concerned ourselves with destructive measurements, where the output consists of the classical measurement result alone and there is no specification of the post-measurement quantum state of the system that was measured. Non-destructive measurements, on the other hand, do provide such a specification. That is, non-destructive measurements describe not only the classical measurement outcome probabilities, but also the state of the system that was measured, conditioned on each possible measurement outcome. Note that the term non-destructive refers to the system being measured, but not necessarily to its state, which could change significantly as a result of the measurement.

In general, for a given destructive measurement, there will be multiple (in fact infinitely many) non-destructive measurements that are compatible with the given destructive measurement, meaning that the classical measurement outcome probabilities match precisely with the destructive measurement. So, there isn't a unique way to define the post-measurement quantum state of a system for a given measurement. It is, in fact, possible to generalize non-destructive measurements even further, so that they produce a classical measurement outcome along with a quantum state output of a system that isn't necessarily the same as the input system.

The notion of a non-destructive measurement is an interesting and useful abstraction. It should, however, be recognized that non-destructive measurements can always be described as compositions of channels and destructive measurements — so there is a sense in which the notion of a destructive measurement is the more fundamental one.

From Naimark's theorem

Consider the simulation of a general measurement like we have in Naimark's theorem. A simple way to obtain a non-destructive measurement from this simulation is revealed by the figure from before, where the system X is not traced out, but is part of the output. This yields both a classical measurement outcome a ∈ {0, …, m − 1} as well as a post-measurement quantum state of X.

Let's describe these states in mathematical terms. We're assuming that the initial state of X is ρ, so that after the initialized system Y is introduced and U is performed, we have that (Y, X) is in the state

\sigma = U \bigl(\vert 0\rangle \langle 0 \vert \otimes \rho\bigr) U^{\dagger} = \sum_{a,b=0}^{m-1} \vert a\rangle \langle b \vert \otimes \sqrt{P_a}\, \rho \sqrt{P_b}.

The probabilities for the different classical outcomes to appear are the same as before — they can't change as a result of us deciding to ignore or not ignore X. That is, we obtain each a ∈ {0, …, m − 1} with probability Tr(P_a ρ).

Conditioned upon having obtained a particular measurement outcome a, the resulting state of X is given by this expression.

\frac{\sqrt{P_a}\, \rho \sqrt{P_a}}{\operatorname{Tr}(P_a \rho)}

One way to see this is to represent a standard basis measurement of Y by the completely dephasing channel Δ_m, where the channel output describes classical measurement outcomes as (diagonal) density matrices. An expression of the state we obtain follows.

\sum_{a,b=0}^{m-1} \Delta_m(\vert a\rangle \langle b \vert) \otimes \sqrt{P_a}\, \rho \sqrt{P_b} = \sum_{a=0}^{m-1} \vert a\rangle \langle a \vert \otimes \sqrt{P_a}\, \rho \sqrt{P_a}.

We can then write this state as a convex combination of product states,

\sum_{a=0}^{m-1} \operatorname{Tr}(P_a \rho)\, \vert a\rangle \langle a \vert \otimes \frac{\sqrt{P_a}\, \rho \sqrt{P_a}}{\operatorname{Tr}(P_a \rho)},

which is consistent with the expression we've obtained for the state of X\mathsf{X} conditioned on each possible measurement outcome.

From a Kraus representation

There are alternative selections for U in the context of Naimark's theorem that produce the same measurement outcome probabilities but give entirely different output states of X.

For instance, one option is to substitute (I_Y ⊗ V) U for U, where V is any unitary operation on X. The application of V to X commutes with the measurement of Y, so the classical outcome probabilities do not change, but now the state of X conditioned on the outcome a becomes

\frac{V \sqrt{P_a}\, \rho \sqrt{P_a}\, V^{\dagger}}{\operatorname{Tr}(P_a \rho)}.

More generally, we could replace U by the unitary matrix

\Biggl(\sum_{a=0}^{m-1} \vert a\rangle\langle a \vert \otimes V_a\Biggr) U

for any choice of unitary operations V_0, …, V_{m−1} on X. Again, the classical outcome probabilities are unchanged, but now the state of X conditioned on the outcome a becomes

\frac{V_a \sqrt{P_a}\, \rho \sqrt{P_a}\, V_a^{\dagger}}{\operatorname{Tr}(P_a \rho)}.

An equivalent way to express this freedom is connected with Kraus representations. That is, we can describe an m-outcome non-destructive measurement of a system having n classical states by a selection of n × n Kraus matrices A_0, …, A_{m−1} satisfying the typical condition for Kraus matrices.

\sum_{a = 0}^{m-1} A_a^{\dagger} A_a = \mathbb{I}_{\mathsf{X}} \tag{3}

Assuming that the initial state of X is ρ, the classical measurement outcome is a with probability

\operatorname{Tr}\bigl(A_a \rho A_a^{\dagger}\bigr) = \operatorname{Tr}\bigl(A_a^{\dagger} A_a \rho \bigr)

and conditioned upon the outcome being a, the state of X becomes

\frac{A_a \rho A_a^{\dagger}}{\operatorname{Tr}(A_a^{\dagger}A_a \rho)}.

Note that this is equivalent to choosing the unitary operation U in Naimark's theorem as follows.

U = \begin{pmatrix} A_{0} & \fbox{?} & \cdots & \fbox{?} \\[1mm] A_{1} & \fbox{?} & \cdots & \fbox{?} \\[1mm] \vdots & \vdots & \ddots & \vdots\\[1mm] A_{m-1} & \fbox{?} & \cdots & \fbox{?} \end{pmatrix}

In the previous lesson we observed that the columns formed by the blocks A_0, …, A_{m−1} are necessarily orthonormal, by virtue of the condition (3).
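Here is a small NumPy sketch of this Kraus-matrix description (the helper name and the standard basis example are arbitrary choices of ours): it checks condition (3), samples an outcome with probability Tr(A_a ρ A_a†), and returns the corresponding post-measurement state.

```python
import numpy as np

def nondestructive_measure(kraus, rho, rng):
    """Sample an outcome a with probability Tr(A_a rho A_a^dagger); return (a, post-state)."""
    probs = np.array([np.trace(A @ rho @ A.conj().T).real for A in kraus])
    a = rng.choice(len(kraus), p=probs / probs.sum())
    post = kraus[a] @ rho @ kraus[a].conj().T
    return a, post / np.trace(post)

# Arbitrary example: A_0 = |0><0| and A_1 = |1><1| (a standard basis measurement that leaves
# the qubit in |0><0| or |1><1|), applied to the input state |+><+|.
kraus = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
rho = np.full((2, 2), 0.5)
print(np.allclose(sum(A.conj().T @ A for A in kraus), np.eye(2)))   # condition (3) holds

rng = np.random.default_rng(1)
outcome, post = nondestructive_measure(kraus, rho, rng)
print(outcome)   # 0 or 1, each with probability 1/2
print(post)      # the corresponding post-measurement state of X
```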

Generalizations

There are even more general ways to formulate non-destructive measurements than the ways we've discussed. The notion of a quantum instrument (which won't be described here) represents one way to do this.

Quantum state discrimination and tomography

In the last part of the lesson, we'll briefly consider two tasks associated with measurements: quantum state discrimination and quantum state tomography.

  1. Quantum state discrimination

    For quantum state discrimination, we have a known collection of quantum states ρ_0, …, ρ_{m−1}, along with probabilities p_0, …, p_{m−1} associated with these states. (A succinct way of expressing this is to say that we have an ensemble {(p_0, ρ_0), …, (p_{m−1}, ρ_{m−1})} of quantum states.) A number a ∈ {0, …, m − 1} is chosen randomly according to the probabilities (p_0, …, p_{m−1}) and the system X is prepared in the state ρ_a. The goal is to determine, by means of a measurement of X alone, which value of a was chosen.

    Thus, we have a finite number of alternatives, along with a prior — which is our knowledge of the probability for each a to be selected — and the goal is to determine which alternative actually happened. This may be easy for some choices of states and probabilities, and for others, it may not be possible without some chance of making an error.

  2. Quantum state tomography

    For quantum state tomography, we have an unknown quantum state of a system — so, unlike in quantum state discrimination, there's typically no prior or any information about possible alternatives. This time, however, it's not a single copy of the state that's made available, but rather many independent copies. That is, N identical systems X_1, …, X_N are each independently prepared in the state ρ for some (possibly large) number N. The goal is to find an approximation of the unknown state, as a density matrix, by measuring the systems.

Discriminating between two states

The simplest case for quantum state discrimination is that there are two states to be discriminated: ρ_0 and ρ_1.

Imagine a situation in which a bit a is chosen randomly: a = 0 with probability p ∈ [0, 1] and a = 1 with probability 1 − p. A system X is prepared in the state ρ_a, meaning ρ_0 or ρ_1 depending on the value of a, and given to us. It is our goal to correctly guess the value of a by means of a measurement on X. To be precise, we shall aim to maximize the probability that our guess is correct.

An optimal measurement

An optimal way to solve this problem begins with a spectral decomposition of a weighted difference between ρ_0 and ρ_1, where the weights are the corresponding probabilities.

p \rho_0 - (1-p) \rho_1 = \sum_{k = 0}^{n-1} \lambda_k \vert \psi_k \rangle \langle \psi_k \vert

Notice that we have a minus sign rather than a plus sign in this expression: this is a weighted difference not a weighted sum.

We can maximize the probability of a correct guess by selecting a projective measurement {Π_0, Π_1} as follows. First let's partition the elements of {0, …, n − 1} into two disjoint sets S_0 and S_1 depending upon whether the corresponding eigenvalue of the weighted difference is nonnegative or negative.

\begin{gathered} S_0 = \{k\in\{0,\ldots,n-1\} : \lambda_k \geq 0 \}\\[2mm] S_1 = \{k\in\{0,\ldots,n-1\} : \lambda_k < 0 \} \end{gathered}

We can then choose a projective measurement as follows.

\Pi_0 = \sum_{k \in S_0} \vert \psi_k \rangle \langle \psi_k \vert \quad\text{and}\quad \Pi_1 = \sum_{k \in S_1} \vert \psi_k \rangle \langle \psi_k \vert

(It doesn't actually matter in which set, S_0 or S_1, we include the values of k for which λ_k = 0. Here we're choosing arbitrarily to include these values in S_0.)

This measurement is optimal for the situation at hand, in the sense that it minimizes the probability of an incorrect determination of the selected state.

Correctness probability

Now we will determine the probability of correctness for the measurement {Π_0, Π_1}.

To begin, we don't really need to be concerned with the specific choice we've made for Π_0 and Π_1, though it may be helpful to keep it in mind. For any measurement {P_0, P_1} (not necessarily projective), we can write the correctness probability as follows.

p \operatorname{Tr}(P_0 \rho_0) + (1 - p) \operatorname{Tr}(P_1 \rho_1)

Using the fact that {P_0, P_1} is a measurement, so P_1 = I − P_0, we can rewrite this expression as follows.

\begin{gathered} p \operatorname{Tr}(P_0 \rho_0) + (1 - p) \operatorname{Tr}((\mathbb{I} - P_0) \rho_1) = p \operatorname{Tr}(P_0 \rho_0) - (1 - p) \operatorname{Tr}(P_0 \rho_1) + (1-p) \operatorname{Tr}(\rho_1)\\[1mm] = \operatorname{Tr}\bigl( P_0 (p \rho_0 - (1-p)\rho_1) \bigr) + 1 - p \end{gathered}

On the other hand, we could have made the substitution P_0 = I − P_1 instead. That wouldn't change the value, but it does give us an alternative expression.

\begin{gathered} p \operatorname{Tr}((\mathbb{I} - P_1) \rho_0) + (1 - p) \operatorname{Tr}(P_1 \rho_1) = p \operatorname{Tr}(\rho_0) - p \operatorname{Tr}(P_1 \rho_0) + (1 - p) \operatorname{Tr}(P_1 \rho_1)\\[1mm] = p - \operatorname{Tr}\bigl( P_1 (p \rho_0 - (1-p)\rho_1) \bigr) \end{gathered}

The two expressions have the same value, so we can average them to give yet another expression for this value. (Averaging the two expressions is just a trick to simplify the resulting expression.)

\begin{gathered} \frac{1}{2} \bigl(\operatorname{Tr}\bigl( P_0 (p \rho_0 - (1-p)\rho_1) \bigr) + 1-p\bigr) + \frac{1}{2} \bigl(p - \operatorname{Tr}\bigl( P_1 (p \rho_0 - (1-p)\rho_1) \bigr)\bigr)\\ = \frac{1}{2} \operatorname{Tr}\bigl( (P_0-P_1) (p \rho_0 - (1-p)\rho_1)\bigr) + \frac{1}{2} \end{gathered}

Now we can see why it makes sense to choose the projections Π_0 and Π_1 (as specified above) for P_0 and P_1, respectively — because that's how we can make the trace in the final expression as large as possible. In particular,

(\Pi_0-\Pi_1) (p \rho_0 - (1-p)\rho_1) = \sum_{k = 0}^{n-1} \vert\lambda_k\vert \cdot \vert \psi_k \rangle \langle \psi_k \vert.

So, when we take the trace, we obtain the sum of the absolute values of the eigenvalues — which is equal to what's known as the trace norm of the weighted difference.

\operatorname{Tr}\bigl( (\Pi_0-\Pi_1) (p \rho_0 - (1-p)\rho_1)\bigr) = \sum_{k = 0}^{n-1} \vert\lambda_k\vert = \bigl\| p \rho_0 - (1-p)\rho_1 \bigr\|_1

Thus, the probability that the measurement {Π_0, Π_1} leads to a correct discrimination of ρ_0 and ρ_1, given with probabilities p and 1 − p, respectively, is as follows.

\frac{1}{2} + \frac{1}{2} \bigl\| p \rho_0 - (1-p)\rho_1 \bigr\|_1

The fact that this is the optimal probability for a correct discrimination of ρ_0 and ρ_1, given with probabilities p and 1 − p, is commonly referred to as the Helstrom-Holevo theorem (or sometimes just Helstrom's theorem).
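Both the optimal measurement and the bound are simple to compute numerically. In the sketch below, the states |0⟩⟨0| and |+⟩⟨+|, each given with probability 1/2, are an arbitrary example of ours; the code builds {Π_0, Π_1} from the spectral decomposition of the weighted difference and confirms that its success probability equals 1/2 + (1/2)‖p ρ_0 − (1−p) ρ_1‖_1.

```python
import numpy as np

def helstrom_measurement(rho0, rho1, p):
    """The projective measurement {Pi_0, Pi_1} built from p*rho0 - (1-p)*rho1."""
    w, V = np.linalg.eigh(p * rho0 - (1 - p) * rho1)
    Pi0 = sum(np.outer(V[:, k], V[:, k].conj()) for k in range(len(w)) if w[k] >= 0)
    return Pi0, np.eye(len(w)) - Pi0

# Arbitrary example: discriminate |0><0| from |+><+|, each given with probability 1/2.
rho0 = np.diag([1.0, 0.0])
rho1 = np.full((2, 2), 0.5)
p = 0.5
Pi0, Pi1 = helstrom_measurement(rho0, rho1, p)

success = p * np.trace(Pi0 @ rho0).real + (1 - p) * np.trace(Pi1 @ rho1).real
trace_norm = np.abs(np.linalg.eigvalsh(p * rho0 - (1 - p) * rho1)).sum()
print(success, 0.5 + 0.5 * trace_norm)   # both equal (2 + sqrt(2)) / 4 = 0.8535...
```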

Discriminating three or more states

For quantum state discrimination when there are three or more states, there is no known closed-form solution for an optimal measurement, although it is possible to formulate the problem as a semidefinite program — which allows for efficient numerical approximations of optimal measurements with the help of a computer.

It is also possible to verify (or falsify) optimality of a given measurement in a state discrimination task through a condition known as the Holevo-Yuen-Kennedy-Lax condition. In particular, for the state discrimination task defined by the ensemble {(p_0, ρ_0), …, (p_{m−1}, ρ_{m−1})}, the measurement {P_0, …, P_{m−1}} is optimal if and only if the matrix

Q_a = \sum_{b = 0}^{m-1} p_b \rho_b P_b - p_a \rho_a

is positive semidefinite for every a ∈ {0, …, m − 1}.

For example, consider the quantum state discrimination task in which one of the four tetrahedral states |φ_0⟩, …, |φ_3⟩ is selected uniformly at random. The tetrahedral measurement {P_0, P_1, P_2, P_3} succeeds with probability

\frac{1}{4} \operatorname{Tr}(P_0 \vert\phi_0\rangle\langle \phi_0 \vert) + \frac{1}{4} \operatorname{Tr}(P_1 \vert\phi_1\rangle\langle \phi_1 \vert) + \frac{1}{4} \operatorname{Tr}(P_2 \vert\phi_2\rangle\langle \phi_2 \vert) + \frac{1}{4} \operatorname{Tr}(P_3 \vert\phi_3\rangle\langle \phi_3 \vert) = \frac{1}{2}.

This is optimal by the Holevo-Yuen-Kennedy-Lax condition, as a calculation reveals that Q_a = (1/4)(I − |φ_a⟩⟨φ_a|) ≥ 0 for a = 0, 1, 2, 3.
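This calculation is easy to reproduce numerically, as in the following illustrative sketch, which computes the success probability of the tetrahedral measurement and checks that each Q_a is positive semidefinite.

```python
import numpy as np

# The tetrahedral states and measurement, each state given with probability 1/4.
omega = np.exp(2j * np.pi / 3)
phi = [np.array([1, 0]), np.array([1, np.sqrt(2)]) / np.sqrt(3),
       np.array([1, np.sqrt(2) * omega]) / np.sqrt(3),
       np.array([1, np.sqrt(2) * omega.conj()]) / np.sqrt(3)]
rhos = [np.outer(v, v.conj()) for v in phi]
P = [r / 2 for r in rhos]
p = [1 / 4] * 4

# Success probability of the tetrahedral measurement for this ensemble.
print(sum(pa * np.trace(Pa @ ra).real for pa, Pa, ra in zip(p, P, rhos)))   # 0.5

# Holevo-Yuen-Kennedy-Lax: each Q_a = sum_b p_b rho_b P_b - p_a rho_a must be PSD.
M = sum(pb * rb @ Pb for pb, rb, Pb in zip(p, rhos, P))
for a in range(4):
    Qa = M - p[a] * rhos[a]
    print(np.round(np.linalg.eigvalsh(Qa), 6))   # eigenvalues 0 and 0.25, so Q_a >= 0
```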

Quantum state tomography

Finally, we'll briefly discuss the problem of quantum state tomography. For this problem, we're given a large number N of independent copies of an unknown quantum state ρ, and the goal is to reconstruct an approximation ρ̃ of ρ. To be clear, this means that we wish to find a classical description of a density matrix ρ̃ that is as close as possible to ρ.

We can alternatively describe the set-up in the following way. An unknown density matrix ρ is selected, and we're given access to N quantum systems X_1, …, X_N, each of which has been independently prepared in the state ρ. Thus, the state of the compound system (X_1, …, X_N) is

\rho^{\otimes N} = \rho \otimes \rho \otimes \cdots \otimes \rho \quad \text{($N$ times)}

The goal is to perform measurements on the systems X_1, …, X_N and, based on the outcomes of those measurements, to compute a density matrix ρ̃ that closely approximates ρ. This turns out to be a fascinating problem and there is ongoing research on it.

Different types of strategies for approaching the problem may be considered. For example, we can imagine a strategy where each of the systems X_1, …, X_N is measured separately, in turn, producing a sequence of measurement outcomes. Different specific choices for which measurements are performed can be made, including adaptive and non-adaptive selections. In other words, the choice of what measurement is performed on a particular system might or might not depend on the outcomes of prior measurements. Based on the sequence of measurement outcomes, a guess ρ̃ for the state ρ is derived — and again there are different methodologies for doing this.

An alternative approach is to perform a single joint measurement of the entire collection, where we think about (X_1, …, X_N) as a single system and select a single measurement whose output is a guess ρ̃ for the state ρ. This can lead to an improved estimate over what is possible for separate measurements of the individual systems, although a joint measurement on all of the systems together is likely to be much more difficult to implement.

Qubit tomography using Pauli measurements

We'll now consider quantum state tomography in the simple case where ρ\rho is a qubit density matrix. We assume that we're given qubits X1,,XN\mathsf{X}_1,\ldots,\mathsf{X}_N that are each independently in the state ρ,\rho, and our goal is to compute an approximation ρ~\tilde{\rho} that is close to ρ.\rho.

Our strategy will be to divide the NN qubits X1,,XN\mathsf{X}_1,\ldots,\mathsf{X}_N into three roughly equal-size collections, one for each of the three Pauli matrices σx,\sigma_x, σy,\sigma_y, and σz.\sigma_z. Each qubit is then measured independently as follows.

  1. For each of the qubits in the collection associated with σx\sigma_x we perform a σx\sigma_x measurement. This means that the qubit is measured with respect to the basis {+,},\{\vert + \rangle, \vert -\rangle\}, which is an orthonormal basis of eigenvectors of σx,\sigma_x, and the corresponding measurement outcomes are the eigenvalues associated with the two eigenvectors: +1+1 for the state +\vert + \rangle and 1-1 for the state .\vert -\rangle. By averaging together the outcomes over all of the states in the collection associated with σx,\sigma_x, we obtain an approximation of the expectation value

    +ρ+ρ=Tr(σxρ).\langle + \vert \rho \vert + \rangle - \langle - \vert \rho \vert - \rangle = \operatorname{Tr}(\sigma_x \rho).
  2. For each of the qubits in the collection associated with σy\sigma_y we perform a σy\sigma_y measurement. Such a measurement is similar to a σx\sigma_x measurement, except that the measurement basis is { ⁣+ ⁣i, ⁣ ⁣i},\{\vert\! +\!i \rangle, \vert\! -\!i \rangle\}, the eigenvectors of σy.\sigma_y. Averaging the outcomes over all of the states in the collection associated with σy,\sigma_y, we obtain an approximation of the expectation value

    +iρ ⁣+ ⁣iiρ ⁣ ⁣i=Tr(σyρ).\langle +i \vert \rho \vert \!+\!i \rangle - \langle -i \vert \rho \vert \!-\!i \rangle = \operatorname{Tr}(\sigma_y \rho).
  3. For each of the qubits in the collection associated with σz\sigma_z we perform a σz\sigma_z measurement. This time the measurement basis is the standard basis {0,1},\{\vert 0\rangle, \vert 1 \rangle\}, the eigenvectors of σz.\sigma_z. Averaging the outcomes over all of the states in the collection associated with σz,\sigma_z, we obtain an approximation of the expectation value

    0ρ01ρ1=Tr(σzρ).\langle 0 \vert \rho \vert 0 \rangle - \langle 1 \vert \rho \vert 1 \rangle = \operatorname{Tr}(\sigma_z \rho).

Once we have obtained approximations αxTr(σxρ),\alpha_x \approx \operatorname{Tr}(\sigma_x \rho), αyTr(σyρ),\alpha_y \approx \operatorname{Tr}(\sigma_y \rho), and αzTr(σzρ)\alpha_z \approx \operatorname{Tr}(\sigma_z \rho) by averaging the measurement outcomes for each collection, we can approximate ρ\rho as

ρ~=I+αxσx+αyσy+αzσz2I+Tr(σxρ)σx+Tr(σyρ)σy+Tr(σzρ)σz2=ρ.\tilde{\rho} = \frac{\mathbb{I} + \alpha_x \sigma_x + \alpha_y \sigma_y + \alpha_z \sigma_z}{2} \approx \frac{\mathbb{I} + \operatorname{Tr}(\sigma_x \rho) \sigma_x + \operatorname{Tr}(\sigma_y \rho) \sigma_y + \operatorname{Tr}(\sigma_z \rho) \sigma_z}{2} = \rho.
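As a concrete illustration, here is a small simulation of this procedure in Python with NumPy. The state rho below is a hypothetical example chosen only for illustration, and the helper sample_pauli is our own; the sketch estimates the three expectation values, each from its own third of the NN samples, and then assembles ρ~\tilde{\rho} as above.

```python
import numpy as np

rng = np.random.default_rng(7)

# Pauli matrices.
I2 = np.eye(2)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# A hypothetical "unknown" single-qubit state, chosen only for illustration.
rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])

def sample_pauli(rho, pauli, shots):
    """Simulate independent measurements of a Pauli observable on 'shots' copies
    of rho and return the average of the +1 / -1 outcomes."""
    p_plus = np.real(np.trace((I2 + pauli) @ rho)) / 2   # probability of outcome +1
    outcomes = rng.choice([1, -1], size=shots, p=[p_plus, 1 - p_plus])
    return outcomes.mean()

N = 30_000
alpha_x = sample_pauli(rho, sx, N // 3)
alpha_y = sample_pauli(rho, sy, N // 3)
alpha_z = sample_pauli(rho, sz, N // 3)

# Reconstruct the approximation from the estimated expectation values.
rho_tilde = (I2 + alpha_x * sx + alpha_y * sy + alpha_z * sz) / 2
print(np.round(rho_tilde, 3))
```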

In the limit as NN approaches infinity, this approximation converges in probability to the true density matrix ρ\rho by the Law of Large Numbers, and well-known statistical bounds (such as Hoeffding's inequality) can be used to bound the probability that the approximation ρ~\tilde{\rho} deviates from ρ\rho by varying amounts.

An important thing to recognize, however, is that the matrix ρ~\tilde{\rho} obtained in this way may fail to be a density matrix. In particular, although it will always have trace equal to 1,1, it may fail to be positive semidefinite. There are different known strategies for "rounding" such an approximation ρ~\tilde{\rho} to a density matrix, one of them being to compute a spectral decomposition, replace any negative eigenvalues with 0,0, and then renormalize (by dividing the matrix we obtain by its trace).
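Here is a minimal sketch of that rounding strategy, again in Python with NumPy; the function name round_to_density_matrix is our own.

```python
import numpy as np

def round_to_density_matrix(M):
    """Round a Hermitian, trace-one matrix to a nearby density matrix by
    clipping negative eigenvalues to zero and renormalizing the trace."""
    vals, vecs = np.linalg.eigh(M)            # spectral decomposition
    vals = np.clip(vals, 0, None)             # replace negative eigenvalues with 0
    rounded = (vecs * vals) @ vecs.conj().T   # reassemble the matrix
    return rounded / np.trace(rounded)        # renormalize to unit trace

# Example: a Hermitian matrix with trace one that is not positive semidefinite.
M = np.array([[1.1, 0.3], [0.3, -0.1]])
print(round_to_density_matrix(M))
```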

Qubit tomography using the tetrahedral measurement

Another option for performing qubit tomography is to measure every qubit X1,,XN\mathsf{X}_1,\ldots,\mathsf{X}_N using the tetrahedral measurement {P0,P1,P2,P3}\{P_0,P_1,P_2,P_3\} described earlier. That is,

P0=ϕ0ϕ02,P1=ϕ1ϕ12,P2=ϕ2ϕ22,P3=ϕ3ϕ32P_0 = \frac{\vert \phi_0 \rangle \langle \phi_0 \vert}{2}, \quad P_1 = \frac{\vert \phi_1 \rangle \langle \phi_1 \vert}{2}, \quad P_2 = \frac{\vert \phi_2 \rangle \langle \phi_2 \vert}{2}, \quad P_3 = \frac{\vert \phi_3 \rangle \langle \phi_3 \vert}{2}

for

ϕ0=0ϕ1=130+231ϕ2=130+23e2πi/31ϕ3=130+23e2πi/31.\begin{aligned} \vert \phi_0 \rangle & = \vert 0 \rangle\\ \vert \phi_1 \rangle & = \frac{1}{\sqrt{3}} \vert 0 \rangle + \sqrt{\frac{2}{3}} \vert 1 \rangle\\ \vert \phi_2 \rangle & = \frac{1}{\sqrt{3}} \vert 0 \rangle + \sqrt{\frac{2}{3}} e^{2\pi i/3} \vert 1 \rangle\\ \vert \phi_3 \rangle & = \frac{1}{\sqrt{3}} \vert 0 \rangle + \sqrt{\frac{2}{3}} e^{-2\pi i/3} \vert 1 \rangle. \end{aligned}

Each outcome is obtained some number of times, which we will denote by nan_a for each a{0,1,2,3},a\in\{0,1,2,3\}, so that n0+n1+n2+n3=N.n_0 + n_1 + n_2 + n_3 = N. The ratio of each of these numbers to NN provides an estimate of the probability associated with the corresponding outcome:

naNTr(Paρ).\frac{n_a}{N} \approx \operatorname{Tr}(P_a \rho).

Finally, we shall make use of the following remarkable formula:

ρ=a=03(3Tr(Paρ)12)ϕaϕa.\rho = \sum_{a=0}^3 \Bigl( 3 \operatorname{Tr}(P_a \rho) - \frac{1}{2}\Bigr) \vert \phi_a \rangle \langle \phi_a \vert.

To establish this formula, we can use the following equation for the absolute values squared of inner products of tetrahedral states, which can be checked through direct calculations.

ϕaϕb2={1a=b13ab.\bigl\vert \langle \phi_a \vert \phi_b \rangle \bigr\vert^2 = \begin{cases} 1 & a=b\\ \frac{1}{3} & a\neq b. \end{cases}

Now, the four matrices

ϕ0ϕ0=(1000)ϕ1ϕ1=(13232323)ϕ2ϕ2=(1323e2πi/323e2πi/323)ϕ3ϕ3=(1323e2πi/323e2πi/323)\begin{aligned} \vert\phi_0\rangle \langle \phi_0 \vert & = \begin{pmatrix} 1 & 0\\[2mm] 0 & 0\end{pmatrix}\\[2mm] \vert\phi_1\rangle \langle \phi_1 \vert & = \begin{pmatrix} \frac{1}{3} & \frac{\sqrt{2}}{3}\\[2mm] \frac{\sqrt{2}}{3} & \frac{2}{3}\end{pmatrix}\\[2mm] \vert\phi_2\rangle \langle \phi_2 \vert & = \begin{pmatrix} \frac{1}{3} & \frac{\sqrt{2}}{3}e^{-2\pi i/3}\\[2mm] \frac{\sqrt{2}}{3}e^{2\pi i/3} & \frac{2}{3}\end{pmatrix}\\[2mm] \vert\phi_3\rangle \langle \phi_3 \vert & = \begin{pmatrix} \frac{1}{3} & \frac{\sqrt{2}}{3}e^{2\pi i/3}\\[2mm] \frac{\sqrt{2}}{3}e^{-2\pi i/3} & \frac{2}{3}\end{pmatrix} \end{aligned}

are linearly independent, and therefore span the space of 2×22\times 2 matrices. Because both sides of the formula are linear in ρ\rho (provided we write the constant 1/21/2 as Tr(ρ)/2,\operatorname{Tr}(\rho)/2, which makes no difference when ρ\rho is a density matrix), it suffices to prove that the formula is true when ρ=ϕbϕb\rho = \vert\phi_b\rangle\langle\phi_b\vert for b=0,1,2,3.b = 0,1,2,3. In particular,

3Tr(Paϕbϕb)12=32ϕaϕb212={1a=b0ab3 \operatorname{Tr}(P_a \vert\phi_b\rangle\langle\phi_b\vert) - \frac{1}{2} = \frac{3}{2} \vert \langle \phi_a \vert \phi_b \rangle \vert^2 - \frac{1}{2} = \begin{cases} 1 & a=b\\ 0 & a\neq b \end{cases}

and therefore

a=03(3Tr(Paϕbϕb)Tr(ϕbϕb)2)ϕaϕa=ϕbϕb.\sum_{a=0}^3 \biggl( 3 \operatorname{Tr}(P_a \vert\phi_b\rangle\langle\phi_b\vert) - \frac{\operatorname{Tr}(\vert\phi_b\rangle\langle\phi_b\vert)}{2}\biggr) \vert \phi_a \rangle \langle \phi_a \vert = \vert \phi_b\rangle\langle \phi_b \vert.
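Although the proof above is self-contained, it may be reassuring to verify the reconstruction formula numerically on a randomly generated density matrix. The following sketch (with our own variable names, chosen for illustration) does exactly that:

```python
import numpy as np

rng = np.random.default_rng(11)

# Tetrahedral states and the measurement operators P_a = |phi_a><phi_a| / 2.
w = np.exp(2j * np.pi / 3)
phis = [
    np.array([1, 0], dtype=complex),
    np.array([1 / np.sqrt(3), np.sqrt(2 / 3)], dtype=complex),
    np.array([1 / np.sqrt(3), np.sqrt(2 / 3) * w], dtype=complex),
    np.array([1 / np.sqrt(3), np.sqrt(2 / 3) * w.conjugate()], dtype=complex),
]
projs = [np.outer(v, v.conj()) for v in phis]
Ps = [pr / 2 for pr in projs]

# A random single-qubit density matrix.
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = A @ A.conj().T
rho = rho / np.trace(rho)

# rho = sum over a of (3 Tr(P_a rho) - 1/2) |phi_a><phi_a|.
reconstructed = sum(
    (3 * np.real(np.trace(P @ rho)) - 0.5) * pr for P, pr in zip(Ps, projs)
)
assert np.allclose(reconstructed, rho)
print("The reconstruction formula holds for this randomly chosen state.")
```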

Substituting the estimates naN\frac{n_a}{N} for the probabilities Tr(Paρ)\operatorname{Tr}(P_a \rho) in this formula, we arrive at an approximation of ρ:\rho:

ρ~=a=03(3naN12)ϕaϕa.\tilde{\rho} = \sum_{a=0}^3 \Bigl( \frac{3 n_a}{N} - \frac{1}{2}\Bigr) \vert \phi_a \rangle \langle \phi_a \vert.

This approximation will always be a Hermitian matrix with trace equal to one, but it may fail to be positive semidefinite. In this case, the approximation can be "rounded" to a density matrix in the same way as the approximation obtained from Pauli measurements.
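To close the loop, here is a sketch of the complete estimation procedure, simulating the tetrahedral measurement on NN copies of a hypothetical state and applying the reconstruction formula to the observed frequencies. As before, the state rho and the variable names are our own choices for illustration, and the rounding step described above can be applied to the result if needed.

```python
import numpy as np

rng = np.random.default_rng(3)

# Tetrahedral states and the measurement operators P_a = |phi_a><phi_a| / 2.
w = np.exp(2j * np.pi / 3)
phis = [
    np.array([1, 0], dtype=complex),
    np.array([1 / np.sqrt(3), np.sqrt(2 / 3)], dtype=complex),
    np.array([1 / np.sqrt(3), np.sqrt(2 / 3) * w], dtype=complex),
    np.array([1 / np.sqrt(3), np.sqrt(2 / 3) * w.conjugate()], dtype=complex),
]
projs = [np.outer(v, v.conj()) for v in phis]
Ps = [pr / 2 for pr in projs]

# A hypothetical "unknown" state, chosen only for illustration.
rho = np.array([[0.6, 0.25j], [-0.25j, 0.4]])

# Simulate N tetrahedral measurements and count how often each outcome occurs.
N = 100_000
probs = [np.real(np.trace(P @ rho)) for P in Ps]
counts = rng.multinomial(N, probs)                # n_0, n_1, n_2, n_3

# Plug the empirical frequencies n_a / N into the reconstruction formula.
rho_tilde = sum((3 * n / N - 0.5) * pr for n, pr in zip(counts, projs))
print(np.round(rho_tilde, 3))
```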
