Abstract arXiv:1811.10599v7 [quant-ph] 8 Jun 2020 · 2020. 6. 9. · arXiv:1811.10599v7 [quant-ph] 8 Jun 2020 Divergenceradii and thestrong converseexponent of classical-quantum channel

arX

iv:1

811.

1059

9v7

[qu

ant-

ph]

8 J

un 2

020

Divergence radii and the strong converse exponent of

classical-quantum channel coding with constant compositions

Milan Mosonyi1, 2, ∗ and Tomohiro Ogawa3, †

1MTA-BME Lendulet Quantum Information Theory Research Group2Mathematical Institute, Budapest University of Technology and Economics,

Egry Jozsef u 1., Budapest, 1111 Hungary.3Graduate School of Informatics and Engineering, University of Electro-Communications,

1-5-1 Chofugaoka, Chofu-shi, Tokyo, 182-8585, Japan.

Abstract

There are different inequivalent ways to define the Renyi capacity of a channel for a fixed in-put distribution P . In [IEEE Transactions on Information Theory, 41(1):26-34, 1995], Csiszar hasshown that for classical discrete memoryless channels there is a distinguished such quantity that hasan operational interpretation as a generalized cutoff rate for constant composition channel coding.We show that the analogous notion of Renyi capacity, defined in terms of the sandwiched quan-tum Renyi divergences, has the same operational interpretation in the strong converse problem ofconstant composition classical-quantum channel coding. Denoting the constant composition strongconverse exponent for a memoryless classical-quantum channel W with composition P and rate R

as sc(W,R,P ), our main result is that

sc(W,R,P ) = supα>1

α− 1

α[R − χ

∗α(W,P )] ,

where χ∗α(W,P ) is the P -weighted sandwiched Renyi divergence radius of the image of the channel.

I. INTRODUCTION

A classical-quantum channel W : X → S(H) models a device with a set of possible inputsX , which, on an input x ∈ X , outputs a quantum system with finite-dimensional Hilbert spaceH in state W (x). The channel is called classical if all output states commute, and hence canbe represented as probability distributions on some set Y; the interpretation of this is thatthe channel outputs the symbol y ∈ Y with probability (W (x))(y). We will only deal withi.i.d. (independent and identically distributed) channels, meaning that when the channel Wis used sequentially, it emits the state W (x) = W (x1) ⊗ . . . ⊗ W (xn) on an input sequencex = (x1, . . . , xn).A channel can be used for information transmission by suitable encoding at the sender’s, and

decoding at the receiver’s side, and it is one of the central problems in information theory to devisecodes that allow the transmission of a large amount of information with a small probability oferror, and to quantify the ultimate efficiency achievable by any code. The latter task can be bestdone in an asymptotic setting, where the channel is allowed to be used arbitrarily many times,and the aim is to find the best achievable error asymptotics for a given rate R of the amount oftransmittable information per channel use. The fundamental results of Shannon [64] for classical,and of Holevo [39] and Schumacher and Westmoreland [62] for classical-quantum channels, showthat there exists a critical rate C(W ), called the Shannon-, resp. Holevo capacity of the channel,below which reliable information transmission is possible in the sense of asymptotically vanishingerror probability, and above which it is not. Moreover, as the results of [14, 30, 52, 56, 72] show,the error probability for an optimal sequence of codes goes to zero exponentially fast for anyrate below C(W ), and it goes to one exponentially fast for any rate above C(W ). The optimalachievable exponents for a fixed coding rate R are called the direct exponent d(W,R) for ratesbelow, and the strong converse exponent sc(W,R) for rates above C(W ). The usefulness of thechannel for information transmission is completely characterized once these exponents are givenby some essentially computable expression for every possible rate R.

∗ [email protected]† [email protected]

http://arxiv.org/abs/1811.10599v7

mailto:[email protected]

mailto:[email protected]

2

This problem is famously open for the direct exponent and low rates even for classical channels.The strong converse exponent, on the other hand, has been given for every rate R in [24] forclassical, and in [50] for classical-quantum channels (see also [19, 20] for the classical case). Itcan be expressed as

sc(W,R) = supα>1

α− 1

α[R− χ∗

α(W )] , (I.1)

where χ∗α(W ) = χD∗

α(W ) is the sandwiched Renyi α-capacity of W , defined as

χ∗α(W ) := sup

P∈Pf (X )

χ∗α(W,P ) = sup

P∈Pf (X )

I∗α(W,P ). (I.2)

Here, Pf (X ) is the set of finitely supported probability distributions on X , and χ∗α(W,P ) and

I∗α(W,P ) are the P -weighted sandwiched Renyi divergence radius and the Renyi mutual infor-mation of the channel, respectively, the quantities of main interest in our paper.These notions may be defined more generally by choosing any quantum divergence ∆, i.e.,

some sort of generalized distance of quantum states, as

I∆(W,P ) := infσ∈S(H)

∆(W(P )‖P ⊗ σ) , (I.3)

(∆-mutual information), and

χ∆(W,P ) := infσ∈S(H)

∑

x∈X

P (x)∆(W (x)‖σ), (I.4)

(P -weighted ∆-radius), where P is a fixed, finitely supported probability distribution on theinput, and W(P ) =

∑x∈X P (x) |x〉〈x| ⊗W (x) is the joint state of the classical input and the

quantum output. The sandwiched Renyi mutual information and P -weighted radius are obtainedby choosing ∆ to be a sandwiched Renyi divergenceD∗

α [51, 71]. In the case whereW is a classicalchannel, and ∆ is a Renyi divergence, these quantities were studied by Sibson [66] and Augustin[9], respectively; see [19], and the recent works [16, 53, 54] for more references on their historyand their applications.The mutual information and the weighted radius provide quantifications of the information

transmission capacity of a channel from different perspectives. Indeed, the mutual informationmeasures the ∆-distance of the joint input-output state of the channel from the set of uncorrelatedstates, with the first marginal fixed. This can be interpreted as a measure of the maximal amountof correlation that can be created between the input and the output of the channel with a fixedinput distribution. The idea is that the more correlated the input and the output can be made,the more useful the channel is for information transmission. The weighted radius is geometricallymotivated, and the idea behind it is that the farther away some states are in ∆-distance (weightedby the input distribution P ), the more distinguishable they are, and the information transmissioncapacity of the channel is related to the number of far away states among the output states ofthe channel.The expression in (I.1) provides an operational interpretation for the sandwiched Renyi α-

capacities with α > 1, and singles them out as the operationally relevant quantifiers of the infor-mation transmission capacity of classical-quantum channels, from among various other quantities(e.g., Renyi capacities corresponding to other notions of quantum Renyi divergences [50]). It isnatural to ask whether the mutual information and the weighted divergence radius can be givensimilar operational interpretations in the context of channel coding. Note that the standardchannel coding problem does not yield an answer to this question, because after optimizationover the input distribution, the mutual information and the weighted radius give the same capac-ity, as expressed in (I.2). Therefore, to settle this problem, one needs to consider a refinement ofthe channel coding problem where the input distribution P appears on the operational side. Thiscan be achieved by considering constant composition coding, where the codewords are requiredto have the same empirical distribution for each message, and these empirical distributions arerequired to converge to a fixed distribution P on X as the number of channel uses goes to infinity.It was shown by Csiszar in [19] that in this setting

sc(W,R, P ) = supα>1

α− 1

α[R− χα(W,P )] , (I.5)

3

for any classical channel W and input distribution P , where sc(W,R, P ) is the strong converseexponent for coding rate R, and χα(W,P ) = χDα

(W,P ). Here, Dα is the classical Renyi α-divergence [61]. This shows that, maybe somewhat surprisingly, it is not the perhaps moreintuitive-looking concept of mutual information (I.3) but the geometric quantity (I.4) that cor-rectly captures the information transmission capacity of a classical channel.Our main result is an exact analogue of (I.5) for classical-quantum channels, given as

sc(W,R, P ) = supα>1

α− 1

α[R− χ∗

α(W,P )] , (I.6)

where χ∗α(W,P ) is the P -weighted sandwiched Renyi divergence radius of the channel. Thus we

establish that, in the strong converse domain, the operationally relevant notion of Renyi capacityfor a classical-quantum channel with fixed input distribution P is the P -weighted sandwichedRenyi divergence radius of the channel.The structure of the paper is as follows. After collecting some technical preliminaries in Section

II, we study the concepts of divergence radius and center for general divergences in Section IIIAand for the class of α-z quantum Renyi divergences in Section III C. One of our main results isthe additivity of the weighted α-z Renyi divergence radius for classical-quantum channels andcertain pairs (α, z), given in Section IIID. We prove it using a representation of the minimizingstate in (I.4) when ∆ = Dα,z is a quantum α-z Renyi divergence [7], as the fixed point of a certainmap on the state space. Analogous results have been derived very recently by Nakiboglu in [54]for classical channels, and by Cheng, Li and Hsieh in [16] for classical-quantum channels and thePetz-type Renyi divergences. Our results extend these with a different proof method, which inturn is closely related to the approach of Hayashi and Tomamichel for proving the additivity ofthe sandwiched Renyi mutual information [34].In Section IV, we prove our main result, (I.6). The non-trivial part of this is the inequality

LHS≤RHS, which we prove using a refinement of the arguments in [50]. First, in PropositionIV.5 we employ a suitable adaptation of the techniques of Dueck and Korner [24] and obtainthe inequality in terms of the log-Euclidean Renyi divergence, which gives a suboptimal bound.Then in Proposition IV.6 we use the asymptotic pinching technique from [50] to arrive at anupper bound in terms of the regularized sandwiched Renyi divergence radii, and finally we usethe previously established additivity property of these quantities to arrive at the desired bound.In the proof of Proposition IV.5 we need a constant composition version of the classical-quantumchannel coding theorem. Such a result was established, for instance, by Hayashi in [31], andvery recently by Cheng, Hanson, Datta and Hsieh in [17], with a different exponent, by refininganother random coding argument by Hayashi [30]. We give a slightly modified proof in AppendixC. Further appendices contain various technical ingredients of the proofs, and in Appendix A wegive a more detailed discussion of the concepts of divergence radius and mutual information forgeneral divergences and α-z Renyi divergences, which may be of independent interest.

II. PRELIMINARIES

For a finite-dimensional Hilbert space H, let B(H) denote the set of all linear operators on H,and let B(H)sa, B(H)+, and B(H)++ denote the set of self-adjoint, non-zero positive semi-definite(PSD), and positive definite operators, respectively. For an interval J ⊆ R, let B(H)sa,J := {A ∈B(H)sa : spec(A) ⊆ J}, i.e., the set of self-adjoint operators on H with all their eigenvalues inJ . Let S(H) := { ∈ B(H)+, Tr = 1} denote the set of density operators, or states, on H, andS(H)++ the set of invertible density operators.For a self-adjoint operator A, let PA

a := 1{a}(A) denote the spectral projection of A corre-

sponding to the singleton {a}. The projection onto the support of A is∑

a 6=0 PAa ; in particular,

if A is positive semi-definite, it is equal to limαց0 Aα =: A0. In general, we follow the convention

that real powers of a positive semi-definite operator A are taken only on its support, i.e., for anyx ∈ R, Ax :=

∑a>0 a

xPAa .

For projections P1, . . . , Pr on a Hilbert-space H, we denote by∨r

i=1 Pi the projection onto thesubspace spanned by

⋃ri=1 ranPi.

Given a self-adjoint operator A ∈ B(H)sa, the pinching by A is the operator FA : B(H) →B(H), FA(.) :=

∑a P

Aa (.)PA

a , i.e., the block-diagonalization with the eigen-projectors of A. By

4

the pinching inequality [29],

X ≤ | spec(A)|FA(X), X ∈ B(H)+.

Since FA can be written as a convex combination of unitary conjugations,

f (FA(B)) ≤ FA(f(B)), Tr g (FA(B)) ≤ Tr g(B), (II.7)

for any operator convex function f , and any convex function g on an interval J , and any B ∈B(H)sa,J . The second inequality above is due to the following well-known fact:

Lemma II.1 Let J ⊆ R be an interval and f : J → R be a function.

(i) If f is monotone increasing then Tr f(.) is monotone increasing on B(H)sa,J .

(ii) If f is convex then Tr f(.) is convex on B(H)sa,J .

For a differentiable function f defined on an interval J ⊆ R, let f [1] : J × J → R be its firstdivided difference function, defined as

f [1](a, b) :=

{f(a)−f(b)

a−b , a 6= b,

f ′(a), a = b,a, b ∈ J.

The proof of the following can be found, e.g., in [11, Theorem V.3.3] or [35, Theorem 2.3.1]:

Lemma II.2 If f is a continuously differentiable function on an open interval J ⊆ R then forany finite-dimensional Hilbert space H, A 7→ f(A) is Frechet differentiable on B(H)sa,J , and itsFrechet derivative Df(A) at a point A is given by

Df(A)(Y ) =∑

a,b

f [1](a, b)PAa Y P

Ab , Y ∈ B(H)sa.

It is straightforward to verify that in the setting of Lemma II.2, the function A 7→ Tr f(A) isalso Frechet differentiable on B(H)sa,J , and its Frechet derivative D(Tr ◦f)(A) at a point A isgiven by

D(Tr ◦f)(A)(Y ) = Tr f ′(A)Y, Y ∈ B(H)sa, (II.8)

where f ′ is the derivative of f as a real-valued function.

An operator A ∈ B(H⊗n) is symmetric, if UπAU∗π = A for all permutations π ∈ Sn, where Uπ

is defined by Uπx1 ⊗ . . .⊗ xn = xπ−1(1) ⊗ . . .⊗ xπ−1(n), xi ∈ H, i ∈ [n]. As it was shown in [31],for every finite-dimensional Hilbert space H and every n ∈ N, there exists a universal symmetricstate σu,n ∈ S(H⊗n) such that it is symmetric, it commutes with every symmetric state, and forevery symmetric state ω ∈ S(H⊗n),

ω ≤ vn,dσu,n,

where vn,d only depends on d = dimH and n, and it is polynomial in n.

By a generalized classical-quantum (gcq) channel we mean a map W : X → B(H)+, where Xis a non-empty set, and H is a finite-dimensional Hilbert space. It is a classical-quantum (cq)channel if ranW ⊆ S(H), i.e., each output of the channel is a normalized quantum state. A(generalized) classical-quantum channel is classical, if W (x)W (y) =W (y)W (x) for all x, y ∈ X .We remark that we do not require any further structure of X or the map W , and in particular,X need not be finite. Given a finite number of gcq channels Wi : Xi → B(Hi)+, their product isthe gcq channel

W1 ⊗ . . .⊗Wn :X1 × . . .×Xn → B(H1 ⊗ . . .⊗Hn)+

(x1, . . . , xn) 7→W1(x1)⊗ . . .⊗Wn(xn).

5

In particular, if all Wi are the same channel W then we use the notaion W⊗n =W ⊗ . . .⊗W .

We say that a function P : X → [0, 1] is a probability density function on a set X if 1 =∑x∈X P (x) := sup

{∑x∈X0

P (x) : X0 ⊆ X finite}. The support of P is suppP := {x ∈ X :

P (x) > 0}. We say that P is finitely supported if supp is a finite set, and we denote by Pf (X )the set of all finitely supported probability distributions. The Shannon entropy of a P ∈ Pf (X )is defined as

H(P ) := −∑

x∈X

P (x) logP (x).

For a sequence x ∈ Xn, the type Px ∈ Pf(X ) of x is the empirical distribution of x, defined as

Px :=1

n

n∑

i=1

δxi: y 7→

1

n|{k : xk = y}| , y ∈ X ,

where δx is the Dirac measure concentrated at x. We say that a probability distribution P onX is an n-type if there exists an x ∈ Xn such that P = Px. We denote the set of n-types byPn(X ). For an n-type P , let Xn

P := {x ∈ Xn : Px = P} be the set of sequences with the sametype P . A key property of types is that x, y ∈ Xn have the same type if and only if they arepermutations of each other, and for any x, y with Px = Py, we have

P⊗nx (y) = e−nH(Px). (II.9)

By Lemma 2.3 in [21], for any P ∈ Pn(X ),

(n+ 1)−| suppP |enH(P ) ≤ |XnP | ≤ enH(P ). (II.10)

For any P ∈ Pf (X ), any m ∈ N, and any a ∈ X ,

∑

x∈Xm

P⊗m(x)Px(a) =∑

x∈Xm

P⊗m(x)1

m

m∑

k=1

1{a}(xk)

=1

m

m∑

k=1

∑

x∈Xm

P⊗m(x)1{a}(xk) =1

m

m∑

k=1

P (a) = P (a). (II.11)

The following lemma is an extension of the minimax theorems due to Kneser [41] and Fan [25]to the case where f can take the value +∞. For a proof, see [26, Theorem 5.2].

Lemma II.3 Let X be a compact convex set in a topological vector space V and Y be a convexsubset of a vector space W . Let f : X × Y → R ∪ {+∞} be such that

(i) f(x, .) is concave on Y for each x ∈ X, and

(ii) f(., y) is convex and lower semi-continuous on X for each y ∈ Y .

Then

infx∈X

supy∈Y

f(x, y) = supy∈Y

infx∈X

f(x, y), (II.12)

and the infima in (II.12) can be replaced by minima.

III. DIVERGENCE RADII

A. General divergences

By a quantum divergence ∆ we mean a function on pairs of non-zero positive semi-definitematrices,

∆ : ∪d∈N

(B(Cd)+ × B(Cd)+

)→ R ∪ {±∞},

6

that is invariant under isometries, i.e., if V : Cd1 → Cd2 is an isometry then

∆(V V ∗‖V σV ∗) = ∆(‖σ), , σ ∈ B(Cd1)+.

Due to the isometric invariance, ∆ may be extended to pairs of non-zero PSD operators on anyfinite-dimensional Hilbert space H, by choosing any isometry V : H → Cd with large enough d,and defining

∆(‖σ) := ∆(V V ∗‖V σV ∗), , σ ∈ B(H)+.

The isometric invariance property guarantees that this extension is well-defined, in the sense thatthe value of ∆(V V ∗‖V σV ∗) is independent of the choice of d and V . Clearly, this extensionis again invariant under isometries, i.e., for any , σ ∈ B(H)+ and V : H → K isometry,∆ (V V ∗‖V σV ∗) = ∆(‖σ). Note that this implies that ∆ is invariant under extensions withpure states, i.e., ∆( ⊗ |ψ〉〈ψ| ‖σ ⊗ |ψ〉〈ψ|) = ∆(‖σ), where ψ is an arbitrary unit vector insome Hilbert space. Further properties will often be important. In particular, we say that adivergence ∆ is

• positive if ∆(‖σ) ≥ 0 for all density operators , σ, and it is strictly positive if ∆(‖σ) =0 ⇐⇒ = σ, again for density operators;

• monotone under CPTP maps if for any , σ ∈ B(H)+ and any CPTP (completely positiveand trace-preserving) map Φ : B(H) → B(K),

∆ (Φ()‖Φ(σ)) ≤ ∆(‖σ) ;

• jointly convex if for all i, σi ∈ B(H), i ∈ [r] := {1, . . . , r}, and probability distribution(pi)

ri=1,

∆

(r∑

i=1

pii

∥∥∥∥∥

r∑

i=1

piσi

)≤

r∑

i=1

pi∆(i‖σi) ;

• additive on tensor products if for any i, σi ∈ B(Hi)+, i = 1, 2,

∆(1 ⊗ 2‖σ1 ⊗ σ2) = ∆(1‖σ1) + ∆(2‖σ2).

• block additive if for any 1, 2, σ1, σ2 such that 01 ∨ σ01 ⊥ 02 ∨ σ

02 , we have

∆(1 + 2‖σ1 + σ2) = ∆(1‖σ1) + ∆(2‖σ2);

• homogeneous if

∆(λ‖λσ) = λ∆(‖σ), , σ ∈ B(H)+, λ ∈ (0,+∞).

Typical examples for divergences with some or all of the above properties are the relative entropyand some Renyi divergences and related quantities; see Section III B.

Remark III.1 It is well-known [58, 69] that a block additive and homogenous divergence ismonotone under CPTP maps if and only if it is jointly convex. The “only if” direction followsby applying monotonicity to :=

∑i pi |i〉〈i|E ⊗ i and σ :=

∑i pi |i〉〈i|E ⊗ σi under the partial

trace over the E system, where (|i〉)ri=1 is an ONS in HE. The “if” direction follows by usinga Stinespring dilation Φ(.) = TrE V (.)V ∗ with an isometry V : H → K ⊗ HE, and writingthe partial trace as a convex combination of unitary conjugations (e.g., by the discrete Weylunitaries).

Given a non-empty set of positive semi-definite operators S ⊆ B(H)+, its ∆-radius R∆(S) isdefined as

R∆(S) := infσ∈S(H)

sup∈S

∆(‖σ). (III.13)

7

If the above infimum is attained at some σ ∈ S(H) then σ is called a ∆-center of S. A variantof this notion is when, instead of minimizing the maximal ∆-distance, we minimize an averageddistance according to some finitely supported probability distribution P ∈ Pf(S). This yieldsthe notion of the P -weighted ∆-radius:

R∆,P (S) := infσ∈S(H)

∑

∈S

P ()∆(‖σ). (III.14)

If the above infimum is attained at some σ ∈ S(H) then σ is called a P -weighted ∆-center for S.

Remark III.2 For applications in channel coding, S will be the image of a classical-quantumchannel, and hence a subset of the state space. In this case minimizing over density operators σin (III.13) and (III.14) seems natural, while it is less obviously so when the elements of S aregeneral positive semi-definite operators. We discuss this further in Appendix A.

Remark III.3 Note that for any finitely supported probability distribution P on B(H)+, wehave, by definition, P () = 0 for all ∈ B(H)+ \ suppP , and hence

R∆,P (B(H)+) = R∆,P (suppP ) = R∆,P (S).

for any S ⊆ B(H)+ with suppP ⊆ S. That is, R∆,P (S) does not in fact depend on S, it is afunction only of P . Hence, if no confusion arises, we may simply denote it as R∆,P .

Remark III.4 The concepts of the divergence radius and P -weighted divergence radius can beunified (to some extent) by the notion of the (P, β)-weighted ∆-radius, which we explain in SectionA 1.

We will mainly be interested in the above concepts when S is the image of a gcq channelW : X → B(H)+, in which case we will use the notation

χ∆(W,P ) := R∆,P◦W−1(ranW ) = infσ∈S(H)

∑

x∈X

P (x)∆(W (x)‖σ), (III.15)

where (P ◦W−1)() :=∑

x∈X :W (x)= P (x). Note that, as far as these quantities are concerned,

the channel simply gives a parametrization of its image set, and the previously considered casecan be recovered by parametrizing the set by itself, i.e., by taking the gcq channel X := Sand W := idS . We will call (III.15) the P -weighted ∆-radius of the channel W , and any stateachieving the infimum in its definition a P -weighted ∆-center for W . We define the ∆-capacityof the channel W as

χ∆(W ) := supP∈Pf (X )

χ∆(W,P ). (III.16)

In the relevant cases for information theory, the ∆-capacity coincides with the ∆-radius of theimage of the channel, i.e.,

χ∆(W ) = R∆(ranW ) = infσ∈S(H)

supx∈X

∆(W (x)‖σ);

see Section A1.We will mainly be interested in the above quantities when ∆ is a quantum Renyi divergence.

For some further properties of these quantities for general divergences, see Appendix A1.

B. Quantum Renyi divergences

In this section we specialize to various notions of quantum Renyi divergences. For every pairof positive definite operators , σ ∈ B(H)++ and every α ∈ (0,+∞) \ {1}, z ∈ (0,+∞) let

Qα,z(‖σ) := Tr(

α2z σ

1−αz

α2z

)z.

8

These quantities were first introduced in [40] and further studied in [7]. The cases

Qα(‖σ) := Qα,1(‖σ) = Tr ασ1−α, (III.17)

Q∗α(‖σ) := Qα,α(‖σ) = Tr

(

12 σ

1−αα

12

)α, (III.18)

and

Q♭α(‖σ) := Qα,+∞(‖σ) := lim

z→+∞Qα,z(‖σ) = Tr eα log +(1−α) log σ (III.19)

are of special significance. (The last identity in (III.19) is due to the Lie-Trotter formula.) Hereand henceforth (t) stands for one of the three possible values (t) = { }, (t) = ∗ or (t) = ♭, where

{ } denotes the empty string, i.e., Q(t)α with (t) = { } is simply Qα.

These quantities are extended to general, not necessarily invertible positive semi-definite op-erators , σ ∈ B(H)+ as

Qα,z(‖σ) := limεց0

Qα,z(+ εI‖σ + εI) (III.20)

= limεց0

Qα,z(‖σ + εI) = limεց0

Qα,z(‖(1− ε)σ + εI/d) = s(α) supε>0

Qα,z(‖σ + εI),

(III.21)

for every z ∈ (0,+∞), where d := dimH,

s(α) := sgn(α− 1) =

{−1, α < 1,

1, α > 1, Qα,z := s(α)Qα,z ,

and the identities are easy to verify. For z = +∞, the extension is defined by (III.20); see [38, 50]for details.Various further divergences can be defined from the above quantities. The quantum α-z Renyi

divergences [7] are defined as

Dα,z(‖σ) :=1

α− 1log

Qα,z(‖σ)

Tr (III.22)

for any α ∈ (0,+∞) \ {1} and z ∈ (0,+∞]. It is easy to see that

α > 1, 0 � σ0 =⇒ Qα,z = Dα,z = +∞

for any z. Moreover, if α 7→ z(α) is continuously differentiable in a neighbourhood of 1, on whichz(α) 6= 0, or z(α) = +∞ for all α, then, according to [47, Theorem 1] and [50, Lemma 3.5],

D1(‖σ) := limα→1

Dα,z(α)(‖σ) =1

Tr D(‖σ) =: D1,z(‖σ), z ∈ (0,+∞],

where D(‖σ) is Umegaki’s relative entropy [70], defined as

D(‖σ) := Tr (log − log σ)

for positive definite operators, and extended as above for non-zero positive semidefinite operators.Of the Renyi divergences corresponding to the special Qα quantities discussed above, Dα isusually called the Petz-type Renyi divergence, D∗

α the sandwiched Renyi divergence [51, 71], andD♭

α the log-Euclidean Renyi divergence. For more on the above definitions and a more detailedreference to their literature, see, e.g., [50]. We will also use the max-relative entropy [23, 51, 60]:

D∗∞(‖σ) := lim

α→+∞D∗

α(‖σ) = log inf{λ > 0 : ≤ λσ}.

To discuss some important properties of the above quantities, let us introduce the followingregions of the α-z plane:

K0 : 0 < α < 1, z < min{α, 1− α}; K1 : 0 < α < 1, α ≤ z ≤ 1− α;

K2 : 0 < α < 1, max{α, 1− α} ≤ z ≤ 1; K3 : 0 < α < 1, 1− α ≤ z ≤ α;

K4 : 0 < α < 1, 1 ≤ z; K5 : 1 < α, α/2 ≤ z ≤ 1;

K6 : 1 < α, max{α− 1, 1} ≤ z ≤ α; K7 : 1 < α ≤ z;

The (α, z) values for which Dα,z is monotone under CPTP maps have been completely char-acterized in [2, 10, 15, 27, 36, 74] (cf. also [7, Theorem 1]). This can be summarized as follows.

9

Lemma III.5 Dα,z is monotone under CPTP maps ⇐⇒ Qα,z is monotone under CPTP maps

⇐⇒ Qα,z is jointly convex ⇐⇒ (α, z) ∈ K2 ∪K4 ∪K5 ∪K6.

Corollary III.6 Dα,z is jointly convex if (α, z) ∈ K2 ∪K4.

Proof Immediate from Lemma III.5, as the joint convexity of Qα,z implies the joint convexity

of Dα,z = 1α−1 log s(α)Qα,z whenever α ∈ (0, 1). �

Recall that a function f : C → R∪{+∞} on a convex set C is quasi-convex if f((1−t)x+ty) ≤max{f(x), f(y)} for all x, y ∈ C and t ∈ [0, 1].

Lemma III.7 On top of the cases discussed in Lemma III.5 and Corollary III.6, Dα,z is convex

in its second argument if (α, z) ∈ K3 ∪K6 ∪K7, and Qα,z is convex in its second argument if(α, z) ∈ K3 ∪K7. Moreover, Dα,z is jointly quasi-convex if (α, z) ∈ K5.

Proof The assertion about the quasi-convexity of Dα,z is immediate from the joint convexity of

Qα,z when (α, z) ∈ K5.Note that it is enough to prove convexity in the second argument for positive definite operators,

due to (III.21).Assume that (α, z) ∈ K2 ∪K3, i.e., 0 < α < 1, 1− α ≤ z ≤ 1. Then 0 < 1−α

z ≤ 1, and hence

σ 7→ σ1−αz is concave. Since B(H)+ ∋ A 7→ TrAz is both monotone and concave (see Lemma

II.1), we get that σ 7→ Qα,z(‖σ) is concave, from which the convexity of both Qα,z and Dα,z intheir second argument follows for (α, z) ∈ K3 (and also for K2, although that is already coveredby joint convexity).Assume next that (α, z) ∈ K6 ∪K7, i.e., 1 < α, and max{1, α− 1} ≤ z. Then −1 ≤ 1−α

z < 0,

and hence f : t 7→ t1−αz is a non-negative operator monotone decreasing function on (0,+∞).

Applying the duality of the Schatten p-norms to p = z, we have

Dα,z(‖σ) = supτ∈S(H)

z

α− 1log Tr

α2z σ

1−αz

α2z τ1−

1z = sup

τ∈S(H)

z

α− 1logωτ (f(σ)) ,

where ωτ (.) := Tr α2z (.)

α2z τ1−

1z is a positive functional. By [3, Proposition 1.1], Dα,z(‖.) is the

supremum of convex functions on B(H)++, and hence is itself convex. This immediately impliesthat Qα,z is convex in its second argument when (α, z) ∈ K6 ∪ K7 (of which the case K6 alsofollows from joint convexity). �

Lemma III.8 For any fixed ∈ B(H)+, the maps

σ 7→ Qα,z(‖σ) and σ 7→ Dα,z(‖σ)

are lower semi-continuous on B(H)+ for any α ∈ (0,+∞)\{1} and z ∈ (0,+∞), and for z = +∞and α > 1.

Proof The cases α ∈ (0,+∞) \ {1} and z ∈ (0,+∞) are obvious from the last expression in(III.21), and the case z = +∞ was discussed in [50, Lemma 3.27]. �

It is known that Dα, D∗α and D♭

α are non-negative on pairs of states [50, 51, 58], but it seemsthat the non-negativity of general α-z Renyi divergences has not been analyzed in the literatureso far. We show in Appendix A4 that they are indeed non-negative for any pair of parameters(α, z).

C. The Renyi divergence center

Let W : X → B(H)+ be a gcq channel. Specializing to ∆ = Dα,z in (III.15) yields the P -weighted Renyi (α, z) radii of the channel for a finitely supported input probability distributionP ∈ Pf (X ),

χα,z(W,P ) := χDα,z(W,P ) = inf

σ∈S(H)

∑

x∈X

P (x)Dα,z(W (x)‖σ) = minσ∈S(H)

∑

x∈X

P (x)Dα,z(W (x)‖σ).

(III.23)

10

The existence of the minimum is guaranteed by the lower semi-continuity stated in Lemma III.8.We will call any state σ achieving the minimum in (III.23) a P -weighted Dα,z center for W . Indirect connection with the notations introduced in (III.17)–(III.19), we will use the notations

χα(W,P ) := χα,1(W,P ), χ∗α(W,P ) := χα,α(W,P ), χ♭

α(W,P ) := χα,+∞(W,P ),

and for α = 1,

χ(W,P ) := χ1(W,P ) := χ1,z(W,P ), z ∈ (0,+∞),

which is just a generalization of the Holevo quantity for an ensemble {W (x), P (x)}x∈suppP ,wherethe W (x) need not be normalized. Moreover, we will use

χ∗∞(W,P ) := inf

σ∈S(H)

∑

x∈X

P (x)D∗∞(W (x)‖σ),

and call it the P -weighted max-Renyi radius. It is known (see, e.g., [50, Section IV]) that

limαց1

χ∗α(W,P ) = χ∗

1(W,P ) = χ(W,P ), limαր+∞

χ∗α(W,P ) = χ∗

∞(W,P ). (III.24)

It is sometimes convenient that it is enough to consider the infimum above over invertiblestates, i.e., we have

χα,z(W,P ) = infσ∈S(H)++

∑

x∈X

P (x)Dα,z(W (x)‖σ), (III.25)

which is obvious from the second expression in (III.21). Moreover, any minimizer of (III.23) hasthe same support as the joint support of the channel states {Wx}x∈suppP , with projection

∨

x∈suppP

W (x)0 =W (P )0,

at least for a certain range of (α, z) values, as we show below.

Lemma III.9 Let σ be a P -weighted Dα,z center for W . If (α, z) is such that Dα,z is quasi-convex in its second argument then σ0 ≤W (P )0.

Proof Define F(X) := W (P )0XW (P )0 + (I − W (P )0)X(I − W (P )0), X ∈ B(H), and letσ := W (P )0σW (P )0/TrW (P )0σ. We will show that Dα,z(W (x)‖σ) ≤ Dα,z(W (x)‖σ) for allx ∈ suppP , which will yield the assertion. Note that we can assume without loss of generalitythat W (P )0σ 6= 0, since otherwise Dα,z(W (x)‖σ) = +∞ for all x ∈ suppP , and hence σ clearlycannot be a minimizer for (III.23).According to the decomposition H = ranW (P )0 ⊕ ran(I −W (P )0), define the block-diagonal

unitary U :=

[I 00 −I

], so that F(.) = 1

2 ((.) + U(.)U∗). For every x ∈ X ,

Dα,z(W (x)‖F(σ)) ≤ max {Dα,z(W (x)‖σ), Dα,z(W (x)‖UσU∗)}

= max {Dα,z(W (x)‖σ), Dα,z(UW (x)U∗‖UσU∗)} = Dα,z(W (x)‖σ),

where the first inequality is due to quasi-convexity, and the first equality is due to the fact thatUW (x)U∗ =W (x). On the other hand,

Dα,z(W (x)‖F(σ)) = Dα,z(W (x)‖(TrW (P )0σ)σ)

= Dα,z (W (x)‖σ)− logTrW (P )0σ ≥ Dα,z (W (x)‖σ) ,

where the inequality is strict unless σ0 ≤W (P )0. �

For fixed W and P , we define

F (σ) :=∑

x∈X

P (x)Dα,z(W (x)‖σ), σ ∈ B(H)+.

In the following, we may naturally interpret W (x) as an operator acting on ranW (x) or onranW (P ).

11

Lemma III.10 F is Frechet-differentiable at every σ ∈ B(H)++, with Frechet-derivative DF (σ)given by

DF (σ) : Y 7→z

α− 1

∑

x∈X

P (x)1

Qα,z(W (x)‖σ)

· Tr∑

a,b

h[1]α,z(a, b)PσaW (x)

α2z

(W (x)

α2z σ

1−αz W (x)

α2z

)z−1

W (x)α2z P σ

b Y, (III.26)

where h[1]α,z is the first divided difference function of hα,z(t) := t

1−αz .

Proof We have F =∑

x∈X P (x)(gx ◦ ιx ◦Hα,z), where Hα,z : B(H)+ → B(H), Hα,z(σ) := σ1−αz

is Frechet differentiable at every σ ∈ B(H)++ with DHα,z(σ) : Y 7→∑

a,b h[1]α,z(a, b)P σ

a Y Pσb ,

according to Lemma II.2. For a fixed x, ιx : B(H) → B(ran(W (x)) is defined as A 7→W (x)

α2zAW (x)

α2z , and, as a linear map, it is Frechet differentiable at every A ∈ B(H), with

its derivative being equal to itself. Finally, gx : B(ranW (x)) → R is defined as gx(T ) := TrT z,and it is Frechet differentiable at every T ∈ B(ranW (x))++, with Frechet derivative Dgx(T ) :Y 7→ zTrT z−1Y , according to (II.8). If σ ∈ B(ranW (P ))++ then Hα,z(σ) ∈ B(ranW (P ))++,and ιx(Hα,z(σ)) ∈ B(ranW (x))++. Hence, we can apply the chain rule for derivatives, andobtain (III.26). �

Lemma III.11 Let σ be a P -weighted Dα,z center for W . If α ≥ 1 or α ∈ (0, 1) and 1 − α <z < +∞ then W (P )0 ≤ σ0.

Proof When α > 1 and W (P )0 � σ0, there exists an x ∈ suppP with W 0x � σ0 so that

Dα,z(W (x)‖σ) = +∞. Hence, σ cannot be a minimizer for (III.23).Assume for the rest that α ∈ (0, 1), and σ is such that W (P )0 � σ0; this is equivalent to

the existence of an x0 ∈ suppP such that Wx0Pσ0 6= 0. Let us define the state ω := cP σ

0 , withc := 1/TrP σ

0 . For every t ∈ [0, 1], let

σt := (1− t)σ + tω =∑

λ∈spec(σ)\{0}

(1 − t)λP σλ + tcP σ

0 ,

so that σt ∈ B(H)++ for every t ∈ (0, 1]. Note that if t < t0 := λmin(σ)/(c + λmin(σ)),where λmin(σ) is the smallest non-zero eigenvalue of σ, then P σt

ct = P σ0 , and P σt

(1−t)λ = P σλ ,

λ ∈ spec(σ) \ {0}.By Lemma III.10, the derivative of f(t) := F (σt) at any t ∈ (0, 1) is given by

f ′(t) = DF (σt)(ω − σ)

=z

α− 1

∑

x∈X

P (x)1

Qα,z(W (x)‖σt)

[h′α,z (ct) cTrAx,tP

σ0

−∑

λ∈spec(σ)\{0}

h′α,z ((1− t)λ) λTrAx,tPσλ

]

=∑

x∈X

P (x)1

Qα,z(W (x)‖σt)

[(1− t)

1−αz

−1∑

λ∈spec(σ)\{0}

λ1−αz TrAx,tP

σλ

− t1−αz

−1c1−αz TrAx,tP

σ0

],

where Ax,t :=W (x)α2z

(W (x)

α2z σ

1−αz

t W (x)α2z

)z−1

W (x)α2z .

Our aim will be to show that limtց0 f′(t) = −∞. This implies that f(t) < f(0) for small

enough t > 0, contradicting the assumption that F has a global minimum at σ. Note thatlimtց0Qα,z(W (x)‖σt) = Qα,z(W (x)‖σ), which is strictly positive for every x ∈ suppP . Indeed,the contrary would mean that Dα,z(W (x)‖σ) = +∞, contradicting again the assumption that F

12

has a global minimum at σ. Hence, the proof will be complete if we show that t1−αz

−1 TrAx0,tPσ0

diverges to +∞ while TrAx,tPσλ is bounded as tց 0 for any x ∈ suppP and λ ∈ spec(σ) \ {0}.

Note that for any t ∈ (0, t0) and z ≥ 1,

tcI ≤ σt ≤ I =⇒ (tc)1−αz I ≤ σ

1−αz

t ≤ I

=⇒ (tc)1−αz W (x)

αz ≤W (x)

α2z σ

1−αz

t W (x)α2z ≤W (x)

αz

=⇒ t1−αz c1W (x)0 ≤W (x)

α2z σ

1−αz

t W (x)α2z ≤ c3W (x)0

=⇒ t1−αz

(z−1)c2W (x)0 ≤[W (x)

α2z σ

1−αz

t W (x)α2z

]z−1

≤ c4W (x)0 (III.27)

=⇒ t1−αz

(z−1)c2W (x)αz ≤ Ax,t ≤ c4W (x)

αz , (III.28)

where c1 := c1−αz λmin(W (x))

αz > 0, c2 := cz−1

1 > 0, c3 := ‖W (x)‖αz > 0, c4 := cz−1

3 > 0, and theinequalities in (III.27)–(III.28) hold in the opposite direction when z ∈ (0, 1). This immediatelyimplies that

t1−αz

−1 TrAx0,tPσ0 ≥ t

1−αz

−1+ 1−αz

(z−1)c2 TrW (x0)αz P σ

0 −−−→tց0

+∞, z ≥ 1,

t1−αz

−1 TrAx0,tPσ0 ≥ t

1−αz

−1c4 TrW (x0)αz P σ

0 −−−→tց0

+∞, z ∈ (0, 1),

since TrW (x0)αz P σ

0 > 0 by assumption, 1−αz − 1 + 1−α

z (z − 1) = −α < 0, and 1−αz − 1 < 0 iff

1− α < z when z ∈ (0, 1).Next, observe that

(1− t)P σλ ≤ σt =⇒ (1− t)

1−αz P σ

λ ≤ σ1−αz

t

where the inequality follows since, by assumption, 0 < 1−αz < 1, and x 7→ xγ is operator

monotone on (0,+∞) for γ ∈ (0, 1). Hence,

0 ≤ TrAx,tPσλ ≤ (1− t)

α−1z TrAx,tσ

1−αz

t = (1− t)α−1z Qα,z(W (x)‖σt) −−−→

tց0Qα,z(W (x)‖σ),

which is finite. This finishes the proof. �

Remark III.12 Note that the region of (α, z) values given in Lemma III.11 covers z = 1 for allα ∈ (0,+∞], i.e., all the Petz-type Renyi divergences, and {(α, α) : α ∈ (1/2,+∞]}, i.e., thesandwiched Renyi divergences for every parameter α for which they are monotone under CPTPmaps, except for α = 1/2. It is an open question whether the condition z > 1 − α in LemmaIII.11 can be improved, or maybe completely removed.

Remark III.13 Note that the case α > 1 in Lemma III.11 is trivial, and this is the case that weactually need for the strong converse exponent of constant composition classical-quantum channelcoding in Section IV; more precisely, we need the case z = α > 1.

Let us define ΓD to be the set of (α, z) values such that for any gcq channel W and any inputprobability distribution P , any P -weighted Dα,z center σ for W satisfies σ0 = W (P )0. ThenCorollary III.6 and Lemmas III.7, III.9 and III.11 yield

ΓD ⊇ {(α, z) : α ∈ (0, 1), 1− α < z +∞} ∪ {(α, z) : α > 1, z ≥ max{α/2, α− 1}} .

The following characterization of the weighted Dα,z centers will be crucial in proving theadditivity of the weighted sandwiched Renyi divergence radius of a gcq channel.

Theorem III.14 Assume that (α, z) ∈ ΓD are such that Dα,z is convex in its second variable.Then σ is a P -weighted Dα,z center for W if and only if it is a fixed point of the map

ΦW,P,Dα,z(σ) :=

∑

x∈X

P (x)1

Qα,z(W (x)‖σ)

(σ

1−α2z W (x)

αz σ

1−α2z

)z(III.29)

defined on SW,P (H)++ := {σ ∈ S(H)+ : σ0 =W (P )0}.

13

Proof By the assumption that (α, z) ∈ ΓD, we may restrict the Hilbert space to be ranW (P )0,and assume that σ is invertible. Let F (A) :=

∑x∈X P (x)Dα,z(W (x)‖A), A ∈ B(H)++. Due to

the assumption that Dα,z is convex in its second variable, σ is a minimizer of F if and only ifDF (σ)(Y ) = 0 for all self-adjoint traceless Y . By Lemma III.10, this condition is equivalent to

λI =z

α− 1

∑

x∈X

P (x)1

Qα,z(W (x)‖σ)

∑

a,b

h[1]α,z(a, b)PσaW (x)

α2z

(W (x)

α2z σ

1−αz W (x)

α2z

)α−1

W (x)α2z P σ

b

for some λ ∈ R. Multiplying both sides by σ1/2 from the left and the right, and taking the trace,we get λ = −1. Hence, the above is equivalent to (by multiplying both sides by σ1/2 from theleft and the right)

σ =z

1− α

∑

x∈X

P (x)1

Qα,z(W (x)‖σ)

∑

a,b

a1/2b1/2h[1]α,z(a, b)Pσa W (x)

α2z

(W (x)

α2z σ

1−αz W (x)

α2z

)z−1

W (x)α2zP σ

b

=∑

a,b

P σa

(z

1− αa1/2b1/2h[1]α,z(a, b)ΦW,P,α,z(σ)

)P σb , (III.30)

where

ΦW,P,Dα,z(σ) :=

∑

x∈X

P (x)1

Q∗α,z(W (x)‖σ)

W (x)α2z

(W (x)

α2z σ

1−αz W (x)

α2z

)z−1

W (x)α2z .

Writing the operators in (III.30) in block form according to the spectral decomposition of σ, wesee that (III.30) is equivalent to

∀a, b : δa,baα−1z

+1P σa = P σ

a ΦW,P,α,z(σ)Pσb ⇐⇒ σ

α−1z

+1 = ΦW,P,Dα,z(σ)

⇐⇒ σ = σ1−α2z ΦW,P,Dα,z

(σ)σ1−α2z .

This can be rewritten as

σ =∑

x∈X

P (x)1

Qα,z(W (x)‖σ)σ

1−α2z W (x)

α2z

(W (x)

α2z σ

1−αz W (x)

α2z

)z−1

W (x)α2z σ

1−α2z

=∑

x∈X

P (x)1

Qα,z(W (x)‖σ)

(σ

1−α2z W (x)

αz σ

1−α2z

)z,

where the last identity follows from Xf(X∗X)X∗ = (idR f)(XX∗). �

Remark III.15 The special case z = 1 yields the characterization of the P -weighted Petz-typeRenyi divergence center as the fixed point of the map

ΦW,P,Dα(σ) :=

∑

x∈X

P (x)1

Qα(W (x)‖σ)σ

1−α2 W (x)ασ

1−α2 , σ ∈ SW,P (H)++, (III.31)

for any α ∈ (0,+∞)\{1}. Note that in the classical case Dα,z is independent of z, i.e., Dα,z = Dα

for all z > 0, and the above characterization of the minimizer has been derived recently byNakiboglu in [54, Lemma 13], using very different methods. Following Nakiboglu’s approach,Cheng, Gao and Hsieh has derived the above characterization for the Petz-type Renyi divergencecenter in [16, Proposition 4]. The advantage of Nakiboglu’s approach is that it also providesquantitative bounds of the deviation of

∑x P (x)Dα(W (x)‖σ) from χα,z(W,P ) for an arbitrary

state σ; however, it is not clear whether this approach can be extended to the case z 6= 1, inparticular, for z = α, which is the relevant case for the strong converse exponent of constantcomposition classical-quantum channel coding, as we will see in Section IV.

Remark III.16 A similar approach as in the above proof of Theorem III.14 was used by Hayashiand Tomamichel in [34, Appendix C] to characterize the optimal state for the sandwiched Renyimutual information as the fixed point of a non-linear map on the state space. We comment onthis in more detail in Section A 2. Hayashi and Tomamichel’s approach was used later in [18] togive another derivation of (III.31) for α ∈ (0, 1).

14

Example III.17 We say that a cq channel W is noiseless on suppP if W (x)W (y) = 0 forall x, y ∈ suppP , x 6= y, i.e., the output states corresponding to inputs in suppP are perfectlydistinguishable. A straightforward computation shows that if W is noiseless on suppP thenσ :=W (P ) =

∑x P (x)W (x) satisfies the fixed point equation (III.29) for any pair (α, z). Hence,

if (α, z) satisfies the conditions of Proposition III.14 then W (P ) is a minimizer for (III.23), andwe have

χα,z(W,P ) =∑

x∈X

P (x)Dα,z(W (x)‖W (P )) = H(P ) := −∑

x∈X

P (x) logP (x).

Thus, the Renyi (α, z) radius of W is equal to the Shannon entropy of the input distribution,independently of the value of (α, z).

Corollary III.18 If (α, z) satisfies the conditions of Proposition III.14, and Dα,z is monotoneunder CPTP maps then

χα,z(W,P ) ≤ H(P ) (III.32)

for any cq channel W and input distribution P .

Proof We may assume without loss of generality that X = suppP . Let W (x) := |ex〉〈ex| forsome orthonormal basis (ex)x∈suppP in a Hilbert spaceK, and let Φ(.) :=

∑x∈suppP W (x) 〈ex| (.) |ex〉,

which is a CPTP map from B(K) to B(H). We have W = Φ ◦ W , and the assertion follows fromExample III.17. �

Remark III.19 Our approach to prove (III.32) follows that of Csiszar [19]. A (much) simplerapproach to prove the inequality (III.32) was given by Nakiboglu [54, Lemma 13] (see also [16,Proposition 4] for an adaptation to various quantum Renyi divergences). Obviously,

χα,z(W,P ) ≤∑

x∈X

P (x)Dα,z(W (x)‖W (P )). (III.33)

Assume now that Dα,z satisfies the monotonicity property B(H)+ ∋ σ1 ≤ σ2 =⇒ Dα,z(‖σ1) ≥Dα,z(‖σ2) for any ∈ B(H)+. It is easy to see that this holds for every (α, z) with z ≥|α− 1|. In this case, we can lower bound W (P ) by P (x)W (x), and hence Dα,z(W (x)‖W (P )) ≤Dα,z(W (x)‖P (x)W (x)) = − logP (x), whence the RHS of (III.33) can be upper bounded byH(P ).

D. Additivity of the weighted Renyi radius

LetW (i) : X (i) → B(H(i))+, i = 1, 2, be gcq channels, and P (i) ∈ Pf (X (i)) be input probabilitydistributions. For any α ∈ (0,+∞) and z ∈ (0,+∞],

χα,z

(W (1) ⊗W (2), P (1) ⊗ P (2)

)

= infσ12∈S(H1⊗H2)

∑

x1∈X (1), x2∈X (2)

P (1)(x1)P(2)(x2)Dα,z

(W (1)(x1)⊗W (1)(x1)‖σ12

)

≤ infσi∈S(Hi)

∑

x1∈X (1), x2∈X (2)

P (1)(x1)P(2)(x2)Dα,z

(W (1)(x1)⊗W (2)(x2)‖σ1 ⊗ σ2

)

= infσi∈S(Hi)

∑

x1∈X (1), x2∈X (2)

P (1)(x1)P(2)(x2)

[Dα,z

(W (1)(x1)‖σ1

)+Dα,z

(W (2)(x2)‖σ2

)]

= χα,z

(W (1), P (1)

)+ χα,z

(W (2), P (2)

),

where the first and the last equalities follow by definition, the second equality by the additivityof the α-z divergences, and the inequality follows by restricting the infimum to product states.

15

Hence, χα,z is subadditive. In particular, for fixedW : X → S(H) and P ∈ Pf (X ), the sequencem 7→ χα,z(W

⊗m, P⊗m) is subadditive, and hence

limm→+∞

1

mχα,z(W

⊗m, P⊗m) = infm∈N

1

mχα,z(W

⊗m, P⊗m) ≤ χα,z(W,P ).

In fact,

1

mχα,z(W

⊗m, P⊗m) ≤ χα,z(W,P ) (III.34)

for all m ∈ N.As it turns out, we also have the stronger property of additivity, at least for (α, z) pairs for

which the optimal σ can be characterized by the fixed point equation (III.29).

Theorem III.20 (Additivity of the weighted Renyi radius) Let W (1) : X (1) → S(H(1)) andW (2) : X (2) → S(H(2)) be gcq channels, and P (i) ∈ Pf (X (i)), i = 1, 2, be input distributions.Assume, moreover, that α and z satisfy the conditions of Theorem III.14. Then

χα,z

(W (1) ⊗W (2), P (1) ⊗ P (2)

)= χα,z

(W (1), P (1)

)+ χα,z

(W (2), P (2)

). (III.35)

Proof Let σi be a minimizer of (III.23) for (W (i), P (i)). By Theorem III.14, this means thatΦW (i),P (i),Dα,z

(σi) = σi. It is easy to see that

ΦW (1)⊗W (2),P (1)⊗P (2),Dα,z(σ1 ⊗ σ2) = ΦW (1),P (1),Dα,z

(σ1)⊗ ΦW (2),P (2),Dα,z(σ2) = σ1 ⊗ σ2.

Hence, again by Proposition III.14, σ1⊗σ2 is a minimizer of (III.23) for (W (1)⊗W (2), P (1)⊗P (2)).This proves the assertion. �

Corollary III.21 For any gcq channel W : X → B(H)+, any P ∈ Pf (X ), and any pair (α, z)satisfying the conditions in Theorem III.14, we have

χα,z(W⊗m, P⊗m) = mχα,z(W,P ), m ∈ N.

We will need the following special case for the application to classical-quantum channel codingin the next section:

Corollary III.22 For any gcq channel W : X → B(H)+, any P ∈ Pf (X ), and any α ∈(1/2,+∞],

χ∗α(W

⊗m, P⊗m) = mχ∗α(W,P ), m ∈ N.

Remark III.23 As far as we are aware, the idea of proving the additivity of an informationquantity by characterizing some optimizer state as the fixed point of a non-linear operator on thestate space appeared first in [34]. We comment on this in more detail in Appendix A 2.

IV. STRONG CONVERSE EXPONENT WITH CONSTANT COMPOSITION

A. The main result

Let W : X → S(H) be a classical-quantum channel. A code Cn for n uses of the channel is apair Cn = (En,Dn), where En : [Mn] → Xn, Dn : [Mn] → B(H⊗n)+, where |Cn| := Mn ∈ N is

the size of the code, and Dn is a POVM, i.e.,∑Mn

i=1 Dn(i) = I. The average success probabilityof a code Cn is

Ps(W⊗n, Cn) :=

1

|Cn|

|Cn|∑

m=1

TrW⊗n(En(m))Dn(m).

16

A sequence of codes Cn = (En,Dn), n ∈ N, is called a sequence of constant composition codeswith asymptotic composition P ∈ Pf (X ) if there exists a sequence of types Pn ∈ Pn(X ), n ∈ N,such that limn→+∞ ‖Pn − P‖1 = 0, and En(k) ∈ Xn

Pnfor all k ∈ {1, . . . , |Cn|}, n ∈ N. (See

Section II for the notation and basic facts concerning types.) For any rate R ≥ 0, the strongconverse exponents of W with composition constraint P are defined as

sc(W,R, P ) := inf

{lim infn→+∞

−1

nlogPs(W

⊗n, Cn) : lim infn→+∞

1

nlog |Cn| ≥ R

}, (IV.36)

sc(W,R, P ) := inf

{lim supn→+∞

−1

nlogPs(W


1

nlog |Cn| ≥ R

}, (IV.37)

where the infima are taken over code sequences of constant composition P . We may define thefollowing variants:

sc(W,R, P )∗ := inf{lim infn→+∞

−1

nlogPs(W


1

nlog |Cn| ≥ R,

limn→+∞

max1≤k≤|Cn|

∥∥PEn(k) − P∥∥1= 0},

(IV.38)

sc(W,R, P )∗ := inf{lim supn→+∞

−1

nlogPs(W


1

nlog |Cn| ≥ R

limn→+∞

max1≤k≤|Cn|

∥∥PEn(k) − P∥∥1= 0}.

(IV.39)

Obviously, we have

sc(W,R, P )∗ ≤ sc(W,R, P )

≥ ≥

sc(W,R, P )∗ ≤ sc(W,R, P ).

Our main result is the following:

Theorem IV.1 For any classical-quantum channel W , and finitely supported probability distri-bution P on the input of W , and any rate R,

sc(W,R, P )∗ = sc(W,R, P ) = supα>1

α− 1

α[R− χ∗

α(W,P )] . (IV.40)

We will prove the equality in (IV.40) as two separate inequalities in Lemma IV.3 and Propo-sition IV.6.

Remark IV.2 According to (III.24), we have

limαց1

α− 1

α[R− χ∗

α(W,P )] = 0 =α− 1

α[R− χ∗

α(W,P )]∣∣∣α=1

,

limαր+∞

α− 1

α[R− χ∗

α(W,P )] = R− χ∗∞(W,P ) =

α− 1

α[R− χ∗

α(W,P )]∣∣∣α=+∞

,

where we use the natural convention α−1α

∣∣α=+∞

:= 1. With these, we may rewrite (IV.40) as

sc(W,R, P )∗ = sc(W,R, P ) = supα∈[1,+∞]

α− 1

α[R− χ∗

α(W,P )] .

While this rewriting is trivial, it will be useful when applying minimax theorems in Section IVB.

The following lower bound follows by a standard argument, due to Nagaoka [52]. For readers’convenience, we write out the details in Appendix B. Essentially the same argument, followingNagaoka’s method, was used already in [17] to show the slightly weaker bound with sc(W,R, P )in place of sc(W,R, P )∗.

17

Lemma IV.3 For any R > 0,

supα>1

α− 1

α[R− χ∗

α(W,P )] ≤ sc(W,R, P )∗.

Remark IV.4 Note that in the strong converse problem, one’s aim is to make the decay of thesuccess probability as slow as possible. Lemma IV.3 shows that one cannot find a better (i.e.,smaller) exponent than the RHS of (IV.40), and hence it is called the optimality part of thestrong converse theorem. Our concern in the rest will be the achievability part, i.e., that theexponent in (IV.40) can in fact be attained.

Thus, our aim in the rest is to show that the second term is upper bounded by the rightmostterm in (IV.40). We will follow the approach of [50], which in turn was inspired by [24]. A keytechnical ingredient in this approach is the so-called dummy channel technique, first introducedby Haroutunian [28]. We start with the following:

Proposition IV.5 For any R > 0,

sc(W,R, P ) ≤ supα>1

α− 1

α

[R− χ♭

α(W,P )]. (IV.41)

Proof We will show that

sc(W,R, P ) ≤ min{F1(W,R, P ), F2(W,R, P )}, (IV.42)

where

F1(W,R, P ) := infV :χ(V,P )>R

D(V ‖W |P ),

F2(W,R, P ) := infV :χ(V,P )≤R

[D(V ‖W |P ) +R− χ(V, P )] .

Here, the infima are over channels V : X → S(H) satisfying the indicated properties, and

D(V ‖W |P ) :=∑

x∈X

P (x)D(V (x)‖W (x)).

It was shown in [50, Theorem 5.12] that the RHS of (IV.42) is the same as the RHS of (IV.41).We first show that sc(W,R, P ) ≤ F1(W,R, P ). To this end, let r > F1(W,R, P ); then, by

definition, there exists a channel V : X → S(H) such that

D(V ‖W |P ) < r and χ(V, P ) > R.

Due to χ(V, P ) > R, Corollary C.3 yields the existence of a sequence of constant com-position codes Cn with composition Pn, n ∈ N, such that suppPn ⊆ suppP for all n,limn→+∞ ‖Pn − P‖1 = 0, the rate is lower bounded as 1

n log |Cn| ≥ R, n ∈ N, and limn→+∞ Ps(V⊗n, Cn) =

1. Note that for any message k,

Tr(V ⊗n(En(k))− enrW⊗n(En(k))

)+≥ Tr

(V ⊗n(En(k))− enrW⊗n(En(k))

)Dn(k),

and hence

TrW⊗n(En(k))Dn(k) ≥ e−nr[TrV ⊗n(En(k))Dn(k)− Tr

(V ⊗n(En(k))− enrW⊗n(En(k))

)+

],

This in turn yields, by averaging over k, that

Ps(W⊗n, Cn) ≥ e−nr

Ps(V

⊗n, Cn)−1

|Cn|

|Cn|∑

k=1

Tr(V ⊗n(En(k))− enrW⊗n(En(k))

)+

= e−nr

[Ps(V

⊗n, Cn)− Tr(V ⊗n(x(n))− enrW⊗n(x(n))

)+

],

18

where x(n) is any sequence in Xn with type Pn. Since D(V ‖W |P ) < r, Corollary D.2 yields thatlimn→+∞ Tr

(V ⊗n(x(n))− enrW⊗n(x(n))

)+= 0, and so finally

lim infn→+∞

1

nlogPs(W

⊗n, Cn) ≥ −r, whence sc(W,R, P ) ≤ r.

Since this holds for any r > F1(W,R, P ), we get sc(W,R, P ) ≤ F1(W,R, P ).From this, one can prove that also sc(W,R, P ) ≤ F2(W,R, P ), the same way as it was done

in [50, Lemma 5.11] (which in turn followed the proof in [24, Lemma 2]); one only has to makesure that the extension of the code can be done in a way that it remains constant compositionwith composition P , but that is easy to verify. �

From the above result, we can obtain the desired upper bound.

Proposition IV.6 For any R > 0,


α− 1

α[R− χ∗

α(W,P )] . (IV.43)

Proof We employ the asymptotic pinching technique from [50]. Let Wm : Xm → S(H⊗m) bedefined as

Wm(x) := FmW⊗m(x),

where Fm is the pinching by the universal symmetric state σu,m, introduced in Section II.Employing Proposition IV.5 withW 7→Wm, R 7→ Rm and P 7→ P⊗m, we get that for any R > 0,

there exists a sequence of codes C(m)k = (E

(m)k ,D

(m)k ) with constant composition P

(m)k ∈ Pk(Xn),

k ∈ N, such that 1k log |C

(m)k | ≥ mR for all k, limk→+∞

∥∥∥P (m)k − P⊗m

∥∥∥1= 0, and

lim supk→+∞

−1

klogPs(W

⊗km , C

(m)k ) ≤ sup

α>1

α− 1

α

[mR− χ♭

α(Wm, P⊗m)

].

For every k ∈ N, define Ckm := (E(m)k ,Fm D

(m)k ), which can be considered a code for W⊗km,

with the natural identifications (Xm)k ≡ X km and (H⊗m)⊗k ≡ H⊗km. For a general n ∈ N,

choose k ∈ N such that km ≤ n < (k + 1)m, and for every i = 1, . . . , |C(m)k |, define En(i) to be

Ekm(i) concatenated with n− km copies of some fixed x0 ∈ suppP , independent of i and n, and

let Dn(i) := Dkm(i)⊗ I⊗(n−km)H . Then it is easy to see that

lim infn→+∞

1

nlog |Cn| ≥ R,

and

lim supn→+∞

−1

nlogPs(W

⊗n, Cn) ≤ supα>1

α− 1

α

[R −

1

mχ♭α(Wm, P

⊗m)

]. (IV.44)

We need to show that the above sequence of codes is of constant composition P . Let x =

(x1, . . . , xk) ∈ (Xm)k ≡ X km be a codeword for C(m)k , and let P

(m)x and Px denote the corre-

sponding types when x is considered as an element of (Xm)k and of X km, respectively. For anya ∈ X ,

Px(a) =1

km

∑

x∈Xm

#{i : xi = x} ·#{j : xj = a} =∑

x∈Xm

P (m)x

(x)Px(a) =∑

x∈Xm

P(m)k (x)Px(a)

only depends on x through its type P(m)x = P

(m)k , which is independent of x. Thus, the type

of Ekm(i) is independent of i, i.e., Ckm is a constant composition code for every k ∈ N. For ageneral n ∈ N with km ≤ n < (k + 1)m, we have

PEn(i)(a) =km

nPEkm(i)(a) + δa,x0

n− km

n, i ∈ {1, . . . , |Cn| = |Ckm|},

19

and hence Cn is also of constant composition.Next, we show that limn→+∞ ‖Pn − P‖1 = 0, where Pn is the type of Cn. For km ≤ n <

(k + 1)m, we have

∑

a∈X

|Pn(a)− P (a)| ≤∑

a∈X

|Pn(a)− Pkm(a)|+∑

a∈X

|Pkm(a)− P (a)|,

and

∑

a∈X

|Pn(a)− Pkm(a)| =∑

a∈X

(1−

km

n

)Pkm(a) +

(1−

km

n

)= 2

(1−

km

n

)→ 0

as k → +∞. For the second term, we get

∑

a∈X

|Pkm(a)− P (a)| =∑

a∈X

∣∣∣∣∣∣

∑

x∈Xm

P(m)k (x)Px(a)−

∑

x∈Xm

P⊗m(x)Px(a)

∣∣∣∣∣∣

≤∑

x∈Xm

∣∣∣P (m)k (x)− P⊗m(x)

∣∣∣∑

a∈X

Px(a) =∥∥∥P (m)

k − P⊗m∥∥∥1,

where in the first identity we used (II.11), and the last expression goes to 0 as k → +∞ byassumption.Since we have established that the codes used in (IV.44) are of constant composition P , we

get that for any m ∈ N,


α− 1

α

[R−

1

mχ♭α(Wm, P

⊗m)

]. (IV.45)

According to [50, Lemma 4.10], χ♭α(Wm, P

⊗m) ≥ χ∗α(W

⊗m, P⊗m)− 3 log vm,d, and hence


α− 1

α

[R−

1

mχ∗α(W

⊗m, P⊗m)

]+ 3

log vm,d

m(IV.46)

≤ supα>1

α− 1

α

[R− inf

m∈N

1

mχ∗α(W

⊗m, P⊗m)

]+ 3

log vm,d

m(IV.47)

for every m ∈ N, from which


α− 1

α

[R− inf

m∈N

1

mχ∗α(W

⊗m, P⊗m)

]. (IV.48)

Finally, Corollary III.22 yields the desired bound (IV.43). �

Remark IV.7 The proofs of Propositions IV.5 and IV.6 follow closely the proof of the com-position constraint-free version in [50], the main ingredients of which are a suitable adaptationof the result by Dueck and Korner [24] to the classical-quantum setting, the expression of theDueck-Korner exponent in terms of the log-Euclidean Renyi capacities [50, Theorem 5.12], andthe transition from the log-Euclidean Renyi capacities to the sandwiched Renyi capacities us-ing the asymptotic pinching technique [50, Theorem 5.14]. Making the proof work for constantcomposition codes requires some new technical ingredients, like the constant composition ver-sion of the random coding exponent (see Appendix C for a discussion) or the evaluation of thenon-i.i.d. information spectrum quantity in Appendix D. Probably the most important differencebetween the two proofs, though, is that different additivity results are used to arrive at single-letterexpressions at the end of the proofs. Indeed, at the end of the proof of [50, Theorem 5.14] weutilized that the weighted sandwiched α-radius and the sandwiched α-mutual information yieldthe same α-capacity after optimization over the input distributions (see Corollary A.11), andthe known additivity of the sandwiched α-mutual information, Iα,α(W

(1) ⊗W (2), P (1) ⊗ P (2)) =

Iα,α(W(1), P (1)) + Iα,α(W

(2), P (2)), α > 1, [10, Theorem 11]. For a fixed input distribution,however, this transition from the weighted divergence radius to the mutual information is notpossible, and therefore we needed a new additivity result for the former quantities, which weestablished in Section IIID.

20

Remark IV.8 We have defined the strong converse exponent, and stated the main result, The-orem IV.1, using the average success probability. By the standard argument of throwing awaythe worse half of any code, it can be seen immediately that Theorem IV.1 holds unchanged ifthe strong converse exponent is defined using the worst case (minimal over all messages) successprobability.

Remark IV.9 Similarly to the strong converse exponents, one can define the direct exponentsas

d(W,R, P ) := sup

{lim infn→+∞

−1

nlog(1− Ps(W

⊗n, Cn)) : lim infn→+∞

1

nlog |Cn| ≥ R

}, (IV.49)

d(W,R, P ) := sup

{lim supn→+∞

−1

nlog(1− Ps(W

⊗n, Cn)) : lim infn→+∞

1

nlog |Cn| ≥ R

}, (IV.50)

where the suprema are taken over code sequences of constant composition P . The following,so-called sphere packing bound has been shown by Dalai and Winter in [22]:

d(W,R, P ) ≤ sup0<α<1

α− 1

α[R− χα(W,P )] . (IV.51)

Note that the right-hand sides of (IV.40) and (IV.51) are very similar to each other, except thatthe range of optimization is α > 1 in the former and α ∈ (0, 1) in the latter, and the weightedRenyi radii corresponding to the sandwiched Renyi divergences appear in the former, and to thePetz-type Renyi divergences in the latter. Also, while (IV.40) holds for any R > 0 (and is non-trivial for R > χ1(W,P )), it is known that (IV.51) holds as an equality only for high enoughrates (and is non-trivial for R < χ1(W,P )) for classical channels, and it is a long-standing openproblem if the same equality is true for classical-quantum channels. More refined bounds on thedirect exponent with improved sub-exponential corrections were obtained recently by Cheng, Hsiehand Tomamichel in [18].

B. Strong converse exponent for classical-quantum channel coding with cost constraint

Assume now that using an input x has some cost γ(x) ∈ R. The average cost of an inputsequence x ∈ Xn per channel use is then

γ(x) :=1

n

n∑

k=1

γ(xk),

and the (worst-case) cost of a code Cn = (En,Dn) for n uses of the channel is

γ(Cn) := max1≤m≤|Cn|

γ(En(m)).

In the problem of classical-quantum channel coding with cost constraint, one is seeking todetermine the trade-off between the coding rate and the error asymptotics when only codes witha fixed upper bound on their cost are allowed. The direct and strong converse exponents may bedefined analogously to (IV.49)–(IV.50) and (IV.36)–(IV.39). In particular, the strong converseexponents are

scγ<c(W,R) := inf

{lim infn→+∞

−1

nlogPs(W


1

nlog |Cn| ≥ R, lim sup

nγ(Cn) < c

},

(IV.52)

scγ<c(W,R) := inf

{lim supn→+∞

−1

nlogPs(W


1

nlog |Cn| ≥ R, lim sup

nγ(Cn) < c

}.

(IV.53)

Lower bounds on the direct and the strong converse exponents were given in Lemma 4.3 andLemma 4.4 in [32], in terms of the Petz-type Renyi mutual informations Iα,1(P ) (see Section A2for definitions).

21

The exact strong converse exponent can be obtained from Theorem IV.1 using the simpleobservation that for any x ∈ Xn,

γ(x) =∑

x∈X

Px(x)γ(x) = EPx(γ),

where Px is the type of x (see Section II). Let us introduce

Pf,γ<c(X ) := {P ∈ Pf (X ) : Ep(γ) < c}.

Similarly to the constant composition case, the following lower bound follows by a straightforwardapplication of Nagaoka’s method:

Lemma IV.10 In the above setting,

scγ<c(W,R) ≥ supα∈[1,+∞]

α− 1

α

[R − sup

P∈Pf,γ<c(X )

χ∗α(W,P )

]. (IV.54)

We give the proof in Appendix B.

Note that, with the change of variables u := α−1α , the RHS in (IV.54) can be rewritten as

supα∈[1,+∞]

α− 1

α

[R− sup

P∈Pf,γ<c(X )

χ∗α(W,P )

]= sup

0≤u≤1inf

P∈Pf,γ<c(X ){uR− f(P, u)}, (IV.55)

where

f(P, u) := uχ∗1

1−u

(W,P ), u ∈ [0, 1], P ∈ Pf (X ); (IV.56)

the case u = 1 is to be interpreted as limuց1 uχ∗

11−u

(W,P ) = χ∗∞(W,P ). It is clear that f is a

concave function of P for any fixed u. A highly non-trivial counterpart, proved very recently in[16], is the following:

Lemma IV.11 For any fixed P ∈ Pf(X ), f(P, .) is convex on [0, 1].

Corollary IV.12 For any cost function γ : X → R, and any c ∈ R,

supα∈[1,+∞]

α− 1

α

[R− sup

P∈Pf,γ<c(X )

χ∗α(W,P )

]= inf

P∈Pf,γ<c(X )sup

α∈[1,+∞]

α− 1

α[R− χ∗

α(W,P )] .

(IV.57)

Proof It is clear that the function h(u, P ) := uR − f(P, u) is convex in P for any fixed u. Forany fixed P , h(., P ) is clearly continuous, and, by Lemma IV.11, it is concave on [0, 1]. Thus, byLemma II.3,

infP∈Pf,γ<c(X )

sup0≤u≤1

{uR− f(P, u)} = sup0≤u≤1

infP∈Pf,γ<c(X )

{uR− f(P, u)}. (IV.58)

Changing the variable u to α := 1/(1− u) yields (IV.57). �

The exact strong converse exponent with cost constraint is given by the following:

Theorem IV.13 In the above setting,

scγ<c(W,R) = scγ<c(W,R) = supα∈[1,+∞]

α− 1

α

[R− sup

P∈Pf,γ<c(X )

χ∗α(W,P )

]. (IV.59)

22

Proof By Lemma IV.10, it is sufficient to prove that scγ<c(W,R) is upper bounded by the RHSof (IV.59). Let P ∈ Pf (X ) by such that EP (γ) < c. By Proposition IV.6, there exists a sequenceof constant composition codes (Cn)n∈N with asymptotic composition P such that

lim supn

−1

nlogPs(W

⊗n, Cn) ≤ supα∈[1,+∞]

α− 1

α[R− χ∗

α(W,P )] .

Moreover, this code sequence can be chosen so that Pn := PEn(k), k = 1, . . . , |Cn|, satisfiessuppPn ⊆ suppP , n ∈ N, (see Corollary C.3). Since limn ‖Pn − P‖1 = 0, we have γ(Cn) =EPn

(γ) < c for all large enough n. Thus,

scγ<c(W,R) ≤ supα∈[1,+∞]

α− 1

α[R − χ∗

α(W,P )] . (IV.60)

Since this holds for every P ∈ Pf (X ) with EP (γ) < c, we finally have

scγ<c(W,R) ≤ infPf,γ<c

supα∈[1,+∞]

α− 1

α[R− χ∗

α(W,P )] = supα∈[1,+∞]

α− 1

α

[R− sup

P∈Pf,γ<c(X )

χ∗α(W,P )

],

where the equality is due to Corollary IV.57. �

As it has been shown in [50], the strong converse exponent of a cq channel W with uncon-strained coding is given by

sc(W,R) := sc(W,R) = sc(W,R) = supα>1

α− 1

α[R− χ∗

α(W )] (IV.61)

for any R > 0, where χ∗α(W ) = supP∈Pf (X ) χ

∗α(W,P ), and sc(W,R) and sc(W,R) are defined

analogously to (IV.36)–(IV.37) by dropping the constant composition constraint. It is naturalto ask whether this optimal value can be achieved, or at least arbitrarily well approximated, byconstant composition codes, i.e., whether we have

infP∈Pf (X )

sc(W,R, P ) = sc(W,R). (IV.62)

In view of Theorem IV.1, this is equivalent to whether

infP∈Pf (X )

supα∈[1,+∞]

α− 1

α[R− χ∗

α(W,P )] = supα∈[1,+∞]

infP∈Pf (X )

α− 1

α[R− χ∗

α(W,P )] . (IV.63)

The answer to this question is affirmative: Indeed, by choosing a constant cost function γ ≡ γ0,and any c > γ0, (IV.63) follows as a special case of Corollary IV.12. In fact, (IV.61) follows as aspecial case of Theorem IV.13 with the above trivial choice of c and γ.

Remark IV.14 The identity (IV.63) was also stated in [16], although only as a formal identity,as the operational interpretation of supα∈[1,+∞]

α−1α [R− χ∗

α(W,P )], (i.e., Theorem IV.1), and

hence the equivalence of (IV.63) and (IV.62), had not yet been known then.

C. Generalized cutoff rates for constant compositions and for cost constraint

The convexity of f(P, .), defined in (IV.56), also plays an important role in establishing theweighted sandwiched Renyi divergence radii as generalized cutoff rates in the sense of Csiszar[19]. Following [19], for a fixed κ > 0, we define the generalized κ-cutoff rate Cκ(W,P ) for a cqchannel W and input distribution P as the smallest number R0 satisfying

sc(W,R, P ) ≥ κ(R−R0), R > 0. (IV.64)

The following extends the analogous result for classical channels in [19] to classical-quantumchannels, and gives a direct operational interpretation of the weighted sandwiched Renyi diver-gence radius of a cq channel as a generalized cutoff rate.

23

Proposition IV.15 For any κ ∈ (0, 1),

Cκ(W,P ) = χ∗1

1−κ

(W,P ),

or equivalently, for any α > 1,

χ∗α(W,P ) = Cα−1

α(W,P ).

Proof By Theorem IV.1, we have

sc(R,W,P ) = sup0<u<1

{uR− f(P, u)} ≥ κR− f(P, κ) = κ

(R−

1

κf(P, κ)

), κ ∈ (0, 1),

where the inequality is trivial. Since f(P, .) is convex according to [16], its left and rightderivatives at κ, ∂−f(P, .)(κ) and ∂+f(P, .)(κ), exist. Obviously, for any ∂−f(P, .)(κ) ≤ R ≤∂+f(P, .)(κ),

sup0<u<1

{uR− f(P, u)} = κR− f(P, κ) = κ

(R −

1

κf(P, κ)

),

showing that 1κf(P, κ) = χ∗

11−κ

(W,P ) is the minimal R0 for which (IV.64) holds for all R > 0. �

We can analogously define the generalized κ-cutoff rate Cκ,γ<c(W ) for a cq channel W , costfunction γ and threshold c, as the smallest number R0 satisfying

scγ<c(W,R) ≥ κ(R−R0), R > 0. (IV.65)

A completely similar argument as in the proof of Proposition IV.15, using that supP∈Pf,γ<c(P, .),

as the supremum of convex functions, is convex, yields the following:

Proposition IV.16 For any κ ∈ (0, 1),

Cκ,γ<c(W ) = supP∈Pf,γ<c

χ∗1

1−κ

(W,P ),

or equivalently, for any α > 1,

supP∈Pf,γ<c

χ∗α(W,P ) = Cα−1

α,γ<c(W,P ).

Finally, the generalized κ-cutoff rate Cκ(W ) without constraints is the smallest number R0

satisfying

sc(W,R) ≥ κ(R−R0), R > 0.

Using again a constant cost function γ ≡ γ0 and c > γ0, Proposition IV.16 yields the generalizedκ-cutoff rate representation of the Renyi capacities χ∗

α(W ) = supP∈Pf (X ) χ∗α(W,P ) for any α ∈

(1,+∞) in the context of constraint-free cq channel coding as follows:

Corollary IV.17 For any κ ∈ (0, 1),

Cκ(W ) = χ∗1

1−κ

(W ),

or equivalently, for any α ∈ (1,+∞),

χ∗α(W ) = Cα−1

α(W ).

24

Appendix A: Further properties of divergence radii

1. General divergences

Here we consider, among others, the connection between the divergence radius and theweighted divergence radius for a general divergence ∆. The following is a common generalizationand simplification of several results of the same kind for various Renyi divergences [42, 49, 50, 71].

Proposition A.1 Assume that ∆ is convex and lower semi-continuous in its second argument.Then

supP∈Pf (S)

R∆,P (S) = R∆(S) (A.1)

for any S ⊆ B(H)+.

Proof We have

R∆(S) = infσ∈S(H)

sup∈S

∆(‖σ) = infσ∈S(H)

supP∈Pf (S)

∑

∈S

P ()∆(‖σ)

= supP∈Pf (S)

infσ∈S(H)

∑

∈S

P ()∆(‖σ) = supP∈Pf (S)

R∆,P (S).

The first equality above is by definition, and the second one is trivial. The third one follows fromLemma II.3 by noting that

∑∈S P ()∆(‖σ) is convex and lower semi-continuous in σ on the

compact set S(H), and it is trivially concave (in fact, affine) on the convex set Pf (S). The lastequality is again by definition. �

Note that if ∆ is additive on tensor products then the corresponding divergence radius issubadditive by definition: for any S(i) ⊆ B(H(i))+, i = 1, 2,

R∆(S(1) ⊗S(2)) = inf

σ12∈S(H(1)⊗H(2))sup

1∈S(1),2∈S(2)

∆(1 ⊗ 2‖σ12)

≤ infσ1∈S(H(1)), σ2∈⊗S(H(2))

sup1∈S(1),2∈S(2)

∆(1 ⊗ 2‖σ1 ⊗ σ2)

= R∆(S(1)) +R∆(S

(2)), (A.2)

where S(1) ⊗S(2) := {1 ⊗ 2 : i ∈ S(i), i = 1, 2}. It is easy to see that if ∆ satisfies theconditions of Proposition A.1, and the weighted ∆-radius is additive, then so is the ∆-radius:

Proposition A.2 Assume that ∆ is convex and lower semi-continuous in its second argumentand it is additive on tensor products, and the weighted ∆-radius is additive in the sense thatR∆,P (1)⊗P (2) = R∆,P (1) + R∆,P (2) for any P (i) ∈ Pf (B(H(i))+, i = 1, 2. Then the ∆-radius isadditive, i.e., (A.2) holds as an equality.

Proof We have already seen subadditivity of the ∆-radius in (A.2). The converse inequalityfollows by

R∆(S(1) ⊗S(2)) = sup

P12∈Pf (S(1)⊗S(2))

R∆,P12(S(1) ⊗S(2))

≥ supP1∈Pf (S(1)), P2∈Pf (S(2))

R∆,P1⊗P2(S(1) ⊗ S(2))

= supP1∈Pf (S(1)), P2∈Pf (S(2))

{R∆,P1(S

(1)) +R∆,P2(S(2))}

= R∆(S(1)) +R∆(S

(2)),

where the first and the last identities are due to Proposition A.1, the inequality is trivial, andthe second equality follows by the additivity assumption on the weighted ∆-radius. �

25

Remark A.3 In the formalism of gcq channels, Proposition A.2 says that if ∆ is convex andlower semi-continuous in its second argument then

χ∆(W ) = R∆(ranW )

for any gcq channel W , and if ∆ is also additive on tensor products, then the additivity

χ∆(W(1) ⊗W (2), P (1) ⊗ P (2)) = χ∆(W

(1), P (1)) + χ∆(W(2), P (2))

for some gcq channels W (i) : X (i) → B(H(i))+ and any P (i) ∈ Pf (X(i)), i = 1, 2, implies the

additivity

χ∆(W(1) ⊗W (2)) = χ∆(W

(1)) + χ∆(W(2)).

We discuss this further for the case of α-z divergences at the end of Section A 2.

When ∆ is non-negative on suppP for some P ∈ Pf (B(H)) in the sense that ∆(‖σ) ≥ 0 forall ∈ suppP and σ ∈ S(H), it is possible to define a continuous approximation between the ∆-radius and the P -weighted ∆-radius as follows. Define for every β ∈ [1,+∞] the (P, β)-weighteddivergence radius of a set S ⊆ B(H)+ as

R∆,P,β(S) := infσ∈S(H)

‖∆(.‖σ)‖P,β := infσ∈S(H)

(∑∈S P ()∆(‖σ)β

)1/β, β ∈ [1,+∞),

sup∈suppP ∆(‖σ), β = +∞.(A.3)

Just like before, a σ ∈ S(H) is called a (P, β)-weighted ∆-centre if it attains the infimum in(A.3). Again, R∆,P,β(S) is in fact independent of S, and hence we will often drop it from thenotation.Note that R∆,P,1 = R∆,P , and when S is finite and suppP = S then R∆,P,+∞(S) = R∆(S).

In general, though, we need a further optimization to recover R∆ from R∆,P,+∞. According towell-known properties of the β-norms,

R∆,P,β1 ≤ R∆,P,β2 when β1 ≤ β2, and R∆,P,β ր R∆,P,+∞ as β ր +∞ (A.4)

for any P ∈ Pf (B(H)+). Moreover, it is clear from the definitions that

supP∈Pf (S)

R∆,P,β ≤ R∆(S) (A.5)

for any S ⊆ B(H)+ and β ∈ [1,+∞]. Under the conditions of Proposition A.1, the above holdsas an equality:

Proposition A.4 Assume that ∆ is non-negative on some S ⊆ B(H)+, and convex and lowersemi-continuous in its second argument. Then

supP∈Pf (S)

R∆,P,β(S) = R∆(S) (A.6)

for any β ∈ [1,+∞].

Proof Due to (A.5) and the monotonicity (A.4), it is enough to prove the assertion for β = 1,which has already been established in Proposition A.1. �

In the rest of the section we explore some properties of the divergence radius R∆. We willdenote the set of ∆-centers of S by C∆(S).

Lemma A.5 The ∆-radius satisfies the following simple properties.

(i) The ∆-radius is a monotone function of S, i.e., if S ⊆ S′ then R∆(S) ≤ R∆(S′).

(ii) If S ⊆ S′ and R∆(S) = R∆(S′) then C∆(S

′) ⊆ C∆(S).

26

(iii) If ∆ is quasi-convex in its first argument then R∆(S) = R∆(conv(S)) and C∆(S) =C∆(conv(S)).

(iv) If ∆ is lower semi-continuous in its first argument then R∆(S) = R∆(S) and C∆(S) =C∆(S).

Proof The monotonicity in (i) is obvious.Assume that the conditions of (ii) hold, and that σ is a ∆-centre for S′ (if C∆(S) = ∅ then

the assertion holds trivially). Then

R∆(S) ≤ sup∈S

∆(‖σ) ≤ sup∈S′

∆(‖σ) = R∆(S′) = R∆(S),

from which sup∈S ∆(‖σ) = R∆(S), i.e., σ is a ∆-centre of S.As a consequence of (i) and (ii), in (iii) we only have to prove that R∆(S) ≥ R∆(conv(S)) and

C∆(S) ⊆ C∆(conv(S)), and analogously in (iv), with S in place of conv(S).Assume that ∆ is quasi-convex in its first argument. For any ′ ∈ conv(S), there exists a

finitely supported probability distribution P′ ∈ Pf (S) such that ′ =∑

∈S P′(), and hence

∆(′‖σ) ≤ max∈suppP′∆(‖σ) ≤ sup∈S ∆(‖σ) for any σ ∈ S(H). Taking the supremum in

′ ∈ conv(S) and then the infimum in σ ∈ S(H) yields R∆(conv(S)) ≤ R∆(S). If σ ∈ C∆(S)then

R∆(conv(S)) ≤ sup′∈conv(S)

∆(′‖σ) ≤ sup′∈conv(S)

sup∈suppP′

∆(‖σ) = sup∈S

∆(‖σ) = R∆(S)

= R∆(conv(S)),

from which R∆(conv(S)) = sup′∈conv(S) ∆(′‖σ), i.e., σ ∈ C∆(conv(S)).

Assume now that ∆ is l.s.c. in its first argument, and let ′ ∈ S. Then there exists a sequence(n)n∈N ⊆ S converging to ′, and hence, by lower semi-continuity,

∆(′‖σ) ≤ lim infn→+∞

∆(n‖σ) ≤ sup∈S

∆(‖σ), σ ∈ S(H). (A.7)

Taking the supremum in ′ ∈ S and then the infimum in σ ∈ S(H) yields R∆(S) ≤ R∆(S). Ifσ ∈ C∆(S) then

R∆(S) ≤ sup′∈S

∆(′‖σ) ≤ sup∈S

∆(‖σ) = R∆(S) = R∆(S),

where the second inequality is due to (A.7). This yields that R∆(S) = sup′∈S ∆(′‖σ), i.e.,

σ ∈ C∆(S). �

Corollary A.6 If ∆ is quasi-convex and lower semi-continuous in its first argument then

R∆(S) = R∆(conv(S)) and C∆(S) = C∆(conv(S)).

for any S.

As a consequence, when studying the divergence radius for a divergence with the above prop-erties, we can often restrict our investigation to closed convex sets without loss of generality.

Proposition A.7 Assume that ∆ satisfies (A.1) for any S ⊆ B(H)+. Then R∆ is continuouson monotone increasing nets of subsets of B(H)+, i.e., if S ⊆ P (B(H)+) (all the subsets ofB(H)+) such that for all S, S′ ∈ S there exists an S′′ ∈ S with S ∪ S′ ⊆ S′′ then

R∆(∪S) = supS∈S

R∆(S). (A.8)

In particular, for any S ⊆ B(H)+,

R∆(S) = sup{R∆(S′) : S′ ⊆ S, S′ finite}. (A.9)

27

Proof It is clear from the monotonicity stated in Lemma A.5 that

R∆(∪S) ≥ supS∈S

R∆(S),

and hence we only have to prove the converse inequality. To this end, let

c < R∆(∪S) = supP∈Pf (∪S)

R∆,P (∪S),

where the equality holds by assumption. Then there exists a P ∈ Pf (∪S) for which c <R∆,P (∪S), and there exists an S ∈ S such that suppP ⊆ S, and hence R∆,P (∪S) = R∆,P (S).From this we get (A.8), and (A.9) follows immediately. �

Remark A.8 Theorem 3.5 in [57] states A.1 for the relative entropy. However, in their proofthey use (A.9) without any explanation. In the proof above we assumed that A.1 holds, so thequestion arises whether the proof in [57] can be made complete in some other way.

2. Generalized mutual information

For any gcq channel W : X → B(H)+, we define the lifted channel

W : X → S(HX ⊗H), W(x) := |x〉〈x| ⊗W (x).

Here, HX is an auxiliary Hilbert space, and {|x〉 : x ∈ X} is an orthonormal basis in it. Asa canonical choice, one can use HX = l2(X ), the L2-space on X with respect to the countingmeasure, and choose |x〉 := 1{x} to be the characteristic function (indicator function) of thesingleton {x}. Note that this is well-defined irrespectively of the cardinality of X . The classical-quantum state

W(P ) :=∑

x∈X

P (x) |x〉〈x| ⊗W (x)

plays the role of the joint distribution of the input and the output of the channel for a fixedfinitely supported input probability distribution P ∈ Pf (X ).

For a general divergence ∆ and a gcq channel W : X → B(H)+, we may define the ∆-mutualinformation between the classical input and the quantum output of the channel for a fixed inputdistribution P ∈ Pf (X ) as

I∆(W,P ) := infσ∈S(H)

∆(W(P )‖P ⊗ σ). (A.10)

The mutual information and the weighted divergence radius are different quantities in general(as we see below). However, they are equal if the divergence satisfies some simple properties:

Lemma A.9 Assume that ∆ is block additive and homogeneous. Then

I∆(W,P ) = χ∆(W,P )

for any gcq channel W and input distribution P .

28

Proof We have

I∆(W,P ) = infσ∈S(H)

∆(W(P )‖P ⊗ σ)

= infσ∈S(H)

∆

(∑

x∈X

P (x) |x〉〈x| ⊗W (x)∥∥∥∑

x∈X

P (x) |x〉〈x| ⊗ σ

)

= infσ∈S(H)

∑

x∈X

∆(P (x) |x〉〈x| ⊗W (x)‖P (x) |x〉〈x| ⊗ σ)

= infσ∈S(H)

∑

x∈X

P (x)∆ (|x〉〈x| ⊗W (x)‖ |x〉〈x| ⊗ σ)

= infσ∈S(H)

∑

x∈X

P (x)∆ (W (x)‖σ)

= χ∆(W,P ),

where the first two equalities are by definition, the third follows by block additivity, the fourthby homogeneity, the fifth by isometric invariance (see the beginning of Section IIIA), and thelast one is again by definition. �

In particular, all Qα,z are block additive and homogenous, and hence we have

Corollary A.10 For any (α, z), any gcq channel W and input distribution P ∈ Pf (X ), we have

IQα,z(W,P ) = χQα,z

(W,P ).

Note that the relative entropy D = D1 is also block additive and homogeneous, and hence thecorresponding mutual information and weighted D-radius coincide for any input distribution P :

χ1(W,P ) = I1(W,P ). (A.11)

Moreover, the relative entropy is even more special, as for any cq channel W , the P -weightedD-center coincides with the minimizer in (A.10), and can be explicitly given as W (P ). Indeed,for any state σ ∈ S(H), we have the simple identities (attributed to Donald)

D (W(P )‖P ⊗ σ) =∑

x∈X

P (x)D(W (x)‖σ) = D(W (P )‖σ) +∑

x∈X

P (x)D(W (x)‖W (P ))

= D(W (P )‖σ) +D (W(P )‖P ⊗W (P )) ,

and the assertion follows from the strict positivity of the relative entropy on pairs of states. (Notethat for this it is necessary that all the W (x) are normalized on suppP .) χ1(W,P ) = I1(W,P )is called the Holevo quantity of the ensemble {W (x), P (x)}x∈suppP in the quantum informationtheory literature.

It is easy to see that Dα,z is not block additive if α 6= 1, and in general the Dα,z mutualinformation Iα,z(W,P ) := IDα,z

(W,P ) and the weighted Dα,z divergence radius χα,z(W,P ) aredifferent quantities. However, they can be related by a simple inequality, as

Iα,z(W,P ) = infσ∈S(H)

1

α− 1logQα,z

(∑

x∈X

P (x) |x〉〈x| ⊗W (x)∥∥∥∑

x∈X

P (x) |x〉〈x| ⊗ σ

)(A.12)

= infσ∈S(H)

1

α− 1log∑

x∈X

P (x)Qα,z(W (x)‖σ) (A.13)

≤ infσ∈S(H)

∑

x∈X

P (x)1

α− 1logQα,z(W (x)‖σ) (if α ∈ (0, 1))

= infσ∈S(H)

∑

x∈X

P (x)Dα,z(W (x)‖σ)

= χα,z(W,P ),

29

where the second equality follows as in the proof of Lemma A.9, and the inequality is due tothe concavity of the logarithm. Obviously, the inequality holds in the opposite direction whenα > 1.The above inequality is in general strict. As an example, consider a noiseless channel as in

Example III.17 with a non-uniform input distribution P . Then we have

Iα,z(W,P ) ≤ Dα,z(W(P )‖P ⊗W (P )) =1

α− 1log∑

x∈X

P (x)Qα,z(W (x)‖W (P ))

=1

α− 1log∑

x∈X

P (x)P (x)1−α <∑

x∈X

P (x)1

α − 1logP (x)1−α

= H(P ) = χα,z(W,P ),

where the first inequality is by definition, the strict inequality follows from the strict concavityof the logarithm, and the last equality is due to Example III.17.While the mutual information Iα,z for Dα,z differs from the weighted channel radius χα,z for

Dα,z, it is a simple function of the weighted channel radius for Qα,z (and thus, by CorollaryA.10, also of the mutual information for Qα,z); indeed, moving the infimum over σ behind thelogarithm in (A.12) and (A.13), respectively, yields

Iα,z(W,P ) =1

α− 1log s(α)IQα,z

(W,P ) =1

α− 1log s(α)χQα,z

(W,P ). (A.14)

Moreover, the difference between Iα,z and Dα,z vanishes after optimizing over the input distri-bution:

Corollary A.11 If (α, z) are such that Dα,z is convex in its second argument then

supP∈Pf (X )

χα,z(W,P ) = RDα,z(ranW ), (A.15)

and if α = 1 or α 6= 1 and (α, z) are such that Qα,z is convex in its second argument then

supP∈Pf (X )

Iα,z(W,P ) = RDα,z(ranW ) (A.16)

for any gcq channel W : X → S(H).

Proof The identity in (A.15) is immediate from Proposition A.1. When α = 1, we haveI1,z(W,P ) = χ1,z(W,P ), and hence for the rest we assume that α 6= 1. The identity in (A.16)follows as

supP∈Pf (X )

Iα,z(W,P ) =1

α− 1log s(α) sup

P∈Pf (X )

χQα,z(W,P )

=1

α− 1log s(α)RQα,z

(ranW )

=1

α− 1log s(α) inf

σ∈S(H)supx∈X

Qα,z(W (x)‖σ)

= infσ∈S(H)

supx∈X

1

α− 1log s(α)Qα,z(W (x)‖σ)

= infσ∈S(H)

supx∈X

Dα,z(W (x)‖σ)

= RDα,z(ranW )

where the first equality is due to (A.14), the second one is due to Proposition A.1, and the restare obvious. �

Remark A.12 The above proof method is due to Csiszar [19], and extends various prior resultsfor different quantum Renyi divergences and ranges of parameters (α, z) in [42, 49, 50, 71]. Thespecial case

supP∈Pf (X )

χ1(W,P ) = RD(ranW ) = supP∈Pf (X )

I1(W,P )

was proved by different methods in [57, 63, 68].

30

Remark A.13 Mutual information quantifies the amount of correlation in a bipartite state bymeasuring its distance from the set of uncorrelated states. Following this general idea, one maygive two alternative definitions of mutual information between the input and the output of a gcqchannel W for a fixed input distribution P using a general divergence ∆: the ∆-“distance” ofW(P ) from the product of its marginals P ⊗W (P ):

I(2)∆ (W,P ) := ∆ (W(P )‖P ⊗W (P )) ,

or the ∆-“distance” of W(P ) from the set of uncorrelated states:

I(1)∆ (W,P ) := inf

ω∈S(HX ), σ∈S(H)∆(W(P )‖ω ⊗ σ) .

Both of these options seem more natural than the curiously asymmetric optimization in (A.10),and one might wonder why it is nevertheless the capacity χ∗

α(W ) := supP∈Pf (X ) I∗α(W,P ) cor-

responding to the version (A.10) (for the sandwiched Renyi divergence) that seems to obtain anoperational significance in channel coding, according to [19, 50].There seems to be at least two different resolutions of this question: First, as the more de-

tailed analysis of constant composition channel coding shows (according to [19] and our CorollaryIV.15), it is in fact not the mutual information, but the weighted divergence radius that obtainsa natural operational interpretation in channel coding; the mutual information only enters thepicture because its optimized version over all input distributions “happens to” coincide with theoptimized version of the divergence radius. In this context it would be interesting to know if anyof the trivial inequalities

supp∈Pf (X )

I(1)∆ (W,P ) ≤ sup

p∈Pf (X )

I∆(W,P ) ≤ supp∈Pf (X )

I(2)∆ (W,P )

holds as an equality for general W and P (at least for ∆ = D∗α with α > 1).

Second, according to (A.14), the mutual information for Renyi divergences is simply a functionof another weighted divergence radius, corresponding to the Q quantities rather than the Renyidivergences, and the optimization over σ is simply the optimization over the candidates for theweighted divergences centers, which is very natural, and in this context the problem of asymmetrydoes not even makes sense.

The same arguments as in Sections III C and III D yield the following statements, and hencewe omit their proofs.

Lemma A.14 Let σ be a P -weighted Qα,z center for W .

(1) If (α, z) is such that Qα,z is quasi-convex in its second argument then σ0 ≤W (P )0.

(2) If α > 1 or α ∈ (0, 1) and 1− α < z < +∞ then W (P )0 ≤ σ0.

Let us define ΓQ to be the set of (α, z) values such that for any gcq channel W and any input

probability distribution P , any P -weighted Qα,z center σ for W satisfies σ0 = W (P )0. ThenLemmas III.5, III.7, and A.14 yield

ΓQ ⊇ {(α, z) : α ∈ (0, 1), 1− α < z +∞} ∪ {(α, z) : α > 1, z ≥ max{α/2, α}} .

Proposition A.15 Assume that (α, z) ∈ ΓQ are such that Qα,z is convex in its second variable.

Then σ is a P -weighted Qα,z center for W if and only if it is a fixed point of the map

ΦW,P,Qα,z(σ) :=

1

τ

∑

x∈X

P (x)(σ

1−α2z W (x)

αz σ

1−α2z

)z, (A.17)

σ ∈ SW,P (H)++, where τ is the normalization factor

τ :=∑

x∈X

P (x)Qα,z(W (x)‖σ).

31

Remark A.16 Note that the fixed point equations in (A.17) and (III.29) look very similar, exceptthat the normalization of sigma is obtained “globally” in the former, and for each x individuallyin the latter.

Remark A.17 As we have mentioned above, the relative entropy (α = 1) is special in that for cqchannels the minimizer for the mutual information can be explicitly determined (as W (P )), andhence also the mutal information can be given by an explicit formula. The only other known familyof quantum Renyi divergences with these properties are the Petz-type Renyi divergences (corre-sponding to z = 1). Indeed, it is easy to verify that in this case we have the classical-quantumSibson identity [42, Lemma 2.2] Dα,1(W(P )‖P ⊗ σ) = Dα,1(σα,1‖σ) +

1α−1 log(Trω(α))

α, where

ω(α) :=

(∑

x

P (x)W (x)α

)1/α

, and σα,1 := ω(α)/Trω(α).

As a consequence, σα,1 is the unique minimizer for Iα,1(W,P ), and, by (A.14), it is also the

unique P -weighted Qα,1 center. Moreover,

χQα,1(W,P ) =

∑

x∈X

P (x)Qα,1(W (x)‖σα,1) = s(α)(Trω(α))α = s(α)

Tr

(∑

x

P (x)W (x)α

)1/α

α

.

Proposition A.18 (Additivity of the Dα,z mutual information) Let W (1) : X (1) → S(H(1)) and

W (2) : X (2) → S(H(2)) be gcq channels, and P (i) ∈ Pf (X (i)), i = 1, 2, be input distributions.Assume, moreover, that α and z satisfy the conditions of Proposition A.15. Then

χQα,z

(W (1) ⊗W (2), P (1) ⊗ P (2)

)= χQα,z

(W (1), P (1)

)· χQα,z

(W (2), P (2)

), (A.18)

Iα,z

(W (1) ⊗W (2), P (1) ⊗ P (2)

)= Iα,z

(W (1), P (1)

)+ Iα,z

(W (2), P (2)

). (A.19)

Remark A.19 In [34], a generalized notion of mutual information was studied, where the bi-partite state need not be classical-quantum, and the first marginal of the second argument is fixedbut not necessarily equal to the first marginal of the first argument. For sandwiched Renyi di-vergences a characterization of the optimal state in terms of a fixed point equation, as well asthe additivity of the generalized mutual information was obtained. Our approach is essentiallythe same as that of [34], and Propositions A.15 and A.18 are special cases of the results of [34]when z = α.

Remark A.20 Note that for the special case z = 1, the relations (A.18) and (A.19) followimmediately from the explicit expression for the minimizer in Remark A.17.

Finally, Corollary A.11, Propositions A.2 and A.18, and Theorem III.20 yield the following:

Proposition A.21 Assume that (α, z) are such that (α, z) ∈ ΓD and Dα,z is convex in its second

variable, or (α, z) ∈ ΓQ and Qα,z is convex in its second variable. Then the (α, z)-capacity is

additive in the sense that for any qcq channels W (i) : X (i) → B(H(i))+, i = 1, 2,

χα,z(W(1) ⊗W (2)) = χα,z(W

(1)) + χα,z(W(2)).

Proof The assertion under the first set of conditions follows immediately from Theorem III.20,Corollary A.11, and Proposition A.2, as we have already essentially stated in Remark A.3. Corol-lary A.11, and Propositions A.2 and A.18 yield the assertion under the second set of conditionsby a completely similar argument. �

Remark A.22 Additivity of the χα,1 capacities for classical-quantum channels and α > 1 wasfirst proved in [56, Lemma 2] by different methods from the above, following the method ofArimoto [5]. We credit the book [32] (especially Exercise 4.30) for the idea of the above approach,using the identity in (A.16) and the additivity of the mutual information. We remark that in [32],only the case z = 1 was considered, where additivity of the mutual information can be obtainedwithout Proposition A.18, as pointed out in Remark A.17.

32

3. PSD divergence center and radius

Note that while we defined the divergence radius and center for an arbitrary non-empty subsetS of B(H)+, in the definition of the center we restricted to density operators, i.e., PSD operatorswith trace 1. This is a natural choice when the set S consists of density operators itself, and,even more importantly, it leads to operationally relevant information measures as our main result,Theorem IV.1 shows.Moreover, restricting the set of possible divergence centers to density operators may be op-

erationally motivated even when the elements of the set S are not normalized to have trace 1.Indeed, the optimal success probability of discriminating quantum states 1, . . . , r with priorprobabilities p1, . . . , pr can be expressed as

P ∗s = exp

(RD∗

∞({p11, . . . , prr})

),

where D∗∞ := limα→+∞D∗

α = limα→+∞Dα,α is the max-relative entropy [23, 51, 60]. Thisfollows by simply rewriting the optimal success probability P ∗

s := max{∑r

i=1 piTr iMi :(Mi)

ri=1 POVM} using the duality of linear programming, as was done in [73] (see also [43]).

In this section we consider the alternative approach where the divergence center is allowedto be a general PSD operator. This is largely motivated by recent investigations in matrixanalysis regarding various concepts of multi-variate geometric matrix means, in particular, bythe approach of [12]. We comment on this in more detail at the end of the section.For a general divergence ∆, we define the P -weighted PSD ∆-radius as

R∆,P (S) := infσ∈B(H)+

∑

∈S

P ()∆(‖σ),

and we call any σ ∈ B(H)+ that attains the above infimum a P -weighted PSD ∆-center. For agcq channel W : X → B(H)+ and P ∈ Pf(X ) we define the P -weighted PSD ∆-radius of W asbefore:

χ∆(W,P ) := R∆,P◦W−1(ranW ).

Any PSD minimizer in the definition of R∆,P◦W−1(ranW ) will be called a P -weighted PSD∆-center for the channel W .It is easy to see that these quantities are meaningless for the Renyi divergences considered

before, as we always have

RDα,z ,P (S) = −∞, RQα,z,P(S) =

{−∞, α ∈ (0, 1),

0, α > 1,

due to the scaling laws

Dα,z(‖λσ) = Dα,z(‖σ)− logλ, Qα,z(‖λσ) = λ1−αQα,z(‖σ), λ ∈ (0,+∞).

Hence, in order to make sense of the PSD divergence radius, it seems necessary to modify thenotion of Renyi divergence for PSD operators.We consider two such options, motivated by Proposition A.28 in Section A4. One is a simple

rescaling of Dα,z, defined as

Dα,z(‖σ) :=1

α− 1log

Qα,z(‖σ)

(Tr )α(Tr σ)1−α=

1

α− 1logQα,z(‖σ)−

α

α− 1logTr + logTr σ

= Dα,z − logTr + logTrσ = Dα,z

(

Tr

∥∥∥ σ

Tr σ

).

The limit α→ 1 yields

D1(‖σ) := limα→1

Dα,z(α)(‖σ) = D1(‖σ) + logTr − log Trσ = D

(

Tr

∥∥∥ σ

Tr σ

)

33

for any function α 7→ z(α) that is continuously differentiable in a neighbourhood of 1, on which

z(α) 6= 0 [47]. Obviously, Dα,z(‖σ) = Dα,z(‖σ) for any pair of states , σ. Note that Dα,z is

a projective divergence, i.e., Dα,z(λ‖σ) = Dα,z(‖λσ) = Dα,z(‖σ) for any , σ ∈ B(H)+ andλ ∈ (0,+∞). As an immediate consequence, we have

RDα,z,P(S) = RDα,z ,P (S)−

∑

P () log Tr ,

i.e., we essentially recover the previously considered concept of the Dα,z-radius, apart from an

uninteresting term. Obviously, if σ is a PSD Dα,z-center then so is λσ for all λ ∈ (0,+∞).

Moreover, the normalized PSD Dα,z-centers can be characterized by the fixed point equation inTheorem III.14 for the (α, z) pairs for which Theorem III.14 holds.The other option we consider is a modification of the Tsallis quantum Renyi divergences, or

Tsallis relative entropies, defined as

Tα,z(‖σ) :=1

1− α(αTr + (1− α)Tr σ −Qα,z(‖σ)) .

We will call these quantities Tsallis (α, z)-divergences. The limit α → 1 yields

T1(‖σ) := limα→1

Tα,z(α)(‖σ) = D(‖σ)− Tr +Tr σ,

for any function α 7→ z(α) that is continuously differentiable in a neighbourhood of 1, on whichz(α) 6= 0 [47].

Remark A.23 The usual way to define the quantum Tsallis relative entropy is

T ′α(‖σ) :=

1

1− α(Tr −Qα,1(‖σ)) .

Note that T ′α coincides with Tα,1 on pairs of density operators but, unlike Tα,1, T

′α is not positive

on pairs of PSD operators for any (α, z). Moreover, the PSD divergence radius problem is trivialfor these quantities, as

RT ′α,z

(S) =

{−∞, α ∈ (0, 1),1

1−α

∑ P (), α > 1.

We consider the PSD divergence center for Tα,z in the gcq channel formalism, for easiercomparison with Theorem III.14 and Proposition A.15.

Proposition A.24 Let W : X → B(H)+ be a qcq channel.

(i) For any α ∈ (0,+∞) \ {1} and z ∈ (0,+∞),

χTα,z(W,P ) =

α

1− α

[TrW (P )−

(s(α)χQα,z

(W,P ))1/α]

. (A.20)

(ii) If σ is a P -weighted PSD Tα,z-center for W then Tr σ =∑

x P (x)Qα,z(W (x)‖σ), and

σ = σ/Tr σ is a P -weighted Qα,z-center for W .

(iii) If σ ∈ S(H) is a P -weighted Qα,z-center for W then

σ :=

(∑

x

P (x)Qα,z(W (x)‖σ)

)1/α

σ

is a P -weighted PSD Tα,z-center for W .

34

(iv) If (α, z) satisfy the conditions of Proposition A.15 then σ is a P -weighted PSD Tα,z-centerfor W if and only if it is a solution of the fixed point equation

σ =∑

x∈X

P (x)(σ

1−α2z W (x)

αz σ

1−α2z

)z. (A.21)

Proof We have

χTα,z(W,P ) = inf

σ∈B(H)+

∑

x

P (x)Tα,z(W (x)‖σ)

=α

1− αTrW (P ) + inf

σ∈B(H)+

[Tr σ −

1

1− α

∑

x


]

=α


σ∈S(H)infλ>0

[λ−

1

1− αλ1−α

∑

x


].

Differentiating w.r.t. λ yields that the optimal λ is

λ =

(∑

x


)1/α

=∑

x

P (x)Qα,z(W (x)‖σ), (A.22)

with σ := λσ. Writing it back to the previous equation, we get

χTα,z(W,P ) =

α


σ∈S(H)

α

α− 1

(∑

x


)1/α

,

which is exactly (A.20). This proves (i), and (ii) and (iii) are clear from the above argument.Finally, (iv) is immediate from the above and Proposition A.15. �

Remark A.25 Note that (i)–(iii) of Proposition A.24 hold true for any pair of divergences T qα

and Qqα related as

T qα(‖σ) =

1

1− α(αTr + (1− α)Tr σ −Qq

α(‖σ)) ,

provided that Qqα is a non-negative divergence that satisfies the scaling property Qq

α(‖λσ) =λα−1Qq

α(‖σ), and that there exists a σ ∈ B(H)+ such that Qqα(W (x)‖σ) < +∞ for all x ∈

suppP .

The above Proposition extends Theorem 8 in [13], where the fixed point characterization (A.21)was obtained in the case z = α = 1/2, by a somewhat different proof than above. The existenceof a solution of the fixed point equation (A.21) was studied in [1, Theorem 6.1] for the casez = α = 1/2, and their proof extends without alteration for more general (α, z) pairs as below.

Proposition A.26 Let α ∈ (0, 1). If there exist positive numbers λ, η, such that W (x) ∈[λI, ηI] := {A ∈ B(H)+ : λI ≤ A ≤ ηI} for all x ∈ suppP then the fixed point equation(A.21) has a solution, which is also in [λI, ηI].

Proof It is easy to see that the map on the RHS of (A.21) maps the compact convex set [λI, ηI]into itself, and hence, by Brouwer’s fixed point theorem, it has a fixed point. �

Remark A.27 The case of the Petz-type Tsallis divergences (z = 1) is again special in thatthe weighted PSD Tα,1 center and radius can be given explicitly. Indeed, by Remark A.17 andProposition A.24, the unique Tα,1 center is given by

σα,1 =

(∑

x

P (x)Qα,1(W (x)‖σα,1)

)1/αω(α)

Trω(α)= ω(α) =

(∑

x

P (x)W (x)α

)1/α

,

35

and

χTα,z(W,P ) =

α

α− 1

TrW (P )− Tr

(∑

x

P (x)W (x)α

)1/α .

Let us now explain how the above considerations are related to recent research in matrixanalysis. First, we recall that if 1, . . . r are points in a metric space (M,d), and (pi)

ri=1 is a

probability distribution then the p-weighted Frechet variance is infσ∈M

∑i pid

2(i, σ), and if σattains this infimum then it is called a p-weighted Frechet mean of 1, . . . r. These are analogousto our notions of weighted divergence radius and center. Note, however, that we only defineddivergences on PSD operators, which is more restrictive than the setting of the Frechet means,while it is also more general in the sense that a divergence does not need to be the squareof a metric. (Although some properties of the relative entropy are reminiscent to those of asquared Euclidean distance.) In particular, if f is an injective continuous function on an intervalM ⊆ R, then d(x, y) := |f(x)−f(y)| is a metric onM , and a straightforward computation showsthat for any 1, . . . , r ∈ M , and any probability distribution (pi)

ri=1, there is a unique Frechet

mean, which is exactly the generalized f -mean f−1 (∑r

i=1 pif(i)), and the Frechet variance is∑i pif(i)

2 − (∑

i pif(i))2.

Of particular importance to us are the cases M := (0,+∞), f(t) := tα, which yields the

α-power mean (∑

i piαi )

1/α, and f(t) := log t on the same set, which yields the geometric mean∏i

pi

i . These special cases are closely related to each other, as the geometric mean can berecovered from the α-power mean in the limit α ց 0,

limαց0

(∑

i

piαi

)1/α

=∏

i

pi

i ,

while the α-power mean is the unique solution of the fixed point equation

σ =∑

i

piGα(i‖σ), (A.23)

where Gα(i‖σ) := αi σ1−αi is the α-geometric mean of i and σ.

There are various ways to extend the α-geometric mean to operators, and these are closelyrelated to different definitions of Renyi divergences. Indeed, for every (α, z), the quantity

Gα,z(‖σ) :=(σ

1−α2z

αz σ

1−α2z

)z

is well-defined if α ∈ (0, 1), or α > 1 and 0 ≤ σ0, and it reduces to the α-geometric meanfor scalars for every z > 0. This is related to the (α, z)-Renyi divergences via Qα,z(‖σ) =TrGα,z(‖σ). The fixed point equation (A.21) is an exact analogue of (A.23), and hence wemay call the solution of (A.21) the P -weighted (α, z)-power mean of (W (x))x∈X , and denote itas Pα,z(W,P ), provided that it exists and is unique. (From here we switch to the gcq channelformalism for better comparison with the preceding part of the paper, but this is equivalentto considering subsets of PSD operators and probability distributions on them.) That is, theP -weighted (α, z)-power mean is nothing but the P -weighted Tα,z-center for W (or, in otherterminology, the P ◦W−1-weighted Tα,z-center of ranW ). In general, there is no explicit formulafor it; the case z = 1, discussed in Remark A.27, is an exception, and the resulting formula isprobably the most straightforward extension of the α-power mean from numbers to operators.Following the ideas of [46], a family of multivariate geometric means for operators may be definedas G(W,P ) := limαց0 Gα,z(α)(W,P ), where z(α) is some well-behaved funcion of α. It is aninteresting question if this limit always exists, how it depends on the choice of z(α), and how itrelates to other notions of multivariate geometric means.Probably the most studied notion of α-geometric mean for a pair of operators is the Kubo-Ando

geometric mean [44]

Gmaxα (‖σ) := σ1/2

(σ−1/2σ−1/2

)ασ1/2,

36

introduced by Kubo and Ando for α ∈ [0, 1]. This is also a special instance of a maximal f -divergence [37, 48] (with a minus sign for α ∈ (0, 1)), and can be extended to α > 1. It gives riseto the maximal Renyi divergence [67] Dmax

α and the maximal Tsallis divergence Tmaxα via

Dmaxα (‖σ) :=

1

α− 1log

1

Tr Qmax

α (‖σ),

Tmaxα (‖σ) :=

1

1− α(αTr + (1 − α)Tr σ −Qmax

α (‖σ)) ,

where Qmaxα (‖σ) := TrGmax

α (‖σ) for positive definite and σ, and it is extended to general PSDoperators via the smoothing procedure in (III.20). (In fact, Tmax

α is also a maximal f -divergence,corresponding to the convex function f(t) = 1

1−α (αt + (1 − α) − tα), which is operator convex

if and only if α ∈ [0, 2].) The positive version of Dmaxα can be defined again as Dmax

α (‖σ) :=

Dmaxα

(

Tr

∥∥ σTrσ

). Lim and Palfia [46] showed that when allW (x) are positive definite, the fixed

point equation

σ =∑

x

P (x)Gmaxα (W (x)‖σ)

has a unique solution, which we will call the max α-power mean and denote it as Pmaxα (W,P ).

Moreover,

limαց0

Pmaxα (W,P ) = GK(W,P ),

where the latter is the Karcher mean, which, in our terminology, is nothing else but the P ◦W−1-weighted PSD (D∗

∞)2-center of ranW , i.e.,

GK(W,P ) = argminσ∈B(H)+

∑

x

P (x)(D∗∞(W (x)‖σ))2.

By Remark A.25, (i)–(iii) of Proposition A.24 hold true for the pair Tmaxα and Qmax

α . However, ithas been shown in [59] that (iv) of Proposition A.24 is no longer true in this case. More precisely,an example is shown in [59] for W and P for which the max 1/2 power mean does not coincidewith the P -weighted Tmax

1/2 -center for W .

4. Positive Renyi divergences

In this section we establish the non-negativity of Dα,z and Tα,z for all α ∈ (0,+∞) \ {1} andz ∈ (0,+∞].

Proposition A.28 For every , σ ∈ B(H)+, and every z ∈ (0,+∞], we have

Qα,z(‖σ) ≤ (Tr )α(Tr σ)1−α ≤ αTr + (1− α)Tr σ, α ∈ (0, 1), (A.24)

Qα,z(‖σ) ≥ (Tr )α(Tr σ)1−α ≥ αTr + (1− α)Tr σ, α > 1, (A.25)

or equivivalently, Dα,z(‖σ) ≥ 0 and Tα,z(‖σ) ≥ 0 for all α ∈ (0,+∞) \ {1} and z ∈ (0,+∞].

Proof Note that the bounds are independent of z; in particular, it is enough to prove themfor all finite z, and the case z = +∞ follows by taking the limit z → +∞. Note also that thesecond inequalities in (A.24)–(A.25) follow by the trivial identity xαy1−α = y(x/y)α, and lowerbounding the convex function t 7→ s(α)tα on [0,+∞) by its tangent line at 1.To prove the first inequalities, we may assume w.l.o.g. that and σ are positive definite, due

to (III.20). Assume first that and σ are both diagonal in the same orthonormal basis withdiagonal elements r1, . . . , rd and s1, . . . , sd, respectively, where d := dimH. Concavity of t 7→ tα

for α ∈ (0, 1) yields

(Tr )α

(Tr σ)α=

(d∑

i=1

siTrσ

·risi

)α

≥d∑

i=1

siTrσ

·

(risi

)α

= (Tr σ)−1d∑

i=1

rαi s1−αi , (A.26)

37

which is exactly the first inequality in (A.24). When α > 1, the inequality in (A.26) is in theopposite direction, which yields the first inequality in (A.25).For the general case, we follow the approach at the beginning of Section 3 in [8]. Let s1(X) ≥

. . . ≥ sd(X) denote the decreasingly ordered singular values of an operator X . By the Gelfand-Naimark majorization theorem (see, e.g. [35, Theorem 4.3.4]), we have

k∏

j=1

sij

(σ

1−α2z

)sd+1−ij

(

α2z

)≤

k∏

i=1

si

(σ

1−α2z

α2z

)≤

k∏

i=1

si

(σ

1−α2z

)si(

α2z

)

for every 1 ≤ k ≤ d and 1 ≤ i1 < . . . < ij ≤ d. Taking it to the power 2z yields

k∏

j=1

sij (σ)1−α

sd+1−ij ()α ≤

k∏

i=1

si

((

α2z σ

1−αz

α2z

)z)≤

k∏

i=1

si (σ)1−α

si ()α.

Using that weak log-majorization implies weak majorization (see, e.g. [35, Proposition 4.1.6]),we obtain

k∑

j=1

sij (σ)1−α

sd+1−ij ()α ≤

k∑

i=1

si

((

α2z σ

1−αz

α2z

)z)≤

k∑

i=1

si (σ)1−α

si ()α

for every 1 ≤ k ≤ d. Taking now k = d, we see that the middle sum is equal to Qα,z(‖σ), the

rightmost sum can be upper bounded by(∑d

i=1 si())α (∑d

i=1 si(σ))1−α

= (Tr )α (Tr σ)1−α

when α ∈ (0, 1), by the same argument as in (A.26), and similarly, the leftmost sum can be lower

bounded by (Tr )α(Tr σ)

1−αwhen α > 1. �

Remark A.29 The first inequalities in (A.24)–(A.25) for (α, z) ∈ K2 ∪ K4 ∪ K5 ∪ K7 areimmediate from the monotonicity under taking the trace of both and σ, according to LemmaIII.5.

We can also establish strict positivity of Dα,z and Tα,z, except for the region

K0 : 0 < α < 1, z < min{α, 1− α}.

In the proof we also give an alternative proof for the first inequalities in (A.24)–(A.25) when(α, z) /∈ K0.

Proposition A.30 The second inequalities in (A.24)–(A.25) hold as equalities if and only ifTr = Trσ. If (α, z) ∈ ((0,+∞)× (0,+∞]) \K0 then the second inequalities in (A.24)–(A.25)hold as equalities if and only if /Tr = σ/Trσ.

Proof The assertion about the second inequalities is trivial from the strict convexity of t 7→s(α)tα when α ∈ (0,+∞) \ {1}. Hence, for the rest we analyze the first inequalities.It has been pointed out, e.g., in [47, Proposition 1], that the Araki-Lieb-Thirring inequality

[4, 45] implies the monotonicity

Qα,z1(‖σ) = Tr(

α2z1 σ

1−αz1

α2z1

) z1z2

z2≤ Tr

(

α2z2 σ

1−αz2

α2z2

)z2= Qα,z2(‖σ), z2 ≤ z1.

(A.27)

Hence, for α > 1 we have

Dα,z

(

Tr

∥∥∥ σ

Trσ

)= Dα,z(‖σ) ≥ Dα,+∞(‖σ) = Dα,+∞

(

Tr

∥∥∥ σ

Trσ

)≥ 0,

with equality if and only if /Tr = σ/Trσ, according to [50, Proposition 3.22]. This is exactlythe second inequality in (A.25) with the equality condition. If α ∈ (0, 1/2] and z ≥ α then

Dα,z

(

Tr

∥∥∥ σ

Trσ

)= Dα,z(‖σ) ≥ Dα,α(‖σ) = Dα,α

(

Tr

∥∥∥ σ

Trσ

)≥ 0,

38

with equality if and only if /Tr = σ/Trσ, according to [10, Theorem 5] (see also [50, Propo-sition 3.22]). If α ∈ [1/2, 1) and z ≥ 1− α then

Dα,z

(

Tr

∥∥∥ σ

Tr σ

)= Dα,z(‖σ) ≥ Dα,1−α(‖σ) = Dα,1−α

(

Tr

∥∥∥ σ

Trσ

)

=α

1− αD1−α,1−α

(

Tr

∥∥∥ σ

Tr σ

)≥ 0,

with equality if and only if /Tr = σ/Trσ, by the same argument as above. �

Corollary A.31 Let , σ ∈ B(H)+ and α ∈ (0,+∞), z ∈ (0,+∞]. Then

Dα,z(‖σ) ≥ logTr − logTrσ, (A.28)

Dα,z(‖σ) ≥ 0, (A.29)

Tα,z(‖σ) ≥ 0. (A.30)

If, moreover, (α, z) /∈ K0 then equality holds in (A.28) or (A.29) if and only if = λσ for someλ ∈ (0,+∞), and equality holds in (A.30) if and only if = σ.

Proof Immediate from Proposition A.30. �

5. Further properties of the Renyi divergence radii

Lemma A.32 For any σ ∈ S(H)++ and any α ∈ (0,+∞) \ {1}, z ∈ (0,+∞), the mapx 7→ Dα,z(W (x)‖σ) is bounded on X for any cq channel W , and hence the map P 7→∑

x∈X P (x)Dα,z(W (x)‖σ) is continuous on Pf(X ) in the variational norm.

Proof We prove the case α > 1; the case α ∈ (0, 1) follows the same way. If σ ∈ S(H)++ then

σ ≥ λmin(σ)I, and hence σ1−αz ≤ λmin(σ)

1−αz I. Using the monotonicity of A 7→ TrAz on B(H)+

for z > 0, we get

Dα,z(W (x)‖σ) ≤1

α− 1logTr

(W (x)

α2z λmin(σ)

1−αz IW (x)

α2z

)z

= − logλmin(σ) +1

α− 1logTrW (x)α ≤ − logλmin(σ),

proving the boundedness, and the assertion on continuity is immediate from this. �

Corollary A.33 The map P 7→ χ∗α(W,P ) is concave and upper semi-continuous on Pf(X ) in

the variational norm. In particular, if P ∈ Pf (X ) and Pn ∈ Pf (X ), n ∈ N, are such thatlimn→+∞ ‖Pn − P‖1 = 0 then

lim supn→+∞

χ∗α(W,Pn) ≤ χ∗

α(W,P ). (A.31)

Proof A combination of (III.25) and Lemma A.32 shows that χ∗α, as the infimum of continuous

affine functions, is upper semi-continuous and concave. Upper semi-continuity imples (A.31). �

Appendix B: Optimality of the strong converse exponents

Proof of Lemma IV.3: Let Cn = (En,Dn) be a code for n uses of the channel. Define theclassical-quantum states

Rn :=1

|Cn|

|Cn|∑

k=1

|k〉〈k| ⊗W⊗n(En(k)), Sn :=1

|Cn|

|Cn|∑

k=1

|k〉〈k| ⊗ σ⊗n,

39

and the POVM element

Tn :=

|Cn|∑

k=1

|k〉〈k| ⊗ Dn(k),

where (|k〉〈k|)|Cn|k=1 is a set of orthogonal rank 1 projections in some Hilbert space, and σ ∈ S(H)

is an arbitrary state. Then we have

TrRnTn =1

|Cn|

|Cn|∑

k=1

TrW⊗n(En(k))Dn(k) = Ps(W⊗n, Cn),

TrSnTn =1

|Cn|

|Cn|∑

k=1

Tr σ⊗nDn(k) =1

|Cn|.

For any σ ∈ S(H⊗n) such that W⊗n(En(k))0 ≤ σ0 for all k, and for all α ∈ (1,+∞), we get

Ps(W⊗n, Cn)

α

(1

|Cn|

)1−α

= (TrRnTn)α(TrSnTn)

1−α ≤ Q∗α (Rn‖Sn)

=1

|Cn|

|Cn|∑

k=1

Q∗α

(W⊗n(En(k))‖σ

⊗n)

=1

|Cn|

|Cn|∑

k=1

∏

x∈X

Q∗α(W (x)‖σ)nPEn(k)(x)

=1

|Cn|

|Cn|∑

k=1

exp

(n(α− 1)

∑

x∈X

PEn(k)(x)D∗α(W (x)‖σ)

)(B.1)

where the inequality is due to the monotonicity of the sandwiched Renyi divergence for α > 1.Note that the inequality between the first and the last expressions above holds trivially whenW⊗n(En(k))0 � σ0 for some k.A simple rearrangement yields that for any α ∈ (1,+∞) and any σ ∈ S(H),

1

nlogPs(W

⊗n, Cn) ≤ −α− 1

α

[1

nlog |Cn| − max

1≤k≤|Cn|

∑

x∈X

PEn(k)(x)D∗α(W (x)‖σ)

], (B.2)

where we used that the logarithm function is monotone increasing, and hence quasi-convex, toupper bound the logarithm of the convex combination in (B.1) by the logarithm of the maximumof the individual terms. A completely similar argument as above, using the monotonicity of themax-relative entropy D∗

∞, yields that (B.2) also holds for α = +∞, with the natural definitionα−1α

∣∣α=+∞

:= 1. Finally, the inequality in (B.2) holds trivially for α = 1, since then the RHS iszero.When σ ∈ S(H)++, we may use the continuity bound in the proof of Lemma A.32 to obtain

1

nlogPs(W

⊗n, Cn)

≤ −α− 1

α

[1

nlog |Cn| −

∑

x∈X

P (x)D∗α(W (x)‖σ) + max

1≤k≤|Cn|

∥∥PEn(k) − P∥∥1log λmin(σ)

]. (B.3)

Now, if (Cn)n∈N is a sequence of codes as in the definition (IV.38) of sc(W,R, P )∗ then takingthe limsup over n in (B.3), then the infimum over σ ∈ S(H)++, and finally the infimum overα > 1, yields

lim supn→+∞

1

nlogPs(W

⊗n, Cn) ≤ − supα>1

α− 1

α[R− χ∗

α(W,P )] , (B.4)

which is what we wanted to prove.

40

Remark B.1 When Cn is a constant composition code with composition Pn := PEn(k), k =1, . . . , |Cn|, the inequality in (B.2) yields, after taking the infimum over σ ∈ S(H) and α > 1,

1

nlogPs(W

⊗n, Cn) ≤ − supα>1

α− 1

α

[1

nlog |Cn| − χ∗

α(W,Pn)

]. (B.5)

Remark B.2 A similar bound to the one in (B.1) was used by Sheverdyaev [65] to bound thesuccess probability of feedback-assisted coding for classical channels. The inequality in [65] wasderived using Holder’s inequality, instead of the monotonicity argument above, which is adaptedfrom Nagaoka’s proof [52]. We remark that for classical Renyi divergences, monotonicity understochastic maps follows immediately from the convexity properties of the power functions, orequivalently, from a special case of Holder’s inequality, and vice versa, this case of Holder’sinequality can be viewed as a special case of the monotonicity of the Renyi divergences, hence thetwo approaches to obtain the upper bound on the success probability are the same on the technicallevel. On the other hand, proving monotonicity of the quantum Renyi divergences requires moreinvolved techniques, and it is the monotonicity-based approach of Nagaoka that has proved to befruitful in quantum information theory.

Proof of Lemma IV.10: Let Cn be a code for n uses of the channel with cost γ(Cn) < c. For anyσ ∈ S(H), and any message k = 1, . . . , |Cn|,

∑

x∈X

PEn(k)(x)D∗α(W (x)‖σ) ≤ sup

P∈Pf,γ<c(X )

∑

x∈X

P (x)D∗α(W (x)‖σ),

and hence the upper bound in (B.2) can be continued as

1

nlogPs(W

⊗n, Cn) ≤ −α− 1

α

[1

nlog |Cn| − sup

P∈Pf,γ<c(X )

∑

x∈X

P (x)D∗α(W (x)‖σ)

]. (B.6)

Let (Cn)n∈N be a sequence of codes with rate at least R, and lim supn γ(Cn) < c. Thenγ(Cn) < c for all large enough n, and taking the limsup over n in (B.6), and then the infima overσ ∈ S(H) and α ∈ [1,+∞], yields

lim supn→+∞

1

nlogPs(W

⊗n, Cn) ≤ − supα∈[1,+∞]

α− 1

α

[R− inf

σ∈S(H)sup

P∈Pf,γ<c(X )

∑

x∈X


].

(B.7)

Let hα(P, σ) :=∑

x∈X P (x)D∗α(W (x)‖σ). Then h is clearly concave in its first variable, and by

Lemmas III.7 and III.8, it is convex and lower semi-continuous in its second variable. Thus, byLemma II.3,

infσ∈S(H)

supP∈Pf,γ<c(X )

∑

x∈X

P (x)D∗α(W (x)‖σ) = sup

P∈Pf,γ<c(X )

infσ∈S(H)

∑

x∈X


= supP∈Pf,γ<c(X )

χ∗α(W,P ),

and the bound in (B.7) can be rewritten as

lim infn→+∞

−1

nlogPs(W

⊗n, Cn) ≥ supα∈[1,+∞]

α− 1

α

[R− sup

P∈Pf,γ<c(X )

χ∗α(W,P )

].

Since this holds for every sequence of codes as above, the assertion follows.

Appendix C: Random coding exponent with constant composition

Below we give a slightly different proof of the constant composition random coding bound firstproved in [17]. We start with the following random coding bound, given in [30, 33, 55].

41

Lemma C.1 Let W : X → S(H) be a classical-quantum channel, n ∈ N, R > 0, Mn :=⌈enR

⌉, and Qn ∈ Pf (Xn). For every x = (x1, . . . , xMn

) ∈ (Xn)Mn , there exists a code Cn,x =(En,x,Dn,x) for W

⊗n such that En,x(k) = xk, k ∈ [Mn], and

EQ⊗Mnn

Pe(W⊗n, Cn,x) ≤

∑

x∈Xn

Qn(x)TrW⊗n(x){W⊗n(x)− enRW⊗n(Qn) ≤ 0}

+ enR∑

x∈Xn

Qn(x)TrW⊗n(Qn){W

⊗n(x)− enRW⊗n(Qn) > 0}

≤ enR(1−α)∑

x∈Xn

Qn(x)TrW⊗n(x)αW⊗n(Qn)

1−α. (C.1)

In particular, there exists an x ∈ suppQn such that Pe(W⊗n, Cn,x) is upper bounded by the RHS

of (C.1).

From this, we can obtain the following:

Proposition C.2 Let W : X → S(H) be a classical-quantum channel, and let R > 0. For everyn ∈ N, and every type Pn ∈ Pn(X ), there exists a code Cn of constant composition Pn with rate1n log |Cn| ≥ R such that

1

nlogPe(W

⊗n, Cn) ≤ − sup0≤α≤1

(α− 1)

[R−

∑

x∈X

Pn(x)Dα(W (x)‖W (Pn)) + | suppPn|log(n+ 1)

n

].

(C.2)

Proof Let Xn := XnPn

⊆ Xn be the set of sequences with type Pn. Choosing Qn := 1|Xn|

1Xnin

Lemma C.1, we get the existence of codes Cn with constant composition Pn such that

Pe(W⊗n, Cn) ≤ enR(1−α) TrW⊗n(x)αW⊗n(Qn)

1−α

for any x ∈ Xn, where we used that W⊗n(Qn) =1

|Xn|

∑y∈Xn

W⊗n(y) is permutation-invariant.

Now we use the well-known facts that |XnPn

| ≥ (n+1)−| suppPn|enH(Pn), and that for any y ∈ XnPn

,

P⊗nn (y) = e−nH(Pn), (see (II.9) and (II.10)), to obtain that

W⊗n(Qn) =1

|Xn|

∑

y∈Xn

W⊗n(y) ≤ (n+ 1)| suppPn|e−nH(Pn)∑

y∈Xn

W⊗n(y)

= (n+ 1)| suppPn|∑

y∈Xn

P⊗nn (y)W⊗n(y)

≤ (n+ 1)| suppPn|∑

y∈Xn

P⊗nn (y)W⊗n(y)

= (n+ 1)| suppPn|W (Pn)⊗n.

Using that t 7→ t1−α is operator monotone on R+ for α ∈ [0, 1], we get that

Pe(W⊗n, Cn) ≤ (n+ 1)(1−α)| suppPn|enR(1−α) TrW⊗n(x)α

(W (Pn)

⊗n)1−α

.

From this (C.2) follows by simple algebra. �

Corollary C.3 Let W : X → S(H) be a classical-quantum channel, let R > 0, and P ∈ Pf(X ).For any sequence Pn ∈ Pn(X ), n ∈ N, such that ∪n∈N suppPn is finite, and limn→+∞ ‖Pn − P‖1 =0, there exists a sequence of codes (Cn)n∈N, where all Cn are of constant composition Pn, andevery Cn has rate 1

n log |Cn| ≥ R, such that

lim supn→+∞

1

nlogPe(W

⊗n, Cn) ≤ − sup0≤α≤1

(α− 1)

[R−

∑

x∈X

P (x)Dα(W (x)‖W (P ))

]. (C.3)

If, moreover, R <∑

x∈X P (x)D(W (x)‖W (P )) then Pe(W⊗n, Cn) goes to zero exponentially fast.

42

Proof Proposition C.2 yields the existence of a sequence of codes (Cn)n∈N with all the desiredproperties, except maybe for the upper bound (C.3), such that

1

nlogPe(W

⊗n, Cn) ≤ −

[R−

∑

x∈X

Pn(x)Dα(W (x)‖W (Pn)) + | suppPn|log(n+ 1)

n

](C.4)

for every n ∈ N and α ∈ [0, 1]. Taking first the limsup in n, and then the infimum in α ∈ [0, 1],the assertion follows. �

A variant of Corollary C.3 was given by Csiszar and Korner in [21, Theorem 10.2] for classicalchannels, where the RHS is

− sup1/2<α≤1

α− 1

α[R− χα(W,P )] .

Note that α−1α gives a strictly better prefactor, while χα(W,P ) ≤

∑x∈X P (x)Dα(W (x)‖W (P )).

However, the Csiszar-Korner bound is optimal for high enough rates, and hence it would bedesirable to obtain an exact analogue of it for classical-quantum channels.

Constant composition exponents were obtained also for classical-quantum channels before; forinstance, the following was stated in [31]: Let X be a finite set, H be a finite-dimensional Hilbertspace, let P be a probability mass function on X , and R > 0. Then there exists a sequence ofcodes (Cn)n∈N such that

limn→+∞

1

nlog |Cn| = R,

and for any classical-quantum channel W : X → S(H),

lim infn→+∞

−1

nlogPe(W

⊗n, Cn) ≥ sup0≤α≤1

α− 1

2− α[R− Iα(W,P )] . (C.5)

While this is not sufficiently detailed in [31], the codes above can indeed be chosen to be ofconstant composition.Note that the bound in (C.5) is not as strong as the classical universal random coding exponent

given by Csiszar and Korner, as 0 < α < 2− α for all α < 1, and χα(W, p) ≥ Iα(W, p), with theinequality being strict in general.

Appendix D: Evaluation of the information spectrum quantity

For every n ∈ N, let Hn be a finite-dimensional Hilbert space, and let n, σn ∈ S(Hn). Let

ψ(α) :=

{lim supn→+∞

1n log Tr αnσ

1−αn , α ∈ (0, 1],

lim supn→+∞1n log Tr

(1/2n σ

1−αα

n 1/2n

)α, α ∈ (1,+∞).

The following statement is a central observation in the information spectrum method, and itsproof follows by standard arguments. We include a complete proof for the readers’ convenience.

Lemma D.1 In the above setting,

limn→+∞

Tr(n − enrσn)+ = 1

{for all r ∈ R, if ψ(1) < 0,

for r < ∂−ψ(1), if ψ(1) = 0.(D.1)

If 0n ≤ σ0n for all large enough n, then ψ(1) = 0, and

limn→+∞

Tr(n − enrσn)+ = 0 for all r > ∂+ψ(1). (D.2)

43

Proof Let Sn,r := {n − enrσn > 0} be the Neyman-Pearson test with parameter r. Then

0 ≤ Tr(n − enrσn)+ = Tr nSn,r − enr Tr σnSn,r = 1− αn(Sn,r)− enrβn(Sn,r),

with αn(Sn,r) := Tr n(I−Sn,r) and βn(Sn,r) := TrσnSn,r being the type I and the type II errorprobabilities corresponding to the test Sn,r, respectively. In particular,

0 ≤ enr TrσnSn,r ≤ Tr nSn,r. (D.3)

By Audenaert’s inequality [6, Theorem 1],

en(r) := αn(Sn,r) + enrβn(Sn,r) = Tr n(I − Sn,r) + enr TrσnSn,r ≤ enr(1−α)Tr αnσ1−αn

for all α ∈ [0, 1], and hence

lim supn→+∞

1

nlog en(r) ≤ − sup

0≤α≤1(α − 1){r − ψ(α)/(α − 1)}. (D.4)

Since limαր1 ψ(α)/(α − 1) = ∂−ψ(1) if ψ(1) = 0, and +∞ otherwise, we obtain that en(r) → 0as n → +∞ for any r ∈ R when ψ(1) < 0 and for r < ∂−ψ(1) when ψ(1) = 0. Usingthat max{αn(Sn,r), e

nrβn(Sn,r)} ≤ en(r), we see that in these cases, also enrβn(Sn,r) → 0 and1− αn(Sn,r) → 1, and therefore Tr(n − enrσn)+ → 1. This proves (D.1).Next, we prove (D.2). By the monotonicity of the sandwiched Renyi divergences, we have, for

every 0 ≤ Tn ≤ I, and every α > 1,

Tr(1/2n σ

1−αα

n 1/2n

)α≥ (Tr nTn)

α(Tr σnTn)

1−α,

and therefore

lim supn→+∞

1

nlogTr nTn ≤ inf

α>1

α− 1

α

{lim supn→+∞

1

nlogTr σnTn +

ψ(α)

α− 1

}. (D.5)

Now, let Tn := Sn,r with some r. Then, by (D.4), we have

lim supn→+∞

1

nlogTrσnTn ≤ − sup

0≤α≤1{αr − ψ(α)} ≤ −r.

Hence, if r > ∂+ψ(1), then the RHS of (D.5) is negative, and thus Tr nSn,r → 0 as n → +∞.Using (D.3), we get that also enr Tr σnSn,r → 0, and hence Tr(n− enrσn)+ → 0, as required. �

Corollary D.2 For every n ∈ N, let Pn be an n-type and x(n) ∈ Xn be of type Pn. Assume that(Pn)n∈N converges to some P ∈ Pf (X ) and ∪n∈N suppPn is finite. If V (x)0 ≤ W (x)0 for allx ∈ suppPn and all n large enough, then

limn→+∞

Tr(V ⊗n(x(n))− enrW⊗n(x(n))

)+=

{1, r <

∑x∈X P (x)D(V (x)‖W (x)),

0, r >∑

x∈X P (x)D(V (x)‖W (x)).

Proof We use Lemma D.1 with n := V ⊗n(x(n)), σn :=W⊗n(x(n)). Then we have

ψn(α) :=1

nlogTr αnσ

1−αn =

∑

x∈X

Pn(x) log TrV (x)αW (x)1−α,

for α ∈ (0, 1], and

ψn(α) :=1

nlogTr

(1/2n σ

1−αα

n 1/2n

)α=∑

x∈X

Pn(x) log Tr(V (x)1/2W (x)

1−αα V (x)1/2

)α,

for α > 1. Hence, ψ(α) = limn→+∞ ψn(α) exists as a limit, and

ψ(α) =

{∑x∈X P (x) log TrV (x)αW (x)1−α, α ∈ (0, 1],

∑x∈X P (x) log Tr

(V (x)1/2W (x)

1−αα V (x)1/2

)α, α > 1.

44

By assumption, ψ(1) = 0, and

∂−ψ(1) =∑

x∈X

P (x) limαր1

Dα(V (x)‖W (x)) =∑

x∈X

P (x)D(V (x)‖W (x),

∂+ψ(1) =∑

x∈X

P (x) limαց1

D∗α(V (x)‖W (x)) =

∑

x∈X

P (x)D(V (x)‖W (x).

Hence, the assertion follows immediately from Lemma D.1. �

ACKNOWLEDGMENTS

We are grateful to Fumio Hiai, Peter Vrana, Andreas Winter, Hao-Chung Heng, Min-HsiuHsieh, Marco Tomamichel, Jozsef Pitrik and Daniel Virosztek for discussions. This work waspartially funded by the JSPS grant no. JP16K00012 (TO), by the National Research, Develop-ment and Innovation Office of Hungary via the research grants K124152, KH129601, the Quan-tum Technology National Excellence Program, Project Nr. 2017-1.2.1-NKP-2017-00001, and bya Bolyai Janos Fellowship of the Hungarian Academy of Sciences (MM). We are grateful toanonymous referees for their comments that helped to improve the presentation of our results,and particularly indebted to one of the referees for pointing out that our main result for con-stant composition codes can also be used to determine the strong converse exponent with costconstraint.

[1] Martial Agueh and Guillaume Carlier. Barycenters in the Wasserstein space. SIAM Journal onMathematical Analysis, 43(2):904–924, 2011.

[2] Shun-ichi Amari and Hiroshi Nagaoka. Methods of information geometry, volume 191 of Translationsof Mathematical Monographs. American Mathematical Society, Providence, RI, 2000.

[3] Tsuyoshi Ando and Fumio Hiai. Operator log-convex functions and operator means. MathematischeAnnalen, 350:611–630, 2011.

[4] H. Araki. On an inequality of Lieb and Thirring. Letters in Mathematical Physics, 19:167–170,1990.

[5] Suguru Arimoto. On the converse to the coding theorem for discrete memoryless channels. IEEETransactions on Information Theory, 19:357–359, May 1973.

[6] K.M.R. Audenaert, J. Calsamiglia, R. Mu noz Tapia, E. Bagan, Ll. Masanes, A. Acin, and F. Ver-straete. Discriminating states: The quantum Chernoff bound. Physical Review Letters, 98:160501,2007. arXiv:quant-ph/0610027.

[7] Koenraad M. R. Audenaert and Nilanjana Datta. α-z-relative Renyi entropies. J. Math. Phys.,56:022202, 2015. arXiv:1310.7178.

[8] Koenraad M. R. Audenaert and Fumio Hiai. Reciprocal Lie-Trotter formula. Linear and MultilinearAlgebra, 64(6):1220–1235, 2016.

[9] U. Augustin. Noisy channels. Habilitation thesis, Univ. of Erlangen-Numberg, 1978.[10] Salman Beigi. Sandwiched Renyi divergence satisfies data processing inequality. Journal of Mathe-

matical Physics, 54(12):122202, December 2013. arXiv:1306.5920.[11] Rajendra Bhatia. Matrix Analysis. Number 169 in Graduate Texts in Mathematics. Springer, 1997.[12] Rajendra Bhatia, Stephane Gaubert, and Tanvi Jain. Matrix versions of the Hellinger distance.

Letters in Mathematical Physics, 2019.[13] Rajendra Bhatia, Tanvi Jain, and Yongdo Lim. On the Bures-Wasserstein distance between positive

definite matrices. Expositiones Mathematicae, 2018.[14] M. V. Burnashev and A. S. Holevo. On the reliability function for a quantum communication

channel. Problems of Information Transmission, 34(2):97–107, 1998.[15] E. A. Carlen, R. L. Frank, and E. H. Lieb. Some operator and trace function convexity theorems.

Linear Algebra Appl., 490:174–185, 2016.[16] Hao-Chung Cheng, Li Gao, and Min-Hsiu Hsieh. Properties of noncommutative Renyi and Augustin

information. arXiv:1811.04218, 2018.[17] Hao-Chung Cheng, Eric P. Hanson, Nilanjana Datta, and Min-Hsiu Hsieh. Duality between source

coding with quantum side information and c-q channel coding. arXiv:1809.11143, 2018.[18] Hao-Chung Cheng, Min-Hsiu Hsieh, and Marco Tomamichel. Quantum sphere-packing bounds

with polynomial prefactors. IEEE Transactions on Information Theory, 65(5):2872–2898, 2019.arXiv:1704.05703v2.

45

[19] Imre Csiszar. Generalized cutoff rates and Renyi’s information measures. IEEE Transactions onInformation Theory, 41(1):26–34, January 1995.

[20] Imre Csiszar and Janos Korner. Information theory: coding theorems for discrete memoryless chan-nels. Akademiai Kiado, Budapest, 1981.

[21] Imre Csiszar and Janos Korner. Information theory: coding theorems for discrete memoryless chan-nels, 2nd edition. Cambridge University Press, 2011.

[22] Marco Dalai and Andreas Winter. Constant compositions in the sphere packing bound forclassical-quantum channels. IEEE Transactions on Information Theory, 63(9):5603 – 5617, 2017.arXiv:1509.00715.

[23] Nilanjana Datta. Min- and max-relative entropies and a new entanglement monotone. IEEE Trans-actions on Information Theory, 55(6):2816–2826, 2009.

[24] G. Dueck and J. Korner. Reliability function of a discrete memoryless channel at rates abovecapacity. IEEE Transactions on Information Theory, 25:82–85, 1979.

[25] Ky Fan. Minimax theorems. Proc Natl Acad Sci USA., 39(1):42–47, 1953.[26] Balint Farkas and Szilard Revesz. Potential theoretic approach to rendezvous numbers. Monatshefte

fur Mathematik, 148(4):309–331, 2006.[27] Rupert L. Frank and Elliott H. Lieb. Monotonicity of a relative Renyi entropy. Journal of Mathe-

matical Physics, 54(12):122201, December 2013. arXiv:1306.5358.[28] E. A. Haroutunian. Bounds to the error probability exponent for semicontinuous memoryless chan-

nels. Prob. Peredach. Inform., 4:37–48, 1968. Problems Inform. Transmission, 4:4 (1968), 29?39.[29] Masahito Hayashi. Optimal sequence of POVM’s in the sense of Stein’s lemma in quantum hypoth-

esis testing. J. Phys. A: Math. Gen., 35:10759–10773, 2002.[30] Masahito Hayashi. Error exponent in asymmetric quantum hypothesis testing and its application to

classical-quantum channel coding. Physical Review A, 76(6):062301, December 2007. arXiv:quant-ph/0611013.

[31] Masahito Hayashi. Universal coding for classical-quantum channel. Commun. Math. Phys.,289(3):1087–1098, May 2009.

[32] Masahito Hayashi. Quantum Information Theory: Mathematical Foundation, 2nd ed. GraduateTexts in Physics. Springer, 2017.

[33] Masahito Hayashi and Hiroshi Nagaoka. General formulas for capacity of classical-quantum channels.IEEE Transactions on Information Theory, 49(7):1753–1768, July 2003. arXiv:quant-ph/0206186.

[34] Masahito Hayashi and Marco Tomamichel. Correlation detection and an operational interpretationof the Renyi mutual information. Journal of Mathematical Physics, 57:102201, 2016.

[35] F. Hiai. Matrix analysis: Matrix monotone functions, matrix means, and majorization. Interdisci-plinary Information Sciences, 16:139–248, 2010.

[36] F. Hiai. Concavity of certain matrix trace and norm functions. Linear Algebra Appl., 439:1568–1589,2013.

[37] F. Hiai and M. Mosonyi. Different quantum f -divergences and the reversibility of quantum opera-tions. Rev. Math. Phys., 29, 2017.

[38] Fumio Hiai and Denes Petz. The Golden-Thompson trace inequality is complemented. LinearAlgebra Appl., 181:153–185, 1993.

[39] Alexander S. Holevo. The capacity of the quantum channel with general signal states. IEEETransactions on Information Theory, 44(1):269–273, January 1998.

[40] V. Jaksic, Y. Ogata, Y. Pautrat, and C.-A. Pillet. Entropic fluctuations in quantum statisticalmechanics. an introduction. In Quantum Theory from Small to Large Scales, August 2010, volume 95of Lecture Notes of the Les Houches Summer School. Oxford University Press, 2012.

[41] Hellmuth Kneser. Sur un teoreme fondamental de la theorie des jeux. C. R. Acad. Sci. Paris,234:2418–2420, 1952.

[42] Robert Koenig and Stephanie Wehner. A strong converse for classical channel coding using entangledinputs. Physical Review Letters, 103(7):070504, August 2009. arXiv:0903.2838.

[43] Robert Konig, Renato Renner, and Christian Schaffner. The operational meaning of min- andmax-entropy. IEEE Transactions on Information Theory, 55(9):4337–4347, 2009.

[44] F. Kubo and T. Ando. Means of positive linear operators. Math. Ann., 246:205–224, 1980.[45] E.H. Lieb and W. Thirring. Studies in mathematical physics. University Press, Princeton, 1976.[46] Yongdo Lim and Miklos Palfia. Matrix power means and the Karcher mean. Journal of Functional

Analysis, 262(4):1498–1514, 2012.[47] Mingyan Simon Lin and Marco Tomamichel. Investigating properties of a family of quantum renyi

divergences. Quantum Information Processing, 14(4):1501–1512, 2015.[48] K. Matsumoto. A new quantum version of f -divergence. In Nagoya Winter Workshop 2015: Reality

and Measurement in Algebraic Quantum Theory, pages 229–273, 2018. arXiv:1311.4722.[49] Milan Mosonyi and Fumio Hiai. On the quantum Renyi relative entropies and related capacity

formulas. IEEE Transactions on Information Theory, 57(4):2474–2487, April 2011.[50] Milan Mosonyi and Tomohiro Ogawa. Strong converse exponent for classical-quantum channel

coding. Communications in Mathematical Physics, 355(1):373–426, 2017.

46

[51] Martin Muller-Lennert, Frederic Dupuis, Oleg Szehr, Serge Fehr, and Marco Tomamichel. Onquantum Renyi entropies: A new generalization and some properties. Journal of MathematicalPhysics, 54(12):122203, December 2013. arXiv:1306.3142.

[52] Hiroshi Nagaoka. Strong converse theorems in quantum information theory. Proceedings of ERATOWorkshop on Quantum Information Science, page 33, 2001. Also appeared in Asymptotic Theoryof Quantum Statistical Inference, ed. M. Hayashi, World Scientific, 2005.

[53] Baris Nakiboglu. The Renyi capacity and center. arXiv:1608.02424, 2016.[54] Baris Nakiboglu. The Augustin capacity and center. Problems of Information Transmission, 55:299–

342, 2019. arXiv:1803.07937.[55] Tomohiro Ogawa. An information-spectrum approach to hash properties for quantum states and

relation to channel coding. In The 28th Symposium on Information Theory and its Applications(SITA2015), November 2015.

[56] Tomohiro Ogawa and Hiroshi Nagaoka. Strong converse to the quantum channel coding theo-rem. IEEE Transactions on Information Theory, 45(7):2486–2489, November 1999. arXiv:quant-ph/9808063.

[57] Masanori Ohya, Denes Petz, and Noboru Watanabe. On capacities of quantum channels. Probabilityand Mathematical Statistics, 17:179–196, 1997.

[58] Denes Petz. Quasi-entropies for finite quantum systems. Reports in Mathematical Physics, 23:57–65,1986.

[59] Jozsef Pitrik and Daniel Virosztek. Quantum Hellinger distances revisited. arXiv:1903.10455, 2019.[60] Renato Renner. Security of Quantum Key Distribution. PhD thesis, Swiss Federal Institute of

Technology Zurich, 2005. Diss. ETH No. 16242.[61] Alfred Renyi. On measures of entropy and information. In Proc. 4th Berkeley Sym-

pos. Math. Statist. and Prob., volume I, pages 547–561. Univ. California Press, Berkeley, California,1961.

[62] Benjamin Schumacher and Michael Westmoreland. Sending classical information via noisy quantumchannels. Physical Review A, 56(1):131–138, July 1997.

[63] Benjamin Schumacher and Michael D. Westmoreland. Optimal signal ensembles. Physical ReviewA, 63:022308, 2001.

[64] C.E. Shannon. A mathematical theory of communication. The Bell System Technical Journal,27:379–423, 623/656, 1948.

[65] A. Yu. Sheverdyaev. Lower bound for error probability in a discrete memoryless channel withfeedback. Probl. Peredachi Inf., 18(4), 1982.

[66] R. Sibson. Information radius. Z. Wahrscheinlichkeitsth. Verw. Gebiete, 14:149–161, 1969.[67] M. Tomamichel. Quantum Information Processing with Finite Resources, volume 5 of Mathematical

Foundations, SpringerBriefs in Math. Phys. Springer, 2016.[68] Marco Tomamichel and Vincent Y.F. Tan. Second-order asymptotics for the classical capacity of

image-additive quantum channels. Communications in Mathematical Physics, 338:103–137, 2015.[69] A. Uhlmann. Endlich dimensionale dichtmatrizen, ii. Wiss. Z. Karl-Marx-University Leipzig, 22:139–

177, 1973.[70] H. Umegaki. Conditional expectation in an operator algebra. Kodai Math. Sem. Rep., 14:59–85,

1962.[71] Mark M. Wilde, Andreas Winter, and Dong Yang. Strong converse for the classical capacity of

entanglement-breaking and Hadamard channels via a sandwiched Renyi relative entropy. Commu-nications in Mathematical Physics, 331(2):593–622, October 2014. arXiv:1306.1586.

[72] Andreas Winter. Coding theorem and strong converse for quantum channels. IEEE Transactionson Information Theory, 45(7):24812485, 1999.

[73] H. P. Yuen, R. S. Kennedy, and M. Lax. Optimum testing of multiple hypothesis in quantumdetection theory. IEEE Trans. Inf. Theory, 21(2):125–134, 1975.

[74] Haonan Zhang. Carlen-Frank-Lieb conjecture and monotonicity of α-z Renyi relative entropy.arXiv:1811.01205, 2018.

Documents

Abstract arXiv:1811.10599v7 [quant-ph] 8 Jun 2020 · 2020. 6. 9. · arXiv:1811.10599v7 [quant-ph] 8 Jun 2020 Divergenceradii and thestrong converseexponent of classical-quantum channel