A Separability-Entanglement Classifier via Machine Learning

Sirui Lu,1,∗ Shilin Huang,2,3,∗ Keren Li,1,2 Jun Li,2,4,† Jianxin Chen,5 Dawei Lu,2,6,‡ Zhengfeng Ji,7,8 Yi Shen,9 Duanlu Zhou,10 and Bei Zeng11,2,6

1 Department of Physics, Tsinghua University, Beijing 100084, China
2 Institute for Quantum Computing, University of Waterloo, Waterloo N2L 3G1, Ontario, Canada
3 Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
4 Beijing Computational Science Research Center, Beijing 100193, China
5 Joint Center for Quantum Information and Computer Science, University of Maryland, College Park, Maryland, USA
6 Department of Physics, Southern University of Science and Technology, Shenzhen 518055, China
7 Centre for Quantum Computation & Intelligent Systems, School of Software, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, Australia
8 State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China
9 Department of Statistics and Actuarial Science, University of Waterloo, Waterloo N2L 3G1, Ontario, Canada
10 Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China
11 Department of Mathematics & Statistics, University of Guelph, Guelph N1G 2W1, Ontario, Canada

(Dated: May 4, 2017)
arXiv:1705.01523v1 [quant-ph]

The problem of determining whether a given quantum state is entangled lies at the heart of quantum information processing. Despite the many methods available to tackle this problem – such as the positive partial transpose (PPT) criterion and the k-symmetric extendibility criterion – none of them enables a general, practical solution, owing to the problem's NP-hard complexity. Explicitly, the separable states form a high-dimensional convex set of vastly complicated structure. In this work, we build a new separability-entanglement classifier underpinned by machine learning techniques. Our method outperforms the existing methods in generic cases in terms of both speed and accuracy, opening up new avenues to explore quantum entanglement via the machine learning approach.

Born from pattern recognition, machine learning possesses the capability to make decisions without being explicitly programmed, after learning from large amounts of data. Beyond its extensive applications in industry, machine learning has also been employed to investigate physics-related problems in recent years. A number of promising applications have been proposed to date, such as Hamiltonian learning [1], automated search for quantum experiments [2], phase transition identification [3, 4], and classification of phases of matter [5], just to name a few. Nevertheless, many significant but hard problems in physics remain to be assessed, for which machine learning may provide novel insights. For example, determining whether a generic quantum state is entangled is a fundamental and NP-hard problem in quantum information processing, and, as shown in this work, machine learning proves exceptionally effective in tackling it.

As one of the key features of quantum mechanics, entanglement allows two or more parties to be correlated in a way that is much stronger than is possible classically [6]. It also plays a key role in many quantum information processing tasks such as teleportation and quantum key distribution [7]. As a result, one question naturally arises: is there a universal criterion to tell whether a generic quantum state is separable or entangled? This is a typical classification problem, which remains greatly challenging even for bipartite states. In fact, the entanglement detection problem has been proved to be NP-hard [8], implying that it is almost impossible to devise an efficient algorithm in complete generality.

Here, we focus on the task of detecting bipartite entanglement [9]. Consider a bipartite system $AB$ with Hilbert space $\mathcal{H}_A \otimes \mathcal{H}_B$, where $\mathcal{H}_A$ has dimension $d_A$ and $\mathcal{H}_B$ has dimension $d_B$. A state $\rho_{AB}$ is separable if it can be written as a convex combination
$$\rho_{AB} = \sum_i \lambda_i\, \rho_{A,i} \otimes \rho_{B,i}$$
with a probability distribution $\lambda_i \geq 0$ and $\sum_i \lambda_i = 1$, where $\rho_{A,i} : \mathcal{H}_A \to \mathcal{H}_A$ and $\rho_{B,i} : \mathcal{H}_B \to \mathcal{H}_B$ are density operators. Otherwise, $\rho_{AB}$ is entangled [10]. To date, many criteria have been proposed to detect bipartite entanglement, each with its own pros and cons. For instance, the most famous is the positive partial transpose (PPT) criterion, which says that a separable state must have a positive partial transpose; however, it is both necessary and sufficient only when $d_A d_B \leq 6$ [11, 12]. Another widely used method is the $k$-symmetric extension hierarchy [13, 14], which is one of the most powerful criteria at present, but very hard to compute in practice because its complexity grows exponentially with $k$ (see Appendix A in [15] for details).
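Since the PPT test reduces to an eigenvalue computation, it is easy to implement numerically. Below is a minimal NumPy sketch (the helper names are ours, not from the paper); for two qubits the test is exact, as noted above.

```python
import numpy as np

def partial_transpose(rho, dA, dB):
    """Partial transpose of rho on subsystem B (rho is dA*dB x dA*dB)."""
    r = rho.reshape(dA, dB, dA, dB)              # indices (iA, iB, jA, jB)
    return r.transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)

def is_ppt(rho, dA, dB, tol=1e-10):
    """True iff the partial transpose of rho is positive semidefinite."""
    return np.linalg.eigvalsh(partial_transpose(rho, dA, dB)).min() >= -tol

# Example: the Bell state (|00> + |11>)/sqrt(2) violates the PPT criterion.
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
print(is_ppt(np.outer(bell, bell.conj()), 2, 2))   # False -> entangled
```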

In this work, we employ machine learning techniques to tackle the bipartite entanglement detection problem by recasting it as a learning task, based on the fact that the separable states form a high-dimensional convex set. Given its renowned effectiveness in pattern recognition for high-dimensional objects, machine learning is a powerful tool for this problem. In particular, a separability-entanglement classifier that is reliable in terms of speed and accuracy is constructed via the supervised learning approach. The idea is to feed our classifier a large number of sampled trial states together with their corresponding class labels (separable or entangled), and then train the classifier to predict the class labels of new states that it has not encountered before.


FIG. 1. (a) In the high-dimensional space, the set of all states is convex, while the separable states form a convex subset. Many criteria, such as the linear (green straight line) or nonlinear (green curve) entanglement witnesses [6, 16] and PPT tests, are based on this geometric structure and detect only a limited set of entangled states. (b) A classifier built from supervised machine learning has a decision boundary of highly complex shape.

It is worth stressing that our classifier also offers a remarkable improvement in universality compared to the conventional methods. Previous methods detect only a limited part of the state space; e.g., different entangled states often require different entanglement witnesses. In contrast, our classifier can handle a variety of input states once properly trained, as shown in Fig. 1.

Supervised learning – The bipartite entanglement detection problem can be formulated as a supervised binary classification task. Following the standard procedure of supervised learning [17, 18], a feature vector representation of the input objects (states) of the bipartite system $AB$ is first created. Indeed, any quantum state $\rho$, as a density operator acting on $\mathcal{H}_A \otimes \mathcal{H}_B$, can be represented as a real vector $x \in \mathbb{R}^{d_A^2 d_B^2 - 1}$. This is because $\rho$ is Hermitian, positive semidefinite, and of trace one, so one can always find a Hermitian orthonormal basis for $d_A d_B \times d_A d_B$ matrices that contains the identity (for instance, the tensor products of Pauli matrices for $d_A = d_B = 2$), such that $\rho$ expands in this basis with real coefficients. We denote these real coefficients by a vector $x$. In machine learning language, $x$ is referred to as the feature vector and $X$ as the feature space.
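As an illustration, the following sketch (function names are illustrative, not the paper's) builds the 15-dimensional feature vector of a two-qubit state from the normalized Pauli tensor basis, dropping the identity component, which is fixed by $\mathrm{tr}(\rho) = 1$.

```python
import numpy as np

I2 = np.eye(2)
SX = np.array([[0, 1], [1, 0]])
SY = np.array([[0, -1j], [1j, 0]])
SZ = np.diag([1.0, -1.0])
PAULIS = [I2, SX, SY, SZ]

def feature_vector(rho):
    """15 real coefficients of a two-qubit rho in the normalized Pauli tensor
    basis; the identity component is fixed by tr(rho) = 1 and is dropped."""
    feats = []
    for a in range(4):
        for b in range(4):
            if a == 0 and b == 0:
                continue
            B = np.kron(PAULIS[a], PAULIS[b]) / 2.0   # tr(B_i B_j) = delta_ij
            feats.append(np.trace(rho @ B).real)      # real since rho is Hermitian
    return np.array(feats)
```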

Next, a dataset of training examples is produced, of the form $D = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, where $n$ is the set size, $x_i \in X$ is the $i$-th sample, and $y_i$ is the label signifying which class $x_i$ belongs to. Without loss of generality, we set $y_i = 1$ if $x_i$ is entangled and $y_i = -1$ otherwise. The value of $y_i$ can be determined using the PPT criterion when $d_A d_B \leq 6$, or estimated by the convex hull approximation in higher dimensions, as described later. The task is to analyze these training data and produce an inferred classifier function that predicts the class labels of new inputs.

Explicitly, the aim of supervised learning is to infer a hypothesis function (classifier) $h : X \to \{-1, 1\}$, where "hypothesis" means that the function $h$ is expected to be close to the true function, which we do not know but intend to model. A valid learning algorithm should be able to generalize from the training data to unseen situations in a reasonable way. One basic approach to choosing $h$ is the so-called empirical risk minimization, which seeks the function that best fits the training data. In particular, to evaluate how well $h$ fits the training data $D$, a loss function is defined as
$$L(h, D) = \frac{1}{|D|} \sum_{(x_i, y_i) \in D} \mathbf{1}(y_i \neq h(x_i)), \qquad (1)$$
in which $\mathbf{1}(\cdot)$ is the indicator function of its argument. Numerous supervised learning algorithms have been developed, each with its strengths and weaknesses. These algorithms, which make distinct choices of hypothesis class, include support vector machines (SVM) [19], decision trees [20], bootstrap aggregating [21], and boosting [22], among others.

PPT and k-extension tests – To build a reliable separability-entanglement classifier, it is natural to examine different algorithms first in the simplest two-qubit scenario, since the PPT criterion at this dimension offers a necessary and sufficient condition against which to verify test results. Based on the performance of different algorithms in the two-qubit case, we pick the best one to construct classifiers for higher-dimensional problems.

In our preliminary numerical simulations, we created the training dataset by randomly sampling $n = 5 \times 10^4$ states according to a specified distribution (see Appendix B for details). Class labels for separability or entanglement were then assigned to these inputs using the PPT criterion, forming a dataset $D = \{(x_1, y_1), \ldots, (x_n, y_n)\}$. We then employed kernel SVM algorithms [19] to train a classifier. Basically, a kernel SVM uses an implicit map to send the data from the original feature space to another Hilbert space, and finds a hyperplane in the new space that splits the data into two classes. However, it turns out that many commonly used kernels, including linear, radial basis, and polynomial functions, fail to give satisfactory results, as the generalization error stays around 10%. This indicates that the shape of the set of separable states is indeed too complicated to be handled by simply applying common kernel SVM algorithms; a minimal sketch of this baseline follows.
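For reference, a minimal version of this kernel-SVM baseline can be assembled as follows. It reuses the feature_vector and is_ppt helpers sketched above and, for brevity, samples from the Ginibre (Hilbert-Schmidt) ensemble rather than the product measure of Ref. [41] described in Appendix B.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def random_rho(d=4):
    """Ginibre (Hilbert-Schmidt) random density matrix -- a stand-in for the
    product measure of Ref. [41] used in the paper (see Appendix B)."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

states = [random_rho() for _ in range(5000)]
X = np.array([feature_vector(r) for r in states])              # helper sketched above
y = np.array([-1 if is_ppt(r, 2, 2) else 1 for r in states])   # exact labels for 2 qubits

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```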

We then added prior knowledge to the training procedure, as prior knowledge in kernel algorithms is expected to offer a performance boost. One approach is to utilize the $k$-extension property: $\rho_{AB}$ is said to be $k$-symmetric extendible if there exists a global state $\rho_{AB_1 B_2 \cdots B_k}$ whose marginals (reduced density matrices) $\rho_{AB_i}$ equal $\rho_{AB}$ for each $1 \leq i \leq k$. The set of all $k$-extendible states, denoted by $\Theta_k$, is convex, with a hierarchical structure $\Theta_k \supset \Theta_{k+1}$. Moreover, as $k \to \infty$, $\Theta_k$ converges exactly to the set of separable states [23], meaning that the boundary of the $k$-extension set can approximate the true decision boundary to arbitrary precision. However, computing the $k$-extension boundary becomes exponentially hard as $k$ increases. Even for $k = 12$, there is still a large gap between the separable boundary and the $k$-extension boundary, as depicted in Fig. 2(a) (see Appendix A for details). In other words, a substantial part of the state space remains unclassifiable in this case; a small feasibility-SDP sketch for $k = 2$ follows.
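For concreteness, the $k = 2$ level of the hierarchy can be written as a feasibility semidefinite program. The sketch below uses CVXPY (assuming cvxpy >= 1.2 for cp.partial_trace); it is an illustration of the technique, not the solver used in the paper.

```python
import numpy as np
import cvxpy as cp

def is_two_extendible(rho_ab, d=2):
    """Feasibility SDP for k = 2: look for a global state on A B1 B2 whose
    AB1 and AB2 marginals both equal rho_ab.  (Any such extension can be
    symmetrized over B1, B2, so this is equivalent to 2-symmetric
    extendibility.)"""
    D = d * d * d
    rho_ext = cp.Variable((D, D), hermitian=True)
    constraints = [
        rho_ext >> 0,
        cp.partial_trace(rho_ext, [d, d, d], axis=2) == rho_ab,  # trace out B2
        cp.partial_trace(rho_ext, [d, d, d], axis=1) == rho_ab,  # trace out B1
    ]
    problem = cp.Problem(cp.Minimize(0), constraints)
    problem.solve(solver=cp.SCS)
    return problem.status == cp.OPTIMAL

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
print(is_two_extendible(np.outer(bell, bell)))   # False: entangled pure states fail
print(is_two_extendible(np.eye(4) / 4))          # True: the maximally mixed state
```

For larger $k$ the variable grows as $d_A d_B^k$, which is exactly the exponential blow-up described above.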


FIG. 2. (a) Projections of the boundaries of the separable states and of the $k$-symmetric extendible states ($k = 2, \ldots, 12$) onto the plane spanned by the operators $H_1 = |0\rangle\langle 0| \otimes \sigma_z/\sqrt{2}$ and $H_2 = (\sigma_y \otimes \sigma_x - \sigma_x \otimes \sigma_y)/2$, where $\sigma_x, \sigma_y, \sigma_z$ are the three Pauli operators. (b) Illustration of the convex hull approximation of the separable boundary. The convex hull $C$ is formed from a randomly selected set of extreme points. To test whether a state $p$ is inside $C$, we find via linear programming the maximum $\alpha$ such that $p' = p_I + \alpha(p - p_I)$ is still inside $C$. If $\alpha \geq 1$, then $p$ is inside $C$ and surely separable; otherwise, $p$ is outside the convex hull and probably entangled. (c) Illustration of the learning algorithm: ensemble methods are models composed of multiple weaker models that are trained independently and whose predictions are combined to make the overall prediction. For example, each time we draw a subset of the training data (marked as yellow dots in the figure) and train a model on this subset, which is a weak model on the whole training set. We repeat the process many times to obtain a batch of weak models, and combine them into a committee.

Convex hull membership – We have shown that the above approaches based on the PPT and $k$-extension criteria cannot yield a satisfactory separability-entanglement classifier. Our solution is to append finer prior information about the decision boundary, making use of the fact that the set of separable states $S$ is convex and closed. In addition, the extreme points of $S$ are pure product states of the form $|\psi_A^i\rangle\langle\psi_A^i| \otimes |\psi_B^i\rangle\langle\psi_B^i|$, which can be easily parameterized and generated numerically. Hence, the convex hull constructed from a finite number of extreme points can approximate the true boundary from the inside, as shown in Fig. 2(b), reducing the original problem of entanglement detection to a convex hull membership (CHM) problem.

Concretely, we randomly chose $m$ separable pure states $\{c_1, \ldots, c_m\}$ to build the convex hull $C := \mathrm{conv}(\{c_1, \ldots, c_m\})$. Deciding whether a point $p$ is in $C$ is then equivalent to deciding whether $p$ can be written as a convex combination of the states $c_i$, i.e., finding the $\lambda_i$ by solving the following linear program:
$$\max \ \alpha \quad \text{s.t.} \quad p_I + \alpha (p - p_I) = \sum_{i=1}^m \lambda_i c_i, \quad \lambda_i \geq 0, \quad \sum_i \lambda_i = 1, \qquad (2)$$
where $p_I$ is the feature vector of the maximally mixed state $I/(d_A d_B)$. If $\alpha \geq 1$, $p$ is in $C$ and thus separable; otherwise, it is possibly an entangled state. In principle, the more extreme points are used to build $C$, the more accurate the classifier will be.
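Eq. (2) is a standard linear program, so any off-the-shelf LP solver can run the membership test. Below is a sketch using SciPy, with function and variable names of our own choosing; the cap on $\alpha$ is a practical guard that keeps the LP bounded.

```python
import numpy as np
from scipy.optimize import linprog

def chm_alpha(p, C_pts, p_I, alpha_cap=10.0):
    """Solve Eq. (2):  max alpha  s.t.  p_I + alpha (p - p_I) = sum_i lam_i c_i,
    lam_i >= 0, sum_i lam_i = 1.  Decision variables: (alpha, lam_1..lam_m).
    C_pts is an (m, dim) array whose rows are the extreme points c_i."""
    m, dim = C_pts.shape
    A_eq = np.zeros((dim + 1, m + 1))
    A_eq[:dim, 0] = -(p - p_I)      # move alpha (p - p_I) to the left-hand side
    A_eq[:dim, 1:] = C_pts.T        # sum_i lam_i c_i
    A_eq[dim, 1:] = 1.0             # sum_i lam_i = 1
    b_eq = np.append(p_I, 1.0)
    cost = np.zeros(m + 1)
    cost[0] = -1.0                  # linprog minimizes, so minimize -alpha
    bounds = [(0.0, alpha_cap)] + [(0.0, None)] * m
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    return res.x[0] if res.success else 0.0

# alpha >= 1  =>  p lies inside the hull and is surely separable.
```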

For the two-qubit case, we use the PPT criterion to test the accuracy of our CHM method. The results are shown as the blue line in Fig. 3(c). For $m = 10^3$, the error rate is 9%, and it decreases quickly to 3% for $m = 10^4$.

For the two-qutrit case, the PPT criterion is no longer sufficient for detecting separability; that is, there exist PPT entangled states, usually called "bound entangled" states. To illustrate the power of our CHM method beyond the PPT criterion, we use a specific, previously well-studied example. Consider the set of two-qutrit pure states $\{|v_1\rangle, \ldots, |v_5\rangle\}$ that forms the well-known unextendible product basis [24], where $|v_1\rangle = (|00\rangle - |01\rangle)/\sqrt{2}$, $|v_2\rangle = (|21\rangle - |22\rangle)/\sqrt{2}$, $|v_3\rangle = (|02\rangle - |12\rangle)/\sqrt{2}$, $|v_4\rangle = (|10\rangle - |20\rangle)/\sqrt{2}$, and $|v_5\rangle = (|0\rangle + |1\rangle + |2\rangle)^{\otimes 2}/3$.

It is known that the state $\rho_{\mathrm{tiles}} = (I - \sum_{i=1}^5 |v_i\rangle\langle v_i|)/4$ is PPT and entangled [25], where $I$ is the identity matrix. Now consider a probabilistic mixture of $\rho_{\mathrm{tiles}}$ and the maximally mixed state $I/9$, namely $\rho_p = p I/9 + (1 - p)\rho_{\mathrm{tiles}}$ with $0 \leq p \leq 1$. Clearly, $\rho_p$ is entangled when $p = 0$ and separable when $p = 1$. Because the set of separable states $S$ is convex, there must exist a critical value $p_c \in (0, 1]$ such that $\rho_p$ is entangled if $p < p_c$ and separable otherwise. However, the exact value of $p_c$ has not been found yet. We use our CHM method to bound the value of $p_c$, with the results shown in Table I; a numerical construction of $\rho_{\mathrm{tiles}}$ is sketched after the table.

m    | 2000   | 5000   | 10000  | 20000  | 50000  | 100000
p_c  | 0.5556 | 0.4531 | 0.4209 | 0.3740 | 0.3213 | 0.2853

TABLE I. Numerical results for the critical point $p_c$ of $\rho_p = p I/9 + (1 - p)\rho_{\mathrm{tiles}}$ using the CHM method; $m$ is the number of randomly sampled extreme points used to build the convex hull.
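The UPB and $\rho_{\mathrm{tiles}}$ are easy to construct numerically. The following NumPy sketch (names ours) builds them directly from the definitions above; the five UPB vectors are mutually orthogonal, so the sum of projectors is the projector onto their span.

```python
import numpy as np

e = np.eye(3)   # computational basis kets |0>, |1>, |2> as rows

def proj(v):
    """Projector onto the normalized vector v."""
    v = v / np.linalg.norm(v)
    return np.outer(v, v.conj())

v1 = np.kron(e[0], e[0] - e[1])      # (|00> - |01>)/sqrt(2) after normalization
v2 = np.kron(e[2], e[1] - e[2])      # (|21> - |22>)/sqrt(2)
v3 = np.kron(e[0] - e[1], e[2])      # (|02> - |12>)/sqrt(2)
v4 = np.kron(e[1] - e[2], e[0])      # (|10> - |20>)/sqrt(2)
v5 = np.kron(e.sum(0), e.sum(0))     # (|0>+|1>+|2>)^{(x)2} / 3

rho_tiles = (np.eye(9) - sum(proj(v) for v in (v1, v2, v3, v4, v5))) / 4
rho_p = lambda p: p * np.eye(9) / 9 + (1 - p) * rho_tiles

print(np.trace(rho_tiles).real)      # 1.0 (unit trace); rho_tiles is PPT yet entangled
```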

In Ref. [26], the effectiveness of various separability criteria is compared; it concludes that one cannot tell whether $\rho_p$ is entangled for $p \in (0.1351, 0.4357)$. Compared to this large gap, our CHM method yields a dramatically better bound on $p_c$: for $m = 10^5$ we obtain $p_c \leq 0.2853$, so we know for sure that $\rho_p$ is separable when $p > 0.2853$.


Also, as Table I shows, the value of $p_c$ has not yet converged, which means that with more computational power (i.e., larger $m$) the bound on $p_c$ can be improved further.

Combining CHM and supervised learning – There is yet a noticeable drawback of the above CHM approach, namely the tradeoff between accuracy and time consumption. Boosting the accuracy means adding extreme points to enlarge the convex hull, which increases the time cost of determining whether a point is inside the enlarged hull. To overcome this, we combined CHM with supervised learning, as machine learning has the power to speed up such computations.

To design a learning process suitable for our problem (instead of directly using a common method such as the kernel SVM described above, which does not achieve good accuracy), we use an additional parameter $\alpha$ in the training dataset to encode the CHM information. After generating the training dataset $D = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, we maximized $\alpha_i$ via Eq. (2) such that $p_I + \alpha_i(x_i - p_I)$ is in $C$, and recorded the values $\alpha_i$ in $D$. In this way, $\alpha$ stores the CHM information, and supervised learning and the CHM method are combined.

In this combined scheme, the loss function is modified to
$$L(h, D_{\mathrm{new}}) = \frac{1}{|D_{\mathrm{new}}|} \sum_{(x_i, \alpha_i, y_i) \in D_{\mathrm{new}}} \mathbf{1}(y_i \neq h(x_i, \alpha_i)), \qquad (3)$$
in the presence of the $\alpha_i$, where the new training dataset is $D_{\mathrm{new}} = \{(x_1, \alpha_1, y_1), \ldots, (x_n, \alpha_n, y_n)\}$. Subsequently, we employed a standard ensemble learning approach [21] to train a classifier from $D_{\mathrm{new}}$, described as follows.

The essential idea of ensemble learning is to improve predictive performance by combining multiple classifiers into a committee, which may still perform excellently even if the prediction of each constituent is poor. For the binary classification problem, we can train different classifiers to cast their respective binary votes for each prediction, and use the majority rule, choosing the value that receives more than half of the votes as the final answer; see Fig. 2(c) for a schematic diagram.

Here, we chose bootstrap aggregating (bagging) [27] as our ensemble training algorithm. In each run, a training subset was randomly drawn from the whole set $D_{\mathrm{new}}$, and a model was trained on that subset using another learning algorithm, e.g., decision tree learning. We repeated the process $L = 100$ times to obtain $L$ different models, which were finally combined into the committee. Since the parameter $\alpha$ contains the CHM information, our method is indeed a combination of CHM and a supervised learning algorithm. We call this combined method BCHM; a minimal training sketch follows.
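A minimal version of this training step, using scikit-learn's bagging implementation, could look as follows. The paper's experiments use MATLAB (see Appendix C); the placeholder data here stand in for the actual feature vectors, CHM parameters, and labels, and the estimator keyword assumes scikit-learn >= 1.2.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Placeholders: in the actual pipeline, X holds the feature vectors,
# alpha the CHM parameters from Eq. (2), and y the class labels.
n, dim = 1000, 15
X = np.random.randn(n, dim)
alpha = 2 * np.random.rand(n)
y = np.where(alpha >= 1, -1, 1)

X_aug = np.hstack([X, alpha[:, None]])    # append alpha as an extra feature

committee = BaggingClassifier(
    estimator=DecisionTreeClassifier(),   # weak learner: decision trees, as in the text
    n_estimators=100,                     # L = 100 committee members
    max_samples=0.5,                      # each tree sees a random subset of D_new
    bootstrap=True,
)
committee.fit(X_aug, y)                   # predictions use the majority vote
```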

The computational cost has two parts: the cost of computing $\alpha$, and the time of evaluating each constituent of the committee. The latter is much smaller than the former. Therefore, by using a convex hull of much smaller size and implementing a bagging algorithm, a significant boost in accuracy is anticipated at fixed total computational cost.

FIG. 3. Results of the CHM method and the BCHM classifier. More details of the BCHM classifier can be found in Appendix C. (a) Test results of BCHM with $m = 10^3$ for the two-qubit case. The random density matrices in the test dataset are projected onto a plane by the projection $\pi : (x, \alpha) \to (x_1, \alpha^{-1})$, with $H_1 = |0\rangle\langle 0| \otimes \sigma_z/\sqrt{2}$. Red points are states predicted as separable; blue points are states predicted as entangled. The only difference between BCHM and CHM alone is that some states with $\alpha < 1$ are predicted as separable. (b) Test results of BCHM with $m = 2 \times 10^4$ for the two-qutrit case, with $P_1 = (|00\rangle\langle 00| - |01\rangle\langle 01|)\sqrt{3}/2$. All states in the test dataset are PPT states. Red points are states predicted as separable; blue points are states predicted as bound entangled. (c) Comparison between CHM and BCHM in the two-qubit case. For the same $m$, BCHM suppresses the error rate significantly, and to achieve the same error rate BCHM needs much less running time (which depends mainly on the value of $m$). For instance, to bring the error rate below 3%, CHM requires a convex hull with $m \approx 7 \times 10^3$, while BCHM only requires $m \approx 10^3$, greatly reducing the computational cost. (d) Comparison between CHM and BCHM in the two-qutrit case. BCHM clearly outperforms CHM in terms of speed and accuracy. Both methods converge to zero error at $m = 10^5$, since the $m = 10^5$ CHM is used as the "no error" standard for comparison purposes only.

For the two-qubit case, we demonstrated such a remarkable boost of accuracy with our BCHM classifier, as shown in Fig. 3(c), where the advantages of the BCHM classifier in terms of both accuracy and speed are evident.

We further extended the classifier to the two-qutrit scenario. Unlike the two-qubit case, the immediate question now is how to set an appropriate criterion to evaluate whether the classifier works correctly, since the PPT criterion is not sufficient for detecting separability in two-qutrit systems. As the convex hull is capable of approximating the set of separable states $S$ to arbitrary precision, we used $10^5$ random separable pure states as extreme points to form the hull, and took it to be the true $S$.


The learning procedure is analogous to the one used for two qubits, combining CHM and supervised learning (BCHM). Figure 3(d) shows the accuracy of the BCHM classifier compared to that of the CHM method alone, where we have assumed that the convex hull over our $10^5$ extreme points exactly represents $S$. As in the two-qubit case, the BCHM classifier shows a clear advantage in terms of both accuracy and speed over the CHM method alone.

Conclusion – In summary, we have studied the entanglement detection problem via the machine learning approach and built a reliable separability-entanglement classifier by combining supervised learning with the CHM method. Compared to the conventional criteria for entanglement detection, our method classifies an unknown state into the separable or entangled category more precisely and more rapidly. The classifier can in principle be extended to higher dimensions, and the techniques developed in this work could also be incorporated into future entanglement-engineering experiments. We anticipate that our work will provide new insights into employing machine learning techniques for further quantum information processing tasks in the near future.

Acknowledgments – We thank Daniel Gottesman and Nathaniel Johnston for helpful discussions. JL is supported by the National Basic Research Program of China (Grants No. 2014CB921403, No. 2016YFA0301201, No. 2014CB848700, and No. 2013CB921800), the National Natural Science Foundation of China (Grants No. 11421063, No. 11534002, No. 11375167, and No. 11605005), the National Science Fund for Distinguished Young Scholars (Grant No. 11425523), and NSAF (Grant No. U1530401). DL and BZ are supported by NSERC and CIFAR.

∗ These authors contributed equally to this work.
† [email protected]
‡ [email protected]

[1] N. Wiebe, C. Granade, C. Ferrie, and D. G. Cory, Phys. Rev. Lett. 112, 190501 (2014).
[2] M. Krenn, M. Malik, R. Fickler, R. Lapkiewicz, and A. Zeilinger, Phys. Rev. Lett. 116, 090405 (2016).
[3] S. S. Schoenholz, E. D. Cubuk, D. M. Sussman, E. Kaxiras, and A. J. Liu, Nat. Phys. 12, 469 (2016).
[4] E. P. L. van Nieuwenburg, Y.-H. Liu, and S. D. Huber, Nat. Phys. (2017).
[5] J. Carrasquilla and R. G. Melko, Nat. Phys. (2017).
[6] R. Horodecki, P. Horodecki, M. Horodecki, and K. Horodecki, Rev. Mod. Phys. 81, 865 (2009).
[7] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information (Cambridge University Press, 2010).
[8] L. Gurvits, in Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing (ACM, 2003), pp. 10–19.
[9] O. Gühne and G. Tóth, Phys. Rep. 474, 1 (2009).
[10] R. F. Werner, Phys. Rev. A 40, 4277 (1989).
[11] A. Peres, Phys. Rev. Lett. 77, 1413 (1996).
[12] M. Horodecki, P. Horodecki, and R. Horodecki, Phys. Lett. A 223, 1 (1996).
[13] M. Navascués, M. Owari, and M. B. Plenio, Phys. Rev. A 80, 052306 (2009).
[14] F. G. Brandão, M. Christandl, and J. Yard, in Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing (ACM, 2011), pp. 343–352.
[15] See Supplementary Material for more details.
[16] O. Gühne and N. Lütkenhaus, Phys. Rev. Lett. 96, 170502 (2006).
[17] M. Mohri, A. Rostamizadeh, and A. Talwalkar, Foundations of Machine Learning (The MIT Press, 2012).
[18] S. Shalev-Shwartz and S. Ben-David, Understanding Machine Learning: From Theory to Algorithms (Cambridge University Press, 2014).
[19] C. Cortes and V. Vapnik, Mach. Learn. 20, 273 (1995).
[20] L. Breiman, J. Friedman, C. J. Stone, and R. A. Olshen, Classification and Regression Trees (CRC Press, 1984).
[21] Z.-H. Zhou, Ensemble Methods: Foundations and Algorithms (Taylor & Francis Group, 2012).
[22] R. E. Schapire, in Nonlinear Estimation and Classification (Springer, 2003), pp. 149–171.
[23] A. C. Doherty, P. A. Parrilo, and F. M. Spedalieri, Phys. Rev. Lett. 88, 187904 (2002).
[24] I. Bengtsson and K. Życzkowski, Geometry of Quantum States: An Introduction to Quantum Entanglement (Cambridge University Press, 2007).
[25] C. H. Bennett, D. P. DiVincenzo, T. Mor, P. W. Shor, J. A. Smolin, and B. M. Terhal, Phys. Rev. Lett. 82, 5385 (1999).
[26] N. Johnston, "Entanglement detection," lecture notes, University of Waterloo (2014).
[27] L. Breiman, Mach. Learn. 24, 123 (1996).
[28] B. Zeng, X. Chen, D.-L. Zhou, and X.-G. Wen, arXiv:1508.02595 (2015).
[29] A. A. Klyachko, J. Phys.: Conf. Ser. 36, 72 (2006).
[30] A. J. Coleman, Rev. Mod. Phys. 35, 668 (1963).
[31] R. M. Erdahl, J. Math. Phys. 13, 1608 (1972).
[32] Y.-K. Liu, in Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, Lecture Notes in Computer Science, Vol. 4110, edited by J. Díaz, K. Jansen, J. D. Rolim, and U. Zwick (Springer Berlin Heidelberg, 2006), pp. 438–449.
[33] Y.-K. Liu, M. Christandl, and F. Verstraete, Phys. Rev. Lett. 98, 110503 (2007).
[34] T.-C. Wei, M. Mosca, and A. Nayak, Phys. Rev. Lett. 104, 040501 (2010).
[35] N. Johnston, "QETLAB: A MATLAB toolbox for quantum entanglement, version 0.9" (2016).
[36] E. Størmer, J. Funct. Anal. 3, 48 (1969).
[37] R. Hudson and G. Moody, Probab. Theory Rel. 33, 343 (1976).
[38] R. Renner, Nat. Phys. 3, 645 (2007).
[39] M. Christandl, R. König, G. Mitchison, and R. Renner, Comm. Math. Phys. 273, 473 (2007).
[40] A. W. Harrow, arXiv:1308.6595 (2013).
[41] K. Życzkowski, Phys. Rev. A 60, 3496 (1999).

A. The set of k-extendible states

In this section, we recall known facts about $k$-extendible states and their relationship to separability. We mainly focus on $\Theta_k$, which is known to be closely related to the ground states of certain $(k+1)$-body Hamiltonians [28]. More general properties of $k$-extendibility and its relationship to the quantum marginal problem can be found in [29–34].


To be more precise, consider a 2-local Hamiltonian $H$ of a $(k+1)$-body system with Hilbert space $\mathbb{C}^{d_A} \otimes \bigotimes_{i=1}^k \mathbb{C}^{d_{B_i}}$ of dimension $d_A d_B^k$, of the form $H = \sum_{i=1}^k H_{AB_i}$. Here $H_{AB_i}$ is any Hermitian operator acting nontrivially on particles $A$ and $B_i$ and trivially on the other $k-1$ parties. In other words, $H_{AB_1} = H_{AB} \otimes I_{2,\ldots,k}$ (where $I_{2,\ldots,k}$ is the identity operator on $B_2, \ldots, B_k$), and given the symmetry among the $B_i$'s, we can always write the nontrivial action of each $H_{AB_i}$ on $\mathbb{C}^{d_A} \otimes \bigotimes_{i=1}^k \mathbb{C}^{d_{B_i}}$ in terms of some $H_{AB}$ of dimension $d_A d_B$.

For any given $H$, denote its normalized ground state by $|\psi_g\rangle \in \mathbb{C}^{d_A} \otimes \bigotimes_{i=1}^k \mathbb{C}^{d_{B_i}}$, and let $\rho_g = |\psi_g\rangle\langle\psi_g|$. Then the extreme points of $\Theta_k$ are given by the marginals of $\rho_g$ on the particles $AB_i$, which are the same for every $i$. We denote this marginal by $\rho_H$, since it is completely determined by $H$.

To implement our kernel algorithms, we use the extreme points of $\Theta_k$, which are given by $\rho_H$. To generate these extreme points, we first parametrize them. Denote by $\{O^{lm}_{AB}\}$ an orthonormal Hermitian basis for operators on $\mathcal{H}_A \otimes \mathcal{H}_{B_j}$ (e.g., Pauli operators in the qubit case); then we can always write $H_{AB} = \sum_{lm} a_{lm} O^{lm}_{AB}$ with parameters $a_{lm}$. Without loss of generality, we take $O^{00}_{AB} = I$ and set $a_{00} = 0$, so there are only $d_A^2 d_B^2 - 1$ terms in the sum. Since $H_{AB}$ is Hermitian, the $a_{lm}$ can be chosen real, and we can further require $\sum_{l,m} a_{lm}^2 = 1$. Consequently, $\rho_H$ is a point in $\mathbb{R}^{d_A^2 d_B^2 - 1}$ parametrized by the $\{a_{lm}\}$, and the coordinates of $\rho_H$ are given explicitly by $b_{lm} = \mathrm{tr}(\rho_H O^{lm}_{AB})$.

Moreover, each $H_{AB}$ gives an entanglement witness. The corresponding threshold energy is $E_0 = \langle\psi_g|H|\psi_g\rangle/k = \sum_{lm} a_{lm} b_{lm}$. For any density matrix $\rho_{AB}$, if $\mathrm{tr}(\rho_{AB} H_{AB}) < E_0$, then $\rho_{AB}$ has no $k$-symmetric extension and hence is surely entangled; a numerical sketch for $k = 2$ is given below.
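As a sanity check, this witness inequality is easy to evaluate numerically for $k = 2$. The sketch below is our own construction, taking $H_{AB}$ to be the SWAP operator for illustration; it detects that the two-qubit singlet admits no 2-symmetric extension.

```python
import numpy as np

def swap(d=2):
    """SWAP operator on C^d (x) C^d."""
    S = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            S[i * d + j, j * d + i] = 1.0
    return S

def witness_threshold_k2(H_AB, d=2):
    """E0 = <psi_g|H|psi_g>/k for k = 2, with H = H_AB1 + H_AB2 on A(x)B1(x)B2."""
    I = np.eye(d)
    H_AB1 = np.kron(H_AB, I)       # acts on A and B1
    S23 = np.kron(I, swap(d))      # exchanges B1 and B2
    H_AB2 = S23 @ H_AB1 @ S23      # acts on A and B2
    return np.linalg.eigvalsh(H_AB1 + H_AB2).min() / 2

H_AB = swap()                                   # try H_AB = SWAP
singlet = np.array([0, 1, -1, 0]) / np.sqrt(2)
value = (np.outer(singlet, singlet) @ H_AB).trace().real
print(value, "<", witness_threshold_k2(H_AB))   # -1.0 < -0.5: entanglement detected
```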

Since the dimension of $H$ grows exponentially with $k$, generating these extreme points of $\Theta_k$ becomes hard as $k$ grows. In practice, however, $H$ is sparse, so it is easy to generate these extreme points up to, e.g., $k = 20$ for $d_A = d_B = 2$. By comparison, with the online package of [35] one can usually only handle $k \sim 7$.

When $k \to \infty$, however, the extreme points of $\Theta_\infty$ become very easy to characterize, essentially as a consequence of the quantum de Finetti theorem [36, 37]. Quantitatively, the finite quantum de Finetti theorem implies that any $k$-extendible $\rho_{AB}$ is close to some separable state $\sigma_{AB}$, with $\|\rho_{AB} - \sigma_{AB}\|_1 \leq 4 d_B^2 k^{-1}$ [23, 38–40]. That is, $\Theta_\infty$ is exactly the set of separable states, whose extreme points are pure product states $|\psi_A\rangle \otimes |\psi_B\rangle$, which are easy to parametrize and generate numerically.

B. Generating Random Density Matrices

It is straightforward to generate a random separable pure state in the space $\mathcal{H}_A \otimes \mathcal{H}_B \cong \mathbb{C}^{d_A} \otimes \mathbb{C}^{d_B}$: we first sample a state vector $|\psi_A\rangle \in \mathbb{C}^{d_A}$ uniformly from the unit hypersphere in $\mathbb{C}^{d_A}$ according to the Haar measure, then sample another state vector $|\psi_B\rangle \in \mathbb{C}^{d_B}$ similarly, and output $|\psi_A\rangle|\psi_B\rangle$ as the desired state; a short sketch follows.
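A sketch of this sampling step, using the standard fact that a normalized complex Gaussian vector is Haar-uniform on the unit sphere (names ours):

```python
import numpy as np

rng = np.random.default_rng()

def haar_ket(d):
    """Haar-uniform pure state: a normalized complex Gaussian vector."""
    v = rng.standard_normal(d) + 1j * rng.standard_normal(d)
    return v / np.linalg.norm(v)

psi_sep = np.kron(haar_ket(3), haar_ket(3))   # random separable pure state, d_A = d_B = 3
```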

Many probability distributions exist from which random mixed states can be generated. However, to test the robustness of our machine learning method, the probabilities of obtaining a separable state and an entangled state should not be far from balanced. We have found that the product measure described in Section II.A of Ref. [41] is a proper measure for testing. The procedure for generating an $n$-dimensional random mixed state is as follows. We first pick an $n$-dimensional vector $(d_1, \ldots, d_n)$ on the simplex $\sum_{i=1}^n d_i = 1$ from a particular instance of the Dirichlet distribution,
$$\Delta_{1/2}(d_1, \ldots, d_n) = C\, d_1^{-1/2} \cdots d_n^{-1/2}, \qquad (4)$$
where $C$ is the normalization constant. We then pick an $n$-dimensional unitary matrix $U$ uniformly from $U(n)$ according to the Haar measure, and output $\rho = U \cdot \mathrm{diag}(d_1, \ldots, d_n) \cdot U^\dagger$ as the desired state. Mathematically, the state is sampled from the measure $\mu = \nu \times \Delta_{1/2}$, where $\nu$ is the Haar-uniform distribution on $U(n)$ and $\mu$ is a probability distribution on the state space. In the two-qubit case, 35.2% of the sampled states are separable [41], which is an acceptable proportion. In the two-qutrit case, we first generate a set of $10^5$ separable pure states and use their convex hull to approximate the separable set. Since we focus on detecting bound entanglement only, we sample $2 \times 10^4$ PPT states from $\mu$ and label them by their convex hull membership. We found that ?% of the states are inside the convex hull, i.e., surely separable.

We implemented the sampling from $\mu$ in MATLAB, using the package QETLAB [35]; an illustrative re-implementation of the sampler is sketched below.
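For illustration, an equivalent sampler takes only a few lines in Python with SciPy; this is a re-implementation sketch, not the authors' MATLAB/QETLAB code.

```python
import numpy as np
from scipy.stats import unitary_group

rng = np.random.default_rng()

def random_mixed_state(n):
    """Sample rho from mu = nu x Delta_{1/2}: a Dirichlet(1/2, ..., 1/2)
    spectrum (Eq. (4)) conjugated by a Haar-random unitary."""
    spectrum = rng.dirichlet(np.full(n, 0.5))   # point on the probability simplex
    U = unitary_group.rvs(n)                    # Haar measure on U(n)
    return U @ np.diag(spectrum) @ U.conj().T

rho = random_mixed_state(9)   # e.g., a random two-qutrit mixed state
```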

C. Details of Testing the Machine Learning Method

In the two-qubit case, we first generate a set of $n = 50000$ random quantum states $x_1, \ldots, x_n$ from the probability distribution $\mu$ described above, and give each $x_i$ a label $y_i$ using the PPT criterion. We then generate $10m$ random separable pure states $c_1, \ldots, c_{10m}$, here with $m = 1000$. Next, we create 10 nested convex hulls
$$CH_k = \mathrm{conv}\{c_1, \ldots, c_{km}\}, \qquad k = 1, \ldots, 10.$$
Note that $CH_1 \subset \cdots \subset CH_{10}$, which means that the accuracy with which $CH_k$ approximates the separable set increases with $k$. For each convex hull $CH_k$ and each random state $x_i$, we compute the maximal parameter $\alpha_i^{(k)}$ such that $p_I + \alpha_i^{(k)}(x_i - p_I)$ is inside $CH_k$.


We then form a training dataset $D_k = \{(x_1, \alpha_1^{(k)}, y_1), \ldots, (x_n, \alpha_n^{(k)}, y_n)\}$ and use the Classification Learner app in MATLAB to train a model on $D_k$ via the bagging algorithm. In this app, only decision trees are available as the learning algorithm for the weak classifiers; the number of splits of each tree was 100 in our experiment, and we chose the number of weak classifiers $L = 100$.

Finally, we generate another set of $n$ random quantum states from $\mu$ as the test set; the test results are shown in Fig. 3.

In the two-qutrit case, we chose $n = 20000$ and $m = 10000$. Instead of the PPT criterion, we give each $x_i$ its label $y_i$ using the $CH_{10}$-membership of $x_i$: if $x_i$ is inside $CH_{10}$, then $y_i = -1$; otherwise $y_i = 1$. The number of splits is 100, and $L = 100$.