Local SGD for non-i.i.d. data

Konstantin Mishchenko
Work done together with Ahmed Khaled and Peter Richtárik


Problem

$$\min_{x} \frac{1}{M} \sum_{m=1}^{M} f_m(x), \qquad f_m(x) = \mathbb{E}_{\xi} f_m(x; \xi)$$

Theory: convex. In practice: usually a neural network.
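
A minimal sketch of this objective, assuming hypothetical one-dimensional quadratic losses $f_m(x; \xi) = \frac{1}{2}(x - b_m - \xi)^2$ with zero-mean noise $\xi$. The names `centers`, `f_m`, and `objective` are illustrative and do not come from the slides.

```python
# Hypothetical toy instance: worker m holds f_m(x) = 0.5 * (x - b_m)^2 + const,
# the expectation of 0.5 * (x - b_m - xi)^2 over zero-mean noise xi.
centers = [-1.0, 0.0, 4.0]          # b_m: one value per worker (non-i.i.d. data)
M = len(centers)

def f_m(x, m):
    """Expected local loss of worker m, dropping the noise-variance constant."""
    return 0.5 * (x - centers[m]) ** 2

def objective(x):
    """Global objective: the average of the M local losses."""
    return sum(f_m(x, m) for m in range(M)) / M

x_star = sum(centers) / M           # minimizer of an average of such quadratics
```

In this toy case the workers disagree about the best point (each prefers its own `b_m`), which is the kind of heterogeneity the talk is about.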

Local SGD

$$\min_{x} \frac{1}{M} \sum_{m=1}^{M} f_m(x)$$

$$x_{t+1}^m = \begin{cases} \hat{x}_{t+1}, & \text{if } t \bmod H = 0 \\ x_t^m - \gamma \nabla f_m(x_t^m; \xi_t^m), & \text{otherwise} \end{cases}$$

where the averaged iterate is

$$\hat{x}_{t+1} = \frac{1}{M} \sum_{j=1}^{M} \left( x_t^j - \gamma \nabla f_j(x_t^j; \xi_t^j) \right)$$

$H = 1 \longrightarrow$ minibatch SGD

$H = T \longrightarrow$ one-shot averaging
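
The update rule can be sketched in a few lines of Python. This is only an illustration under assumed toy losses $f_m(x) = \frac{1}{2}(x - b_m)^2$ with additive Gaussian gradient noise; the function name `local_sgd` and its parameters are hypothetical.

```python
import random

def local_sgd(centers, gamma, H, T, noise=0.1, seed=0):
    """Run T steps of local SGD on assumed toy losses f_m(x) = 0.5 * (x - b_m)^2.

    Every worker takes a stochastic gradient step; whenever t % H == 0 the
    workers replace their iterates by the average of the updated iterates.
    """
    rng = random.Random(seed)
    M = len(centers)
    x = [0.0] * M                               # x_0^m, identical starting point
    for t in range(T):
        # Local step with stochastic gradient (x - b_m) + xi.
        stepped = [x[m] - gamma * ((x[m] - centers[m]) + rng.gauss(0.0, noise))
                   for m in range(M)]
        if t % H == 0:
            avg = sum(stepped) / M              # the averaged iterate \hat{x}_{t+1}
            x = [avg] * M
        else:
            x = stepped
    return x

workers = local_sgd([-1.0, 0.0, 4.0], gamma=0.1, H=4, T=400)
```

With `H=1` the workers are averaged after every step, so all local iterates coincide, matching the minibatch SGD special case.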

Local GD

$$x_{t+1}^m = x_t^m - \gamma \nabla f_m(x_t^m)$$
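
As a worked example of the local step, assume (purely for illustration) the quadratic losses $f_m(x) = \frac{1}{2}(x - b_m)^2$, so that $\nabla f_m(x) = x - b_m$. Then each worker contracts toward its own minimizer $b_m$ between communications:

```latex
\begin{align*}
x_{t+1}^m - b_m &= x_t^m - \gamma (x_t^m - b_m) - b_m = (1 - \gamma)(x_t^m - b_m), \\
x_{t+k}^m - b_m &= (1 - \gamma)^k (x_t^m - b_m) \quad \text{after $k$ local steps.}
\end{align*}
```

Under this assumption, the longer the workers run without averaging, the further their iterates drift apart whenever the $b_m$ differ.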

The Variance of Local GD

$$\sigma_f^2 \overset{\text{def}}{=} \frac{1}{M} \sum_{m=1}^{M} \|\nabla f_m(x^*)\|^2$$

$$x_{t+1}^m = x_t^m - \gamma \nabla f_m(x_t^m)$$
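
A small sketch of how $\sigma_f^2$ could be computed, again assuming the illustrative quadratic losses $f_m(x) = \frac{1}{2}(x - b_m)^2$; `sigma_f_squared` is a hypothetical helper, not something defined in the talk.

```python
def sigma_f_squared(grads_at_opt):
    """Average squared norm of the local gradients at x*; each gradient is a list."""
    M = len(grads_at_opt)
    return sum(sum(g_i * g_i for g_i in g) for g in grads_at_opt) / M

centers = [-1.0, 0.0, 4.0]                  # assumed b_m, one per worker
x_star = sum(centers) / len(centers)        # minimizer of the averaged quadratics
grads = [[x_star - b] for b in centers]     # grad f_m(x*) = x* - b_m
sigma2 = sigma_f_squared(grads)
```

The local gradients at $x^*$ sum to zero, yet their squared norms do not vanish unless every worker shares the same minimizer, so $\sigma_f^2$ acts as a measure of data heterogeneity.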

Analysis difficulties in local GD

$$x_{t+1}^m = x_t^m - \gamma \nabla f_m(x_t^m)$$

$$\hat{x}_t \overset{\text{def}}{=} \frac{1}{M} \sum_{m=1}^{M} x_t^m$$

$$V_t \overset{\text{def}}{=} \frac{1}{M} \sum_{m=1}^{M} \|x_t^m - \hat{x}_t\|^2$$

$$g_t \overset{\text{def}}{=} \frac{1}{M} \sum_{m=1}^{M} \nabla f_m(x_t^m)$$
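
A short derivation shows how these three quantities interact. Averaging the local GD steps over the workers gives a recursion for $\hat{x}_t$, and averaging does not change the mean, so the recursion holds at every step. The second line assumes each $f_m$ is $L$-smooth and writes $f = \frac{1}{M}\sum_m f_m$; it is a sketch of why $V_t$ matters, not a restatement of the authors' proof.

```latex
\begin{align*}
\hat{x}_{t+1} &= \frac{1}{M} \sum_{m=1}^{M} \left( x_t^m - \gamma \nabla f_m(x_t^m) \right)
              = \hat{x}_t - \gamma g_t, \\
\| g_t - \nabla f(\hat{x}_t) \|^2
  &\le \frac{1}{M} \sum_{m=1}^{M} \| \nabla f_m(x_t^m) - \nabla f_m(\hat{x}_t) \|^2
   \le L^2 V_t .
\end{align*}
```

So $\hat{x}_t$ follows a gradient-like recursion, but $g_t$ is evaluated at the local points $x_t^m$ rather than at $\hat{x}_t$, and the resulting error is controlled by the deviation $V_t$.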

Theorem

Choose $H$ such that $H \le \frac{\sqrt{T}}{\sqrt{M}}$, then $\gamma = \frac{\sqrt{M}}{4L\sqrt{T}} \le \frac{1}{4HL}$, and hence

$$f(\hat{x}_T) - f(x^*) \le \frac{8L\|x_0 - x^*\|^2}{\sqrt{MT}} + \frac{3M\sigma_f^2 H^2}{2LT}.$$

To get a convergence rate of $1/\sqrt{MT}$ we can choose $H = O(T^{1/4} M^{-3/4})$, which implies a total number of $\Omega(T^{3/4} M^{3/4})$ communication steps. If a rate of $1/\sqrt{T}$ is desired instead, we can choose a larger $H = O(T^{1/4})$.
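
To make the trade-off concrete, the sketch below plugs numbers into the theorem. It assumes the hidden constant in $H = O(T^{1/4} M^{-3/4})$ is 1 and that $T \ge M$; the function name and arguments are hypothetical.

```python
import math

def local_gd_schedule(T, M, L, sigma_f2, r0_sq):
    """Pick H ~ T^(1/4) * M^(-3/4) (constant 1 assumed), capped by sqrt(T / M),
    then evaluate the step size and the right-hand side of the bound."""
    H = max(1, math.floor(T ** 0.25 * M ** -0.75))
    H = min(H, max(1, math.floor(math.sqrt(T / M))))   # enforce H <= sqrt(T)/sqrt(M)
    gamma = math.sqrt(M) / (4 * L * math.sqrt(T))
    bound = (8 * L * r0_sq / math.sqrt(M * T)
             + 3 * M * sigma_f2 * H ** 2 / (2 * L * T))
    rounds = math.ceil(T / H)                          # number of averaging steps
    return H, gamma, bound, rounds

H, gamma, bound, rounds = local_gd_schedule(T=10**8, M=16, L=1.0,
                                            sigma_f2=1.0, r0_sq=1.0)
```

Here `r0_sq` stands for $\|x_0 - x^*\|^2$ and `sigma_f2` for $\sigma_f^2$; both would have to be estimated or bounded in practice.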

Plots


Local SGD

$$x_{t+1}^m = x_t^m - \gamma \nabla f_m(x_t^m; \xi_t^m)$$

Uniformly bounded variance:

$$\mathbb{E}_{\xi} \|\nabla f_m(x; \xi) - \nabla f_m(x)\|^2 \le \sigma^2$$

A weaker bound (implied by the first, since $D_{f_m} \ge 0$ for convex $f_m$):

$$\mathbb{E}_{\xi} \|\nabla f_m(x; \xi) - \nabla f_m(x)\|^2 \le 4L D_{f_m}(x, x^*) + 2\sigma^2$$

where $D_f$ is the Bregman divergence of $f$:

$$D_f(x, y) = f(x) - f(y) - \langle \nabla f(y), x - y \rangle$$

$$\sigma_{\mathrm{dif}}^2 \overset{\text{def}}{=} \frac{1}{M} \sum_{m=1}^{M} \mathbb{E}_{\xi} \|\nabla f_m(x^*; \xi)\|^2$$
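
The Bregman divergence $D_f(x, y)$ can be evaluated directly from its definition. The sketch below uses an assumed toy function $f(x) = \frac{1}{2}\|x\|^2$, for which $D_f(x, y) = \frac{1}{2}\|x - y\|^2$; the helper `bregman` is hypothetical.

```python
def bregman(f, grad_f, x, y):
    """D_f(x, y) = f(x) - f(y) - <grad f(y), x - y> for vectors given as lists."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_f(y), x, y))
    return f(x) - f(y) - inner

# Assumed toy function: f(x) = 0.5 * ||x||^2, so grad f(x) = x.
f = lambda x: 0.5 * sum(xi * xi for xi in x)
grad_f = lambda x: list(x)

d = bregman(f, grad_f, [3.0, 1.0], [1.0, 1.0])
```

For a convex differentiable `f` the result is never negative, and it is zero when `x` equals `y`.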

Theorem

Choose $H$ such that $H \le \frac{\sqrt{T}}{\sqrt{M}}$, then $\gamma = \frac{\sqrt{M}}{8L\sqrt{T}} \le \frac{1}{8HL}$ and

$$\mathbb{E} f(\hat{x}_T) - f(x^*) \le \frac{32L\|\hat{x}_0 - x^*\|^2}{\sqrt{MT}} + \frac{5\sigma_{\mathrm{dif}}^2}{2L\sqrt{MT}} + \frac{\sigma_{\mathrm{dif}}^2 M (H-1)^2}{4LT}.$$

Optimal $H$ is $H = 1 + \lfloor T^{1/4} M^{-3/4} \rfloor$, which keeps the last term of the bound at order $1/\sqrt{MT}$.

Improves to $H = 1 + \lfloor T^{1/2} M^{-3/2} \rfloor$ if $\mathbb{E} \|\nabla f_m(x; \xi) - \nabla f_m(x)\|^2 \le \sigma^2$.
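
The effect of the communication period on the Local SGD bound can be explored numerically. The sketch below simply evaluates the right-hand side of the theorem for a given $H$; the function name and argument names are hypothetical, with `r0_sq` standing for $\|\hat{x}_0 - x^*\|^2$ and `sigma_dif2` for $\sigma_{\mathrm{dif}}^2$.

```python
import math

def local_sgd_bound(T, M, L, H, sigma_dif2, r0_sq):
    """Right-hand side of the theorem's bound for a given communication period H."""
    if H > math.sqrt(T) / math.sqrt(M):
        raise ValueError("the theorem requires H <= sqrt(T) / sqrt(M)")
    return (32 * L * r0_sq / math.sqrt(M * T)
            + 5 * sigma_dif2 / (2 * L * math.sqrt(M * T))
            + sigma_dif2 * M * (H - 1) ** 2 / (4 * L * T))

# Compare communicating every step with communicating every 5 steps.
every_step = local_sgd_bound(T=10000, M=4, L=1.0, H=1, sigma_dif2=1.0, r0_sq=1.0)
every_five = local_sgd_bound(T=10000, M=4, L=1.0, H=5, sigma_dif2=1.0, r0_sq=1.0)
```

Only the last term depends on $H$, and it vanishes for $H = 1$, so the cost of skipping communication enters the bound through $(H-1)^2$.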

Plot

Open questions

Meta-Learning

We can learn an "improvable" model

$$\min_{x} \frac{1}{M} \sum_{m=1}^{M} f_m(x - \gamma \nabla f_m(x))$$
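
The meta-learning objective scores a shared model $x$ by the loss each worker obtains after one local gradient step from it. A minimal sketch, assuming the illustrative quadratic losses $f_m(z) = \frac{1}{2}(z - b_m)^2$ (the function `meta_objective` is hypothetical):

```python
def meta_objective(x, centers, gamma):
    """Average loss after each worker takes one gradient step from the shared x,
    for assumed toy losses f_m(z) = 0.5 * (z - b_m)^2."""
    total = 0.0
    for b in centers:
        adapted = x - gamma * (x - b)          # x - gamma * grad f_m(x)
        total += 0.5 * (adapted - b) ** 2      # f_m evaluated at the adapted point
    return total / len(centers)

value = meta_objective(1.0, [-1.0, 0.0, 4.0], gamma=0.5)
```

With `gamma=0` this reduces to the original averaged objective; for these particular quadratics a positive `gamma` below 1 scales it by $(1 - \gamma)^2$.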

References

Better Communication Complexity for Local SGD, arXiv:1909.04746

First Analysis of Local GD on Heterogeneous Data, arXiv:1909.04715

NeurIPS workshop on Federated Learning, http://federated-learning.org/fl-neurips-2019/