ACTA AUTOMATICA SINICA, Vol. XX, No. X, Month 201X
DOI 10.16383/j.aas.2017.c160690
Review on Deep-Learning-based Cognitive Computing
CHEN Wei-Hong1, 2 AN Ji-Yao1, 2 LI Ren-Fa1, 2 LI Wan-Li1, 2
Abstract With the advent of the era of big data and artificial intelligence, the research focus of machine learning has shifted from the perception domain to the cognitive computing domain. How to improve the ability to cognize big data is becoming a research hotspot of intelligence science and technology, and recent deep learning is expected to spark a new wave of research on cognitive computing. This paper summarizes the research progress of cognitive computing based on deep learning in recent years. Recent progress in deep learning and cognitive computing is surveyed, compared, and analyzed from three aspects: deep learning data representation, cognitive models, and deep learning parallel computing, together with their applications in the big data environment. Finally, challenges and development trends of cognitive computation based on deep learning for big data are summarized and discussed to inform future research.
Key words Deep learning, cognitive computation, models, tensor, parallelism
Citation CHEN Wei-Hong, AN Ji-Yao, LI Ren-Fa, LI Wan-Li. Review on Deep-Learning-based Cognitive Computing.
Acta Automatica Sinica, 2017, XX(X): X−X
Cognitive computing (CC) originated from attempts to give computing systems the intelligence of the human brain. Through natural interaction with the environment and continuous learning, it extracts value from massive data of different types to support decision making, realizing perception, memory, learning, and other cognitive behaviors at different levels [1]. With the advent of the big data era, the flood of data and knowledge has brought new opportunities to cognitive computing. At the same time, the velocity, variety, depth, and complexity of data far exceed the cognitive capacity of the human brain, and how to cognize big data effectively poses a severe challenge to traditional cognitive computing [2].

Cognitive computing is a characterization of a new generation of intelligent systems. Functionally, a cognitive system possesses human-like cognitive abilities and can accomplish specific cognitive tasks over data, such as discovery, understanding, reasoning, and decision making [3]. Cognitive computing addresses problems of understanding and learning, and learning ability is the key of a cognitive system, especially in today's big data environment, where the data and knowledge available for learning are overwhelming. In recent years, benefiting from improvements in computer hardware performance and the development of cloud computing technology, deep learning (DL), as a new kind of machine learning method, has become a research hotspot of big data cognitive computing [4]. By building multi-layer machine learning models with layered representations and training them on massive data, deep learning learns useful features and markedly improves the accuracy of recognition, classification, and prediction [5]. Deep learning can be regarded as concept learning, learning deeper and more complex knowledge; it is a breakthrough of neural network learning methods [6, 7]. Deep-learning-based cognitive computing mines the value in data with deep learning algorithms to obtain more accurate and higher-level knowledge and to improve the ability to cognize data [8, 9]. AlphaGo, a program built on deep learning, has already reached a level beyond top human players in the game of Go [10].

Manuscript received October 10, 2016; accepted June 22, 2017. Supported by National Natural Science Foundation of China (61672217, 61370097). Recommended by Associate Editor XXX. 1. College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082. 2. Key Laboratory for Embedded and Network Computing of Hunan Province, Changsha 410082.
Fig. 1 Diagram of deep learning cognitive computing
At present, big data deep-learning cognitive computing mainly involves three aspects: deep learning data representation, deep learning cognitive models, and deep learning parallel computing; this paper surveys these three parts in detail, as shown in Fig. 1. The goal of cognitive computing is to realize cognitive behaviors such as classification and understanding from input data through model learning and high-performance computing. In the big data environment, data are massive, multimodal, and heterogeneous, and how to represent data effectively directly affects the construction of cognitive models and the results of cognitive computing, so data representation is the foundation of deep-learning cognitive computing. Compared with traditional shallow networks, deep network models have much stronger learning ability, and a good deep cognitive model yields better cognitive results, so the model is the key. With the continuously growing scale of big data, the complexity and variability of the data, and the complexity of deep models, cognitive algorithms become NP-hard to process; the computing power of a single machine cannot meet the training requirements of deep learning, so high-performance parallel computing becomes the means by which deep-learning cognitive computing is realized.

Thanks to breakthroughs in deep learning algorithms and computing power, deep learning has set off a new wave in the field of cognitive computing. At present, deep learning research institutions at home and abroad, such as the University of Toronto, Stanford University, the University of Montreal, Microsoft Research, Google, IBM Research, and Baidu, have successfully applied deep learning to many fields, including computer vision [11], human action recognition [12], image and speech recognition [13]−[15], audio processing [16, 17], question answering [18], structure-activity analysis [19], natural language processing [20, 21], and the discovery of new drugs and analysis of biochemical molecules [22, 23]. Deep learning uses deep neural networks to learn high-level abstract features; this learning ability resembles the learning process of the human brain and is an important component of cognitive systems, and it has already made breakthrough progress. However, research on cognitive computing with deep learning is still at an early stage, and deep learning will have a profound influence on cognitive computing. In view of the academic significance and application value of deep-learning cognitive computing in the big data environment, this paper surveys its research progress from three aspects: data representation, deep learning models and their optimization, and deep learning parallel computing.
1 Deep Learning Data Representation

Multimodality is an important characteristic of big data: data containing massive heterogeneous samples with different distributions and complex relations can no longer be treated as a single modality. Moreover, high-dimensional data are common across modalities; under traditional shallow representations such data cannot model highly nonlinear distributions and easily lose useful high-order structural information. On the other hand, the appeal of deep learning lies precisely in its learning ability. In 1962, Rosenblatt presented the famous learning theorem: an artificial neural network can learn anything that it can represent [24]. From this theorem, the representational power of an artificial neural network largely determines its learning ability. Therefore, how to represent various types of data effectively has become an important topic of big data deep-learning cognitive computing research [25]. From the perspective of data characteristics, data representation divides into single-modal and multimodal representation; from the perspective of storage form, data can be represented as vectors, matrices, and tensors. This section mainly introduces multimodal data representation and tensor data representation.
1.1 Multimodal Data Representation

A key to processing multimodal big data is data fusion [26, 27]. Addressing the multimodal characteristic of big data, Ngiam et al. [28] proposed a multimodal data representation method based on deep learning. The method describes a deep autoencoder architecture for cross-modal feature learning: two modalities are used as input, representations are learned over pooled audio and video data, and the deep autoencoder structure captures the shared "mid-level" feature representation between the two modalities. Salakhutdinov et al. [29] proposed the probability-based multimodal DBM (Deep Boltzmann machine), which first defines a joint probability density over the multimodal input space; the DBM models the data of each modality with stacked RBMs (Restricted Boltzmann machines), and the states of the latent variables defined over them serve as the representation, learning a unified feature representation of multimodal data. This probabilistic formulation can treat missing modality information as values to be sampled from the conditional distribution; in the feature learning stage, a multimodal deep model is more likely than a shallow one to learn useful cross-modal features.

A good multimodal learning model should satisfy: (1) similarity in the representation space implies similarity of the corresponding "concepts"; (2) the joint representation should be easy to obtain even when some modalities are absent, that is, filling in a missing modality given the observed ones should be possible. Multimodal deep learning models have advantages in learning single-modal representations from unlabeled data and in learning a unified representation across modalities. In the big data environment, heterogeneous inputs contain multiple modalities, and each modality has a different representation and correlational structure. For example, text is usually represented as discrete sparse word-count vectors, whereas an image is represented by the outputs of real-valued, dense pixel intensities or feature detectors. Such multimodal representation makes it much harder to discover the highly nonlinear relations across low-level features of different modalities than to discover relations among features of the same modality, which calls for methods that build unified data representations.
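The two-encoders-then-joint-layer idea attributed to Ngiam et al. [28] can be sketched as follows. This is a heavily simplified illustration with made-up sizes and single-layer encoders, not the paper's architecture: each modality passes through its own encoder, and a shared layer over the concatenated codes produces the joint "mid-level" representation.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(6)
audio = rng.normal(size=20)            # modality 1 (illustrative size)
video = rng.normal(size=50)            # modality 2

Wa = rng.normal(size=(8, 20)) * 0.1    # audio-specific encoder weights
Wv = rng.normal(size=(8, 50)) * 0.1    # video-specific encoder weights
Wj = rng.normal(size=(6, 16)) * 0.1    # joint layer over both codes

ha = sigmoid(Wa @ audio)               # per-modality codes
hv = sigmoid(Wv @ video)
joint = sigmoid(Wj @ np.concatenate([ha, hv]))  # shared representation
print(joint.shape)  # (6,)
```

In a real system the encoders would be deep and trained jointly by reconstruction, but the data flow is the same: modality-specific layers first, fusion afterwards.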
1.2 Tensor-Based Data Representation

A multimodal deep learning model learns features from each modality independently before merging the single-modal representations into a unified one; it can therefore miss the nonlinear relations across the heterogeneous input data spaces and fail to fully learn the useful features in the data. Tensor-based data representation differs from multimodal representation: it represents various heterogeneous data in a unified form and can build a unified data representation that models the high-order relations in the data [30]−[34]. Kuang et al. [31] proposed a tensor-based big data representation method and its processing framework. The processing framework contains five modules: data collection, data tensorization, data dimensionality reduction, data analysis, and data output, as shown in Fig. 2. The data collection module gathers various types of raw data from different domains, such as audio, video, XML files, and GPS data. These data are usually structured, semi-structured, or unstructured, with no unified format, and tensor-based representation provides a new idea for building unified data representation models. The tensor-based representation method first generates sub-tensors of different orders according to the original formats of the data, and then merges the sub-tensors into a unified tensor representation model.
Fig. 2 Tensor-based data representation and processing framework
(1) Tensor-based data model

In general, time, space, and the data sender or receiver are basic attributes of data; therefore, a tensor-based data model is defined as:

T ∈ R^{I_t × I_s × I_u × I_1 × ··· × I_p}   (1)

The (p + 3)-order tensor in (1) contains two parts: the base part R^{I_t × I_s × I_u} and the extension part R^{I_1 × ··· × I_p}, where the orders I_t, I_s, and I_u denote time, space, and user, respectively. In the tensor model, each order represents one data attribute; heterogeneous data correspond to the multiple attributes represented by the tensor orders, and the tensor extension operation keeps the attributes of heterogeneous data independent of the base part of the tensor model.

(2) Tensor extension operation

Suppose A ∈ R^{I_t × I_s × I_u × I_1} and B ∈ R^{I_t × I_s × I_u × I_2}; then the tensor extension operation is:

f : A ×→ B → C, C ∈ R^{I_t × I_s × I_u × I_1 × I_2}   (2)

Equation (2) shows that heterogeneous data are first represented as low-order sub-tensors, which are then extended and merged into a unified high-order tensor. When the same attribute exists in both operands, the extension operation ×→ merges the identical orders. For example, suppose sub-tensors T_sub1 and T_sub2 have time orders I_{t−1} and I_{t−2}, where I_{t−1} ∈ {i_1, i_2} and I_{t−2} ∈ {i_1, i_3}; then I_{t−1} ×→ I_{t−2} ∈ {i_1, i_2, i_3}.

(3) Unified tensor representation model

For unstructured data d_u, semi-structured data d_semi, and structured data d_s, unified data tensorization is realized by:

f : (d_u ∪ d_semi ∪ d_s) → T_u ∪ T_semi ∪ T_s   (3)

Each order of the tensor represents one component, describing either the extension part or the base part. In the simplified tensor storage model, the internal model is embedded into the three-order I_t × I_s × I_u carrier model: through tensorization, heterogeneous data are modeled as sub-tensors, which are then joined into the unified tensor of the storage model.
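A hedged sketch of the extension operation in Eq. (2): two sub-tensors share the base orders (time, space, user) and are merged into one higher-order tensor. Realizing the merge as an outer product over the non-shared orders is our assumption for illustration; the text only fixes the shape of the result.

```python
import numpy as np

# A in R^{It x Is x Iu x I1}, B in R^{It x Is x Iu x I2}; the extension
# produces C in R^{It x Is x Iu x I1 x I2} (Eq. (2)). Sizes are illustrative.
It, Is, Iu, I1, I2 = 2, 3, 4, 5, 6
rng = np.random.default_rng(1)
A = rng.normal(size=(It, Is, Iu, I1))
B = rng.normal(size=(It, Is, Iu, I2))

# Shared base orders t, s, u are aligned; the extension orders a and b of the
# two operands are concatenated, here via an element-wise outer product.
C = np.einsum('tsua,tsub->tsuab', A, B)
print(C.shape)  # (2, 3, 4, 5, 6): the unified higher-order tensor
```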
In big data applications, data are sparse and uncertain, and storing a full high-order tensor is practically infeasible; what matters is extracting the value in the data. Big data also change frequently, so efficient tensor computation methods, such as incremental tensor computation, are needed to process distributed, stored, nonlinear tensors. Decomposing or reducing the dimensionality of tensor data representations is important content for improving the cognition of big data [35, 36]. Li et al. [35] used the MapReduce framework to propose MR-T, a scalable distributed algorithm for Tucker tensor decomposition. The algorithm exploits the sparsity of big data sets to optimize intermediate data, avoids materializing dense matrix products, and alleviates the dimensionality explosion that Tucker decomposition suffers when processing high-order tensors. Zhou et al. [36] proposed FFCP, a fast algorithm based on tensor CP decomposition; it requires the tensor data to be very sparse and easily falls into local convergence. The high complexity and slow convergence of tensor decomposition algorithms remain obstacles to applying them to big data analysis; therefore, building unified and efficient high-order tensor decomposition algorithms for the big data environment is an important research direction.
2 Deep Learning Cognitive Models

This section first introduces the basic ideas of deep learning and typical deep learning models, then presents tensor-based deep learning models, and finally compares and analyzes optimization methods for deep learning models.
2.1 Deep Learning

The concept of deep learning originated from research on artificial neural networks, and multilayer neural network models were proposed as early as the 1980s and 1990s [37]−[39]. In 2006, Geoffrey Hinton et al. proposed in Science the greedy layer-wise training algorithm based on the deep belief network (DBN), realizing the breakthrough of deep learning in machine learning research [8]. The core of deep learning is the stacking of simple modules into many layers: the learning process is composed of many simple nonlinear modules [6]. The deep learning cognitive model is shown in Fig. 3, where L_0 is the input layer, L_n the output layer, adjacent layers are fully connected, L_1 ∼ L_{n−1} are the hidden layers, and the hidden and input/output layers are connected through nonlinear transformation functions.
Fig. 3 The deep learning cognitive model
A DNN is a feed-forward neural network with multiple hidden layers; its computation involves forward propagation and backward propagation. Starting from the input layer, the forward propagation of each unit j in layer l is:

y_j^(l) = f(z_j^(l)), z_j^(l) = Σ_i w_ij^(l) x_i^(l) + b_j^(l)

where i ranges over all units of layer l, 0 < l < n, x_i^(l) is the input of layer l, and b_j^(l) is the bias of unit j in layer l. Generally, the output of a lower layer is the input of the next higher layer, i.e., x_i^(l+1) = y_i^(l). Backward propagation applies the chain rule, propagating error gradients from the higher output units back to the lower input units; the gradient of an output unit is obtained by differentiating the objective function. Suppose the objective function of an output unit is J = 0.5(y^(l) − t^(l))^2, where t^(l) is the expected output; then:

∂J/∂z^(l) = (∂J/∂y^(l)) · (∂y^(l)/∂z^(l)) = (y^(l) − t^(l)) f′(z^(l))

Propagating from the output units back to the previous hidden layer, the error gradient of each unit i in layer (l − 1) is computed as:

∂J/∂y_i^(l−1) = Σ_j (∂z_j^(l)/∂y_i^(l−1)) · (∂J/∂z_j^(l)) = Σ_j w_ij^(l) · ∂J/∂z_j^(l)

where j ranges over all units of the layer above. By analogy, the error gradients are propagated back to the input layer. DNNs are usually trained with stochastic gradient descent (SGD). To minimize the objective function J, the parameters w_ij and b_j are initialized to small random values and updated iteratively until convergence. In each iteration, SGD updates the parameters w and b as:

w_ij^(l) = w_ij^(l) − α ∂J/∂w_ij^(l), b_j^(l) = b_j^(l) − α ∂J/∂b_j^(l)

where α is the learning rate.
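The forward pass, error gradient, and SGD update above can be sketched for a single sigmoid layer. This is a minimal illustration, not the paper's code; shapes, the learning rate, and the 100-step loop are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W, b):
    z = W @ x + b          # z_j = sum_i w_ij * x_i + b_j
    return z, sigmoid(z)   # y_j = f(z_j)

def sgd_step(x, t, W, b, alpha=0.5):
    z, y = forward(x, W, b)
    # dJ/dz = (y - t) * f'(z), with f'(z) = y * (1 - y) for the sigmoid
    delta = (y - t) * y * (1.0 - y)
    W -= alpha * np.outer(delta, x)   # dJ/dw_ij = delta_j * x_i
    b -= alpha * delta                # dJ/db_j = delta_j
    return W, b

rng = np.random.default_rng(0)
x = rng.normal(size=3)
t = np.array([0.2, 0.8])              # expected output t
W, b = rng.normal(size=(2, 3)), np.zeros(2)

loss_before = 0.5 * np.sum((forward(x, W, b)[1] - t) ** 2)
for _ in range(100):
    W, b = sgd_step(x, t, W, b)
loss_after = 0.5 * np.sum((forward(x, W, b)[1] - t) ** 2)
print(loss_after < loss_before)       # the update rule reduces J here
```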
Each module of a deep learning model transforms a lower-level representation into a higher-level one, and stacking enough such transforms allows very complex functions to be learned. According to their structures, deep learning models include the deep neural network (DNN) [6], the deep belief network (DBN) [8], the stacked auto-encoder (SAE) [40, 41], the deep Boltzmann machine (DBM) [42], the convolutional neural network (CNN) [14], and the recurrent neural network (RNN). For deep-learning-based cognition, high-level representations can strengthen the discriminative aspects of the input and suppress irrelevant variation. For example, take an image represented as an array of pixel values: the first layer describes whether edges are present at particular locations and orientations of the image, the second layer detects motifs composed of edges, the third layer assembles motifs into parts of objects, forming larger blocks, and further layers assemble the parts into whole objects, constructing the detection target [19]. The features of deep learning need not be designed by hand: the model acquires abstract features of the input automatically through training, and each module's transformation of its input increases both the selectivity and the invariance of the representation [43]−[45].

Depending on whether the training data are labeled, deep learning is divided into supervised and unsupervised deep learning. Supervised deep models are usually effective to train and precise to build, and suit convex learning for complex systems, such as DNNs and CNNs. Unsupervised deep models are easier to interpret, easier to embed domain knowledge into, and handle uncertainty more readily, but are hard to train by inference for complex systems, such as DBNs, DBMs, RNNs, and SAEs. A DBN can make effective use of unlabeled data: viewed as a generative probabilistic model and optimized by generative pre-training of the model weights, it effectively alleviates the over-fitting and under-fitting problems in cognitive computing, whose severity grows nonlinearly with model capacity and depth. Using pre-trained DBN weights to initialize the model often gives better results than random initialization.
2.2 Convolutional Neural Networks

The convolutional neural network exploits the local properties of images for training: it is a locally connected network whose hidden layers slide convolution kernels over the input, and its core ideas are local connectivity, weight sharing, and temporal or spatial sampling. A convolutional neural network is generally stacked from a data input layer, convolution layers, nonlinear transforms, pooling (down-sampling) layers, and fully connected layers; the classic deep convolutional architecture LeNet is shown in Fig. 4. Convolution layers detect local conjunctions of features from the previous layer, and pooling layers merge semantically similar features. Convolution, nonlinearity, and pooling are stacked stage by stage, followed by further convolution and fully connected layers; gradients are back-propagated through the whole network, allowing all the weights of the convolution kernels to be trained.
Fig. 4 Deep convolution network model of LeNet
CNN training is similar to DNN training, and the concrete process contains two stages: forward propagation and backward propagation. In the first stage, a sample (X, t^(l)) is taken from the sample set and X is fed into the network; after several rounds of convolution, nonlinear transformation, pooling computation, and fully connected computation, the actual output y^(l) is obtained. In the second stage, the difference between the actual output y^(l) and the corresponding expected output t^(l) is first computed; then the weights are adjusted by back-propagating the error with a minimization method [34]. The units of a convolution layer are organized in feature maps; in each feature map, a convolution kernel containing a weight set (also called a filter) connects each unit to local patches of the feature maps of the previous layer, and the local weighted sums are then passed through a nonlinearity such as ReLU. Suppose the input feature map is X and the convolution kernel is K[m, n]; then the output feature map y is computed by the convolution:

y_{i,j} = (X ∗ K)_{i,j} = Σ_m Σ_n X_{i+m,j+n} K_{m,n}   (4)

where ∗ denotes the discrete convolution operation. Pooling is the down-sampling of the image. CNN pooling methods include Max pooling, Mean pooling, and Overlapping pooling, among which the classic one is max pooling. Taking Fig. 4 as an example, the input is a 32×32 handwritten digit image, and the output is the classification result, i.e., a digit between 0 and 9. C1 is a convolution layer: choosing six 5×5 kernels with stride 1 produces six feature maps, each of size 32−5+1=28. S2 is a down-sampling layer using max pooling with pooling size 2×2, which amounts to partitioning each 28×28 map of C1 into 2×2 blocks and taking the maximum of each block, giving six 14×14 feature maps. C3 is a convolution layer that again uses 5×5 kernels with stride 1, producing new feature maps of size 10×10; each of these 16 maps is obtained by convolving combinations of the six S2 maps with kernels and summing with weights. S4, like S2, applies 2×2 max pooling, giving 16 feature maps of size 5×5. C5 continues with 5×5 kernels, yielding 1×1 feature maps, i.e., single neurons; 120 kernels give 120 feature maps corresponding to 120 neurons. After this neuron layer, a fully connected DNN performs the final processing.

A convolutional network uses the same weights at different positions, detecting the same pattern at different positions of an image; convolving a feature map with one kernel greatly reduces the number of parameters and simplifies training. All the results within one feature map use the same kernel; however, one kernel can extract only one kind of feature, so extracting multiple features requires multiple kernels, and different feature maps in the same layer use different kernels. A max pooling unit takes its input from a block, and these blocks change the pooled description little under small translations or deformations of the input, giving invariance of the representation to shifts and deformations. Images can be fed in directly as input, avoiding the complex feature extraction and data reconstruction of traditional recognition algorithms; the weight-sharing network structure resembles a biological neural network, reducing the number of weights and the complexity of the network model. Convolutional networks can handle complex environments and problems with unclear inference rules; they have good fault tolerance, parallel processing ability, and self-learning ability, and the structure is highly invariant to translation, scaling, tilting, and other deformations. At present, convolutional networks have been widely applied in fields such as speech, image, face, and handwriting recognition.
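The convolution of Eq. (4) and the LeNet shape arithmetic quoted above (32×32 input, 5×5 kernel giving 32−5+1=28, then 2×2 max pooling giving 14×14) can be checked with a short sketch. The loop-based "valid" convolution below follows the index form of Eq. (4) directly (cross-correlation style); it is an illustration, not an efficient implementation.

```python
import numpy as np

def conv2d_valid(X, K):
    """y[i, j] = sum_m sum_n X[i+m, j+n] * K[m, n]  (Eq. (4), valid range)."""
    H, W = X.shape
    m, n = K.shape
    out = np.empty((H - m + 1, W - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(X[i:i + m, j:j + n] * K)
    return out

def max_pool2x2(X):
    """Partition into 2x2 blocks and keep the maximum of each block."""
    H, W = X.shape
    return X.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

X = np.arange(32 * 32, dtype=float).reshape(32, 32)  # stand-in 32x32 input
K = np.ones((5, 5))                                  # stand-in 5x5 kernel

c1 = conv2d_valid(X, K)   # C1: 32 - 5 + 1 = 28
s2 = max_pool2x2(c1)      # S2: 2x2 max pooling -> 14x14
print(c1.shape, s2.shape)
```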
2.3 Stacked Auto-encoders

The stacked auto-encoder is a typical unsupervised deep neural network whose output vector has the same dimensionality as its input vector. By learning the model parameters layer by layer and minimizing the reconstruction error, each layer outputs a different representation of the input, and these representations are the features. The main module of the SAE is the auto-encoder, which consists of an encoder and a decoder, as shown in Fig. 5. The auto-encoder first maps the input x to the hidden layer through the encoder, giving the representation y; then y is mapped back through the decoder, giving the reconstruction z, so that the output of the encoder-decoder pair approximates the input. The encoder function has the form:

y = f(x) = s_f(Wx + b)   (5)

where s_f is a nonlinear activation function, usually the sigmoid function s_f(x) = 1/(1 + e^{−x}), b is the encoder bias vector, and W is the weight matrix. The decoder function z has the form:

z = g(y) = s_g(W′y + b′)   (6)

where s_g is the decoding activation function, usually the identity function or the sigmoid function, b′ is the decoder bias vector, and W′ is the weight matrix. The auto-encoder trains the parameters {W, b, W′, b′} by minimizing the objective function J_AE(θ) = Σ_{x∈D} L(x, g(f(x))), where L is the reconstruction error, such as the squared error.
Fig. 5 The structure of the auto-encoder
To prevent over-fitting, a weight-decay regularization term is added to the reconstruction error: J_{AE+wd}(θ) = Σ_{x∈D} L(x, g(f(x))) + λ Σ_{ij} w_{ij}^2, where λ is a hyper-parameter controlling the regularization strength. An SAE with such a sparsity-inducing regularizer is called a sparse auto-encoder (SPAE) [46]. The SPAE is trained by a convex method, is linear in both time and space complexity, and suits learning on nonlinear big data. The sparsity constraint drives most activations toward zero and, like the "dropout" method, prevents over-fitting; such perturbation can be introduced in the input data or in the hidden layers. For example, Kuang et al. [47] describe the denoising auto-encoder (DAE), which adds random noise to the input data. Alain and Bengio interpret the decoding process of any parameterization trained with reconstruction error and Gaussian corruption [48]: as the amount of noise approaches zero, the model can accurately estimate the distribution of the generating data.
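Eqs. (5) and (6) with the weight-decay objective can be written out directly. This is a minimal sketch with illustrative shapes and an assumed λ, using a sigmoid for both s_f and s_g and the squared error for L.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def autoencoder_loss(x, W, b, W2, b2, lam=1e-3):
    y = sigmoid(W @ x + b)        # encoder: y = s_f(W x + b)      (Eq. 5)
    z = sigmoid(W2 @ y + b2)      # decoder: z = s_g(W' y + b')    (Eq. 6)
    recon = np.sum((x - z) ** 2)  # L(x, g(f(x)))
    decay = lam * (np.sum(W ** 2) + np.sum(W2 ** 2))  # lambda * sum w_ij^2
    return recon + decay

rng = np.random.default_rng(2)
x = rng.uniform(size=8)
W, b = 0.1 * rng.normal(size=(4, 8)), np.zeros(4)    # 8 -> 4 bottleneck
W2, b2 = 0.1 * rng.normal(size=(8, 4)), np.zeros(8)  # 4 -> 8 reconstruction

loss = autoencoder_loss(x, W, b, W2, b2)
print(loss > 0)
```

Training would minimize this loss over a data set D by gradient descent, exactly as in the DNN case.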
2.4 Recurrent Neural Networks

In a traditional neural network the inputs and outputs are mutually independent, but in tasks with sequential outputs the features of the preceding context are relevant, which motivated recurrent neural networks built around the concept of "memory" [49]. An RNN is an artificial neural network formed by directed connections between the hidden nodes of a deep model. The RNN is a very powerful dynamical system: the outputs of the hidden layer can be kept in a state memory as part of the input at the next step; however, the back-propagated gradients either grow or shrink at every time step, so over many steps they explode or vanish. Unrolled in time, an RNN can be viewed as a very deep feed-forward network, as shown in Fig. 6. Denote the hidden units collectively by the node s; at time t, the state s_t receives input from the current step and from the previous hidden state. In this way, a recurrent network maps an input sequence x_t to an output sequence o_t, with each o_t depending on all the preceding x_{t′} (t′ ≤ t). At every step, the RNN uses the same parameters U, V, W.
Fig. 6 The structure of RNN
Visin et al. [50] proposed the ReNet architecture, which uses four RNNs to replace the convolution layers of a CNN, sweeping across the feature maps in four directions: bottom-up, top-down, left-to-right, and right-to-left. Compared with a CNN, the recurrent structure in the RNN makes the activation at each position of a feature map depend on the whole image rather than on a local patch. During model training, the current output may depend not only on past features but also on future ones. Since a plain RNN uses only past context of the sequence, Weninger et al. proposed the bidirectional RNN (BRNN) model [51]. A BRNN has a two-layer structure at each step/time point; the forward and backward passes over each training sequence are two RNNs connected to the same output layer, so this structure provides the output layer with complete past and future context at every point of the sequence.

An RNN can exploit the context in long sequences, but its memory capacity is limited, and information accumulated over long chains of connections is easily diluted. Hakkani-Tür et al. proposed the LSTM (long short-term memory) RNN method [52]. The LSTM uses "gates" to let information pass selectively, addressing the problem of long-range dependencies in the information. At present the LSTM has several variants: adding "peephole connections" so that the gate layers also receive the cell state as input, and coupling the forget and input gates so that the decisions of what to discard and what to store are made jointly [52, 53].
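The recurrence described above (state s_t combines the current input and the previous state; the same U, V, W are reused at every step; o_t depends on all x_{t′} with t′ ≤ t) can be sketched in a few lines. Dimensions and the tanh nonlinearity are illustrative choices.

```python
import numpy as np

def rnn_forward(xs, U, V, W):
    """s_t = tanh(U x_t + W s_{t-1}); o_t = V s_t, with shared U, V, W."""
    s = np.zeros(U.shape[0])         # initial hidden state s_0
    outputs = []
    for x in xs:                     # o_t depends on all x_{t'}, t' <= t
        s = np.tanh(U @ x + W @ s)   # the same U, W at every time step
        outputs.append(V @ s)        # the same V at every time step
    return np.array(outputs)

rng = np.random.default_rng(3)
U = rng.normal(size=(4, 3)) * 0.5    # input-to-hidden
W = rng.normal(size=(4, 4)) * 0.5    # hidden-to-hidden (the "memory" path)
V = rng.normal(size=(2, 4)) * 0.5    # hidden-to-output
xs = rng.normal(size=(5, 3))         # a length-5 input sequence

outs = rnn_forward(xs, U, V, W)
print(outs.shape)  # (5, 2): one output per time step
```

Back-propagation through this unrolled loop multiplies by the hidden-to-hidden Jacobian once per step, which is exactly why the gradients grow or shrink exponentially with sequence length.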
2.5 Tensor-Based Deep Learning Models

Big data contain large amounts of unstructured, semi-structured, and structured data; multimodal data representation and tensor data representation give representation methods for big data cognition, and how to build unified deep learning models is the difficulty of big data cognition. Yu et al. developed the deep tensor neural network (DTNN) on the basis of the DNN [54]. The DTNN uses double projection to replace one or more layers of the DNN: the input vector of such a layer is mapped into two nonlinear low-rank subspaces and a tensor layer, and the high-order projection of the two subspaces jointly predicts the parameters of the next deep learning layer. Deng et al. proposed the tensor deep stacking network (T-DSN) [55, 56]. Each block of the T-DSN contains two independent hidden layers; the two independent hidden layers are combined bilinearly, producing a vector that enumerates all possible products of pairs of hidden units, so that the tensor becomes a matrix. By building the nonlinear mapping between the two hidden layers, higher-order feature interactions are captured.

Based on tensor representation models, Zhang et al. proposed the tensor auto-encoder (TAE) tensor deep learning model [57]. The TAE uses the tensor distance, rather than the Euclidean distance or the sum of average per-dimension squared errors, as the penalty in the reconstruction loss, so that the learned mid-level representation captures as much as possible of the high-order distribution of multimodal data; a high-order back-propagation algorithm (HBP) extended to tensor space trains the TAE parameters, and stacking tensor auto-encoders builds a tensor deep learning model for learning multimodal features of big data. Suppose X is the input tensor and f is the sigmoid function; let z^(2) and z^(3) denote the inputs of the hidden layer and the output layer, and let a^(2)_{j1 j2 ··· jn} (1 ≤ j_i ≤ J_i, 1 ≤ i ≤ n) and z^(3)_{i1 i2 ··· in} (1 ≤ i_j ≤ I_j, 1 ≤ j ≤ n) denote the hidden-layer activations and the output-layer values, respectively. Then the tensor auto-encoder model is expressed as:

z^(2)_{j1 j2 ··· jn} = W^(1)_α · X + b^(1)_{j1 j2 ··· jn},  α = j_n + Σ_{i=1}^{N−1} (j_i − 1) Π_{t=i+1}^{N} J_t   (7)

a^(2)_{j1 j2 ··· jn} = f(z^(2)_{j1 j2 ··· jn})   (8)

z^(3)_{i1 i2 ··· in} = W^(2)_β · a^(2) + b^(2)_{i1 i2 ··· in},  β = i_n + Σ_{j=1}^{N−1} (i_j − 1) Π_{t=j+1}^{N} I_t   (9)

To improve big data feature learning, stacking tensor auto-encoders forms the stacked tensor auto-encoder (STAE) deep learning model [58]. The STAE initializes the tensor deep learning model in the same way as the stacked auto-encoder SAE, using the learned representation of layer k as the input for training layer k + 1, i.e., the first k layers are trained before layer k + 1. After pre-training, the STAE uses the high-level output representation as the input of a supervised learning algorithm and fine-tunes the parameters of the whole model. Tensor-based deep learning models can represent and process nonlinear heterogeneous data and can be used for big data feature learning, but learning nonlinear heterogeneous data requires many more parameters. How to apply efficient tensor decomposition to tensor-based deep cognitive models to reduce model parameters or lower algorithm complexity, and how to accelerate deep-learning-based cognition by parallelizing the cognitive computing of tensor models, deserve further research and exploration.
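The weight indexing in Eq. (7) flattens the multi-index (j_1, ..., j_N), 1 ≤ j_i ≤ J_i, into a single subscript α = j_n + Σ_{i<N} (j_i − 1) Π_{t>i} J_t. On our reading of the formula, this is exactly the 1-based row-major (C-order) flat index, i.e. numpy's `ravel_multi_index` plus one, which the sketch below verifies exhaustively for small dimensions.

```python
import numpy as np

def alpha(j, J):
    """Flatten a 1-based multi-index j over dimensions J per Eq. (7)."""
    N = len(J)
    out = j[-1]
    for i in range(N - 1):
        out += (j[i] - 1) * int(np.prod(J[i + 1:]))
    return out

J = (3, 4, 5)  # illustrative hidden-layer dimensions J_1, J_2, J_3
ok = all(
    alpha((a, b, c), J)
    == np.ravel_multi_index((a - 1, b - 1, c - 1), J) + 1
    for a in range(1, 4) for b in range(1, 5) for c in range(1, 6)
)
print(ok)  # True
```

The same reasoning applies to β in Eq. (9) with the output dimensions.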
2.6 Deep Learning Model Optimization

The use of many hidden layers of neurons not only markedly improves the modeling power of a DNN but also produces many locally optimal solutions. Even when the parameters are learned with stochastic gradient descent (SGD), the learning process can fall into local optima; however, because the probability of a poor local optimum is lower than in models with few neurons, a trained DNN is usually still quite good in practice. DNN learning is a highly non-convex optimization problem; since optimization starts from an initial model, a good parameter initialization technique helps reach a good final model. In the big data environment, researchers work on model optimization while building deep learning models, mainly including nonlinear transforms [59, 60], weight initialization [61]−[63], gradient descent [64, 65], and performance optimization [66, 67].

Nonlinear transforms. Sigmoid and tanh are the nonlinear units commonly used in DNNs, but when a unit operates near the two flat ends of these saturating transforms, the gradients become very small and network learning becomes very slow. Jaitly and Hinton [68] noted this weakness of the sigmoid unit and proposed the rectified linear unit (ReLU), which zeroes all negative pre-activations so that only part of the hidden units are active; this method can alleviate over-fitting and obtain good results [44]. Networks with ReLU nonlinearities have been shown to train faster than standard sigmoid networks, and Dahl and Maas successfully applied them to large-vocabulary speech recognition [69, 70]. Salakhutdinov et al. proposed the Path-SGD method for optimizing weights in ReLU networks [61]. The method preserves the rescaling invariance of the network's geometry: it optimizes the weights with a rescaling-invariant path-regularized geometry without affecting the network output. Combining the geometric properties of the optimizer with adaptivity and momentum, and performing approximately second-order training on nonlinear networks, would yield better results. Miao et al. [71] proposed the "max output" method Maxout, which maximizes over a group of given input vectors, similar to max pooling in a CNN. Zhang et al. generalized Maxout units into two classes: the first, soft-maxout, replaces the max operation with the soft-max function; the second, p-norm units, uses the nonlinear transform of the p-norm [72]. Experiments showed that p-norm units with p = 2 perform better than Maxout, tanh, and ReLU units.
Srivastava et al. [73] proposed the dropout procedure, which operates as follows: for each training example, each hidden unit is randomly omitted with a fixed probability (e.g., p = 0.5), so that using the trained DNN resembles averaging over an ensemble. The regularization advantages of dropout are that training a DNN with dropout forces each hidden unit to rely on its own activation rather than depending on other units, and it provides a way of approximately averaging over exponentially many different networks; these advantages are especially evident when training data are limited or the DNN is large relative to the data. Dahl et al. combined the dropout strategy with ReLU units, but applied dropout only in the fully connected layers of the DNN [65]. Deng et al. [74] applied dropout to all layers of a convolutional neural network, including the low-level convolution and pooling layers and the high-level fully connected layers, and found that the dropout rate in a CNN should decrease layer by layer. Dahl et al. trained DNNs with ReLU+dropout, obtaining a 4.2% relative improvement over sigmoid units and a 14.4% improvement over a GMM/HMM system [65]. Although dropout can prevent over-fitting and has good robustness, it adds noise during training and lowers the learning efficiency. How to determine the best dropout schedule, and how dropout behaves when training on nonlinear big data, are questions requiring further study.
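The masking step of dropout as described by Srivastava et al. [73] is easy to sketch: each hidden unit is kept with probability p and zeroed otherwise. The rescaling by 1/p ("inverted" dropout, so the expected activation matches test time) is a standard variant chosen here for brevity, not necessarily the paper's exact formulation.

```python
import numpy as np

def dropout(h, p=0.5, rng=None, train=True):
    if not train:
        return h                      # no masking at test time
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(h.shape) < p    # keep each unit with probability p
    return h * mask / p               # rescale so E[output] equals h

rng = np.random.default_rng(4)
h = np.ones(100000)                   # stand-in hidden activations
d = dropout(h, p=0.5, rng=rng)

print(abs(d.mean() - 1.0) < 0.02)     # expectation is preserved
print(round((d == 0).mean(), 2))      # roughly half the units are dropped
```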
3 Deep Learning Parallel Computing

Parallel computing decomposes a large, complex computational problem into a number of small problems that can be solved concurrently, solves the subproblems with multiple processors in parallel, and finally obtains the solution of the original problem [70]. In the big data environment, the parallelization of deep-learning cognitive computing is imperative. First, deep network models are complex, the training data are massive, and the computation is enormous: a deep neural network needs many neurons to simulate the human brain, with vast numbers of connections between neurons; each neuron involves mathematical computation with many parameters to estimate; and a DNN needs massive data to train a high-accuracy model. Second, deep neural networks need to support large models; for example, increasing the number of convolution kernels and filters and the depth of the model yields better model quality. Third, deep neural network training converges slowly and requires many repeated experiments, and the model structure and model initialization both influence the final cognitive result. Parallel deep-learning cognitive computing will greatly accelerate the cognitive system's discovery, understanding, reasoning, and decision making over data. At present, many deep learning parallelization approaches have emerged, including GPU acceleration, data parallelism and model parallelism, and computing clusters and their parallel applications [75]−[82].
3.1 GPU Acceleration

Large nonlinear computations support deep models but in turn demand huge computational resources. Benefiting from the many-core architecture of GPUs, programs running on GPU systems can run tens to hundreds of times faster than on a single-threaded CPU. In the big data training setting, using GPUs to train deep learning models fully exploits their fine-grained parallel computing ability and greatly shortens the computation time. At present, the implementations of deep learning frameworks concentrate on GPU-based parallelism. In a CUDA GPU system, each GPU has its own high-bandwidth, high-latency-tolerant local memory, and the architecture allows instruction-level and thread-level parallelism. GPU parallelism pays particular attention to data flow: on the one hand, transferring data in blocks reduces the data traffic between RAM and the GPU, uploading as many labeled data as possible at a time and keeping the parameters resident in local memory; on the other hand, parallelism is realized by building data-parallel and learning-update mechanisms [75]. In 2015, Nvidia released its deep learning GPU training system, which integrates GPU-accelerated deep learning software, GPU-accelerated libraries, and the CUDA deep neural network library; it enables faster training of complex models and the design and creation of faster, more accurate deep learning models, providing accelerated and optimized deep learning capability.
3.2 Data Parallelism and Model Parallelism

In terms of training style, deep learning parallelism divides into data parallelism and model parallelism. Data parallelism partitions the training data and trains multiple model replicas concurrently on the partitions. Because data parallelism requires parameter exchange, a parameter server usually coordinates the parameters. Model parallelism splits the model into several parts that are trained by several training units concurrently. When a neuron's input comes from the output of a neuron on another training unit, communication overhead arises. Neither data parallelism nor model parallelism scales without limit. With too many data-parallel training partitions, the learning rate has to be reduced to keep the training process smooth; with too many model-parallel partitions, the volume of neuron outputs exchanged grows sharply and the efficiency drops significantly. Therefore, hybrids that combine the advantages of model parallelism and data parallelism are a common remedy.

Fig. 7 shows a hybrid architecture of data parallelism and model parallelism for CNNs on GPUs. In this architecture, four GPUs are divided into two groups, with model parallelism within each group and data parallelism between the groups. Model parallelism exchanges layer outputs and residuals during computation, while data parallelism exchanges model weights at the end of each mini-batch. CNN data-parallel and model-parallel frameworks have already matured in image recognition applications and are being tried in many other potential applications.
Fig. 7 The hybrid architecture based on data parallelism and model parallelism
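The data-parallel pattern described above can be simulated in a few lines: the training set is partitioned across replicas of the same model, each replica computes a gradient on its shard, and a parameter-server role averages the gradients and applies the shared update. Linear regression stands in for the deep model here purely to keep the sketch short; the "workers" are sequential loop iterations, not real processes.

```python
import numpy as np

def grad(w, X, y):
    """MSE gradient on one data shard."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(5)
X = rng.normal(size=(400, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w = np.zeros(3)
shards = np.array_split(np.arange(400), 4)    # four equal "GPU" shards
for _ in range(200):
    # each worker computes a local gradient; the server averages them
    g = np.mean([grad(w, X[s], y[s]) for s in shards], axis=0)
    w -= 0.1 * g                              # synchronous SGD update

print(np.allclose(w, w_true, atol=1e-3))
```

Because the shards are equal-sized, the averaged gradient equals the full-batch gradient; with unequal shards or asynchronous updates the replicas drift, which is the learning-rate issue mentioned above.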
3.3 Deep Learning Computing Clusters and Their Parallel Applications

Building computing clusters, such as CPU clusters, GPU clusters, or heterogeneous clusters, is a common solution for accelerating deep learning. The advantage of CPU clusters lies in exploiting distributed storage of large models, parameter exchange, and communication, training large models quickly on large-scale distributed computing clusters; CPU cluster solutions suit large models that are hard to fit into GPU memory, as well as sparsely connected neural networks. DistBelief, built by Google, is the first deep learning data-parallel and model-parallel framework on CPU clusters, designed for training and learning in large-model, large-data deep learning distributed systems; it parallelizes the stochastic gradient descent algorithm and supports cognitive computing applications such as speech recognition and image classification [76]. In [76], a large model was partitioned into 144 blocks, markedly improving training speed.

Combined with GPU acceleration technology, building GPU clusters has become an effective solution for accelerating large-scale deep learning training. GPU clusters are built over CPU-GPU heterogeneous systems, and high-speed interconnects of 10 Gbps or more give good communication performance. In the open-source domain, Krizhevsky et al. released Cuda-Convnet, which supported training on a single GPU in 2012 and, in 2014, data-parallel and model-parallel training on multiple GPUs. The Caffe software provided by Jia et al. [77] implements fast convolutional neural networks on CPUs and GPUs, and with an NVIDIA K40 or Titan GPU can train on the order of tens of millions of images per day. In industry, Google's COTS HPC system consists of a cluster of 16 GPU servers, each with four NVIDIA GTX680 GPUs with 4 GB of memory; the GPU servers are connected by Infiniband, with communication handled by MPI [72]. Making efficient use of memory and GPU computation requires carefully designed CUDA kernels. For example, for the matrix product Y = WX, where W is the filter matrix and X the input matrix, Coates et al. [78] exploited matrix sparsity and local receptive fields: by zeroing the entries of W corresponding to neurons outside the same receptive field and multiplying only the relevant features of X, they arranged the computation so that the matrices to be multiplied fit in the limited local memory of the GPU. Facebook and Baidu combine data parallelism and model parallelism to realize multi-GPU deep learning training platforms [79, 80]. In 2014 the Chinese Academy of Sciences proposed a deep-learning processor architecture, and in 2017 presented the machine learning architecture "DaDianNao" to accelerate deep learning CNNs and DNNs [81, 82]. GPU clusters, building on the high computing power of single nodes, further exploit the computing power of the many servers in a cluster and can further accelerate large-scale cognitive computing tasks.
4 Deep Learning Cognitive Computing Applications

Deep-learning-based cognition benefits from the fact that many natural signals are compositional hierarchies in which higher-level features are obtained by composing lower-level ones. For example, in images, local combinations of edges form motifs, motifs assemble into parts, and parts form objects [5]. Similar hierarchies exist in speech and text, from sounds to words to sentences. In 2012, the famous Stanford University professor Andrew Ng and the large-scale computing expert Jeff Dean jointly led the Google Brain project, which used deep learning techniques so that a machine system could learn to recognize cats automatically [83]. They first used massive unlabeled images for unsupervised deep learning, learning basic features that effectively represent a class of objects; on this basis, a large set of labeled training images allowed the object model to be learned quickly. In the 2012 ImageNet competition, deep convolutional networks were applied to a data set of about a million Web images containing 1000 different classes; the system made efficient use of GPUs, ReLU, and the dropout regularization method, and the classification error rate was reduced to 4.94%. In 2015, Microsoft's residual-learning research applied shortcut connections to stabilize the information flow in DNNs, realizing image localization, detection, and classification with a system error rate as low as 3.57% [84]. In 2015, Ren et al. realized, based on deep learning techniques, an application in the machine vision field: combining convolutional neural networks and recurrent neural networks to generate image captions [85]. In speech recognition, models are trained jointly on multiple kinds of speech data [16]; deep neural network models, trained layer by layer, can extract high-level features comparable to those perceived by the human ear and achieve excellent results for target speech recognition.

Combined with industrial solutions, deep-learning-based cognitive computing can mine the high-value parts of big data and realize commercial applications. The deep learning cognition process is usually trained as a black box. After the network is trained, one can peer into the black box and visualize the change of each feature map during the sequential process. How to use the feature changes of the sequential process to speed up learning, reduce computation cost, and serve industrial applications such as CPS remains a difficulty.
5 Development Trends of Deep Learning Cognitive Computing

Benefiting from the emergence of many-core GPU architectures and the development of cloud computing technology, deep learning, with its advantages in accurate large-scale models and data representation, has achieved staged successes in cognitive computing fields such as speech, image, and text in recent years, bringing new ideas for studying and solving practical cognitive computing problems. However, in the face of the nonlinearity and multimodality of data, and the goal of comprehensively improving the cognition of big data, many problems still deserve deep exploration:

(1) Data representation. Deep learning has better nonlinear representation ability than traditional shallow learning, but owing to the complexity of massive heterogeneous data, how to build unified data representations that fully learn the useful features in the data remains a fundamental problem of deep-learning cognitive computing. For the large-scale tensor computation problem, introducing tensor decomposition and transfer learning into deep learning models is an effective way to improve cognition.

(2) Cognitive models. Deep learning cognitive models can generalize well even when the number of parameters exceeds the number of data points. For concrete industrial applications, how to effectively use the basic features obtained in the learning process to design models with stronger generalization ability, whether the combination of deep learning and reinforcement learning will become a breakthrough beyond established high-level deep models, and the opening and visualization of the deep learning black box are expected to push artificial intelligence forward.

(3) Parallel computing. The distributed computation and storage of big data, together with the cognitive computing problems of large-scale tensor-based deep models, make large-scale heterogeneous parallel computing a new difficulty. From traditional single GPUs, to Google's heterogeneous parallelism, to Baidu's parallelism over dozens of GPUs, the difficulty that SGD is hard to parallelize remains unsolved. Moreover, for systems with real-time performance requirements, how to schedule memory and coordinate the computing power of many heterogeneous processors to accelerate cognition in the big data environment makes building high-performance deep-learning cognitive computing highly significant.

(4) Deep learning cognitive computing applications. At present, an application question of deep-learning cognitive computing is whether deep structured neural networks of a given form are the right feature representation for the data. Whether effective machine learning methods endowed with prior knowledge can, when facing new tasks, perform better and train faster than newly developed neural networks deserves study. For example, models based on GANs (generative adversarial networks) and optimal transport theory can be used to explore the black box of deep neural networks and to better understand deep learning, which is both a demand and a goal of cognitive computing. For specific domains, such as vehicular CPS and real-time embedded systems, how to build deep cognitive models with low computation and memory requirements that are easy to understand, while satisfying accuracy requirements at low computation cost, is a challenging research topic.
6 Conclusion

In the big data environment, deep-learning-based cognitive computing has in recent years provided technical support for big data analysis and understanding. Based on the research progress of deep-learning cognitive computing, this paper analyzes the nonlinearity challenges that big data bring to cognitive computing, surveys big data deep-learning cognitive computing from the three aspects of data representation, cognitive models, and parallel computing, together with its applications, and summarizes and gives an outlook on deep-learning-based cognitive computing, hoping to provide new ideas for related researchers and to contribute to cognitive computing.
References
1 Zorzi M, Zanella A, Testolin A. Cognition-based networks: a new perspective on network optimization using learning and distributed intelligence. IEEE Access, 2015, 3: 1512-1530
2 Shen Xiao-Wei. From big data to cognitive computing. Communications of the China Computer Federation, 2014, 10(12): 26-28 (in Chinese)
3 Kelly J. Computing, cognition and the future of knowing [Online], available: http://www.research.ibm.com/software/IBMResearch/multimedia/Computing Cognition WhitePaper.pdf, 2015
4 Yu Kai, Jia Lei, Chen Yu-Qiang, et al. Deep learning: yesterday, today, tomorrow. Journal of Computer Research and Development, 2013, 50(9): 1799-1804 (in Chinese)
5 LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015,521(7553): 436-444
6 Schmidhuber J. Deep learning in neural networks: an overview. Neural Networks, 2015, 61: 85-117
7 Wang F Y, Zhang J J, Zheng X, et al. Where does AlphaGogo: from Church-Turing thesis to AlphaGo thesis and be-yond. Acta Automatica Sinica, 2016, 3(2): 113-120
8 Hinton G E, Salakhutdinov R R. Reducing the dimensional-ity of data with neural networks. Science, 2006, 313(5786):504-507
9 Ludwig L. Toward an integral cognitive theory of mem-ory and technology. Extended Artificial Memory. Kaiser-slautern: Technical University of Kaiserslautern, 2013. 16-32
10 Silver D, Huang A, Maddison C J, et al. Mastering the gameof Go with deep neural networks and tree search. Nature,2016, 529(7587): 484-489
11 Ji S, Xu W, Yang M, et al. 3D convolutional neural networksfor human action recognition. IEEE Transactions on Pat-tern Analysis and Machine Intelligence, 2013, 35(1): 221-231
12 Deng L, Yu D. Deep learning: methods and applications.Signal Processing, 2014, 7 (3-4): 197-387
13 Sun Xu, Li Xiao-Guang, Li Jia-Feng, et al. Review on deep learning based image super-resolution restoration algorithms. Acta Automatica Sinica, 2017, 43(5): 697-709 (in Chinese)
14 Krizhevsky A, Sutskever I, Hinton G E. ImageNet classifica-tion with deep convolutional neural networks. In: Proceed-ings of advances in Neural Information Processing Systems.Lake Tahoe, Nevada, US, 2012. 1097-1105
15 Tompson J, Jain A, LeCun Y, et al. Joint training of a con-volutional network and a graphical model for human poseestimation. In: Proceedings of Advances in Neural Informa-tion Processing Systems. Montreal, Quebec, Canada, 2014.1799-1807
16 Yu D, Deng L. Automatic Speech Recognition: A DeepLearning Approach. New York: Springer, 2015. 58-80
17 Hinton G E, Deng L, Yu D, et al. Deep neural networksfor acoustic modeling in speech recognition. IEEE SignalProcessing Magazine, 2012, 29 (6): 82-97
18 Bordes A, Chopra S, Weston J. Question answering with subgraph embeddings [Online], available: http://arxiv.org/abs/1406.3676v3, September 4, 2014
19 Ma J, Sheridan R P, Liaw A, et al. Deep neural nets as a method for quantitative structure-activity relationships. Journal of Chemical Information and Modeling, 2015, 55(2): 263-274
20 He Y, Xiang S, Kang C, et al. Cross-modal retrieval viadeep and bidirectional representation learning. IEEE Trans-actions on Multimedia, 2016, 18(7): 1363-1377
21 Jean S, Cho K, Memisevic R, Bengio Y. On using very largetarget vocabulary for neural machine translation [Online],available: https://arxiv.org/abs/1412.2007, March 18, 2015
22 Chen Y, Li Y, Narayan R, et al. Gene expression inferencewith deep learning. Bioinformatics, 2016, 32(12): 1832-1839
23 Leung M K K, Xiong H Y, Lee L J, et al. Deep learningof the tissue regulated splicing code. Bioinformatics, 2014,30(12): i121-i129
24 Xiong H Y, Alipanahi B, Lee L J, et al. The human splicingcode reveals new insights into the genetic determinants ofdisease. Science, 2015, 347(6218):1254806
25 Bengio Y, Courville A, Vincent P. Representation learning:A review and new perspectives. IEEE Transactions on Pat-tern Analysis and Machine Intelligence, 2013, 35(8): 1798-1828
26 Zheng Y, Capra L, Wolfson O, et al. Urban computing: con-cepts, methodologies, and applications. ACM Transactionson Intelligent Systems and Technology, 2014, 5(3): 1-18
27 Zheng Y. Methodologies for cross-domain data fusion: anoverview. IEEE Transactions on Big Data, 2015, 1(1): 16-34
28 Ngiam J, Khosla A, Kim M, et al. Multimodal deep learning.In: Proceedings of machine learning. Bellevue, Washington,USA, 2011. 689-696
29 Srivastava N, Salakhutdinov R R. Multimodal learning withdeep boltzmann machines. In: Proceedings of Advancesin Neural Information Processing Systems. Lake Tahoe,Nevada, US, 2012. 2222-2230
30 Cichocki A. Era of big data processing: a new approach viatensor networks and tensor decompositions [Online], avail-able: https://arxiv.org/abs/1403.2048, March 9, 2014
31 Kuang L, Hao F, Yang L T, et al. A Tensor-Based Ap-proach for Big Data Representation and Dimensionality Re-duction. IEEE Transactions on Emerging Topics in Com-puting, 2014, 2(3): 280-291
32 Zhang Q, Yang L T, Chen Z. Deep computation model for unsupervised feature learning on big data. IEEE Transactions on Service Computing, 2014, 1-11
33 Hutchinson B, Deng L, Yu D. Tensor deep stacking net-works. IEEE Transactions on Pattern Analysis and Ma-chine Intelligence, 2013, 35(8): 1944-1957
34 Deng L, Yu D. Deep Learning for Signal and InformationProcessing. Delft: Microsoft Research Monograph, 2013. 29-48
35 Li L, Boulware D. High-order tensor decomposition forlarge-scale data analysis. In: Proceedings of Big Data, NewYork, USA, 2015. 665-668
36 Zhou G, Cichocki A, Xie S. Decomposition of big tensors with low multilinear rank [Online], available: arXiv:1412.1885, December 29, 2014
37 Jang J-S R. ANFIS: adaptive-network-based fuzzy inference system. IEEE Transactions on Systems, Man, and Cybernetics, 1993, (23): 665-685
38 Wang F Y, Kim H. Implementing adaptive fuzzy logic con-trollers with neural networks: a design paradigm. Journalof Intelligent & Fuzzy Systems, 1995, 3(2): 165-180
39 Wang G Y, Shi H B. Three valued logic neural network. In:Proceedings of International Conference on Neural Informa-tion Processing, Hong Kong, 1996. 1112-1115
40 Rosca M, Lakshminarayanan B, Warde-Farley D,et al. Variational approaches for auto-encodinggenerative adversarial networks [online], available:https://arxiv.org/abs/1706.04987, June 15, 2017
41 Lv Y, Duan Y, Kang W, Li Z, Wang F. Traffic flow predic-tion with big data: a deep learning approach. IEEE Trans-actions on intelligent transportation systems, 2015, 16(2):865-873
42 Goodfellow I, Courville A, Bengio Y. Scaling up spike-and-slab models for unsupervised feature learning. IEEE Trans-actions on pattern analysis and machine intelligence, 2013,35(8): 1902-1914
43 Chang C H. Deep and shallow architecture of multilayerneural networks. IEEE transactions on neural networks andlearning system, 2015, 26(10): 2477-2486
44 Xie Kun, Wang Xin, Liu Xue-Li, et al. Interference-awarecooperative communication in multi-radio multi-channelwireless networks. IEEE Transactions on Computers, 2016,65(5): 1528-1542
45 Salakhutdinov R. Learning deep generative models [Ph. D.dissertation], University of Toronto, 2009
46 Wu X D, Yu K, Ding W, et al. Online feature selection withstreaming features. IEEE Transactions on Pattern Analysisand Machine Intelligence, 2013, 35(5): 1178-1192
47 Kuang L W, Zhang Y X, Yang L T, et al. A holistic ap-proach to distributed dimensionality reduction of big data.IEEE Transactions on Cloud Computing, 2015, PP(99): 1-14
48 Bengio Y, Yao L, Alain G, et al. Generalized denoising auto-encoders as generative models. In: Proceedings of Advancesin Neural Information Processing Systems. Lake Tahoe,Nevada, USA, 2013. 899-907
49 Zhu Yu, Zhao Jiang-Kun, Wang Yi-Ning, et al. A review of human recognition based on deep learning. Acta Automatica Sinica, 2016, 42(6): 848-857 (in Chinese)
50 Visin F, Kastner K, Cho K, et al. Renet: a recurrent neu-ral network based alternative to convolutional networks [on-line], available: https://arxiv.org/abs/1505.00393, July 23,2015
51 Weninger F, Bergmann J, Schuller B W. Introducing CUR-RENNT: the munich open-source CUDA recurrent neuralnetwork toolkit. Journal of Machine Learning Research,2015, 16(3): 547-551
52 Wang F, Tax D M J. Survey on the attention based RNNmodel and its applications in computer vision [online], avail-able: https://arxiv.org/abs/1601.06823, January 25, 2016
53 Duan Yan-Jie, Lv Yi-Sheng, Zhang Jie, et al. Deep learning for control: the state of the art and prospects. Acta Automatica Sinica, 2016, 42(5): 643-654 (in Chinese)
54 Yu D, Deng L, Seide F. The deep tensor neural network withapplications to large vocabulary speech recognition. IEEETransactions on Audio, Speech, and Language Processing,2013, 21(2): 388-396
55 Hutchinson B, Deng L, Yu D. Tensor deep stacking net-works. IEEE Transactions on Pattern Analysis and Ma-chine Intelligence, 2013, 35(8): 1944-1957
56 Deng L, Yu D. Deep learning: methods and applications.Foundations and Trends in Signal Processing, 2014, 7(3-4):197-387
57 Zhang Q, Yang L T, Chen Z. Deep computation model forunsupervised feature learning on big data. IEEE Transac-tions on service computing, 2016, 9(1): 161-171
58 Hutchinson B, Deng L, Yu D. Tensor deep stacking net-works. IEEE Transactions on Pattern Analysis and Ma-chine Intelligence, 2013, 35(8): 1944-1957
59 Dahl G E, Sainath T N, Hinton G E. Improving deep neuralnetworks for LVCSR using rectified linear units and dropout.In: Proceedings of Acoustics, Speech, and Signal Processing.Vancouver, Canada, 2013. 8609-8613
60 Chandra B, Sharma R K. Fast learning in deep neural net-works. Neurocomputing, 2016, 171: 1205-1215
61 Neyshabur B, Salakhutdinov R R, Srebro N. Path-SGD:Path-normalized optimization in deep neural networks. In:Proceedings of Advances in Neural Information ProcessingSystems. Montreal, Canada, 2015. 2422-2430
62 Burda Y, Grosse R, Salakhutdinov R. Impor-tance weighted autoencoders [Online], available:https://arxiv.org/abs/1509.00519, November 7, 2016
63 Le Q V, Jaitly N, Hinton G E. A simple way to initialize re-current networks of rectified linear units [Online], available:https://arxiv.org/abs/1504.00941, April 7, 2015
64 Sutskever I, Martens J, Dahl G, et al. On the importance ofmomentum and initialization in deep learning. In: Proceed-ings of Machine Learning, Atlanta, GA, USA, 2013
65 Kingma D P, Ba J L. Adam: a method for stochastic opti-mization. In: Proceedings of Learning Representations, SanDiego, CA, 2015. 1-15
66 Martens J, Grosse R B. Optimizing neural networks withkronecker-factored approximate curvature. In: Proceedingsof Learning Representations, San Diego, CA, 2015. 2408-2417
67 Shinozaki T, Watanabe S. Structure discovery of deep neu-ral network based on evolutionary algorithms. In: Proceed-ings of Acoustics, Speech and Signal Processing. Brisbane,Australia, 2015. 4979-4983
68 Jaitly N, Hinton G. Learning a better representation ofspeech soundwaves using restricted boltzmann machines.In: Proceedings of Acoustics, Speech and Signal Process-ing. Prague, Czech Republic, 2011. 5884-5887
69 Dahl G E, Sainath T N, Hinton G E. Improving deep neuralnetworks for LVCSR using rectified linear units and dropout.In: Proceedings of Acoustics, Speech and Signal Processing.Vancouver, BC, Canada, 2013. 8609-8613
70 Maas A L, Hannun A Y, Ng A Y. Rectifier nonlinearitiesimprove neural network acoustic models. In: Proceedings ofMachine Learning, Atlanta, US, 2013. 1-6
71 Miao Y, Metze F, Rawat S. Deep maxout networks for low-resource speech recognition. In: Proceedings of AutomaticSpeech Recognition and Understanding, Olomouc, CzechRepublic, 2013. 398-403
72 Zhang X, Trmal J, Povey D, et al. Improving deep neu-ral network acoustic models using generalized maxout net-works. In: Proceedings of Acoustics, Speech and Signal Pro-cessing, Florence, Italy, 2014. 215-219
73 Srivastava N, Hinton G E, Krizhevsky A, et al. Dropout:a simple way to prevent neural networks from overfitting.Journal of Machine Learning Research, 2014, 15(1): 1929-1958
74 Chen Guo-Liang, Mao Rui, Lu Ke-Zhong. Parallel computing framework of big data. Chinese Science Bulletin, 2015, 60(5): 566-569 (in Chinese)
75 Chetlur S, Woolley C, Vandermersch P, et al. cudnn:Efficient primitives for deep learning [Online], available:https://arxiv.org/abs/1410.0759, December 18, 2014
76 Dean J, Corrado G, Monga R, et al. Large scale distributeddeep networks. In: Proceedings of the Neural InformationProcessing Systems. Lake Tahoe, Nevada, United States,2012. 1223-1231
77 Jia Y, Shelhamer E, Donahue J, et al. Caffe: Convolutionalarchitecture for fast feature embedding. In: Proceedings ofthe 22nd ACM International Conference on Multimedia. Or-lando, Florida, USA, 2014. 675-678
78 Coates A, Huval B, Wang T, et al. Deep learning with COTSHPC systems. In: Proceedings of Machine Learning. At-lanta, Georgia, USA, 2013. 16-21
79 Yadan O, Adams K, Taigman Y, et al. Multi-gpu training of convnets [Online], available:https://arxiv.org/abs/1312.5853, February 18, 2013
80 Yu K. Large-scale deep learning at Baidu. In: Proceedingsof the 22nd ACM International Conference on Information& Knowledge Management. San Francisco, CA, USA, 2013,ACM. 2211-2212
81 Chen Y, Luo T, Liu S, et al. DaDianNao: A machine-learning supercomputer. In: Proceedings of the 47thIEEE/ACM International Symposium on Microarchitec-ture. Cambridge, United Kingdom, 2014
82 Luo T, Liu S, Li L, et al. DaDianNao: A neural networksupercomputer. IEEE Transactions on Computers, 2017,66(1): 73-86
83 Markoff J. How many computers to identify a cat? 16,000.New York Times, 2012: 6-25
84 He K, Zhang X, Ren S, et al. Deep residual learning forimage recognition. In: Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition. Las Vegas,Nevada, USA, 2016. 770-778
85 Ren M, Kiros R, Zemel R. Exploring models and data forimage question answering. In: Proceedings of the NeuralInformation Processing Systems. Montreal, Canada, 2015.2953-2961
CHEN Wei-Hong Ph. D. candidate at the College of Computer Science and Electronic Engineering, Hunan University, and professor at Hunan City University. She received her Master's degree from Hunan University in 2006. Her research interest covers cyber-physical systems, distributed computing, and machine learning. E-mail: [email protected]
AN Ji-Yao Professor at Hunan University. He received his doctor's degree from Hunan University in 2012. His research interest covers cyber-physical systems, parallel and distributed computing, and computing intelligence. Corresponding author of this paper. E-mail: [email protected]
LI Ren-Fa Professor at Hunan University. He received his doctor's degree from Huazhong University of Science and Technology. His research interest covers embedded systems, cyber-physical systems, artificial intelligence, and machine vision. E-mail: [email protected]
LI Wan-Li Ph. D. candidate at the College of Computer Science and Electronic Engineering, Hunan University. He received his B. S. degree from Hunan University in 2014. His research interest covers machine learning, computer vision, intelligent transportation systems, and driver behavior analysis. E-mail: [email protected]