A First Course in Information Theory
Information Technology: Transmission, Processing, and Storage
Series Editor: Jack Keil Wolf, University of California at San Diego, La Jolla, California
Editorial Board: Robert J. McEliece, California Institute of Technology, Pasadena, California
John Proakis, Northeastern University, Boston, Massachusetts
William H. Tranter, Virginia Polytechnic Institute and State University, Blacksburg, Virginia
A First Course in Information Theory, Raymond W. Yeung
Multi-Carrier Digital Communications: Theory and Applications of OFDM, Ahmad R. S. Bahai and Burton R. Saltzberg
Nonuniform Sampling: Theory and Practice, edited by Farokh Marvasti
Principles of Digital Transmission: With Wireless Applications, Sergio Benedetto and Ezio Biglieri
Simulation of Communication Systems, Second Edition: Methodology, Modeling, and Techniques, Michel C. Jeruchim, Philip Balaban, and K. Sam Shanmugan
A Continuation Order Plan is available for this series. A continuation order will bring delivery of each new volume immediately upon publication. Volumes are billed only upon actual shipment. For further information please contact the publisher.
A First Course in Information Theory
Raymond W. Yeung, The Chinese University of Hong Kong
Springer
Additional material to this book can be downloaded from http://extras.springer.com.
ISBN 978-1-4613-4645-6 ISBN 978-1-4419-8608-5 (eBook) DOI 10.1007/978-1-4419-8608-5
© 2002 Springer Science+Business Media New York. Originally published by Springer Science+Business Media, Inc. in 2002
Softcover reprint of the hardcover 1st edition 2002
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC) except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
springeronline.com
To my parents and my family
Foreword
The first course usually is an appetizer. In the case of Raymond Yeung's A First Course in Information Theory, however, another delectable dish gets served up in each of the sixteen chapters. Chapters 1 through 7 deal with the basic concepts of entropy and information with applications to lossless source coding. This is the traditional early fare of an information theory text, but Yeung flavors it uniquely. No one since Shannon has had a better appreciation for the mathematical structure of information quantities than Prof. Yeung. In the early chapters this manifests itself in a careful treatment of information measures via both Yeung's analytical theory of I-Measure and his geometrically intuitive information diagrams. (This material, never before presented in a textbook, is rooted in works by G. D. Hu, by H. Dyckman, and by R. Yeung et al.) Fundamental interrelations among information measures and Markovianness are developed with precision and unity. New slants are provided on staples like the divergence inequality, the data processing theorem, and Fano's inequality. There is also a clever, Kraft-inequality-free way of proving that the average length of the words in a lossless prefix source code must exceed the source's entropy. An easily digestible treatment of the redundancy of lossless prefix source codes also is served up, an important topic in practice that usually is slighted in textbooks.
The concept of weakly typical sequences is introduced and then used to anchor Yeung's proof of the lossless block source coding theorem. The concept of strongly typical sequences is introduced next. Later extended to joint typicality, this provides a foundation for proving the channel coding theorem in Chapter 8, the lossy source coding (rate-distortion) theorem in Chapter 9, and selected multi-source network coding theorems in Chapter 15. Although the proof of the channel coding theorem follows standard lines, Yeung's tasteful development of the interplay between information quantities and Markovianness readies one's palate for a rigorous proof that feedback around a discrete memoryless channel does not increase its capacity. In most information theory books this basic result of Shannon either does not appear or is relegated to a problem in which the several steps are outlined in order to guide the reader toward the goal. Rate-distortion theory and Shannon's lossy source coding theorem are treated in familiar ways. When proving the latter, one confronts lack of independence of the events {(X, X(i)) ∈ T^n}, where X is a random source word, X(i) is the ith word in a randomly chosen source code, and T^n is the set of jointly typical vector pairs. In those instances in which this widely unappreciated stumbling block is not overlooked entirely, it usually is addressed via either a non-self-contained reference or a mammoth problem at the end of the chapter. However, Yeung's thorough earlier development of strong joint typicality concepts allows him to tackle it head-on.
Chapter 10 dishes up a careful treatment of the iterative algorithms for computation of channel capacity and rate-distortion functions pioneered by R. E. Blahut and S. Arimoto, which is generally accepted as today's preferred approach to computational information theory. Moreover, it has the extra advantage that iterative optimization algorithms are finding widespread application to areas as diverse as decoding of turbo and low-density parity-check codes and belief propagation in artificial intelligence and in real and artificial neural nets.
Chapters 11 through 16 are a unique tour de force. In as digestible a fashion as could possibly be expected, Yeung unveils a smorgasbord of topics in modern information theory that heretofore have been available only in research papers generated principally by Yeung and his research collaborators. Chapter 11 is a strong treatment of single-source network coding which develops carefully the relationships between information multicasting and the max-flow min-cut theory. Yeung makes an iron-clad case for how nodes must in general perform coding, not just storing and forwarding. Chapters 12, 13 and 14 on information inequalities of both Shannon and non-Shannon type constitute a definitive presentation of these topics by the master chef himself. Connections with linear programming are exploited, culminating in explication of the Information Theory Inequality Prover (ITIP) of R. Yeung and Y.-O. Yan for inequalities of Shannon type, which comes with this book (also WWW-available). This leads, in turn, to the fascinating area of non-Shannon-type information inequalities, pioneered by R. Yeung and Z. Zhang. This material has been found to possess profound implications for the general area of information structures being studied by mathematical logicians, and may also contribute to thermodynamics and statistical mechanics, wherein the concept of entropy originated and which continue to be heavily concerned with various families of general inequalities. The theory of I-Measure introduced in Chapter 6 provides the essential insight into those of the non-Shannon-type inequalities that are discussed here. Multi-source network coding in Chapter 15 is a confounding area in which Yeung and others have made considerable progress but a comprehensive theory remains elusive. Nonetheless, the geometrical framework for information inequalities developed in Chapters 12 and 13 renders a unifying tool for attacking this class of problems. The closing chapter linking entropy to the theory of groups is mouthwateringly provocative, having the potential to become a major contribution of information theory to this renowned branch of mathematics and mathematical physics.
Savor this book; I think you will agree the proof is in the pudding.
Toby Berger
Irwin and Joan Jacobs Professor of Engineering
Cornell University, Ithaca, New York
Preface
Cover and Thomas wrote a book on information theory [52] ten years ago which covers most of the major topics with considerable depth. Their book has since become the standard textbook in the field, and it was no doubt a remarkable success. Instead of writing another comprehensive textbook on the subject, which has become more difficult as new results keep emerging, my goal is to write a book on the fundamentals of the subject in a unified and coherent manner.
During the last ten years, significant progress has been made in understanding the entropy function and information inequalities of discrete random variables. The results along this direction not only are of core interest in information theory, but also have applications in network coding theory, probability theory, group theory, Kolmogorov complexity, and possibly physics. This book is an up-to-date treatment of information theory for discrete random variables, which forms the foundation of the theory at large. There are eight chapters on classical topics (Chapters 1, 2, 3, 4, 5, 8, 9, and 10), five chapters on fundamental tools (Chapters 6, 7, 12, 13, and 14), and three chapters on selected topics (Chapters 11, 15, and 16). The chapters are arranged according to the logical order instead of the chronological order of the results in the literature.
What is in this book
Out of the sixteen chapters in this book, the first thirteen chapters are basic topics, while the last three chapters are advanced topics for the more enthusiastic reader. A brief rundown of the chapters will give a better idea of what is in this book.
Chapter 1 is a very high-level introduction to the nature of information theory and the main results in Shannon's original paper in 1948 which founded the field. There are also pointers to Shannon's biographies and his works.
Chapter 2 introduces Shannon's information measures and their basic properties. Useful identities and inequalities in information theory are derived and explained. Extra care is taken in handling joint distributions with zero probability masses. The chapter ends with a section on the entropy rate of a stationary information source.
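These measures are easy to experiment with numerically. As a small illustration (not from the book; the joint distribution below is invented purely for the example), the following Python sketch computes H(X), H(X|Y), and I(X;Y) for a pair of binary random variables and checks the identity I(X;Y) = H(X) − H(X|Y):

```python
import math

def entropy(p):
    """Entropy in bits of a distribution given as a list of probabilities.

    Terms with zero probability contribute 0, following the convention
    0 * log 0 = 0 (the careful handling of zero masses mentioned above).
    """
    return -sum(x * math.log2(x) for x in p if x > 0)

# A hypothetical joint distribution p(x, y) on a binary alphabet pair,
# chosen only to make the quantities concrete.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginal distributions of X and Y.
px = [sum(v for (x, _), v in joint.items() if x == a) for a in (0, 1)]
py = [sum(v for (_, y), v in joint.items() if y == b) for b in (0, 1)]

h_x = entropy(px)                     # H(X)
h_y = entropy(py)                     # H(Y)
h_xy = entropy(list(joint.values()))  # H(X, Y)
h_x_given_y = h_xy - h_y              # chain rule: H(X|Y) = H(X,Y) - H(Y)
i_xy = h_x - h_x_given_y              # I(X;Y) = H(X) - H(X|Y)

print(round(h_x, 4), round(i_xy, 4))  # prints: 1.0 0.2781
```

Since the two variables agree with probability 0.8, the mutual information is strictly positive, and it would drop to 0 if the joint distribution were the product of its marginals.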
Chapter 3 is a discussion of zero-error data compression by uniquely decodable codes, with prefix codes as a special case. A proof of the entropy bound for prefix codes which involves neither the Kraft inequality nor the fundamental inequality is given. This proof facilitates the discussion of the redundancy of prefix codes.
Chapter 4 is a thorough treatment of weak typicality. The weak asymptotic equipartition property and the source coding theorem are discussed. An explanation of the fact that a good data compression scheme produces almost i.i.d. bits is given. There is also a brief discussion of the Shannon-McMillan-Breiman theorem.
Chapter 5 introduces a new definition of strong typicality which does not involve the cardinalities of the alphabet sets. The treatment of strong typicality here is more detailed than Berger [21] but less abstract than Csiszár and Körner [55]. A new exponential convergence result is proved in Theorem 5.3.
Chapter 6 is an introduction to the theory of I-Measure which establishes a one-to-one correspondence between Shannon's information measures and set theory. A number of examples are given to show how the use of information diagrams can simplify the proofs of many results in information theory. Most of these examples are previously unpublished. In particular, Example 6.15 is a generalization of Shannon's perfect secrecy theorem.
Chapter 7 explores the structure of the I-Measure for Markov structures. Set-theoretic characterizations of full conditional independence and Markov random fields are discussed. The treatment of Markov random fields here is perhaps too specialized for the average reader, but the structure of the I-Measure and the simplicity of the information diagram for a Markov chain are best explained as a special case of a Markov random field.
Chapter 8 consists of a new treatment of the channel coding theorem. Specifically, a graphical model approach is employed to explain the conditional independence of random variables. Great care is taken in discussing feedback.
Chapter 9 is an introduction to rate-distortion theory. The version of the rate-distortion theorem here, proved by using strong typicality, is a stronger version of the original theorem obtained by Shannon.
In Chapter 10, the Blahut-Arimoto algorithms for computing channel capacity and the rate-distortion function are discussed, and a simplified proof for convergence is given. Great care is taken in handling distributions with zero probability masses.
Chapter 11 is an introduction to network coding theory. The surprising fact that coding at the intermediate nodes can improve the throughput when an information source is multicast in a point-to-point network is explained. The max-flow bound for network coding with a single information source is explained in detail. Multi-source network coding will be discussed in Chapter 15 after the necessary tools are developed in the next three chapters.
Information inequalities are sometimes called the laws of information theory because they govern the impossibilities in information theory. In Chapter 12, the geometrical meaning of information inequalities and the relation between information inequalities and conditional independence are explained in depth. The framework for information inequalities discussed here is the basis of the next two chapters.
Chapter 13 explains how the problem of proving information inequalities can be formulated as a linear programming problem. This leads to a complete characterization of all information inequalities which can be proved by conventional techniques. These are called Shannon-type inequalities, which can now be proved by the software ITIP which comes with this book. It is also shown how Shannon-type inequalities can be used to tackle the implication problem of conditional independence in probability theory.
All information inequalities we used to know were Shannon-type inequalities. Recently, a few non-Shannon-type inequalities have been discovered. This means that there exist laws in information theory beyond those laid down by Shannon. These inequalities and their applications are explained in depth in Chapter 14.
Network coding theory is further developed in Chapter 15. The situation when more than one information source is multicast in a point-to-point network is discussed. The surprising fact that a multi-source problem is not equivalent to a few single-source problems even when the information sources are mutually independent is clearly explained. Implicit and explicit bounds on the achievable coding rate region are discussed. These characterizations of the achievable coding rate region involve almost all the tools that have been developed earlier in the book, in particular, the framework for information inequalities.
Chapter 16 explains an intriguing relation between information theory and group theory. Specifically, for every information inequality satisfied by any joint distribution, there is a corresponding group inequality satisfied by any finite group and its subgroups, and vice versa. Inequalities of the latter type govern the orders of any finite group and its subgroups. Group-theoretic proofs of Shannon-type information inequalities are given. At the end of this chapter, a group inequality is obtained from a non-Shannon-type inequality discussed in Chapter 14. The meaning and the implication of this inequality are yet to be understood.
How to use this book
You are recommended to read the chapters according to the above chart. However, you will not have too much difficulty jumping around in the book because there should be sufficient references to the previous relevant sections.
As a relatively slow thinker, I feel uncomfortable whenever I do not reason in the most explicit way. This probably has helped in writing this book, in which all the derivations are from first principles. In the book, I try to explain all the subtle mathematical details without sacrificing the big picture. Interpretations of the results are usually given before the proofs are presented. The book also contains a large number of examples. Unlike the examples in most books, which are supplementary, the examples in this book are essential.
This book can be used as a reference book or a textbook. For a two-semester course on information theory, this would be a suitable textbook for the first semester. This would also be a suitable textbook for a one-semester course if only information theory for discrete random variables is covered. If the instructor also wants to include topics on continuous random variables, this book can be used as a textbook or a reference book in conjunction with another suitable textbook. The instructor will find this book a good source for homework problems because many problems here do not appear in any other textbook. A comprehensive instructor's manual is available upon request. Please contact the author at whyeung@ie.cuhk.edu.hk for information and access.
Just like any other lengthy document, this book for sure contains errors and omissions. To alleviate the problem, an errata list will be maintained at the book homepage http://www.ie.cuhk.edu.hk/IT_book/.
RAYMOND W. YEUNG
Acknowledgments
Writing a book is always a major undertaking. This is especially so for a book on information theory because it is extremely easy to make mistakes in the writing if one has not done research work on that particular topic. Besides, the time needed for writing a book cannot be underestimated. Thanks to the generous support of a fellowship from the Croucher Foundation, I have had the luxury of taking a one-year leave in 2000-01 to write this book. I also thank my department for the support which made this arrangement possible.
There are many individuals who have directly or indirectly contributed to this book. First, I am indebted to Toby Berger who taught me information theory and writing. Venkat Anantharam, Dick Blahut, Dave Delchamps, Terry Fine, and Chris Heegard all had much influence on me when I was a graduate student at Cornell.
I am most thankful to Zhen Zhang for his friendship and inspiration. Without the results obtained through our collaboration, this book would not be complete. I would also like to thank Julia Abrahams, Vijay Bhargava, Agnes and Vincent Chan, Tom Cover, Imre Csiszár, Bob Gallager, Bruce Hajek, Te Sun Han, Jim Massey, Alon Orlitsky, Shlomo Shamai, Sergio Verdú, Victor Wei, Frans Willems, and Jack Wolf for their support and encouragement throughout the years. In particular, Imre Csiszár has closely followed my work and has given invaluable feedback. I would also like to thank all the collaborators of my work for their contribution and all the anonymous reviewers for their useful comments.
During the process of writing this book, I received tremendous help from Ning Cai and Fangwei Fu in proofreading the manuscript. Lihua Song and Terence Chan helped compose the figures and gave useful suggestions. Without their help, it would have been quite impossible to finish the book within the set time frame. I also thank Yu He, Andries Hekstra, Siu-Wai Ho, Kingo Kobayashi, Frank Kschischang, Amos Lapidoth, Li Ping, Prakash Narayan, Igal Sason, Wojtek Szpankowski, Emre Telatar, Chunxuan Ye, and Ken Zeger for their valuable inputs. The code for ITIP was written by Ying-On Yan, and Jack Lee helped modify the code and adapt it for the PC. Gordon Yeung helped on numerous Unix and LaTeX problems. The two-dimensional Venn diagram representation for four sets which is repeatedly used in the book was taught to me by Yong Nan Yeh.
On the domestic side, I am most grateful to my wife Rebecca for her love and support. My daughter Shannon has grown from an infant into a toddler during this time. She has added much joy to our family. I also thank my mother-in-law Mrs. Tsang and my sister-in-law Ophelia for coming over from time to time to take care of Shannon. Life would have been a lot more hectic without their generous help.
Contents

1. THE SCIENCE OF INFORMATION
2. INFORMATION MEASURES
2.1 Independence and Markov Chains
2.2 Shannon's Information Measures
2.3 Continuity of Shannon's Information Measures
2.4 Chain Rules
2.5 Informational Divergence
2.6 The Basic Inequalities
2.7 Some Useful Information Inequalities
2.8 Fano's Inequality
2.9 Entropy Rate of Stationary Source
Problems
Historical Notes
3. ZERO-ERROR DATA COMPRESSION
3.1 The Entropy Bound
3.2 Prefix Codes
3.2.1 Definition and Existence
3.2.2 Huffman Codes
3.3 Redundancy of Prefix Codes
Problems
Historical Notes
4. WEAK TYPICALITY
4.1 The Weak AEP
4.2 The Source Coding Theorem
4.3 Efficient Source Coding
4.4 The Shannon-McMillan-Breiman Theorem
Problems
Historical Notes
5. STRONG TYPICALITY
5.1 Strong AEP
5.2 Strong Typicality Versus Weak Typicality
5.3 Joint Typicality
5.4 An Interpretation of the Basic Inequalities
Problems
Historical Notes
6. THE I-MEASURE
6.1 Preliminaries
6.2 The I-Measure for Two Random Variables
6.3 Construction of the I-Measure μ*
6.4 μ* Can be Negative
6.5 Information Diagrams
6.6 Examples of Applications
Appendix 6.A: A Variation of the Inclusion-Exclusion Formula
Problems
Historical Notes
7. MARKOV STRUCTURES
7.1 Conditional Mutual Independence
7.2 Full Conditional Mutual Independence
7.3 Markov Random Field
7.4 Markov Chain
Problems
Historical Notes
8. CHANNEL CAPACITY
8.1 Discrete Memoryless Channels
8.2 The Channel Coding Theorem
8.3 The Converse
8.4 Achievability of the Channel Capacity
8.5 A Discussion
8.6 Feedback Capacity
8.7 Separation of Source and Channel Coding
Problems
Historical Notes
9. RATE-DISTORTION THEORY
9.1 Single-Letter Distortion Measures
9.2 The Rate-Distortion Function R(D)
9.3 The Rate-Distortion Theorem
9.4 The Converse
9.5 Achievability of R_I(D)
Problems
Historical Notes
10. THE BLAHUT-ARIMOTO ALGORITHMS
10.1 Alternating Optimization
10.2 The Algorithms
10.2.1 Channel Capacity
10.2.2 The Rate-Distortion Function
10.3 Convergence
10.3.1 A Sufficient Condition
10.3.2 Convergence to the Channel Capacity
Problems
Historical Notes
11. SINGLE-SOURCE NETWORK CODING
11.1 A Point-to-Point Network
11.2 What is Network Coding?
11.3 A Network Code
11.4 The Max-Flow Bound
11.5 Achievability of the Max-Flow Bound
11.5.1 Acyclic Networks
11.5.2 Cyclic Networks
Problems
Historical Notes
12. INFORMATION INEQUALITIES
12.1 The Region Γ*_n
12.2 Information Expressions in Canonical Form
12.3 A Geometrical Framework
12.3.1 Unconstrained Inequalities
12.3.2 Constrained Inequalities
12.3.3 Constrained Identities
12.4 Equivalence of Constrained Inequalities
12.5 The Implication Problem of Conditional Independence
Problems
Historical Notes
13. SHANNON-TYPE INEQUALITIES
13.1 The Elemental Inequalities
13.2 A Linear Programming Approach
13.2.1 Unconstrained Inequalities
13.2.2 Constrained Inequalities and Identities
13.3 A Duality
13.4 Machine Proving - ITIP
13.5 Tackling the Implication Problem
13.6 Minimality of the Elemental Inequalities
Appendix 13.A: The Basic Inequalities and the Polymatroidal Axioms
Problems
Historical Notes
14. BEYOND SHANNON-TYPE INEQUALITIES
14.1 Characterizations of Γ*_2, Γ*_3, and Γ*_n
14.2 A Non-Shannon-Type Unconstrained Inequality
14.3 A Non-Shannon-Type Constrained Inequality
14.4 Applications
Problems
Historical Notes
15. MULTI-SOURCE NETWORK CODING
15.1 Two Characteristics
15.1.1 The Max-Flow Bounds
15.1.2 Superposition Coding
15.2 Examples of Application
15.2.1 Multilevel Diversity Coding
15.2.2 Satellite Communication Network
15.3 A Network Code for Acyclic Networks
15.4 An Inner Bound
15.5 An Outer Bound
15.6 The LP Bound and Its Tightness
15.7 Achievability of R_in
Appendix 15.A: Approximation of Random Variables with Infinite Alphabets
Problems
Historical Notes
16. ENTROPY AND GROUPS
16.1 Group Preliminaries
16.2 Group-Characterizable Entropy Functions
16.3 A Group Characterization of Γ*_n
16.4 Information Inequalities and Group Inequalities
Problems
Historical Notes
Bibliography
Index