NATO UNCLASSIFIED

NATO Consultation, Command and Control Agency

COMMUNICATIONS & INFORMATION SYSTEMS

Decreasing “Bit Pollution” through “Sequence Reduction”

Dr. Davras Yavuz

You will find this presentation and the accompanying paper at

www.nc3a.info/MCC2006

from where both can be viewed and/or downloaded

(the four other NC3A presentations can also be found at the above URL)

Terminology

“Sequence Reduction”: Originates with Peribit (~2000). The founder’s Ph.D. on genome mapping (Biomedical Informatics, Stanford University) uses the term “Molecular Sequence Reduction” (MSR).

“Bit Pollution”: Link/network pollution caused by the repetition of redundant digital sequences over transmission media (especially significant for mobile/deployed networks/links).

Other related terms: WAN Optimizer, Application Accelerator/Optimizer or Application Controller-Optimizer, Performance Enhancement Proxies (PEP), WAN Expanders, latency (= delay) removers/compensators/mitigators, etc.

This is a new and dynamic field: many terms will continue to appear and coalesce; some will catch on, others will disappear.

Terminology

• “Next Generation Compression”, “Bit Pollution Reduction”, “Sequence Reduction” (the latter from Peribit/Dr. Amit Singh)

• WAN Expander (WX), WAN Optimizer, WAN Optimization Controller (WOC) (Juniper/Peribit)

• Application Accelerator/Optimizer/Controller-Optimizer

• Latency Remover/Optimizer (replace “Latency” by “Delay”)

• Especially for networks with SATCOM links

• In general: use of a-priori knowledge of the data comms protocols required by the application to optimize the data input/output

• Combinations of the above

• Unfortunately all present implementations are “proprietary”

• Unrealistic to expect “standards” soon; the technology is too new and lucrative

Why “Bit Pollution”?

Most of us deal daily with various electronic files/information

Taking MS Office as an example: Word, PPT, Excel, Project, HTML, Access, … files

… and/or many other electronic files, databases, forms, etc.

On many occasions we make small changes and send them back and/or forward to others

Repetitive traffic over communication links can, in general, be classified broadly into 3 categories:

1) Application & protocol overheads

2) Commonly used words, phrases, strings, objects (logos, images, audio clips, etc.)

3) Process flows (database updates/views, forms, templates, etc. going back & forth)

SEQUENCE REDUCTION: Next Generation Compression - Examples

256 Kbps satellite link

20 Mbytes PPT file (48 slides) sent 1st time: ~12 minutes (700 secs)

6 of the slides modified, file size change <0.5 Mbytes

Modified file sent 6 hours later, time taken: ~8 secs

Same modified file sent 24 hours later: ~18 secs

Sent 7 days later: ~24 secs

Original file sent 7 days later: ~14 secs

Similar results for Word, Excel files and web pages

Less but still significant improvement for PDF files

Smallest improvement for zipped files (reduction by ~2.5 to 3)

The amount of “new” files in between repetitions and the SR RAM/HD capacities have a strong effect on the duration of repeat transmissions (dynamic library updates)

Above results based on Peribit SRs: German MOD, Syracuse University “Real World” Labs (Network Computing, Nov 2004) and NC3A

GE MOD results based on operational traffic, others on test traffic

Ref [6] of paper: “Record for throughput was ~60Mbps through a T1. It came about when copying 1.5GB file twice!”
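The timings quoted above can be sanity-checked with simple arithmetic. The sketch below is only an illustration, assuming 1 Mbyte = 10^6 bytes, 256 Kbps = 256,000 bit/s, no protocol overhead, and a modified file that is still about 20 Mbytes; the function names are hypothetical and not part of any product.

```python
# Back-of-the-envelope check of the satellite-link example.
# Assumptions (not from the slides): decimal megabytes, no protocol overhead.

FILE_BYTES = 20 * 10**6      # 20 Mbytes PPT file
LINK_BPS = 256_000           # 256 Kbps satellite link


def transfer_seconds(size_bytes, link_bps):
    """Ideal time to push size_bytes through a link of link_bps."""
    return size_bytes * 8 / link_bps


def effective_bps(size_bytes, seconds):
    """Apparent throughput seen by the user for a transfer of given duration."""
    return size_bytes * 8 / seconds


first_send = transfer_seconds(FILE_BYTES, LINK_BPS)        # ideal first transfer
repeat_rate = effective_bps(FILE_BYTES, 8)                 # repeat took ~8 secs
print(f"ideal first transfer: {first_send:.0f} s")
print(f"apparent rate on repeat: {repeat_rate / 1e6:.0f} Mbps "
      f"({repeat_rate / LINK_BPS:.0f}x the link rate)")
```

Under these assumptions the ideal first transfer is 625 s, in line with the ~700 secs observed once overheads are included, while the ~8 secs repeat corresponds to an apparent rate far above the physical link rate because almost nothing new has to cross the link.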

Mobile/Tactical Comms Divergence

• Fixed communications: WANs with all users/nodes fixed

• Fiber-optic/photonic revolution: essentially unlimited capacity is now possible/available if/when a cable can be installed

• Mobile comms: networks with mobile/deployable users

• No technological revolution similar to the photonic one is foreseen

• Radio propagation will be the limiting factor

– Mainstay will be radio: tactical LOS tens/hundreds of Kbps, BLOS (rough terrain, long distances) a few Kbps

– Star-wars scenarios: moving laser beams ???

• LEO satellites will provide some 100s of Kbps at a cost

• Divergence will continue

• Another factor: input into the five senses: ~100 Shannon/Entropy bps

– For transmission redundancy: x 10 = 1 Kbps

Therefore: we must treat mobile/tactical comms differently

Deployable, Mobile, On-the-Move Communications

At least one end of a link moving/deployed

Networks which have nodes/users moving/deployed

Such links/networks are essential for survivability and rapid reaction

Will be taking on increasingly more critical tasks

Present approach: use applications developed for fixed links/networks for deployed/mobile units

Must consider the very different characteristics of such networks when choosing applications

Can we measure “information”, so that we can determine the performance of links/networks in terms of “information” transported, not just bits/bytes?

Can we measure “information”? Yes we can!

Shannon defined the concept of “Entropy”, a logarithmic measure, in the 1940s (while working on cryptography); it has stood the test of time

The first suggestion of a log measure was by Hartley (base 10), but Shannon used the idea to develop a complete “theory of information & communication”

Shannon preferred $\log_2$ and called the “unit” bits

Base e is also sometimes used (nats)

The smaller the probability of occurrence of an event, the higher the “information delivered” when it occurs
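The relationship between probability and “information delivered” can be made concrete with the standard self-information formula, I(p) = -log p. The snippet below is a minimal sketch; the function names are illustrative choices, not taken from the presentation.

```python
import math


def information_bits(p):
    """Self-information, in bits, of an event with probability p (0 < p <= 1)."""
    return -math.log2(p)


def information_nats(p):
    """Same quantity using base e (nats)."""
    return -math.log(p)


# A fair coin flip delivers 1 bit; a 1-in-8 event delivers 3 bits.
for p in (0.5, 0.125, 0.001):
    print(f"p = {p}: {information_bits(p):.3f} bits")
```

Rarer events give larger values, which is the point made on the slide.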

[Figure from C. E. Shannon (BSTJ 1948): symbol sets {Si} and {Rj}; discrete, countable]

Entropy

Entropy (H) in the case of two possibilities/events/symbols

Probability of one = p, of the other q = 1 - p

$H = -(p \log p + q \log q)$

[Plot: H versus p]
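The two-symbol entropy formula above is easy to evaluate numerically. The following sketch assumes base-2 logarithms (so H is in bits) and uses the usual convention that 0 log 0 = 0; the function name is an illustrative choice.

```python
import math


def binary_entropy(p):
    """H = -(p log2 p + q log2 q) with q = 1 - p, in bits."""
    q = 1.0 - p
    total = 0.0
    for prob in (p, q):
        if prob > 0.0:          # convention: 0 * log(0) contributes nothing
            total -= prob * math.log2(prob)
    return total


# Tabulate a few points of the H-versus-p curve.
for p in (0.0, 0.1, 0.25, 0.5, 0.75, 1.0):
    print(f"p = {p:4.2f}  H = {binary_entropy(p):.4f} bits")
```

The curve is symmetric about p = 0.5, where it peaks at 1 bit, and falls to zero when one of the two symbols is certain.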

Let us take a “Natural Language”, English, as an example

English has 26 letters (characters)

Space as a delimiter

TOTAL 27 characters (symbols)

One could include punctuation, special characters, etc.; for example we could use the full 256 ASCII symbol set - the methodology is the same

Extension to other natural languages readily made

Extension to images also possible (same methodology)
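As a rough illustration of the methodology, the sketch below computes the upper bound on bits per character for an alphabet of a given size (all symbols equally likely) and the single-character entropy estimated from the symbol frequencies of a sample text. The function names and the sample sentence are illustrative assumptions; a meaningful estimate would need a much larger corpus.

```python
import math
from collections import Counter


def max_bits_per_symbol(alphabet_size):
    """Entropy if all symbols were equally likely (the upper bound)."""
    return math.log2(alphabet_size)


def char_entropy(text):
    """Entropy in bits/character from observed single-character frequencies."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


print(f"27 symbols, equally likely: {max_bits_per_symbol(27):.3f} bits/char")
print(f"256 symbols, equally likely: {max_bits_per_symbol(256):.3f} bits/char")

sample = "the quick brown fox jumps over the lazy dog and the cat"
print(f"sample text, single-character frequencies: {char_entropy(sample):.3f} bits/char")
```

Single-character frequencies ignore the dependence between neighbouring letters, so this estimate stays well above the 1 to 2 bits per character quoted for natural languages.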

Structure of a “Natural Language” - English

Defined by many characteristics: grammar, semantics, etymology, usage, …, historical developments, …

Until the early 70s there was substantial belief that “Natural Languages” and “computer programming languages” (finite automata instructions) had similarities

Noam Chomsky’s work (Professor at MIT) completely destroyed those expectations

Natural Languages can be studied through probabilistic (Markov) models

Shannon’s approach (1940s, no computers; Bell Labs staff flipped through many pages of books to get the probabilities)

He was actually working on cryptography and made important contributions in that area also
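A first-order (digram) Markov model of the 27-symbol alphabet can be estimated by counting which character follows which, the same tabulation that was once done by hand. This is a minimal sketch under that assumption; `normalise` and `digram_model` are hypothetical helper names.

```python
from collections import Counter, defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz "   # 26 letters plus space


def normalise(text):
    """Lower-case, map anything outside the alphabet to space, squeeze spaces."""
    kept = "".join(ch if ch in ALPHABET else " " for ch in text.lower())
    return " ".join(kept.split())


def digram_model(text):
    """Return P(next character | current character) estimated from text."""
    clean = normalise(text)
    counts = defaultdict(Counter)
    for current, following in zip(clean, clean[1:]):
        counts[current][following] += 1
    return {
        current: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for current, c in counts.items()
    }


model = digram_model("The theory of the thing, then and there.")
print(model["t"])   # after "t", the letter "h" dominates in this sample
```

Higher-order models condition on more preceding characters in the same way, at the cost of needing far more text to estimate the probabilities.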

Various Markov model examples here, skipped for continuity, may be found at the end

Zipf’s Law: “Principle of Least Effort”

George Kingsley Zipf, Professor of Linguistics, Harvard (1902 – 1950)

If the “words” in a language are ordered (“ranked”) from the most frequently used down, the probability $P_n$ of the $n$th word in this list is $P_n \approx 0.1 / n$

Implies a maximum vocabulary size of 12366 words, since the probabilities must sum to 1: $\sum_{n=1}^{12366} 0.1/n \approx 1$ ($\sum 1/n$ is not finite when summed from 1 to $\infty$)

For details of the above see DY, IEEE Transactions on Information Theory, September 1974

Many other applications of “Zipf’s Law”; if interested just make a Google/Internet search
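The vocabulary limit follows from requiring the Zipf probabilities 0.1/n to sum to no more than 1. The short sketch below finds the largest vocabulary for which that holds; the function name and its parameter are illustrative.

```python
def max_vocabulary(c=0.1):
    """Largest N such that sum_{n=1..N} c/n does not exceed 1."""
    total = 0.0
    n = 0
    while True:
        candidate = total + c / (n + 1)
        if candidate > 1.0:      # adding word n+1 would push the total past 1
            return n
        total = candidate
        n += 1


print(max_vocabulary())   # 12366
```

Because the harmonic series grows only logarithmically, the limit is very sensitive to the constant: a slightly smaller constant than 0.1 would allow a far larger vocabulary.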

[Figure: Zipf’s Law (Principle of Least Effort), from “Symbols, Signals & Noise”, J. R. Pierce; ~ million words, various texts]

Entropy bits/character - English

Amazingly, it turns out to be about the same for most “Natural Languages” for which the analysis has been done (Arabic, French, German, Hebrew, Latin, Spanish, Turkish, …). These languages also follow Zipf’s Law.

Entropy of Natural Languages

Between 1 & 2 bits per letter/character

1.5 bits per letter is commonly used

English has ~4.5 letters per word on average

4.5 x 1.5 = 6.75 or ~7 bits per word on average

Normal speech: 1 - 2 words per second

Hence information per second ~5 bits

Extension to Images

Same concept and definitions

Letters replaced by pixels/groups of pixels, etc.

Words could be analogous to sets of pixels, objects

The numbers are much larger

E.g. a 400 x 600 = 240000 pixel image with each pixel capable of taking on one of 16 brightness levels

• $16^{240000}$ possible images

• Assume all these images are equally likely (*): the probability of one of these images is $1/16^{240000}$ and the information provided by that image is $240000 \log_2 16 = 0.96 \times 10^6$ bits

• A real image contains much less “information”: adjacent/nearby pixels are not independent of each other

• Movies: frame to frame only small/incremental changes

(*) “equally likely” assumption clearly not realistic
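The image figure can be reproduced in a couple of lines. This sketch only restates the equally-likely upper bound from the slide; the function name is an illustrative choice.

```python
import math


def image_information_bits(width, height, levels):
    """Upper bound: every pixel independent and every level equally likely."""
    return width * height * math.log2(levels)


bits = image_information_bits(400, 600, 16)
print(f"{bits:.0f} bits = {bits / 1e6:.2f} x 10^6 bits")
```

Real images fall well below this bound because neighbouring pixels are strongly correlated, which is exactly the redundancy that compression exploits.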

Speech Coding

~5 b/s is the irreducible information content; multiply by 10 to introduce redundancy. Therefore we should be able to communicate speech “information” at ~50 bps

Examples of speech coding we use:

64000 bps, 32000 bps PC

16000 bps CVSD, 2400 bps LPC, MELP

1200, 600 bps MELP

All of the above are “waveform” codecs; they also convey “non-measurable” (intangible) information

Speech codecs based on recognition at the transmitter and synthesis at the receiver could conceivably go lower than 600 bps, but would not contain the intangible component!
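To put the codec rates in perspective, the sketch below compares each rate listed on the slide with the ~50 bps target (5 b/s of information times a redundancy factor of 10). The constant and function names are illustrative.

```python
IRREDUCIBLE_BPS = 5          # estimated information content of speech
REDUNDANCY_FACTOR = 10       # allowance for transmission redundancy
TARGET_BPS = IRREDUCIBLE_BPS * REDUNDANCY_FACTOR

CODEC_RATES_BPS = [64000, 32000, 16000, 2400, 1200, 600]


def excess_factor(rate_bps):
    """How many times larger a codec rate is than the ~50 bps target."""
    return rate_bps / TARGET_BPS


for rate in CODEC_RATES_BPS:
    print(f"{rate:>6} bps is {excess_factor(rate):.0f}x the {TARGET_BPS} bps target")
```

Even the 600 bps codec carries an order of magnitude more bits than the estimated information content, part of which is the intangible component mentioned on the slide.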

A QUICK REFRESHER ON CONVENTIONAL COMPRESSION

May be found at the end

SEQUENCE REDUCTION: Next Generation Compression

Dictionary based: implements a learning algorithm

Dynamically learns the “language” of the communications traffic and translates it into “short-hand”

Continuously updates/improves its “knowledge” of the link “language”

Frequent patterns move up in the dictionary; infrequent patterns move down and can eventually age out

No fixed packet or window boundaries

Unlike e.g. LZ, which generally uses a 2048 byte window

Once a pattern is learned and put in the dictionary it will be compressed wherever it appears

Data compression is based on previously seen data

Performance improves with time as “learning” increases

Very quickly at first (10 – 20 minutes) and then slowly

When a new application comes in, SR adapts to its “language”
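The idea of a dictionary learned from previously seen traffic can be illustrated with a deliberately simplified toy. This is not the Peribit/MSR algorithm, whose details are proprietary: the sketch assumes fixed 64-byte chunks (real sequence reducers have no fixed boundaries), an unbounded dictionary with no ageing, and a hypothetical 4-byte reference token. Both ends build the same dictionary from the same stream, so a repeated chunk can be replaced by a short reference.

```python
import hashlib
import random

CHUNK = 64      # toy simplification: fixed-size chunks
REF_SIZE = 4    # assumed size of a dictionary reference on the wire


class ToyReducer:
    """One end of the link; sender and receiver learn identical dictionaries."""

    def __init__(self):
        self.index_of = {}    # chunk digest -> dictionary index
        self.chunks = []      # dictionary index -> chunk bytes

    def _learn(self, chunk):
        key = hashlib.sha256(chunk).digest()
        if key not in self.index_of:
            self.index_of[key] = len(self.chunks)
            self.chunks.append(chunk)

    def encode(self, data):
        tokens = []
        for i in range(0, len(data), CHUNK):
            chunk = data[i:i + CHUNK]
            idx = self.index_of.get(hashlib.sha256(chunk).digest())
            if idx is None:
                tokens.append(("raw", chunk))   # new pattern: send and learn it
                self._learn(chunk)
            else:
                tokens.append(("ref", idx))     # known pattern: send short-hand
        return tokens

    def decode(self, tokens):
        out = bytearray()
        for kind, value in tokens:
            if kind == "raw":
                self._learn(value)
                out += value
            else:
                out += self.chunks[value]
        return bytes(out)


def wire_bytes(tokens):
    """Bytes that would cross the link for a token list."""
    return sum(len(v) if kind == "raw" else REF_SIZE for kind, v in tokens)


sender, receiver = ToyReducer(), ToyReducer()
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(10240))
edited = original[:5000] + bytes([original[5000] ^ 0xFF]) + original[5001:]

first = sender.encode(original)
assert receiver.decode(first) == original
second = sender.encode(edited)
assert receiver.decode(second) == edited
print(wire_bytes(first), wire_bytes(second))
```

In this toy the first transfer crosses the link in full, while the edited copy costs only one new chunk plus references. An edit that changed the file length would shift every fixed chunk boundary and defeat the toy, which is one reason the absence of fixed packet or window boundaries matters in a real implementation.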

[Figure: Relative positioning of statistical and substitutional compression algorithms, including MOLECULAR SEQUENCE REDUCTION (from Peribit, A. P. Singh)]

[Figure: “Molecular Sequence Reduction”, from www.Peribit.com]

MSR – Technology

Origins in DNA pattern matching

Real time, high speed, low latency

Continuously learns and updates dictionary

Transparently operates on all traffic (optimized for IP)

Eliminates patterns of any size, anywhere in stream

Patent-pending technology

MSR – Molecular Sequence Reduction

“Next-gen dictionary-based compression”

[Figure from www.peribit.com]

Government/Military use examples

Many thousands of units in use in USA (mostly corporate but also government agencies)

GE MOD has been using Peribit SRs for ~2 years

INMARSAT German Navy WAN (encrypted)

Links to GE Navy ships in/around South Africa

Satellite links to GE units in Afghanistan

Plans for some 64 Kbps landlines

GE MOD total: 300+ units

Also other nations, some with initial trials

Reduction rates observed (reduced by % amount given)

GE Armed Forces Results

Traffic type   Version 3.0   V 4.02   V 5.0
HTTP           30 %          40 %     46 %
MAIL           61 %          67 %
NetBios        59 %          62 %
CIFS           92 %          92 %
FTP            69 %          73 %
TELNET         65 %          69 %

93 %

[Figure: from German MOD]

[Figure: startup behavior example, from German MOD]

[Figure: from German MOD]

[Figure: from German MOD]

[Figure: from Peribit.com (not GE MOD data)]

NC3A – WAN (NL – BE), Peribit (screen capture)

Effective WAN capacity increased by 2.80; data reduction by 64.34 %

[Plot labels: “No data compression & no reduction” versus “With data compression & reduction”]
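The two figures on this screen capture are consistent with each other if the capacity multiplier is taken to be the ratio of original to transmitted data, which is an assumption about how the display defines it. A minimal sketch of that relationship, with an illustrative function name:

```python
def capacity_multiplier(reduction_percent):
    """Effective capacity gain when reduction_percent of the data is removed."""
    remaining_fraction = 1.0 - reduction_percent / 100.0
    return 1.0 / remaining_fraction


print(f"{capacity_multiplier(64.34):.2f}")   # 2.80

# The same conversion applied to some reduction rates quoted for other traffic.
for label, percent in (("HTTP", 46), ("FTP", 73), ("CIFS", 92)):
    print(f"{label}: {percent} % reduction is about {capacity_multiplier(percent):.1f}x capacity")
```

The relationship is strongly non-linear: moving from 46 % to 92 % reduction doubles the percentage but increases the effective capacity several times over.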

Peribit Sequence Reducers

[Figure from www.peribit.com]

NC3A TEST RESULT SUMMARY

Expand Model 4800 “WAN Link Accelerators”

[Figure: 512 Kbps satellite link, multiplexed TCP/IP. Legend: un-accelerated link; link with SCPS-TP acceleration; link with application accelerator & IP data compressor]

NC3A TEST RESULT SUMMARY

[Figure: 512 Kbps satellite link, multiplexed TCP/IP. Legend: un-accelerated link; link with SCPS-TP acceleration; link with application accelerator & IP data compressor]

[Figure: 512 Kbps satellite link, 10 multiplexed TCP/IP sessions. Legend: un-accelerated link; link with SCPS-TP acceleration; link with application accelerator & IP data compressor]

[Figure: Packeteer]

Industry

New area, but many and an increasing number of companies:

Peribit.com (now Juniper Networks)

Expand.com (Expand Networks)

Packeteer.com

Riverbed.com

Silver-peak.com

…

National authorities (e.g. USA & GE) are also working with industry to incorporate SR/WX technology into national crypto devices

SEQUENCE REDUCTION: Next Generation Compression

Summary (1)

WANs will form the backbone of Network Enabled Operation

This technology provides significant improvements in capacity

Dictionary based: implements a learning algorithm

Dynamically learns the “language” of the communications traffic and translates it into “short-hand”

Continuously updates/improves its “knowledge” of the link “language”

Frequent patterns move up in the dictionary; infrequent patterns move down and can eventually age out

No fixed packet or window boundaries

Unlike conventional compression, which operates over 1-2 Kbytes

Once a pattern is learned and put in the dictionary it will be compressed wherever it appears

Data compression is based on previously seen data

Performance improves with time as “learning” increases

Very quickly at first (10 – 20 minutes) and then slowly

When a new application comes in, SR adapts to its “language”

SEQUENCE REDUCTION: Next Generation Compression

Summary (2)

• Significant advantages for WANs where capacity is an issue (i.e. deployed/mobile/tactical)

• Removes redundant/repetitive transmissions

• Packet-flow acceleration (latency removal) can be easily added

• Quality of Service & Policy Based Multipath can also be implemented

• Does not impact security implementations (cryptos between SRs)

However

• Presently available from a few sources, each with its “proprietary” technology


Conclusions

• Shannon Information Theory provides tools for measuring “information” as “Entropy”

• Has formed the basis for most of the coding, data transmission/detection results since 1950s

• DNA / Genome mapping process has also apparently benefited from it

  • In 90s estimate for human genome was 20-30 years; took 2-3 years with the computational developments in late 90s

• A new form of compression, “Sequence Reduction” provides significant reductions by reducing redundancies in transmitted data

  • Will provide important advantages for mobile/deployable/moving WAN link applications


Questions / Comments

This presentation & associated paper can be found at www.nc3a.info/MCC2006


NC3A

NC3A Brussels

Visiting address:
Bâtiment Z
Avenue du Bourget 140
B-1110 Brussels
Telephone +32 (0)2 7074111
Fax +32 (0)2 7078770

Postal address:
NATO C3 Agency
Boulevard Leopold III
B-1110 Brussels - Belgium

NC3A The Hague

Visiting address:
Oude Waalsdorperweg 61
2597 AK The Hague
Telephone +31 (0)70 3743000
Fax +31 (0)70 3743239

Postal address:
NATO C3 Agency
P.O. Box 174
2501 CD The Hague
The Netherlands


Markov model examples


AZEWRTZYNSADXESYJRQY_WGECIJJ_OB

_KRBQPOZB_YMBUAWVLBTQCNIKFMP_KM

VUUGBSAXHLHSIE_MAULEXJ_NATSKI

Zeroth approximation to English (zero memory)

[Zero order Markov : equally likely letters, 27 numbers ]

All logs base 2

Entropy: $H = \sum_{i=1}^{27} p_i \log (1/p_i) = \log 27 \approx 4.75$ bits / letter (or symbol)
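As a quick check of the figure above, the entropy sum can be evaluated directly. This is a minimal sketch; the function name `entropy_bits` and the small skewed example are illustrative assumptions, not part of the original slides.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy H = sum of p_i * log2(1 / p_i), in bits per symbol."""
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)

# 26 letters plus the space, all equally likely (zeroth approximation).
uniform_27 = [1 / 27] * 27
h_uniform = entropy_bits(uniform_27)
print(round(h_uniform, 2))   # 4.75

# Any non-uniform distribution over the same symbols has lower entropy;
# a three-symbol toy case makes the effect easy to see.
h_skewed = entropy_bits([0.5, 0.25, 0.25])      # 1.5 bits
h_flat = entropy_bits([1 / 3, 1 / 3, 1 / 3])    # log2(3) bits
```

The same function applied to measured English letter frequencies gives the lower figure quoted for the first approximation.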


AI_NGAE__ITF__NR_ASAEV_OIE_BAINTHHHYRO

O_POER_SETRYGAIETRWCO__ EHDUARU_

EU_C_FT_NSREM_DIY_EESE_ F_O_SRIS_R

__UNNASHOR_CIE_AT_XEOIT_UTKLOOUL_E

First approximation to English (zero memory)

[Zero order Markov : letter probabilities, 27 numbers ]

Entropy: $H = \sum_{i=1}^{27} p_i \log (1/p_i) \approx 4$ bits / letter


URTESHETHING_AD_E AT_FOULE_

ITHALIORT_WACT_D_STE_MINTSAN_OLI

NS__TWID_OULY_TE_THIGHE_CO_YS_TH

_HR_ UPAVIDE_PAD_CTAVED_QUES_E

Second approximation to English (memory)

[First order Markov : e.g. prob(a|a), prob(b|a), prob(c|a), … , 27 x 27 = 729 numbers, some zero]

Entropy: $H = \sum_{i,k} p_{i,k} \log (1/p_{i|k}) \approx 3.3$ bits / letter, summed over the 27 x 27 = 729 letter pairs
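The conditional entropy of a first order source can be estimated from letter-pair counts. The sketch below is an illustration under stated assumptions: the function name `conditional_entropy_bits` and the tiny test strings are made up, and a realistic estimate for English would need a large text sample.

```python
import math
from collections import Counter

def conditional_entropy_bits(text):
    """Estimate H = sum over pairs of p(prev, nxt) * log2(1 / p(nxt | prev))."""
    pairs = Counter(zip(text, text[1:]))   # counts of adjacent letter pairs
    prev_counts = Counter(text[:-1])       # how often each letter has a successor
    total = sum(pairs.values())
    h = 0.0
    for (prev, nxt), n in pairs.items():
        p_joint = n / total                # p(prev, nxt)
        p_cond = n / prev_counts[prev]     # p(nxt | prev)
        h += p_joint * math.log2(1.0 / p_cond)
    return h

# A perfectly alternating source is fully predictable from the previous letter.
h_alternating = conditional_entropy_bits("ABABABAB")

# After "A" the next letter is A or B with equal frequency; after "B" it is always A.
h_mixed = conditional_entropy_bits("AABAABAAB")
```

Memory lowers the entropy because knowing the previous letter removes part of the uncertainty about the next one.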


IANKS _CAN_OU_ANG_RLER_THATTED

_OF_TO_SHOR_OF_TO_HAVEMEM_A_I_

MAND_AND_BUT_WHISSITABLY_THERV

EREER_EIGHTS_TAKILLIS_TA_KIND_AL

Third approximation to English (memory)

[Second order Markov : e.g. prob(a|aa), prob(a|ab), prob(a|ac), …, prob(z|zy), prob(z|zz) - 27 x 27 x 27 = 19683 numbers, ~ 75% zero]

(Shannon calls these “di-gram probabilities”)

Entropy: ~ 3 bits / letter


JOU_MOUPLAS_DE_MONNERNAISSAIN

S_DEME_US_VREH_BRETU_DE_TOUC

HEUR_DIMMERE_LLES_MAR_ELAME_

RE_A_VER_IL_DOUVENTS_SO_FUITE

Third approximation to French

N. Abramson “Information Theory & Coding”


ET_LIGERCUM_SITECI_LIBEMUS_AC

ERELEN_TE_VICAESCERUM_PE_NON

_SUM_MINUS_UTERNE_UT_IN_ARION

_POPOMIN_SE_INQUENEQUE_IRA

Third approximation to ????

N. Abramson “Information Theory & Coding”


WE COULD CONTINUE THIS WITH CONDITIONAL PROBABILITIES GIVEN TRIPLETS (tri-grams), QUADRUPLETS (tetra-grams), … n-grams, etc. (i.e. m-th ORDER MARKOV SOURCES, m ≥ 3)

HOWEVER, THIS BECOMES IMPRACTICAL AS THE NUMBER OF JOINT PROBABILITIES BECOMES TOO LARGE - SO SHANNON JUMPED TO MARKOV SOURCES WITH WORDS AS SYMBOLS - symbol set no longer 27 characters, but thousands of words. However, an m = 1, 2 Markov model with words as symbols gives much better results than letter n-gram analysis as “n” is increased
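The growth in the number of probabilities can be made concrete with a few lines of arithmetic. The sketch assumes the 27-symbol alphabet used on the earlier slides; the helper name `table_size` is illustrative.

```python
ALPHABET = 27  # 26 letters plus the space, as on the earlier slides

def table_size(order):
    """Number of conditional probabilities for an m-th order letter source."""
    return ALPHABET ** (order + 1)

# Each extra letter of memory multiplies the table size by 27.
for m in range(6):
    print(f"order {m}: {table_size(m):,} probabilities")
```

Orders 1 and 2 reproduce the 729 and 19683 figures quoted earlier; order 5 already needs several hundred million entries.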


REPRESENTING AND SPEEDILY IS AN

GOOD APT OR COME CAN DIFFERENT

NATURAL HERE HE THE A IN CAME THE TO

OF TO EXPERT GRAY COME TO FURNISHES

THE LINE MESSAGE HAD BE THESE …

Fourth approximation to English

[Zero order Markov with words : e.g. Probability of words, zero memory]

(Shannon 1948)

Entropy = ~ 2.2 bits / letter (using Zipf’s Law)


THE HEAD AND IN FRONTAL ATTACK ON AN

ENGLISH WRITER THAT THE CHARACTER OF

THIS POINT IS THEREFORE ANOTHER

METHOD FOR THE LETTERS THAT THE TIME

OF WHO EVER TOLD THE PROBLEM FOR AN…

Fifth approximation to English (memory)

[First order Markov with words : e.g. Probability (word_i | word_j)]

(Shannon 1948)
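Text of this kind can be produced by a first order Markov source with words as symbols. The sketch below is illustrative only: the short corpus, the function names `build_word_chain` and `generate`, and the fixed random seed are assumptions, and a realistic model would be trained on a large body of text.

```python
import random
from collections import defaultdict

def build_word_chain(text):
    """Map each word to the list of words observed directly after it."""
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)   # repeats keep the observed frequencies
    return followers

def generate(followers, start, length, rng):
    """Emit up to `length` words, each drawn given only the previous word."""
    word = start
    out = [word]
    for _ in range(length - 1):
        choices = followers.get(word)
        if not choices:                  # the word never had a successor
            break
        word = rng.choice(choices)
        out.append(word)
    return out

# Made-up training text, far too small for realistic output.
corpus = ("the head of the line and the character of the message "
          "is the problem of the writer and the time of the attack")
chain = build_word_chain(corpus)
sample = generate(chain, "the", 12, random.Random(0))
print(" ".join(sample).upper())
```

Every adjacent word pair in the output occurs somewhere in the training text, which is why the result reads as locally plausible even though it has no overall meaning.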


BİR ANLATTIKLARINA GÜLMECE YAZDI

YAPITLARININ ŞARAP BİÇİMLERİ BELA

GÖRÜNÜMÜ GİBİ AMA BİR ETMEK YOK

TUTULDU GELEN GİDEN YER KALMADI ...

Fifth approximation to Turkish (memory)

[First order Markov with words : e.g. Probability (word_i | word_j)]


A QUICK REFRESHER ON CONVENTIONAL COMPRESSION


Conventional Compression

Lossy Compression

• Not necessarily a copy of the input: most audio, image, video compression algorithms are “Lossy” – our ears and eyes have resolution thresholds

Loss-less Compression

• Data integrity essential in digital data communications – Network compression must be “Loss-less”

• Two basic approaches

  • Statistical compression algorithms

  • Substitutional compression algorithms


Statistical compression : Probabilities of characters in the input data calculated (or given) - frequently occurring characters are encoded into fewer bits [e.g. Huffman code, Morse code]

• Static coding : Once the coding is determined in accordance with the probabilities of occurrence it does not change

• Dynamic coding : Coding changes with “context” - for example, the occurrence of “q” in English increases the probability of occurrence of “u” to nearly 1; similarly the occurrence of “th” significantly increases the probability of occurrence of “e”, etc.

• As the amount of “historical context” information increases, “dynamic coding” techniques can approach the “Shannon limit”; however, computational requirements increase exponentially, making them impractical for real-time/on-line applications
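Static statistical coding can be illustrated with a small Huffman coder. This is a minimal sketch under stated assumptions: the function names, the sample string and the use of the sample's own letter counts as the probabilities are illustrative choices, and no context modelling (dynamic coding) is attempted.

```python
import heapq
import math
from collections import Counter

def huffman_code(text):
    """Return {symbol: bitstring} for a static Huffman code built from text."""
    counts = Counter(text)
    if len(counts) == 1:                       # degenerate one-symbol source
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, low = heapq.heappop(heap)       # the two least frequent subtrees
        n2, _, high = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]

def decode(bits, code):
    """Invert the prefix-free code bit by bit."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)

sample = "THE_HEAD_AND_IN_FRONTAL_ATTACK_ON_AN_ENGLISH_WRITER"
code = huffman_code(sample)
bits = "".join(code[ch] for ch in sample)
counts = Counter(sample)
entropy = sum(n / len(sample) * math.log2(len(sample) / n) for n in counts.values())
average = len(bits) / len(sample)
print(f"entropy {entropy:.2f} bits/letter, Huffman average {average:.2f} bits/letter")
```

The average code length lies between the zero-memory entropy of the sample and that entropy plus one bit, which is the usual bound for a Huffman code; getting below it requires exploiting context.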


Substitutional compression : Identifies repeated strings of characters (longer the better) and replaces them with reference identifiers or tokens (shorter the better) - At the receiver the tokens are de-referenced and the reverse substitution performed

• Essentially a form of “pattern recognition” and classification

• Pattern detection/recognition generally much faster than computations needed for dynamic coding algorithms

• Most network compression techniques in use today use substitutional compression

Compression techniques can also be combined – for example substitution based compression followed by static coding, etc.


• “Substitution” based compression is the basis of almost all network compression implementations

• Principle of all : replace repeated patterns with shorter tokens

• Different techniques for detecting/encoding repeated patterns

Two basic approaches :

• Lempel-Ziv (LZ) “stateless” window compression

• e.g. v.42bis, fax compression, LZS(STAC)

• Predictor compression

• Tries to predict the next input byte : the matching algorithm looks for the most recent match of any pattern rather than the best and longest match - higher speed, but it misses many significant pattern repetitions and therefore gives lower data reduction (not much used)


Lempel-Ziv (LZ) “stateless” window compression

Published in 1977 (hence LZ77)

• Basis of ~all loss-less data compression implementations today

• Repeated “strings” replaced by “pointers” to the previous location where the string had occurred

• Buffer or “window” required for the “historical” information to be available for reference – typically 1000 – 2000 bytes (mostly 2048 bytes)

• All previous data outside the buffer/window is lost or “forgotten” hence the name “stateless” or memory-less

• Can find and compress only patterns that are repeated within the window – repetitions separated by more than window size are ignored

• Poor scalability: For compression efficiency large window size is required but this increases pattern search computation significantly

• Good for “file compression” type applications
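The window limitation can be demonstrated with a deliberately simple LZ77-style coder. This is a sketch for illustration, not any product's implementation: the token format, the brute-force search, the parameter defaults and the test data are all assumptions.

```python
def lz77_compress(data, window=2048, min_match=3, max_match=255):
    """Emit literal byte values or (distance, length) pointers into the window."""
    out = []
    i = 0
    while i < len(data):
        start = max(0, i - window)             # older data is "forgotten"
        best_len, best_dist = 0, 0
        for j in range(start, i):              # brute-force search of the window
            length = 0
            while (length < max_match and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            out.append((best_dist, best_len))
            i += best_len
        else:
            out.append(data[i])
            i += 1
    return out

def lz77_decompress(tokens):
    data = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            for _ in range(length):            # byte by byte, so overlaps work
                data.append(data[-dist])
        else:
            data.append(tok)
    return bytes(data)

pattern = b"SEQUENCE-REDUCTION"
filler = bytes(range(256)) * 2                 # 512 bytes separating the repeats
far = pattern + filler + pattern

narrow = lz77_compress(far, window=64)         # repeat lies outside the window
wide = lz77_compress(far, window=2048)         # repeat lies inside the window
narrow_hits = [t for t in narrow if isinstance(t, tuple) and t[1] == len(pattern)]
wide_hits = [t for t in wide if isinstance(t, tuple) and t[1] == len(pattern)]
```

With the 64-byte window the second copy of the pattern is sent again as literals, while the 2048-byte window replaces it with a single pointer. The nested search loops also show why enlarging the window raises the search cost.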


Nov 1978, University of Pennsylvania, Museum Hall, Banquet in honor of Claude E. Shannon receiving H. Pender award (Prof. F. Haber & DY)