A Text Recognition Procedure for Cryptanalysis - NSA.gov · PDF fileDID: 4106291 UNCLASSIFIED A Text Recognition Procedure for Cryptanalysis BY C. V. KIMBALL Unclassified Presented

DOCID: 4106291 UNCLASSIFIED

A Text Recognition Procedure for Cryptanalysis

BY C. V. KIMBALL

Unclassified

Present:ed before the I ntemationa/, Symposium in Information Theory at UCLA, 31 January~2 February 1966, this paper revea/,s the charac'ter and qua/,ity of work in cryptana/,ysis now in progress in universities.

It uses standard Bayesian methods equivalent to the common log scoring 'techinques with which our ana/,ysts are familiar. The conditions for acceptance or rejection of the null hypothesis are somewhat more explicitl,y stat:ed, perhaps, but no new concepts are involved. The solution given is for the easy part of the problem, with a/,l probabilities assumed known, and even a priori odds. The rea/,ly cha/,lenging task is to deduce useful weights, not from known or assumed probabilities, but from samples of the cipher or code. The problem treat:ed is clearly and carefully sta'ted, and the solution is thorough and compe'tent.

Dr. B. C. Getchell, Pl

INTRODUCTION

This paper is a brief summary of a long-term study reported in detail in [l]. AB a result, much of the underlying material is given only cursory treatment. The first section introduces the decision procedure used in the study and explains the importance of source redundancy in making rapid decisions. In the second section, the decision procedure is applied to the recognition of natural-language texts among random texts. In the third section, the recognition procedure is applied to a general cryptographic problem.

DEFERRED-DECISION PROCEDURE

In the binary detection problem considered here, one of two sources, SN (signal plus noise) or N (noise alone), provides a stream of M symbols to the decision device. 1 The decision device is to respond A if SN is present, or B if N is present. Decisions are made on the basis of sequential observation8 of successive symbols from the source. The two sources generate independent symbols taken from the same finite alphabet,(Z1, Z2, ... , ZK), according to known probability distribu-

1 Much of the current work in deferred-decision theory involves the detection of signals in noise, and the symbols above are common.

103 UNCLASSIFIED

-- -----

A.pproved for Release by \JSA on 03-28-2014, FOIA ~ase # 74462

DOCID: 4106291

UNCLASSIFIED TEXT RECOGNITION

tions which are conditional to the source. We let a; be the probability of Z; when SN is present and let {3; be the probability of Z; when N is present. In addition, we will assume that the a priori probabilities of SN and N are known.

For this problem, the deferred-decision procedure leads to minimwn expected losses according to a predetermined loss rule. This loss rule assigns a cost WM to the response B when the SN source is present, and a cost W F to the response A when N is present. Also, a fixed cost, C, is charged for each observation. When the parameters M, I a;l, I {3 ;I, W F, WM, and Care specified, the decision procedure can be found by using a computer-implemented optimization process.

The log-odds-ratio transformation is helpful in describing the operation of the decision device. Let m designate the nwnber of observations that have been taken; then the log-odds-ratio after m observations is given by

P(S Nim observations) Lm = Zn 1 - P(SN/m observations) (l)

When the state of the decision process is expressed in terms of Lm, Bayes' rule can be written in a form particularly convenient for analysis and implementation:

Lm+1 = Lm +X(Z;). (2)

Here Lm+ 1 is the log odds ratio after the symobl Z; is observed, and X; is the log likelihood ratio of Z;, given by

X (Z;) = lna;/fh (3) The decision process is usually considered in terms of L,... The

decision function is represented by M +1 pairs, (Am, r,,.), of decision points. If Lm 2:: Am, the decision is A; if Lrn < r rn, the decision is B. If Am 2::Lrn > r ,,., another observation is taken and the process continues. Fig. 1 is helpful in visualizing the operation of the device.

As background, let us consider an important theorem for the case in which the symbols from the N source have a uniform distribution -that is, in which {3; = 1/ K for all i. For sequential procedures, as is well known, ([2], [3]), the speed with which decisions are made depends on Aµ, the difference in mean values of the log likelihood ratio in SN and N.

Aµ = E(X ISN) - E(X IN). (4)

The following theorem relates Aµ to the Shannon-redundancy of the SN source. Theorem

Aµ 2:: ln2 [1og2 K + ;~ o:;log2 a] > ln2 R(SN).

UNCLASSIFIED 104

(5)

(6)

DOCID: 4106291 4.

L

2.

"· ,, !!Cl

lG 20

... k ..

-2. [ L. ______

0 () t:I C) :< .. -4 .

~ ...... .. 0 tr ~ °' ..

jl. to > ~ -6. t< ;· t< o·

= "' & -B.

!&. "" s:: c: ~ -10. c: z z

'"' '"' .... ro~ ....

> > (/) (/) (/)

-12. (/)

55 :;;

'" m 0 0

DOCID: 4106291

UNCLASSIFIED TEXT RECOGNITION

Thus one can often compare the detectability of two sources when only their respective redundancies are known.

THE RECOGNITION PROCEDURE

The problem considered here is that of recognizing natural-language among random texts in which symbols are uniformly distributed. Natural-language text is defined as a meaningful sequence of symbols from a written language-that is, letters spelling words which, together, convey a meaning. A deferred-decision procedure is used to recognize the natural language rapidly and achieve optimum loss performance.

The deferred-decision procedure can be used only when distributions of the symbols for random and natural-language texts differ. The theorem in Section 1 suggests that the redundancy of natural-language text is an excellent measure of the differences between the two distributions. Since the single-symbol redundancy of most natural languages is of the order of 15 percent, the deferred-decision procedure provides an effective recognition technique. For purposes of analysis, we have considered the problem of recognition of English text; of course, the procedure can be applied to other written languages.

The deferred-decision procedure requires that the loss parameters WF, Wm, and C, be specified; they, in turn, depend on the problem at hand. Here we will use the loss conditions that arise in the application of the procedure to cryptography, described in the next section. In this application the loss structure has the following characteristics.

(1) The cost of a miss, Wm, is much greater than that of a false alarm, WF.

(2) The cost of observation for a single symbol is much smaller than that of either kind of error loss.

The present analysis is based on the loss ratio W m/W F = 500.0 and W p/C = 10,000, and yields representative results for this loss structure.

The deferred-decision procedure was analyzed for the above loss conditions for the case in which 36 observations were available to the decision device; that is, M = 36. The probabilities a; were taken as the single-letter probabilities for English and are shown with their corresponding )\'s in Table 1. Figure 2 depicts the decision functions obtained and the motion of the log-odds ratio for two texts. The first text is from a well known news magazine, the second has been derived with the aid of a table of random units. Decision procedure uses a priori probability of English of 0.000002. The theoretical per· formance of the procedure under these conditions is given as a function of the probability of SN in Table 2, where "sN and "N designate

UNCLASSIFIED 106

. _J

DOCID: 4106291

C. V. KIMBALL UNCLASSIFIED

the expected number of observations in SN and N, respectively. From these results we conclude that English text can be recognized very rapidly by the deferred-decision procedure.

Symbol, Zi a; {3 i f. (Z ;)

A ' 0.815 0.0385 +0.7510 B 0.0144 0.0385 -0.9817 c 0.0276 0.0385 -0.3322 D 0.0379 0.0385 -0.0149 E 0.1311 0.0385 +I.2258 F 0.0292 0.0385 -0.2737 G 0.0119 0.0385 -0.6564 H 0.0526 0.0385 +0.3131 I 0.0635 0.0385 +0.5008 J 0.0013 0.0385 -3.3720 K 0.0042 0.0385 -2.2145 L 0.0339 0.0385 -0.1262 M 0.0254 0.0385 -0.4161 N 0.0710 0.0385 +0.6129 0 0.0800 0.0385 +0.7316 p 0.0198 0.0385 -0.6624 Q 0.0012 0.0385 -3.4590 R 0.0683 0.0385 +0.5747 s 0.0610 0.0385 +0.4616 T 0.1047 0.0385 +1.0011 u 0.0246 0.0385 -0.4469 v 0.0092 0.0385 -1.4315 w 0.0154 0.0385 -0.9153 x 0.0017 0.0385 -3.1428 y 0.0198 0.0385 -0.6624 z 0.0008 0.0385 -3.9110

Table 1.-Symbol probabilities for English and random texts.

107 UNCLASSIFIED

4106291 ,\{ = 36

W,,iC = 10.ooci

W 11 ,'~V,, ....- 500

10 15

/,..., for Text 1

~

B Decision

Text lo LEGlm CONN!!;CT!ON TO CIVIL RIGHTS X I~ SHORT

Text 2: TRCSG VJMUY GFADF AYHBA NZNIV MIGEX MDHVZ T

L c z I"\ r

+s > ~

- 6 :!! m

-t4 0

+"

A Decision ..., l':I

- 2 x ..., - 4 :tl

t"l C"'l

-6 0 t:'J

- 8 z ~

-10 0 z -12

- 1'

- 16

DOCID: 4106291 C. V. KIMBALL UNCLASSIFIED

A Priori Probability Probability Probability of Correct of "SN "N

Of SN Detection False Alarm

0.000002 0.754 <0.0001 27. 3.3 0.00001 0.873 0.0002 33. 6.1 0.0001 0.983 0.0006 33. 10.0 0.001 0.996 0.0021 31. 13.0 0.01 0.999 0.0065 27. 17.0 0.1 0.999 0.0174 24. 21.0

Table 2.-Theoretical performance of the recognition procedure.

w F/C = 10.000 WM/WF = 500 M = 36

APPLICATION TO CRYPTANALYSIS

The recognition procedure described in the preceding section was developed for use in cryptanalysis, the extraction of meaningful text from an enciphered text without a key. Although a particular cipher is used as an example, the method presented here is applicable to any cipher that preserves the redundancy of the concealed text.

A cipher is a set of invertible transformations {f1} of a natural language text t into an enciphered text x,

f1(t) = x. (7)

The subscript j designates the particular transformation being used and is referred to as the key. The basic problem of cryptanalysis is to determine fi and t from a given x. Shannon has shown [4] that cryptanalysis can be successful only if the concealed text contains redundancy.

A fundamental approach to cryptanalysis is to consider the set of all possible inverses of the message x, {f / 1 (x) l - If x has a large enough number of characters (greater than what Shannon calls the unicity distance), there will be a most probable text t* in (/1- 1 (x) l - This most probable text t* will have the same statistical structure as the natural language. Thus an enciphered text can be analyzed by examining all possible inverses f/ 1(x) for the structure of the naturallanguage text.

Since the effectiveness of the recognition procedure depends on the redundancy of the natural language, the procedure is well suited for the above approach. In addition, the performance of the recognition procedure can be predicted directly from the theory.

Let us now apply the recognition procedure to an example of cryptanalysis. A cipher using 456,976 possible keys was used to en-

109 UNCLASSIFIED

DOCID: 4106291 U~KLASS:rn:.D TEXT RECOGNITION

cipher a typical segment of English text. This cipher, suggested by Shannon ([4], p. 709), produces enciphered texts with nearly uniform single-letter distributions. A computer was programmed to test all possible keys on the enciphered text, using the recognition procedure. The time per trial of a 72-character message was less than 100 seconds.

Table 3 compares the experimental results with those predicted by the theory. Since the maximum number of observations for the experiment, 72, was twice the number used in the theoretical analysis, the results for the experiment should be slightly better than those of the theory; and, indeed, except for the v sN, the experimental results are in excellent agreement with the theory. The difference in v sN is due to the differences in the maximum number of observations.

Theoretical (M = 36) Experimental (M = 72)

Probability of Correction Detection 0.754 0.8

Probability of False Alarm <0.0001 0.00008

'VsN 27. 70. 2

VN 3.3 3.5

Table 3.-Comparison of the theoretical and experimental results.

Wp/C = 100 WM/Wp = 500 A Priori Probability of SN = 0.000002

CONCLUSIONS

The speed of detection for a deferred-decision process has been related to the source redundancy when the N source has uniform distribution. This relation has been used to develop a recognition procedure for natural-language text. The theoretical performance of the recognition procedure has been supported by results obtained in the solution of a general problem of cryptanalysis.

2Discrepancy between theoretical and experimental results is accounted for by the increased M of the experimental work.

UNCLASSIFIED 110

DOCID: 4106291 C. V. KIMBALL UNCLASSIFIED

REFERENCES

Ill C. V. Kimball, A Recognition Procedure for Natural-Language Text with Application to Cryptography, Unpublished thesis, Department of Electrical Engineering, The University of Michigan, Ann Arbor, 1965.

!~] A. Wald, Sequential Analysis, John Wiley and Sons, Inc., New York, 1947. [3] T. G. Birdsall and R. A. Roberts, Theory of Signal Detectability: II: Decision

Processes, Cooley Electronics Laboratory Technical Report No. 136, The University of Michigan, Ann Arbor, 1964.

{4] C. E. Shannon, "Communications Theory of Secrecy Systems," Bell Systems Technical Jourrnal, Vol. 28, No. 4, October 1949, p. 656.

111 UNCLASSIFIED

Documents

A Text Recognition Procedure for Cryptanalysis - NSA.gov · PDF fileDID: 4106291 UNCLASSIFIED A Text Recognition Procedure for Cryptanalysis BY C. V. KIMBALL Unclassified Presented