An Exploration Into Basic Steganography Over TCP/IP

Abstract—Steganography describes any number of methods
for hiding information in such a way that it is available to anyone but only readable by those who are meant to receive it. This paper discusses the basics of steganography and describes my project to implement a steganographic file sharing system in Java.

Index Terms—Networks, Security, Steganography

I. INTRODUCTION

STEGANOGRAPHY is the art and science of concealing communication. The word “steganography” comes from Greek and roughly translates to “covered writing” or “secret writing,” and the practice has been used in various forms for over 2,000 years. Steganography usually involves augmenting an existing method of communication to include a hidden layer of information that is only apparent to those who are meant to know about it.

The ancient Greeks were some of the first people to use steganography. There are records of a technique in which they would melt the wax off of their wax tablets and carve a message in the wood underneath. After that, they would apply new wax to give the appearance of a fresh tablet that could be transported without concern for their privacy. Thus, they could have private conversations using a modified version of an existing communications medium.

Hosts on the Underground Railroad are said to have made quilts with symbols representing simple instructions and guidelines to aid runaway slaves. They would then hang the quilts in their windows to let the slaves know about dangers ahead, safe houses, and hidden paths. This is a good example of steganography applied to artwork.

Former British Prime Minister Margaret Thatcher had the word processors of her cabinet members modified to use a unique word-spacing scheme for each minister. This way, if a document was leaked to the press, the person responsible could be identified by examining the leaked document. This is an example of textual steganography, which encodes information within text. It can be accomplished by creating patterns in the spacing of lines and words, in the syntax, or in the semantics of the text. These techniques are often used to subvert spam detection filters by encapsulating messages in useless, seemingly harmless text.

In today’s world, the Internet is the central communications medium. With the large amounts of data flowing across the Internet using many protocols, applications, and formats, there is clearly an opportunity to hide quite a bit of information. Computers are useful for processing large amounts of data, so it’s not hard to see that the combination of information processing and instant data delivery would be the next logical step for steganography.

C. D. Gore is with the Department of Computer Science, Western Michigan University, Kalamazoo, MI 49008 USA (phone: (734) 272-3099; e-mail: [email protected]).

II. BASIC USES OF STEGANOGRAPHY

There are many uses of steganography on the Internet. The most commonly discussed use is hiding files within image files by scattering the file’s bits around in the color attributes of the image. The resulting changes to the appearance of the image are so small that no outside observer should be able to determine that there is anything hidden in the file.

There are other ways to hide information, of course. One method, for instance, is to hide bits of a file in the optional fields of the TCP header. This is a lower-level approach and removes the importance of the concealing file altogether. Another method, presented at Defcon 15, conceals a voice channel within another voice channel over VoIP by inserting the bits into the channel in real time.

With any steganographic method, the sending agent hides the pertinent information in the communication medium, and because they are in some sort of information sharing relationship, the receiving agent will know how to extract the information. In a steganographic system such as the one written for this project, both the sender and receiver must have the same algorithm for the distribution of the hidden information. There are really only two ways for this to happen. They can either be hard-coded with the algorithm, or one of the agents can generate an algorithm and send the necessary information to the other agent so that it can generate the same algorithm.

There are advantages and disadvantages to both approaches. A static algorithm eliminates the need to send any metadata between the agents that could aid attempts to crack the formula. The downside is that the patterns will remain generally the same and would therefore be worthless if even one of the patterns was found. For the implementation chosen for this project, the client sends the server an integer and a floating-point number (double) that both sides use to generate a formula for distributing the data. This works well here because anyone sniffing the traffic would have to figure out the meaning of the two numbers before the algorithm could be cracked.

Christopher Daniel Gore, Member, IEEE

III. MY IMPLEMENTATION

The implementation for this project was written in Java using the Eclipse IDE, and it follows a simple client-server model. The server is multi-threaded and should be able to handle multiple clients at a time. The objective of the software is to transfer two files from the client to the server, using one as a surrogate. The second file is hidden within the surrogate file and, as such, must be much smaller. To anyone observing the transfer, it appears to include only the larger file.

When the server starts up, it creates a server socket on port 4444; then, it loops indefinitely, accepting new client sockets on the server socket and then creating new threads for each new client socket. This should let it accept multiple clients simultaneously. Each of the server’s threads handles the bulk of the server’s duties, which include accepting the initial parameters from the client (more on that later), creating new files to store the data from the client, and writing the appropriate data from the client to the appropriate files. The threads run until completion and are cleaned up by the server when it exits.
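The accept loop described above might look roughly like the following Java sketch. It is not the project's source: the class name ThreadPerClientServer, the helper readParameters, and the order of the two parameters on the wire (integer first, then double) are assumptions made for illustration, and the file handling is left out.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical sketch of a thread-per-client server; not the original source.
final class ThreadPerClientServer {
    static final int PORT = 4444; // port named in the text

    // Reads the two initial parameters. Assumed wire order: int, then double.
    static String readParameters(InputStream raw) {
        try {
            DataInputStream in = new DataInputStream(raw);
            int multiplier = in.readInt();
            double samplerIncrement = in.readDouble();
            return "multiplier=" + multiplier + " samplerIncrement=" + samplerIncrement;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs on its own thread for each client and closes the socket when done.
    static void handle(Socket client) {
        try (Socket c = client) {
            System.out.println(readParameters(c.getInputStream()));
            // Creating the two output files and separating the data is omitted.
        } catch (IOException | UncheckedIOException e) {
            System.err.println("client failed: " + e.getMessage());
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(PORT)) {
            while (true) {
                Socket client = server.accept();          // blocks until a client connects
                new Thread(() -> handle(client)).start(); // one thread per client
            }
        }
    }
}
```

Starting a new thread per connection keeps the accept loop free to take the next client while earlier transfers are still running, which is the behavior the paragraph above describes.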

The client’s operations are relatively simple as well. An instance of the client connects to the server, sends the initial parameters, then opens the two given files and writes them to a DataOutputStream that sends the data to the server. A large chunk of the larger file is written first to make sure that the header at the beginning of the file is kept intact. A chunk of four bytes (represented as an integer) taken from the smaller file is written only once a given number of integers from the larger file have been written. This number is called offset in the program, and it is regenerated each time a chunk of the smaller file is written.

A function is created at the beginning of the process from one of the initial parameters, called the multiplier. This function is sampled at values determined by a variable that I call counter to give the offset value. The value of counter is incremented by the sampler increment each time a piece of the smaller file is written. The user chooses a value for the sampler increment and passes it to the client on the command line, and it is sent to the server as the second of the two initial parameters.
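As a rough illustration of how the two initial parameters could travel over the connection, the sketch below writes them with a DataOutputStream and reads them back with a DataInputStream. The class InitialParameters, its method names, and the order of the two values are assumptions; an in-memory buffer stands in for the socket so the example runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical holder for the two shared values; names are illustrative only.
final class InitialParameters {
    final int multiplier;
    final double samplerIncrement;

    InitialParameters(int multiplier, double samplerIncrement) {
        this.multiplier = multiplier;
        this.samplerIncrement = samplerIncrement;
    }

    // Client side: send the integer, then the double (order is an assumption).
    void writeTo(DataOutputStream out) throws IOException {
        out.writeInt(multiplier);
        out.writeDouble(samplerIncrement);
        out.flush();
    }

    // Server side: read the values back in the same order.
    static InitialParameters readFrom(DataInputStream in) throws IOException {
        int multiplier = in.readInt();
        double samplerIncrement = in.readDouble();
        return new InitialParameters(multiplier, samplerIncrement);
    }

    // Round trip through an in-memory buffer that stands in for the socket.
    static InitialParameters roundTrip(InitialParameters p) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            p.writeTo(new DataOutputStream(wire));
            byte[] bytes = wire.toByteArray(); // 4 bytes for the int, 8 for the double
            return readFrom(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InitialParameters p = roundTrip(new InitialParameters(475, 0.785));
        System.out.println(p.multiplier + " " + p.samplerIncrement);
    }
}
```

Only twelve bytes of metadata cross the wire in this sketch, and neither value means anything to an observer who does not know the formula it feeds.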

The offset function is given by

$$\text{offset} = \frac{\text{multiplier}}{2} + \text{multiplier} \cdot \lvert \sin(\text{counter}) \rvert \tag{1}$$

which is represented in the graph shown in figure 1. The graph is an example of (1) using a multiplier value of 4. This example demonstrates that a higher multiplier value results in the pieces of the smaller file being spread farther apart. Also, it should be apparent that moving along the x-axis in smaller increments generates offset values that more closely follow the sine wave. Thus, a smaller sampler increment value produces a more sinusoidal distribution, while a larger sampler increment value produces a more irregular, random-looking distribution. The addition of one half of the multiplier ensures that the offset is never zero. It would be interesting to try different magnitudes of multipliers to optimize the offset here.
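The offset function in (1) can be sketched in a few lines of Java. The class and method names are hypothetical, and two details are assumptions rather than facts from the project: the result is truncated to a whole number of integers, and counter starts at 0.

```java
import java.util.Arrays;

// Hypothetical sketch of equation (1); not the original source.
final class OffsetFunction {
    // Equation (1), truncated to a whole number of integers (truncation assumed).
    static int offset(int multiplier, double counter) {
        return (int) (multiplier / 2.0 + multiplier * Math.abs(Math.sin(counter)));
    }

    // First n offsets, with counter assumed to start at 0 and to grow by
    // samplerIncrement after each piece of the smaller file.
    static int[] offsets(int multiplier, double samplerIncrement, int n) {
        int[] result = new int[n];
        double counter = 0.0;
        for (int i = 0; i < n; i++) {
            result[i] = offset(multiplier, counter);
            counter += samplerIncrement;
        }
        return result;
    }

    public static void main(String[] args) {
        // A small increment walks along the sine curve; a large one jumps around it.
        System.out.println(Arrays.toString(offsets(4, 0.1, 8)));
        System.out.println(Arrays.toString(offsets(4, 2.5, 8)));
    }
}
```

Because |sin| lies between 0 and 1, every offset falls between multiplier/2 and 3·multiplier/2, which is 2 to 6 for a multiplier of 4.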

The server uses this same formula to separate the smaller file from the bigger file when reading data from the incoming stream. One obvious problem is that it is possible to set the multiplier too high and cause the bigger file to finish before the smaller file has been completely sent.
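The following sketch shows one way the sender and receiver could agree on where the hidden integers sit, and how the "bigger file finishes first" problem can be detected up front. It rests on several assumptions: the hidden integers overwrite values of the larger file (which would be consistent with the corrupted pixels seen in the demonstration), the receiver already knows how many hidden integers to expect, and all class, method, and parameter names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of embedding and separating; not the original source.
final class SurrogateEmbedding {
    // Equation (1), truncated to a whole number of integers (truncation assumed).
    static int offset(int multiplier, double counter) {
        return (int) (multiplier / 2.0 + multiplier * Math.abs(Math.sin(counter)));
    }

    // Indexes in the stream of the larger file that carry the hidden integers.
    static int[] positions(int count, int headerInts, int multiplier, double samplerIncrement) {
        int[] pos = new int[count];
        double counter = 0.0;
        int next = headerInts; // the header at the start is left untouched
        for (int i = 0; i < count; i++) {
            pos[i] = next + offset(multiplier, counter); // skip `offset` cover integers
            next = pos[i] + 1;
            counter += samplerIncrement;
        }
        return pos;
    }

    // Sender: returns a copy of the cover with the secret integers written into it.
    static int[] embed(int[] cover, int[] secret, int headerInts,
                       int multiplier, double samplerIncrement) {
        int[] pos = positions(secret.length, headerInts, multiplier, samplerIncrement);
        if (pos.length > 0 && pos[pos.length - 1] >= cover.length) {
            throw new IllegalArgumentException("cover ends before the secret is fully sent");
        }
        int[] stream = cover.clone();
        for (int i = 0; i < secret.length; i++) {
            stream[pos[i]] = secret[i];
        }
        return stream;
    }

    // Receiver: recomputes the same positions and reads the secret back out.
    static int[] extract(int[] stream, int secretLength, int headerInts,
                         int multiplier, double samplerIncrement) {
        int[] pos = positions(secretLength, headerInts, multiplier, samplerIncrement);
        int[] secret = new int[secretLength];
        for (int i = 0; i < secretLength; i++) {
            secret[i] = stream[pos[i]];
        }
        return secret;
    }

    public static void main(String[] args) {
        int[] cover = new int[200];
        int[] secret = {11, 22, 33};
        int[] stream = embed(cover, secret, 10, 4, 0.785);
        System.out.println(Arrays.toString(extract(stream, secret.length, 10, 4, 0.785)));
    }
}
```

Checking the last position against the length of the larger file before sending turns a silently truncated transfer into an immediate error.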

Figure 1. Graph of (1) with a multiplier value of 4.

IV. DEMONSTRATION

To demonstrate this software, an instance of the server runs on a Linux virtual machine and the client runs on Mac OS X. Since the project is written in Java, it should work on virtually any platform with a Java runtime. The server is run with the command:

    java SimStegServer /home/cdgore/Desktop/SimStegFiles/

This starts the server and has it save the files in the given directory. The client is run with the command:

    java SimStegClient 172.16.137.128 475 0.785 /Users/cdgore/Documents/School/CS5550/Project/tokyo.bmp /Users/cdgore/Documents/School/CS5550/Project/secret.jpg

This has the client connect to the server, which has the IP address 172.16.137.128 because it is running in VMware, gives it a multiplier value of 475 and a sampler increment value of 0.785 (approximately π/4), and tells it where to find the two files to send. The first file is the larger file, and the second is the smaller file. A bitmap file was chosen for this demonstration because bitmaps store uncompressed red, green, and blue values for each pixel, so it doesn’t really matter if a few of them are corrupted with the data from the second file. The larger file is pictured in figure 2, and the smaller file is pictured in figure 3.

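Figure 2. The larger file (tokyo.bmp).

Figure 3. The smaller file (secret.jpg).

Figure 4. Close-up of the resulting file at 500% zoom.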

When the files are transferred to the server, the result looks like figure 2 except with some pixels corrupted. The smaller file is reassembled from the data stream without any problems and looks exactly as it did on the client side in figure 3. The corruption occurs because the demonstration program breaks the smaller file up into chunks of 4 bytes, represented as integers. If this project were to be continued, it would be much better to break the second file up into individual bits, which could make the differences between the original file and the resulting file unnoticeable. Figure 4 shows a close-up of the resulting file at 500% zoom to show the corruption. Zoomed out, the picture appears to be identical to figure 2.

Another problem with this implementation is that each integer is sent in a separate transmission, which causes a lot of unnecessary overhead and congestion. (See the Wireshark capture in figure 5.) A separate transmission for each piece of the file would also look strange to anyone sniffing the traffic, and the whole idea behind hiding data like this is that the parties involved do not draw suspicion. Buffering all outbound traffic on the client’s side, so that data is sent in larger chunks, could fix this. This is, however, an experimental implementation and is not meant for practical applications.
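The bit-level extension suggested above is commonly done by storing one hidden bit in the least significant bit of each carrier byte. The sketch below illustrates that general technique and is not part of the project. For brevity it uses consecutive carrier bytes after the header rather than the sine-based spacing, and the names LsbSketch, embed, extract, and headerBytes are hypothetical.

```java
import java.util.Arrays;

// Hypothetical least-significant-bit sketch; not part of the original project.
final class LsbSketch {
    // Hides each bit of `secret` in the lowest bit of one cover byte.
    static byte[] embed(byte[] cover, byte[] secret, int headerBytes) {
        int bits = secret.length * 8;
        if (headerBytes + bits > cover.length) {
            throw new IllegalArgumentException("cover too small for the secret");
        }
        byte[] stego = cover.clone();
        for (int i = 0; i < bits; i++) {
            int bit = (secret[i / 8] >> (7 - i % 8)) & 1;  // most significant bit first
            int p = headerBytes + i;                       // header bytes stay untouched
            stego[p] = (byte) ((stego[p] & 0xFE) | bit);   // replace only the lowest bit
        }
        return stego;
    }

    // Rebuilds the secret from the lowest bit of each carrier byte.
    static byte[] extract(byte[] stego, int secretLength, int headerBytes) {
        byte[] secret = new byte[secretLength];
        for (int i = 0; i < secretLength * 8; i++) {
            int bit = stego[headerBytes + i] & 1;
            secret[i / 8] |= (byte) (bit << (7 - i % 8));
        }
        return secret;
    }

    public static void main(String[] args) {
        byte[] cover = new byte[64];
        byte[] secret = {(byte) 0xA5, 0x3C};
        byte[] stego = embed(cover, secret, 8);
        System.out.println(Arrays.equals(extract(stego, secret.length, 8), secret));
    }
}
```

Each color value changes by at most one step out of 256, compared with a whole 4-byte chunk being replaced in the current implementation. The cost is capacity: every hidden byte needs eight carrier bytes.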

Figure 5. Wireshark capture of the transfer.
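The buffering fix can be sketched by wrapping the outgoing stream in a BufferedOutputStream. In the example below, a counting stream stands in for the socket so the effect can be measured without a network. The class names and the 8192-byte buffer size are assumptions, and the number of writes reaching the underlying stream is only a proxy: how writes map onto TCP segments depends on the operating system.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch comparing unbuffered and buffered sends.
final class BufferingSketch {
    // Stand-in for a socket stream that counts how many writes reach it.
    static final class CountingStream extends OutputStream {
        int writes = 0;
        @Override public void write(int b) { writes++; }
        @Override public void write(byte[] b, int off, int len) { writes++; }
    }

    // Writes `ints` integers and reports how many writes hit the underlying stream.
    static int writesFor(int ints, boolean buffered) {
        CountingStream sink = new CountingStream();
        OutputStream target = buffered ? new BufferedOutputStream(sink, 8192) : sink;
        try (DataOutputStream out = new DataOutputStream(target)) {
            for (int i = 0; i < ints; i++) {
                out.writeInt(i);
                if (!buffered) {
                    out.flush(); // mimics pushing every integer out on its own
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.writes; // closing the stream flushes any buffered bytes first
    }

    public static void main(String[] args) {
        System.out.println("unbuffered writes: " + writesFor(1000, false));
        System.out.println("buffered writes:   " + writesFor(1000, true));
    }
}
```

With the buffer in place, the 4000 bytes of this example reach the underlying stream in a single write instead of at least one write per integer.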

V. CONCLUSION

This project successfully explored steganography and implemented a basic steganographic solution. If this project is extended in the ways mentioned, it could potentially be useful to people wanting to communicate discreetly.

REFERENCES

[1] J. C. Judge, Steganography: Past, Present, Future, GSEC Version 1.2f.
[2] N. Provos and P. Honeyman, “Hide and Seek: An Introduction to Steganography,” IEEE Security & Privacy.
[3] D. Artz, “Digital Steganography: Hiding Data within Data,” IEEE Internet Computing, May-June 2001.
[4] D. D. Trammell, “Real-time Steganography with RTP,” Proceedings of Defcon 15.