
    D-DOG: Securing Sensitive Data in Distributed Storage Space by Data Division and Out-of-order keystream Generation

    †Jun Feng, †Yu Chen*, ‡Wei-Shinn Ku, §Zhou Su

    †Dept. of Electrical & Computer Engineering, SUNY - Binghamton, Binghamton, NY 13902
    ‡Dept. of Computer Science & Software Engineering, Auburn University, Auburn, AL 36849
    §Dept. of Computer Science, Waseda University, Ohkubo 3-4-1, Shinjyuku, Tokyo 169-8555, Japan

    {jfeng3, ychen}@binghamton.edu, weishinn@auburn.edu, zhousu@asagi.waseda.jp

    Abstract* - Migrating from server-attached storage to distributed storage brings new vulnerabilities in creating a secure data storage and access facility. This is particularly challenging on top of insecure networks or unreliable storage service providers. For example, in applications such as cloud computing, where data storage is transparent to the owner, it is even harder to protect the data stored in unreliable hosts. A more robust security scheme is desired to prevent adversaries from obtaining sensitive information when the data is in their hands. Meanwhile, the gap between the execution speed of security software and the amount of data to be processed is ever widening. A common solution to close this performance gap is hardware implementation. This paper proposes D-DOG (Data Division and Out-of-order keystream Generation), a novel encryption method to protect data in distributed storage environments. Aside from verifying the correctness and effectiveness of the D-DOG scheme through theoretical analysis and experimental study, we also preliminarily evaluated its hardware implementation.

    Keywords: Data Security, Distributed Storage, Stream Cipher, Encryption/Decryption.

    I. INTRODUCTION

    Data storage has been recognized as one of the main dimensions of information technology. The prosperity of network-based applications has led to a move from server-attached storage to distributed storage. Along with its various advantages, distributed storage also poses new challenges in creating a secure and reliable data storage and access facility over insecure or unreliable service providers. Because data security is the kernel of information security, a plethora of efforts has been made in the area of distributed storage security [7], [15], [19].

    During the past decades, most distributed storage designs took the form of either Storage Area Networks (SANs) or Network-Attached Storage (NAS) at the LAN level, such as the network of an enterprise, a campus, or an organization. In both SANs and NAS, the distributed storage nodes are managed by the same authority. The system administrator has access to and control over each node, so the security level of the data is essentially under control. The reliability of such systems is often achieved by redundancy, and storage security depends heavily on the security of the system against attacks and intrusions from outsiders. The confidentiality and integrity of data are mostly achieved using robust cryptographic schemes.

    * Manuscript submitted on Sept. 10, 2009 to the 2010 IEEE International Conference on Communications (ICC 2010), May 23 – 27, 2010, Cape Town, South Africa. Corresponding author: Yu Chen, Dept. of Electrical & Computer Eng., SUNY–Binghamton, Binghamton, NY 13902. E-mail: ychen@binghamton.edu, Tel.: (607) 777-6133, Fax: (607) 777-4464.

    However, such a security system is not robust enough to protect data in distributed storage applications at the level of wide area networks. Recent progress in network technology enables global-scale collaboration over heterogeneous networks under different authorities. For instance, in peer-to-peer (P2P) file sharing or distributed storage in a cloud computing environment, the concrete data storage can even be transparent to the user [19]. There is no way to guarantee that the data host nodes are under robust security protection. In addition, the data owner cannot control the activity of the storage medium owner. Theoretically speaking, an attacker can do whatever he or she wants to the data stored in a storage node once the node is compromised. Therefore, confidentiality and integrity are violated when an adversary controls a node or the node administrator becomes malicious.

    In recent years, more and more scientific or enterprise applications have been developed on top of distributed data storage or distributed data computing techniques [9], [14], [15], [19], [20], [21]. Availability and performance are two of the most important metrics in these systems [24]. Data can be stored using encoding schemes such as short secret sharing or encryption-with-replication. Whichever scheme is chosen, the cipher algorithm is either block cipher based or stream cipher based [8].

    The general-purpose block cipher AES is designed mainly for software applications and is not well suited to hardware acceleration. Meanwhile, the general stream cipher schemes developed recently in the eSTREAM project [5] follow two different directions. One targets software applications and emphasizes the execution speed of the software implementation. The other is hardware oriented and focuses on implementation on passive RFID tags or low-cost devices. For instance, the security level of the hardware-oriented Profile 2 ciphers was 80 bits [5], [11]. Although this may be adequate for lower-security applications where low-cost devices might be used, it is not enough for distributed storage network security applications.

    In this paper, we propose D-DOG (Data Division and Out-of-order keystream Generation), a high-performance, hardware-implementation-oriented stream cipher for distributed storage networks. D-DOG creates cipher blocks by dividing the plaintext data into multiple blocks and encrypting them, where the keystream is generated by extracting bits from the data blocks in a pseudorandom, out-of-order manner.

    D-DOG avoids one of the weaknesses of modern stream ciphers, which results from the fixed-length initialization vector (IV). Treating the data block as a binary stream, D-DOG generates the keystream by extracting n bits from the plaintext in a pseudorandom manner. The keystream length n is flexible and can be set according to specific security requirements. The variable-length keystream makes brute-force attacks much more difficult. Moreover, the pseudorandom bit extraction leaves the decrypted data stream unrecognizable unless the keystream bits are inserted back at their original positions.
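    The idea of drawing the keystream out of the plaintext itself can be illustrated with a toy sketch. The code below is not the D-DOG algorithm as specified in the later sections; it is a minimal illustration under several assumptions: the function names are hypothetical, Python's non-cryptographic `random.Random` stands in for the pseudorandom position generator, the remaining bits are encrypted by XOR with the keystream repeated cyclically, and the seed and extracted keystream are assumed to stay with the data owner.

```python
import random


def to_bits(data: bytes) -> list[int]:
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def from_bits(bits: list[int]) -> bytes:
    """Pack a bit list (length a multiple of 8) back into bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def pick_positions(seed: int, total_bits: int, n: int) -> list[int]:
    """Choose n distinct bit positions in pseudorandom (out-of-order) order.

    ASSUMPTION: random.Random is a placeholder and is NOT cryptographically
    secure; a real design would use a keyed, secure generator.
    """
    return random.Random(seed).sample(range(total_bits), n)


def toy_encrypt(data: bytes, seed: int, n: int) -> tuple[list[int], list[int]]:
    """Extract n plaintext bits as the keystream, then XOR the remaining bits."""
    bits = to_bits(data)
    positions = pick_positions(seed, len(bits), n)
    keystream = [bits[p] for p in positions]          # extracted out of order
    chosen = set(positions)
    rest = [b for i, b in enumerate(bits) if i not in chosen]
    cipher = [b ^ keystream[i % n] for i, b in enumerate(rest)]
    return cipher, keystream


def toy_decrypt(cipher: list[int], keystream: list[int], seed: int) -> bytes:
    """Undo the XOR, then insert the keystream bits back at their positions."""
    n = len(keystream)
    total = len(cipher) + n
    positions = pick_positions(seed, total, n)
    key_at = dict(zip(positions, keystream))
    rest = iter(b ^ keystream[i % n] for i, b in enumerate(cipher))
    bits = [key_at[i] if i in key_at else next(rest) for i in range(total)]
    return from_bits(bits)
```

    Calling `toy_encrypt(b"sensitive record", seed=2010, n=32)` returns a ciphertext bit list that is 32 bits shorter than the input, together with the 32-bit keystream. Even after the XOR is undone, the recovered bit stream is shifted and incomplete until the keystream bits are reinserted at their original positions, which is the property the paragraph above describes.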

    The rest of the paper is organized as follows. Section 2 gives a brief overview of related work. Section 3 presents the principle of our D-DOG scheme, and Section 4 discusses the detailed design. Section 5 illustrates the robustness of our design against some known attacks. Section 6 shows the simulation and experimental results. Section 7 summarizes this paper.

    II. RELATED WORK

    Securing sensitive and/or private data in distributed storage has been an important topic in the security research community [6], [16], [20]. This section briefly reviews recent work in the area of modern stream cipher design.

    Stream ciphers are widely used to protect sensitive data at high speeds [2], [22]. Although block ciphers have been attracting more and more attention, stream ciphers are still very important, particularly for military applications and to the academic research community. Stream ciphers are more suitable in environments with tight resource constraints, e.g., wireless mobile devices [3], [22] or wireless sensor networks [6]. When there is a need to encrypt large amounts of streaming data, a stream cipher is preferred [2].

    In recent years, a great deal of work has been reported in this area, and many interesting new stream ciphers have been proposed and analyzed. A popular trend in stream cipher design is to turn to block-wise stream ciphers like RC4, SNOW 2.0, and SCREAM [13]. To improve the time-data-memory tradeoff for stream ciphers, Hellman’s time-memory tradeoff concept [3] has been applied, with clear improvements reported [10]. The Goldreich-Levin [9] one-way function hard-core bit construction has been enhanced into a more efficient pseudorandom number generator, BMGL [12], with a proof of security.

    Efficient hardware implementations of stream ciphers are important in both high-performance and low-power applications [13]. This is expected to be the main trend in future stream cipher development. Researchers have pointed out that RFID (Radio Frequency Identification) could be one of the next killer applications for hardware-oriented stream ciphers [22]. The second phase of the eSTREAM project focused in particular on stream ciphers suited to hardware implementation, and there are currently eight families of hardware-oriented stream ciphers [5].

    Normally a stream cipher takes two input parameters: the password and an initialization vector (IV). While the user password is kept secret, the IV is public. As a consequence, attacks against the IV setup of stream ciphers have been very successful [25]. Due to weaknesses in the IV setup, more than 25% of the stream ciphers submitted to the eSTREAM project in May 2005 have been broken [1]. Some robust academic designs were also broken due to problems with the IV setup [25].
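    To make the roles of the two inputs concrete, the following sketch builds a toy keystream generator from a secret key and a public IV. It is an illustration only and does not model any eSTREAM cipher or the specific attacks reported in [25]: the keystream is assumed to come from SHA-256 applied to the key, the IV, and a block counter, and the names `keystream`, `xor_bytes`, and `stream_encrypt` are hypothetical. It shows only the simplest IV-related failure, namely that reusing the same key and IV reproduces the same keystream.

```python
import hashlib


def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Derive `length` keystream bytes from a secret key and a public IV.

    ASSUMPTION: hashing key || iv || counter with SHA-256 is a stand-in
    chosen for illustration, not the setup of any cipher named in the text.
    """
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        out += block
        counter += 1
    return bytes(out[:length])


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings position by position (truncates to the shorter)."""
    return bytes(x ^ y for x, y in zip(a, b))


def stream_encrypt(key: bytes, iv: bytes, message: bytes) -> bytes:
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    return xor_bytes(message, keystream(key, iv, len(message)))


key = b"secret key"              # known only to the owner
iv = b"public-iv-0001"           # sent in the clear with the ciphertext
p1 = b"attack at dawn"
p2 = b"defend at dusk"

c1 = stream_encrypt(key, iv, p1)
c2 = stream_encrypt(key, iv, p2)  # same key AND same IV: keystream repeats

# The keystream cancels out, so an eavesdropper learns p1 XOR p2
# without ever knowing the key.
leaked = xor_bytes(c1, c2)
```

    Because the IV is visible to an attacker, any structure in how the cipher mixes it with the key is open to analysis, which is why the IV setup is a frequent target.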

    In this paper, we will investigate an alternative design approach for the self-encryption stream cipher scheme to avoid the sh