www.altencalsoftlabs.com

Innovate, Integrate, Transform

NETWORKED FILE SYSTEM

SWITCHING/ROUTING & SECURE NET-NFS

ABSTRACT

Network file systems have been around for decades. So far, however, no NFS deployment has been truly scalable, secure, fault tolerant and highly available without relying on costly hardware or purpose-built operating systems. Little real innovation has happened on the networking side of NFS file data packets. The advantage of Net-NFS comes mainly from processing and routing NFS RPCs at the network level rather than at the server level. Our architecture decouples the physical NFS servers from the clients: clients talk to a single virtual NFS server that presents an NFS server interface. We demonstrate that Net-NFS is simple to install, needs no modification to clients or servers, and can provide the basic framework for a FAN infrastructure.

We perform name space aggregation and load balancing, and implement file virtualization. With security concerns and compliance requirements taking center stage, our security framework becomes invaluable. A distinguishing feature of our secure NFS router is that security is part of the design rather than an add-on module. We support security at different layers, granularities and levels, and for different classes of data, driven by business policies that a CIO can control even while away from the data center. This white paper describes the work we have done in NFS switching and examines the areas of NFS that we redesigned.

The key areas we focused on are:

» NFS and scalability

» NFS and virtualization

» NFS intelligence on enterprise data networks

» NFS and storage security issues

INTRODUCTION

NFS is now ubiquitous at SMEs, university campuses and small corporations. It was originally developed by Sun Microsystems, which submitted it to the IETF in 1998. NFS provided many features, such as heterogeneous network file access.

It is built on supporting protocols that include RPC and XDR, and it runs over TCP or UDP. Protocols such as MOUNT, NFS, NLM and the status monitor were added to make network file access possible. It served well in LAN environments confined to small campuses, but it lacked true scalability, security and distributed cache coherence. It has gone through major revisions (V3, V4 and now pNFS) as well as minor versions.

ALTEN Calsoft Labs’ storage research team has taken a revolutionary approach to make NFS a file system that is truly networked, providing network-level file data virtualization, load balancing, and many other features that are critical for agile enterprises that need data protection, security and eDiscovery. With Net-NFS there is no single-server bottleneck and there are no storage silos. All servers are virtualized under a single virtual NFS server, which simplifies storage management and, because data requests are now load balanced, increases data availability, scalability and performance.

Security and NFS are two words that you do not often encounter in one sentence. NFS has always been a stumbling block for those who need fine-grained control over their corporate business data, based on its business value, during its full life cycle. There are many solutions that provide access control, authentication services or network-level security (data in motion). However, if the file system cannot encrypt the data, someone can access it by breaking the authentication. Or one can simply steal the NFS servers, or the underlying data banks, and the data is available to anyone who mounts the volume.

Traditionally, storage has always been within the confines of a data center, be it DAS, NAS or SAN. During the last few years, however, data has come to be stored across the Internet, be it in WAN-scale file systems, distributed data centers, cloud storage, SSPs or remote vaults. Data is no longer confined to a single physical territory. This has further increased the need to secure corporate data, and ever-growing regulatory compliance requirements demand additional information security. Here are some of the global regulations that drive the adoption of storage security.

» California SB 1386, AB 1950

» New York State Information Security Breach and Notification Act

» EU – Data Privacy Directive

» Japan – Personal Information Protection Act (PIPA)

» USA – Sarbanes-Oxley Act

» USA – Gramm-Leach-Bliley Act (GLBA)

» USA – Health Insurance Portability and Accountability Act (HIPAA)

NET-NFS - PRODUCT ARCHITECTURE

The core of the product is the RPC data routing/forwarding intelligence that truly decouples storage from the servers. RPC requests are processed as part of the network intelligence, without any modification to clients or NFS servers. Since core networking technologies are applied to NFS file data traffic across networks, the system has been named Net-NFS (Networked-NFS). Unlike NFS, which is just a protocol that provides remote file access, Net-NFS is a file system switching module that places file system functionality at the edge/network elements. The key design goals of Net-NFS are:

» Seamless adoption in existing, virtual data center infrastructure without needing any modifications to NAS clients or servers

» Easy expandability as more and more requirements are supported, with a modular design

» Data protection, on-line verification, authentication, cryptographic intelligence, and all forms of data security

» Scalability, reliability, high-availability at the expense of some performance

» Using commodity hardware, with a self-healing, self-learning, highly adaptive software centric solution.

» Storage aggregation of multiple, heterogeneous NAS boxes

» Adaptive load-balancing and pre-configured load-sharing based on the file location

» Application aware storage intelligence at the network proxy level

» Application traffic monitoring, analysis and snooping

» Prevention of NFS DoS attacks

» Policy based file serving, backup, replication and redundancy elimination

» Ability to work as network cache of recently accessed files, eliminating needless traffic to real servers

DATA SECURITY AND DATA VERIFICATION

Need for Security

The basic objective of storage security is to protect long-lived data and metadata in persistent storage in multi-level, multi-user environments. The possible threats to storage systems are listed below. They are grouped by a variant of the CIA model (Confidentiality, Integrity and Authentication, with authentication taking the place of the usual availability), followed by physical threats.

Confidentiality

1. Unauthorized access to the data

2. File system profiling

Integrity

1. Storage jamming

2. Modification of meta-data

3. Subversion attacks

Authentication

1. Storage User Masquerading

2. Storage Device Masquerading

Physical threats

1. Theft of storage media.

2. Theft of proprietary information

3. Theft of intellectual property

The following table shows the potential of security attacks on enterprise data.

Source: Survey conducted by CSO magazine with U.S Secret Service, Carnegie Mellon University Software Engineering Institute's

CERT program and Microsoft Corp., in 2007

Attacks from insiders are a crucial consideration for the security of stored data. With Net-NFS, we aim to provide confidentiality of data at rest, authentication of clients to Net-NFS, and access control to the stored data. Confidentiality is provided by encrypting files at the server, and clients authenticate to Net-NFS using DH-CHAP.

NET-NFS SECURITY MODEL

NFS and Storage Security

NFS relies on the underlying security model of RPC for its security services. However, no model other than the weakly authenticated UNIX permission scheme was ever widely adopted, which limits the use of NFS in hostile networks.

NFS currently provides authentication in several flavors, and future work on NFS security (for example, using RPCSEC_GSS) is directed towards providing privacy and integrity of data on the communication link. Our idea of providing storage security with NFS comes from the fact that recent attacks increasingly target stored data (for example, attacks from insiders have increased).

The product has security built in by design, rather than as an afterthought. Storage security, data integrity and on-line data verification methods are a fundamental part of the design. To minimize the performance penalty, adaptive encryption is applied based on data priority and class. We consider various forms of virus corruption, stolen servers and disks, and intrusion, and provide multi-cage-level security. Depending on the configuration, 3DES, AES (adhering to government specifications such as DoD 5220 and FIPS 140), distributed key management and Kerberos APIs are used. The security system is completely transparent. We use extensive journaling, snapshots and auditing, allowing an administrator to play back and examine past activity. Other features include extensive reporting, usage pattern learning intelligence, security based on the order of processing, and a notify-on-access policy.

Net-NFS is in between the user access point and the underlying storage, and can be considered as the boundary of file level and

file system level. This allows the system to apply application specific storage intelligence to the data, at the same time providing

sufficient lower level control.

ENCRYPTION AND DECRYPTION IN THE NET-NFS

Confidentiality of data is provided by file encryption with AES, which supports key sizes of 128, 192 and 256 bits and a data block size of 128 bits. AES was announced by NIST as U.S. FIPS PUB 197 (FIPS 197) after a 5-year standardization process, and was then adopted as an encryption standard by the U.S. Government. AES is a substitution-permutation network.

Besides offering this strongest level of protection, our solution has the flexibility to apply encryption selectively based on different metrics, which differentiates it from hardware-based solutions.

DATA VERIFICATION

Data verification brings many benefits to a CIO. It makes sure that data is correct and intact. There are many forms of secure data verification. We calculate a secure hash (SHA-1) of each block of data that needs verification, and we maintain our own records of these hashes. When a request arrives at our storage switch that calls for data verification, we simply read the file, re-compute the hash, and compare it with the recorded value. This is sometimes done asynchronously, to avoid a performance hit. E-mail alerts to the system administrator are generated automatically. Another unique feature is that, if there is a backup copy, the corrupted file blocks are replaced with the good ones and data integrity is restored, completely transparently to the administrator.
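The verify-and-repair cycle described above can be sketched as follows. This is a minimal illustration, not the product's code: the 4096-byte block size, the function names and the in-memory handling of file and backup contents are all assumptions made for the example.

```python
import hashlib
from typing import List, Optional, Tuple

BLOCK_SIZE = 4096  # assumed block size; the paper does not state one


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Return the SHA-1 hex digest of each fixed-size block of data."""
    return [
        hashlib.sha1(data[off:off + block_size]).hexdigest()
        for off in range(0, len(data), block_size)
    ]


def verify_and_repair(
    data: bytes,
    recorded: List[str],
    backup: Optional[bytes] = None,
    block_size: int = BLOCK_SIZE,
) -> Tuple[bytes, List[int]]:
    """Re-hash data, list the blocks whose hash changed, patch them from backup.

    Returns the (possibly repaired) data and the indices of the bad blocks.
    Assumes the backup copy, if given, has the same length as the data.
    """
    current = block_digests(data, block_size)
    bad = [i for i, (now, then) in enumerate(zip(current, recorded)) if now != then]
    if backup is None or not bad:
        return data, bad
    repaired = bytearray(data)
    for i in bad:
        start = i * block_size
        repaired[start:start + block_size] = backup[start:start + block_size]
    return bytes(repaired), bad
```

In a real deployment the list of bad block indices would also drive the e-mail alert to the administrator.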

POLICY BASED SECURITY

Policy based security allows differential treatment of data based on business needs. Each access, user, file, application or

directory can be treated differently. This is achieved through our policy based, intelligent security architecture.

Name space aggregation and network-level file virtualization

We present a virtual global file system name space to clients, with a pseudo root directory named NFS. Individual servers (organized in replication groups for high availability) own name-space partitions for load-balancing. Inter-partition file metadata operations (such as link and move) are not supported at this point. Under the root directory we have 32 directories (a number that can be changed), named c1, c2, ..., c32.

At present, each file access client is allowed to mount only a single directory partition. Each such directory is served by one replication group. Each replication group is a cluster of multiple servers that act as load-balancing nodes, while different replication groups act as load-sharing entities. We currently do not support any file system operation that spans multiple directory partitions, so Net-NFS is not a true distributed file system by itself.
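As a rough sketch of the partitioning described above, the lookup from a virtual path to its replication group might look like this. The server addresses and the function names `resolve` and `same_partition` are hypothetical; only the root name NFS and the c1 to c32 layout come from the text.

```python
import posixpath

ROOT = "NFS"          # virtual root name, from the text
NUM_PARTITIONS = 32   # default number of partitions, from the text

# Hypothetical table: partition directory -> replication group (server IPs).
PARTITION_TABLE = {
    f"c{i}": [f"10.0.{i}.1", f"10.0.{i}.2"] for i in range(1, NUM_PARTITIONS + 1)
}


def resolve(virtual_path: str):
    """Split a virtual path into (partition, replication group, inner path)."""
    parts = posixpath.normpath(virtual_path).strip("/").split("/")
    if len(parts) < 2 or parts[0] != ROOT or parts[1] not in PARTITION_TABLE:
        raise LookupError(f"{virtual_path!r} is not inside a known partition")
    partition = parts[1]
    return partition, PARTITION_TABLE[partition], "/".join(parts[2:])


def same_partition(src: str, dst: str) -> bool:
    """Link and move are only possible when both paths share one partition."""
    return resolve(src)[0] == resolve(dst)[0]
```

A request such as a rename from `/NFS/c1/a` to `/NFS/c2/a` would be rejected by a check like `same_partition`, which mirrors the stated restriction on inter-partition operations.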

As we have described above, load-balancing and load-sharing are different problems. The first one is adaptive, and the other one

is non-adaptive. We have implemented health-checks, traffic monitoring, and storage capacity monitoring to produce metrics on

which our load-balancing algorithms are based. It is also possible for the administrator to configure policies, and influence the LB

decisions.

We support the concept of hot files. The Application Aware Learner module (work in progress) senses adaptively what traffic

flows to what files, or learns application affinity, and then caches the contents within the load-balancer itself and serves it from

the local cache. At the same time the Meta data on the target NFS server is updated, for data consistency.

As seen in the following picture, Net-NFS intercepts all traffic from NFS clients and decouples them from the physical location and storage of the real servers. It presents the aggregate storage of all servers as one large virtual NAS system, through efficient network-level storage virtualization.

Pluggable application functionality insertion

This is the single most distinguishing feature of Net-NFS. Any application functionality can be added to Net-NFS, as it offers a network-level filter interface. Any file-level feature (such as policy-based backup, replication, migration, de-duplication, compression, encryption or logging) can be added to the NFS file serving environment.

This way, Net-NFS is highly extensible and can absorb changes as user requirements grow.
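One way to picture a network-level filter interface is as a chain of callables applied to each request. The sketch below is purely illustrative: the `FilterChain` class, the dict-based request and the two example filters are assumptions for this example, not the actual Net-NFS interface.

```python
from typing import Callable, Dict, List, Optional

# A filter receives a request (modelled here as a dict) and returns the
# possibly modified request, or None to stop processing and drop it.
Filter = Callable[[Dict], Optional[Dict]]


class FilterChain:
    """Runs registered filters in order over every request."""

    def __init__(self) -> None:
        self._filters: List[Filter] = []

    def register(self, f: Filter) -> None:
        self._filters.append(f)

    def process(self, request: Dict) -> Optional[Dict]:
        current: Optional[Dict] = request
        for f in self._filters:
            current = f(current)
            if current is None:
                return None
        return current


def make_logger(log: List[str]) -> Filter:
    """Example filter: record the procedure and path, pass the request on."""
    def log_filter(request: Dict) -> Optional[Dict]:
        log.append(f"{request['proc']} {request['path']}")
        return request
    return log_filter


def read_only_filter(request: Dict) -> Optional[Dict]:
    """Example policy filter: drop writes under a hypothetical archive path."""
    if request["proc"] == "WRITE" and request["path"].startswith("/NFS/c1/archive/"):
        return None
    return request
```

New features such as compression or de-duplication would, in this picture, be further filters registered on the same chain.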

Replicated, concurrent update and striped read

Each replication group is meant for load balancing and high availability. Read striping across the group also improves performance. Write striping cannot be implemented without losing transparency.

All update RPC requests are multicast to the replication group using IP multicast. We record the replies from all servers in a WAL (write-ahead log) journal, but we reply to the client as soon as we get a response from any one server. Our recovery thread takes care of server decommissioning, data resynchronization and journal update tasks, so the software has enough intelligence to handle replica synchronization and the decommissioning of failed servers. To increase read performance, we divide a read operation across multiple servers and splice the read responses back together before sending the result to the client. Because multiple streams are read in parallel, disk-latency overheads overlap and performance improves. Multicast replication is applied only when we get an update RPC packet (one of the update RPC procedures).
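The read-striping idea can be sketched as a planning step plus a splice step. This is a simplified model under stated assumptions: the 32 KB stripe size, the round-robin assignment and the function names are illustrative choices, and every replica is assumed to hold identical data.

```python
from typing import Dict, List, Sequence, Tuple


def plan_stripes(
    offset: int, count: int, servers: Sequence[str], stripe_size: int = 32768
) -> List[Tuple[str, int, int]]:
    """Split one READ(offset, count) into (server, offset, length) sub-reads.

    Stripes are assigned round-robin over the servers of a replication group.
    """
    plan = []
    pos, end, i = offset, offset + count, 0
    while pos < end:
        length = min(stripe_size, end - pos)
        plan.append((servers[i % len(servers)], pos, length))
        pos += length
        i += 1
    return plan


def splice(replies: Dict[int, bytes]) -> bytes:
    """Concatenate sub-read replies (keyed by offset) in offset order."""
    return b"".join(replies[off] for off in sorted(replies))
```

The sub-reads in the plan can be issued in parallel; `splice` only needs the replies keyed by their starting offset, so they may arrive in any order.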

Some more advanced features are under development. The LB has to handle node failures and mirror resynchronization while keeping all the NFS state information. We do not need to wait for the RPC responses from all the replicas before responding to the client; we can return to the application client as soon as we get the response from the primary, and replica replies can be handled asynchronously. However, all RPC messages have to be journaled and applied in case of a node failure, so we have to maintain this log of RPC messages in the WAL (write-ahead log). This is a complex subsystem, and the feature will be available in a future release.

Server health checks & adaptive load balancing

An adaptive load balancer looks at various metrics to route the RPC requests. The most obvious metric is whether a server is up or down. Other metrics include RPC hit counters, server processing load and machine configuration. Of these, the most dynamic metrics are RPC hits and up/down status. We send regular IP health checks and monitor RPC hits to update the metrics.
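A minimal sketch of metric-driven server selection is shown below. The scoring formula, the `hits_factor` weighting and the field names are assumptions made for illustration; the paper does not specify how the metrics are combined.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServerMetrics:
    name: str
    up: bool = True       # result of the latest health check
    rpc_hits: int = 0     # RPCs routed to this server in the current window
    load: float = 0.0     # reported processing load, 0.0 to 1.0
    weight: float = 1.0   # static capacity derived from machine configuration


def score(m: ServerMetrics, hits_factor: float = 0.01) -> float:
    """Lower is better: recent hits and load, scaled by static capacity."""
    return (m.rpc_hits * hits_factor + m.load) / m.weight


def pick_server(group: List[ServerMetrics]) -> ServerMetrics:
    """Choose the live server with the lowest score and count the hit."""
    candidates = [m for m in group if m.up]
    if not candidates:
        raise RuntimeError("no live server in replication group")
    best = min(candidates, key=score)
    best.rpc_hits += 1
    return best
```

Servers marked down by a health check are simply excluded, so a failed server stops receiving requests as soon as its `up` flag is cleared.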

Un-Modified clients/servers

Net-NFS needs no client or server modification. Easy installation and reduced administration costs were our design goals.

I/O request take over

Net-NFS can take over a failed RPC request from a failed server. When an RPC request fails, Net-NFS detects the failure before the client does, and the same request is redirected to another server with the necessary RPC rewrites. If a server fails between requests, the health-check mechanism notices and marks the server as decommissioned. In a traditional NFS deployment, the NFS client keeps retransmitting the RPC request up to a configurable number of times and then returns an error status to the application. We provide high availability through our IP request failure protection mechanism.
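The take-over behaviour can be sketched as a loop over the replication group. The `send` callable, the `ServerDown` exception and the `decommissioned` set are hypothetical stand-ins for the real transport, failure detection and health-check state; RPC rewriting is omitted.

```python
from typing import Callable, Iterable, Set, Tuple


class ServerDown(Exception):
    """Raised by the (hypothetical) transport when a server does not answer."""


def forward_with_takeover(
    request: str,
    group: Iterable[str],
    send: Callable[[str, str], str],
    decommissioned: Set[str],
) -> Tuple[str, str]:
    """Try each live server in turn; mark failed ones as decommissioned.

    Returns (server, reply) from the first server that answers, so the
    client never sees the failure of an individual server.
    """
    for server in group:
        if server in decommissioned:
            continue
        try:
            return server, send(server, request)
        except ServerDown:
            decommissioned.add(server)
    raise ServerDown("no live server in replication group")
```

Only when every server in the group has failed does the error propagate, which is the case where a client would eventually see an error status.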

Virtualization aware Network management

Managing Net-NFS offers its own challenges. Unless sufficient automation is provided, the utility of the system will go down as

the size and hence complexity increases. Similarly, unless sufficient hooks and interfaces are provided, administrators will not be

able to exploit the full capabilities of the system.

Net-NFS comes with a powerful management layer. DMTF's WBEM architecture is used to provide a standard based interface. The

application comes with a CIM Provider that can work with standard Compliant CIMOMs. The provider implements the SMI-S profiles

defined by SNIA. Specifically, it supports Service Location Protocol (SLPv2) for discovery of services. Selected file-level and block-

level profiles are supported. Policy package is supported to provide differentiated QoS based on organizational policies.

This powerful interface allows the administrator to

» Configure replication groups

» Set policies

» Monitor fault and performance characteristics

The key storage MIBs we want to define are yet to be decided.

NET-NFS DEPLOYMENT SCENARIO

An appliance solution that interposes on the NAS requests along the network path from the NAS clients to the NAS servers:

» It has a purpose-built 2.6 Linux kernel, with the NFS router running

» It is offered in stand-alone and clustered versions (for high-availability customers)

» It remembers the hot files, caches recent NFS transactions, and works like a content proxy that terminates the NFS request

» The underlying target platform is Intel's storage processors (or any server-class machine)

» It can be used by any data center that already has a NAS deployment

An add-on for application switching/server load balancer vendors such as Nortel Networks and F5: our solution is a stand-alone module that an application switching vendor can add with minimal effort to support NFS load balancing at an L7, content-intelligent switching level.

As part of LVS: Linux Virtual Server (LVS) is very successful in many data centers. We are currently experimenting with the performance and applicability of Net-NFS in an LVS environment.

Net-NFS & rServerPool (Reliable Server Pool) architecture

rServerPool is an IETF-approved architecture, specifically targeted at providing high availability, fault tolerance and load balancing for SS7 signaling nodes. The default transport protocol is SCTP, selected over TCP for its message orientation, end-point mobility support, multi-homing, address reconfiguration and added security against flood attacks. When Net-NFS is deployed on such a platform, some of its health checks and server selection algorithms may be redundant; all other features of Net-NFS are still required. rServerPool is not yet deployed in the data center, but is expected soon.

IMPLEMENTATION

Linux User-space code details

The user-space implementation contains 7 POSIX threads:

» Rx thread from clients

» Rx thread from servers

» Tx thread to clients

» Tx thread to servers

» Load-balancer thread

» Mount thread

» LogReader thread (described under Theory of operation)

The Rx-from-client thread front-ends the clients in the data path, while the Rx-from-server thread does the same for the servers. The Tx-to-client and Tx-to-server threads perform the corresponding functions for outgoing traffic. The mount thread listens on the mount port for mount requests from clients. The load-balancer thread, as the name indicates, implements all load balancing algorithms.

Kernel space

We are currently implementing this in kernel space. The kernel version runs on dedicated hardware such as the IXPxxx series, and works like a real IP router that has NFS intelligence at the routing layer.

Theory of operation

The front end of the LB is a thread that listens on a socket (NFS port 2049), acting as a virtual NFS server. It receives each RPC request packet, tags it with an arrival sequence number and enters it in the intent log. (Later, this intent log will reside in 32 MB of NVRAM, holding around 64 k concurrent entries.) Once the packet is logged, the thread returns and processes the next packet in a loop.
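The figures above imply a budget of about 512 bytes per entry, assuming binary units (32 MiB divided by 64 Ki entries). The following sketch models a bounded, sequence-numbered intent log; the `IntentLog` class and its method names are illustrative assumptions, and the log is kept in ordinary memory rather than NVRAM.

```python
import itertools
from collections import OrderedDict
from typing import List

NVRAM_BYTES = 32 * 1024 * 1024            # 32 MB of NVRAM, from the text
MAX_ENTRIES = 64 * 1024                   # around 64 k entries, from the text
ENTRY_BYTES = NVRAM_BYTES // MAX_ENTRIES  # implied per-entry budget: 512 bytes


class IntentLog:
    """Bounded log of RPC packets, keyed by arrival sequence number."""

    def __init__(self, capacity: int = MAX_ENTRIES) -> None:
        self._seq = itertools.count(1)
        self._entries: "OrderedDict[int, bytes]" = OrderedDict()
        self._capacity = capacity

    def append(self, packet: bytes) -> int:
        """Tag the packet with the next sequence number and log it."""
        if len(self._entries) >= self._capacity:
            raise BufferError("intent log full")
        seq = next(self._seq)
        self._entries[seq] = packet
        return seq

    def retire(self, seq: int) -> None:
        """Remove an entry once replica synchronization has completed."""
        self._entries.pop(seq, None)

    def pending(self) -> List[int]:
        """Sequence numbers still awaiting processing, in arrival order."""
        return list(self._entries)
```

A front-end loop would call `append` for every received packet and move on, leaving the later processing stages to consume `pending()` entries.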

The LogReader thread then kicks in, extracts the file handle and XID of the RPC packet, and checks whether any special processing is required. If not, it does a lookup on the list of servers, finds the appropriate target server, rewrites the RPC packet with the physical file handle and destination server ID, and moves it to an R2T (ready-to-transmit) queue. The Tx thread then dispatches it to the real server. If the operation is a large read, read striping is implemented by entering different sets of real-server IP addresses and read offsets in the log entry and marking the packet as RPR (return processing required). Later, when the Rx thread encounters the RPR packet, it reads the entries, collates the returned read data from all servers and sends it to the client (in slow-path mode). Otherwise, the read is sent to any one server and the reply is simply sent back to the original client. A write operation is replicated on all servers in a replication group. (Multiple replication groups together make up the exported virtual name space.) The intent log entry is not removed until the LB has gone through the full replica synchronization phase. There is a comprehensive policy database and configuration table, which a system administrator configures through a CLI.

Whenever the switch itself responds to an RPC request, the device makes sure that it is a non-idempotent operation.

Implementation issues encountered

The biggest problem we faced is that the NFS protocol is not network friendly: every NFS procedure has different RPC contents, formats and data offsets, unlike the fixed-format TCP headers (ignoring options) and IP headers. This made header processing somewhat problematic, as we had to identify the type of operation by inspecting several fields and then process each type differently. Other issues we had to solve were state maintenance in the load balancer, file handle mapping, and the various failure scenarios that arise with NFS asynchronous writes.
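To illustrate why the offsets vary, the sketch below parses an ONC RPC version 2 call header. The header layout and the NFSv3 procedure numbers follow the RPC and NFSv3 specifications; the function names, the returned dict and the choice of which procedures count as updates are assumptions of this example. The buffer is assumed to hold one complete RPC message, with any TCP record marking already removed.

```python
import struct

NFS_PROGRAM = 100003  # ONC RPC program number of NFS

# NFSv3 procedures that this sketch treats as updates (numbers per RFC 1813).
NFS3_UPDATE_PROCS = {
    2: "SETATTR", 7: "WRITE", 8: "CREATE", 9: "MKDIR", 10: "SYMLINK",
    11: "MKNOD", 12: "REMOVE", 13: "RMDIR", 14: "RENAME", 15: "LINK",
    21: "COMMIT",
}


def _skip_opaque_auth(buf: bytes, pos: int) -> int:
    """Skip one opaque_auth (flavor, length, body padded to 4 bytes)."""
    _flavor, length = struct.unpack_from(">II", buf, pos)
    padded = (length + 3) & ~3
    return pos + 8 + padded


def parse_call(buf: bytes) -> dict:
    """Parse the fixed fields of an RPC call and locate the procedure args."""
    xid, mtype, rpcvers, prog, vers, proc = struct.unpack_from(">6I", buf, 0)
    if mtype != 0 or rpcvers != 2:
        raise ValueError("not an ONC RPC v2 CALL message")
    pos = _skip_opaque_auth(buf, 24)   # credentials (variable length)
    pos = _skip_opaque_auth(buf, pos)  # verifier (variable length)
    return {
        "xid": xid,
        "prog": prog,
        "vers": vers,
        "proc": proc,
        "args_offset": pos,  # where the procedure-specific arguments begin
        "is_update": prog == NFS_PROGRAM and vers == 3 and proc in NFS3_UPDATE_PROCS,
    }
```

Because the credential and verifier bodies are variable length, `args_offset` differs from packet to packet, and the argument layout after it differs again for every procedure.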

PERFORMANCE OF NET-NFS AND NFS

Net-NFS is not designed for performance, but for scalability, availability and storage virtualization. Still, the performance numbers are in fact better than those of plain NFS in most situations. Even better performance is expected from the kernel-space version, as it avoids two trips between kernel and user space.

Other innovations around NFS

Peripheral bus enhancements, NIC teaming and multi-path I/O, TOE-capable NICs and NFSoRDMA are all examples of expensive, network-level technology innovations that give a real performance boost to NFS. Clustered NFS requires modifications to NFS servers. Parallel NFS (pNFS) is not yet ready, but is well suited to SANs. NFSv4 still has scalability and security issues. Many NAS vendors provide file servers based on purpose-built operating systems, optimized for file serving (especially for high RAID performance).

CONCLUSION

Net-NFS is a storage-aware, intelligent RPC packet router. It applies core networking operations (packet lookup, classification, transformation, routing, filtering and logging) to the RPC packet flow, transparently to clients and servers. We leverage the power of networking features to bring true virtualization, data protection, NFS acceleration, high availability, load balancing, mirroring, intelligent RPC service, and any user-pluggable, application-specific modules, at the expense of a little processing overhead at our intercepting device.

ABOUT ALTEN CALSOFT LABS

ALTEN Calsoft Labs is a next-gen digital transformation, enterprise IT and product engineering services provider. The company enables clients to innovate, integrate and transform their business by leveraging disruptive technologies like mobility, big data, analytics, cloud, IoT and software-defined networking (SDN/NFV). ALTEN Calsoft Labs provides concept-to-market offerings for industry verticals like education, healthcare, networking & telecom, hi-tech, ISV and retail. Headquartered in Bangalore, India, the company has offices in the US, Europe and Singapore. ALTEN Calsoft Labs is a part of ALTEN group, a leader in technology consulting and engineering services.

[email protected]

www.altencalsoftlabs.com

© ALTEN Calsoft Labs. All rights Reserved.
