Multiplexing in Thrift: Enhancing thrift to meet Enterprise expectations- Impetus White Paper

  • 7/23/2019 Multiplexing in Thrift: Enhancing thrift to meet Enterprise expectations- Impetus White Paper

    Multiplexing in Thrift: Enhancing Thrift to meet enterprise expectations

    WHITE PAPER

    Abstract

    Thrift [1] is an open source library that expedites the development and implementation of efficient and scalable back-end services. Its lightweight framework and support for cross-language communication make it more robust and efficient for many operations than other remote-call approaches such as SOA-style REST or SOAP services. However, Thrift's capabilities are challenged by emerging enterprise solutions such as Big Data: because a Thrift server can host only one service per port, an enterprise hosting multiple services over the network incurs high maintenance and administrative overheads.

    This paper addresses that challenge and details the approach Impetus has devised to extend Thrift so that it can meet enterprise expectations.

    Impetus Technologies, Inc.

    www.impetus.com

    March - 2012

    Table of Contents

    Introduction
    What's so special about Thrift?
    Thrift is powerful, yet lacks the prowess
    Adding charm to the glorious API through multiplexing
        The approach
        Components
    How to use Thrift multiplexing
        Creating a multiplexing server with a lookup registry
    Making a wise investment lucrative
    Summary

    Introduction

    Thrift is a lightweight framework for developing and accessing remote services that are reliable, scalable, and efficient at communicating across languages.

    The Thrift API is used extensively across enterprises to build services such as search, logging, mobile, ads, and developer platforms. The services of several open source Big Data projects, including HBase [6], Hive [7], and Cassandra [8], are hosted on Thrift. Its simplicity, versioning support, development efficiency, and scalability make it a strong contender in the SOA market, helping it compete successfully against more established integration approaches and products.

    Each Thrift service can expose a large number of functions that are callable across languages. This capability can be further enhanced by extending Thrift to host multiple services on each server.

    In this white paper, we look at how the capabilities of Thrift can be enhanced to make optimum use of enterprise resources. We also present a framework that enables the creation of a server hosting multiple services, the registration of services, and the lookup of services based on a standard context.

    What's so special about Thrift?

    Various RPC and serialization libraries are available in the open source arena, including Thrift, Avro [2], MessagePack [3], Protocol Buffers [4], and BSON [5]. Each of these libraries has its own pros and cons. Ideally, the library should be selected according to the specific requirements of the enterprise solution.

    Some of the features that an RPC implementation aspires to provide are:

    1. Cross-platform communication
    2. Multiple programming languages
    3. Support for fast protocols (local, binary, zipped, etc.)
    4. Support for multiple transports
    5. Flexible server (configuration for non-blocking, multithreading, etc.)
    6. Standard server and client implementations
    7. Compatibility with other RPC libraries
    8. Support for different data types and containers
    9. Support for asynchronous communication
    10. Support for dynamic typing (no schema compilation)
    11. Fast serialization

    Of the available RPC implementations, Thrift, Avro, and MessagePack are the top contenders, meeting most of the requirements listed above.

    In an Avro implementation, the out-of-band schema can become overkill for infrequent conversations between a server and a client. MessagePack, meanwhile, is weaker than Thrift because it offers fewer data type containers, is inherently JSON-based, and performs no schema-based type checking.

    Thrift, on the other hand, supports various protocols and transports, configurable servers, and a simple standardized IDL, and it has battle-tested integration with Big Data NoSQL data stores such as Cassandra. These qualities make it a powerful contender and a preferred RPC implementation in enterprise solutions.

    Thrift is powerful, yet lacks the prowess

    Despite being a powerful and efficient cross-language communication tool, Thrift's services are burdened by high administrative and maintenance overheads. Every Thrift server can expose only a single service at a time. To host multiple functions, Thrift leaves organizations with two options:

    1) Write a monolithic, unwieldy implementation and host it as a single service
    2) Host multiple small services across a series of ports

    Fig. 1.1: Option 1 - Write a monolithic, unwieldy implementation and host it as a single service

    If an enterprise follows the first option (see Fig. 1.1), the monolithic and unwieldy implementation raises the development cost of the solution. Because the complexity of the solution keeps growing with every new service, return on investment (ROI) is adversely affected by high maintenance overheads.

    Fig. 1.2: Option 2 - Host multiple small services across a series of ports

    If an enterprise opts for the second option, hosting multiple services consumes a large number of ports. Since ports are a limited enterprise resource that needs to be used judiciously, this poses a serious concern. This option therefore brings high administrative and maintenance overheads. To avoid the cost of setting up a connection on every call, clients also have to maintain many connections (at least one to each port). With the addition of every new service, a new port has to be opened on the firewall. The advantage of Thrift's flexible design is thus offset by high administrative overheads.

    Adding charm to the glorious API through multiplexing

    The need of the hour is to harness the potential of the Thrift API by overcoming its limitation of hosting a single service on each server. The solution presented in this white paper is a framework that enables Java developers to create and host multiple services on each server. It also includes a lookup framework that any Java client or server can use to quickly look up the services hosted on each server and find out how to access them.

    The approach

    The baseline approach is to assign each service a symbolic name, referred to in this paper as its 'service context'. This lets us host multiple services on each server, with each service identified by its service context. A client using the lookup service should be able to fetch the appropriate service context and use it to direct the service call to the corresponding servant.

    Components

    The solution extends the Thrift API [version 0.9.0] with the new components described below (highlighted with red boundaries in Fig. 1.3):

    Multiplexer
    The multiplexer is the processor at the heart of this solution. It acts as a server-side request broker and is responsible for identifying the service the client has requested, based on the service context propagated by the client. It maintains a mapping between service contexts and services. While processing a request, it reads the service context from the underlying protocol and, based on the mapping, directs the request to the appropriate service.
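    The routing idea described above can be sketched in plain Java, independent of the Thrift library. This is a minimal illustration only: ServiceHandler and ContextDispatcher are hypothetical names introduced here, and a handler is reduced to a function from a request string to a reply string.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for a service processor: request payload in, reply out. */
interface ServiceHandler extends Function<String, String> {}

/** Minimal request broker: keeps a service-context -> handler map and routes by context. */
class ContextDispatcher {
    private final Map<String, ServiceHandler> handlers = new HashMap<>();

    /** Registers a handler under a symbolic service context; duplicates are rejected. */
    void register(String context, ServiceHandler handler) {
        if (handlers.putIfAbsent(context, handler) != null) {
            throw new IllegalArgumentException("context already registered: " + context);
        }
    }

    /** Finds the handler for the context carried by the request and delegates to it. */
    String dispatch(String context, String payload) {
        ServiceHandler handler = handlers.get(context);
        if (handler == null) {
            throw new IllegalArgumentException("unknown service context: " + context);
        }
        return handler.apply(payload);
    }
}
```

    With two handlers registered under the contexts "hr" and "fin", a call such as dispatch("fin", request) reaches only the finance handler, and an unregistered context is rejected instead of being silently dropped.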

    Fig. 1.3: Thrift Multiplexing

    Protocol
    Our solution is transport and protocol agnostic. We have created a wrapper around the underlying protocol (any protocol instance) that embeds the service context in the message on the client side and extracts it on the server side. To do this, we have added a new class, TMultiplexProtocol, which wraps an existing TProtocol and overrides the behavior of the writeMessageBegin(TMessage) and readMessageBegin() methods. Any client that has to communicate with TMultiplexer needs to wrap its underlying protocol in a TMultiplexProtocol instance.
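    How the service context travels with a message is not spelled out here. One simple possibility, shown purely as an assumption, is to prefix the message name with the context on the client and split it off again on the server. ContextCodec and its ":" separator are hypothetical and are not taken from the extended library.

```java
/** Illustrative codec: carries the service context inside the message name. */
final class ContextCodec {
    // Assumed separator; the actual wire format of the extension may differ.
    static final String SEPARATOR = ":";

    private ContextCodec() {}

    /** Client side: prefix the method name with the service context. */
    static String encode(String context, String method) {
        if (context.contains(SEPARATOR)) {
            throw new IllegalArgumentException("context must not contain '" + SEPARATOR + "'");
        }
        return context + SEPARATOR + method;
    }

    /** Server side: split the wire name back into {context, method}. */
    static String[] decode(String wireName) {
        int idx = wireName.indexOf(SEPARATOR);
        if (idx < 0) {
            throw new IllegalArgumentException("no service context in message name: " + wireName);
        }
        return new String[] { wireName.substring(0, idx), wireName.substring(idx + 1) };
    }
}
```

    Because only the message name changes, a wrapper of this kind can sit on top of any underlying protocol, which is what keeps the approach protocol agnostic.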

    Registry and Lookup
    To reduce the overhead of managing service contexts manually, the solution includes a registry component that manages information about all the services hosted on a particular server. This component is hosted as one of the services on the underlying multiplexer, and clients can query it on TMultiplexerConstants.LOOKUP_CONTEXT to obtain information about the hosted services.

    The TRegistry interface is the basic client API for querying the lookup registry. It provides several lookup methods for querying the registry by service context, service name, and regular expression. It also lets users check whether a service context exists and list all service contexts available in the registry.

    TRegistryHelper is the interface for the server API, which the server uses to bind, rebind, and unbind service contexts with the lookup registry. We have provided one basic implementation of the registry API, TRegistryBase, which manages service contexts in memory. This component can be extended to override the default behavior, based on specific needs, and can be used along with the factory class. TRegistryClientFactory is the factory class for creating the registry client that facilitates remote lookup of the registry.
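    The following sketch shows what an in-memory registry with bind, rebind, unbind, and lookup operations might look like. It is not the TRegistryBase implementation: the class name InMemoryRegistry, the method signatures, and the choice to store only a service name per context are assumptions made for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

/** Hypothetical in-memory registry: service context -> service name. */
class InMemoryRegistry {
    private final Map<String, String> nameByContext = new ConcurrentHashMap<>();

    /** Binds a new context; fails if the context is already bound. */
    void bind(String context, String serviceName) {
        if (nameByContext.putIfAbsent(context, serviceName) != null) {
            throw new IllegalStateException("already bound: " + context);
        }
    }

    /** Binds the context, replacing any existing entry. */
    void rebind(String context, String serviceName) {
        nameByContext.put(context, serviceName);
    }

    /** Removes the context; returns true if it was bound. */
    boolean unbind(String context) {
        return nameByContext.remove(context) != null;
    }

    /** Reports whether the context is currently bound. */
    boolean isRegistered(String context) {
        return nameByContext.containsKey(context);
    }

    /** Returns the contexts of all services registered under exactly this name. */
    Set<String> lookupByName(String serviceName) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> e : nameByContext.entrySet()) {
            if (e.getValue().equals(serviceName)) result.add(e.getKey());
        }
        return result;
    }

    /** Returns the contexts of all services whose name fully matches the regular expression. */
    Set<String> lookupByPattern(String regex) {
        Pattern p = Pattern.compile(regex);
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> e : nameByContext.entrySet()) {
            if (p.matcher(e.getValue()).matches()) result.add(e.getKey());
        }
        return result;
    }

    /** Lists every bound context in sorted order. */
    Set<String> listContexts() {
        return new TreeSet<>(nameByContext.keySet());
    }
}
```

    Separating bind (which refuses to overwrite) from rebind (which always overwrites) lets a server catch accidental context clashes at startup while still allowing deliberate replacement of a service.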

    Service Information
    The solution uses the URIContext class to represent information about the services hosted on a particular server. Objects of this class can be transmitted across the network and can therefore be accessed remotely by the client. In the present solution, this object captures the service context, service name, and description.

    Multiplexer-extension for lookup

    Fig. 1.3: Thrift Multiplexing with Lookup Registry

    On its own, the multiplexer is capable of hosting multiple services. However, managing service information is an overhead for the client as well as for the server administrator. To reduce this overhead, we have introduced a registry component that manages service information. To combine the capabilities of the multiplexer and the registry in a single processor, we have introduced a new processor, TLookupMultiplexer, which hosts multiple services along with an additional lookup service based on the registry. The processor creates an instance of the registry with all service information and exposes it as an additional service to clients. Clients can then query the registry using the registry API and access the underlying service using the service context obtained from the query.

    Server
    We have introduced a new abstract server, TMultiplexingServer, which can host any server implementation on any transport and any protocol using TLookupMultiplexer. This class hides the underlying complexity of object creation and exposes two abstract methods, viz. getServer and configureMultiplexer, to be implemented by any class that extends it. It lets the user choose the server transport and protocol when the server object is created, which makes it possible to host the same server, with multiple services, on different transports and protocols with no additional coding effort. TMultiplexingServer internally wraps the TServer instance, allowing server startup and shutdown to be managed as required.
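    The design described above is essentially a template method: the base class fixes the startup sequence and leaves two hooks to subclasses. The sketch below illustrates that shape without using Thrift. All names in it (SimpleServer, AbstractMultiServer, DemoServer, configureServices, createServer) are hypothetical, and transports and protocols are reduced to plain string labels.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a running server. */
interface SimpleServer {
    void serve();
}

/** Template-method base: subclasses decide which services to host and which server to build. */
abstract class AbstractMultiServer {
    private final String transport; // a plain label in this sketch, e.g. "socket:9090"
    private final String protocol;  // a plain label in this sketch, e.g. "binary"
    private SimpleServer server;

    protected AbstractMultiServer(String transport, String protocol) {
        this.transport = transport;
        this.protocol = protocol;
    }

    /** Hook 1: the service contexts this server should host. */
    protected abstract List<String> configureServices();

    /** Hook 2: build the concrete server for the given transport, protocol, and services. */
    protected abstract SimpleServer createServer(String transport, String protocol,
                                                 List<String> services);

    /** Fixed startup sequence: collect services, build the server once, then serve. */
    public final void start() {
        if (server == null) {
            server = createServer(transport, protocol, configureServices());
        }
        server.serve();
    }
}

/** Example subclass hosting two services; it records what it served for inspection. */
class DemoServer extends AbstractMultiServer {
    final List<String> log = new ArrayList<>();

    DemoServer(String transport, String protocol) {
        super(transport, protocol);
    }

    @Override
    protected List<String> configureServices() {
        List<String> services = new ArrayList<>();
        services.add("hr");
        services.add("fin");
        return services;
    }

    @Override
    protected SimpleServer createServer(String transport, String protocol,
                                        List<String> services) {
        return () -> log.add(protocol + "@" + transport + " serving " + services);
    }
}
```

    Because the transport and protocol are constructor arguments, the same subclass can be instantiated twice with different values to host the same set of services on two endpoints without any change to the subclass.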

    Source Code
    We have extended the Thrift Java library [version 0.9.0] and added a new source folder named ext that contains the implementation of the multiplexing components. build.xml has also been amended to compile both the existing and the extended source code. The solution has additionally been tested for compatibility with the current stable version of Thrift, 0.8.0, for seamless integration. To use the multiplexing capability, download or pull the source code of the extended Thrift library [9] from GitHub and run the ant command on the downloaded Thrift Java library. This generates libthrift-xxx.jar in the build folder, which developers can then use to create their enterprise solutions.

    How to use Thrift multiplexing

    Creating a multiplexing server with a lookup registry

    The multiplexing server can be created by extending the org.apache.thrift.server.TMultiplexingServer class and implementing the abstract methods configureMultiplexer() and getServer(TServerTransport serverTransport, TProtocolFactory protFactory, TProcessor processor). Sample code with explanations is provided below:

    Step 1: Create the server class by extending the TMultiplexingServer class.

        public class Server1
                extends TMultiplexingServer

    Step 2: Optionally, override the default constructor to accept the server transport and protocol.

        public Server1(T serverTransport, F protFactory) {
            super(serverTransport, protFactory);
        }

    Step 3: Implement the configureMultiplexer() method to configure the lookup multiplexer. As part of this configuration, one has to create a list of MultiplexerArgs that captures the services to be hosted on the server and their respective service information. In the example below, the HR and Finance services are hosted on Server1.

        @Override
        protected List<MultiplexerArgs> configureMultiplexer() {
            // list of multiplexer arguments, one entry per hosted service
            List<MultiplexerArgs> args = new ArrayList<MultiplexerArgs>();

            // configuring HR service context
            TProcessor processor = new HRService.Processor(new HRServiceImpl());
            URIContext context = new URIContext(Constants.HR_CONTEXT, "HumanResource_Service");
            MultiplexerArgs arg = new MultiplexerArgs(processor, context);
            args.add(arg);

            // configuring FIN service context
            processor = new FinanceService.Processor(new FinanceServiceImpl());
            context = new URIContext(Constants.FIN_CONTEXT, "Finance_Service");
            arg = new MultiplexerArgs(processor, context);
            args.add(arg);

            return args;
        }

    Step 4: Implement the getServer() method to create an instance of the desired server. In the example below, we create an instance of TThreadPoolServer using the supplied arguments.

        @Override
        protected TServer getServer(TServerTransport serverTransport,
                TProtocolFactory protFactory, TProcessor processor) {
            // creating server args (Args here is TThreadPoolServer.Args)
            Args serverArgs = new Args(serverTransport);
            serverArgs.protocolFactory(protFactory);
            serverArgs.transportFactory(new TTransportFactory());
            serverArgs.processor(processor);
            serverArgs.minWorkerThreads = 1;
            serverArgs.maxWorkerThreads = 5;

            // creating server instance
            return new TThreadPoolServer(serverArgs);
        }

    Step 5: Create an instance of the server class, using the appropriate transport and protocol, and start the server.

        public static void main(String[] args) throws Exception {
            // identifying server transport (the TServerSocket constructor can throw TTransportException)
            TServerSocket SERVER1_TRANSPORT = new TServerSocket(Constants.SERVICE1_PORT);

            // identifying server protocol
            Factory SERVER1_FACTORY = new TBinaryProtocol.Factory();

            // creating server instance for the specific transport and protocol
            Server1 server1 = new Server1(SERVER1_TRANSPORT, SERVER1_FACTORY);

            // starting server
            server1.start();
        }

    Creating a client for querying the registry and using the service context

    A client for querying the multiplexing server's registry can be obtained from the org.apache.thrift.registry.TRegistryClientFactory class. TRegistryClientFactory is a convenience class that provides multiplexing client instances. On the client side, one can use the static method getClient(..) of this factory to obtain the registry client. This client can then be used to query the registry and identify the appropriate service for processing the request. The example code below shows a client that retrieves the tax details of an employee using the finance service:

        public double getTaxDetails(int empId) throws TException {
            TTransport transport = null;
            TProtocol protocol = null;
            try {
                // transport
                transport = new TSocket(Constants.SERVICE_IP, Constants.SERVICE1_PORT, 60);

                // multiplexing protocol bound to the lookup context
                protocol = Factory.getProtocol(new TBinaryProtocol(transport),
                        TConstants.LOOKUP_CONTEXT);

                // procuring registry client
                TRegistry client = TRegistryFactory.getClient(protocol);

                // opening transport
                transport.open();

                // querying registry to get context
                Set<URIContext> contexts = client.lookupByName("Finance_Service");

                // executing the request on the appropriate service using the context
                if (contexts.size() == 1) {
                    URIContext uricontext = contexts.iterator().next();
                    protocol = new TMultiplexProtocol(new TBinaryProtocol(transport),
                            uricontext.getContext());
                    com.service.FinanceService.Client finService =
                            new com.service.FinanceService.Client(protocol);
                    return finService.getTaxDeductedTillDate(empId);
                }
                // no unique match in the registry, so the call cannot be routed
                throw new IllegalStateException("Finance_Service is not uniquely registered");
            } finally {
                // closing transport
                if (transport != null) {
                    transport.close();
                }
            }
        }

    Making a wise investment lucrative

    Thrift is a big plus in today's enterprise environment: it effectively addresses the challenges imposed by Big Data solutions and lets a solution be exposed as a service across the network. Most enterprises have a limited number of ports available, especially in production, and opening new ports carries a cost. Using Thrift as the RPC mechanism for a solution is therefore restrictive. In addition, Big Data solutions such as Hadoop, Hive, HBase, Cassandra, and other NoSQL data stores, along with enterprise software such as web servers, application servers, and ESBs, already use up a number of ports. If an enterprise has to expose its solutions (built on the underlying Big Data) as services on the network, opening an extra port for each service would be inefficient in terms of cost and resources. This problem can be addressed by hosting all the services through Thrift multiplexing, which can reduce the number of ports to one with minimal development and administrative overhead.

    An organization investing in this technology stands to benefit from quick turnaround times and low development costs. Furthermore, the multiplexing extensions make these investments lucrative by reducing maintenance and administrative overheads. With multiplexing, multiple services can be hosted on a single Thrift server, cutting maintenance costs over the long run. Multiplexing also supports a modular design of services, which can reduce the future development cost of introducing new services or functions or amending existing ones. Hence multiplexing, through its simple approach, not only makes the investment worthwhile but also brings added value to the business.

    Summary

    In recent times, Thrift has emerged as a powerful technology for communicating across programming languages in a reliable and efficient manner. Enterprises dealing with Big Data and other advanced technologies can use Thrift with the multiplexing extension to host multiple services on the network, utilizing enterprise resources efficiently and at low maintenance cost.

    References

    [1] http://thrift.apache.org/
    [2] http://avro.apache.org/
    [3] http://msgpack.org/
    [4] http://code.google.com/p/protobuf/
    [5] http://bsonspec.org/
    [6] http://hbase.apache.org/
    [7] http://hive.apache.org/
    [8] http://cassandra.apache.org/
    [9] git://github.com/impetus-opensource/thrift.git

    About Impetus

    Impetus Technologies offers Product Engineering and Technology R&D services for software product development. With ongoing investments in research and application of emerging technology areas, innovative business models, and an agile approach, we partner with our client base comprising large scale ISVs and technology innovators to deliver cutting-edge software products. Our expertise spans the domains of Big Data, SaaS, Cloud Computing, Mobility Solutions, Test Engineering, Performance Engineering, and Social Media among others.

    Impetus Technologies, Inc.
    5300 Stevens Creek Boulevard, Suite 450, San Jose, CA 95129, USA
    Tel: 408.213.3310 | Email: [email protected]
    Regional Development Centers - INDIA: New Delhi, Bangalore, Indore, Hyderabad
    Visit: www.impetus.com

    Disclaimers
    The information contained in this document is the proprietary and exclusive property of Impetus Technologies Inc. except as otherwise indicated. No part of this document, in whole or in part, may be reproduced, stored, transmitted, or used for design purposes without the prior written permission of Impetus Technologies Inc.