
Apache Avro

Zafar Gilani

Muhammad Adnan Khan

Hui Shang

Outline

• Overview

• Comparison

• Specification

• SASL profile and usage

• References

Overview

• A data serialization system.

• An RPC framework.

• Used for data storage and communication.

• Purpose:

– Provide rich data structures.

– A compact and fast binary data format.

– Simple integration with dynamic languages.

Overview

• Avro uses JSON as its Interface Description Language (IDL):

– To specify data types.

– To specify protocols.

• Review: JSON (JavaScript Object Notation) is a lightweight, text-based standard for data interchange.
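Because an Avro schema is plain JSON, any JSON parser can read it. The following is a minimal sketch using Python's standard `json` module; the `User` record, its fields and the `field_summary` helper are hypothetical, chosen only for illustration.

```python
import json

# Hypothetical record schema written in Avro's JSON notation.
USER_SCHEMA = """
{
  "type": "record",
  "name": "User",
  "namespace": "com.example",
  "fields": [
    {"name": "name",  "type": "string"},
    {"name": "age",   "type": "int"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
"""

def field_summary(schema_text):
    """Parse a record schema and return its (field name, field type) pairs."""
    schema = json.loads(schema_text)
    if schema["type"] != "record":
        raise ValueError("expected a record schema")
    return [(f["name"], f["type"]) for f in schema["fields"]]

for name, ftype in field_summary(USER_SCHEMA):
    print(name, ftype)
```

The union type of `email` comes back as an ordinary JSON array (a Python list), which is exactly how Avro writes unions.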

Why is Avro needed?

• Primarily used in Hadoop, where it provides a standard:

1. Serialization format for persistent data.

2. Wire format for communication:

– among Hadoop nodes;

– from client programs to Hadoop services.
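For persistent data, Avro's standard layout is the object container file: a header with the magic bytes `Obj\x01`, a metadata map that stores the JSON schema under `avro.schema`, and a 16-byte sync marker, followed by blocks of serialized records (each block holds an object count, a byte size, the records, and the sync marker again). The sketch below builds such a file in memory for a hypothetical `Point` record with two `int` fields and no compression (`avro.codec` set to `null`). It is written from our reading of the Avro specification, not taken from an official library, and all function names are hypothetical.

```python
import io
import json
import os

MAGIC = b"Obj\x01"

def encode_long(n):
    """Avro int/long: zigzag-map the value, then write a base-128 varint (low bits first)."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z >= 0x80:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)

def encode_bytes(b):
    """Avro bytes: a long length followed by the raw bytes."""
    return encode_long(len(b)) + b

def encode_string(s):
    """Avro string: UTF-8 data, length-prefixed like bytes."""
    return encode_bytes(s.encode("utf-8"))

def encode_meta(meta):
    """Map<string, bytes>: one block of entries, then a zero-count block ending the map."""
    out = b""
    if meta:
        out += encode_long(len(meta))
        for key, value in meta.items():
            out += encode_string(key) + encode_bytes(value)
    return out + encode_long(0)

# Hypothetical schema for the records stored in the file.
POINT_SCHEMA = {"type": "record", "name": "Point",
                "fields": [{"name": "x", "type": "int"},
                           {"name": "y", "type": "int"}]}

def write_container(records, sync=None):
    """Write Point records as a single uncompressed block of a container file."""
    sync = sync if sync is not None else os.urandom(16)
    meta = {"avro.schema": json.dumps(POINT_SCHEMA).encode("utf-8"),
            "avro.codec": b"null"}
    buf = io.BytesIO()
    buf.write(MAGIC + encode_meta(meta) + sync)  # header: the schema travels with the data
    body = b"".join(encode_long(r["x"]) + encode_long(r["y"]) for r in records)
    buf.write(encode_long(len(records)) + encode_long(len(body)) + body + sync)
    return buf.getvalue()

data = write_container([{"x": 1, "y": 2}, {"x": -3, "y": 4}], sync=bytes(16))
print(data[:4])  # b'Obj\x01'
```

A fixed all-zero sync marker is passed here only to make the output predictable; real writers use a random marker so readers can resynchronize on block boundaries.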

Overview

• Avro relies on schemas.

– The schema is stored with the data.

– Each datum is written with no per-value overhead (no field names or type tags).

• Thus serialization is fast and the output is small.
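To see why there is no per-value overhead, consider how Avro's binary encoding writes a record: the field values are concatenated in schema order, with no names or tags, because the reader already has the schema. A minimal sketch, assuming a hypothetical `Person` record with a `string` field `name` and an `int` field `age`; it implements only those two types (int/long as zigzag varints, strings as length-prefixed UTF-8), and the helper names are hypothetical.

```python
def zigzag_varint(n):
    """Encode an Avro int/long: zigzag mapping, then base-128 varint (low bits first)."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z >= 0x80:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)

def encode_string(s):
    """Avro string: varint byte length, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data

# Hypothetical schema:
# {"type": "record", "name": "Person",
#  "fields": [{"name": "name", "type": "string"}, {"name": "age", "type": "int"}]}
def encode_person(person):
    # Fields are written back to back in schema order: no names, no type tags.
    return encode_string(person["name"]) + zigzag_varint(person["age"])

encoded = encode_person({"name": "Alice", "age": 30})
print(encoded, len(encoded))  # b'\nAlice<' 7
```

The whole record takes 7 bytes: one length byte (5, zigzag-encoded as 0x0a), the five UTF-8 bytes of "Alice", and one byte for 30 (zigzag-encoded as 60, i.e. 0x3c).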

• Avro in RPC:

– Client and server exchange schemas during the handshake.

– Corresponding fields can then be resolved easily (fields are matched by name).
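The field correspondence follows Avro's schema resolution rules for records: fields in the writer's and reader's schemas are matched by name, a writer field the reader does not declare is ignored, and a reader field missing from the writer's schema takes its default value (an error if it has none). The sketch below applies these rules to an already-decoded record (a Python dict) rather than to binary data, and skips type promotion and nested types; `resolve_record` is a hypothetical helper, not a library function.

```python
_NO_DEFAULT = object()  # sentinel, so that a JSON null default (None) still counts as a default

def resolve_record(writer_fields, reader_fields, datum):
    """Project a record decoded with the writer's fields onto the reader's fields.

    writer_fields / reader_fields: lists of Avro field dicts ("name", "type", optional "default").
    datum: the record as decoded with the writer's schema.
    """
    writer_names = {f["name"] for f in writer_fields}
    result = {}
    for field in reader_fields:
        name = field["name"]
        if name in writer_names:
            result[name] = datum[name]  # matched by name; field order does not matter
        else:
            default = field.get("default", _NO_DEFAULT)
            if default is _NO_DEFAULT:
                raise ValueError(f"field {name!r} is missing from the writer schema and has no default")
            result[name] = default
    # Writer fields that the reader does not declare are simply dropped.
    return result

writer = [{"name": "id", "type": "long"}, {"name": "nickname", "type": "string"}]
reader = [{"name": "id", "type": "long"},
          {"name": "active", "type": "boolean", "default": True}]
print(resolve_record(writer, reader, {"id": 7, "nickname": "ace"}))
# {'id': 7, 'active': True}
```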

APIs

• APIs are available for:

– Java

– C

– C++

– C#

– Python

– Ruby

Comparison with other systems

• Avro vs. Google Protocol Buffers (protobuf) and Apache Thrift.

• A quick note about Thrift:

– Initially developed at Facebook by a Google intern.

– Its design is closer to Google's protobuf.

Comparison with other systems

Criterion | Avro | Google protobuf | Thrift
Implementation | Hmm.. | Cleaner | Hmm..
Error handling | Complex | Simple | OK
Extensibility | Hmm.. | Richer | OK
Compatibility (language support) | Java, C, C++, C#, Python and Ruby | All of those and many more, such as Adobe ActionScript and Microsoft Silverlight | About the same as protobuf

Specification

• A schema is represented as one of:

– A JSON string, naming a defined type.

– A JSON object of the form {"type": "typeName", ...attributes...}, where typeName is a primitive or derived type name.

– A JSON array, representing a union of embedded types.

• Primitive types: null, boolean, int, long, float, double, bytes, string; e.g. {"type": "string"}.

• Complex types: records, enums, arrays, maps, unions, fixed.
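The three schema forms and the type families above can be illustrated with a small validator that checks whether a Python value conforms to a parsed schema: a JSON string names a type, a JSON object carries a `type` attribute, and a JSON array is a union. This is a simplified sketch under our own assumptions: `conforms` is a hypothetical helper, named types are resolved only if defined earlier in the same schema, namespaces are ignored, a missing record field is treated as null, and int/long follow Avro's 32-bit and 64-bit signed ranges.

```python
import json

def _is_int(v):
    return isinstance(v, int) and not isinstance(v, bool)

PRIMITIVES = {
    "null":    lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int":     lambda v: _is_int(v) and -2**31 <= v < 2**31,
    "long":    lambda v: _is_int(v) and -2**63 <= v < 2**63,
    "float":   lambda v: _is_int(v) or isinstance(v, float),
    "double":  lambda v: _is_int(v) or isinstance(v, float),
    "bytes":   lambda v: isinstance(v, bytes),
    "string":  lambda v: isinstance(v, str),
}

def conforms(schema, value, named=None):
    """Return True if value matches the parsed Avro schema."""
    named = {} if named is None else named
    if isinstance(schema, list):                     # JSON array: a union
        return any(conforms(branch, value, named) for branch in schema)
    if isinstance(schema, str):                      # JSON string: a type name
        if schema in PRIMITIVES:
            return PRIMITIVES[schema](value)
        return conforms(named[schema], value, named)  # previously defined named type
    kind = schema["type"]                            # JSON object: {"type": ...}
    if kind in ("record", "error", "enum", "fixed"):
        named[schema["name"]] = schema               # register before recursing
    if kind in ("record", "error"):
        return isinstance(value, dict) and all(
            conforms(f["type"], value.get(f["name"]), named) for f in schema["fields"])
    if kind == "enum":
        return value in schema["symbols"]
    if kind == "array":
        return isinstance(value, list) and all(
            conforms(schema["items"], v, named) for v in value)
    if kind == "map":
        return isinstance(value, dict) and all(
            isinstance(k, str) and conforms(schema["values"], v, named)
            for k, v in value.items())
    if kind == "fixed":
        return isinstance(value, bytes) and len(value) == schema["size"]
    return conforms(kind, value, named)              # e.g. {"type": "string"}

schema = json.loads("""
{"type": "record", "name": "Greeting",
 "fields": [{"name": "message",  "type": "string"},
            {"name": "tags",     "type": {"type": "array", "items": "string"}},
            {"name": "reply_to", "type": ["null", "Greeting"]}]}
""")
print(conforms(schema, {"message": "hi", "tags": [], "reply_to": None}))   # True
print(conforms(schema, {"message": "hi", "tags": [1], "reply_to": None}))  # False
```

The recursive `reply_to` field shows why named types are registered before their fields are checked: the union branch `"Greeting"` refers to the record being defined.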

Specification, example protocol

{
  "namespace": "com.acme",
  "protocol": "HelloWorld",
  "doc": "Protocol Greetings",
  "types": [
    {"name": "Greeting", "type": "record", "fields": [
      {"name": "message", "type": "string"}]},
    {"name": "Curse", "type": "error", "fields": [
      {"name": "message", "type": "string"}]}
  ],
  "messages": {
    "hello": {
      "doc": "Say hello.",
      "request": [{"name": "greeting", "type": "Greeting"}],
      "response": "Greeting",
      "errors": ["Curse"]
    }
  }
}
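A protocol declaration like the one above is ordinary JSON, so a client or server can read it to find each message's parameters, response type and declared errors. The sketch below parses the `HelloWorld` protocol with Python's `json` module, builds full type names (namespace plus name), and dispatches a `hello` call to a local handler. It performs no network transport or handshake, and `HANDLERS`, `call` and `AvroRemoteError` are hypothetical names used only for illustration.

```python
import json

PROTOCOL = json.loads("""
{"namespace": "com.acme", "protocol": "HelloWorld", "doc": "Protocol Greetings",
 "types": [
   {"name": "Greeting", "type": "record", "fields": [{"name": "message", "type": "string"}]},
   {"name": "Curse", "type": "error", "fields": [{"name": "message", "type": "string"}]}],
 "messages": {
   "hello": {"doc": "Say hello.",
             "request": [{"name": "greeting", "type": "Greeting"}],
             "response": "Greeting",
             "errors": ["Curse"]}}}
""")

# Full names combine the protocol namespace with each type name.
TYPES = {PROTOCOL["namespace"] + "." + t["name"]: t for t in PROTOCOL["types"]}

class AvroRemoteError(Exception):
    """Hypothetical wrapper carrying a declared protocol error (here: a Curse record)."""
    def __init__(self, error_type, datum):
        super().__init__(f"{error_type}: {datum}")
        self.error_type, self.datum = error_type, datum

def hello(greeting):
    # Server-side implementation of the "hello" message.
    if not greeting["message"]:
        raise AvroRemoteError("Curse", {"message": "empty greeting"})
    return {"message": "Hello back: " + greeting["message"]}

HANDLERS = {"hello": hello}

def call(message_name, **params):
    """Check the call against the protocol declaration, then invoke the local handler."""
    spec = PROTOCOL["messages"][message_name]
    expected = [p["name"] for p in spec["request"]]
    if sorted(params) != sorted(expected):
        raise TypeError(f"{message_name} expects parameters {expected}")
    try:
        return HANDLERS[message_name](**params)
    except AvroRemoteError as err:
        if err.error_type not in spec["errors"]:
            raise RuntimeError(f"undeclared error {err.error_type}") from err
        raise

print(call("hello", greeting={"message": "hi"}))
# {'message': 'Hello back: hi'}
```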

SASL profile

• Simple Authentication and Security Layer.

• Provides a framework for:

– Authentication.

– Data security (integrity and confidentiality) in connection-oriented network protocols.

SASL usage

• Negotiation for connection-oriented Avro RPC is a series of messages, each carrying one of these status codes:

– 0: START, used in a client's initial message.

– 1: CONTINUE, used while negotiation is ongoing.

– 2: FAIL, terminates negotiation unsuccessfully.

– 3: COMPLETE, terminates negotiation successfully.
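Each status code travels in a small framed message. As we read the Avro SASL profile, a message consists of a one-byte status, a four-byte big-endian payload length, and the payload. The sketch below frames and parses such messages with Python's `struct` module and replays a short in-memory exchange. The `ANONYMOUS` payload and the two-message exchange are a simplified illustration (no real SASL mechanism runs), and the function names are hypothetical.

```python
import struct

START, CONTINUE, FAIL, COMPLETE = 0, 1, 2, 3
HEADER = struct.Struct(">BI")  # one-byte status, four-byte big-endian payload length

def frame(status, payload=b""):
    """Serialize one negotiation message: status, payload length, payload."""
    return HEADER.pack(status, len(payload)) + payload

def read_frame(buf, offset=0):
    """Parse one message starting at offset; return (status, payload, next_offset)."""
    status, length = HEADER.unpack_from(buf, offset)
    start = offset + HEADER.size
    payload = bytes(buf[start:start + length])
    if len(payload) != length:
        raise ValueError("truncated message")
    return status, payload, start + length

def negotiation_done(status):
    # FAIL and COMPLETE both terminate negotiation; START and CONTINUE keep it going.
    return status in (FAIL, COMPLETE)

# Illustrative exchange: client START naming a mechanism, server COMPLETE with no payload.
wire = frame(START, b"ANONYMOUS") + frame(COMPLETE)
offset = 0
while offset < len(wire):
    status, payload, offset = read_frame(wire, offset)
    print(status, payload, negotiation_done(status))
# 0 b'ANONYMOUS' False
# 3 b'' True
```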

References

1. Apache Avro documentation, http://avro.apache.org/docs/current/

2. Google Protocol Buffers vs. Apache Avro, http://www.sammur.com/?p=36

3. Avro vs. Thrift, http://tech.puredanger.com/2011/05/27/serialization-comparison/

4. Avro SASL profile, http://avro.apache.org/docs/current/sasl.html
