© 2008 Palantir Technologies Inc. All rights reserved. Architecture & Scalability An overview of the Palantir Server Architecture Akash Jain Director of Engineering


Page 1: Architecture

© 2008 Palantir Technologies Inc. All rights reserved.

Architecture & Scalability

An overview of the Palantir Server Architecture

Akash Jain, Director of Engineering

Page 2: Architecture

Overview

Palantir Server Architecture
– A fully-featured, enterprise-grade analytic platform
– Robust, scalable, open and maintainable

In this talk
– Dispatch Server
– Oracle DB
– Search Server
– Job Server
– Raptor Server

Page 3: Architecture

Server Architecture

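[Diagram: the Dispatch Server connects to the Revisioning DB (Oracle database storage) over JDBC 3.0 w/ SSL, to the Search Server (Lucene index storage) over HTTPS, and to the Job Server over HTTPS. The Job Server uses shared storage holding job data and specs, and job logs and results.]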

Page 4: Architecture

Dispatch Server

Clients connect here
– “Gateway to Palantir”
– Clients can only connect here

Connects to database
– Access control
– Revisioning database

Connects to search and federated search

Responsible for job creation and scheduling
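
The gateway role on this slide can be illustrated with a small sketch: one entry point that applies access control and only then forwards a request to a backend. This is an illustration under stated assumptions, not Palantir's implementation; the class `DispatchServer`, the operation names, and the in-memory ACL dictionary are all hypothetical.

```python
from typing import Callable, Dict, Set


class DispatchServer:
    """Hypothetical single entry point: clients never call backends directly."""

    def __init__(self, acl: Dict[str, Set[str]]) -> None:
        # acl maps a user name to the set of operations that user may perform.
        self._acl = acl
        self._backends: Dict[str, Callable[[str], str]] = {}

    def register(self, operation: str, backend: Callable[[str], str]) -> None:
        # Backends (search, job scheduling, ...) are only reachable through here.
        self._backends[operation] = backend

    def handle(self, user: str, operation: str, payload: str) -> str:
        # Access control happens at the gateway, before any backend is touched.
        if operation not in self._acl.get(user, set()):
            raise PermissionError(f"{user} may not perform {operation}")
        return self._backends[operation](payload)


dispatch = DispatchServer(acl={"alice": {"search"}, "bob": {"search", "create_job"}})
dispatch.register("search", lambda query: f"search results for {query!r}")
dispatch.register("create_job", lambda spec: f"job scheduled: {spec}")

print(dispatch.handle("alice", "search", "lucene"))
print(dispatch.handle("bob", "create_job", "import"))
```

In this sketch a request from `alice` for `create_job` raises `PermissionError` at the gateway, so the job backend is never invoked.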

Page 5: Architecture

Roadmap: Revisioning DB


Page 6: Architecture

Revisioning DB

Persistence store

Oracle 10g RDBMS

Enterprise-grade
– Scalability
– Backup and Maintenance
– Industry Standard
– Large DBA community

JDBC 3.0 with SSL

[Diagram: the Dispatch Server connects to the Revisioning DB (Oracle database storage) over JDBC 3.0 w/ SSL.]

Page 7: Architecture

Roadmap: Search Server


Page 8: Architecture

Search Server

Built on Apache Lucene
– Leverage text processing capability
– IR Library -> Enterprise Server
– Full-text search capability
– Custom fuzzy search using approxes

Why build our own?
– Flexibility – database agnostic
– Security – built into indexes
– Scalability

[Diagram: Search Server with Lucene index storage.]

Page 9: Architecture

Clustered Search: Scale Parameters

Palantir Search Server scales horizontally

User scale
– Number of concurrent requests

Data scale
– Additional corpora/data sources
– Also includes manually entered data

[Diagram: Search Server with Lucene index storage.]

Page 10: Architecture

Clustered Search: Mirroring

Mirroring for User Scalability
– Redundancy across machines
– Index write requests go to all mirrors
– Search requests go to one mirror
– More mirrors -> more concurrent queries

[Diagram: three search mirrors, each with its own Lucene index storage. Index request A is sent to all three mirrors; search requests 1, 2, and 3 each go to a single mirror. A second panel, "Increased Throughput", shows six search mirrors serving search requests 1 through 6.]
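
The mirroring rules on this slide (writes go to every mirror, each search goes to one mirror) can be sketched in a few lines. This is a toy model, not the Palantir Search Server: the classes `Mirror` and `MirroredSearch` are hypothetical, the "index" is a plain dictionary rather than Lucene, and round-robin selection is an assumed balancing policy that the slide does not specify.

```python
from itertools import cycle
from typing import Dict, List, Set


class Mirror:
    """Hypothetical stand-in for one search mirror holding a full index copy."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.index: Dict[str, Set[str]] = {}  # term -> ids of documents containing it
        self.searches_served = 0

    def add(self, doc_id: str, text: str) -> None:
        for term in text.lower().split():
            self.index.setdefault(term, set()).add(doc_id)

    def search(self, term: str) -> Set[str]:
        self.searches_served += 1
        return set(self.index.get(term.lower(), set()))


class MirroredSearch:
    def __init__(self, mirrors: List[Mirror]) -> None:
        self.mirrors = mirrors
        self._next_mirror = cycle(mirrors)  # assumed policy: simple round-robin

    def add(self, doc_id: str, text: str) -> None:
        # Index write requests go to all mirrors, so every copy stays complete.
        for mirror in self.mirrors:
            mirror.add(doc_id, text)

    def search(self, term: str) -> Set[str]:
        # Each search request is answered by exactly one mirror.
        return next(self._next_mirror).search(term)


search = MirroredSearch([Mirror("m1"), Mirror("m2"), Mirror("m3")])
search.add("A", "palantir server architecture")
results = [search.search("architecture") for _ in range(3)]
served = [mirror.searches_served for mirror in search.mirrors]
print(results, served)
```

Because every mirror holds the whole index, adding mirrors spreads query load but does not increase how much data the cluster can hold.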

Page 11: Architecture

Clustered Search: Partitioning

Partitioning for Data Scale
– Split data across many machines
– Search requests go to all partitions
– Index write requests go to one partition
– More partitions -> more data with constant index size

[Diagram: three search partitions, each with its own Lucene index storage. Index requests 1, 2, and 3 each go to a single partition; search request A is sent to all three partitions. A second panel, "Increased Index Capacity", shows six search partitions receiving index requests 1 through 6.]
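
Partitioning inverts the mirroring rules: each write lands on one partition, and each search fans out to all of them. The sketch below is a toy model under stated assumptions: `Partition` and `PartitionedSearch` are hypothetical names, the index is a dictionary rather than Lucene, and routing writes by a CRC32 hash of the document id is an assumed rule that the slide does not specify.

```python
import zlib
from typing import Dict, List, Set


class Partition:
    """Hypothetical stand-in for one search partition holding part of the data."""

    def __init__(self) -> None:
        self.doc_ids: Set[str] = set()
        self.index: Dict[str, Set[str]] = {}  # term -> ids of documents containing it

    def add(self, doc_id: str, text: str) -> None:
        self.doc_ids.add(doc_id)
        for term in text.lower().split():
            self.index.setdefault(term, set()).add(doc_id)

    def search(self, term: str) -> Set[str]:
        return set(self.index.get(term.lower(), set()))


class PartitionedSearch:
    def __init__(self, partitions: List[Partition]) -> None:
        self.partitions = partitions

    def add(self, doc_id: str, text: str) -> None:
        # Index write requests go to exactly one partition (assumed: hash routing).
        slot = zlib.crc32(doc_id.encode("utf-8")) % len(self.partitions)
        self.partitions[slot].add(doc_id, text)

    def search(self, term: str) -> Set[str]:
        # Search requests go to all partitions; the partial results are unioned.
        hits: Set[str] = set()
        for partition in self.partitions:
            hits |= partition.search(term)
        return hits


cluster = PartitionedSearch([Partition() for _ in range(3)])
docs = {f"doc{i}": "lucene index storage" for i in range(1, 7)}
for doc_id, text in docs.items():
    cluster.add(doc_id, text)

print(sorted(cluster.search("lucene")))
print([len(partition.doc_ids) for partition in cluster.partitions])
```

Each document is stored once, so adding partitions keeps the per-machine index size roughly constant as the corpus grows, at the cost of every query touching every partition.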

Page 12: Architecture

Roadmap: Job Server


Page 13: Architecture

Job Server

The job server runs asynchronous jobs on behalf of clients
– Bulk data imports
– Persistent searches
– LDAP auth syncs

Many job servers

[Diagram: the Dispatch Server connects to the Job Server over HTTPS. The Job Server uses shared storage holding job data and specs, and job logs and results.]
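
The flow on this slide (specs and data go into shared storage, job servers run the work asynchronously, results come back through shared storage) can be sketched as follows. Everything here is an assumption for illustration: a temporary directory stands in for the shared storage, a thread pool stands in for "many job servers", and the file naming scheme and the functions `submit_spec` and `run_job` are hypothetical.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

shared = Path(tempfile.mkdtemp())  # stand-in for the shared storage


def submit_spec(job_id: str, spec: dict) -> Path:
    """Dispatch side: write the job spec to shared storage."""
    path = shared / f"{job_id}.spec.json"
    path.write_text(json.dumps(spec))
    return path


def run_job(spec_path: Path) -> Path:
    """Job server side: read a spec, do the work, write the result beside it."""
    spec = json.loads(spec_path.read_text())
    # Placeholder work: count the rows of a pretend bulk import.
    result = {"job": spec["name"], "rows_imported": len(spec["rows"])}
    result_path = spec_path.with_name(spec_path.name.replace(".spec.", ".result."))
    result_path.write_text(json.dumps(result))
    return result_path


specs = [
    submit_spec(f"job{i}", {"name": f"import-{i}", "rows": list(range(i * 10))})
    for i in range(1, 4)
]

# Two workers stand in for several job servers reading the same storage.
with ThreadPoolExecutor(max_workers=2) as pool:
    result_paths = list(pool.map(run_job, specs))

results = [json.loads(path.read_text()) for path in result_paths]
print(results)
```

Because a job is fully described by what sits in shared storage, any worker can pick up any spec, which is what makes adding more job servers straightforward in this model.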

Page 14: Architecture

Systems Diagram

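[Diagram: network zones labeled External Network, DMZ, and Internal Network. Components shown: Client, Dispatch Server, Rev DB (Oracle database storage, reached over JDBC 3.0 w/ SSL), Search Server (Lucene index storage, reached over HTTPS), and Job Server (reached over HTTPS, with shared storage holding job data and specs, and job logs and results). The client connection is labeled HTTPS.]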

Page 15: Architecture

Raptor Overview

Raptor sits in front of data sources

Raptor indexes data source and answers search queries

Raptor monitors changes in your data source and sends them to Palantir

Page 16: Architecture

Federated Search

Raptor is Palantir’s federated search server
– Rich data modeling
– Extensible searching
– Highly scalable indexing and search capabilities

Leverages
– Palantir Data Import Pipeline
– Palantir Clustered Search Server

With Raptor:
– Data owners control data
– You control performance characteristics

Page 17: Architecture

Raptor Query Process

Search Query
• Hits Palantir Search Server
• Federated to Raptor instances if applicable
• Supports both keyword search and structured queries

Results Collection
• Results are sorted using relevance from each search

Import to Palantir
• On-The-Fly (OTF) Import
• Sourcing information retained for each attribute imported
• Enables full Palantir functionality

[Diagram: a search query is federated to Raptor A, Raptor B, and Raptor C, each performing its own search; their results are collected into the Palantir query result.]
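
The query process on this slide (fan out to each Raptor instance, collect results, sort by relevance, keep sourcing information) can be sketched as below. This is an illustration only: the `Hit` type, the `federated_search` function, and the fake instances are hypothetical, and the sketch assumes that relevance scores from different instances are directly comparable, which a real federated system would have to ensure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A searcher takes a query string and returns (doc_id, relevance) pairs.
Searcher = Callable[[str], List[Tuple[str, float]]]


@dataclass(frozen=True)
class Hit:
    source: str   # which instance produced the hit (sourcing information)
    doc_id: str
    score: float  # relevance reported by that instance


def federated_search(query: str, sources: Dict[str, Searcher]) -> List[Hit]:
    hits: List[Hit] = []
    # Fan the query out to every instance and tag each result with its source.
    for name, search in sources.items():
        for doc_id, score in search(query):
            hits.append(Hit(source=name, doc_id=doc_id, score=score))
    # Merge: highest relevance first, regardless of which instance answered.
    return sorted(hits, key=lambda hit: hit.score, reverse=True)


# Fake instances with canned answers, standing in for Raptor A, B, and C.
sources: Dict[str, Searcher] = {
    "Raptor A": lambda query: [("a1", 0.9), ("a2", 0.4)],
    "Raptor B": lambda query: [("b1", 0.7)],
    "Raptor C": lambda query: [],
}

merged = federated_search("example query", sources)
for hit in merged:
    print(hit.source, hit.doc_id, hit.score)
```

Keeping `source` on every hit is what lets a later import step record where each imported attribute came from.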

Page 18: Architecture

Raptor Scale Characteristics

Data Scale
– 100 million row Netflix dataset
– 10 million document Usenet corpus
– 1.5 million entity extracted Wikipedia corpus

Indexing Performance
– 1m rows/hour structured indexing
– 500k docs/hour unstructured document indexing
– 100k docs/hour entity-extracted document indexing

Searching Performance
– Sub-second search processing
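
As a rough back-of-the-envelope reading of these figures, the indexing rates can be applied to the dataset sizes to estimate full indexing time. The pairing of each dataset with a rate is an assumption made here for illustration; the slide does not say the rates were measured on these datasets, whether they are per instance, or whether the Wikipedia figure counts documents or entities.

```python
# (dataset size, assumed indexing rate per hour) taken from the slide's figures.
datasets = {
    "Netflix rows (structured)": (100_000_000, 1_000_000),
    "Usenet documents (unstructured)": (10_000_000, 500_000),
    "Wikipedia items (entity-extracted)": (1_500_000, 100_000),
}

# Estimated hours = size / rate, assuming the rate holds for the whole run.
hours = {name: size / rate for name, (size, rate) in datasets.items()}

for name, estimate in hours.items():
    print(f"{name}: about {estimate:.0f} hours")
```

Under these assumptions the estimates come to about 100, 20, and 15 hours respectively for a single sequential indexing run.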

Page 19: Architecture

Summary

Palantir server components support a robust, scalable platform for analysis

Leverage enterprise-grade infrastructure

Raptor provides further scalability
