© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

HiVertica Capstone Project

University of Pittsburgh, January 11, 2013

Stephen Walkauskas, Architect, Data Management, Vertica

Contact info

Stephen Walkauskas, [email protected]

Vertica culture

What Is Vertica

• SQL Database for Real-time Analytics
• Runs on x86 hardware
• MPP Columnar Architecture – scales to PBs!
• Reduced footprint via Advanced Compression
• Extensible analytics capabilities
• Easy to set up and use
• Elastic: grow/shrink as needed
• Extensive Ecosystem of analytic tools

Speed

Scale

Simplicity
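Columnar storage helps compression because values from one column sit next to each other, and a sorted column often contains long runs of repeated values. The Python sketch below shows run-length encoding (RLE), one of the encodings a column store such as Vertica can apply to sorted columns. It is a toy illustration of the idea, not a description of Vertica's on-disk encoders, and the sample column is made up.

```python
from itertools import groupby
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def rle_encode(column: List[T]) -> List[Tuple[T, int]]:
    """Collapse each run of equal adjacent values into a (value, run_length) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(pairs: List[Tuple[T, int]]) -> List[T]:
    """Expand (value, run_length) pairs back into the original column."""
    column: List[T] = []
    for value, count in pairs:
        column.extend([value] * count)
    return column


# A sorted, low-cardinality column (hypothetical state codes) shrinks to three pairs.
states = ["MA"] * 4 + ["NY"] * 3 + ["PA"] * 5
encoded = rle_encode(states)
print(encoded)  # [('MA', 4), ('NY', 3), ('PA', 5)]
assert rle_decode(encoded) == states
```

Twelve stored values become three pairs; the better a projection's sort order groups equal values, the fewer runs there are.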

Map/Reduce

-- HQL
SELECT a.val1, a.val2, b.val, c.val
FROM a JOIN b ON (a.key = b.key)
LEFT OUTER JOIN c ON (a.key = c.key)
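Hive executes a query like this by compiling it into map/reduce jobs. According to the Hive join documentation, because every join condition uses the same column (a.key), the whole query can run as a single map/reduce job: mappers tag each row with its source table and emit it under its join key, the shuffle brings equal keys together, and each reducer combines the rows for one key. The sketch below simulates that reduce-side join in plain Python. The table contents are hypothetical, and unlike real Hive it holds everything in memory and does not treat NULL keys specially.

```python
from collections import defaultdict
from itertools import chain, product
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple

Row = Dict[str, Any]
Tagged = Tuple[str, Row]


def map_phase(tag: str, rows: Iterable[Row]) -> Iterator[Tuple[Any, Tagged]]:
    """Mapper: emit (join key, (source table, row)) for every input row."""
    for row in rows:
        yield row["key"], (tag, row)


def shuffle(pairs: Iterable[Tuple[Any, Tagged]]) -> Dict[Any, List[Tagged]]:
    """Stand-in for the framework's shuffle: group mapper output by key."""
    groups: Dict[Any, List[Tagged]] = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return groups


def reduce_phase(tagged: List[Tagged]) -> Iterator[Tuple[Any, ...]]:
    """Reducer for one key: a JOIN b (inner), then LEFT OUTER JOIN c."""
    a_rows = [row for tag, row in tagged if tag == "a"]
    b_rows = [row for tag, row in tagged if tag == "b"]
    # With no matching c row, a single None pads the outer join (c.val becomes NULL).
    c_rows: List[Optional[Row]] = [row for tag, row in tagged if tag == "c"] or [None]
    for a, b, c in product(a_rows, b_rows, c_rows):
        yield a["val1"], a["val2"], b["val"], (c["val"] if c is not None else None)


def run_join(a: List[Row], b: List[Row], c: List[Row]) -> List[Tuple[Any, ...]]:
    mapped = chain(map_phase("a", a), map_phase("b", b), map_phase("c", c))
    groups = shuffle(mapped)
    results: List[Tuple[Any, ...]] = []
    for key in sorted(groups):  # in Hadoop, each key's group goes to one reducer
        results.extend(reduce_phase(groups[key]))
    return results


a = [{"key": 1, "val1": "x", "val2": "y"},
     {"key": 2, "val1": "p", "val2": "q"},
     {"key": 3, "val1": "m", "val2": "n"}]
b = [{"key": 1, "val": "b1"}, {"key": 2, "val": "b2"}]
c = [{"key": 1, "val": "c1"}]
print(run_join(a, b, c))  # [('x', 'y', 'b1', 'c1'), ('p', 'q', 'b2', None)]
```

Key 3 produces nothing because the inner join with b finds no match, while key 2 survives with a NULL c.val because c is outer-joined.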

HiVertica

a) Write code to read Hive/HCatalog metadata and generate the DDL to create corresponding external tables (ETs) in a Vertica DB.
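Step a) amounts to mapping each Hive column type to a Vertica type and wrapping the column list in a CREATE EXTERNAL TABLE ... AS COPY statement. The sketch below assumes the metadata has already been fetched from the metastore or HCatalog into a list of hypothetical HiveColumn records. The type mapping covers only a few primitive types and is an assumption to check against both systems' documentation, not an official table; complex Hive types (arrays, maps, structs) are rejected rather than guessed.

```python
from dataclasses import dataclass
from typing import List

# Assumed Hive-to-Vertica type mapping for a few primitive types; not authoritative.
# Vertica's INT is 64-bit and FLOAT is double precision, so narrower Hive types widen.
HIVE_TO_VERTICA = {
    "tinyint": "INT",
    "smallint": "INT",
    "int": "INT",
    "bigint": "INT",
    "boolean": "BOOLEAN",
    "float": "FLOAT",
    "double": "FLOAT",
    "string": "VARCHAR(65000)",
    "timestamp": "TIMESTAMP",
}


@dataclass
class HiveColumn:
    """Hypothetical record for one column as read from the metastore / HCatalog."""
    name: str
    hive_type: str


def generate_external_table_ddl(table: str, columns: List[HiveColumn], copy_clause: str) -> str:
    """Build CREATE EXTERNAL TABLE ... AS COPY ... DDL for one Hive table."""
    col_defs = []
    for col in columns:
        hive_type = col.hive_type.strip().lower()
        if hive_type not in HIVE_TO_VERTICA:
            # Complex types (array<...>, map<...>, struct<...>) need a real design decision.
            raise ValueError(f"no Vertica mapping for Hive type {col.hive_type!r}")
        col_defs.append(f"    {col.name} {HIVE_TO_VERTICA[hive_type]}")
    return (
        f"CREATE EXTERNAL TABLE {table} (\n"
        + ",\n".join(col_defs)
        + f"\n) AS COPY {copy_clause};"
    )


ddl = generate_external_table_ddl(
    "users",
    [HiveColumn("id", "bigint"), HiveColumn("name", "string")],
    "FROM '/data/users/*' DELIMITER '|'",  # hypothetical path and delimiter
)
print(ddl)
```

The COPY clause is passed in as a string so the same generator can target local files or an HDFS source; column names are emitted unquoted, so a Hive column named after a Vertica reserved word would still need quoting.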

b) Configure the ETs to read the files referenced by the corresponding Hive tables. Vertica ships a connector for sourcing files from HDFS; with this connector, the ETs can be used to query data in Hive (assuming the data is in a format Vertica can parse).
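Hive records each table's data directory as an hdfs:// location, whereas Vertica's HDFS Connector reads files over WebHDFS through a COPY source of the form Hdfs(url=..., username=...). The sketch below converts one into the other. It assumes the NameNode's WebHDFS endpoint listens on 50070 (the Hadoop 1.x/2.x default HTTP port) and that the connector accepts a trailing * glob to pick up every file in the directory; both are assumptions to verify for your cluster and Vertica version.

```python
from urllib.parse import urlparse

WEBHDFS_HTTP_PORT = 50070  # assumed NameNode HTTP port (Hadoop 1.x/2.x default)


def hive_location_to_hdfs_source(location: str, username: str,
                                 http_port: int = WEBHDFS_HTTP_PORT) -> str:
    """Turn a Hive table location (hdfs://host:port/path) into a COPY SOURCE clause
    for Vertica's HDFS Connector, which reads files over WebHDFS."""
    parsed = urlparse(location)
    if parsed.scheme != "hdfs" or not parsed.hostname:
        raise ValueError(f"expected hdfs://host[:port]/path, got {location!r}")
    # WebHDFS exposes the same path under /webhdfs/v1; the trailing * (assumed to be
    # accepted by the connector) selects every file in the table's directory.
    url = f"http://{parsed.hostname}:{http_port}/webhdfs/v1{parsed.path.rstrip('/')}/*"
    return f"SOURCE Hdfs(url='{url}', username='{username}')"


print(hive_location_to_hdfs_source(
    "hdfs://namenode:8020/user/hive/warehouse/logs", "hive"))  # hypothetical host and user
```

The returned string can serve as the COPY clause of an external table definition. Partitioned Hive tables keep their data in per-partition subdirectories, which this sketch does not walk.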

c) Vertica supports User Defined Parsers (UDParsers); you can write your own CSV parser if you’re so inclined. RCFile is a format commonly used to store data in Hive, so it would be useful to be able to parse it with a Vertica UDParser.
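RCFile splits a table horizontally into row groups and, within each row group, stores the values of each column together. Whatever else it does, a UDParser for RCFile has to turn that column-major layout back into the rows Vertica expects. The sketch below shows only that transposition, assuming a row group has already been decoded into a dict of column name to values. RCFile's actual binary layout, its compression, and Vertica's C++ UDParser interface are deliberately left out, and parse_row_groups is a hypothetical helper, not part of either system.

```python
from typing import Any, Dict, Iterable, Iterator, List, Sequence, Tuple

# Hypothetical: one RCFile row group after decoding, as column name -> values.
RowGroup = Dict[str, List[Any]]


def row_group_to_rows(columns: Sequence[str], group: RowGroup) -> Iterator[Tuple[Any, ...]]:
    """Transpose one column-major row group into row tuples for the selected columns."""
    if len({len(group[name]) for name in columns}) > 1:
        raise ValueError("every column in a row group must hold the same number of values")
    return zip(*(group[name] for name in columns))


def parse_row_groups(columns: Sequence[str], groups: Iterable[RowGroup]) -> Iterator[Tuple[Any, ...]]:
    """Emit rows group by group, as a parser would while streaming through a file."""
    for group in groups:
        yield from row_group_to_rows(columns, group)


groups = [
    {"id": [1, 2], "name": ["ann", "bob"]},
    {"id": [3], "name": ["cy"]},
]
print(list(parse_row_groups(["id", "name"], groups)))  # [(1, 'ann'), (2, 'bob'), (3, 'cy')]
print(list(parse_row_groups(["name"], groups)))        # [('ann',), ('bob',), ('cy',)]
```

Selecting a subset of columns mirrors the column pruning RCFile allows: a real reader could skip decoding the unselected columns entirely.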

d) Find the place where Hive compiles HQL into M/R jobs; there, translate the HQL into SQL instead and, leveraging the features above, send the query to Vertica. The two systems are not 100% compatible, but we can tweak them to shrink the feature gap.
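Part of shrinking the feature gap is mechanically rewriting HQL syntax that standard SQL spells differently. One example: Hive quotes identifiers with backticks, while standard SQL uses double quotes, so the sketch below rewrites backtick-quoted identifiers and leaves single-quoted string literals alone. It is a hypothetical text-level helper for illustration; a real hook inside Hive's compiler would more likely work on the parsed query tree. It does not handle Hive's double-quoted or backslash-escaped string literals.

```python
def backticks_to_double_quotes(hql: str) -> str:
    """Rewrite Hive `identifier` quoting as SQL "identifier" quoting,
    leaving single-quoted string literals untouched."""
    out = []
    in_literal = False
    for ch in hql:
        if ch == "'":
            # A doubled '' inside a literal toggles twice, so it stays a literal.
            in_literal = not in_literal
            out.append(ch)
        elif ch == "`" and not in_literal:
            out.append('"')
        else:
            out.append(ch)
    return "".join(out)


print(backticks_to_double_quotes("SELECT `user`, 'a`b' FROM `logs`"))
# SELECT "user", 'a`b' FROM "logs"
```

Each such rule closes one small gap; the harder differences (Hive-specific functions, SerDes, complex types) are where tweaking either system comes in.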

Thanks!