Tiny Google Projects

Presentation about 3 Google projects.

Page 1: Tiny Google Projects

:)

Page 2: Tiny Google Projects

google

Page 3: Tiny Google Projects

tiny :projects

Page 4: Tiny Google Projects
Page 5: Tiny Google Projects
Page 6: Tiny Google Projects

مرحبا العالم (Arabic: "Hello, world")

Page 7: Tiny Google Projects

Tesseract OCR

1985: HP

2006: Google

Page 8: Tiny Google Projects

Tesseract OCR

2006: TIFF

2011: *

Page 9: Tiny Google Projects

Tesseract OCR

2009: Text

2010: layout

Page 10: Tiny Google Projects

Tesseract OCR

2007: 6

2011: 33

Page 11: Tiny Google Projects

Tesseract OCR

Arabic, English, Bulgarian, Catalan, Czech, Chinese (Simplified and Traditional), Danish (standard and Fraktur script), German, Greek, Finnish, French, Hebrew, Croatian, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak (standard and Fraktur script), Slovenian, Spanish, Serbian, Swedish, Tagalog, Thai, Turkish, Ukrainian and Vietnamese

Page 12: Tiny Google Projects

Tesseract OCR

Officially supported:

Probably runs on:

Page 13: Tiny Google Projects

Image processing

Page 14: Tiny Google Projects
Page 15: Tiny Google Projects
Page 16: Tiny Google Projects
Page 17: Tiny Google Projects

Google Refine

Page 18: Tiny Google Projects

Runs on:

Page 19: Tiny Google Projects

Runs in:

Page 20: Tiny Google Projects

Major features:

Import from anywhere
Faceting
Clustering
Split / create custom columns
GREL transformations
Export / etc.

Page 21: Tiny Google Projects
Page 22: Tiny Google Projects

Google Protocol Buffers

message Person {
  required int32 id = 1;
  required string name = 2;
  optional string email = 3;
}

Person person;
person.set_id(123);
person.set_name("Bob");
person.set_email("[email protected]");

fstream out("person.pb", ios::out ...
person.SerializeToOstream(&out);
out.close();

>

Page 23: Tiny Google Projects

512 bytes / tweet
340,000,000 tweets / day (2012)
7,253,333,333 bytes / hour
2,014,814 bytes / second
1.921 Mbytes / second
15.371 Mbits / second

8 Tbytes / day (2011)

Google: ~ 377M searches/day

Page 24: Tiny Google Projects

=+

Page 25: Tiny Google Projects

=+

Page 26: Tiny Google Projects

=+

Page 27: Tiny Google Projects

=+>

Page 28: Tiny Google Projects

=+>

Page 29: Tiny Google Projects

=+>

MapReduce

?

Page 30: Tiny Google Projects
Page 31: Tiny Google Projects

snappy

http://code.google.com/p/snappy/

Page 32: Tiny Google Projects

snappy

Free and BSD
Robust
Fast
Stable

Page 33: Tiny Google Projects

Size

[Chart: compression ratio (%), less is better, for lzjb 2010, lzo 2.04 1x, fastlz 0.1 -1, fastlz 0.1 -2, lzf 3.6 vf, lzf 3.6 uf, lzrw1, lzrw1-a, lzrw2, lzrw3, lzrw3-a, snappy 1.0, quicklz 1.5.0 -1 and quicklz 1.5.0 -2]

Page 34: Tiny Google Projects

Data types

[Chart: compression ratio of snappy vs. zlib for plain text, HTML and JPEG]

Page 35: Tiny Google Projects

Size

from 20% to 100% bigger

:(

...not for amazon glacier

Page 36: Tiny Google Projects

Speed

[Chart: compression speed (MB/s), more is better, for lzjb 2010, lzo 2.04 1x, fastlz 0.1 -1, fastlz 0.1 -2, lzf 3.6 vf, lzf 3.6 uf, lzrw1, lzrw1-a, lzrw2, lzrw3, lzrw3-a, snappy 1.0, quicklz 1.5.0 -1 and quicklz 1.5.0 -2]

Page 37: Tiny Google Projects

Speed

[Chart: decompression speed (MB/s), more is better, for lzjb 2010, lzo 2.04 1x, fastlz 0.1 -1, fastlz 0.1 -2, lzf 3.6 vf, lzf 3.6 uf, lzrw1, lzrw1-a, lzrw2, lzrw3, lzrw3-a, snappy 1.0, quicklz 1.5.0 -1 and quicklz 1.5.0 -2]

Page 38: Tiny Google Projects

On 1 core of 64-bit Core i7 processor:

• Compression: 250MB/s

• Decompression: 500MB/s

:P

Page 39: Tiny Google Projects

Portable, but...

Page 40: Tiny Google Projects

Portable, but primarily optimized for 64-bit x86-compatible processors

Page 41: Tiny Google Projects

Used:

BigTable, MapReduce, Google RPC

Hadoop

Page 42: Tiny Google Projects

Bindings:

Page 43: Tiny Google Projects

@TarasRoshko

HTTP headers here:

http://code.google.com/p/snappy/source/browse/trunk/framing_format.txt

Page 44: Tiny Google Projects

Q&A?

Ostap Andrusiv

Software Engineer, Eleks Software

@p1f