ostap-andrusiv
Presentation about 3 Google projects.
:)
tiny :projects
مرحبا العالم ("Hello, world" in Arabic)
Tesseract OCR
1985 2006
HP Google
Tesseract OCR
2006 2011
TIFF *
Tesseract OCR
2009 2010
Text layout
Tesseract OCR
2007 2011
6 33
Tesseract OCR
Arabic, English, Bulgarian, Catalan, Czech, Chinese (Simplified and Traditional), Danish
(standard and Fraktur script), German, Greek, Finnish, French, Hebrew, Croatian, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian,
Lithuanian, Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak (standard and Fraktur script), Slovenian, Spanish, Serbian, Swedish, Tagalog, Thai,
Turkish, Ukrainian and Vietnamese
Tesseract OCR
Officially supported:
Probably runs on:
Image processing
Google Refine
Runs on:
Runs in:
Major features:
• Import from anywhere
• Faceting
• Clustering
• Split / create custom columns
• GREL transformations
• Export / etc.
Google Protocol Buffers
message Person {
  required int32 id = 1;
  required string name = 2;
  optional string email = 3;
}

Person person;
person.set_id(123);
person.set_name("Bob");
person.set_email("[email protected]");

fstream out("person.pb", ios::out ...
person.SerializeToOstream(&out);
out.close();
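To make the serialization step above more concrete, here is a minimal sketch of how the Person message could be laid out on the wire. It assumes the standard protobuf encoding rules (field key = field number shifted left by 3, OR-ed with the wire type; integers as base-128 varints; strings as length-prefixed bytes). The helper names AppendVarint, AppendKey and EncodePerson are hypothetical and are not part of the protobuf library; real code should use the generated SerializeToOstream shown on the slide.

```cpp
#include <cstdint>
#include <string>

// Append an unsigned integer as a base-128 varint: 7 payload bits per byte,
// with the high bit set on every byte except the last one.
void AppendVarint(std::string& out, std::uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<char>(value));
}

// Append a field key: (field_number << 3) | wire_type.
void AppendKey(std::string& out, std::uint32_t field_number, std::uint32_t wire_type) {
    AppendVarint(out, (static_cast<std::uint64_t>(field_number) << 3) | wire_type);
}

// Hypothetical helper that hand-encodes the Person message from the slide.
// Assumption of this sketch: an empty email means "field not set".
std::string EncodePerson(std::int32_t id, const std::string& name, const std::string& email) {
    std::string out;
    AppendKey(out, 1, 0);  // id = 1, wire type 0 (varint)
    // int32 values are sign-extended to 64 bits before varint encoding.
    AppendVarint(out, static_cast<std::uint64_t>(static_cast<std::int64_t>(id)));
    AppendKey(out, 2, 2);  // name = 2, wire type 2 (length-delimited)
    AppendVarint(out, name.size());
    out += name;
    if (!email.empty()) {
        AppendKey(out, 3, 2);  // email = 3, wire type 2 (length-delimited)
        AppendVarint(out, email.size());
        out += email;
    }
    return out;
}
```

With id 123 and name "Bob" (no email), this sketch produces 7 bytes: 08 7B 12 03 42 6F 62. The compactness of that encoding is the main selling point of the format.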
512 bytes / tweet
340,000,000 tweets / day (2012)
7,253,333,333 bytes / hour
2,014,814 bytes / second
1.921 Mbytes / second
15.371 Mbits / second
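The arithmetic behind these numbers can be reproduced with a short sketch. It assumes the slide's inputs (512 bytes per tweet, 340,000,000 tweets per day), integer truncation at each step, and 1 Mbyte = 1024 * 1024 bytes, which is the convention that matches the slide's figures. The struct and function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical container for the derived rates.
struct TweetThroughput {
    std::uint64_t bytes_per_day;
    std::uint64_t bytes_per_hour;
    std::uint64_t bytes_per_second;
    double mbytes_per_second;  // assumes 1 Mbyte = 1024 * 1024 bytes
    double mbits_per_second;
};

// Derive the per-hour and per-second rates from the per-day volume.
TweetThroughput EstimateThroughput(std::uint64_t bytes_per_tweet,
                                   std::uint64_t tweets_per_day) {
    TweetThroughput t{};
    t.bytes_per_day = bytes_per_tweet * tweets_per_day;
    t.bytes_per_hour = t.bytes_per_day / 24;       // integer division truncates
    t.bytes_per_second = t.bytes_per_hour / 3600;  // integer division truncates
    t.mbytes_per_second =
        static_cast<double>(t.bytes_per_second) / (1024.0 * 1024.0);
    t.mbits_per_second = t.mbytes_per_second * 8.0;
    return t;
}
```

Calling EstimateThroughput(512, 340000000) gives 7,253,333,333 bytes per hour and 2,014,814 bytes per second, which is roughly 1.92 Mbytes or 15.37 Mbits per second.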
8 Tbytes / day (2011)
Google: ~ 377M searches/day
MapReduce
?
snappy
http://code.google.com/p/snappy/
Free and BSD
Robust
snappy
Fast
Stable
Size
[Bar chart: compression ratio (%) (less is better) for lzjb 2010, lzo 2.04 1x, fastlz 0.1 -1, fastlz 0.1 -2, lzf 3.6 vf, lzf 3.6 uf, lzrw1, lzrw1-a, lzrw2, lzrw3, lzrw3-a, snappy 1.0, quicklz 1.5.0 -1, quicklz 1.5.0 -2]
[Chart: compression ratio by data type (plain text, html, jpeg), snappy vs. zlib]
Size
Compressed output is from 20% to 100% bigger than zlib's
:(
...not for Amazon Glacier
Speed
[Bar chart: compression speed (MB/s) (more is better) for lzjb 2010, lzo 2.04 1x, fastlz 0.1 -1, fastlz 0.1 -2, lzf 3.6 vf, lzf 3.6 uf, lzrw1, lzrw1-a, lzrw2, lzrw3, lzrw3-a, snappy 1.0, quicklz 1.5.0 -1, quicklz 1.5.0 -2]
Speed
[Bar chart: decompression speed (MB/s) (more is better) for lzjb 2010, lzo 2.04 1x, fastlz 0.1 -1, fastlz 0.1 -2, lzf 3.6 vf, lzf 3.6 uf, lzrw1, lzrw1-a, lzrw2, lzrw3, lzrw3-a, snappy 1.0, quicklz 1.5.0 -1, quicklz 1.5.0 -2]
On 1 core of 64-bit Core i7 processor:
• Compression: 250MB/s
• Decompression: 500MB/s
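As a rough feel for what these rates mean in practice, the sketch below estimates how long a single core would need for a given amount of data. It assumes 1 MB = 1,000,000 bytes and treats the quoted figures as constant throughput, which real workloads will not match exactly. The function name is hypothetical and this is not part of the snappy API.

```cpp
#include <cstdint>

// Hypothetical helper: seconds needed to process `bytes` of input at a
// sustained rate of `mb_per_second` (assuming 1 MB = 1,000,000 bytes).
double SecondsToProcess(std::uint64_t bytes, double mb_per_second) {
    const double bytes_per_second = mb_per_second * 1000000.0;
    return static_cast<double>(bytes) / bytes_per_second;
}
```

Under these assumptions, 1,000,000,000 bytes would take about 4 seconds to compress at 250 MB/s and about 2 seconds to decompress at 500 MB/s.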
:P
Portable, but...
Portable, but primarily optimized for 64-bit x86-compatible processors
Used:
BigTable
MapReduce
Google RPC
Hadoop
Bindings:
@TarasRoshko
HTTP headers here:
http://code.google.com/p/snappy/source/browse/trunk/framing_format.txt
Q&A?
Ostap Andrusiv
Software Engineer
Eleks software
@p1f