
Balaur @ Geekmeet


DESCRIPTION

How we built Balaur: the technical aspects.


Page 1: Balaur @ Geekmeet
Page 2: Balaur @ Geekmeet

Team

Mircea Silviu Cristian

Page 3: Balaur @ Geekmeet

Search jobs

Page 4: Balaur @ Geekmeet

API & Widget

• API

– Indexing

– Search (used by jobincluj.ro and 1000eu.ro)

• Widget

Page 5: Balaur @ Geekmeet

Job trends

Page 6: Balaur @ Geekmeet

Bucuresti

Page 7: Balaur @ Geekmeet

Python vs Ruby

Page 8: Balaur @ Geekmeet

Technologies

Javascript

bash

Page 9: Balaur @ Geekmeet

Backend technologies

BeautifulSoup

Facebook Apache Thrift

Page 10: Balaur @ Geekmeet

Frontend technologies

Google Visualization API

Page 11: Balaur @ Geekmeet

Other technologies

routes

pycrypto

rsync

glade

Page 12: Balaur @ Geekmeet

Architecture

Page 13: Balaur @ Geekmeet

Indexing pipeline

...

URLs

HTML

Thrift jobs

Page 14: Balaur @ Geekmeet

Data

• 60+ GB

• No MySQL

– Read-only

– Sequential

– Thrift shards

– Bloom filters
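
The slide does not say how the Bloom filters are used. One plausible reading is that each read-only shard gets a filter, so a lookup can skip shards that definitely do not contain a key. A minimal sketch under that assumption follows. The class, its sizes, and the URL key are all hypothetical, not Balaur's actual code.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: never a false negative, occasionally a false positive."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a SHA-256 hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


# Hypothetical usage: one filter per shard, keyed by job URL.
shard_filter = BloomFilter()
shard_filter.add("http://example.com/jobs/1")
```

A "possibly present" answer still requires reading the shard. The gain is that a "definitely absent" answer avoids a sequential scan entirely.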

Page 15: Balaur @ Geekmeet

Pipeline

• Bash runs the jobs (in parallel / sequentially)

Page 16: Balaur @ Geekmeet

Testing

• import unittest

• Testing framework
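
As an illustration of the `import unittest` bullet, a test case in the standard library's style could look like the sketch below. The function under test, `normalize_title`, is a hypothetical helper invented for this example and is not taken from the Balaur code base.

```python
import unittest


def normalize_title(title):
    """Hypothetical helper: collapse whitespace and lowercase a job title."""
    return " ".join(title.split()).lower()


class NormalizeTitleTest(unittest.TestCase):
    def test_collapses_whitespace(self):
        self.assertEqual(normalize_title("  Python   Developer "), "python developer")

    def test_empty_title(self):
        self.assertEqual(normalize_title(""), "")
```

Saved in a file, the tests can be run with `python -m unittest <module name>`.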

Page 17: Balaur @ Geekmeet

Job Editor

glade

Page 18: Balaur @ Geekmeet

Logging

• 4 levels

– DEBUG, INFO, WARN, ERROR

• Why?

– Troubleshooting

– Profiling

– Goodies
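
A rough sketch of how the four levels might serve both troubleshooting and profiling, using Python's standard `logging` module. Note that Python names the WARN level `WARNING`. The logger name, `timed_stage`, and `parse_count` are hypothetical and only illustrate the idea.

```python
import logging
import time

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger("pipeline")  # hypothetical logger name


def timed_stage(name, func, *args):
    """Run one stage, logging its duration (profiling) and any failure (troubleshooting)."""
    start = time.perf_counter()
    log.debug("starting %s", name)
    try:
        result = func(*args)
    except Exception:
        log.error("stage %s failed", name, exc_info=True)
        raise
    log.info("stage %s finished in %.3fs", name, time.perf_counter() - start)
    return result


def parse_count(text):
    """Hypothetical stage: parse an integer, warning on empty input."""
    if not text.strip():
        log.warning("empty input, defaulting to 0")
        return 0
    return int(text)
```

Calling `timed_stage("parse", parse_count, "42")` emits a DEBUG line at the start and an INFO line with the elapsed time. Because the output has a fixed format, it can later be parsed to build a status view.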

Page 19: Balaur @ Geekmeet

Logging → Status

Page 20: Balaur @ Geekmeet

Xapian

• Pros

– Written in C/C++

– Small, compact API

– Very fast at search

– Actively developed

• Cons

– Slower at indexing

– Larger index (trends index ~4.5 GB)

Page 21: Balaur @ Geekmeet

[email protected]

• Custom relevance

• Custom spelling suggestions

• Duplicate removal

• Random shuffle
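
The slide does not explain how duplicates are detected or what exactly is shuffled. The sketch below shows one possible interpretation in plain Python, without the Xapian API: it keeps the best-scoring result per normalized title and randomizes the order among results with equal scores. The tuple layout and the function name are assumptions.

```python
import random


def dedupe_and_shuffle(results, rng=None):
    """results: list of (score, title, url) tuples in any order.

    Keeps the best-scoring result per normalized title, then returns the
    survivors sorted by descending score with ties in random order.
    """
    rng = rng or random.Random()
    seen = set()
    unique = []
    for score, title, url in sorted(results, key=lambda r: -r[0]):
        key = " ".join(title.lower().split())  # assumed duplicate criterion
        if key in seen:
            continue
        seen.add(key)
        unique.append((score, title, url))
    # Shuffle first, then stable-sort by score: equal scores stay shuffled.
    rng.shuffle(unique)
    unique.sort(key=lambda r: -r[0])
    return unique
```

Shuffling only within ties keeps the ranking intact while preventing the same source from always appearing first among equally relevant jobs.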

Page 22: Balaur @ Geekmeet

Show me the QPS

+ caching
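
The slide does not describe the caching layer. As a sketch of the general idea, an in-process cache in front of the search call could use `functools.lru_cache`. Here `run_query` is a stand-in for the real index lookup, and the normalization rule is an assumption.

```python
from functools import lru_cache

backend_calls = {"count": 0}  # counts how often the slow path runs


def run_query(query):
    """Hypothetical stand-in for the real (slow) index lookup."""
    backend_calls["count"] += 1
    return (f"result for {query}",)


@lru_cache(maxsize=1024)
def cached_search(normalized_query):
    # Results are returned as tuples so cached values cannot be mutated by callers.
    return run_query(normalized_query)


def search(query):
    # Normalize so trivially different spellings share one cache entry.
    return cached_search(" ".join(query.lower().split()))
```

With a read-only index, cached entries only become stale when the index is rebuilt, at which point `cached_search.cache_clear()` resets the cache.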

Page 23: Balaur @ Geekmeet

Questions?

http://balaur.ro/contact