Highlights from EPC 2006 Vincenzo Innocente On behalf of the Local Organizing Committee

Highlights from EPC 2006

Vincenzo Innocente

On behalf of the

Local Organizing Committee

EuroPython at CERN

• EuroPython conference organized by SFT this year!

• Three days
  – Parallel sessions in Building 40
  – Keynotes and “Lightning” talks in the main auditorium
  – Dinner in the Globe

• 280 participants
• 100 presentations (without lightning talks)
  – 5 by “CERN”

Schedule

• 4 parallel sessions (in Building 40)
  – All synchronized
  – 5-minute pause between talks
  – Easy for people to move from one session to another

• Plenary lightning talks and keynotes (in the Main Amphitheatre)

Scientific Program

• 7 tracks
  – Python in Science
  – Python Language & Libraries
  – Agile Development
  – Web Frameworks
  – Business and Applications
  – Teaching
  – Games and Entertainment

Community

• Who
  – Wide age spectrum
    • Many in the post-doc age range
  – All 5 continents
  – Very few women (1-2%, all managers?)

• Where
  – Mostly companies developing software solutions
    • Revenue from selling custom products or services
    • Find business advantages
      – In using open source software (and contributing to its development)
      – In developing components reusable beyond a specific project
  – Some research labs
    • Domain-specific applications
    • Reuse in the community (adapting to pre-existing “habits”)

Community

• What:
  – Core language development
  – Web frameworks, web applications
  – Software development tools (web based)
  – Scientific data processing, visualization
  – No sys-admin, net-admin, embedded software, office automation

• Why:
  – Hear news about the language, libraries, key products (Zope, …)
    • Discuss, propose, complain
  – Present their products
    • In many cases just a spin-off component
  – Work (in Sprint sessions)

Messages

• Python:
  – A language for rapid prototyping, extreme programming, just-in-time deployment
  – THE integration framework
  – THE Business Domain Language
  – THE embedded scripting language

Python is faster than Assembler

Outline

• What I will not cover
  – Latest greatest features of Python
  – Python 3000
  – SciPy, PyTables, PyPy, Zope, Plone, Django, …
  – Python in HEP
  – Google…

• I will focus on
  – Python: a framework for scientific applications
  – Building and sharing components
  – Python: from fast prototyping to engineered code
  – Dispersed development

Scientific Frameworks

MGL Tools

Independent and reusable components for structural bioinformatics

Pyphant

Pyphant application

Pyphant architecture

Worker Code

Building & Sharing Components

Builds upon SciPy (data representation) and HDF5 (I/O layer)

The Company

The Customer

The “new” Components

• For this customer they had two additional requirements to fulfill:
  – Avoid bloating the CMS with binary files
  – Count the number of accesses

• They developed two lightweight products
  – Plug into the deployed solution
    • Reuse the existing infrastructure
  – Reusable outside this project and company
  – Extendable to other architectures/frameworks
  – Contribution to open source software

Tramline

• Tramline plugs in between Apache and Plone/Zope

• On upload:
  – Extract data to disk
  – Assign id
  – Store id in Zope

• On download:
  – Replace id with file content
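
The upload/download flow above can be sketched in a few lines of Python. This is only an illustration of the idea, not Tramline's actual code: the `FileStore` class, its method names and the on-disk layout are hypothetical, and a plain dict stands in for the Zope object database.

```python
import tempfile
import uuid
from pathlib import Path


class FileStore:
    """Hypothetical stand-in for a filter sitting between Apache and Zope."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def on_upload(self, data: bytes) -> str:
        """Write the payload to disk and return the short id the CMS will store."""
        file_id = uuid.uuid4().hex
        (self.root / file_id).write_bytes(data)
        return file_id

    def on_download(self, file_id: str) -> bytes:
        """Swap the id kept in the CMS for the real file content."""
        return (self.root / file_id).read_bytes()


# A dict plays the role of the CMS database: it only ever sees the id.
cms = {}
store = FileStore(tempfile.mkdtemp())
payload = b"\x00" * 1_000_000  # a large binary upload
cms["report.pdf"] = store.on_upload(payload)
restored = store.on_download(cms["report.pdf"])
```

The point of the design is that the object database holds a 32-character id instead of a megabyte of binary data, which addresses the first customer requirement.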

Linktally

• Scan logs
• Count requests
• Store in the DB as metadata
• Rank content in CMS
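
The four steps above might look roughly like this. It is a sketch under assumptions, not Linktally's implementation: the log lines follow the Apache common log format, the function names `count_requests` and `rank_content` are made up for illustration, and a dict stands in for the CMS database.

```python
import re
from collections import Counter

# Picks the path and status code out of the request part of an access-log line.
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')


def count_requests(log_lines):
    """Count successful (2xx) requests per URL path."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match and match.group(2).startswith("2"):
            counts[match.group(1)] += 1
    return counts


def rank_content(counts, metadata):
    """Store each count as metadata and return paths ordered by popularity."""
    for path, hits in counts.items():
        metadata.setdefault(path, {})["hits"] = hits
    return [path for path, _ in counts.most_common()]


sample_log = [
    '10.0.0.1 - - [03/Jul/2006:10:00:00 +0200] "GET /docs/a.pdf HTTP/1.1" 200 5120',
    '10.0.0.2 - - [03/Jul/2006:10:00:05 +0200] "GET /docs/b.pdf HTTP/1.1" 200 2048',
    '10.0.0.3 - - [03/Jul/2006:10:00:09 +0200] "GET /docs/a.pdf HTTP/1.1" 200 5120',
    '10.0.0.4 - - [03/Jul/2006:10:00:12 +0200] "GET /docs/c.pdf HTTP/1.1" 404 0',
]
metadata = {}  # stands in for the CMS database
counts = count_requests(sample_log)
ranking = rank_content(counts, metadata)
```

Working from the logs rather than instrumenting the CMS keeps the counter independent of the framework, which is what makes such a component reusable elsewhere.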

LinkTally status & prospects

• Now
  – Solution for one customer
  – Limited spin-off

• Evolution
  – Contribution from community
  – Spin-in: use it in other projects!

From a prototype to a product

The Indico Technology

• Main programming language: Python
• Runs on Apache using the Python module mod_python
• Persistence based on ZODB (Zope Object Database)
  – Transparency: no need for explicit reads/writes of the objects
  – Fits very well with Indico's complex object model
  – Proven performance and scalability
• Timetable generation: libXML, libXSLT + Python bindings
• Portable technologies: runs on Windows, Linux
• Export gateways:
  – iCalendar, XML, PDF outputs
  – OAI (Open Archives Initiative) for ensuring integration with other services
    • Standard protocol for information exchange between digital libraries
    • Allows exposing conference data
    • Allows other systems to fetch conference data and build services over it
    • Simple mechanism: XML over HTTP
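
To make "XML over HTTP" concrete, here is a minimal sketch of the client side of an OAI-PMH exchange using only the standard library. The base URL `http://example.org/oai`, the record identifiers and the helper names are hypothetical; the `ListRecords` verb, the `oai_dc` metadata prefix and the XML namespace come from the OAI-PMH 2.0 protocol. No network request is made: the response is a hand-written sample.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_request(base_url, verb, **params):
    """An OAI-PMH request is just a URL with a verb and a few arguments."""
    return base_url + "?" + urlencode({"verb": verb, **params})


def record_identifiers(response_xml):
    """Extract the identifier of every record in a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.iterfind(".//oai:header/oai:identifier", OAI_NS)]


# Hypothetical repository address, for illustration only.
url = build_request("http://example.org/oai", "ListRecords", metadataPrefix="oai_dc")

# Hand-written sample of what a repository could send back.
sample_response = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:conf/44</identifier></header></record>
    <record><header><identifier>oai:example.org:conf/45</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""
ids = record_identifiers(sample_response)
```

Any system that can issue an HTTP GET and parse XML can harvest the data this way, which is why the slide calls the mechanism simple.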

The Invenio Technology

• Main programming language: Python
• Runs on Apache using the Python module mod_python
• Uses MySQL RDBMS
  – Takes advantage of a fully featured query language
• Invenio home-made indexes
• Internal representation with XML-MARC
• Export gateways:
  – Multiple output formats: HTML, XML, MARC, OAI, DC, etc.
• Some modules:
  – Still in PHP (slowly being moved to Python)
  – Some in Common Lisp (BibCheck)

Index Space Design (II)

• Two important speed factors to consider:
  – speed of set intersections (Web App Server)
  – speed of set marshalling (Web App <-> DB Server)

• Data structures tested:
  – sorted (lists, Patricia trees)
  – unsorted (hashed sets, binary vectors)

• Fast prototyping (Python):
  – throw-away coding, organic-growth software

• Development model:
  – typical search time gain: 4.0 sec → 0.2 sec
  – typical indexing time loss: 7 hours → 4 days
  – binary vectors found to be the best compromise (for all types of sets)
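
A small sketch of why binary vectors make set intersection cheap: each record id becomes one bit, and intersecting two result sets is a single bitwise AND. Using a Python integer as the bit vector is an assumption made here for brevity; the slides do not show the actual data layout, and the helper names are illustrative.

```python
def to_bitvector(record_ids):
    """Pack a collection of non-negative record ids into one integer bitmask."""
    vector = 0
    for rid in record_ids:
        vector |= 1 << rid
    return vector


def from_bitvector(vector):
    """Unpack the bitmask back into a sorted list of record ids."""
    ids = []
    position = 0
    while vector:
        if vector & 1:
            ids.append(position)
        vector >>= 1
        position += 1
    return ids


# Two hypothetical word-index hit lists, e.g. one per search term.
hits_author = {2, 5, 8, 13, 21}
hits_title = {1, 5, 13, 34}

# Hashed sets and bit vectors give the same answer; only the cost differs.
as_sets = sorted(hits_author & hits_title)
as_bits = from_bitvector(to_bitvector(hits_author) & to_bitvector(hits_title))
```

The trade-off matches the slide: building and storing the vectors costs indexing time, while the search-time operation stays short.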

Performance Benchmarks (2002)

• Testing marshalling/intersection/union/unmarshalling
• Bytecode-interpreted language study (Python, Java):
  – Python faster than Java (mainly due to marshalling)
• Machine-code-compiled language study (ML, Lisp):
  – OCaml, CMU CL: 3+ times faster than Python C libs
  – CMU CL scales best: intersecting 6M records in 0.01 sec, 30M records in 0.04 sec
• Data structure study:
  – OCaml, 3,000,000 records: bit vectors 0.43 sec, hashed sets 1.71 sec, lists 3.76 sec; Patricia trees do not scale well for dense sets
• Python fast enough for production (1M records)
  – fast C modules: Numeric (byte/bit), Marshal, Psyco
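
The benchmarked pipeline (unmarshal, intersect, union, marshal) can be reproduced in miniature with the standard `marshal` module. This is a sketch, not the 2002 benchmark: the data sizes, the use of hashed sets and the function name `run_pipeline` are assumptions, and the timing it prints depends on the machine.

```python
import marshal
import time


def run_pipeline(blob_a, blob_b):
    """Unmarshal two id lists, combine them, and marshal the results again."""
    a = set(marshal.loads(blob_a))
    b = set(marshal.loads(blob_b))
    both = sorted(a & b)    # intersection
    either = sorted(a | b)  # union
    return marshal.dumps(both), marshal.dumps(either)


# Two synthetic hit lists as they might arrive from the database server.
blob_a = marshal.dumps(list(range(0, 100_000, 2)))
blob_b = marshal.dumps(list(range(0, 100_000, 3)))

start = time.perf_counter()
both_blob, either_blob = run_pipeline(blob_a, blob_b)
elapsed = time.perf_counter() - start

both = marshal.loads(both_blob)
either = marshal.loads(either_blob)
print(f"{len(both)} common ids, {len(either)} total ids, {elapsed:.4f} s")
```

Timing the whole round trip rather than the intersection alone reflects the slide's observation that marshalling cost can dominate the comparison between languages.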

The + of Python

• Clean, aesthetic language
• Easy to learn, important for the many internship students and temporary members working on the project
• Very good for rapid prototyping & organic-growth development
• Plenty of ready-to-use modules
• Bytecode-compiled only, speed okay for our needs

Use Python?

Dispersed Teams

Dispersed teams

At Last

What I Learned

• Python is not just a language for scripting and glue code
  – Fully fledged, highly engineered frameworks can be written in Python
• Frameworks and component architectures are established practices
  – Frameworks tend to be domain specific
  – All are very similar to each other and share many design patterns
    • Many concepts common to modern HEP-framework architectures
• Business Domain Languages are essential:
  – Python has the expressive power to implement them

What I learned

• What can be reused?
  – Experience, patterns
    • Provided one has a common “culture”
  – Low-level components
  – Plugin components
    • Provided that the interface is NOT business-domain specific

• LHC is no longer at the frontier of distributed collaboration

• There are individuals/labs/companies which value
  – Sharing information
  – Building reusable software components
  – Cooperating in developing the basic building blocks
  – Becoming a community around such a common ground

More?

• Visit
  – http://vanrees.org/weblog/topics/europython
  – http://indico.cern.ch/conferenceDisplay.py?confId=44
  – http://www.europython.org/
  – http://www.google.com/search?q=europython