Data Processing with Ruby Brian Chapados http://chapados.org SDRuby April 3, 2008

Processing Data with Ruby

DESCRIPTION

A brief overview of processing scientific data by using Ruby to interface with existing software.

Page 1: Processing Data with Ruby

Data Processing with Ruby

Brian Chapados
http://chapados.org

SDRuby
April 3, 2008

Page 2: Processing Data with Ruby

> Archaeglobus PCNA
MIDVIMTGELLKTVTRAIVALVSEARIHFLEKGLHSRAVDPANVAMVIVDIPKDSFEVYNIDEEKTIGVDMDRIFDISKSISTKDLVELIVEDESTLKVKFGSVEYKVALIDPSAIRKEPRIPELELPAKIVMDAGEFKKAIAAADKISDQVIFRSDKEGFRIEAKGDVDSIVFHMTETELIEFNGGEARSMFSVDYLKEFCKVAGSGDLLTIHLGTNYPVRLVFELVGGRAKVEYILAPRIESE

Understanding Proteins

sequence: 1-D linear chain

structure: 3-D after folding

Page 3: Processing Data with Ruby

Structures with several components are hard to determine

Page 4: Processing Data with Ruby

X-ray scattering

C. Trame, personal communication.
Sousa et al. 2000. Cell 103: 633-643.

Page 5: Processing Data with Ruby

Raw Data

Distance distribution function of particle

R           P(R)        ERROR
0.0000E+00  0.0000E+00  0.0000E+00
0.5000E+00  0.3157E-02  0.0000E+00
0.1000E+01  0.6069E-02  0.0000E+00
0.1500E+01  0.8740E-02  0.0000E+00
0.2000E+01  0.1118E-01  0.0000E+00
0.2500E+01  0.1339E-01  0.0000E+00
0.3000E+01  0.1538E-01  0.0000E+00
0.3500E+01  0.1718E-01  0.0000E+00
0.4000E+01  0.1879E-01  0.0000E+00
0.4500E+01  0.2023E-01  0.0000E+00
0.5000E+01  0.2153E-01  0.0000E+00
0.5500E+01  0.2269E-01  0.0000E+00
0.6000E+01  0.2374E-01  0.0000E+00
0.6500E+01  0.2471E-01  0.0000E+00
0.7000E+01  0.2560E-01  0.0000E+00
0.7500E+01  0.2645E-01  0.0000E+00
0.8000E+01  0.2727E-01  0.0000E+00
0.8500E+01  0.2809E-01  0.0000E+00
0.9000E+01  0.2891E-01  0.0000E+00
0.9500E+01  0.2976E-01  0.0000E+00
0.1000E+02  0.3065E-01  0.0000E+00
0.1050E+02  0.3160E-01  0.0000E+00
0.1100E+02  0.3261E-01  0.0000E+00
0.1150E+02  0.3370E-01  0.0000E+00
0.1200E+02  0.3487E-01  0.0000E+00
0.1250E+02  0.3613E-01  0.0000E+00
0.1300E+02  0.3747E-01  0.0000E+00
0.1350E+02  0.3890E-01  0.0000E+00
0.1400E+02  0.4041E-01  0.0000E+00
0.1450E+02  0.4201E-01  0.0000E+00
0.1500E+02  0.4367E-01  0.0000E+00
0.1550E+02  0.4539E-01  0.0000E+00
0.1600E+02  0.4717E-01  0.0000E+00
0.1650E+02  0.4899E-01  0.0000E+00
0.1700E+02  0.5083E-01  0.0000E+00
0.1750E+02  0.5268E-01  0.0000E+00
0.1800E+02  0.5453E-01  0.0000E+00
0.1850E+02  0.5636E-01  0.0000E+00
0.1900E+02  0.5815E-01  0.0000E+00
0.1950E+02  0.5989E-01  0.0000E+00
0.2000E+02  0.6157E-01  0.0000E+00
0.2050E+02  0.6317E-01  0.0000E+00
0.2100E+02  0.6467E-01  0.0000E+00
0.2150E+02  0.6607E-01  0.0000E+00
0.2200E+02  0.6735E-01  0.0000E+00
0.2250E+02  0.6851E-01  0.0000E+00
0.2300E+02  0.6954E-01  0.0000E+00
0.2350E+02  0.7043E-01  0.0000E+00
0.2400E+02  0.7118E-01  0.0000E+00
0.2450E+02  0.7179E-01  0.0000E+00
0.2500E+02  0.7225E-01  0.0000E+00
0.2550E+02  0.7258E-01  0.0000E+00
0.2600E+02  0.7277E-01  0.0000E+00
0.2650E+02  0.7283E-01  0.0000E+00
0.2700E+02  0.7277E-01  0.0000E+00
0.2750E+02  0.7259E-01  0.0000E+00
0.2800E+02  0.7231E-01  0.0000E+00
0.2850E+02  0.7194E-01  0.0000E+00
0.2900E+02  0.7149E-01  0.0000E+00
0.2950E+02  0.7096E-01  0.0000E+00
0.3000E+02  0.7038E-01  0.0000E+00
0.3050E+02  0.6975E-01  0.0000E+00
0.3100E+02  0.6909E-01  0.0000E+00
0.3150E+02  0.6840E-01  0.0000E+00
0.3200E+02  0.6770E-01  0.0000E+00
0.3250E+02  0.6700E-01  0.0000E+00
0.3300E+02  0.6630E-01  0.0000E+00
0.3350E+02  0.6561E-01  0.0000E+00
0.3400E+02  0.6494E-01  0.0000E+00
0.3450E+02  0.6429E-01  0.0000E+00
0.3500E+02  0.6366E-01  0.0000E+00
0.3550E+02  0.6304E-01  0.0000E+00
0.3600E+02  0.6245E-01  0.0000E+00
0.3650E+02  0.6186E-01  0.0000E+00
0.3700E+02  0.6128E-01  0.0000E+00
0.3750E+02  0.6070E-01  0.0000E+00
0.3800E+02  0.6010E-01  0.0000E+00
0.3850E+02  0.5948E-01  0.0000E+00
0.3900E+02  0.5881E-01  0.0000E+00
0.3950E+02  0.5810E-01  0.0000E+00
0.4000E+02  0.5731E-01  0.0000E+00
0.4050E+02  0.5643E-01  0.0000E+00
0.4100E+02  0.5545E-01  0.0000E+00
0.4150E+02  0.5434E-01  0.0000E+00
0.4200E+02  0.5309E-01  0.0000E+00
0.4250E+02  0.5168E-01  0.0000E+00
0.4300E+02  0.5008E-01  0.0000E+00
0.4350E+02  0.4828E-01  0.0000E+00
0.4400E+02  0.4627E-01  0.0000E+00
0.4450E+02  0.4401E-01  0.0000E+00
0.4500E+02  0.4151E-01  0.0000E+00
0.4550E+02  0.3874E-01  0.0000E+00
0.4600E+02  0.3568E-01  0.0000E+00
0.4650E+02  0.3234E-01  0.0000E+00
0.4700E+02  0.2869E-01  0.0000E+00
0.4750E+02  0.2472E-01  0.0000E+00
0.4800E+02  0.2044E-01  0.0000E+00
0.4850E+02  0.1583E-01  0.0000E+00
0.4900E+02  0.1088E-01  0.0000E+00
0.4950E+02  0.5608E-02  0.0000E+00
0.5000E+02  0.0000E+00  0.0000E+00

Reciprocal space: Rg = 20.97, I(0) = 0.2953E+02

Real space: Rg = 20.94 +- 0.026, I(0) = 0.2953E+02 +- 0.2278E+00
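
Output like this is easy to pull into Ruby for further processing. The following is a minimal sketch, assuming the program writes one "R P(R) ERROR" row per line. The method name parse_pr_table is hypothetical, and the three sample rows are copied from the data on this slide.

```ruby
# Parse whitespace-separated "R  P(R)  ERROR" rows into an array of hashes.
# Lines without exactly three numeric fields (titles, column headers,
# blank lines) are skipped.
def parse_pr_table(text)
  text.each_line.filter_map do |line|
    fields = line.split
    next unless fields.size == 3

    values = fields.map { |field| Float(field, exception: false) }
    next if values.any?(&:nil?)

    { r: values[0], p: values[1], error: values[2] }
  end
end

sample = <<~DATA
  R           P(R)        ERROR
  0.0000E+00  0.0000E+00  0.0000E+00
  0.2650E+02  0.7283E-01  0.0000E+00
  0.5000E+02  0.0000E+00  0.0000E+00
DATA

rows  = parse_pr_table(sample)
peak  = rows.max_by { |row| row[:p] }  # row where P(R) is largest
r_end = rows.last[:r]                  # last R value in the table

puts "P(R) peaks at R = #{peak[:r]}; table ends at R = #{r_end}"
```

Float(..., exception: false) requires Ruby 2.6 or later and filter_map requires Ruby 2.7 or later; on older versions, a rescue around Float and map followed by compact would do the same job.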

Page 6: Processing Data with Ruby

Existing Software

Svergun group @ EMBL
http://www.embl-hamburg.de/ExternalInfo/Research/Sax/software.html

“interactive” interfaces
not easily scriptable

Works well, but...

requires running each program multiple times

no really... you have to see it to believe it

Page 7: Processing Data with Ruby

Help from Ruby

We want to use Linux clusters with hundreds of CPUs

Ruby

wrap external programs
write shell scripts to run external programs

Rake

define relationships between inputs/outputs of different programs
launch external programs after dependencies are satisfied
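
The Rake side of this can be sketched with two chained file tasks. This is an illustration under stated assumptions, not the pipeline from the talk: the file names are hypothetical, and the Ruby interpreter running a one-line copy script stands in for the real analysis programs so that the example runs anywhere.

```ruby
require "rake"
require "rbconfig"
require "tmpdir"

# Stand-in for a real analysis program: the Ruby interpreter runs a one-line
# script that copies its input file to its output file. With real software,
# PROGRAM would be that program's command line instead.
PROGRAM = [RbConfig.ruby, "-e", "File.write(ARGV[1], File.read(ARGV[0]))"].freeze

WORK_DIR = Dir.mktmpdir("rake_sketch")
RAW      = File.join(WORK_DIR, "raw.dat")       # hypothetical input data
STEP_ONE = File.join(WORK_DIR, "step_one.out")  # output of the first program
STEP_TWO = File.join(WORK_DIR, "step_two.out")  # output of the second program

File.write(RAW, "0.0000E+00 0.0000E+00 0.0000E+00\n")

# A file task runs only when its target is missing or older than a prerequisite.
file STEP_ONE => RAW do |t|
  sh(*PROGRAM, t.prerequisites.first, t.name)
end

# The second program consumes the output of the first.
file STEP_TWO => STEP_ONE do |t|
  sh(*PROGRAM, t.prerequisites.first, t.name)
end

# Requesting the final output makes Rake run step one first, then step two.
Rake::Task[STEP_TWO].invoke
```

In a Rakefile the require lines and the explicit invoke are unnecessary: running rake with the target file name on the command line does the same thing, and a second run does nothing unless an input has changed.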

Page 8: Processing Data with Ruby

Do more with Ruby

Define input parameters in a script
Define common tasks in a library

quick and dirty...

more robust...

Evolve towards a micro-framework

Ruby API for running commands

More sophisticated information processing
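
A small Ruby API for running commands might look like the sketch below. The Runner module, the PARAMS hash, and its keys are hypothetical names introduced for illustration. A one-line Ruby script stands in for an interactive program, and Open3.capture3 supplies on standard input the answers such a program would otherwise prompt for.

```ruby
require "open3"
require "rbconfig"

# Hypothetical mini-library: run an external command, feed it the answers an
# interactive program would normally ask for, and fail loudly on error.
module Runner
  Result = Struct.new(:stdout, :stderr, :status)

  def self.run(*command, answers: [])
    # One answer per line, in the order the program would prompt for them.
    stdin_data = answers.map { |answer| "#{answer}\n" }.join
    stdout, stderr, status = Open3.capture3(*command, stdin_data: stdin_data)
    unless status.success?
      raise "#{command.first} exited with status #{status.exitstatus}: #{stderr}"
    end

    Result.new(stdout, stderr, status)
  end
end

# Input parameters defined in one place in the script (illustrative names).
PARAMS = { input_file: "sample.dat", r_max: 50.0 }.freeze

# Stand-in for an interactive program: it echoes each answer it reads.
fake_program = [RbConfig.ruby, "-e", 'STDIN.each_line { |l| puts "got #{l.chomp}" }']

result = Runner.run(*fake_program, answers: PARAMS.values)
puts result.stdout
```

Keeping the parameters in a hash and the process handling in one module is one way to move from quick-and-dirty scripts toward the more robust micro-framework the slide describes.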

Page 9: Processing Data with Ruby

Acknowledgements

Lab (Scripps Research Institute)

John Tainer
Scott Williams
Chris Putnam

Data Collection

Beamline 12.3.1

The Advanced Light Source (ALS, LBNL)

Funding

NIH, DOE, NCI