dan-starr
Talk about the Transients Classification Pipeline (T.C.P.) for CDI's inter-departmental workshop at UC Berkeley, 2009-09-11.
, Justin Higgins, Adam Morgan
Josh Bloom (PI)
Transients Classification Pipeline
[Diagram: “Object” datastream, Database, Classify, Broadcast]
Broadcast “sources”:
• interesting or transient sources
• include classifications
• include features, context
Database containing “sources”:
• features for a source
• data epochs associated with a source
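A minimal sketch of what a database “source” record and its broadcast message might hold, assuming a simple in-memory representation. The class names `Epoch` and `Source`, their fields, and the `to_broadcast` JSON layout are hypothetical illustrations, not TCP's actual schema.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Epoch:
    """One observation epoch associated with a source (hypothetical fields)."""
    t: float        # observation time, e.g. MJD
    mag: float      # measured magnitude
    mag_err: float  # 1-sigma magnitude uncertainty
    band: str       # filter name


@dataclass
class Source:
    """A "source": epochs clustered at one sky position, plus derived data."""
    source_id: int
    ra: float   # degrees
    dec: float  # degrees
    epochs: list = field(default_factory=list)           # list of Epoch
    features: dict = field(default_factory=dict)         # feature name -> value
    context: dict = field(default_factory=dict)          # e.g. nearby-galaxy info
    classifications: dict = field(default_factory=dict)  # class name -> probability

    def to_broadcast(self) -> str:
        """Serialize the summary a broadcast of this source might carry."""
        return json.dumps({
            "source_id": self.source_id,
            "ra": self.ra,
            "dec": self.dec,
            "n_epochs": len(self.epochs),
            "features": self.features,
            "context": self.context,
            "classifications": self.classifications,
        }, sort_keys=True)


src = Source(source_id=1, ra=150.0, dec=2.0)
src.epochs.append(Epoch(t=55070.2, mag=18.3, mag_err=0.05, band="R"))
src.features["amplitude"] = 0.5
src.context["near_known_galaxy"] = True
src.classifications["supernova"] = 0.8
print(src.to_broadcast())
```

Keeping the raw epochs in the database but broadcasting only an epoch count plus derived features, context, and classifications keeps the broadcast message small.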
Input surveys:
• Survey Y (static survey repository): SDSS Stripe-82 archived data
• Survey X (real-time survey telescope): PTF / LBL subtraction pipeline
• LSST (future)
• SASIR (future)
SDSS Stripe 82
• A deep field from the Sloan Digital Sky Survey
• 750 million observation epochs
• ~20 million “sources” clustered from epochs
• 5 colors / filters, 4 years of observations
• We used Stripe 82 for testing and development
Palomar Transient Factory
• Palomar 48” telescope
• 100 Mpix, 7.8 sq-deg detector
• ~120 s cadence: ~200 MB per image, <100 GB/night
• Post subtraction: ~1M difference objects / night
• Post filtering: ~10k difference objects / night
• ~100s of transients and variable stars
[Diagram: LBL subtraction pipeline, TCP, PTF consortium; follow-up telescopes: MDM 1.3m & 2.4m, PAIRITEL 1.3m, Palomar 60”]
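A back-of-the-envelope check of the PTF numbers above. It assumes 16-bit (2-byte) raw pixels and an 8-hour observing night; both assumptions are ours, not figures from the talk.

```python
# Hypothetical assumptions (not from the talk): 2 bytes per pixel, 8-hour night.
PIXELS = 100e6       # 100 Mpix detector
BYTES_PER_PIXEL = 2  # assumed 16-bit raw pixels
CADENCE_S = 120      # ~120 s per exposure
NIGHT_HOURS = 8      # assumed usable night length

image_mb = PIXELS * BYTES_PER_PIXEL / 1e6
exposures = NIGHT_HOURS * 3600 / CADENCE_S
night_gb = image_mb * exposures / 1e3
print(f"{image_mb:.0f} MB/image, {exposures:.0f} exposures, {night_gb:.0f} GB/night")
# -> 200 MB/image, 240 exposures, 48 GB/night

# Filtering funnel: fraction of difference objects surviving filtering each night.
post_subtraction = 1_000_000
post_filtering = 10_000
print(f"filtering keeps {post_filtering / post_subtraction:.1%}")
# -> filtering keeps 1.0%
```

Under these assumptions the ~200 MB per image and <100 GB/night figures are mutually consistent, and only about 1% of difference objects reach the classification stage.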
Next Generation Survey: LSST
• Large Synoptic Survey Telescope (LSST): 1 Gb every 2 seconds
• Light curves of 800 million sources every 3 days
• 10⁶ supernovae/yr, 10⁵ eclipsing systems, 10⁷ asteroids, ...
[Diagram: LSST, TCP]
Transients Classification Pipeline (TCP)
[Diagram: “Object” datastream; source generation; feature generation; source classification; Database; Broadcast; follow-up telescope observations]
Parallelized source correlation and classification
• Retrieve difference objects
• Each difference object is passed to an IPython client
• Each parallel IPython client performs:
• Source creation or correlation with existing sources
• “Feature” generation (or re-generation) for that source
• Classification of that source
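A minimal sketch of the per-object steps listed above: correlate a difference object with existing sources by sky position, regenerate features, and classify. It assumes a read-only snapshot of existing sources. Python's standard `concurrent.futures` thread pool stands in for the IPython clients described in the talk (for CPU-bound work a process pool or an IPython cluster would be the realistic choice). The 1-arcsecond match radius, the two toy features, and the threshold "classifier" are hypothetical placeholders.

```python
import math
from concurrent.futures import ThreadPoolExecutor

MATCH_RADIUS_ARCSEC = 1.0  # hypothetical association radius


def ang_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation via the haversine formula; inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600


def correlate(obj, sources):
    """Return the id of the nearest source within the radius, or None (new source)."""
    best_id, best_sep = None, MATCH_RADIUS_ARCSEC
    for sid, (ra, dec, _mags) in sources.items():
        sep = ang_sep_arcsec(obj["ra"], obj["dec"], ra, dec)
        if sep <= best_sep:
            best_id, best_sep = sid, sep
    return best_id


def features(mags):
    """Toy features; a real feature set would be far richer."""
    return {"n": len(mags), "amplitude": max(mags) - min(mags)}


def classify(feats):
    """Placeholder rule standing in for the trained classifiers."""
    return "variable" if feats["n"] > 1 and feats["amplitude"] > 0.3 else "unknown"


def process(obj, sources):
    """Correlate, (re)generate features, and classify one difference object."""
    sid = correlate(obj, sources)
    prior_mags = list(sources[sid][2]) if sid is not None else []
    feats = features(prior_mags + [obj["mag"]])
    return sid, feats, classify(feats)


# Snapshot of existing sources: id -> (ra_deg, dec_deg, past magnitudes).
sources = {1: (150.0, 2.0, [17.9, 18.4])}
objects = [
    {"ra": 150.0001, "dec": 2.0, "mag": 18.3},  # ~0.36 arcsec away: matches source 1
    {"ra": 151.0, "dec": 2.0, "mag": 19.0},     # far from any source: new source
]

with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(lambda o: process(o, sources), objects))
print(results)
```

Because each worker only reads the snapshot, objects can be processed independently; writing new or updated sources back to the database would need to be serialized separately.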
Parallelized source correlation and classification
• Real-time TCP runs on 22 dedicated cores
• LCOGT’s 96-core Beowulf cluster
• Non-run-time tasks
• Classifier generation
• Additional resources (for future classification work):
• Yahoo! M45 cluster
• Amazon EC2 cluster
Warehouse of light-curves
• We need representative light-curves for every science class
• With these we can model each science class
• We’ve built a warehouse of example light-curves
• TCP-TUTOR: internal interface
• DotAstro.org: public interface
“Noisifying to the Survey”
• Well-sampled light-curves:
• can produce good classifiers for well-sampled data,
• but don’t immediately produce good classifiers for noisy, sparse data.
• We need classifiers which are trained using:
• sampling cadence of our survey
• sparseness of our survey data
• noise and sensitivity limitations of our instrument
• We need “Noisification” software which:
• Resamples well-sampled light-curves
• Outputs noisified sources which are used for generating classifiers
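A minimal sketch of the noisification idea: resample a well-sampled light-curve at a survey's observation times, add Gaussian photometric noise, and drop epochs fainter than the limiting magnitude. Linear interpolation, the constant per-epoch error `sigma_mag`, the 20.5 mag `limit_mag`, and the function name `noisify` are hypothetical simplifications, not PTF's actual noise model.

```python
import numpy as np


def noisify(t_model, mag_model, t_survey, sigma_mag=0.05, limit_mag=20.5, seed=0):
    """Resample a well-sampled light-curve onto survey epochs and add noise.

    t_model, mag_model: the well-sampled light-curve (t_model sorted).
    t_survey: observation times of the target survey cadence.
    Returns (times, noisy magnitudes, errors) for detectable epochs only.
    """
    rng = np.random.default_rng(seed)
    t_survey = np.asarray(t_survey, dtype=float)
    mag_true = np.interp(t_survey, t_model, mag_model)  # resample onto survey times
    mag_obs = mag_true + rng.normal(0.0, sigma_mag, size=mag_true.shape)
    detected = mag_obs < limit_mag  # sensitivity cut: larger magnitude = fainter
    errs = np.full(int(detected.sum()), sigma_mag)
    return t_survey[detected], mag_obs[detected], errs


# Hypothetical example: a 0.5-day sinusoidal variable near the detection limit.
t_model = np.linspace(0.0, 10.0, 5000)                       # densely sampled, 10 days
mag_model = 20.3 + 0.5 * np.sin(2 * np.pi * t_model / 0.5)
t_survey = np.sort(np.random.default_rng(1).uniform(0.0, 10.0, 15))  # 15 sparse epochs
t, m, e = noisify(t_model, mag_model, t_survey)
print(len(t), "of", len(t_survey), "epochs survive the limiting-magnitude cut")
```

Running the same well-sampled light-curve through different epoch counts or cadences yields the separate noisified training sets from which the classifiers are generated.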
“Noisifying to the Survey”
• For PTF:
• Code uses PTF pointing and survey observing plans
• Occasionally PTF observes using a faster cadence:
• 7.5 minutes between revisits of a given (RA, Dec)
• Faster cadence requires a separate set of noisified light-curves and classifiers.
• Other surveys:
• Other pointing and observing plans could be used.
• Can generate noisified light-curves for other surveys.
• Then we can generate science classifiers for these surveys.
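Because a noisified training set must match the survey's sampling, each cadence needs its own set of observation times. A minimal sketch of generating regularly spaced revisit times for one field, assuming a simplified schedule; the real code uses PTF's pointing and observing plans, and the function name `survey_times`, the visit counts, and the 60-minute spacing of the "standard" example are hypothetical.

```python
import numpy as np


def survey_times(n_nights, visits_per_night, revisit_minutes, night_gap_days=1.0):
    """Observation times (days) for one (RA, Dec): several revisits per night,
    spaced revisit_minutes apart, repeated every night_gap_days."""
    nights = np.arange(n_nights) * night_gap_days
    offsets = np.arange(visits_per_night) * revisit_minutes / (24 * 60)
    # Outer sum: every night start plus every within-night offset, flattened in time order.
    return (nights[:, None] + offsets[None, :]).ravel()


# Hypothetical schedules: a standard cadence and the 7.5-minute fast cadence.
standard = survey_times(n_nights=10, visits_per_night=2, revisit_minutes=60)
fast = survey_times(n_nights=1, visits_per_night=8, revisit_minutes=7.5)
print(len(standard), "standard epochs;", len(fast), "fast epochs")
print(np.diff(fast) * 24 * 60)  # minutes between fast-cadence revisits
```

Each time array would then drive a separate noisification run, which is why the faster cadence ends up with its own light-curves and classifiers.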
Classifiers
• General classifier
• Time-series classifiers
• Weighted combination of WEKA classifiers
• Bagged Random Forest classifier using a cost matrix
• Each classifier trained on noisified data of a different cadence
• Astronomer-crafted classifiers for specific science types
Identify:
• Microlensing events, supernovae
• Well-sampled (periodic & non-periodic) sources
• Interesting sources near known galaxies
• Periodic variable science classes, when confidence is high
Filter out:
• Poorly subtracted sources
• Minor planets / rocks
• Cosmic rays
• Detector defects
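The slide mentions a bagged Random Forest using a cost matrix. One common way a cost matrix enters a decision is to pick the class with the minimum expected misclassification cost given the predicted probabilities, rather than simply the most probable class. The slides do not say whether TCP applied costs this way or by reweighting training data; the class names, probabilities, and cost values below are hypothetical.

```python
import numpy as np

CLASSES = ["RR Lyrae", "supernova", "cosmic ray"]

# cost[i, j]: cost of predicting class j when the true class is i (hypothetical values).
# Discarding a real supernova as a cosmic ray is made five times as expensive.
COST = np.array([
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 5.0],
    [1.0, 1.0, 0.0],
])


def min_expected_cost(probs, cost=COST):
    """Pick the prediction j minimizing sum_i P(true = i) * cost[i, j]."""
    expected = np.asarray(probs) @ cost  # expected cost of each possible prediction
    return CLASSES[int(np.argmin(expected))], expected


probs = [0.1, 0.3, 0.6]  # the most probable class is "cosmic ray"
label, expected = min_expected_cost(probs)
print(label, expected)
```

Here the expected costs are 0.9, 0.7, and 1.6, so the cost matrix turns a "cosmic ray" argmax into a "supernova" prediction: an asymmetric cost matrix trades a few extra false alarms for fewer missed transients.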
Interesting near-galaxy PTF sources
• Identified by TCP in late August 2009
• Classification triggered by the latest epoch added to the source
[Screenshot: ~0.4-day-period RR Lyrae, 10-epoch noisification]
• Currently, a source's science class is determined by combining the weighted probabilities generated by different classification models.
• Each machine-learned classification model is trained using “noisified” light-curves generated with different parameters.
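A minimal sketch of combining weighted class probabilities from several models into one distribution, assuming each model outputs probabilities over class names and that the per-model weights are already known. The model names, weights, and probabilities are hypothetical.

```python
def combine(model_probs, weights):
    """Weighted sum of per-model class probabilities, renormalized to sum to 1.

    model_probs: {model_name: {class_name: probability}}
    weights:     {model_name: non-negative weight}
    """
    combined = {}
    for model, probs in model_probs.items():
        w = weights[model]
        for cls, p in probs.items():
            combined[cls] = combined.get(cls, 0.0) + w * p
    total = sum(combined.values())
    if total == 0:
        raise ValueError("all weights or probabilities are zero")
    return {cls: v / total for cls, v in combined.items()}


# Hypothetical models trained on different noisification settings.
model_probs = {
    "noisified_10_epoch": {"RR Lyrae": 0.7, "eclipsing binary": 0.3},
    "noisified_20_epoch": {"RR Lyrae": 0.4, "eclipsing binary": 0.6},
}
weights = {"noisified_10_epoch": 1.0, "noisified_20_epoch": 3.0}
combined = combine(model_probs, weights)
print(combined)
```

Renormalizing at the end keeps the result a valid distribution even when some models only score a subset of the classes.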
[Screenshot: 0.1-0.17-day-period RR Lyrae, 15-epoch noisification]
Clicking on a class for one of dozens of ML models shows the sources with the highest classification probability for that model::class.
[Screenshot: ~0.14-day-period RR Lyrae, 20-epoch noisification]
Periodic variable classifiers
[Annotations: “Overplotting of period-folded model still needs work”; “Period-fold plotting probably failed here”]
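Period folding, which the annotated plots rely on, maps each observation time to a phase in [0, 1) for a given period so that all cycles overlap. A minimal sketch, assuming the period and reference epoch `t0` are already known (for example from a period-finding step not shown); the function name `fold` and the sample times are hypothetical.

```python
import numpy as np


def fold(t, period, t0=0.0):
    """Return phases in [0, 1) and the index order for plotting a folded light-curve."""
    phase = np.mod(np.asarray(t, dtype=float) - t0, period) / period
    order = np.argsort(phase)
    return phase, order


t = np.array([0.0, 0.1, 0.45, 0.7, 1.35])  # observation times in days
phase, order = fold(t, period=0.4)
print(np.round(phase, 3), order)
```

Plotting magnitudes against `phase[order]` gives the folded light-curve; a wrong period scatters points across all phases, which is one way a folded plot can fail.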
Evaluating and Combining Classifiers
• Issues when using multiple classifiers:
• How to combine classifiers when using:
• weighted classifiers
• tree-hierarchy of sub-classifiers
• How to generate final classification “probabilities” when using:
• Widely varying types of classifiers
• Classifiers which contain sub-classifications & probabilities
• Evaluate the final combination of classifiers
• Classify user-classified PTF09xxx sources; determine efficiencies
• Classify noisified sources, determine efficiencies
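A minimal sketch of the efficiency bookkeeping. It assumes "efficiency" means, for each class, the fraction of sources whose true class (user-assigned, or the input class of a noisified light-curve) the combined classifier reproduces, i.e. per-class recall. The labels below are hypothetical.

```python
from collections import Counter


def efficiencies(true_labels, predicted_labels):
    """Per-class efficiency: correctly classified / total sources of that true class."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {cls: hits[cls] / n for cls, n in totals.items()}


truth = ["SN", "SN", "SN", "RR Lyrae", "RR Lyrae", "microlens"]
pred = ["SN", "SN", "microlens", "RR Lyrae", "SN", "microlens"]
eff = efficiencies(truth, pred)
print(eff)
```

Running the same function on user-classified sources and on noisified sources gives two efficiency estimates that can be compared directly.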