A Data/Detector Characterization Pipeline (What is it and why we need one)
Soumya D. Mohanty, AEI
January 18, 2001
Outline of the talk
• Functions of a Pipeline
• A Walk through a candidate pipeline
• Requirements: Issues
• Proposal for a plan of work
The functions of a pipeline
• Why have one?
– Understanding a new feature or establishing confidence in a detection will require a fair amount of manual work (human intensive).
– The large data rate (main + auxiliary channels) means an automated tool that helps focus our attention is essential.
• Definition: An automated tool to point out “interesting” segments.
– Not meant for detector commissioning stage data.
– Types: data/detector characterization, data preparation or conditioning.
– The design of these two types may not be cleanly separable.
– Byproducts: routine, uninteresting information (data summaries) to support data mining tasks.
• Open issue: What is interesting?
– An automated tool requires a precise definition of the interesting features.
– Examples: change in PSD, transients, change in cross-couplings, …
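One listed example, a change in the PSD, shows how such a definition can be made precise even with a very simple statistic. The sketch below is purely illustrative, not part of the proposed design (all function names and thresholds are made up): it flags segments whose band-limited power exceeds twice the median band power over all segments.

```python
import numpy as np

def band_power(x, fs, f_lo, f_hi):
    """Mean periodogram power of x in the band [f_lo, f_hi) Hz."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / (fs * len(x))
    band = (freqs >= f_lo) & (freqs < f_hi)
    return psd[band].mean()

def psd_change_flags(data, fs, seg_len, f_lo, f_hi, ratio=2.0):
    """Flag segments whose band power exceeds `ratio` times the
    median band power over all segments."""
    n_seg = len(data) // seg_len
    powers = np.array([band_power(data[i * seg_len:(i + 1) * seg_len],
                                  fs, f_lo, f_hi) for i in range(n_seg)])
    return powers > ratio * np.median(powers)

# Simulated example: Gaussian noise whose standard deviation doubles
# (i.e., the PSD quadruples) in the last 4 of 16 segments.
rng = np.random.default_rng(0)
fs, seg_len = 1024, 1024
data = np.concatenate([rng.normal(0, 1, 12 * seg_len),
                       rng.normal(0, 2, 4 * seg_len)])
flags = psd_change_flags(data, fs, seg_len, f_lo=10, f_hi=400)
print(flags)  # only the last 4 segments are flagged
```

A real tool would need a reference PSD estimate and a statistically characterized threshold rather than a fixed ratio; the point here is only that "interesting" must be reduced to a test this explicit before it can be automated.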
Pipeline: Not just a sum of its parts
• Simple example
– Transient test characterized without studying the effect on/of line noise.
– Line removal tool characterized without studying the effect on/of transients.
– When real data is passed through the line removal tool followed by the transient test, the result will differ from the transient test followed by line removal.
• There can exist other “cross-couplings” which will affect the overall performance of a pipeline.
• Computational costs need not be a simple sum of parts.
• Pipeline design and characterization will involve more than the study of tools in isolation.
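The ordering effect described above can be demonstrated on simulated data. In this toy sketch (deliberately simplistic stand-ins, not the actual pipeline tools), a strong spectral line inflates the noise estimate used by the transient test, so a weak transient is found only when line removal runs first:

```python
import numpy as np

rng = np.random.default_rng(1)
fs, n = 1024, 4096
t = np.arange(n) / fs

# Simulated data: unit-variance noise + a strong 60 Hz "line"
# + one weak 10-sample transient.
data = rng.normal(0, 1, n) + 5 * np.sin(2 * np.pi * 60 * t)
data[2000:2010] += 8.0

def remove_line(x, f0):
    """Toy line removal: least-squares fit and subtract a sinusoid at f0."""
    basis = np.column_stack([np.sin(2 * np.pi * f0 * t),
                             np.cos(2 * np.pi * f0 * t)])
    coef, *_ = np.linalg.lstsq(basis, x, rcond=None)
    return x - basis @ coef

def transient_count(x, k=5.0):
    """Toy transient test: count samples beyond k robust standard deviations."""
    med = np.median(x)
    sigma = 1.4826 * np.median(np.abs(x - med))
    return int(np.sum(np.abs(x - med) > k * sigma))

a = transient_count(remove_line(data, 60.0))  # line removal first: transient found
b = transient_count(data)                     # raw data: line hides the transient
print(a, b)
```

The two orderings give different answers on the same data, which is exactly why the pipeline must be characterized as a whole, not tool by tool.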
Analyzing pipeline performance
• Basic criteria: The pipeline should not raise too many false alarms. On the other hand, it should not miss interesting segments.
– Extremely reliable statistical characterization will be required.
• Open issue: Metrics for pipeline performance (or pipeline calibration).
– A metric must include: false alarm and detection rates, dependence on a priori modeling of the data, computational costs, …
– For data preparation pipeline: Calibrate by injecting GW signals into input.
– For data/detector characterization pipeline: ?
• Bottom Line: Lot of experience with simulated and real data is required.
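That experience can begin before the metrics are settled: the false-alarm and detection probabilities of any single test can be estimated by Monte Carlo on simulated data. A minimal sketch, using a toy threshold test and a toy injected transient (none of this is the proposed pipeline):

```python
import numpy as np

rng = np.random.default_rng(2)

def detector(segment, threshold):
    """Toy 'interesting segment' test: flag if max |sample| exceeds threshold."""
    return np.max(np.abs(segment)) > threshold

def monte_carlo_rates(threshold, n_trials=2000, seg_len=512, amp=4.0):
    """Estimate false-alarm and detection probabilities by simulation:
    noise-only segments for false alarms, segments with an injected
    single-sample transient of amplitude `amp` for detections."""
    false_alarms = detections = 0
    for _ in range(n_trials):
        noise = rng.normal(0, 1, seg_len)
        false_alarms += detector(noise, threshold)
        signal = noise.copy()
        signal[seg_len // 2] += amp
        detections += detector(signal, threshold)
    return false_alarms / n_trials, detections / n_trials

for thr in (3.0, 3.5, 4.0):
    pf, pd = monte_carlo_rates(thr)
    print(f"threshold={thr}: false-alarm rate={pf:.3f}, detection rate={pd:.3f}")
```

Sweeping the threshold traces out the trade-off between the two error rates; the open question above is how to extend such a calibration from one test to the whole pipeline.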
A Candidate Pipeline
• Design status: At the stage of a blueprint that can be implemented.
– Several new tools identified that need to be developed (e.g., a line removal method that is unaffected by transients).
– The blueprint is concrete enough to begin computational cost and statistical characterization studies.
• Origins
– The word “pipeline” has been used on several occasions (e.g., the LSC Data Analysis White Paper), but this is the first concrete design.
– 1999: SDM commissioned to design one as part of the 40m/TAMA coincidence analysis project.
• Important: A pipeline will affect planning for other data analysis components.
– Examples: software/hardware environment, user interfaces, a sophisticated database or simple sequential files, interfaces to DAQ, …
Data/Detector Characterization Pipeline
Requirements: Issues
• Computing.
– Should work online.
– Memory requirements might be non-trivial if database access overheads turn out to be large.
• Implementation Language and environment.
– Within LDAS (adapted to GEO)? Language: C++
– TRIANA? Java
– DMT? VEGA? C++
• Database. Not an issue confined to this pipeline alone.
– Need depends on what kind of data mining tasks will be required.
– Examples: (1) collect data with a particular type of transient; (2) store information about new types of features.
• Others.
– Lots of ideas and guidelines from users required for the design phase.
– Code writing and testing phase will be manpower intensive.
Proposal for a plan of work (fastest)
• Almost all components available in MATLAB.
• Use sequential files instead of relational database.
• Implement as a large MATLAB program.
• Come up with some metrics of performance.
• Test against simulated and some real data.
• If there is a coincidence run with LIGO, aim to produce X hours of characterized data using this MATLAB code.
• In the meantime, work on related issues and requirements definition.
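For the sequential-files option above, the data-mining query mentioned earlier ("collect data with a particular type of transient") reduces to a linear scan over per-segment summary records. A sketch of what such records and a query might look like (the field names are invented for illustration; nothing here reflects an agreed format):

```python
import csv, io

# Toy segment-summary records: one line per analyzed segment,
# written sequentially instead of into a relational database.
summary = io.StringIO()
writer = csv.DictWriter(summary,
                        fieldnames=["gps_start", "gps_end", "feature", "snr"])
writer.writeheader()
writer.writerows([
    {"gps_start": 100, "gps_end": 116, "feature": "transient",  "snr": 8.2},
    {"gps_start": 116, "gps_end": 132, "feature": "psd_change", "snr": 5.1},
    {"gps_start": 132, "gps_end": 148, "feature": "transient",  "snr": 12.7},
])

# Data-mining query: collect all segments with a particular feature type.
summary.seek(0)
hits = [row for row in csv.DictReader(summary) if row["feature"] == "transient"]
print([row["gps_start"] for row in hits])  # → ['100', '132']
```

Such scans are adequate for the fast-track plan; queries that combine many feature types over long stretches of data are where a real database would start to pay off.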
Conclusions
• The large amount of data makes a pipeline necessary in order to direct our attention to where it is really required.
• Pipeline design and characterization requires more than listing tools and studying them in isolation.
• Pipeline design can identify missing tools and features.
• A concrete design now exists.
• Several candidate pipelines must be generated and compared.
• What is interesting? Guidelines, ideas and experience with real data are required to evolve an answer.