ETC & Authors in the Driver’s Seat vs
YesWorkflow: Revealing data-/workflow from scripts
Kurator: Automating data curation workflows
EulerX: Agreeing to disagree about taxonomies
Whole-Tale: Reproducible, computational narratives
Bertram Ludäscher, ludaesch@illinois.edu
ETC + Authors @ Biosphere 2, 2018-01-10..12
Director, Center for Informatics Research in Science & Scholarship (CIRSS), School of Information Sciences (iSchool@Illinois)
& National Center for Supercomputing Applications (NCSA) & Department of Computer Science (CS@Illinois)
Authors Driving ...
• Curators: dealing with problems of data quality, reuse, interoperability, etc. as soon as they can, but often "down the road" ...
• Authors: address (meta-)data quality upstream, at the source, when data is created
=> Resonates with the "empowering scientists" theme we're pursuing in other projects (e.g. WT, YW, ...)
Ludäscher: Workflows & Provenance => Understanding
Provenance (Lineage) matters ...
• One of these sold for $180M, the other one for $22K (but could be worth more ... definitely maybe ...)
• Which one would you like to own?
Provenance (Lineage) matters ...
• One of these sold for $180M, the other one for ...
• ... $450M!!!
Provenance is: keeping records ...
• Grand Canyon's rock layers are a record of the early geologic history of North America. The ancestral Puebloan granaries at Nankoweap Creek tell archaeologists about more recent human history. (By Drenaline, licensed under CC BY-SA 3.0)
• Not shown: computational archaeologists reconstructing past climate from multiple tree-ring databases => computational provenance is key for transparency & reproducibility
... and provenance is: Understanding what happened!
Zrzavý, Jan, David Storch, and Stanislav Mihulka. Evolution: Ein Lese-Lehrbuch. Springer-Verlag, 2009.
Author: Jkwchui (Based on drawing by Truth-seeker2004)
Computational Provenance ...
• Origin, processing history of artifacts
– data products, figures, ...
– also: underlying workflow => understand methods, dataflow, and dependencies
Climate Change Impacts in the United States
U.S. National Climate Assessment, U.S. Global Change Research Program
Rewind: Data Curation Workflows (Filtered-Push … Kepler … Kurator projects)
Data Curation Workflows & Provenance
• Data curation and data cleaning workflows
– ... can be defined using a workflow system
• workflow = "prospective" provenance (= general recipe)
– ... or using good old scripts (bash, Python, R, ...)
• ... which is what many "mere mortals" use!
• Script-based workflows
– ... benefit from having the workflow exposed and dataflow dependencies revealed
Runtime Provenance (a.k.a. traces, logs, retrospective provenance, "Trace-land")
Workflow Modeling & Design (a.k.a. prospective provenance, "Workflow-land")
Workflows <=> Provenance: an important link!
= W3C PROV + DataONE extensions
Trace
Workflow
Data (extensible)
See purl.dataone.org/provone-v1-dev
Related Projects: NSF DataONE (ProvONE ...) + ...
• ... NSF SKOPE: system and tools to discover, access, analyze, visualize paleoenvironmental data
– unprecedented ability to explore provenance (detailed, comprehensible record of computational derivation of results)
– for researchers, tinkerers, and modelers
• ... NSF Whole Tale:
– leverage & contribute to existing CI to support the whole tale ("living paper"), from workflow run to scholarly publication
– integrate tools & CI (DataONE, Globus, iRODS, NDS, ...) to simplify use and promote best practices
– driven by science WGs (archaeology/SKOPE, materials science, astro, bio, ...)
Provenance Support for Reproducible Science
Example: Paleoclimate Reconstruction
Science paper (OA) uses:
• open source code:
– R, PaleoCAR, ...
• Is that all we need?
• What was the "workflow"?
• Is there prospective and/or retrospective provenance?
SKOPE: Synthesized Knowledge Of Past Environments
Bocinsky, Kohler et al. study rain-fed maize of the Anasazi
– Four Corners; AD 600–1500. Climate change influenced Mesa Verde migrations; late 13th century AD. Uses a network of tree-ring chronologies to reconstruct a spatio-temporal climate field at a fairly high resolution (~800 m) from AD 1–2000. The algorithm estimates joint information in tree rings and a climate signal to identify the "best" tree-ring chronologies for climate reconstruction.
K. Bocinsky, T. Kohler, A 2000-year reconstruction of the rain-fed maize agricultural niche in the US Southwest. Nature Communications. doi:10.1038/ncomms6618
... implemented as an R script ...
YesWorkflow: Prospective & Retrospective Provenance ... (almost) for free!
• YW annotations in a (Python, R, ...) script recreate a workflow view from the script ...
[Figure: YesWorkflow graph recreated from the annotated data collection script]
Steps: initialize_run, load_screening_results, calculate_strategy, log_rejected_sample, collect_data_set, transform_images, log_average_image_intensity
Data with URI templates:
• sample_spreadsheet, file:cassette_{cassette_id}_spreadsheet.csv
• calibration_image, file:calibration.img
• run_log, file:run/run_log.txt
• rejection_log, file:/run/rejected_samples.txt
• raw_image, file:run/raw/{cassette_id}/{sample_id}/e{energy}/image_{frame_number}.raw
• corrected_image, file:data/{sample_id}/{sample_id}_{energy}eV_{frame_number}.img
• collection_log, file:run/collected_images.csv
Other data: cassette_id, sample_score_cutoff, sample_name, sample_quality, rejected_sample, accepted_sample, num_images, energies, sample_id, energy, frame_number, total_intensity, pixel_count, corrected_image_path
YW!
@BEGIN .. @END .. @IN .. @OUT .. @URI .. @LOG ..
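As a rough illustration of what such annotations look like in practice, here is a minimal sketch of a Python script marked up with YW comments. The keywords (@BEGIN, @END, @IN, @OUT, @PARAM, @URI) are the ones named on these slides; the script itself, its function names, and its file names are hypothetical and chosen only for illustration. Because the annotations are ordinary comments, the script runs unchanged whether or not YesWorkflow is installed.

```python
# @BEGIN clean_readings
# @IN raw_readings @URI file:readings.csv
# @PARAM cutoff
# @OUT accepted_readings @URI file:accepted.csv


def load_readings(text):
    # @BEGIN load_readings
    # @IN raw_readings
    # @OUT readings
    rows = [line.split(",") for line in text.strip().splitlines()]
    readings = [(name, float(value)) for name, value in rows]
    # @END load_readings
    return readings


def filter_readings(readings, cutoff):
    # @BEGIN filter_readings
    # @IN readings
    # @PARAM cutoff
    # @OUT accepted_readings
    accepted = [(name, value) for name, value in readings if value >= cutoff]
    # @END filter_readings
    return accepted


# @END clean_readings

# Hypothetical in-memory stand-in for the contents of readings.csv.
sample_text = "a,0.9\nb,0.2\nc,0.7"
accepted = filter_readings(load_readings(sample_text), cutoff=0.5)
print(accepted)
```

The nesting of @BEGIN/@END pairs is what would give a tool enough structure to draw a workflow with two steps inside one enclosing block.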
Paleoclimate Reconstruction (openSKOPE.org)
• ... explained using YesWorkflow!
[Figure: YesWorkflow graph of the paleoclimate reconstruction script]
Node labels: GetModernClimate, PRISM_annual_growing_season_precipitation, SubsetAllData, dendro_series_for_calibration, dendro_series_for_reconstruction, CAR_Analysis_unique, cellwise_unique_selected_linear_models, CAR_Analysis_union, cellwise_union_selected_linear_models, CAR_Reconstruction_union, raster_brick_spatial_reconstruction, raster_brick_spatial_reconstruction_errors, CAR_Reconstruction_union_output, master_data_directory, prism_directory, tree_ring_data, calibration_years, retrodiction_years
Output files: ZuniCibola_PRISM_grow_prcp_ols_loocv_union_recons.tif, ZuniCibola_PRISM_grow_prcp_ols_loocv_union_errors.tif
Kyle B., (computational) archaeologist: "It took me about 20 minutes to comment. Less than an hour to learn and YW-annotate, all-told."
YW Demo Use Cases (IDCC'17)
Domain | Use case | Programming language | Provenance methods
Climate science | C3C4 | MATLAB | YW + MATLAB RunManager
Astrophysics | LIGO | Python | YW + NW (code-level)
Protein crystal samples | Simulate data collection | Python | YW + NW (code-level)
Biodiversity data curation | kurator-SPNHC | Python | YW-recon + YW-logging
Social network analysis | Twitter | Python | YW + NW (file-level)
Oceanography | OHIBC Howe Sound (multi-run multi-script) | R | YW + R RunManager
run/
├── raw
│ └── q55
│ ├── DRT240
│ │ ├── e10000
│ │ │ ├── image_001.raw
... ... ... ...
│ │ │ └── image_037.raw
│ │ └── e11000
│ │ ├── image_001.raw
... ... ...
│ │ └── image_037.raw
│ └── DRT322
│ ├── e10000
│ │ ├── image_001.raw
... ... ...
│ │ └── image_030.raw
│ └── e11000
│ ├── image_001.raw
... ...
│ └── image_030.raw
├── data
│ ├── DRT240
│ │ ├── DRT240_10000eV_001.img
... ... ...
│ │ └── DRT240_11000eV_037.img
│ └── DRT322
│ ├── DRT322_10000eV_001.img
... ...
│ └── DRT322_11000eV_030.img
│
├── collected_images.csv
├── rejected_samples.txt
└── run_log.txt
YW-RECON: Prospective & Retrospective Provenance ... (almost) for free!
[Figure: the YesWorkflow graph of the data collection script shown earlier, next to the run/ directory listing above]
• URI templates link conceptual entities to runtime provenance "left behind" by the script author ...
• ... facilitating provenance reconstruction
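The idea of matching URI templates against files left behind by a run can be sketched as follows. This is an illustrative sketch only, not the YW-RECON implementation: it assumes that template variables never contain "/" or "_", and the helper names template_to_regex and bindings are made up for this example. The two templates and the example paths are taken from the slides.

```python
import re


def template_to_regex(template):
    """Compile a URI template such as 'a/{x}/b_{y}.raw' into a regex
    with one named group per {variable}."""
    pattern = ""
    seen = set()
    for part in re.split(r"(\{\w+\})", template):
        if part.startswith("{") and part.endswith("}"):
            name = part[1:-1]
            if name in seen:
                # A repeated variable must match the same value again.
                pattern += f"(?P={name})"
            else:
                seen.add(name)
                # Assumption: values contain neither '/' nor '_'.
                pattern += f"(?P<{name}>[^/_]+)"
        else:
            pattern += re.escape(part)
    return re.compile(pattern + r"\Z")


def bindings(template, path):
    """Return the variable bindings if path matches template, else None."""
    match = template_to_regex(template).match(path)
    return match.groupdict() if match else None


RAW = "run/raw/{cassette_id}/{sample_id}/e{energy}/image_{frame_number}.raw"
CORRECTED = "data/{sample_id}/{sample_id}_{energy}eV_{frame_number}.img"

raw_bindings = bindings(RAW, "run/raw/q55/DRT322/e11000/image_030.raw")
img_bindings = bindings(CORRECTED, "data/DRT322/DRT322_11000eV_030.img")
print(raw_bindings)
print(img_bindings)
```

Each successful match yields one "fact" tying a concrete file to the conceptual data item (raw_image, corrected_image) that carries the template in the workflow model.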
Q1: What samples did the script run collect images from?
[Same workflow graph and run/ directory listing as on the previous slide]
Q2: What energies were used for image collection from sample DRT322?
[Same workflow graph and run/ directory listing as on the previous slide]
Q3: Where is the raw image of the corrected image DRT322_11000eV_030.img?
[Same workflow graph and run/ directory listing as on the previous slide]
Q5: What cassette-id had the sample leading to DRT240_10000eV_001.img?
[Same workflow graph and run/ directory listing as on the previous slide]
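Questions like Q1, Q2, Q3, and Q5 can be answered from file paths alone once the metadata encoded in them is pulled out. The sketch below does this for a small excerpt of the raw-image paths in the run/ listing by splitting each path at fixed positions; this shortcut, the tuple layout, and the helper names are assumptions made for illustration and are not how the YesWorkflow tooling answers these queries.

```python
# Excerpt of raw-image paths from the run/ directory listing.
raw_paths = [
    "run/raw/q55/DRT240/e10000/image_001.raw",
    "run/raw/q55/DRT240/e11000/image_037.raw",
    "run/raw/q55/DRT322/e10000/image_001.raw",
    "run/raw/q55/DRT322/e11000/image_001.raw",
    "run/raw/q55/DRT322/e11000/image_030.raw",
]

# One fact per file: (cassette_id, sample_id, energy, frame_number, path).
facts = []
for path in raw_paths:
    _, _, cassette_id, sample_id, energy_dir, filename = path.split("/")
    energy = energy_dir[1:]  # strip the leading 'e'
    frame_number = filename[len("image_"):-len(".raw")]
    facts.append((cassette_id, sample_id, energy, frame_number, path))


def matching_facts(corrected_name):
    """Facts whose sample, energy, and frame match a corrected image name
    of the form {sample_id}_{energy}eV_{frame_number}.img."""
    stem = corrected_name[:-len(".img")]
    sample_id, energy_ev, frame_number = stem.split("_")
    energy = energy_ev[:-len("eV")]
    return [f for f in facts if (f[1], f[2], f[3]) == (sample_id, energy, frame_number)]


# Q1: samples that images were collected from.
samples = sorted({f[1] for f in facts})
# Q2: energies used for sample DRT322.
energies_drt322 = sorted({f[2] for f in facts if f[1] == "DRT322"})
# Q3: raw image behind a given corrected image.
raw_for_q3 = [f[4] for f in matching_facts("DRT322_11000eV_030.img")]
# Q5: cassette id of the sample behind a given corrected image.
cassette_for_q5 = {f[0] for f in matching_facts("DRT240_10000eV_001.img")}

print(samples, energies_drt322, raw_for_q3, cassette_for_q5)
```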
Hybrid Provenance: YW Model + Runtime Observables (file level)
• The YW model can be connected with runtime observables
• => YW recon (provenance reconstruction)
• Here: what specific files were read and written, and where do they occur in the workflow?
C3-C4 Prospective Provenance
[Figure: YesWorkflow graph of the C3_C4_map_present_NA script]
Steps: fetch_SYNMAP_land_cover_map_variable, fetch_monthly_mean_air_temperature_data, fetch_monthly_mean_precipitation_data, initialize_Grass_Matrix, examine_pixels_for_grass, generate_netcdf_file_for_C3_fraction, generate_netcdf_file_for_C4_fraction, generate_netcdf_file_for_Grass_fraction
Intermediate data: lon_variable, lat_variable, lon_bnds_variable, lat_bnds_variable, Tair_Matrix, Rain_Matrix, Grass_variable, C3_Data, C4_Data
Inputs:
• SYNMAP_land_cover_map_data (inputs/land_cover/SYNMAP_NA_QD.nc)
• mean_airtemp (file:inputs/narr_air.2m_monthly/air.2m_monthly_{start_year}_{end_year}_mean.{month}.nc)
• mean_precip (file:inputs/narr_apcp_rescaled_monthly/apcp_monthly_{start_year}_{end_year}_mean.{month}.nc)
Outputs:
• C3_fraction_data (file:outputs/SYNMAP_PRESENTVEG_C3Grass_RelaFrac_NA_v2.0.nc)
• C4_fraction_data (file:outputs/SYNMAP_PRESENTVEG_C4Grass_RelaFrac_NA_v2.0.nc)
• Grass_fraction_data (file:outputs/SYNMAP_PRESENTVEG_Grass_Fraction_NA_v2.0.nc)
What does C4_fraction_data depend on?
[Figure: upstream subgraph of C4_fraction_data next to the overall workflow graph of C3_C4_map_present_NA]
Upstream steps: examine_pixels_for_grass, fetch_SYNMAP_land_cover_map_variable, fetch_monthly_mean_precipitation_data, fetch_monthly_mean_air_temperature_data, generate_netcdf_file_for_C4_fraction
Upstream data: C4_Data, lon_variable, lat_variable, lon_bnds_variable, lat_bnds_variable, Rain_Matrix, Tair_Matrix
Upstream inputs: SYNMAP_land_cover_map_data, mean_airtemp, mean_precip
• C4_fraction_data lineage is very similar to the overall workflow graph!
What does Grass_fraction_data depend on?
[Figure: overall workflow graph of C3_C4_map_present_NA next to the upstream subgraph of Grass_fraction_data]
Upstream steps: initialize_Grass_Matrix, fetch_SYNMAP_land_cover_map_variable, generate_netcdf_file_for_Grass_fraction
Upstream data: Grass_variable, lon_variable, lat_variable, lon_bnds_variable, lat_bnds_variable
Upstream inputs: SYNMAP_land_cover_map_data
• Grass_fraction_data lineage is different from the overall workflow graph!
– Smaller subgraph
– Depends on only 1 of 3 inputs!
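An "upstream" (lineage) query of this kind amounts to a reachability search over the workflow graph. The sketch below encodes a simplified version of the C3-C4 workflow as a dictionary from each node to the nodes it directly depends on. The node names come from the slides, but the individual edges are an assumption inferred from the subgraphs shown, and the bounds variables are left out for brevity.

```python
# Each node maps to the nodes it directly depends on (assumed edges).
DEPENDS_ON = {
    "fetch_SYNMAP_land_cover_map_variable": ["SYNMAP_land_cover_map_data"],
    "lon_variable": ["fetch_SYNMAP_land_cover_map_variable"],
    "lat_variable": ["fetch_SYNMAP_land_cover_map_variable"],
    "fetch_monthly_mean_air_temperature_data": ["mean_airtemp"],
    "Tair_Matrix": ["fetch_monthly_mean_air_temperature_data"],
    "fetch_monthly_mean_precipitation_data": ["mean_precip"],
    "Rain_Matrix": ["fetch_monthly_mean_precipitation_data"],
    "initialize_Grass_Matrix": ["lon_variable", "lat_variable"],
    "Grass_variable": ["initialize_Grass_Matrix"],
    "examine_pixels_for_grass": ["Tair_Matrix", "Rain_Matrix", "lon_variable", "lat_variable"],
    "C3_Data": ["examine_pixels_for_grass"],
    "C4_Data": ["examine_pixels_for_grass"],
    "generate_netcdf_file_for_C4_fraction": ["C4_Data"],
    "C4_fraction_data": ["generate_netcdf_file_for_C4_fraction"],
    "generate_netcdf_file_for_Grass_fraction": ["Grass_variable"],
    "Grass_fraction_data": ["generate_netcdf_file_for_Grass_fraction"],
}

INPUTS = {"SYNMAP_land_cover_map_data", "mean_airtemp", "mean_precip"}


def upstream(node, graph=DEPENDS_ON):
    """Return every node reachable from `node` by following dependencies."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in graph.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


c4_inputs = upstream("C4_fraction_data") & INPUTS
grass_inputs = upstream("Grass_fraction_data") & INPUTS
print(sorted(c4_inputs))
print(sorted(grass_inputs))
```

Under these assumed edges, C4_fraction_data reaches all three inputs while Grass_fraction_data reaches only the land cover map, which matches the contrast drawn on the slides.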
What happens after running the script? Hybrid provenance graph!
• 3 inputs spread across 25 (= 2 x 12 + 1) files
• Do all 3 output files depend on all 25 inputs?
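The file count can be checked by expanding the two monthly URI templates over twelve months and adding the single land cover file. The sketch below uses the templates and the 2000 to 2010 year range that appear in the slides' file listings; the helper name expand is made up for this example.

```python
AIR = "inputs/narr_air.2m_monthly/air.2m_monthly_{start_year}_{end_year}_mean.{month}.nc"
PRECIP = "inputs/narr_apcp_rescaled_monthly/apcp_monthly_{start_year}_{end_year}_mean.{month}.nc"
LAND_COVER = "inputs/land_cover/SYNMAP_NA_QD.nc"


def expand(template, start_year=2000, end_year=2010):
    """Instantiate a monthly template once per calendar month."""
    return [
        template.format(start_year=start_year, end_year=end_year, month=month)
        for month in range(1, 13)
    ]


input_files = {
    "mean_airtemp": expand(AIR),
    "mean_precip": expand(PRECIP),
    "SYNMAP_land_cover_map_data": [LAND_COVER],
}

total = sum(len(files) for files in input_files.values())
print(total)
```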
[Figure: hybrid provenance graph of C3_C4_map_present_NA, i.e. the overall workflow graph with the files observed at runtime attached to inputs and outputs]
Output files:
• C3_fraction_data: outputs/SYNMAP_PRESENTVEG_C3Grass_RelaFrac_NA_v2.0.nc
• C4_fraction_data: outputs/SYNMAP_PRESENTVEG_C4Grass_RelaFrac_NA_v2.0.nc
• Grass_fraction_data: outputs/SYNMAP_PRESENTVEG_Grass_Fraction_NA_v2.0.nc
Input files:
• SYNMAP_land_cover_map_data: inputs/land_cover/SYNMAP_NA_QD.nc
• mean_airtemp: inputs/narr_air.2m_monthly/air.2m_monthly_2000_2010_mean.{month}.nc for month = 1..12 (12 files)
• mean_precip: inputs/narr_apcp_rescaled_monthly/apcp_monthly_2000_2010_mean.{month}.nc for month = 1..12 (12 files)
What C4_fraction_data depends on (hybrid) ...
[Figure: the earlier prospective query result for C4_fraction_data, next to the same upstream subgraph with runtime files attached]
Files in the hybrid result:
• C4_fraction_data: outputs/SYNMAP_PRESENTVEG_C4Grass_RelaFrac_NA_v2.0.nc
• SYNMAP_land_cover_map_data: inputs/land_cover/SYNMAP_NA_QD.nc
• mean_airtemp: inputs/narr_air.2m_monthly/air.2m_monthly_2000_2010_mean.{month}.nc for month = 1..12 (12 files)
• mean_precip: inputs/narr_apcp_rescaled_monthly/apcp_monthly_2000_2010_mean.{month}.nc for month = 1..12 (12 files)
What Grass_fraction_data depends on (hybrid) ...
[Figure: three graphs side by side: the overall workflow, upstream of Grass_fraction_data (prospective), and upstream of Grass_fraction_data (hybrid)]
Upstream steps: initialize_Grass_Matrix, fetch_SYNMAP_land_cover_map_variable, generate_netcdf_file_for_Grass_fraction
Upstream data: Grass_variable, lon_variable, lat_variable, lon_bnds_variable, lat_bnds_variable
Files in the hybrid result:
• Grass_fraction_data: outputs/SYNMAP_PRESENTVEG_Grass_Fraction_NA_v2.0.nc
• SYNMAP_land_cover_map_data: inputs/land_cover/SYNMAP_NA_QD.nc
# @BEGIN Gravitational_Wave_Detection
# @IN fn_d @as FN_Detector
# @IN fn_sr @as FN_Sampling_Rate
# @OUT shifted.wav @as shifted_wave
# @OUT whitenbp.wav @as whitened_bandpass
import numpy as np
from scipy import signal
…
# @BEGIN Amplitude_Spectral_Density
# @IN strain_H1
# @IN strain_L1
# @PARAM fs
# @OUT psd_H1
# @OUT psd_L1
# @OUT GW150914_ASDs.png @URI …
…
NFFT = 1*fs
fmin, fmax = 10, 2000
…
YesWorkflow-annotated scripts
File I/O Events
Log files
Logic rules for reconstructing, querying, and visualizing prospective and retrospective provenance together
upstream(strain_L1_whitenbp) [NW-recon]
WHITENING: strain_L1_whiten, strain_L1_whiten = array([8.494, -1.672, ..., 72.156])
AMPLITUDE_SPECTRAL_DENSITY: PSD_L1, psd_L1 = scipy.interpolate.interpolate.interp1d object at 0x113969418
LOAD_DATA: strain_L1, strain_L1 = array([-1.779e-18, -1.765e-18, ..., -1.719e-18])
BANDPASSING: strain_L1_whitenbp, strain_L1_whitenbp = array([8.184, 19.935, ..., -0.684])
FN_Detector: fn_d = L-L1_LOSC_4_V1-1126259446-32.hdf5
fs: fs = 4096

upstream(strain_L1_whitenbp) [prospective]
WHITENING: strain_H1_whiten, strain_L1_whiten
AMPLITUDE_SPECTRAL_DENSITY: PSD_H1, PSD_L1
LOAD_DATA: strain_H1, strain_L1
BANDPASSING: strain_L1_whitenbp
FN_Detector: file:{Detector}_LOSC_4_V1-...
FN_Sampling_rate: file:H-H1_LOSC_{Rate}_V1-...
fs

upstream(strain_L1_whitenbp) [URI-recon]
WHITENING: strain_H1_whiten, strain_L1_whiten
AMPLITUDE_SPECTRAL_DENSITY: PSD_H1, PSD_L1
LOAD_DATA: strain_H1, strain_L1
BANDPASSING: strain_L1_whitenbp
FN_Detector: L-L1_LOSC_4_V1-1126259446-32.hdf5, H-H1_LOSC_4_V1-1126259446-32.hdf5
FN_Sampling_rate: H-H1_LOSC_4_V1-1126259446-32.hdf5, H-H1_LOSC_16_V1-1126259446-32.hdf5
fs
Provenance Recorders: function call graph and variable dependencies (raw runtime observations)
YesWorkflow toolkit: extract annotations and model script as a workflow
YesWorkflow toolkit: reconstruct script run and retrospective provenance
YesWorkflow toolkit: render workflow model graphically
Prospective Provenance: user-defined workflow models
Hybrid Provenance: prospective + code-level runtime observables
General-purpose provenance bridges
Provenance queries: query provenance (esp. graphs) and visualize results
Provenance Exporters: query and visualize provenance
noWorkflow toolkit: query and visualize provenance
Retrospective Provenance: Python runtime observables
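To make the "extract annotations" step concrete, here is a minimal sketch that pulls YW-style comment annotations out of script text with a regular expression. It is not the YesWorkflow toolkit's parser: the regex, the tuple format, and the function name extract_annotations are assumptions for illustration, and the sample script is a shortened version of the LIGO excerpt with @END lines added so that the blocks are closed.

```python
import re

# Matches '# @KEYWORD name' with an optional '@as alias' suffix.
ANNOTATION = re.compile(r"#\s*@(\w+)\s+(\S+)(?:\s+@(?:as|AS)\s+(\S+))?")


def extract_annotations(source):
    """Return (keyword, name, alias) for every annotated comment line."""
    results = []
    for line in source.splitlines():
        match = ANNOTATION.search(line)
        if match:
            keyword, name, alias = match.groups()
            results.append((keyword.upper(), name, alias))
    return results


script = """\
# @BEGIN Gravitational_Wave_Detection
# @IN fn_d @as FN_Detector
# @BEGIN Amplitude_Spectral_Density
# @IN strain_H1
# @PARAM fs
# @OUT psd_H1
NFFT = 1*fs
# @END Amplitude_Spectral_Density
# @END Gravitational_Wave_Detection
"""

annotations = extract_annotations(script)
blocks = [name for keyword, name, _ in annotations if keyword == "BEGIN"]
print(annotations)
print(blocks)
```

A real workflow model would additionally pair each @BEGIN with its @END and connect @OUT names of one block to matching @IN names of another; that wiring is omitted here.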
subgraph
NW_FILTERED_LINEAGE_GRAPH_FOR_STRAIN_L1_WHITENBP
whiten
141 fn_d = 'L-L1_LOSC_4_V1-1126259446-32.hdf5'
142 loaddata = (array([ -1.77955839e-18, ... 1, 1, 1], dtype=uint32)})
142 time_L1 = array([ 1.12625945e+09, ... 8e+09, 1.12625948e+09])
142 strain_L1 = array([ -1.77955839e-18, ... 6e-18, -1.71969299e-18])
151 fs = 4096
153 time = array([ 1.12625945e+09, ... 8e+09, 1.12625948e+09])
155 dt = 0.000244140625
266 NFFT = 4096
270 psd = (array([ 2.22851728e-36, ... e+03, 2.04800000e+03]))
270 freqs = array([ 0.00000000e+00, ... 0e+03, 2.04800000e+03])
270 Pxx_L1 = array([ 2.22851728e-36, ... 5e-46, 1.77059496e-46])
274 psd_L1 = <scipy.interpolate.interp ... 1d object at 0x1095b0260>
334 return = array([ 8.49413154, -1. ... .39942945, 72.15659253])
333 white_ht = array([ 8.49413154, -1. ... .39942945, 72.15659253])
325 strain = array([ -1.77955839e-18, ... 6e-18, -1.71969299e-18])
325 interp_psd = <scipy.interpolate.interp ... 1d object at 0x1095b0260>
325 dt = 0.000244140625
326 len
326 Nt = 131072
327 rfftfreq = array([ 0.00000000e+00, ... 5e+03, 2.04800000e+03])
327 freqs = array([ 0.00000000e+00, ... 5e+03, 2.04800000e+03])
331 rfft = array([ -2.39692348e-13 + ... 54e-19 +0.00000000e+00j])
331 hf = array([ -2.39692348e-13 + ... 54e-19 +0.00000000e+00j])
332 (np.sqrt(interp_psd(freqs) /dt/2.))
332 white_hf = array([ -3.54798023e+03 + ... 58e+02 +0.00000000e+00j])
333 irfft = array([ 8.49413154, -1. ... .39942945, 72.15659253])
338 strain_L1_whiten = array([ 8.49413154, -1. ... .39942945, 72.15659253])
362 butter = (array([ 0.0012848 , 0. ... 9166733 , 0.32217438]))
362 ab = array([ 1. , -6. ... .9166733 , 0.32217438])
362 bb = array([ 0.0012848 , 0. ... 0. , 0.0012848 ])
364 filtfilt = array([ 8.18464884, 19. ... .18198039, -0.68432653])
364 strain_L1_whitenbp = array([ 8.18464884, 19. ... .18198039, -0.68432653])
[Figure: full noWorkflow code-level provenance graph of the LIGO script run. Nodes are function calls (loaddata, psd, whiten, filter_data, get_filter_coefs, iir_bandstops, reqshift, write_wavfile, and plotting calls such as figure, plot, specgram, savefig) and variables, each labeled with its script line number. Saved figures: GW150914_strain.png, GW150914_ASDs.png, GW150914_strain_whitened.png, GW150914_H1_spectrogram.png, GW150914_L1_spectrogram.png, GW150914_H1_spectrogram_whitened.png, GW150914_L1_spectrogram_whitened.png.]
542 nyq
558 p 558 k 558 z
559 iirdesign 559 iirdesign([low,high], [lo ... pe='ellip', output='zpk')
560 append560 zd561 append 561 pd
564 zpk2tf564 aPrelim 564 bPrelim
565 freqz565 outg0565 outFreq
568 zpk2tf
597 bn597 an
598 list.append
597 array
596 notchf
535 fstops
569 return
568 a 568 b
535 fs
545 array545 zd546 array546 pd
559 iirdesign([low,high], [lo ... pe='ellip', output='zpk')
554 low 555 high 556 low2 557 high2
542 nyq
558 p 558 k 558 z
559 iirdesign 559 iirdesign([low,high], [lo ... pe='ellip', output='zpk')
560 append560 zd561 append 561 pd
564 zpk2tf564 aPrelim 564 bPrelim
565 freqz565 outg0565 outFreq
568 zpk2tf
597 bn 597 an
598 list.append
597 array
596 notchf
535 fstops
569 return
568 a 568 b
535 fs
545 array545 zd546 array546 pd
559 iirdesign([low,high], [lo ... pe='ellip', output='zpk')
554 low 555 high 556 low2 557 high2
542 nyq
558 p 558 k 558 z
559 iirdesign 559 iirdesign([low,high], [lo ... pe='ellip', output='zpk')
560 append560 zd561 append 561 pd
564 zpk2tf564 aPrelim 564 bPrelim
565 freqz565 outg0565 outFreq
568 zpk2tf
597 bn 597 an
598 list.append
597 array
596 notchf
535 fstops
569 return
568 a 568 b
535 fs
545 array545 zd546 array546 pd
559 iirdesign([low,high], [lo ... pe='ellip', output='zpk')
554 low 555 high 556 low2 557 high2
542 nyq
558 p 558 k 558 z
559 iirdesign 559 iirdesign([low,high], [lo ... pe='ellip', output='zpk')
560 append560 zd561 append 561 pd
564 zpk2tf564 aPrelim 564 bPrelim
565 freqz565 outg0565 outFreq
568 zpk2tf
597 bn 597 an
598 list.append
597 array
596 notchf
535 fstops
569 return
568 a 568 b
535 fs
545 array545 zd546 array546 pd
559 iirdesign([low,high], [lo ... pe='ellip', output='zpk')
554 low 555 high 556 low2 557 high2
542 nyq
558 p 558 k 558 z
559 iirdesign 559 iirdesign([low,high], [lo ... pe='ellip', output='zpk')
560 append560 zd561 append 561 pd
564 zpk2tf564 aPrelim 564 bPrelim
565 freqz565 outg0565 outFreq
568 zpk2tf
597 bn597 an
598 list.append
597 array
596 notchf
535 fstops
569 return
568 a 568 b
535 fs
545 array545 zd546 array546 pd
559 iirdesign([low,high], [lo ... pe='ellip', output='zpk')
554 low 555 high 556 low2 557 high2
542 nyq
558 p 558 k 558 z
559 iirdesign 559 iirdesign([low,high], [lo ... pe='ellip', output='zpk')
560 append560 zd561 append 561 pd
564 zpk2tf564 aPrelim 564 bPrelim
565 freqz565 outg0565 outFreq
568 zpk2tf
597 bn597 an
598 list.append
597 array
596 notchf
535 fstops
569 return
568 a 568 b
535 fs
545 array545 zd546 array546 pd
559 iirdesign([low,high], [lo ... pe='ellip', output='zpk')
554 low 555 high 556 low2 557 high2
542 nyq
558 p 558 k 558 z
559 iirdesign 559 iirdesign([low,high], [lo ... pe='ellip', output='zpk')
560 append560 zd561 append 561 pd
564 zpk2tf564 aPrelim 564 bPrelim
565 freqz565 outg0565 outFreq
568 zpk2tf
597 bn597 an
598 list.append
597 array
596 notchf
535 fstops
569 return
568 a 568 b
535 fs
545 array545 zd546 array546 pd
559 iirdesign([low,high], [lo ... pe='ellip', output='zpk')
554 low 555 high 556 low2 557 high2
542 nyq
558 p 558 k 558 z
559 iirdesign 559 iirdesign([low,high], [lo ... pe='ellip', output='zpk')
560 append560 zd561 append 561 pd
564 zpk2tf564 aPrelim 564 bPrelim
565 freqz565 outg0565 outFreq
568 zpk2tf
597 bn597 an
598 list.append
597 array
596 notchf
535 fstops
569 return
568 a 568 b
535 fs
545 array545 zd546 array546 pd
559 iirdesign([low,high], [lo ... pe='ellip', output='zpk')
554 low 555 high 556 low2 557 high2
542 nyq
558 p 558 k 558 z
559 iirdesign 559 iirdesign([low,high], [lo ... pe='ellip', output='zpk')
560 append560 zd561 append 561 pd
564 zpk2tf564 aPrelim 564 bPrelim
565 freqz565 outg0565 outFreq
568 zpk2tf
597 bn597 an
598 list.append
597 array
596 notchf
535 fstops
569 return
568 a 568 b
535 fs
545 array545 zd546 array546 pd
559 iirdesign([low,high], [lo ... pe='ellip', output='zpk')
554 low 555 high 556 low2 557 high2
542 nyq
558 p 558 k 558 z
559 iirdesign 559 iirdesign([low,high], [lo ... pe='ellip', output='zpk')
560 append560 zd561 append 561 pd
564 zpk2tf564 aPrelim 564 bPrelim
565 freqz565 outg0565 outFreq
568 zpk2tf
597 bn597 an
598 list.append
601 array
535 fstops
569 return
568 a 568 b
535 fs
545 array545 zd546 array546 pd
559 iirdesign([low,high], [lo ... pe='ellip', output='zpk')
554 low 555 high 556 low2 557 high2
542 nyq
558 p 558 k 558 z
559 iirdesign 559 iirdesign([low,high], [lo ... pe='ellip', output='zpk')
560 append560 zd561 append 561 pd
564 zpk2tf564 aPrelim 564 bPrelim
565 freqz565 outg0565 outFreq
568 zpk2tf
601 bn 601 an
602 list.append
605 array
535 fstops
569 return
568 a 568 b
535 fs
545 array545 zd546 array546 pd
559 iirdesign([low,high], [lo ... pe='ellip', output='zpk')
554 low 555 high 556 low2 557 high2
542 nyq
558 p 558 k 558 z
559 iirdesign 559 iirdesign([low,high], [lo ... pe='ellip', output='zpk')
560 append560 zd561 append 561 pd
564 zpk2tf564 aPrelim 564 bPrelim
565 freqz565 outg0565 outFreq
568 zpk2tf
605 bn605 an
606 list.append
639 coefs642 RandomState.randn 642 data
631 return
630 data
624 data_in624 coefs
625 ndarray.copy625 data
630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
645 resp
649 psd
648 NFFT
649 freqs649 Pxx_data
650 psd650 Pxx_resp650 freqs
653 np.sqrt(Pxx_data)653 ndarray.mean 653 np.sqrt(Pxx_data)653 norm
654 np.sqrt(Pxx_data)654 asd_data
655 np.sqrt(Pxx_resp)655 asd_resp
659 ones
658 Nc
659 filt_resp662 freqz
661 b661 a
662 r662 w
663 np.abs(r)663 filt_resp
662 freqz
661 b661 a
662 r 662 w
663 np.abs(r)663 filt_resp
662 freqz
661 b661 a
662 r 662 w
663 np.abs(r)663 filt_resp
662 freqz
661 b661 a
662 r 662 w
663 np.abs(r)663 filt_resp
662 freqz
661 b661 a
662 r 662 w
663 np.abs(r)663 filt_resp
662 freqz
661 b661 a
662 r 662 w
663 np.abs(r)663 filt_resp
662 freqz
661 b661 a
662 r 662 w
663 np.abs(r)663 filt_resp
662 freqz
661 b661 a
662 r 662 w
663 np.abs(r)663 filt_resp
662 freqz
661 b 661 a
662 r 662 w
663 np.abs(r)663 filt_resp
662 freqz
661 b 661 a
662 r 662 w
663 np.abs(r)663 filt_resp
662 freqz
661 b661 a
662 r662 w
663 np.abs(r)663 filt_resp
662 freqz
661 b 661 a
662 r662 w
663 np.abs(r)663 filt_resp
662 freqz
661 b 661 a
662 r662 w
663 np.abs(r)663 filt_resp
662 freqz
661 b 661 a
662 r662 w
663 np.abs(r)663 filt_resp
662 freqz
661 b 661 a
662 r662 w
663 np.abs(r)663 filt_resp
662 freqz
661 b 661 a
662 r662 w
663 np.abs(r)663 filt_resp
662 freqz
661 b661 a
662 r662 w
663 np.abs(r)663 filt_resp
669 figure
670 plot
671 plot
672 plot
664 freqf
666 filt_resp
673 xlim 674 grid 675 ylabel 676 xlabel 677 legend GW150914_filter.png
678 savefig
631 return
630 data
624 data_in 624 coefs
625 ndarray.copy 625 data
630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
688 strain_H1_filt
631 return
630 data
624 data_in 624 coefs
625 ndarray.copy 625 data
630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
689 strain_L1_filt
631 return
630 data
624 data_in 624 coefs
625 ndarray.copy 625 data
630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
630 data630 filtfilt
627 b 627 a
692 NR_H1_filt
710 figure
711 plot 712 plot
713 xlim
714 str(tevent)714 xlabel 714 str(tevent)
715 ylabel 716 legend 717 title GW150914_H1_strain_unfiltered.png
718 savefig
722 int(0.007*fs)722 roll722 strain_L1_fils 722 int(0.007*fs)
724 figure
725 plot
726 plot
727 plot
728 xlim 729 ylim
730 str(tevent)730 xlabel 730 str(tevent)
731 ylabel 732 legend 733 title GW150914_H1_strain_filtered.png
734 savefig
776 where
772 tevent773 deltat
776 indxt
779 int(fs)
768 fs 768 data768 filename
769 np.abs(data)769 amax769 np.int16(data/np.max(np.abs(data)) * 32767 * 0.9)769 d 769 np.abs(data)
770 int(fs)770 write 770 int(fs)
GW150914_H1_whitenbp.wav
779 int(fs) 780 int(fs)
768 fs 768 data768 filename
769 np.abs(data)769 amax769 np.int16(data/np.max(np.abs(data)) * 32767 * 0.9)769 d 769 np.abs(data)
770 int(fs)770 write 770 int(fs)
GW150914_L1_whitenbp.wav
780 int(fs) 781 int(fs)
768 fs 768 data768 filename
769 np.abs(data)769 amax769 np.int16(data/np.max(np.abs(data)) * 32767 * 0.9)769 d 769 np.abs(data)
770 int(fs)770 write 770 int(fs)
GW150914_NR_whitenbp.wav
781 int(fs)
824 int(float(fs)*float(speedup)) 824 float(fs)824 float(speedup)
821 fs823 speedup
824 fss
818 return
817 z
808 data 808 fshift
822 fshift
808 sample_rate
811 rfft811 x
812 len812 T812 float(sample_rate)
814 int(fshift/df)
813 df
814 nbins
816 roll816 y 816 roll
817 irfft
827 strain_H1_shifted
818 return
817 z
808 data808 fshift808 sample_rate
811 rfft811 x
812 len812 T812 float(sample_rate)
814 int(fshift/df)
813 df
814 nbins
816 roll816 y 816 roll
817 irfft
828 strain_L1_shifted
818 return
817 z
808 data 808 fshift808 sample_rate
811 rfft811 x
812 len 812 T 812 float(sample_rate)
814 int(fshift/df)
813 df
814 nbins
816 roll816 y 816 roll
817 irfft
829 NR_H1_shifted
845 int(fs)
768 fs 768 data768 filename
769 np.abs(data)769 amax769 np.int16(data/np.max(np.abs(data)) * 32767 * 0.9)769 d 769 np.abs(data)
770 int(fs)770 write 770 int(fs)
GW150914_H1_shifted.wav
845 int(fs) 846 int(fs)
768 fs 768 data768 filename
769 np.abs(data)769 amax769 np.int16(data/np.max(np.abs(data)) * 32767 * 0.9)769 d 769 np.abs(data)
770 int(fs)770 write 770 int(fs)
GW150914_L1_shifted.wav
846 int(fs) 847 int(fs)
768 fs 768 data768 filename
769 np.abs(data)769 amax769 np.int16(data/np.max(np.abs(data)) * 32767 * 0.9)769 d 769 np.abs(data)
770 int(fs)770 write 770 int(fs)
GW150914_NR_shifted.wav
847 int(fs)
876 loaddata
875 fn_16
876 time_16876 strain_16876 chan_dict
878 loaddata
877 fn_4
878 time_4878 strain_4 878 chan_dict883 psd
881 fs
882 NFFT
883 freqs_16 883 Pxx_16
887 psd
885 fs
886 NFFT
887 Pxx_4 887 freqs_4
892 figure
893 np.sqrt(Pxx_16)893 loglog 893 np.sqrt(Pxx_16)
894 np.sqrt(Pxx_4)894 loglog894 np.sqrt(Pxx_4) 895 axis
889 fmin 890 fmax
896 grid 897 ylabel 898 xlabel 899 legend 900 title GW150914_H1_ASD_16384.png
901 savefig
913 figure
914 np.sqrt(Pxx_16)914 plot 914 np.sqrt(Pxx_16)
915 np.sqrt(Pxx_4)915 plot915 np.sqrt(Pxx_4) 916 axis
910 fmin 911 fmax
917 grid 918 ylabel 919 xlabel 920 legend 921 title GW150914_H1_ASD_16384_zoom.png
922 savefig
937 decimate
935 factor936 numtaps
937 strain_4new
941 psd
939 fs
940 NFFT
941 Pxx_4new 941 freqs_4
946 figure947 np.sqrt(Pxx_4new)947 plot947 np.sqrt(Pxx_4new) 948 np.sqrt(Pxx_4)948 plot 948 np.sqrt(Pxx_4) 949 axis
943 fmin 944 fmax
950 grid 951 ylabel 952 xlabel 953 legend 954 title GW150914_H1_ASD_4096_zoom.png
955 savefig
979 loaddata
978 fn
979 strain 979 chan_dict 979 time
982 dict.items982 keys 982 values 982 keys 982 values 982 keys 982 values 982 keys 982 values 982 keys 982 values 982 keys982 values 982 keys 982 values 982 keys 982 values 982 keys 982 values 982 keys 982 values 982 keys 982 values 982 keys 982 values 982 keys 982 values
984 array_str 984 array_str 984 array_str 984 array_str 984 array_str 984 array_str 984 array_str 984 array_str 984 array_str 984 array_str 984 array_str 984 array_str 984 array_str
989 np.isnan(strain)989 sum 989 np.isnan(strain) 990 len 995 dq_channel_to_seglist
993 DQflag
995 segment_list
996 len
1003 len
1002 seg_strain
1009 dq_channel_to_seglist1009 segment_list
1010 len
1015 len
1014 seg_strain
Workflow model (graph), Facts (Prolog)
Reconstructed provenance, Facts (Prolog)
Run observations, Facts (Prolog)
prospective + file-level runtime observables
LIGO example: What strain_L1_whitenbp depends on …
Overall workflow
Upstream of strain_L1_whitenbp (prospective)
[Figure: YesWorkflow model of the overall workflow GRAVITATIONAL_WAVE_DETECTION. Steps, with their descriptions and the data nodes shown next to them:]
• LOAD_DATA: Load hdf5 data. Data: strain_H1, strain_L1, strain_16, strain_4
• AMPLITUDE_SPECTRAL_DENSITY: Amplitude spectral density. Data: ASDs (file:GW150914_ASDs.png), PSD_H1, PSD_L1
• WHITENING: Suppress low-frequency noise. Data: strain_H1_whiten, strain_L1_whiten
• BANDPASSING: Remove high-frequency noise. Data: strain_H1_whitenbp, strain_L1_whitenbp
• STRAIN_WAVEFORM_FOR_WHITENED_DATA: Plot whitened data. Data: WHITENED_strain_data (file:GW150914_strain_whitened.png)
• SPECTROGRAMS_FOR_STRAIN_DATA: Plot spectrogram for strain data. Data: spectrogram (file:GW150914_{detector}_spectrogram.png)
• SPECTROGRAMS_FOR_WHITEND_DATA: Plot spectrogram for whitened data. Data: spectrogram_whitened (file:GW150914_{detector}_spectrogram_whitened.png)
• FILTER_COEFS: Filter signal in time domain (bandpassing). Data: COEFFICIENTS
• FILTER_DATA: Filter data. Data: filtered_white_noise_data (file:GW150914_filter.png), strain_H1_filt, strain_L1_filt
• STRAIN_WAVEFORM_FOR_FILTERED_DATA: Plot the filtered data. Data: H1_strain_filtered (file:GW150914_H1_strain_filtered.png), H1_strain_unfiltered (file:GW150914_H1_strain_unfiltered.png)
• WAVE_FILE_GENERATOR_FOR_WHITENED_DATA: Make sound files for whitened data. Data: whitened_bandpass_wavefile (file:GW150914_{detector}_whitenbp.wav)
• SHIFT_FREQUENCY_BANDPASSED: Shift frequency of bandpassed signal. Data: strain_H1_shifted, strain_L1_shifted
• WAVE_FILE_GENERATOR_FOR_SHIFTED_DATA: Make sound files for shifted data. Data: shifted_wavefile (file:GW150914_{detector}_shifted.wav)
• DOWNSAMPLING: Downsampling from 16384 Hz to 4096 Hz. Data: H1_ASD_SamplingRate (file:GW150914_H1_ASD_{SamplingRate}.png)
• Workflow inputs: FN_Detector (file:{Detector}_LOSC_4_V1-1126259446-32.hdf5), FN_Sampling_rate (file:H-H1_LOSC_{DownSampling}_V1-1126259446-32.hdf5), fs
upstream(strain_L1_whitenbp) [prospective]
• Steps: LOAD_DATA, AMPLITUDE_SPECTRAL_DENSITY, WHITENING, BANDPASSING
• Data: strain_H1, strain_L1; PSD_H1, PSD_L1; strain_H1_whiten, strain_L1_whiten; strain_L1_whitenbp
• Inputs: FN_Detector (file:{Detector}_LOSC_4_V1-...), FN_Sampling_rate (file:H-H1_LOSC_{Rate}_V1-...), fs
upstream(strain_L1_whitenbp) [URI-recon]
• Steps: LOAD_DATA, AMPLITUDE_SPECTRAL_DENSITY, WHITENING, BANDPASSING
• Data: strain_H1, strain_L1; PSD_H1, PSD_L1; strain_H1_whiten, strain_L1_whiten; strain_L1_whitenbp
• FN_Detector: L-L1_LOSC_4_V1-1126259446-32.hdf5, H-H1_LOSC_4_V1-1126259446-32.hdf5
• FN_Sampling_rate: H-H1_LOSC_4_V1-1126259446-32.hdf5, H-H1_LOSC_16_V1-1126259446-32.hdf5
• fs
upstream(strain_L1_whitenbp) [NW-recon]
• LOAD_DATA: strain_L1 = array([-1.779e-18, -1.765e-18, ..., -1.719e-18])
• AMPLITUDE_SPECTRAL_DENSITY: psd_L1 = scipy.interpolate.interpolate.interp1d object at 0x113969418
• WHITENING: strain_L1_whiten = array([8.494, -1.672, ..., 72.156])
• BANDPASSING: strain_L1_whitenbp = array([8.184, 19.935, ..., -0.684])
• FN_Detector: fn_d = L-L1_LOSC_4_V1-1126259446-32.hdf5
• fs: fs = 4096
Upstream of strain_L1_whitenbp (hybrid YW-NW at the code level)
Upstream of strain_L1_whitenbp (hybrid YW-NW at the file level)
3 inputs spread across 5 (= 2x2 + 1) files
Does the intermediate data strain_L1_whitenbp depend on all 5 inputs?
• The intermediate data strain_L1_whitenbp depends on only 2 out of 5 inputs!
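The upstream question on this slide amounts to a reachability query over a dependency graph. The following is a minimal Python sketch of that idea, not the YW/NW query itself (the slides express provenance as Prolog facts). The `DEPENDS_ON` table is a hand-written, simplified assumption loosely modeled on the data names above, and `upstream`, `INPUTS`, and `used_inputs` are hypothetical names.

```python
# Hypothetical, simplified data-level dependencies (assumed for illustration only).
L1_4K = "FN_Detector:L-L1_LOSC_4_V1-1126259446-32.hdf5"
H1_4K = "FN_Detector:H-H1_LOSC_4_V1-1126259446-32.hdf5"
RATE_4K = "FN_Sampling_rate:H-H1_LOSC_4_V1-1126259446-32.hdf5"
RATE_16K = "FN_Sampling_rate:H-H1_LOSC_16_V1-1126259446-32.hdf5"

# The 5 (= 2x2 + 1) inputs named on the slide.
INPUTS = {L1_4K, H1_4K, RATE_4K, RATE_16K, "fs"}

# Each key depends directly on every item in its value set.
DEPENDS_ON = {
    "strain_L1": {L1_4K},
    "strain_H1": {H1_4K},
    "strain_4": {RATE_4K},
    "strain_16": {RATE_16K},
    "PSD_L1": {"strain_L1", "fs"},
    "PSD_H1": {"strain_H1", "fs"},
    "strain_L1_whiten": {"strain_L1", "PSD_L1"},
    "strain_H1_whiten": {"strain_H1", "PSD_H1"},
    "strain_L1_whitenbp": {"strain_L1_whiten", "fs"},
    "strain_H1_whitenbp": {"strain_H1_whiten", "fs"},
}


def upstream(node, depends_on):
    """Return everything that `node` depends on, directly or transitively."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in depends_on.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Keep only the workflow inputs among the upstream nodes.
used_inputs = sorted(upstream("strain_L1_whitenbp", DEPENDS_ON) & INPUTS)
print(len(used_inputs), "of", len(INPUTS), "inputs:", used_inputs)
```

Under these assumed dependencies, only the L1 detector file and fs are reachable, which matches the "2 out of 5" observation on the slide.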
DwCA Taxon Lookup Workflow
• Declare inputs, outputs, and steps of a script (or workflow) with YW annotations to ...
– communicate provenance graphically (via graphviz)
– combine different forms of provenance
– query provenance
• Simple YW annotations in comments:
– @BEGIN Step, @END Step
– @IN Data, @OUT Data
– @URI Template, @LOG Pattern
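As a rough illustration of the annotation style listed above, the following Python sketch marks up a tiny, made-up taxon lookup script. The step and data names, the `lookup_taxon` and `resolve_records` functions, and the in-memory `AUTHORITY` table are all hypothetical. Only the @BEGIN/@END and @IN/@OUT keywords are used, written as on the slide; the exact spelling accepted by the YW tool should be checked against its documentation.

```python
# @BEGIN taxon_lookup_workflow
# @IN occurrence_records
# @OUT resolved_records

# Hypothetical stand-in for a remote name service; not real lookup results.
AUTHORITY = {"Puma concolor": "GBIF", "Mytilus edulis": "WoRMS"}


# @BEGIN lookup_taxon
# @IN scientific_name
# @OUT name_source
def lookup_taxon(scientific_name):
    """Return the source that recognizes the name, or None if unknown."""
    return AUTHORITY.get(scientific_name)
# @END lookup_taxon


# @BEGIN resolve_records
# @IN occurrence_records
# @OUT resolved_records
def resolve_records(records):
    """Pair every scientific name with the source that resolved it."""
    return [(name, lookup_taxon(name)) for name in records]
# @END resolve_records

# @END taxon_lookup_workflow

resolved = resolve_records(["Puma concolor", "Unknown sp."])
print(resolved)
```

The annotations live entirely in comments, so the script runs unchanged whether or not an annotation tool ever reads it.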
Taxon Lookup Workflow: Data View and Process View
The story of two individual records
• One took the GBIF route, while …
• … the other went all WORMS!
The aggregate story ..
• How many records were observed as inputs or outputs of workflow steps?
• Were there any NULL values? How many?
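Aggregate questions like these reduce to counting over per-record run observations. Here is a minimal Python sketch, assuming a hypothetical list of `(step, direction, record)` observations. The step names, field names, and records are invented for illustration and are not taken from the workflow's actual logs.

```python
from collections import Counter

# Hypothetical run observations: (step, direction, record).
OBSERVATIONS = [
    ("read_input", "out", {"id": 1, "scientificName": "Puma concolor"}),
    ("read_input", "out", {"id": 2, "scientificName": None}),
    ("lookup_taxon", "in", {"id": 1, "scientificName": "Puma concolor"}),
    ("lookup_taxon", "in", {"id": 2, "scientificName": None}),
    ("lookup_taxon", "out", {"id": 1, "scientificName": "Puma concolor"}),
]


def records_per_port(observations):
    """Count how many records were seen on each (step, direction) pair."""
    return Counter((step, direction) for step, direction, _ in observations)


def null_count(observations):
    """Count field values that are None (NULL) across all observed records."""
    return sum(
        value is None
        for _, _, record in observations
        for value in record.values()
    )


counts = records_per_port(OBSERVATIONS)
nulls = null_count(OBSERVATIONS)
print(dict(counts), nulls)
```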
Summary I
• YW annotations can be added easily to your scripts to reap workflow benefits
– Documentation of what’s important
– Visualization of dependencies
– Querying provenance (prospective, retrospective, and hybrid)
=> make provenance actionable => provenance for self!
=> github.com/yesworkflow-org/yw
=> try.yesworkflow.org
João F. Pimentel, Saumen Dey, Timothy McPhillips, Khalid Belhajjame, David Koop, Leonardo Murta, Vanessa Braganholo, Bertram Ludäscher
Yin & Yang: Demonstrating complementary provenance from noWorkflow & YesWorkflow
[Figure: noWorkflow provenance graph of a run of the simulate_data_collection script. Nodes are function calls and variable assignments labeled by script line number (for example 50 spreadsheet_rows, 61 calculate_strategy, 92 collect_next_image, 106 transform_image, 121 writer.writerow). Files shown: cassette_q55_spreadsheet.csv, calibration.img, run/run_log.txt, run/rejected_samples.txt, run/collected_images.csv, raw images under run/raw/q55/ and corrected images under run/data/ for samples DRT240 and DRT322]
noWorkflow: not only Workflow!
• Scripts have provenance, too!
• Transparently capture some/all provenance from Python script runs.
• Use filter queries to “zoom” into relevant parts ..
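To give a feel for what transparent capture can mean, here is a minimal Python sketch that records function calls during a run using the standard-library hook `sys.setprofile`. It is only an illustration and makes no claim about how noWorkflow is implemented; `capture_calls`, `scale`, and `transform` are hypothetical names.

```python
import sys


def capture_calls(func, *args, **kwargs):
    """Run func and record (event, function name, line number) for Python-level calls."""
    events = []

    def profiler(frame, event, arg):
        # Keep only calls and returns of Python functions; skip C-level events.
        if event in ("call", "return"):
            events.append((event, frame.f_code.co_name, frame.f_lineno))

    sys.setprofile(profiler)
    try:
        result = func(*args, **kwargs)
    finally:
        # Always remove the hook, even if func raises.
        sys.setprofile(None)
    return result, events


def scale(value):
    return value * 2


def transform(value):
    return scale(value) + 1


result, events = capture_calls(transform, 3)
called = [name for event, name, _ in events if event == "call"]
print(result, called)
```

The script under observation (`transform` here) needs no changes; the record is collected entirely from the outside.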
[Figure: noWorkflow lineage graph of run/data/DRT240/DRT240_11000eV_002.img in simulate_data_collection. Values recorded along the lineage, by script line number:]
• 230 parser = <optparse.OptionParser object at 0x7fcb6e16e3c8>
• 251 args = ['q55']
• 24 cassette_id = 'q55', sample_score_cutoff = 12.0, data_redundancy = 0.0, calibration_image_file = 'calibration.img'
• 49 sample_spreadsheet_file = 'cassette_q55_spreadsheet.csv'
• 50 sample_name = 'DRT240', sample_quality = 45
• 61 calculate_strategy = ('DRT240', None, 2, [10000, 11000, 12000]); accepted_sample = 'DRT240', num_images = 2, energies = [10000, 11000, 12000]
• 91 sample_id = 'DRT240'
• 92 energy = 11000, frame_number = 2, raw_image_file = 'run/raw/q55/DRT240/e11000/image_002.raw'
• 106 transform_image = (980, 10, 'run/data/DRT240/DRT240_11000eV_002.img')
$ now dataflow -f "run/data/DRT240/DRT240_11000eV_002.img"
$(NW_FILTERED_LINEAGE_GRAPH).gv: $(NW_FACTS)
    now helper df_style.py
    now dataflow -v 55 -f $(RETROSPECTIVE_LINEAGE_VALUE) -m simulation | python df_style.py -d BT -e > $(NW_FILTERED_LINEAGE_GRAPH).gv
.. auto-“make” this!
noWorkflow lineage of an image file
Provenance information about Python function calls, variable assignments, etc.
[Figure: YesWorkflow model of simulate_data_collection.]
• Steps: initialize_run, load_screening_results, calculate_strategy, log_rejected_sample, collect_data_set, transform_images, log_average_image_intensity
• Data: run_log, sample_name, sample_quality, accepted_sample, rejected_sample, num_images, energies, rejection_log, sample_id, energy, frame_number, raw_image, corrected_image, total_intensity, pixel_count, collection_log
• Inputs: sample_spreadsheet, calibration_image, sample_score_cutoff, data_redundancy, cassette_id
[Figure: reduced view of the same model, ending at corrected_image.]
• Steps: load_screening_results, calculate_strategy, collect_data_set, transform_images
• Data: sample_name, sample_quality, accepted_sample, num_images, energies, sample_id, energy, frame_number, raw_image, corrected_image
• Inputs: sample_spreadsheet, calibration_image, sample_score_cutoff, data_redundancy, cassette_id
module.__build_class__
module.__build_class__
simulate_data_collection
180 return
180 run_logger
201 return
201 new_image_file
230 parser
231 cassette_id
236 add_option
241 add_option
246 add_option
248 set_usage
251 parse_args251 args
251 options254 module.len
24 cassette_id
24 sample_score_cutoff
24 data_redundancy
24 calibration_image_file
30 exists
33 exists
32 filepath
34 module.remove
33 exists
32 filepath
34 module.remove
33 exists
32 filepath
34 module.remove
36 run_log
37 write
38 str(sample_score_cutoff)
38 write
38 str(sample_score_cutoff)
49 str.format
49 sample_spreadsheet_file
50 spreadsheet_rows
cassette_q55_spreadsheet.csv
50 spreadsheet_rows(sample_spreadsheet_file)
51 str.format 51 write
50 sample_name
50 sample_quality
61 calculate_strategy
61 rejected_sample
61 energies
61 accepted_sample
61 num_images
72 str.format
72 write
73 open
73 rejection_log
74 str.format
74 TextIOWrapper.write
50 spreadsheet_rows
50 spreadsheet_rows(sample_spreadsheet_file)
51 str.format
51 write
50 sample_name
50 sample_quality
61 calculate_strategy
61 rejected_sample
61 energies
61 accepted_sample
61 num_images
90 str.format
90 write91 sample_id
92 collect_next_image
92 collect_next_image(casset ... _{frame_number:03d}.raw')
93 str.format
93 write
92 energy
92 frame_number
92 intensity
92 raw_image_file
106 str.format
106 transform_image
calibration.img
106 corrected_image_file
106 total_intensity
106 pixel_count
107 str.format
107 write
118 average_intensity
119 open
119 collection_log_file
120 module.writer
120 collection_log
121 writer.writerow
92 collect_next_image
92 collect_next_image(casset ... _{frame_number:03d}.raw')
93 str.format 93 write
92 energy
92 frame_number
92 intensity
92 raw_image_file
106 str.format
106 transform_image
106 corrected_image_file
106 total_intensity
106 pixel_count
107 str.format
107 write
118 average_intensity
119 open
119 collection_log_file
120 module.writer 120 collection_log
121 writer.writerow
92 collect_next_image
92 collect_next_image(casset ... _{frame_number:03d}.raw')
93 str.format
93 write
92 energy
92 frame_number
92 intensity
92 raw_image_file
106 str.format 106 transform_image 106 corrected_image_file
106 total_intensity
106 pixel_count
107 str.format
107 write
118 average_intensity
119 open
119 collection_log_file120 module.writer
120 collection_log
121 writer.writerow
92 collect_next_image
92 collect_next_image(casset ... _{frame_number:03d}.raw')
93 str.format 93 write
92 energy
92 frame_number
92 intensity
92 raw_image_file
106 str.format
106 transform_image 106 corrected_image_file
106 total_intensity
106 pixel_count
107 str.format
107 write
118 average_intensity
119 open
119 collection_log_file120 module.writer
120 collection_log
121 writer.writerow
92 collect_next_image
92 collect_next_image(casset ... _{frame_number:03d}.raw') 93 str.format
93 write
92 energy
92 frame_number
92 intensity
92 raw_image_file
106 str.format
106 transform_image 106 corrected_image_file
106 total_intensity
106 pixel_count
107 str.format
107 write
118 average_intensity
119 open
119 collection_log_file
120 module.writer
120 collection_log
121 writer.writerow
92 collect_next_image
92 collect_next_image(casset ... _{frame_number:03d}.raw')
93 str.format
93 write
92 energy
92 frame_number
92 intensity
92 raw_image_file
106 str.format
106 transform_image
106 corrected_image_file
106 total_intensity
106 pixel_count
107 str.format
107 write
118 average_intensity
119 open
119 collection_log_file
120 module.writer
120 collection_log
121 writer.writerow
92 collect_next_image
50 spreadsheet_rows
50 spreadsheet_rows(sample_spreadsheet_file)
51 str.format
51 write
50 sample_name
50 sample_quality
61 calculate_strategy
61 rejected_sample
61 energies
61 accepted_sample
61 num_images
90 str.format
90 write
91 sample_id
92 collect_next_image
92 collect_next_image(casset ... _{frame_number:03d}.raw')
93 str.format
93 write
92 energy
92 frame_number
92 intensity
92 raw_image_file
106 str.format
106 transform_image
106 corrected_image_file
106 total_intensity
106 pixel_count
107 str.format
107 write
118 average_intensity
119 open119 collection_log_file 120 module.writer
120 collection_log
121 writer.writerow
92 collect_next_image
92 collect_next_image(casset ... _{frame_number:03d}.raw')
93 str.format
93 write
92 energy
92 frame_number
92 intensity
92 raw_image_file
106 str.format
106 transform_image
106 corrected_image_file
106 total_intensity
106 pixel_count
107 str.format
107 write
118 average_intensity
119 open
119 collection_log_file
120 module.writer 120 collection_log
121 writer.writerow
92 collect_next_image
92 collect_next_image(casset ... _{frame_number:03d}.raw')
93 str.format
93 write
92 energy
92 frame_number
92 intensity
92 raw_image_file
106 str.format
106 transform_image 106 corrected_image_file
106 total_intensity
106 pixel_count
107 str.format
107 write
118 average_intensity
119 open 119 collection_log_file 120 module.writer 120 collection_log
121 writer.writerow
92 collect_next_image
92 collect_next_image(casset ... _{frame_number:03d}.raw')
93 str.format
93 write
92 energy
92 frame_number
92 intensity
92 raw_image_file
106 str.format 106 transform_image
106 corrected_image_file
106 total_intensity
106 pixel_count
107 str.format
107 write
118 average_intensity
119 open
119 collection_log_file 120 module.writer120 collection_log
121 writer.writerow
92 collect_next_image
50 spreadsheet_rows
128 return
run/run_log.txt
run/rejected_samples.txt
run/raw/q55/DRT240/e10000/image_001.raw
run/data/DRT240/DRT240_10000eV_001.img
run/collected_images.csv
run/raw/q55/DRT240/e10000/image_002.raw
run/data/DRT240/DRT240_10000eV_002.img
run/raw/q55/DRT240/e11000/image_001.raw
run/data/DRT240/DRT240_11000eV_001.img
run/raw/q55/DRT240/e11000/image_002.raw
run/data/DRT240/DRT240_11000eV_002.img
run/raw/q55/DRT240/e12000/image_001.raw
run/data/DRT240/DRT240_12000eV_001.img
run/raw/q55/DRT240/e12000/image_002.raw
run/data/DRT240/DRT240_12000eV_002.img
run/raw/q55/DRT322/e10000/image_001.raw
run/data/DRT322/DRT322_10000eV_001.img
run/raw/q55/DRT322/e10000/image_002.raw
run/data/DRT322/DRT322_10000eV_002.img
run/raw/q55/DRT322/e11000/image_001.raw
run/data/DRT322/DRT322_11000eV_001.img
run/raw/q55/DRT322/e11000/image_002.raw
run/data/DRT322/DRT322_11000eV_002.img
simulate_data_collection
230 parser = <optparse.OptionParser object at 0x7fcb6e16e3c8>
251 parse_args = (<Values at 0x7fcb6cbe15c ... cutoff': 12.0}>, ['q55'])
251 args = ['q55']
251 options = <Values at 0x7fcb6cbe15c0 ... ple_score_cutoff': 12.0}>
24 cassette_id = 'q55'
24 sample_score_cutoff = 12.0 24 data_redundancy = 0.0
24 calibration_image_file = 'calibration.img'
49 str.format
49 sample_spreadsheet_file = 'cassette_q55_spreadsheet.csv'
50 spreadsheet_rows(sample_spreadsheet_file)
50 sample_name = 'DRT240'50 sample_quality = 45
61 calculate_strategy = ('DRT240', None, 2, [10000, 11000, 12000])
61 accepted_sample = 'DRT240'61 num_images = 2
61 energies = [10000, 11000, 12000] 91 sample_id = 'DRT240'
92 collect_next_image(casset ... _{frame_number:03d}.raw')
92 energy = 11000 92 frame_number = 292 raw_image_file = 'run/raw/q55/DRT240/e11000/image_002.raw'
106 str.format
106 transform_image = (980, 10, 'run/data/DRT240/DRT240_11000eV_002.img')
calibration.img
run/data/DRT240/DRT240_11000eV_002.img
lineagequerylineagequery
YesWorkflow: Conceptual workflow model
noWorkflow: Python trace model
But how do we bridge this gap???
Would like to use the YW model to query NW data!
Habemus Pons! We’ve got the Bridge! The bridge is the journey.. (The journey is the destination)
Lineage of image file in terms of the YW model, with details from NW provenance
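One way to picture the bridge, as a rough sketch only: join the names used in the conceptual model with the variables recorded in the trace. Everything below is illustrative. The two dictionaries and the `annotate` helper are hypothetical and are not part of YesWorkflow or noWorkflow. The trace records are copied from the filtered lineage shown earlier.

```python
# Hypothetical YW model: data name -> program block that produces it.
YW_PRODUCED_BY = {
    "sample_name": "load_screening_results",
    "sample_quality": "load_screening_results",
    "accepted_sample": "calculate_strategy",
    "num_images": "calculate_strategy",
    "energies": "calculate_strategy",
    "raw_image": "collect_data_set",
}

# Assumed mapping from YW data names to recorded Python variable names,
# needed only where the two differ.
YW_TO_NW_VARIABLE = {"raw_image": "raw_image_file"}

# (script line, variable, value) records taken from the trace above.
NW_TRACE = [
    (50, "sample_name", "DRT240"),
    (50, "sample_quality", 45),
    (61, "accepted_sample", "DRT240"),
    (61, "num_images", 2),
    (61, "energies", [10000, 11000, 12000]),
    (92, "raw_image_file", "run/raw/q55/DRT240/e11000/image_002.raw"),
]


def annotate(yw_produced_by, yw_to_nw, trace):
    """Attach the recorded line and value to each data name in the model."""
    values = {variable: (line, value) for line, variable, value in trace}
    rows = []
    for data_name, block in yw_produced_by.items():
        variable = yw_to_nw.get(data_name, data_name)
        line, value = values.get(variable, (None, None))
        rows.append((block, data_name, line, value))
    return rows


rows = annotate(YW_PRODUCED_BY, YW_TO_NW_VARIABLE, NW_TRACE)
for row in rows:
    print(row)
```

The model supplies the vocabulary (blocks and data names); the trace supplies the concrete values for one run.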
DataONE: Search and Provenance Display
Adding YesWorkflow to DataONE
Yaxing’s script with inputs & output products
Christopher’s YesWorkflow model
Christopher using Yaxing’s outputs as inputs for his script
Christopher’s results can be traced back all the way to Yaxing’s input
Demo Time
(Disclaimer)
https://github.com/idaks/dataone-ahm-2016-poster
https://github.com/idaks/wt-prov-summer-2017
https://github.com/yesworkflow-org/yw-idcc-17
Whole Tale: The next step in the evolution of the scholarly article: The “Living” Paper
• 1st Generation:
  – narrative (prose)
• 2nd Generation: plus …
  – name .. identify .. include (access to) data
• 3rd Generation: plus …
  – name .. reference .. include code (software) ..
  – and provenance … and exec environment (containers)
Whole Tale
Whole Tale Dashboard
Whole Tale: What’s in a name?
(1) Whole Tale ⇔ Whole Story:
◦ Support (computational / data) scientists
◦ … along the complete research lifecycle
◦ ... from experiment to (new kind of) publication
◦ ... and back!
(2) Whole Tale ⇔ for the Long Tail of Science
– Easy sharing of your computational narratives, data, and exec-env since 2017!
– Power applications for everyone!
Whole Tale Vision
• Can't reproduce result because:
• Don't know how to run analysis
• Can't get the software running
• Can't pay for the computer or compute power the result was computed on
Source: Bryce Mecum, NCEAS (WT team)
Whole Tale Vision: Addressing reproducibility
[Diagram: an Article linked to its Data, Code, and Execution Environment]
Source: Bryce Mecum, NCEAS (WT team)
Whole Tale Vision
• Living publication (data + code + environment)
• Increase odds of reproducibility
• Encourage investigation of results by making it easy to recreate the environment the result was created in
Whole Tale Vision: Addressing reproducibility
[Diagram: Article + Tale]
Whole Tale Vision
[Diagram: a Tale bundles Data, Code, and D1PROV]
Whole Tale Team
NSF-DIBBS award: The Whole Tale: Merging Science and Cyberinfrastructure Pathways ($5M total, over 5 years, 5 teams)
WT Team:
• Illinois (NCSA & iSchool)
  • Bertram Ludäscher (PI), Kandace Turner (PM), Victoria Stodden (coPI), Matt Turk (coPI)
  • Kacper Kowalik (sw-architect), Craig Willis (sw-dev)
• U of Chicago
  • Kyle Chard (coPI), Mihael Hategan (sw-dev)
• UT Austin
  • Niall Gaffney (coPI), Siva Kulasekaran (sw-dev)
• U Notre Dame
  • Jarek Nabrzyski (coPI), Ian Taylor (sw-dev), Adam Brinckman (sw-dev)
• UCSB
  • Matt Jones (coPI), Bryce Mecum (sw-dev)
DEMO!
Last not least: Non-unitary syntheses of systematic knowledge
Nico Franz (@taxonbytes)
School of Life Sciences, Arizona State University
CIRSS Seminar – Center for Informatics Research in Science and Scholarship
February 17, 2017 – iSchool, University of Illinois Urbana-Champaign
@ http://www.slideshare.net/taxonbytes/franz-2017-uiuc-cirss-non-unitary-syntheses-of-systematic-knowledge
http://taxonbytes.org/wp-content/uploads/2014/10/Peet-BIGCB-2014-Changing-Perspectives-on-Plant-Distributions.pdf
Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)
"Taxonomic concept labels" identify input concept regions
RCC–5 articulations provided for each species-level concept
• Input visualization: MSW3 (2005) versus MSW2 (1993)
Source: Franz et al. 2016. Two influential primate classifications logically aligned. doi:10.1093/sysbio/syw023
Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)
• Alignment visualization: "grey means taxonomically congruent"
[Legend: One name & congruent region; Many names & congruent region; One name & non-congruent regions; Many names & non-congruent regions; New names & exclusive regions]
• Application of coverage constraint: parent-to-parent articulations (><) are fully defined by alignment signal propagated from their respective children.
=> Sensible when complete sampling of children is intended.
1 in 3 names is unreliable across MSW2/MSW3 classifications
"Controlling the taxonomic variable"
[Figure: four classifications compared: The 'consensus'; The 'bible'; The (formerly) federal 'standard'; The 'best', latest regional flora; one region labeled "Just bad"]
Expert views are in conflict
Impact: Name-based aggregation has created a novel synthesis that nobody believes in
Solution: Instead of aggregating an artificial 'consensus', build translation services
Expert views are reconciled
Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610
Leaving taxon and species headaches …
• To illustrate Euler think of a simpler use case:
• Agreeing to disagree!
• … when there are multiple, legitimate perspectives
• Sorting things out!
  – Euler as a taxon concept (& name) “microscope” ...
  – .. or scalpel
  – .. or ...?
Agreeing to Disagree: Reconciling Conflicting Taxonomic Views using a Logic-based Approach
Yi-Yun Cheng (1), Nico Franz (2), Jodi Schneider (1), Shizhuo Yu (3), Thomas Rodenhausen (4), Bertram Ludäscher (1)
(1) School of Information Sciences, University of Illinois at Urbana-Champaign; (2) School of Life Sciences, Arizona State University; (3) Department of Computer Science, University of California at Davis; (4) School of Information, University of Arizona
Acknowledgments
Support of the authors’ research through the National Science Foundation is kindly acknowledged (DEB-1155984, DBI-1342595, and DBI-1643002). The authors thank Professor Kathryn La Barre for her comments and suggestions. We would also like to thank Dr. Laetitia Navarro and Jeff Terstriep for help with creating map overlays in QGIS.
CONCLUSION
• Our logic-based taxonomy alignment approach can be used to solve crosswalking issues. We will be able to mitigate the membership condition problems that occur in equivalent crosswalking.
• The RCC-5 approach preserves the original taxonomies while providing an alignment view. We can solve data integration problems that happen in the more coarse-grained relative crosswalking, which otherwise is subject to information loss.
• Our study also underscores the benefits of designing different alignment workflows (bottom-up vs. top-down) to match the needs of specific taxonomy alignment problems.
  Bottom-up approach: seems to work well whenever we have non-overlapping relationships at the leaf-level (lowest-level) articulations, and we are not sure how the higher-level concepts should be aligned.
  Top-down approach: seems favorable when there is an expectation of certain higher-level articulations in conjunction with under-specified, complex, and often overlapping leaf-level relations.
RELATED WORK
• Taxonomy Alignment Problems (TAP): Taxonomies T1, T2 are inter-linked via a set of input articulations A, defined as RCC-5 relations, to yield a “merged” taxonomy T3.
• Euler/X Articulations – a constraint or rule that defines a relationship (a set constraint) between two concepts from different taxonomies.
• Region Connection Calculus (RCC-5)
• Possible Worlds – When encoding and solving TAPs via ASP, the different answer sets represent alternative taxonomy merge solutions or possible worlds (PWs).
INTRODUCTION
Tina: Hey Amy, can you recommend a signature dish from where you live?
Amy: Oh, definitely the half-smokes from the Northeast! They are these tasty half-pork and half-beef sausages.
Tina: What a coincidence! We have half-smokes in the South, too! Where do you live in the Northeast? New York? Boston?
Amy: Wrong guesses! Where do you live in the South?
Tina and Amy together: Washington, D.C.
[The two of them look at each other, confused.]
“In the face of incompatible information or data structures among users or among those specifying the system, attempts to create unitary knowledge categories are futile. Rather, parallel or multiple representational forms are required…” (Bowker & Star, 2000).
CASE 1 RESULTS: CEN vs. NDC
• State-level alignments are all congruent (Bottom-up)
• Inferred new articulations for regional-level alignments
CASE 2 RESULTS: CEN vs. TZ
Figure 3. (Left) CEN-NDC taxonomy alignment problem with 49 input articulations between TCEN and TNDC
Figure 4. (Right) The unique possible world (PW) T3 reconciling TCEN and TNDC via inferred relationships
Figure 1. National Diversity Council map (NDC) vs. Census Bureau map (CEN)
• GitHub link: https://github.com/EulerProject/ASIST17
• Email: yiyunyc2@illinois.edu
[Maps: NDC regions (West, Southwest, Southeast, Midwest, Northeast); CEN regions (West, South, Midwest, Northeast); time zones (Pacific, Mountain, Central, Eastern)]
RESEARCH DESIGN
Step 1. Supply input taxonomies T1 and T2
Step 2. Formulate RCC-5 articulations between T1 and T2
Step 3. Iteratively edit articulations in Euler/X
[Figure residue: the five RCC-5 relations (Congruence X == Y, Inclusion X > Y, Inverse Inclusion X < Y, Overlap X >< Y, Disjointness X ! Y); the Euler/X loop (add/edit articulations A, compute N possible worlds, stop at N=1, revise when inconsistent (N=0) or ambiguous (N>1)); alignment graphs for CEN vs. NDC and CEN vs. TZ; GIS overlay regions R1 to R9]
Figure 2. The process of aligning taxonomies T1 and T2 with Euler/X
Figure 5. Top-down input alignments between TCEN and TTZ
Figure 6. The unique PW for the TCEN with TTZ alignment
Figure 10. Combined concepts solution for TCEN and TTZ
taxonomy CEN Census_Regions
(USA Northeast Midwest South West)
(Northeast CT MA ME NH NJ NY PA RI VT)
(Midwest IL IN IA KS MI MN MO NE ND OH SD WI)
(South AL AR DE DC FL GA KY LA MD MS NC OK SC TN TX VA WV)
(West AZ CA CO ID MT NV NM OR UT WA WY)

taxonomy NDC National_Diversity_Council
(USA Midwest Northeast Southeast Southwest West)
(Northeast CT DC DE MD MA ME NH NJ NY PA RI VT)
(Midwest IA IL IN KS MI MN MO ND NE OH SD WI)
(Southeast AL AR FL GA KY LA MS NC SC TN VA WV)
(Southwest AZ NM OK TX)
(West CA CO ID MT NV OR WA WY UT)

articulations CEN NDC
[CEN.AL equals NDC.AL]
[CEN.AR equals NDC.AR]
[CEN.AZ equals NDC.AZ]
[CEN.CA equals NDC.CA]
[CEN.CO equals NDC.CO]
[CEN.CT equals NDC.CT]
[CEN.DC equals NDC.DC]
[CEN.DE equals NDC.DE]
[CEN.FL equals NDC.FL]
[CEN.GA equals NDC.GA]
[CEN.IA equals NDC.IA]
[CEN.ID equals NDC.ID]
[CEN.IL equals NDC.IL]
[CEN.IN equals NDC.IN]
[CEN.KS equals NDC.KS]
[CEN.KY equals NDC.KY]
[CEN.LA equals NDC.LA]
[CEN.MA equals NDC.MA]
[CEN.MD equals NDC.MD]
[CEN.ME equals NDC.ME]
[CEN.MI equals NDC.MI]
[CEN.MN equals NDC.MN]
...

Quick Scan!

taxonomy CEN Census_Regions
(USA Midwest South West Northeast)

taxonomy TZ Time_Zone
(USA Pacific Mountain Central Eastern)

articulations CEN TZ
[CEN.Midwest disjoint TZ.Pacific]
[CEN.Midwest overlaps TZ.Eastern]
[CEN.Midwest overlaps TZ.Mountain]
[CEN.Northeast is_included_in TZ.Eastern]
[CEN.South disjoint TZ.Pacific]
[CEN.South overlaps TZ.Central]
[CEN.South overlaps TZ.Eastern]
[CEN.South overlaps TZ.Mountain]
[CEN.USA equals TZ.USA]
[CEN.West disjoint TZ.Central]
[CEN.West disjoint TZ.Eastern]
[CEN.West overlaps TZ.Mountain]
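To make the bracketed articulation syntax concrete, here is a minimal sketch that reads triples of the form shown above and prints them with the RCC-5 symbols used on the slides. It is not the Euler/X parser. It covers only simple articulations with a single relation, and the keyword `includes` is an assumption, since only `equals`, `is_included_in`, `overlaps`, and `disjoint` appear in the inputs shown here.

```python
import re

# One articulation: [<concept> <relation> <concept>]
ARTICULATION = re.compile(r"\[([^\s\[\]]+)\s+([^\s\[\]]+)\s+([^\s\[\]]+)\]")

# Relation keywords mapped to the symbols from the RCC-5 legend.
# "includes" is assumed; it does not occur in the inputs shown above.
SYMBOL = {
    "equals": "==",
    "includes": ">",
    "is_included_in": "<",
    "overlaps": "><",
    "disjoint": "!",
}


def parse_articulations(text):
    """Return (left, relation, right) triples found anywhere in text."""
    return ARTICULATION.findall(text)


sample = """
[CEN.Midwest disjoint TZ.Pacific]
[CEN.Northeast is_included_in TZ.Eastern]
[CEN.South overlaps TZ.Central]
[CEN.USA equals TZ.USA]
"""

triples = parse_articulations(sample)
for left, relation, right in triples:
    print(left, SYMBOL[relation], right)
```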
Two Taxonomies: NDC vs CEN
“… in the face of incompatible information or data structures among users or among those specifying the system, attempts to create unitary knowledge categories are futile. Rather, parallel or multiple representational forms are required” [Bowker & Star, 2000, p.159]
[Maps: National Diversity Council map (NDC): West, Southwest, Southeast, Midwest, Northeast; US Census Bureau map (CEN): West, South, Midwest, Northeast]
Source: Yi-Yun (Jessica) Cheng (PhD student, iSchool @ Illinois)
The taxonomies
11/01/17 Cheng
• The Census Regions Map (CEN) consists of four regions: West, Midwest, Northeast, and South, i.e., the contiguous 48 states and Washington D.C.
• The National Diversity Council Map (NDC) consists of five regions: West, Southwest, Midwest, Northeast, and Southeast, i.e., the same 48 states and Washington D.C.
• NDC splits South into SW and SE
• Do NDC and CEN agree on “West”? “Midwest”? …
• How can we sort this out?
Sorting things out …
[Figure: input taxonomies CEN (USA with South, West, Northeast, Midwest) and NDC (USA with Southeast, Midwest, Southwest, West, Northeast); nodes: CEN 5, NDC 6; edges: is_a (CEN) 4, is_a (NDC) 5; then the same graph with 9 articulations added]
• Given:
  – taxonomies T1, T2
  – and relations T1 ~ T2 (articulations, alignment)
• Find:
  – merged taxonomy T3
• Such that:
  – T1, T2 are preserved
  – all pairwise relations are explicit
5 ways to relate concepts (regions)
• Idea: relate concepts X and Y with articulations
• Articulation Language: Region Connection Calculus (RCC5): congruence, inclusion, inverse inclusion, overlap, disjointness
  Congruence: X == Y
  Inclusion: X > Y
  Inverse Inclusion: X < Y
  Overlap: X >< Y
  Disjointness: X ! Y
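If each concept is read as a non-empty set of members, the five relations are exhaustive and mutually exclusive, so exactly one holds for any pair. A minimal sketch under that set-based reading follows. The function name `rcc5` and the toy sets are illustrative and not taken from Euler/X.

```python
def rcc5(x, y):
    """Return the RCC-5 symbol relating two non-empty sets x and y."""
    x, y = set(x), set(y)
    if not x or not y:
        raise ValueError("RCC-5 regions are assumed to be non-empty")
    if x == y:
        return "=="   # congruence
    if x > y:
        return ">"    # inclusion: x properly contains y
    if x < y:
        return "<"    # inverse inclusion: y properly contains x
    if x & y:
        return "><"   # overlap: shared members, neither contains the other
    return "!"        # disjointness


print(rcc5({"a", "b"}, {"a", "b"}))   # ==
print(rcc5({"a", "b", "c"}, {"a"}))   # >
print(rcc5({"a"}, {"a", "b"}))        # <
print(rcc5({"a", "b"}, {"b", "c"}))   # ><
print(rcc5({"a"}, {"b"}))             # !
```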
[Figure: T1 (CEN) and T2 (NDC) with 9 RCC-5 articulations (T1 ~ T2); nodes: CEN 5, NDC 6; edges: is_a (CEN) 4, is_a (NDC) 5, articulations 9]
Merged taxonomy T3
[Figure: T3 with CEN.USA/NDC.USA and CEN.Midwest/NDC.Midwest merged as congruent concepts; nodes: CEN 3, NDC 4, congruent 2; edges: is_a (input) 8, overlaps (input) 3]
How we align two taxonomies T1 and T2
• Step 1. Supply input taxonomies T1 and T2
• Step 2. Describe the relationships between T1 and T2
• Step 3. Iteratively edit articulations in Euler/X
[Diagram: T1, T2, and articulations A go into Euler/X, which returns N possible worlds; N=1 yields T3; inconsistent (N=0) or ambiguous (N>1) results lead back to add/edit articulations]
• … but where do the articulations come from??
  – expert opinion
  – automatically derived from data
Case 1: Census Region vs. National Diversity Council
[Maps: CEN (West, South, Midwest, Northeast) and NDC with states (West, Southwest, Southeast, Midwest, Northeast)]
• … but where do the articulations come from??
  – automatically derived from data
  – expert input
[Figure: CEN-NDC input alignment with 49 state-level congruence (==) articulations, e.g. CEN.IL == NDC.IL; nodes: CEN 54, NDC 55; edges: isa_CEN 53, isa_NDC 54, articulations 49]
[Figure: possible world for CEN vs. NDC; nodes: CEN 3, NDC 4, combined 51; edges: input 61, inferred 3, inferred overlaps 3]
USA, Midwest, and state-level alignments are all congruent
[Figure: possible world for CEN vs. NDC (repeated)]
The overlapping relations are automatically derived from data
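A rough sketch of how region-level overlaps can fall out of the state-level data. The state lists are copied from the CEN and NDC input taxonomies. The helper `proper_overlaps` is hypothetical and uses plain set operations, whereas Euler/X infers these relations by logical reasoning over the articulations.

```python
# Region -> member states, copied from the two input taxonomies.
CEN = {
    "Northeast": "CT MA ME NH NJ NY PA RI VT",
    "Midwest": "IL IN IA KS MI MN MO NE ND OH SD WI",
    "South": "AL AR DE DC FL GA KY LA MD MS NC OK SC TN TX VA WV",
    "West": "AZ CA CO ID MT NV NM OR UT WA WY",
}
NDC = {
    "Northeast": "CT DC DE MD MA ME NH NJ NY PA RI VT",
    "Midwest": "IA IL IN KS MI MN MO ND NE OH SD WI",
    "Southeast": "AL AR FL GA KY LA MS NC SC TN VA WV",
    "Southwest": "AZ NM OK TX",
    "West": "CA CO ID MT NV OR WA WY UT",
}


def proper_overlaps(t1, t2):
    """List (region1, region2, shared) where neither region contains the other."""
    found = []
    for name1, states1 in t1.items():
        s1 = set(states1.split())
        for name2, states2 in t2.items():
            s2 = set(states2.split())
            shared = s1 & s2
            if shared and s1 - s2 and s2 - s1:
                found.append((name1, name2, sorted(shared)))
    return found


overlaps = proper_overlaps(CEN, NDC)
for cen_region, ndc_region, shared in overlaps:
    print(f"CEN.{cen_region} >< NDC.{ndc_region}: {' '.join(shared)}")
```

On this data the sketch reports three overlapping pairs, and the CEN.South and NDC.Northeast pair shares DC, DE, and MD.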
CEN.West
NDC.Southwest
CEN.USANDC.USA
CEN.Northeast
NDC.Northeast
CEN.SouthNDC.Southeast
NDC.West
CEN.DCNDC.DC
CEN.NMNDC.NM
CEN.NDNDC.ND
CEN.MidwestNDC.Midwest
CEN.AZNDC.AZ
CEN.CANDC.CA
CEN.MTNDC.MT
CEN.MANDC.MA
CEN.INNDC.IN
CEN.NVNDC.NV
CEN.MDNDC.MD
CEN.CTNDC.CT
CEN.NHNDC.NH
CEN.KYNDC.KY
CEN.PANDC.PA
CEN.CONDC.CO
CEN.WANDC.WA
CEN.MINDC.MI
CEN.VANDC.VA
CEN.WINDC.WI
CEN.NENDC.NE
CEN.SDNDC.SD
CEN.MNNDC.MN
CEN.MSNDC.MS
CEN.IDNDC.ID
CEN.WVNDC.WV
CEN.NYNDC.NY
CEN.NJNDC.NJ
CEN.UTNDC.UT
CEN.MENDC.ME
CEN.ILNDC.IL
CEN.TNNDC.TN
CEN.VTNDC.VT
CEN.GANDC.GA
CEN.DENDC.DE
CEN.NCNDC.NC
CEN.OKNDC.OK
CEN.MONDC.MO
CEN.SCNDC.SC
CEN.ARNDC.AR
CEN.TXNDC.TX
CEN.LANDC.LA
CEN.OHNDC.OH
CEN.IANDC.IA
CEN.KSNDC.KS
CEN.RINDC.RI
CEN.WYNDC.WY
CEN.FLNDC.FL
CEN.ORNDC.OR
CEN.ALNDC.AL
Nodes
CEN 3NDC 4comb 51 Edges
input 61inferred 3
overlapsinferred 3
DC is in both the South and the Northeast
Case 2: Census Region vs. Time Zone
[Figure: two US maps. CEN (Census regions): West, South, Midwest, Northeast. TZ (time zones): Pacific, Mountain, Central, Eastern.]
• … but where do the articulations come from??
  – automatically derived from data
  – expert input
Top-down regional alignment
[Figure, input: CEN concepts (USA, Northeast, Midwest, South, West) and TZ concepts (USA, Eastern, Central, Mountain, Pacific), linked by RCC-5 articulations labeled == (1), < (1), >< (6), and ! (4). Nodes: CEN 5, TZ 5. Edges: isa_CEN 4, isa_TZ 4, Art. 12.]
[Figure, output (possible world): merged graph with a combined CEN.USA/TZ.USA node. Nodes: CEN 4, comb 1, TZ 4. Edges: input 7, overlaps (input) 6, overlaps (inferred) 1, inferred 1.]
How do we know if our ‘expert articulations’ are correct?
[Figure: map divided into nine regions, R1 to R9.]
GIS solution as the Ground Truth..
[Figure: map divided into nine regions, R1 to R9.]
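One way to picture the ground-truth check is sketched below, assuming the GIS result can be reduced to the set of map cells each region covers. Everything here is hypothetical: the helper names, the region names `A.East` and so on, the cell numbers, and the expert table are invented for illustration and are not taken from the slides or from Euler/X.

```python
def rcc5(a, b):
    """RCC-5 symbol for two non-empty sets of map cells."""
    if a == b:
        return "=="
    if a < b:
        return "<"
    if a > b:
        return ">"
    return "><" if a & b else "!"


def check_articulations(expert, extents):
    """Return {(a, b): (claimed, observed)} for each expert articulation
    that disagrees with the relation observed in the ground-truth extents."""
    mismatches = {}
    for (name_a, name_b), claimed in expert.items():
        observed = rcc5(extents[name_a], extents[name_b])
        if observed != claimed:
            mismatches[(name_a, name_b)] = (claimed, observed)
    return mismatches


# Hypothetical ground truth: each region is the set of map cells it covers.
extents = {
    "A.East": {1, 2, 3},
    "A.West": {4, 5, 6},
    "B.North": {1, 2, 3, 4},
    "B.South": {5, 6},
}
# Hypothetical expert input; the last entry is deliberately wrong.
expert = {
    ("A.East", "B.North"): "<",
    ("A.West", "B.North"): "><",
    ("A.West", "B.South"): "!",
}
print(check_articulations(expert, extents))
# {('A.West', 'B.South'): ('!', '>')}
```

An empty result would mean every expert articulation agrees with the spatial ground truth.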
[Figure: alignment graph with combined concepts, where A*B labels the zone shared by A and B and A\B labels the part of A outside B. Intersection zones shown: CEN.South*TZ.Central, CEN.South*TZ.Mountain, CEN.South*TZ.Eastern, CEN.Midwest*TZ.Central, CEN.Midwest*TZ.Mountain, CEN.Midwest*TZ.Eastern, CEN.West*TZ.Mountain. Difference zones include CEN.South\TZ.Eastern and TZ.Eastern\CEN.South. Nodes: CEN 4, newComb 18, comb 1, TZ 4. Edges: input 6, inferred 37.]
Combined concepts solution for regional-level alignments
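The combined-concept idea can be sketched as follows, under the assumption that each concept is available as a set of members. For every partially overlapping pair the sketch emits three zones, labeled in the figure's style (`A*B`, `A\B`, `B\A`). The toy taxonomies `A.*` and `B.*` are hypothetical, and merging zones with identical extents under one node is a design choice of this sketch, not confirmed behavior of Euler/X.

```python
from itertools import product


def combined_concepts(tax_a, tax_b):
    """Map each distinct zone (a frozenset of members) to its labels.

    For every partially overlapping pair (A, B) three zones are produced:
    the shared part, A minus B, and B minus A. Zones with the same extent
    share one entry, so a node can carry several labels.
    """
    zones = {}
    for (name_a, a), (name_b, b) in product(tax_a.items(), tax_b.items()):
        partial_overlap = bool(a & b) and not a <= b and not b <= a
        if not partial_overlap:
            continue
        for label, extent in (
            (f"{name_a}*{name_b}", a & b),
            (f"{name_a}\\{name_b}", a - b),
            (f"{name_b}\\{name_a}", b - a),
        ):
            zones.setdefault(frozenset(extent), []).append(label)
    return zones


# Hypothetical toy taxonomies over members 1..6.
tax_a = {"A.X": {1, 2, 3}, "A.Y": {4, 5, 6}}
tax_b = {"B.P": {1, 2}, "B.Q": {3, 4}, "B.R": {5, 6}}

zones = combined_concepts(tax_a, tax_b)
for extent, labels in zones.items():
    print(sorted(extent), labels)
```

In this toy case two pairs overlap (A.X with B.Q, A.Y with B.Q), giving six labels but only four distinct zones, because for example `A.X*B.Q` and `B.Q\A.Y` cover the same member.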
Do the taxonomies have to be spatial in order to use RCC-5?
• No! Taxonomy alignment more typically involves non-spatial taxonomies
  – for which no “GIS route” or direct visual cues about regional extensions are available
  – using RCC-5 as an alignment vocabulary is a suitable approach for a wide range of multi-hierarchy reconciliations
Conclusion & Discussion
• Underscores the benefits of designing different alignment workflows (bottom-up vs. top-down)
  – Bottom-up: suits non-overlapping relationships among the lowest-level articulations, when it is unclear how to align the higher-level concepts
  – Top-down: suits cases where leaf-level relations often overlap; expert input will frequently be needed to establish such expectations under the top-down approach
https://github.com/EulerProject/ASIST17
yiyunyc2@illinois.edu
Implications
• Logic-based taxonomy alignment approach
  – Disambiguates name-based taxonomy alignment over time
    • 40% of the concepts in biology taxonomies undergo name changes over time (Franz et al., 2016)
  – May mitigate problems in equivalent crosswalking
    • Membership condition problem that was often criticized in crosswalking
  – Preserves the original taxonomies while providing an alignment view
    • Solves data integration problems that happen in the more coarse-grained relative crosswalking
Some History
• … Aristotle …
• … Euler …
• …
• … Greg Whitbread …
• [BPB93] J. H. Beach, S. Pramanik, and J. H. Beaman. Hierarchic taxonomic databases. Advances in Computer Methods for Systematic Biology: Artificial Intelligence, Databases, Computer Vision, 1993.
• [Ber95] Walter G. Berendsohn. The concept of “potential taxa” in databases. Taxon, 44:207–212, 1995.
• [Ber03] Walter G. Berendsohn. MoReTax – Handling Factual Information Linked to Taxonomic Concepts in Biology. No. 39 in Schriftenreihe für Vegetationskunde. Bundesamt für Naturschutz, 2003.
• [GG03] M. Geoffroy and A. Güntsch. Assembling and navigating the potential taxon graph. In [Ber03], pages 71–82, 2003.
• [TL07] Thau, D., & Ludäscher, B. (2007). Reasoning about taxonomies in first-order logic. Ecological Informatics, 2(3), 195-209.
• [FP09] Franz, N. M., & Peet, R. K. (2009). Perspectives: towards a language for mapping relationships among taxonomic concepts. Systematics and Biodiversity, 7(1), 5-20.
• …