Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
•2
TheDataSciencesGroupatNASAAmes
DataMiningResearchandDevelopment(R&D)forapplicationtoNASAproblems(Aeronautics,EarthScience,SpaceExploration,SpaceScience)
GroupMembersIlya AvrekhKamalika Das,Ph.D.DaveIversonVijayJanakiraman,Ph.D.RodneyMartin,Ph.D.BryanMatthewsDavidNielsenNikunjOza,Ph.D.VeronicaPhillipsJohnStutzHamed Valizadegan,Ph.D.+summerstudents
FundingSources
• ScienceMissionDirectorate:AISTandCMACprograms
• NASAAeronauticsResearchMissionDirectorate- ATD,SMART-NAS,SASOProject
• NASAEngineeringandSafetyCenter
• ExplorationSystemsMissionDirectorate,ExplorationTechnologyDevelopmentProgram
• Non-NASA:DARPA,DoD
Team Members are NASA Employees, Contractors, and Students.
ExampleDataMiningProblems
• Aeronautics:AnomalyDetection,PrecursorIdentification,textmining(classification,topicidentification)
• EarthScience:Fillinginmissingmeasurements,anomalydetection,teleconnections,climateunderstanding
• SpaceScience:Kepler planetcandidates• SpaceExploration:systemhealthmanagement,vascularstructureidentification
FourV’sofBig Tough,SleepDeprivingData
ØVolume:Ø RadarTracks:47facilities(1
year)~423GB(Compressed),~3.2TB(CSV)
Ø WeatherandForecast(EntireNAS):CIWS~2.8TB
ØVeracityØ DatadropoutsØ DuplicatetracksØ TrackendinginmidairØ Reusedflightidentifiers
ØVelocityØ RadarTracks:47Facilities
Ø ~35GB/month(compressed).
Ø ~268GB/month(uncompressed)
Ø WeatherandForecast(EntireNAS):CIWS~233GB/month
ØVarietyØ Numerical
(continuous/binary)Ø Weather(forecast/actual)Ø Radar/AirportmetadataØ ATCVoiceØ ASRStextreports
(Pilot/Controller)
AmazingAlgorithm
IntuitiveReports
AeronauticsDataMiningProblems
• AnomalyDetection– AnomalyDiscoveryoverlargesetofvariables– Particularvariableofinterest,forexample,fuelburn
• Determineexpectedinstantaneousfuelburngivencurrentstateofaircraft
• Comparewithactualinstantaneousfuelburn• Wheredifferenceishigh,problemmaybeoccurring
• PrecursorIdentification– Givenundesirableeffect(e.g.,go-around),identifyprecursors(e.g.,overtakesituation,highspeedapproach)
• Textmining– Textclassification,topicidentification
TopicExtractionExampleTOPIC1
autopltacftspd
capturemoderatelevel
engagedleveloffvertctl
disconnectedselectedfpmlightclbpitch
manuallywarningpwr
TOPIC2
timedayleg
contributingfactorshrscrewfactorfatiguenighttriprestdutyflyinglonglate
previousincidentlack
alerter
TOPIC3apchrwyvisualilstwrlndglocarptfinal
missedclredmsl
interceptvectoredsightgar
terrainfield
uneventfulctl
Otherexamplesof‘fatigue’
AltitudeDeviationSpatialDeviationRampExcursionLandingwithoutclearanceRunwayIncursionUnstableApproach
AeronauticsAnomalyDetection:CurrentMethods
Exceedance-BasedMethods• Knownanomalies• Conditionsover2-3variables(e.g.,speed>250knots,altitude=1000ft,landing)
• Cannotidentifyunknownanomalies• Lowfalsepositiverate,highfalsenegative(misseddetection)rate.
Data-DrivenMethods
• DISCOVERanomaliesby– learningstatisticalpropertiesofthedata– findingwhichdatapointsdonotfit(e.g.,faraway,lowprobability)
– nobackgroundknowledgeonanomaliesneeded
• Complementarytoexistingmethods– Lowfalsenegative(misseddetection)rate– Higherfalsepositiverate(identifiedpoints/flightsunusual,butnotalwaysoperationallysignificant)
• Data-drivenmethods->insights->modificationofexceedance detection
Example:HighSpeedGo-Around
• OvershootsExtendedRunwayCenterline(ERC)byover1SM
• Over250Kts @2500Ft.• Angleofintercept>40°• Overshoots2nd approach
BryanMatthews,DavidNielsen,JohnSchade,Kennis Chan,andMikeKiniry,AutomatedDiscoveryofFlightTrackAnomalies,33rd DigitalAvionicsSystemsConference,2014
ProvidingDomainExpertFeedbackActive learning with rationales frameworkTraining
Input Features
MKAD
Nominals
Anomalies
SME
Activelearning strategy
Testing
Input Features
MKAD
Nominals
Anomalies
Rationale features
2-class classification/ranking
algorithm
Uninteresting anomalies
Operationally significant anomalies
Manali Sharma,Kamalika Das,MustafaBilgic,BryanMatthews,DavidNielsen,andNikunjOza,ActiveLearningwithRationalesforIdentifyingOperationallySignificantAnomaliesinAviation,EuropeanConferenceonMachineLearningandPrinciplesandPractices
OfKnowledgeDiscovery(ECML-PKDD),2016
EarthScienceExample
• Understandrelationshipsbetweenecosystemdynamics andclimaticfactors
• Modelasaregressionanalysisproblem• 3sciencequestions– Magnitudeandextentofecosystemexposure,sensitivityandresiliencetothe2005and2010Amazondroughts
– Understandhuman-inducedandotherattributionascausesofvegetationanomalies
– Howlearneddependencymodelvariesacrosseco-climaticzonesandgeographicalregionsonaglobalscale
NASAESTOAIST-14project,UncoveringEffectsofClimateVariablesonGlobalVegetation(PI:Kamalika Das,Ph.D.)
ProblemFormulation
• Point-to-pointregressionanalysis(GeneticProgrammingbasedSymbolicRegression)
• Estimatespatio-temporaldependencyofforestecosystemsonclimatevariables
Vijt=f(Lcij, CVij
t, CVnbt, CVij
t-1, CVnbt-1,.....CVij
t-k, CVnbt-k)
V:vegetation,LC:landcover type,CV:climate variable(s)
i,j:pixellocationindicest:timeindexnb:spatialneighborhoodof
indexi,jk:temporaldependencyOpenchallenges: 1.Estimatingfunctionf
2.Estimatingbestchoicesfork,nb
DataPipeline
NDVIResolution: 250 m
Projection: Sinusoidal
LSTResolution: 1 km
Projection: Sinusoidal
TRMM (Ver 6)Resolution: 25 kmProjection: WGS84
Reprojectand
resampledata
2000 – 2010 Monthly data
NDVI, TRMM, LST
Resolution: 1 kmProjection: WGS84
Time-Series:Changetoseasonal
2000 – 2010 Seasonal data
Monthly -> Seasonal
4 Seasons/yea
r
Season 1: March – MaySeason 2: June – SepSeason 3: OctSeason 4: Nov - Feb
Windowing:Smoothingover
25x25sizewindow
Filterdatabasedonlandcover
Resultsfor2004-2010
Mean Squared Error
Year RidgeRegression LASSO SVR Symbolic
Regression
2004 0.284 0.284 0.280 0.262
2005 0.289 0.289 0.288 0.278
2006 0.426 0.426 0.430 0.321
2007 0.374 0.374 0.370 0.318
2008 0.308 0.308 0.310 0.336
2009 0.353 0.353 0.360 0.328
2010 0.546 0.547 0.540 0.479
Marcin Szubert,Anuradha Kodali,Sangram Ganguly,Kamalika Das,andJoshC.Bongard,ReducingAntagonismbetweenBehavioralDiversityandFitnessinSemanticGeneticProgramming,ProceedingsoftheGeneticandEvolutionaryComputation
Conference(GECCO),pp.797-804,2016.
OngoingandFutureWork• Experimentwithdifferentcombinationsoftemporal
lookback and/orspatialeffects• Introduceadditionalregressors(radiation,forestfiremaps,
deforestationmaps)• StudytheeffectofdifferentregressorsondifferentAmazon
tiles• DerivenonlinearGPmodelsonAmazontiles• Givenappropriatehistoricaldata,havetheabilitytopredict:
“Underwhatconditionsdoesvegetationnotrecoverwithinacertaintimeframe.”
• Doglobalscaleanalysisinparallel
VESsel GENeration (VESGEN)AnalysisPatriciaParsons-Wingerter,PhD,NASAChiefInnovator/POCNASAAmes2016InnovationFundAward,ChiefTechnologist’sOffice
• VESGEN2Dmapsandquantifiesvascularremodelingforawidevarietyofquasi-2Dvascularizedbiomedicaltissueapplications.
• WorkingontransformingtoVESGEN3D,inlinewithmostvascularizedorgansandtissuesinhumansandvertebrates.
• Vascular-dependentdiseasesincludecancer,diabetes,coronaryvesseldisease,andmajorastronauthealthchallengesinthespacemicrogravityandradiationenvironments,especiallyforlong-durationmissions.
• Onekeycomponentisbinarization:conversionofgrayscaleimagestoblack/whitevascularbranchingpatterns.– Takes10-25hoursofhumaneffort.– Exploringpatternrecognition,matchingfiltering,vessel
tracking/tracing,mathematicalmorphology,multiscaleapproaches,andmodelbasedalgorithms.
OTSUThresholding
OTSUvs.AdaptiveThresholding
FutureWork• Workinprogress:exploringmorepreprocessingandpost-processingtechniques
• Eachstepofpreprocessingandpostprocessing hassomeinputparameters– Theresultissensitivetothisparameters– Weaimtomaketheparameterselectioneitherautomated(machinelearning)orsemi-automated(usercanchoosetherightparameter)
• MachineLearningtolearnthebinarization– Giventhemanuallabels,performsupervisedorsemi-supervisedlearning
– Eachpixelanditsclasslabel(foregroundorbackground)isthetrainingexample
How dowegettheWordOut?DASHlink
disseminate.collaborate.innovate.https://dashlink.ndc.nasa.gov/
DASHlinkisacollaborativewebsitedesignedtopromote:• Sustainability• Reproducibility• Dissemination• Communitybuilding
Userscancreateprofiles• Sharepapers,uploadanddownloadopensourcealgorithms• FindNASAdatasets.