21
NASA Ames Data Sciences Group Nikunj C. Oza, Ph.D. Leader, Data Sciences Group [email protected] www.nasa.gov •1

NASA Ames Data Sciences Group - Amazon Web …...Vijay Janakiraman, Ph.D. Rodney Martin, Ph.D. Bryan Matthews David Nielsen Nikunj Oza, Ph.D. Veronica Phillips John Stutz HamedValizadegan

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: NASA Ames Data Sciences Group - Amazon Web …...Vijay Janakiraman, Ph.D. Rodney Martin, Ph.D. Bryan Matthews David Nielsen Nikunj Oza, Ph.D. Veronica Phillips John Stutz HamedValizadegan

NASAAmesDataSciencesGroupNikunjC.Oza,Ph.D.

Leader,[email protected]

www.nasa.gov •1

Page 2: NASA Ames Data Sciences Group - Amazon Web …...Vijay Janakiraman, Ph.D. Rodney Martin, Ph.D. Bryan Matthews David Nielsen Nikunj Oza, Ph.D. Veronica Phillips John Stutz HamedValizadegan

•2

TheDataSciencesGroupatNASAAmes

DataMiningResearchandDevelopment(R&D)forapplicationtoNASAproblems(Aeronautics,EarthScience,SpaceExploration,SpaceScience)

GroupMembersIlya AvrekhKamalika Das,Ph.D.DaveIversonVijayJanakiraman,Ph.D.RodneyMartin,Ph.D.BryanMatthewsDavidNielsenNikunjOza,Ph.D.VeronicaPhillipsJohnStutzHamed Valizadegan,Ph.D.+summerstudents

FundingSources

• ScienceMissionDirectorate:AISTandCMACprograms

• NASAAeronauticsResearchMissionDirectorate- ATD,SMART-NAS,SASOProject

• NASAEngineeringandSafetyCenter

• ExplorationSystemsMissionDirectorate,ExplorationTechnologyDevelopmentProgram

• Non-NASA:DARPA,DoD

Team Members are NASA Employees, Contractors, and Students.

Page 3: NASA Ames Data Sciences Group - Amazon Web …...Vijay Janakiraman, Ph.D. Rodney Martin, Ph.D. Bryan Matthews David Nielsen Nikunj Oza, Ph.D. Veronica Phillips John Stutz HamedValizadegan

ExampleDataMiningProblems

• Aeronautics:AnomalyDetection,PrecursorIdentification,textmining(classification,topicidentification)

• EarthScience:Fillinginmissingmeasurements,anomalydetection,teleconnections,climateunderstanding

• SpaceScience:Kepler planetcandidates• SpaceExploration:systemhealthmanagement,vascularstructureidentification

Page 4: NASA Ames Data Sciences Group - Amazon Web …...Vijay Janakiraman, Ph.D. Rodney Martin, Ph.D. Bryan Matthews David Nielsen Nikunj Oza, Ph.D. Veronica Phillips John Stutz HamedValizadegan

FourV’sofBig Tough,SleepDeprivingData

ØVolume:Ø RadarTracks:47facilities(1

year)~423GB(Compressed),~3.2TB(CSV)

Ø WeatherandForecast(EntireNAS):CIWS~2.8TB

ØVeracityØ DatadropoutsØ DuplicatetracksØ TrackendinginmidairØ Reusedflightidentifiers

ØVelocityØ RadarTracks:47Facilities

Ø ~35GB/month(compressed).

Ø ~268GB/month(uncompressed)

Ø WeatherandForecast(EntireNAS):CIWS~233GB/month

ØVarietyØ Numerical

(continuous/binary)Ø Weather(forecast/actual)Ø Radar/AirportmetadataØ ATCVoiceØ ASRStextreports

(Pilot/Controller)

AmazingAlgorithm

IntuitiveReports

Page 5: NASA Ames Data Sciences Group - Amazon Web …...Vijay Janakiraman, Ph.D. Rodney Martin, Ph.D. Bryan Matthews David Nielsen Nikunj Oza, Ph.D. Veronica Phillips John Stutz HamedValizadegan

AeronauticsDataMiningProblems

• AnomalyDetection– AnomalyDiscoveryoverlargesetofvariables– Particularvariableofinterest,forexample,fuelburn

• Determineexpectedinstantaneousfuelburngivencurrentstateofaircraft

• Comparewithactualinstantaneousfuelburn• Wheredifferenceishigh,problemmaybeoccurring

• PrecursorIdentification– Givenundesirableeffect(e.g.,go-around),identifyprecursors(e.g.,overtakesituation,highspeedapproach)

• Textmining– Textclassification,topicidentification

Page 6: NASA Ames Data Sciences Group - Amazon Web …...Vijay Janakiraman, Ph.D. Rodney Martin, Ph.D. Bryan Matthews David Nielsen Nikunj Oza, Ph.D. Veronica Phillips John Stutz HamedValizadegan

TopicExtractionExampleTOPIC1

autopltacftspd

capturemoderatelevel

engagedleveloffvertctl

disconnectedselectedfpmlightclbpitch

manuallywarningpwr

TOPIC2

timedayleg

contributingfactorshrscrewfactorfatiguenighttriprestdutyflyinglonglate

previousincidentlack

alerter

TOPIC3apchrwyvisualilstwrlndglocarptfinal

missedclredmsl

interceptvectoredsightgar

terrainfield

uneventfulctl

Otherexamplesof‘fatigue’

AltitudeDeviationSpatialDeviationRampExcursionLandingwithoutclearanceRunwayIncursionUnstableApproach

Page 7: NASA Ames Data Sciences Group - Amazon Web …...Vijay Janakiraman, Ph.D. Rodney Martin, Ph.D. Bryan Matthews David Nielsen Nikunj Oza, Ph.D. Veronica Phillips John Stutz HamedValizadegan

AeronauticsAnomalyDetection:CurrentMethods

Exceedance-BasedMethods• Knownanomalies• Conditionsover2-3variables(e.g.,speed>250knots,altitude=1000ft,landing)

• Cannotidentifyunknownanomalies• Lowfalsepositiverate,highfalsenegative(misseddetection)rate.

Page 8: NASA Ames Data Sciences Group - Amazon Web …...Vijay Janakiraman, Ph.D. Rodney Martin, Ph.D. Bryan Matthews David Nielsen Nikunj Oza, Ph.D. Veronica Phillips John Stutz HamedValizadegan

Data-DrivenMethods

• DISCOVERanomaliesby– learningstatisticalpropertiesofthedata– findingwhichdatapointsdonotfit(e.g.,faraway,lowprobability)

– nobackgroundknowledgeonanomaliesneeded

• Complementarytoexistingmethods– Lowfalsenegative(misseddetection)rate– Higherfalsepositiverate(identifiedpoints/flightsunusual,butnotalwaysoperationallysignificant)

• Data-drivenmethods->insights->modificationofexceedance detection

Page 9: NASA Ames Data Sciences Group - Amazon Web …...Vijay Janakiraman, Ph.D. Rodney Martin, Ph.D. Bryan Matthews David Nielsen Nikunj Oza, Ph.D. Veronica Phillips John Stutz HamedValizadegan

Example:HighSpeedGo-Around

• OvershootsExtendedRunwayCenterline(ERC)byover1SM

• Over250Kts @2500Ft.• Angleofintercept>40°• Overshoots2nd approach

BryanMatthews,DavidNielsen,JohnSchade,Kennis Chan,andMikeKiniry,AutomatedDiscoveryofFlightTrackAnomalies,33rd DigitalAvionicsSystemsConference,2014

Page 10: NASA Ames Data Sciences Group - Amazon Web …...Vijay Janakiraman, Ph.D. Rodney Martin, Ph.D. Bryan Matthews David Nielsen Nikunj Oza, Ph.D. Veronica Phillips John Stutz HamedValizadegan

ProvidingDomainExpertFeedbackActive learning with rationales frameworkTraining

Input Features

MKAD

Nominals

Anomalies

SME

Activelearning strategy

Testing

Input Features

MKAD

Nominals

Anomalies

Rationale features

2-class classification/ranking

algorithm

Uninteresting anomalies

Operationally significant anomalies

Manali Sharma,Kamalika Das,MustafaBilgic,BryanMatthews,DavidNielsen,andNikunjOza,ActiveLearningwithRationalesforIdentifyingOperationallySignificantAnomaliesinAviation,EuropeanConferenceonMachineLearningandPrinciplesandPractices

OfKnowledgeDiscovery(ECML-PKDD),2016

Page 11: NASA Ames Data Sciences Group - Amazon Web …...Vijay Janakiraman, Ph.D. Rodney Martin, Ph.D. Bryan Matthews David Nielsen Nikunj Oza, Ph.D. Veronica Phillips John Stutz HamedValizadegan

EarthScienceExample

• Understandrelationshipsbetweenecosystemdynamics andclimaticfactors

• Modelasaregressionanalysisproblem• 3sciencequestions– Magnitudeandextentofecosystemexposure,sensitivityandresiliencetothe2005and2010Amazondroughts

– Understandhuman-inducedandotherattributionascausesofvegetationanomalies

– Howlearneddependencymodelvariesacrosseco-climaticzonesandgeographicalregionsonaglobalscale

NASAESTOAIST-14project,UncoveringEffectsofClimateVariablesonGlobalVegetation(PI:Kamalika Das,Ph.D.)

Page 12: NASA Ames Data Sciences Group - Amazon Web …...Vijay Janakiraman, Ph.D. Rodney Martin, Ph.D. Bryan Matthews David Nielsen Nikunj Oza, Ph.D. Veronica Phillips John Stutz HamedValizadegan

ProblemFormulation

• Point-to-pointregressionanalysis(GeneticProgrammingbasedSymbolicRegression)

• Estimatespatio-temporaldependencyofforestecosystemsonclimatevariables

Vijt=f(Lcij, CVij

t, CVnbt, CVij

t-1, CVnbt-1,.....CVij

t-k, CVnbt-k)

V:vegetation,LC:landcover type,CV:climate variable(s)

i,j:pixellocationindicest:timeindexnb:spatialneighborhoodof

indexi,jk:temporaldependencyOpenchallenges: 1.Estimatingfunctionf

2.Estimatingbestchoicesfork,nb

Page 13: NASA Ames Data Sciences Group - Amazon Web …...Vijay Janakiraman, Ph.D. Rodney Martin, Ph.D. Bryan Matthews David Nielsen Nikunj Oza, Ph.D. Veronica Phillips John Stutz HamedValizadegan

DataPipeline

NDVIResolution: 250 m

Projection: Sinusoidal

LSTResolution: 1 km

Projection: Sinusoidal

TRMM (Ver 6)Resolution: 25 kmProjection: WGS84

Reprojectand

resampledata

2000 – 2010 Monthly data

NDVI, TRMM, LST

Resolution: 1 kmProjection: WGS84

Time-Series:Changetoseasonal

2000 – 2010 Seasonal data

Monthly -> Seasonal

4 Seasons/yea

r

Season 1: March – MaySeason 2: June – SepSeason 3: OctSeason 4: Nov - Feb

Windowing:Smoothingover

25x25sizewindow

Filterdatabasedonlandcover

Page 14: NASA Ames Data Sciences Group - Amazon Web …...Vijay Janakiraman, Ph.D. Rodney Martin, Ph.D. Bryan Matthews David Nielsen Nikunj Oza, Ph.D. Veronica Phillips John Stutz HamedValizadegan

Resultsfor2004-2010

Mean Squared Error

Year RidgeRegression LASSO SVR Symbolic

Regression

2004 0.284 0.284 0.280 0.262

2005 0.289 0.289 0.288 0.278

2006 0.426 0.426 0.430 0.321

2007 0.374 0.374 0.370 0.318

2008 0.308 0.308 0.310 0.336

2009 0.353 0.353 0.360 0.328

2010 0.546 0.547 0.540 0.479

Marcin Szubert,Anuradha Kodali,Sangram Ganguly,Kamalika Das,andJoshC.Bongard,ReducingAntagonismbetweenBehavioralDiversityandFitnessinSemanticGeneticProgramming,ProceedingsoftheGeneticandEvolutionaryComputation

Conference(GECCO),pp.797-804,2016.

Page 15: NASA Ames Data Sciences Group - Amazon Web …...Vijay Janakiraman, Ph.D. Rodney Martin, Ph.D. Bryan Matthews David Nielsen Nikunj Oza, Ph.D. Veronica Phillips John Stutz HamedValizadegan

OngoingandFutureWork• Experimentwithdifferentcombinationsoftemporal

lookback and/orspatialeffects• Introduceadditionalregressors(radiation,forestfiremaps,

deforestationmaps)• StudytheeffectofdifferentregressorsondifferentAmazon

tiles• DerivenonlinearGPmodelsonAmazontiles• Givenappropriatehistoricaldata,havetheabilitytopredict:

“Underwhatconditionsdoesvegetationnotrecoverwithinacertaintimeframe.”

• Doglobalscaleanalysisinparallel

Page 16: NASA Ames Data Sciences Group - Amazon Web …...Vijay Janakiraman, Ph.D. Rodney Martin, Ph.D. Bryan Matthews David Nielsen Nikunj Oza, Ph.D. Veronica Phillips John Stutz HamedValizadegan

VESsel GENeration (VESGEN)AnalysisPatriciaParsons-Wingerter,PhD,NASAChiefInnovator/POCNASAAmes2016InnovationFundAward,ChiefTechnologist’sOffice

• VESGEN2Dmapsandquantifiesvascularremodelingforawidevarietyofquasi-2Dvascularizedbiomedicaltissueapplications.

• WorkingontransformingtoVESGEN3D,inlinewithmostvascularizedorgansandtissuesinhumansandvertebrates.

• Vascular-dependentdiseasesincludecancer,diabetes,coronaryvesseldisease,andmajorastronauthealthchallengesinthespacemicrogravityandradiationenvironments,especiallyforlong-durationmissions.

• Onekeycomponentisbinarization:conversionofgrayscaleimagestoblack/whitevascularbranchingpatterns.– Takes10-25hoursofhumaneffort.– Exploringpatternrecognition,matchingfiltering,vessel

tracking/tracing,mathematicalmorphology,multiscaleapproaches,andmodelbasedalgorithms.

Page 17: NASA Ames Data Sciences Group - Amazon Web …...Vijay Janakiraman, Ph.D. Rodney Martin, Ph.D. Bryan Matthews David Nielsen Nikunj Oza, Ph.D. Veronica Phillips John Stutz HamedValizadegan

OTSUThresholding

Page 18: NASA Ames Data Sciences Group - Amazon Web …...Vijay Janakiraman, Ph.D. Rodney Martin, Ph.D. Bryan Matthews David Nielsen Nikunj Oza, Ph.D. Veronica Phillips John Stutz HamedValizadegan

OTSUvs.AdaptiveThresholding

Page 19: NASA Ames Data Sciences Group - Amazon Web …...Vijay Janakiraman, Ph.D. Rodney Martin, Ph.D. Bryan Matthews David Nielsen Nikunj Oza, Ph.D. Veronica Phillips John Stutz HamedValizadegan

FutureWork• Workinprogress:exploringmorepreprocessingandpost-processingtechniques

• Eachstepofpreprocessingandpostprocessing hassomeinputparameters– Theresultissensitivetothisparameters– Weaimtomaketheparameterselectioneitherautomated(machinelearning)orsemi-automated(usercanchoosetherightparameter)

• MachineLearningtolearnthebinarization– Giventhemanuallabels,performsupervisedorsemi-supervisedlearning

– Eachpixelanditsclasslabel(foregroundorbackground)isthetrainingexample

Page 20: NASA Ames Data Sciences Group - Amazon Web …...Vijay Janakiraman, Ph.D. Rodney Martin, Ph.D. Bryan Matthews David Nielsen Nikunj Oza, Ph.D. Veronica Phillips John Stutz HamedValizadegan

How dowegettheWordOut?DASHlink

disseminate.collaborate.innovate.https://dashlink.ndc.nasa.gov/

DASHlinkisacollaborativewebsitedesignedtopromote:• Sustainability• Reproducibility• Dissemination• Communitybuilding

Userscancreateprofiles• Sharepapers,uploadanddownloadopensourcealgorithms• FindNASAdatasets.

Page 21: NASA Ames Data Sciences Group - Amazon Web …...Vijay Janakiraman, Ph.D. Rodney Martin, Ph.D. Bryan Matthews David Nielsen Nikunj Oza, Ph.D. Veronica Phillips John Stutz HamedValizadegan

NASAAmesDataSciencesGroupNikunjC.Oza,Ph.D.

Leader,[email protected]

www.nasa.gov •21