EARTHCUBE CONCEPTUAL DESIGN: A Scalable Community Driven Architecture

  • Published on
    07-Sep-2018

Transcript

<ul><li><p>EARTHCUBE CONCEPTUAL DESIGN: A Scalable Community Driven Architecture</p><p>http://earthcube.org/group/scalable-community-driven-architecture</p><p>Overview</p><p>PI: G. Djorgovski (Caltech)</p><p>Co-Is: D. Pilone, T. Pilone (Element 84), D. Crichton, E. Law (JPL)</p><p>Other key personnel: S. Caltagirone (E84), S. Hughes (JPL), T. Huang (JPL), A. Mahabal (Caltech)</p><p>2016 ESIP Winter Meeting, 1/7/16</p></li>
<li><p>A high-level system blueprint for the definition, construction, and deployment of both existing and new components, to ensure that they can be unified and integrated into an evolutionary national infrastructure for EarthCube.</p></li>
<li><p>Methodology</p><p>Identification of stakeholders, concerns, and requirements</p><p>Identification of architectural use cases and drivers</p><p>Selection of an architectural framework</p><p>Development of the architectural principles</p><p>Development of the architectural models</p><p>Capture of the architecture artifacts in a consolidated report</p><p>Generation of recommendations for adopting the architecture for the EarthCube program</p></li>
<li><p>Stakeholders</p><table><tr><th>Stakeholder/Actor</th><th>Concerns</th></tr>
<tr><td>NSF Program Managers</td><td>Make decisions and provide guidance at the EarthCube program level. Provide sufficient funding to support the EarthCube mission.</td></tr>
<tr><td>EarthCube Scientists</td><td>Use EarthCube resources and services to conduct scientific research. Publish scientific results and curate data as needed.</td></tr>
<tr><td>EarthCube Developers</td><td>Develop technologies and services that can be integrated into EarthCube.</td></tr>
<tr><td>EarthCube Architects</td><td>Establish EarthCube requirements, framework, and operational concept. Develop the information model (vocabulary, ontology). Establish standards and guidelines. Ensure interoperability between EarthCube Building Blocks.</td></tr>
<tr><td>External Data Users</td><td>Use EarthCube resources and services for research, education, and decision-making.</td></tr>
<tr><td>Curator</td><td>Ensure data is properly captured in EarthCube-compliant data repositories.</td></tr>
<tr><td>Data Owner</td><td>Responsible for producing the data. Concerned about its distribution and use.</td></tr>
<tr><td>External Data Facility</td><td>Responsible for archiving data at other agencies (NASA, NOAA, USGS, etc.); interoperability with the EarthCube Cyberinfrastructure.</td></tr>
<tr><td>EarthCube Governance Committees</td><td>Responsible for generating and monitoring the governance for the system, including data curation, access, use case priority, interoperability standards, etc.</td></tr>
<tr><td>EarthCube Office Staff</td><td>Responsible for maintaining community involvement within EarthCube and communicating changes and how to use the system.</td></tr></table></li>
<li><p>Use Cases</p><p>Big Science Discovery, Comparison, Provenance, Model &amp; Visualization</p><p>Collaborative Science</p><p>Dark Data Contribution</p><p>Tools Contribution</p><p>Data Documentation</p><p>Models Sharing</p><p>High Performance Computing and Storage Resources</p><p>Real Time Data</p><p>Physical Sample Curation</p></li>
<li><p>Drivers</p><p>
Transform and accelerate research and discovery by turning data into knowledge and enabling interdisciplinary data integration.</p><p>Provide critically needed data, tools, computational resources, and frameworks for cross-domain scientific collaboration and analysis, together with long-term preservation, discovery, and use of geoscience software and data.</p><p>Provide a geosciences cyberinfrastructure and architecture that is scalable, extensible, and sustainable.</p></li>
<li><p>Frameworks</p><p>Zachman Framework: for organizing stakeholder concerns and perspectives.</p><p>ISO/IEC/IEEE 42010:2011: for architecture description guidelines.</p><p>Reference Model for Open Distributed Processing (RM-ODP): for architectural patterns for distributed systems.</p><p>The Open Group Architecture Framework (TOGAF): for managing the architecture.</p><p>Federal Enterprise Architecture Framework (FEAF): for classifying the architecture into architectural elements and viewpoints.</p><p>ISO 14721:2003 Open Archival Information System (OAIS) Reference Model: provides a standard for information objects.</p><p>ISO/IEC 11179-3 Registry Metamodel and Basic Attributes specification: provides a schema for a metadata registry.</p></li>
<li><p>Scalability</p><p>Community Driven</p><p>Open Science</p><p>Interoperability</p><p>Sustainability</p><p>Distributed</p><p>Data Model Driven</p></li>
<li><p>[Diagram, built up over four slides: Data Provider (Science Data Management for Satellite Instrument Data Systems, Airborne Data, and Agency Earth Data Archives; Other Data Systems, e.g. NOAA, In-Situ, University); EarthCube CI with EarthCube Repository and EarthCube Discovery; Data Science Infrastructure (Data, Algorithms, Machines) with Science Teams; EarthCube Data Analytics Centers with Data Analysis, Research, Applications, and Decision Support.]</p></li>
<li><p>Benchmark</p><p>Earth System Grid Federation (ESGF)</p><p>Early Detection Research Network (EDRN)</p><p>
NASA's Earth Observing System Data and Information System (EOSDIS)</p>
<p>ExArch Meeting, October 2012</p>
<p>Node Architecture</p>
<p>Internally, each ESGF Node is composed of services and applications that collectively enable data and metadata access and user management. The ESGF software stack combines custom software components developed by ESGF with other freely available applications from eCommerce (Apache Tomcat, Solr, Postgres, ...) and geo-informatics (THREDDS Data Server, LAS, ...). Software components are grouped into four areas of functionality (a.k.a. flavors):</p>
<p>Data Node: secure data publication and access</p>
<p>Index Node: metadata indexing and searching; web portal UI to drive human interaction; dashboard suite of admin applications; model metadata viewer plugin</p>
<p>Identity Provider: user authentication and group membership</p>
<p>Compute Node: analysis and visualization</p>
<p>Node flavors can be installed in various combinations depending on site needs, or to achieve higher performance and scalability.</p>
<p>Software Stack: Node Manager</p>
<p>Enables continuous exchange of service and state information among Nodes. Internally, it collects Node health information and metrics (CPU, disk usage, etc.). Installed for all Node flavors.</p>
<p>Peer-to-Peer (P2P) protocol</p>
<p>Gossip protocol: information is exchanged randomly among peers. Each Node receives information from one Node, merges it with its own information, and propagates it to two other Nodes at random.</p>
<p>No central coordination and no single point of failure. Nodes can join or leave the federation dynamically. Each Node is bootstrapped with knowledge of one default peer. Each Node can belong to one or more peer groups within which information is exchanged.</p>
<p>XML Registry</p>
<p>An XML document that is the payload of the P2P protocol. It contains service endpoints and SSL public keys for all Nodes in the federation. Derived products (list of search shards, trusted IdPs, location of Attribute Services, ...) are used by federation-wide services.</p>
<p>Challenge: good news travels fast, bad news travels slow...</p>
<p>EOSDIS data centers:</p><table><tr><th>Data center</th><th>Disciplines</th></tr>
<tr><td>ASF DAAC</td><td>SAR Products, Sea Ice, Polar Processes</td></tr>
<tr><td>SEDAC</td><td>Human Interactions in Global Change</td></tr>
<tr><td>LP DAAC</td><td>Land Processes &amp; Features</td></tr>
<tr><td>PO.DAAC</td><td>Ocean Circulation, Air-Sea Interactions</td></tr>
<tr><td>ASDC</td><td>Radiation Budget, Clouds, Aerosols, Tropospheric Chemistry</td></tr>
<tr><td>ORNL DAAC</td><td>Biogeochemical Dynamics, EOS Land Validation</td></tr>
<tr><td>GES DISC</td><td>Atmospheric Composition &amp; Dynamics, Global Modeling, Hydrology, Radiance</td></tr>
<tr><td>LAADS/MODAPS</td><td>Atmosphere</td></tr>
<tr><td>OBPG</td><td>Ocean Biology &amp; Biogeochemistry</td></tr>
<tr><td>GHRC</td><td>Hydrological Cycle &amp; Severe Weather</td></tr>
<tr><td>CDDIS</td><td>Crustal Dynamics, Solid Earth</td></tr>
<tr><td>NSIDC DAAC</td><td>Cryosphere, Polar Processes</td></tr></table>
<p>SIPSs: NCAR, U of Col. (HIRDLS, MOPITT, SORCE); GSFC (GLAS, MODIS, OMI, OBPG); LaRC (CERES, SAGE III); GHRC (AMSR-E, LIS, AMSR2); JPL (MLS, TES); San Diego (ACRIM)</p></li>
<li><p>Architecture Elements</p><p>EarthCube System Architecture</p><p>Process Architecture: Data Lifecycle (Data Generation, Data Curation, Data Transport, Data Ingest, Data Management, Search/Distribution, Data Analytics, Visualization); Software Lifecycle and Administrative (Technology Planning, Software Development, Release, Governance, Standards, Technology, Policies, Resource Planning)</p><p>Data Architecture: Information Model, Archive Model, Query/Access, Data Formats, Archive Organization, Grammar, Data Dictionary</p><p>Technology Architecture: Ingest (Receive, Validate, Accept), Catalog/Data Management, Storage (Repository), Processing, Search and Discovery, Data Integration, Data Analysis, Distribution, Visualization, Distributed Architecture, Data Access, IT Security, Collaboration, Publication</p><p>Domain Crosscutting: Research Software Lifecycle (Software Development, Software Versioning, Software Archiving, Software Search &amp; Distribution), Algorithm Storage &amp; Discovery, Data Standards Evaluation, User Roles, Support and Feedback, Use metrics for data, software, and site use</p></li>
<li><p>Data Lifecycle</p><table><tr><th>Stage</th><th>Role</th></tr>
<tr><td>Data Generation</td><td>Original generation of data (from sensors, investigators, etc.)</td></tr>
<tr><td>Data Curation and Preparation</td><td>Prepares data for use and submission into EarthCube</td></tr>
<tr><td>Data Transport</td><td>Maximizes information throughput against available bandwidth</td></tr>
<tr><td>Data Ingest</td><td>Supports the capture and validation of data into EarthCube</td></tr>
<tr><td>Data Management</td><td>Provides overall data management services for the data in EarthCube</td></tr>
<tr><td>Discovery, Access &amp; Distribution</td><td>Enables discovery, access, and distribution of the data</td></tr>
<tr><td>Data Analytics</td><td>Enables the analysis of massive, distributed, heterogeneous data</td></tr>
<tr><td>Visualization</td><td>Provides a platform for integrating analytics with rendering and understanding the data</td></tr></table></li>
<li><p>Information Model Context</p></li>
<li><p>Framework</p><p>[Diagram: Data Science Framework spanning a Data Node and an Analytic Node, between Data Providers and Applied Science (Information, Data, Knowledge).]</p>
<p>Sources: Images, Measurements/Observations, Remote Sensing, Text file/ASCII, Spreadsheets, Metadata, etc.</p>
<p>Ingest and stewardship: Data Stewardship/Curation; Transfer, Validation, Metadata, Harvesting, Packaging</p>
<p>Data management and search: Search, Metadata Publication, Data Push, Data Access; OpenSearch, Lucene, Solr, Elasticsearch; RDBMS (Postgres, Oracle, MySQL); NoSQL (MongoDB, Cassandra); Array (SciDB); Storage (SAN, S3, SSD)</p>
<p>Abstraction: Java, Python, Ruby, Groovy, Scala</p>
<p>Data Distribution: Query/Retrieval; OPeNDAP, W10N, LAS, THREDDS; Search, Query, Subset, etc.</p>
<p>Data Analysis: Science Workflow, Analytics, Machine Learning, Pattern Recognition, Climatologies, Data Reduction, Uncertainty Analysis, etc.</p>
<p>Analysis Platform: Hadoop/HDFS, MapReduce, ZooKeeper, Spark; Graph DB (TitanDB, Neo4j); Triple Store (Virtuoso, AllegroGraph, Sesame, Fuseki); Lucene, OpenSearch, SPARQL, etc.; Message Passing Interface; Single Machine, High Performance Computing, GPU; Virtual Machine, Container</p>
<p>Visualization: OGC (WMS, WMTS, …), TWMS, Data Slices, Plots and Coordination, Integrated Views; Data Viewer and Interactive Query</p></li>
<li><p>Example Instantiation</p><p>[Diagram: EarthCube Cyberinfrastructure example, labeled with Data Providers (Satellite Information Data Systems, Airborne Data, Agency Earth Data Archives, Other Data Systems (In-Situ, University)); users (Research, Applications, Applied Science, Decision Support); the EarthCube Data Science Infrastructure; EarthCube Data Analytics Centers; EarthCube Data Management Nodes and Data Analytic Nodes; an EarthCube Discipline-Specific Data Management with Data Analytic Node; EarthCube Repositories; and per-node Data Ingest, Query/Retrieval, and Data Viewer and Interactive Query APIs, each node reusing the framework components listed above.]</p></li></ul>
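The ESGF Node Manager slide describes push-style gossip: a node merges what it receives and forwards it to two random peers, every node is bootstrapped with one default peer, and the gossiped payload is an XML Registry from which derived products such as the list of search shards are computed. The Python code below is a minimal sketch of that idea under stated assumptions: the registry is modeled as a dict of `(version, endpoint, flavors)` entries standing in for the XML document, nodes gossip in synchronous rounds with a fanout of two, the higher version wins on merge, and the node names, example.org endpoints, and the `search_shards` helper are hypothetical. It is not ESGF's implementation or registry schema.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """A federation node that gossips a registry of node entries (sketch)."""

    node_id: str
    endpoint: str
    flavors: frozenset
    # registry maps node_id -> (version, endpoint, flavors); hypothetical layout
    registry: dict = field(default_factory=dict)
    peers: set = field(default_factory=set)

    def __post_init__(self) -> None:
        # Each node starts out knowing only its own entry.
        self.registry[self.node_id] = (1, self.endpoint, self.flavors)

    def merge(self, incoming: dict) -> None:
        """Merge a received registry, keeping the highest version per node."""
        for nid, entry in incoming.items():
            current = self.registry.get(nid)
            if current is None or entry[0] > current[0]:
                self.registry[nid] = entry
        # Every node named in the registry becomes a candidate gossip target.
        self.peers.update(nid for nid in self.registry if nid != self.node_id)


def gossip_round(nodes: dict, rng: random.Random, fanout: int = 2) -> None:
    """Each node pushes its registry to up to `fanout` randomly chosen peers."""
    # Snapshot first so every push in a round reflects start-of-round state.
    snapshots = {nid: dict(node.registry) for nid, node in nodes.items()}
    for nid, node in nodes.items():
        candidates = sorted(node.peers)
        for target in rng.sample(candidates, min(fanout, len(candidates))):
            nodes[target].merge(snapshots[nid])


def converged(nodes: dict) -> bool:
    """True once every node's registry lists every node in the federation."""
    return all(set(node.registry) == set(nodes) for node in nodes.values())


def search_shards(registry: dict) -> list:
    """Hypothetical derived product: sorted endpoints of Index-flavor nodes."""
    return sorted(ep for _, ep, flavors in registry.values() if "index" in flavors)


def simulate(num_nodes: int = 8, max_rounds: int = 50, seed: int = 42):
    """Bootstrap every node with one default peer, then gossip until converged."""
    rng = random.Random(seed)
    nodes = {}
    for i in range(num_nodes):
        flavors = frozenset({"data", "index"}) if i % 2 == 0 else frozenset({"data"})
        node = Node(f"node{i}", f"https://node{i}.example.org", flavors)
        if i > 0:
            node.peers.add("node0")  # the single default peer
        nodes[node.node_id] = node
    for rounds in range(1, max_rounds + 1):
        gossip_round(nodes, rng)
        if converged(nodes):
            return nodes, rounds
    return nodes, None


if __name__ == "__main__":
    nodes, rounds = simulate()
    print("converged after", rounds, "rounds")
    print(search_shards(nodes["node3"].registry))
```

Because entries merge by version, an update to a node's endpoint spreads just as a newly joined node does. Announcing that a node has left would need extra machinery, for example a higher-versioned removal marker, which this sketch does not model.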

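The Architecture Elements slide lists Ingest as Receive, Validate, Accept, and the Data Lifecycle slide says ingest "supports the capture and validation of data into EarthCube." The Python code below sketches that three-step flow under hypothetical assumptions: the `SubmissionPackage` and `Repository` classes, the SHA-256 checksum check, and the required metadata keys are invented for illustration and are not part of the EarthCube design.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SubmissionPackage:
    """Hypothetical submission: payload bytes plus provider-declared metadata."""

    product_id: str
    payload: bytes
    sha256: str  # checksum declared by the data provider
    metadata: dict = field(default_factory=dict)


# Illustrative minimum metadata; not an EarthCube-defined schema.
REQUIRED_METADATA = ("title", "provider", "format")


class IngestError(ValueError):
    """Raised when a submission is rejected at any ingest step."""


class Repository:
    """In-memory stand-in for an EarthCube repository catalog."""

    def __init__(self) -> None:
        self.catalog: dict = {}

    def receive(self, package: SubmissionPackage) -> SubmissionPackage:
        # Receive: refuse a product ID that is already cataloged.
        if package.product_id in self.catalog:
            raise IngestError(f"duplicate product: {package.product_id}")
        return package

    @staticmethod
    def validate(package: SubmissionPackage) -> SubmissionPackage:
        # Validate: payload integrity first, then required metadata fields.
        if hashlib.sha256(package.payload).hexdigest() != package.sha256:
            raise IngestError("checksum mismatch")
        missing = [key for key in REQUIRED_METADATA if key not in package.metadata]
        if missing:
            raise IngestError("missing metadata: " + ", ".join(missing))
        return package

    def accept(self, package: SubmissionPackage) -> None:
        # Accept: record the product so later lifecycle steps can discover it.
        self.catalog[package.product_id] = dict(package.metadata, size=len(package.payload))

    def ingest(self, package: SubmissionPackage) -> None:
        """Run Receive, Validate, Accept; a failure leaves the catalog unchanged."""
        self.accept(self.validate(self.receive(package)))


if __name__ == "__main__":
    data = b"lat,lon,sst\n34.1,-118.2,17.5\n"
    meta = {"title": "SST sample", "provider": "example", "format": "csv"}
    repo = Repository()
    repo.ingest(SubmissionPackage("sst-0001", data, hashlib.sha256(data).hexdigest(), meta))
    print(repo.catalog["sst-0001"])
```

Keeping the three steps as separate methods mirrors the Receive/Validate/Accept split, so a real repository could substitute its own validation rules (format or vocabulary checks, for instance) without touching the other steps.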