Scientific data cloud infrastructure and services in Chinese Academy of Sciences

Jianhui LI([email protected]), Yuanke Wei([email protected]), Yuanchun Zhou([email protected])

Computer Network Information Center, Chinese Academy of Sciences




Outline

• About us
  – CAS (Chinese Academy of Sciences)
  – CNIC (Computer Network Information Center), CAS
  – SDC (Scientific Data Center), CNIC, CAS
• About Scientific Data Cloud of CAS
  – Data Challenge
  – Architecture
  – Infrastructure Service
  – Middleware Service
  – Data Service
• Conclusion


• CAS is a leading academic institution and comprehensive research and development center in natural science, technological science and high-tech innovation in China.
• It was founded in Beijing on 1st November 1949 on the basis of the former Academia Sinica (Central Academy of Sciences) and Peiping Academy of Sciences.


• CNIC is a public support institution for the consistent construction, operation and service of the information infrastructure of CAS.
• It is a pioneer, promoter and participant in the informatization of domestic scientific research and scientific research management.


Operation and Services in CNIC (provided by 7 business departments respectively)

• Scientific Research Network Environment
• Scientific Data Environment
• Supercomputing Environment
• Informatization of Research Management
• Internet-based Science Popularization and Education
• Internet Fundamental Resource Services


Scientific Data Center

• Scientific Data Center (SDC) is the support facility in charge of the construction, management, operation and maintenance of the CAS Informatization Data Application Environment, and has been taking the lead in implementing the CAS Scientific Database Project for more than 20 years.
• SDC provides storage services, data services and related application technology services for the entire CAS.
• SDC hosts the Secretariat of Committee on Data for Science and Technology (CODATA) and the CAS Secretariat for World Wide Web Consortium (W3C).
• The vision of SDC is to become an important facilitator of the exchange and application of scientific data resources, a key technology supplier throughout the lifecycle of scientific data, and a leader in transforming scientific data into knowledge services.




Hotter and hotter in data research

• Mar. 29, 2012: the Obama Administration announced the "Big Data Research and Development Initiative" ($200 million): improving our ability to extract knowledge and insights from large and complex collections of digital data.
• Feb. 11, 2011: Science issued a Special Online Collection, "Dealing with Data".
• Sep. 2009: Nature issued "Data's shameful neglect": Research cannot flourish if data are not preserved and made accessible. All concerned must act accordingly.
• The Second International Symposium on Dataology & Data Science was held 3 days ago in China.

[Figure callouts: difficult to discover; difficult to access; being lost]


Data Driven Scientific Discovery

• Data is regarded as the most valuable thing.
• "The impact of Jim Gray's thinking is continuing to get people to think in a new way about how data and software are redefining what it means to do science." (Bill Gates)
• Scientific discovery based on data-intensive computing is now considered the "fourth paradigm" after theoretical, experimental, and computational science.


Data Growth Outpaces Moore's Law

• IDC: data doubles in less than 18 months
• Huge volume
• Rapid increase
• Various types and formats
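
The doubling claim above can be turned into a yearly growth rate with a little arithmetic. The following is a minimal sketch, assuming steady exponential growth and taking 18 months as the doubling period (the slide gives it as an upper bound); the function names are introduced here for illustration only.

```python
def annual_growth_factor(doubling_months: float) -> float:
    """Yearly multiplication factor for a quantity that doubles every `doubling_months` months."""
    if doubling_months <= 0:
        raise ValueError("doubling period must be positive")
    return 2 ** (12 / doubling_months)


def volume_after(initial: float, years: float, doubling_months: float) -> float:
    """Project a volume forward, assuming steady exponential growth (an idealization)."""
    return initial * annual_growth_factor(doubling_months) ** years


if __name__ == "__main__":
    # With an 18-month doubling period the volume grows by about 59% per year.
    print(f"annual growth factor: {annual_growth_factor(18):.3f}")
    # Three years is two doubling periods, so 1 unit becomes about 4 units.
    print(f"volume after 3 years: {volume_after(1.0, 3, 18):.1f}")
```

A shorter doubling period gives a larger factor, so the 18-month figure is a conservative estimate of the growth the slide describes.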


Data Challenge

• Scientists are being overwhelmed by exploding volumes of scientific data.
• Much scientific research needs data that are distributed across different locations.
• There is a growing gap between the capability of modern scientific instruments and that of scientists.
• It has become a great challenge to view, manipulate, store, move, share, and interpret the massive data.


Scientific Data Deluge in CAS

• Large scientific facilities produce huge volumes of data
  – 20+ in operation
  – 20+ under construction
• Long-term field observation stations
  – 100+ stations, including ecology, environment, space, etc.
• Long-term research data need to be archived and shared
  – 100+ institutes

[Figures: large scientific facilities; field observation stations]


Advanced CI for Data Lifecycle in CAS

[Diagram: data and information stream through the lifecycle stages Generation & Collection, Transmission, Storage & Curation, Computing & Analysis, and Application]

• Data sources: 1. field observation stations; 2. large scientific facilities; 3. others
• High Speed Network: CSTNET, CSTNET-CNGI, GLORIAD
• Data Centers: storage & preservation, curation, sharing and service
• Supercomputing Grid: computing, analysis, mining, visualization
• Data-intensive e-Science activities and applications


Cloud Computing

• It is a mixed evolution of grid computing, distributed computing, parallel computing, utility computing, network storage technologies, virtualization, etc.
• Its characteristics (large scale, virtualization, high reliability, generality, expandability, on-demand service, and very low cost) have made it a popular computing paradigm.
• It can bridge scientists and massive data.
• Chinese Academy of Sciences Scientific Data Cloud (CASSDC) uses cloud technology to give scientists convenient ways to make use of powerful information infrastructure, massive scientific data and rich scientific software.


Services of CASSDC

[Diagram labels: Integrated Service; Middleware (Job Scheduler, Data Publisher, MetaData Manager, Data Transport); Infrastructure; Scientific Data; Data Service; Infrastructure Service; Network]


Scientific Data Infrastructure

• Middleware (scientific data grid middleware, internet-based storage service middleware…)
• Scientific databases
• Massive storage system
• Data-intensive computing facility
• High speed network
• Application-enabled environments and typical e-Science practice
• Software and toolkits (scientific data collection, curation, and publishing, data analysis and visualization…)


Data Centers Distribution of CASSDC

• Scientific Data
  – ~1 PB
  – Above 60 institutions
  – Multiple disciplines
• Storage Capacity
  – ~22 PB (50 PB)
  – 1 major center
  – 1 archive center
  – 12 middle-size centers
• Computing Capacity
  – ~5000 (10000) CPU cores
  – Dedicated design for DIC (data-intensive computing)


System Architecture of Major Center


Enabling Technology: Infrastructure

Global File System of Cloud Storage


Enabling Technology: Infrastructure

On-the-fly provision of a computing cluster

[Diagram: a Cluster Manager with a DHCP Server and a TFTP Server provisions nodes (CPU, memory) drawn from a Computing Nodes Pool. Labels on the numbered steps (1) to (4) include WOL, IP, kernel and Image; each node then switches to its root file system. Images are kept in Storage.]
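
The provisioning diagram above lists WOL (Wake-on-LAN) among its steps. The sketch below shows only that one step, assuming nodes are woken with a standard magic packet (6 bytes of 0xFF followed by the target MAC address repeated 16 times). The slide does not say how the cluster manager implements this, so the function names, the broadcast address and the port are illustrative assumptions.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 bytes of 0xFF, then the MAC repeated 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex characters
    return b"\xff" * 6 + mac_bytes * 16


def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (port 9 is a common convention, assumed here)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


if __name__ == "__main__":
    # Hypothetical MAC address of an idle node in the pool.
    print(len(build_magic_packet("00:11:22:33:44:55")), "bytes")
```

Once a node is awake, the remaining labels in the diagram (IP, kernel, Image) correspond to the kind of network boot that DHCP and TFTP servers typically provide; those steps are not sketched here.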


Scientific Databases (SDB)

• A long-term mission, started in 1986 and funded by CAS
  – many institutes involved
  – long-term, large-scale collaboration
  – data from research, for research
• Collecting multi-discipline research data and promoting data sharing
  – More than 350 research databases and 500 datasets from 61 institutes
  – Over 200 TB of data available for open access and download

http://www.csdb.cn


Scientific Databases (cont.)

• Focusing on data integration and on upgrading research databases into resource databases and even reference databases

[Diagram: research databases; resource database; reference database; application-oriented database]


Scientific Databases (cont.)

• 8 resource databases
  – Geo-Science
  – Biodiversity
  – Chemistry
  – Astronomy
  – Space Science
  – Microbiology and virus
  – Material science
  – Environment
• 2 reference databases
  – China Species
  – Compound
• 4 application-oriented databases
  – High Energy (ITER)
  – Western Environment Research
  – Ecology Research
  – Qinghai Lake Research


Scientific Databases (cont.)

• 37 research databases
  – Physics & Chemistry, Geosciences, Biosciences, Atmospheric & Ocean Science, Energy Science, Material Science, Astronomy & Space Science

[Pie chart, share by discipline: GeoScience 43%, BioScience 18%, Chemistry 9%, ICT 6%, Physics 6%, Ocean 5%, Material 5%, Space 4%, Energy 3%, Astronomy 1%]


CAS Scientific Data Grid

• SDG is
  – built upon the Scientific Databases, supporting uniform and convenient discovery of and access to large-scale, distributed and heterogeneous scientific data in a secure and proper way
• Building scientific data application grids according to domain requirements
  – Integrating distributed data, analysis tools, and storage and computing facilities, and providing a uniform data service interface
  – 4 pilot grids
    • Bioscience grid
    • Geoscience grid
    • Chemistry grid
    • Astronomy and space science grid
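
The idea of a uniform data service interface over distributed, heterogeneous sources can be illustrated with a small adapter pattern. This is a sketch only: the class names `DataSource`, `InMemorySource` and `UniformDataService` are hypothetical and are not the SDG middleware API, and the in-memory records stand in for real institute databases.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class DataSource(ABC):
    """Hypothetical adapter that hides one backend behind a common interface."""

    @abstractmethod
    def search(self, keyword: str) -> List[Dict[str, str]]:
        """Return metadata records whose title contains `keyword`."""


class InMemorySource(DataSource):
    """Toy backend standing in for a database hosted at one institute."""

    def __init__(self, name: str, records: Iterable[Dict[str, str]]):
        self.name = name
        self._records = list(records)

    def search(self, keyword: str) -> List[Dict[str, str]]:
        key = keyword.lower()
        # Tag each hit with its origin so callers can tell sources apart.
        return [dict(r, source=self.name) for r in self._records if key in r["title"].lower()]


class UniformDataService:
    """Fan one query out to every registered source and merge the results."""

    def __init__(self) -> None:
        self._sources: List[DataSource] = []

    def register(self, source: DataSource) -> None:
        self._sources.append(source)

    def search(self, keyword: str) -> List[Dict[str, str]]:
        hits: List[Dict[str, str]] = []
        for source in self._sources:
            hits.extend(source.search(keyword))
        return hits


if __name__ == "__main__":
    service = UniformDataService()
    service.register(InMemorySource("site-a", [{"title": "Sample soil records"}]))
    service.register(InMemorySource("site-b", [{"title": "Soil survey"}, {"title": "Other"}]))
    for hit in service.search("soil"):
        print(hit["source"], "->", hit["title"])
```

The caller sees one `search` call regardless of how many sources are registered, which is the property the slide attributes to the grid; security and transport are left out of the sketch.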


Scientific Data Grid-Architecture

Organization Architecture of SDG


SDG Platform & Middleware

• Platform
  – SDGIM: Information Management
  – SDGOM: Operation Management
  – SDGSA: Storage Service
  – SDGMS: Monitoring & Statistics
• Middleware
  – SDGDD: Data Publishing
  – SDGDT: Data Transfer Toolkit
  – SDGDC: Data Compression Toolkit
  – SDGMM: MetaData Management
  – SDGJS: Job Scheduler


Tools for data management and service


An Integrated Case on Geography Supported by CASSDC

• Data and computing resources are both distributed
• The model comes from a CAS scientist
• Adopted middleware:
  – Data search
  – Data transport
  – On-the-fly computing provision
  – Job scheduler
• It handles massive-data computation that some commercial geometric software cannot
• Project: High Precision Display of Earth Surface
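
The four middleware components adopted in this case can be read as stages of one workflow. The sketch below chains them in the order listed, purely as an illustration: every function body is a placeholder, and the dataset names, paths and the `run_pipeline` helper are hypothetical and do not come from CASSDC.

```python
from typing import Callable, Dict, List

Context = Dict[str, object]
Step = Callable[[Context], Context]


def data_search(ctx: Context) -> Context:
    # Placeholder: pretend the catalogue returned two dataset identifiers.
    return {**ctx, "datasets": ["tile-001", "tile-002"]}


def data_transport(ctx: Context) -> Context:
    # Placeholder: record where each dataset would be staged.
    staged = [f"/scratch/{name}" for name in ctx["datasets"]]
    return {**ctx, "staged": staged}


def provision_cluster(ctx: Context) -> Context:
    # Placeholder: request one node per staged dataset.
    return {**ctx, "nodes": len(ctx["staged"])}


def schedule_jobs(ctx: Context) -> Context:
    # Placeholder: assign staged paths to node indices round-robin.
    jobs = [(i % ctx["nodes"], path) for i, path in enumerate(ctx["staged"])]
    return {**ctx, "jobs": jobs}


def run_pipeline(steps: List[Step], ctx: Context) -> Context:
    """Apply each step in order, passing the growing context along."""
    for step in steps:
        ctx = step(ctx)
    return ctx


if __name__ == "__main__":
    steps = [data_search, data_transport, provision_cluster, schedule_jobs]
    result = run_pipeline(steps, {"query": "earth surface tiles"})
    print(result["jobs"])
```

Keeping each stage as a function over a shared context makes it easy to swap a stage, for example dropping the provisioning step when a cluster already exists.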


An Integrated Case on Biology Supported by CASSDC

• Data:
  – Microbiology Institute
  – World Data Center for Microorganisms
  – Wuhan Virus Institute
• Computing:
  – CNIC
  – Microbiology Institute
• Adopted middleware:
  – Data search
  – Data transport
  – Job scheduler
  – User authentication
• Gene Alignment Project


An Integrated Case on Biology Supported by CASSDC (cont.)


Cooperation

• International Organization Membership


Cooperation with Europe

CSTNET provides network support for data transmission between Europe and China.

• ITER
• Global Earth Observation System of Systems
• CERN LHC: ATLAS & CMS
• ARGO-Yangbajing


Challenges

• On-demand linking of multi-disciplinary data based on semantics
• Big Data processing
  – Highly scalable, low cost, high throughput
  – On-demand, flexible data processing
• Integrating data, storage, computing, analysis models, etc. into a whole system driven by one specific scientific problem
  – Making infrastructure invisible to scientists


Conclusion

• Scientific discovery has become increasingly data intensive, and it calls for reliable and easily accessible scientific data infrastructure
• CAS continues to promote the building of scientific data infrastructure and data-intensive e-Science practices
• Seeking potential cooperation in data-intensive e-Science and data cloud


Thank you!
