DATA FOR SCIENCE: HOW ELSEVIER IS USING DATA SCIENCE TO EMPOWER RESEARCHERS

Paul Groth | @pgroth | pgroth.com

Disruptive Technology Director

Elsevier Labs | @elsevierlabs

European Data Forum 2016

Page 1: Data for Science: How Elsevier is using data science to empower researchers

Page 2: Data for Science: How Elsevier is using data science to empower researchers
Page 3: Data for Science: How Elsevier is using data science to empower researchers

12 million people per month

Page 4: Data for Science: How Elsevier is using data science to empower researchers
Page 5: Data for Science: How Elsevier is using data science to empower researchers

40 million reactions

75 million compounds

500 million facts

Page 6: Data for Science: How Elsevier is using data science to empower researchers

3 EXAMPLES

• Personalized: what should I read?

• Actionable: who should I collaborate with?

• Consumable: how do I make my data available?

Page 7: Data for Science: How Elsevier is using data science to empower researchers

RECOMMENDATIONS AT MENDELEY

• Maya Hristakeva

• Data Scientist at Mendeley

• @mayahhf

• Spark Summit 2015

• http://www.slideshare.net/SparkSummit/sparking-science-up-with-research-recommendations-by-maya-hristakeva

Page 8: Data for Science: How Elsevier is using data science to empower researchers

MENDELEY BUILDS TOOLS TO HELP RESEARCHERS …

• Read & Organize

• Search & Discover

• Collaborate & Network

• Experiment & Synthesize

Page 9: Data for Science: How Elsevier is using data science to empower researchers

BEING THE BEST RESEARCHER YOU CAN BE!

• Good researchers are on top of their game

• Large amount of research produced

• Takes time to get what you need

• Help researchers by recommending relevant research

Page 10: Data for Science: How Elsevier is using data science to empower researchers
Page 11: Data for Science: How Elsevier is using data science to empower researchers

PERSONALIZED ARTICLE RECOMMENDATION

Input: User libraries

Output: Suggested articles to read

Algorithms:

• Collaborative Filtering

– Item-based

– User-based

– Matrix Factorization

• Content-based
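To make the item-based variant listed above concrete, here is a minimal sketch in plain Python. It is not Mendeley's implementation (the slides point to Mahout and Spark for that). The toy libraries, the function names item_similarities and recommend, and the choice of cosine similarity over binary "article is in library" data are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarities(libraries):
    """Cosine similarity between articles, from binary user libraries.

    `libraries` maps a user id to the list of article ids in that
    user's library (hypothetical toy format, not Mendeley's schema).
    """
    readers = defaultdict(int)   # article -> number of libraries holding it
    together = defaultdict(int)  # (article_a, article_b) -> co-occurrence count
    for articles in libraries.values():
        unique = sorted(set(articles))
        for a in unique:
            readers[a] += 1
        for a, b in combinations(unique, 2):
            together[(a, b)] += 1

    sims = defaultdict(dict)
    for (a, b), n in together.items():
        score = n / sqrt(readers[a] * readers[b])
        sims[a][b] = score
        sims[b][a] = score
    return sims


def recommend(user, libraries, sims, top_n=3):
    """Rank articles the user does not have by summed similarity
    to the articles already in the user's library."""
    owned = set(libraries[user])
    scores = defaultdict(float)
    for article in owned:
        for other, score in sims.get(article, {}).items():
            if other not in owned:
                scores[other] += score
    # Highest score first; ties broken by article id for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [article for article, _ in ranked[:top_n]]


# Toy input: four users and five articles (made-up data).
libraries = {
    "alice": ["p1", "p2", "p3"],
    "bob": ["p1", "p2", "p4"],
    "carol": ["p2", "p3", "p4"],
    "dave": ["p1", "p5"],
}
sims = item_similarities(libraries)
suggestions = recommend("dave", libraries, sims)
print(suggestions)  # ['p2', 'p3', 'p4']
```

The user-based variant swaps the roles: it compares libraries to each other and suggests what similar users have saved. At the scale of real user libraries the pairwise counting would need a distributed implementation, which this sketch does not attempt.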

Page 12: Data for Science: How Elsevier is using data science to empower researchers

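[Figure: cost versus performance of recommender implementations. Quadrants: Costly & Good, Costly & Bad, Cheap & Good, Cheap & Bad. Plotted: Tuned IB Mahout, Tuned UB Mahout, Tuned UB Spark, Tuned IB Spark, UB DimSum (Spark MLlib), ALS Matrix Fact. (Spark MLlib). Axis label: Performance. Annotations: +100%, +150%, ~$50.]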

Page 13: Data for Science: How Elsevier is using data science to empower researchers
Page 14: Data for Science: How Elsevier is using data science to empower researchers

CALCULATING 75 TRILLION METRICS

• Benchmark 4600 institutions & 220 countries updated weekly

• 40 terabytes of data

• HPCC massively parallel compute system – 40 node system

Page 15: Data for Science: How Elsevier is using data science to empower researchers
Page 16: Data for Science: How Elsevier is using data science to empower researchers

ALL DATA ISN’T CURATED

Page 17: Data for Science: How Elsevier is using data science to empower researchers

60% OF TIME IS SPENT ON DATA PREPARATION

Page 18: Data for Science: How Elsevier is using data science to empower researchers

10 ASPECTS OF HIGHLY EFFECTIVE RESEARCH DATA

https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data

Page 19: Data for Science: How Elsevier is using data science to empower researchers

http://data.mendeley.com/

Each dataset receives a versioned DOI, so it can be cited

The citation for the associated article is displayed

Page 20: Data for Science: How Elsevier is using data science to empower researchers
Page 21: Data for Science: How Elsevier is using data science to empower researchers

ACADEMIC COLLABORATIONS

Page 22: Data for Science: How Elsevier is using data science to empower researchers

CONCLUSION

• Researchers are faced with an ever-growing amount of data and content

• Data Science is key to making systems that help them

• I’ve shown three Elsevier examples. Many more!

• Antonio Gulli’s codingplayground.blogspot.nl

• labs.elsevier.com

• Of course, we’re hiring

Contact: Paul Groth @pgroth