Starrydata2: A plot mining web system for literature data collection
Yukari Katsura1,2*, Masaya Kumagai3, Takushi Kodani1,2, Hideyasu Ouchi1,2, Sakiko Gunji2, Yuki Ando2, Yoji Imai3, Kaoru Kimura1 and Koji Tsuda1,2,3
1 Univ. of Tokyo, 2 MaDIS-MI2I, NIMS, 3 RIKEN AIP
MI Workshop, January 29, 2018, University of Tokyo, Hongo Campus

Key for the next MI: Experimental data from thousands of past papers
• Search engines for all researchers
• Data source for materials informatics

Recovery of experimental data from various plots
Tasks:
1. Plot digitization from images
2. Sample description from text (chemical composition etc.)
Time-consuming & boring for researchers → Developed a new web system
Sources of the example plots:
C. Kim et al., ACS Appl. Mater. Interfaces 4 (2012) 2949-2954.
V. Stavila et al., ACS Appl. Mater. Interfaces 5 (2013) 6678-6686.
L. Yang et al., ACS Appl. Mater. Interfaces 7 (2015) 23694-23699.
Y. Min et al., ACS Nano 9 (2015) 6843-6853.
D. Kim et al., Acta Materialia 59 (2011) 405-411.
D. Kim et al., Acta Materialia 59 (2011) 4957-4963.
L. Hu et al., Acta Materialia 60 (2012) 4431-4437.
K. Biswas et al., Adv. Energy Mater. 2 (2012) 634-638.
Y. Min et al., Adv. Mater. 25 (2013) 1425-1429.
L. Hu et al., Adv. Funct. Mater. 24 (2014) 5211-5218.
Q. Zhang et al., Adv. Funct. Mater. 25 (2015) 966-976.

Target area of our experimental database
[Diagram axes: reliability of single data vs. collection speed]
• Large-scale DB?: fully automatic digitization, text-mined description, non-selective. Use: analysis of rough trends.
• Intermediate DB (our target!): semi-automatic digitization, manual description, non-selective. Use: experimental materials informatics; search engine for past experimental samples.
• Classical DB: manual data extraction, manual description, selective. Use: directories for reliable references.

Starrydata2 web system: Online data-sharing platform for literature data
URL: http://starrydata2.org
Programming: Masaya Kumagai (RIKEN)
FREE to use
Database selection page:
• Prototype database: "TEdb Project", a database of experimental thermoelectric properties
• Other materials databases? Catalyst materials? Magnetic materials? Battery materials? Strongly correlated systems? Superconductors? Dielectric materials? …

Starrydata2 web system: Index page
Designed as a tool for researchers to organize information about published papers
• Input multiple DOIs (Digital Object Identifiers)
• Search for papers
• Add selected papers to mylists
• Link to plot-mined data
• Bibliographic information: automatically retrieved from the DOI
• List of all papers in the DB
• Link to the publisher's website for the full text
• Suggested filename for the full-text download
• Create your own mylists
• Mylists of papers
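Since the index page accepts several DOIs at once, the input has to be split into individual identifiers before any lookup. The following is only a rough sketch of how that could be done: the helper name `parse_dois`, the regular expression, and the example DOIs are all made up for illustration and are not taken from Starrydata2.

```python
import re

# Hypothetical helper, not part of Starrydata2. The pattern is a common
# heuristic for modern DOIs: "10.", a 4-9 digit registrant code, "/", suffix.
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s,;"<>]+')


def parse_dois(text):
    """Extract unique DOIs from free text, keeping first-seen order."""
    seen = set()
    dois = []
    for match in DOI_PATTERN.findall(text):
        # Drop trailing sentence punctuation; DOIs are case-insensitive,
        # so lower-casing lets us detect duplicates.
        doi = match.rstrip(".)").lower()
        if doi not in seen:
            seen.add(doi)
            dois.append(doi)
    return dois


# Made-up placeholder DOIs pasted in mixed formats.
pasted = """
https://doi.org/10.1234/example.001
doi:10.1234/EXAMPLE.001, 10.5678/sample-b.
"""
print(parse_dois(pasted))
```

The duplicate written in upper case collapses into the first entry, so a list pasted from several sources yields each paper once.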

Starrydata2: Data-browsing page
Browse / Add / Edit data
① Click "Data"
② Figure selector: list of figures with datasets
③ Sample selector: list of samples reported in the paper
④ Digital data in SI units (Convert Unit)
⑤ Data in the selected figure
• Add new data
• Bibliographic information of the original paper
• Automatic unit conversion, e.g. 10 mW/(cm*K) → 1 W/(m*K)
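The unit conversion example works out as 10 mW/(cm*K) = 10 × 10⁻³ W / (10⁻² m·K) = 1 W/(m*K). A minimal sketch of table-driven conversion to SI follows; the table contents and the function name `to_si` are illustrative assumptions, not the unit list or code used by Starrydata2.

```python
# Hypothetical conversion table: multiplicative factors to W/(m*K).
# Not Starrydata2's actual unit list.
TO_SI_THERMAL_CONDUCTIVITY = {
    "W/(m*K)": 1.0,
    "mW/(cm*K)": 0.1,    # 1e-3 W on top, 1e-2 m below
    "W/(cm*K)": 100.0,   # 1 W on top, 1e-2 m below
    "mW/(m*K)": 0.001,
}


def to_si(value, unit, table=TO_SI_THERMAL_CONDUCTIVITY):
    """Convert a thermal conductivity value to W/(m*K)."""
    try:
        return value * table[unit]
    except KeyError:
        raise ValueError(f"unknown unit: {unit!r}") from None


print(to_si(10, "mW/(cm*K)"))  # the slide's example, about 1 W/(m*K)
```

Storing every value in SI is what makes curves from different papers directly comparable.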

Plot-mining procedure: (1) Get plot images
(Example in Google Chrome)
① Access the publisher's web page
② Access the full-text PDF
③ Capture plot images with PrintScreen / FireShot / a PDF reader, etc.
④ Copy the plot image
⑤ Go to the bottom of the Starrydata2 page
⑦ Paste the plot image
• WebPlotDigitizer (by Ankit Rohatgi) is embedded
• Screenshot callouts: align-axes dialogue, zoom button

(2) Semi-automatic plot tracing by WebPlotDigitizer
① Choose the plot type
② Align X-Y axes
③ Click 4 points on the axes
④ Input the readings
⑤ Select tracing regions (Box, Pen, and Erase tools are available)
⑥ Select the color and run (select an algorithm and tune its parameters); the automatically detected points are then shown
⑦ View the data and copy the digitized data
⑧ Paste the digitized data into Starrydata2 → Save
• Optional: tracing by manual clicking is also possible
• WebPlotDigitizer: Ankit Rohatgi
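The four clicked points and their readings define how pixel positions map to data values. As a simplified sketch of that idea, assuming the plot axes are parallel to the image edges, each axis can be calibrated from two points by linear interpolation (or interpolation in log10 for a logarithmic axis). This is not WebPlotDigitizer's code; `make_axis_map` and all numbers are made up for illustration.

```python
import math


def make_axis_map(pixel_a, value_a, pixel_b, value_b, log=False):
    """Return a function mapping a pixel coordinate to a data value,
    given two calibration points on one axis."""
    if log:
        value_a, value_b = math.log10(value_a), math.log10(value_b)
    scale = (value_b - value_a) / (pixel_b - pixel_a)

    def to_value(pixel):
        v = value_a + (pixel - pixel_a) * scale
        return 10 ** v if log else v

    return to_value


# Made-up calibration: the x axis reads 300 K at pixel 100 and 900 K at
# pixel 700; the y axis reads 0 at pixel 500 and 2 at pixel 100
# (image y coordinates grow downward).
x_of = make_axis_map(100, 300.0, 700, 900.0)
y_of = make_axis_map(500, 0.0, 100, 2.0)

detected_pixels = [(100, 500), (400, 300), (700, 100)]
points = [(x_of(px), y_of(py)) for px, py in detected_pixels]
print(points)  # roughly (300, 0), (600, 1), (900, 2)
```

Because the y calibration uses the pixel values as given, the downward-growing image coordinate is handled by the negative slope without any special case.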

Recorded samples of PbTe-type thermoelectric materials
[Figure: thermal conductivity and power factor of each sample; circle size ∝ ZT]
208 samples with data at 300 K
Data collection: Takushi Kodani (Univ. of Tokyo)
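The three quantities in the chart are linked by the standard definition of the dimensionless figure of merit, ZT = S²σT/κ = PF·T/κ, where PF = S²σ is the power factor and κ the thermal conductivity. A small sketch of that relation follows; the function name and the input values are illustrative and not taken from the database.

```python
def figure_of_merit(power_factor, thermal_conductivity, temperature=300.0):
    """ZT = PF * T / kappa.

    power_factor in W/(m*K^2), thermal_conductivity in W/(m*K),
    temperature in K.
    """
    if thermal_conductivity <= 0:
        raise ValueError("thermal conductivity must be positive")
    return power_factor * temperature / thermal_conductivity


# Made-up illustrative values, not a record from Starrydata2.
zt = figure_of_merit(power_factor=2.0e-3, thermal_conductivity=2.0)
print(round(zt, 3))
```

At a fixed temperature, a sample gets a larger circle by combining a high power factor with a low thermal conductivity.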

Comparison with an existing DB: UCSB-MRL datamining chart
PbTe-type: 7 datasets
A popular training dataset for machine learning of experimental TE properties
M. W. Gaultois et al., Chemistry of Materials 25 (2013) 2911. http://www.mrl.ucsb.edu:8080/datamine/thermoelectric.jsp

Comparison of our dataset with the UCSB-MRL datamining chart
Our PbTe-type dataset: more diverse!
Larger datasets will benefit current MI systems by:
• Reducing overfitting (overlearning)
• Improving prediction accuracy

Starrydata2 web system: Data export functions
• A JSON file containing all datasets in the mylist
• DOI list to share the mylist
• Reference list to write papers
Excerpt of the exported data:
  …
  {'figureid': 14, 'paperid': 2, 'propertyid_x': 0, 'propertyid_y': 1, 'sampleid': 17, 'x': 1003.2921810699587, 'y': -7.749077490774911e-05},
  {'figureid': 14, 'paperid': 2, 'propertyid_x': 0, 'propertyid_y': 1, 'sampleid': 18, 'x': 300.4115226337448, 'y': -0.00011881918819188195},
  ...],
  'sample': [
  {'composition': 'Ba24Ga0Ag0Ge100', 'paperid': 0, 'sampleid': 0, 'samplename': 'x=0,y=0'},
  {'composition': 'Ba24Ga4Ag0Ge96', 'paperid': 0, 'sampleid': 1, 'samplename': 'x=4,y=0'},
  …
clathrate.json: 149 papers / 623 figures / 890 samples / 73281 data points
File size: 10 MB / Download time: ~3 min

Reading Starrydata2 data in Python
Contents of clathrate.json (10 MB):
• paper
• figure
• sample
• property
• rawdata (all numeric)
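As a hedged sketch of reading such an export, the code below groups the numeric points into one curve per figure and sample and looks up each sample's composition. It uses a tiny stand-in built from the records excerpted on the export slide rather than the real file. The `'sample'` key and the field names come from that excerpt; the top-level key `'rawdata'` and the assumption that `sampleid` alone identifies a sample are guesses.

```python
import json
from collections import defaultdict

# Stand-in for clathrate.json, using only records shown on the export slide.
# The top-level key 'rawdata' is an assumption based on the slide labels.
text = json.dumps({
    "rawdata": [
        {"figureid": 14, "paperid": 2, "propertyid_x": 0, "propertyid_y": 1,
         "sampleid": 17, "x": 1003.2921810699587, "y": -7.749077490774911e-05},
        {"figureid": 14, "paperid": 2, "propertyid_x": 0, "propertyid_y": 1,
         "sampleid": 18, "x": 300.4115226337448, "y": -0.00011881918819188195},
    ],
    "sample": [
        {"composition": "Ba24Ga0Ag0Ge100", "paperid": 0, "sampleid": 0,
         "samplename": "x=0,y=0"},
        {"composition": "Ba24Ga4Ag0Ge96", "paperid": 0, "sampleid": 1,
         "samplename": "x=4,y=0"},
    ],
})

# For a downloaded file, json.load() on the opened file would replace this.
data = json.loads(text)

# Lookup table: sampleid -> chemical composition.
composition = {s["sampleid"]: s["composition"] for s in data["sample"]}

# Collect (x, y) points into one curve per (figure, sample) pair.
curves = defaultdict(list)
for point in data["rawdata"]:
    curves[(point["figureid"], point["sampleid"])].append((point["x"], point["y"]))

# Sort each curve by x so it can be plotted or interpolated directly.
for key in curves:
    curves[key].sort()

for (figureid, sampleid), pts in sorted(curves.items()):
    # .get() returns None when the sample record is not in this small excerpt.
    print(figureid, sampleid, composition.get(sampleid), len(pts))
```

In this excerpt the data points belong to samples 17 and 18 while only samples 0 and 1 are listed, so the composition lookup returns None; with a complete export each curve should resolve to a composition.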

Possible collaborations
• Donations from industry: datasets of interest can be requested
• To use the electronic journal systems, the project has to be non-commercial
• New data is added to Starrydata2 (public to everyone)
[Diagram: flows of papers, data, payment, and donations among the parties below, shown for the current project (TEdb) and for larger database projects]
Parties: Univ. of Tokyo project manager (our team + α), student workers, research assistants, RIKEN, NIMS, collaborators from industry
Funding: donations; research budgets (MEXT, JST, etc.); ~¥1,000/paper

Summary
Starrydata2 web system (FREE)
• For efficient collection and sharing of data from plot images in the literature
• Includes a data export system for materials informatics
Prototype database: TEdb Project (FREE)
• Sample-based database of experimental thermoelectric properties
• 200 to 500 samples per material family (largest experimental dataset)
URL: http://starrydata2.org
For first use: http://starrydata2.org/signup
Manual page: https://sites.google.com/site/yukarisearch/starrydata
E-mail: [email protected]