Data.bnf.fr as a sandbox for FRBRization: Automated work creation in data.bnf.fr

Data.bnf.fr as a sandbox for FRBRization - SWIBswib.org/swib18/slides/2_lapotre_data-bnf-fr.pdf · a homogenic corpus of documents →the XXth century authors. an exhaustive collection





Five entities...


The data


“Old works” at the BnF: a handcrafted artefact...

https://catalogue.bnf.fr/ark:/12148/cb14473195c

Validity control = persistence guarantee
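The slides do not say how the validity control is computed. As a minimal sketch, assuming the last character of the ARK name is a NOID-style check character (position-weighted sum over a 29-symbol alphabet, taken modulo 29), the check could look like this. The function names are hypothetical; the only data taken from the slide is the identifier `cb14473195c`.

```python
# Assumption: BnF ARK names end with a NOID-style check character.
# The 29-symbol "betanumeric" alphabet: digits plus consonants without l and y.
ALPHABET = "0123456789bcdfghjkmnpqrstvwxz"


def control_char(body: str) -> str:
    """Compute the check character for an ARK name given without its last character."""
    # Each character's rank in the alphabet is weighted by its 1-based position.
    total = sum(ALPHABET.index(ch) * pos for pos, ch in enumerate(body, start=1))
    return ALPHABET[total % len(ALPHABET)]


def is_valid_ark_name(name: str) -> bool:
    """Return True when the final character matches the computed check character."""
    if len(name) < 2 or any(ch not in ALPHABET for ch in name):
        return False
    return control_char(name[:-1]) == name[-1]


# The identifier shown on the slide passes the check.
print(is_valid_ark_name("cb14473195c"))
# Swapping two adjacent digits is detected.
print(is_valid_ark_name("cb14437195c"))
```

A check of this kind catches most typing and transposition errors before an identifier is published, which is one way a validity control can support persistence.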


Where to start?


We need...

● a homogeneous corpus of documents → the 20th-century authors.
● an exhaustive collection of records from the legal deposit.
● a highly configurable robot which likes every kind of metadata…

DATABOT!

… and to keep it simple: no “aggregates” records!


[Figure: diagram relating AUTHOR 1, AUTHOR 2 and AUTHOR 3 to Title 1, Title 2, Title 3, Title 4 and Subtitle 1]
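The grouping of records by author and title can be illustrated with a small sketch. This is not the Databot implementation, which the slides do not describe; it assumes each record carries an author identifier and a title, and it clusters on exact equality of a normalized title. All field and function names are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", no_accents.lower()).strip()


def cluster_records(records):
    """Group record ids under an (author id, normalized title) key."""
    clusters = defaultdict(list)
    for rec in records:
        key = (rec["author_id"], normalize_title(rec["title"]))
        clusters[key].append(rec["record_id"])
    return dict(clusters)


# Hypothetical records reusing the labels from the diagram.
records = [
    {"record_id": "r1", "author_id": "AUTHOR 1", "title": "Title 1"},
    {"record_id": "r2", "author_id": "AUTHOR 1", "title": "TITLE 1."},
    {"record_id": "r3", "author_id": "AUTHOR 1", "title": "Title 2"},
    {"record_id": "r4", "author_id": "AUTHOR 2", "title": "Title 1"},
]

for key, members in cluster_records(records).items():
    print(key, members)
```

Keying on the author first keeps identical titles by different authors apart, so `r4` does not join the cluster formed by `r1` and `r2`. A real pipeline would likely need fuzzier title matching and subtitle handling.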


Then, from title clusters, generate the two faces...


The interface...


...The data


...Calendar Information


● First semester of 2019:
  ○ Uploading computed works to the data.bnf.fr interface
  ○ Validation process

● Second semester of 2019:
  ○ Uploading computed and validated works to the catalog
  ○ Assignment of permanent URIs


Concomitantly...

Evaluating the quality of the Main Catalog metadata:

o date: content and coherence
o title: content and structure
o author: homonyms and function codes
o language

Curation of the metadata in order to improve clustering performance
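The four metadata dimensions listed above lend themselves to simple automated checks. The sketch below is illustrative only: the record layout, field names and rules are assumptions, not the BnF's actual evaluation criteria.

```python
import re
from collections import defaultdict

YEAR = re.compile(r"^\d{4}$")


def check_record(rec):
    """Return a list of quality problems found in one hypothetical record."""
    problems = []
    date = str(rec.get("date") or "")
    if not YEAR.match(date):
        problems.append("date: not a four-digit year")
    elif rec.get("author_birth") and int(date) < int(rec["author_birth"]):
        # Coherence rule: a publication cannot predate its author's birth.
        problems.append("date: earlier than author's birth")
    if not str(rec.get("title") or "").strip():
        problems.append("title: empty")
    if not rec.get("function_code"):
        problems.append("author: missing function code")
    if not rec.get("language"):
        problems.append("language: missing")
    return problems


def find_homonyms(authors):
    """Map each name shared by several author ids to the set of those ids."""
    by_name = defaultdict(set)
    for author in authors:
        by_name[author["name"].casefold()].add(author["id"])
    return {name: ids for name, ids in by_name.items() if len(ids) > 1}


print(check_record({"date": "19??", "title": " ", "language": "fre"}))
print(find_homonyms([
    {"name": "Jean Martin", "id": "a1"},
    {"name": "jean martin", "id": "a2"},
]))
```

Records that fail such checks are candidates for curation before clustering, since a wrong date or an unresolved homonym can merge or split works incorrectly.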


After the works are integrated into the Main Catalog...

• Side projects
  o Non-textual works
  o Foreign works
  o Pre-1900 works
  o Expressions

• “Benchmarking”
  o Linking to the works computed by ABES to check the validity of newly created works at the BnF
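The slides do not specify how the comparison with the ABES works would be carried out. One possible approach, sketched here under the assumption that both sides describe a work as a set of shared identifiers for its manifestations, is to score overlap with the Jaccard index. All names and identifiers below are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two collections: size of intersection over size of union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_match(work, reference_works):
    """Return (score, reference id) for the reference work that overlaps most."""
    scored = [(jaccard(work, members), ref_id)
              for ref_id, members in reference_works.items()]
    return max(scored) if scored else (0.0, None)


# Hypothetical data: one newly computed work and two reference works.
new_work = {"id-a", "id-b", "id-c"}
reference = {
    "ref-1": {"id-b", "id-c", "id-d"},
    "ref-2": {"id-x"},
}
print(best_match(new_work, reference))
```

A low best score would flag a newly created work for manual review, while a high score suggests both institutions grouped the same manifestations.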


Thank you for your attention!