Introduction to the Venice Time Machine

A brief introduction to the Venice Time Machine

Giovanni Colavizza EPFL

Who am I

Giovanni Colavizza, PhD student in Management of Technology, Chair of Digital Humanities, EPFL

Previously: Computer Science, History, Archival and Library Sciences, two start-ups, and some positions in IT and research.

Today

Venice Time Machine
1- Vision (where to go)
2- Pipeline (how) and Projects (what)
3- Methods and DH in context (or why, and how again)

VTM Vision

Preservation (from analog to digital)
Access (from browsing to searching)
Valorisation by use

Preservation

Digitisation and replication as a preservation strategy. Quite complicated:
1- metadata (digital provenance)
2- replication protocols: IT infrastructure (centralised vs distributed)
3- rights and partners’ needs (the distant goal of open access for public heritage)

Access

An Information System down to contents:

Valorisation

1- research
2- teaching
3- digital reconstruction and outreach
4- technology transfer
5- methodology transfer

Pipeline illustrated by projects

1. Digitisation (Tomography)
2. Image processing (Pre-processing Suite)
3. Content extraction (Automatic transcription, READ Project)
4. Information modelling (Garzoni Project)
5. Building an information system (Document Viewer)
6. Content enrichment and network effects (Linked Books Project)
7. Valorisation and use (GIS, digital experiences, …)

Tomography

Fauzia Albertin EPFL

Image pre-processing suite

Andrea Mazzei ODOMA

Video pt. 1

Semi-automatic transcription, or the Big Data quest for script family resemblances

READ Horizon 2020 project: 8.2 million €, 7 partners, maximum score from the peer reviewers.

Opt. 1: Alignment

ello stara en carcere domentre chel fara queste chose opagera. Et etiam deo stagando collui encarcere se sauera la che sia dellauer de collui lodoxe comandera chello sia entromesso edara sse allo so credetor. Et etiam deo selo creditor uora enuestir lapprietade del debitor enquella fia da alcreditor sera data en uestixon. Mosella femena che none maritata sera 9depnata segon do che desoura edito tuto se fara segondo che nui auemo soura dito delomo remetuda questa cho sa chello stara enlo teratorio de san ҫacharia e

Fouad Slimane EPFL

Opt. 2: Word spotting and Neural Networks

Andrea Mazzei ODOMA

Video pt. 2

Information modelling

Garzoni Project: Lille University and EPFL

ANR and FNS funded

Valentina Sapienza (Lille), Maud Ehrmann (EPFL)

Information system

Fabio Bortoluzzi EPFL

Not all documents connect to one another in the same way.

Fiscal declarations (for taxation)

Personal acts (contracts, testaments, etc.)

State machinery (office holding)

How did Venetians index this information?

Real estate surveys

Fiscal declarations

Testaments

Entities

Indexes

Documents
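
One possible reading of the three layers above (entities, indexes, documents) is sketched below in Python. This is only an illustration under stated assumptions: the class names, fields and sample records are hypothetical and are not taken from the actual VTM information system. The idea shown is that an index links each entity to the documents mentioning it, so that two documents become connected whenever they share an entity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    # Hypothetical fields: an identifier and a document type.
    doc_id: str
    kind: str


@dataclass(frozen=True)
class Entity:
    # Hypothetical fields: an identifier and a human-readable label.
    entity_id: str
    label: str


@dataclass
class Index:
    """Maps each entity to the set of documents that mention it."""
    entries: dict = field(default_factory=dict)

    def add(self, entity, document):
        self.entries.setdefault(entity, set()).add(document)

    def documents_for(self, entity):
        return self.entries.get(entity, set())

    def linked_documents(self, document):
        """Documents sharing at least one entity with the given document."""
        linked = set()
        for docs in self.entries.values():
            if document in docs:
                linked |= docs
        linked.discard(document)
        return linked


# Invented sample records, for illustration only.
person = Entity("person_1", "hypothetical person")
declaration = Document("doc_1", "fiscal declaration")
testament = Document("doc_2", "testament")

index = Index()
index.add(person, declaration)
index.add(person, testament)

linked = index.linked_documents(declaration)
```

Here the fiscal declaration and the testament are linked only through the index entry for the shared person, which is one way to express that different document types connect to each other differently.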

Information system

Orlin Topalov EPFL

Content enrichment

Linked Books Project: EPFL, Ca’ Foscari, Marciana

FNS funded

Approximately half of the citations in the humanities are to primary sources [Wiberley (2009)].

Their use has hardly ever been studied with citation analytic methods.

Network effects: directly link scholarship with primary sources.

Content enrichment

• Primary and secondary sources
• Citation history (e.g. Google Scholar)
• Citation semantics
• Algorithmic History of the History of Venice

Content enrichment

Network-based models. With both primary and secondary sources in play, how many graphs can we build?

Bibliographic coupling and co-citation
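
The two measures named above can be illustrated with a minimal Python sketch. It assumes a toy citation table with invented publication and source identifiers (none of this is Linked Books data). Bibliographic coupling weights a pair of citing publications by the number of references they share; co-citation weights a pair of cited sources by the number of publications citing both.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each citing publication maps to the sources it cites.
citations = {
    "pub_A": {"src_1", "src_2", "src_3"},
    "pub_B": {"src_2", "src_3"},
    "pub_C": {"src_3", "src_4"},
}


def bibliographic_coupling(citations):
    """Edge weight between two citing publications = number of shared references."""
    weights = {}
    for a, b in combinations(sorted(citations), 2):
        shared = len(citations[a] & citations[b])
        if shared:
            weights[(a, b)] = shared
    return weights


def co_citation(citations):
    """Edge weight between two cited sources = number of publications citing both."""
    weights = defaultdict(int)
    for refs in citations.values():
        for s, t in combinations(sorted(refs), 2):
            weights[(s, t)] += 1
    return dict(weights)


coupling = bibliographic_coupling(citations)
cocited = co_citation(citations)
```

The same citation table therefore yields two different graphs: one over the citing publications and one over the cited sources, whether those sources are primary or secondary.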

Content enrichment: multiple perspectives

Valorisation: some examples

Immersive reality

GIS and 3D virtual reconstructions

Teaching and interdisciplinary collaborations

Replication and transfer

VTM in the context of DH

1- The Big vs Small Data debate, or a proposal for reframing

2- The quest for evidence of value, or overcoming the DH drudgery conundrum

3- Humanities in the digital era, or why we need historians more than ever ;)

The Big vs Small Data debate, or a proposal for reframing

Big Data (for Humanities):
1- a matter of dimensions (in TB or PB)
2- networked, relational vs well-bounded (Kaplan 2015)
3- Telescope vs Microscope

“Data” are not big or small per se, but are so according to the observer. Do I want to aggregate or disaggregate? Do I have “larger” or “smaller” questions?

Macro, Meso, Micro

The quest for evidence of value, or overcoming the DH drudgery conundrum

Tool-building is not an end in itself. Developing tools to answer old questions should lead to new questions and perspectives. The great quest in DH now is for new arguments.

Humanities in the digital era, or why we need historians more than ever ;)

“historians are fundamentally in the business of taking complex, incomplete sources that are full of biases and errors, and interpreting them critically to develop an argument that answers a research question. Digital sources do not change this.”

Ian Gregory

“Data of different kinds must be understood in their historical relationship.”

Historians as critical arbiters of information, trained to work with time (“comparative modelling of multiple variables over time”, in jargon).

Thank you

“Computers are incredibly fast, accurate and stupid; humans are incredibly slow, inaccurate and brilliant; together they are powerful beyond imagination.” Albert Einstein (or was it someone else?)