News networks in XVII century Italy

Giovanni Colavizza (EPFL), Mario Infelise (Ca’ Foscari)

Mapping the News Networks in XVII Italy

Subject: the European news flow

Hypothesis: a single system of news exchange across Europe.

Rise in demand during the Thirty Years' War; regular postal service.

Key traits of this information system:
• multi-media (handwritten: long and short range, more flexible, on demand; printed: short range and broader public)
• adaptive “hub and spoke” network
• multi-language

Our questions and general aims

How to:
1. prove the existence and extent of the flow
2. reconstruct its fine-grained dynamic cartography
3. study the problem of information supply and exchange: media interactions

Basic approach: detect text reuse. We start by developing robust methods to this end.

How gazettes look

Sources (year 1648)

[Figure: map(s) of the places of origin of news in the 1648 sources. Place labels: Asti, Cartagena, Francia, Catalogna, Provenza, Livorno, Alicante, Casale, Parma, Bruxelles, Avignone, Colonia, Palermo, Riviera di Ponente, Madrid, Marsiglia, Inghilterra, Lione, Torino, Napoli, Lisbona, Roma, Londra, Germania, Milano, Genova, Barcellona, Parigi, Venezia, Bologna, Svezia, Augusta, Palatinato, Costantinopoli, Monaco, Erfurt, Norimberga, Franconia, Cassel, Vienna, Svevia, Munster, Ratisbona, Amburgo, Francoforte, Praga]

Printed gazettes: Turin and Genoa

Handwritten: from the Vatican Archives, Segreteria di Stato, Avvisi.

Methods: data preparation - printed

Results: editorial policies (printed gazettes)

Most frequent order of the news items in each printed issue:

• Genoa: Genoa, Rome/Naples/Marseille, Milan, Lisbon, Barcelona, Paris, London, Germany and Venice.
• Turin: (i1) Turin, Barcelona, Paris, London, Germany; (i2) Milan, Genoa, Naples, Rome and Venice.

Statistic                          Genoa    Turin
Total character count              281206   579381
Total number of paragraphs         263      1221
Average characters per paragraph   1069     474
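The three statistics are simple to recompute once each gazette is available as a list of paragraph strings. The sketch below is only an illustration: the function name `gazette_stats` and the sample paragraphs are assumptions, not part of the original pipeline. It also shows that the reported averages match the integer part of total characters divided by number of paragraphs.

```python
def gazette_stats(paragraphs):
    """Return total characters, paragraph count and average characters per paragraph."""
    total_chars = sum(len(p) for p in paragraphs)
    count = len(paragraphs)
    average = total_chars / count if count else 0.0
    return {"total_characters": total_chars,
            "paragraphs": count,
            "average_characters": average}


# Hypothetical sample paragraphs, for illustration only.
sample = ["Di Genova si ha avviso.", "Di Milano scrivono."]
stats = gazette_stats(sample)

# Consistency check of the reported figures (integer part of the ratio).
genoa_average = 281206 // 263
turin_average = 579381 // 1221
```

Genoa thus prints fewer but much longer paragraphs than Turin, which is the contrast the statistics are meant to show.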

Results: editorial policies (printed gazettes)

[Chart: average text per item, by month, Turin vs Genoa. X axis: month (1 to 6); Y axis: character count (0 to 1200)]

[Chart: average text per issue, by month, Turin vs Genoa. X axis: month (1 to 6); Y axis: character count (0 to 16000)]

Methods: matching algorithms - printed

Strategy: compare paragraphs (units of formatting/reading but also meaning)

• Global match: SubString Kernels (similarity of sequences of non-contiguous characters)
• Local alignment: Smith-Waterman (finds local matching passages)
• Threshold filtering and manual evaluation of the 2 highest-scoring matches
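The two matching steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the global score uses the standard gap-weighted string subsequence kernel (subsequence length `n` and decay `lam` are assumed parameters), and the local step is a plain Smith-Waterman with an assumed linear scoring scheme (`match`, `mismatch`, `gap`). The example strings are invented.

```python
import math


def subsequence_kernel(s, t, n=2, lam=0.5):
    """Gap-weighted subsequence kernel: common subsequences of length n,
    each weighted by lam raised to the length it spans in s and in t."""
    ls, lt = len(s), len(t)
    # kp[i][a][b] is the auxiliary kernel K'_i on the prefixes s[:a], t[:b].
    kp = [[[0.0] * (lt + 1) for _ in range(ls + 1)] for _ in range(n)]
    for a in range(ls + 1):
        for b in range(lt + 1):
            kp[0][a][b] = 1.0
    for i in range(1, n):
        for a in range(1, ls + 1):
            kpp = 0.0  # running sum over the positions of t (K'' in the literature)
            for b in range(1, lt + 1):
                kpp *= lam
                if s[a - 1] == t[b - 1]:
                    kpp += lam * lam * kp[i - 1][a - 1][b - 1]
                kp[i][a][b] = lam * kp[i][a - 1][b] + kpp
    total = 0.0
    for a in range(1, ls + 1):
        for b in range(1, lt + 1):
            if s[a - 1] == t[b - 1]:
                total += lam * lam * kp[n - 1][a - 1][b - 1]
    return total


def normalised_kernel(s, t, n=2, lam=0.5):
    """Kernel rescaled to [0, 1] so that paragraphs of different length compare."""
    k_ss = subsequence_kernel(s, s, n, lam)
    k_tt = subsequence_kernel(t, t, n, lam)
    if k_ss == 0.0 or k_tt == 0.0:
        return 0.0
    return subsequence_kernel(s, t, n, lam) / math.sqrt(k_ss * k_tt)


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best score, passage of a, passage of b) for the best local alignment."""
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, end = 0, (0, 0)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, end = h[i][j], (i, j)
    # Trace back from the best cell until the score drops to zero.
    i, j = end
    while i > 0 and j > 0 and h[i][j] > 0:
        diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if h[i][j] == diag:
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, a[i:end[0]], b[j:end[1]]


# Invented example paragraphs, for illustration only.
p1 = "Di Colonia si scrive la pace di Munster"
p2 = "Avvisano che la pace di Munster sia conclusa"
global_score = normalised_kernel(p1, p2)
local_score, passage1, passage2 = smith_waterman(p1, p2)
```

A pair of paragraphs would then be kept for manual evaluation only if its scores exceed a chosen threshold. The kernel costs O(n·|s|·|t|) time per pair, so this pure-Python version suits short texts only.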

Results: the flow (printed gazettes)

[Figure: map of the news flow in the printed gazettes. Place labels: Turin, Paris, Barcelona, Lisbon, Milan, Venice, London, Naples, Rome, Genoa, Germany]

Results: comparisons (printed gazettes)

Categories:
1. verbatim copy of a whole paragraph or parts of it
2. paraphrasing or translations of the same source
3. same news from different sources
4. same topic but different news

Results: categories 1 and 3: <1%; category 2: circa 30%; category 4: circa 43%

Evaluation: precision assessed by hand; recall “intractable”

Methods: data preparation - handwritten

Plenipotentiario di Spagna (keyword)

Re di Spagna (name_of_person)

Conte d'Avò (name_of_person)

spagnoli (quantity)

Ambasciatore di Portogallo (keyword)

Perera (name_of_person)

Hassi (keyword)

Cassel (name_of_place)

Plenipotentiario di Franza (keyword)

Sua Maestà Cesarea (name_of_person)

Landgraviessa d'Assia (name_of_person)

Osnapruch (name_of_place)

trattato dell'Imperio (keyword)

Lantgravio di Darmstat (name_of_person)

Amnistia nello stati hereditarij (keyword)

anni (quantity)

Pinorada (name_of_person)

Svedesi (keyword)

Provincia d'Utrecht (name_of_place)

pace (keyword)

Spagna (name_of_place)

Olanda (name_of_place)

Zelanda (name_of_place)

Provinzie Basse (name_of_place)

Francia (name_of_place)

Methods: matching algorithms - handwritten

Strategy: compare paragraphs

• Typed canonicalisation: similar words are grouped into typed categories (Jaro-Winkler distance)
• Paragraph comparison: Tf-idf vectors, cosine distance
• Manual evaluation of the 2 highest-scoring matches
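The handwritten pipeline can be sketched in the same spirit. Everything below is an assumption made for illustration: the greedy grouping rule, the 0.85 threshold, the smoothed idf formula and the three small annotated paragraphs (with "Franza" used as a stand-alone place name) are not taken from the original work. Similarity is used throughout; the corresponding distance is 1 minus the similarity.

```python
import math
from collections import Counter


def jaro(s1, s2):
    """Jaro similarity between two strings."""
    if s1 == s2:
        return 1.0
    l1, l2 = len(s1), len(s2)
    if l1 == 0 or l2 == 0:
        return 0.0
    window = max(max(l1, l2) // 2 - 1, 0)
    m1, m2 = [False] * l1, [False] * l2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(l2, i + window + 1)):
            if not m2[j] and s2[j] == c:
                m1[i] = m2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    k = out_of_order = 0
    for i in range(l1):
        if m1[i]:
            while not m2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2  # number of transpositions
    return (matches / l1 + matches / l2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def canonicalise(entities, threshold=0.85):
    """Greedily map each (surface, type) pair to the first earlier form of the same type."""
    representatives, mapping = {}, {}
    for surface, etype in entities:
        key = surface.lower()
        for rep in representatives.setdefault(etype, []):
            if jaro_winkler(key, rep) >= threshold:
                mapping[(surface, etype)] = (rep, etype)
                break
        else:
            representatives[etype].append(key)
            mapping[(surface, etype)] = (key, etype)
    return mapping


def tfidf_vectors(docs):
    """Sparse tf-idf vectors with a smoothed idf (an assumed weighting)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{term: count * (math.log((1 + n) / (1 + df[term])) + 1)
             for term, count in Counter(doc).items()} for doc in docs]


def cosine_similarity(u, v):
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical annotated paragraphs: lists of (surface form, type).
paragraphs = [
    [("Franza", "name_of_place"), ("pace", "keyword"), ("Osnapruch", "name_of_place")],
    [("Francia", "name_of_place"), ("pace", "keyword"), ("Olanda", "name_of_place")],
    [("Spagna", "name_of_place"), ("Svedesi", "keyword")],
]
mapping = canonicalise([entity for p in paragraphs for entity in p])
docs = [[mapping[entity] for entity in p] for p in paragraphs]
vectors = tfidf_vectors(docs)
similarity_1_2 = cosine_similarity(vectors[0], vectors[1])
similarity_1_3 = cosine_similarity(vectors[0], vectors[2])
```

Keeping the type in the canonical form prevents a place name and a keyword with similar spellings from being merged, which is the point of typing the categories.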

The corpus is too limited and skewed for now.

Results: matchings (handwritten)

Munster 24 April 1648:

Cologne 19 April 1648:

High score, same topic, different news. Different news-sheets

Open questions

1. How to effectively evaluate results? The open question of scalable recall and precision

2. How to get a larger corpus (e.g. at least 2 years, to study seasonality)? Obstacles: 1) lack of data; 2) cost of data preparation

3. How to compare printed and handwritten news? Ongoing work

4. What to focus on? Variations are as interesting as verbatim copies for studying the interaction of different media and types of gazettes.

Thanks