Data Archiving and Networked Services. SHEBANQ. Dirk Roorda, researcher @ DANS, TLA. System for HEBrew Text: ANnotations for Queries and Markup. TEI pre-conference workshop: Query. Roma, 2013-10-01

Shebanq roma-2013-10-01


DESCRIPTION

The SHEBANQ project (half-way) as a use case in querying language resources. The corpus is the text of the Hebrew Bible with linguistic features, packaged in a special text database and converted to LAF.


Page 1: Shebanq roma-2013-10-01

Data Archiving and Networked Services

SHEBANQ

Dirk Roorda - researcher @ DANS, TLA

System for HEBrew Text: ANnotations for Queries and Markup

TEI pre-conference workshop: Query

Roma – 2013-10-01

Page 2: Shebanq roma-2013-10-01

Overview

1.  Context: text, data, research in Hebrew Bible

2.  MdF database model, MQL query language

3.  Sharing the research process

4.  CLARIN-NL project: SHEBANQ

5.  Towards new tools

Page 3: Shebanq roma-2013-10-01

1 (of 5) Context

Text, data and research in the Hebrew Bible

Page 4: Shebanq roma-2013-10-01

VU Amsterdam

Eep Talstra Centre for Bible and Computer

text + linguistic features => database

database + research questions => publications


Page 5: Shebanq roma-2013-10-01

2 (of 5) MdF and MQL

•  MdF database model

•  MQL query language

Page 6: Shebanq roma-2013-10-01

Monad Object Feature

1977-now: Eep Talstra et al. ECA, WIVU. Print reference (Google Books)

1988-1994 Crist-Jan Doedens: Text Databases – One Database Model and Several Retrieval Languages (google books reference)

2004: Ulrik Petersen. Emdros - a text database engine for analyzed or annotated text. COLING

Page 7: Shebanq roma-2013-10-01

Monad-Object-Feature

[Figure: the Monad-Object-Feature model, illustrated on Genesis 1:1]

standard edition text:
בראשית ברא אלהים את השמים ואת הארץ׃

monads (atomic chunks of text): numbered 1 .. 11

Object types, each covering ranges of monads:
word objects
subphrase objects
phrase_atom objects
phrase objects
clause_atom objects
sentence objects

Features of one word object:
lexeme_utf8 = ראשית
old_lexeme_utf8 = ראשית
vocalized_lexeme_utf8 = ראשית
surface_consonants_utf8 = ראשית
graphical_lexeme_utf8 = ראשי

Features of one clause_atom object:
clause_atom_number = 1
clause_atom_relation = 0
clause_atom_relation_daughter_tense = unknown
clause_atom_relation_kind = No_relation
clause_atom_relation_mother_tense = unknown
clause_atom_relation_preposition_class = none
clause_atom_type = xQtl
indentation = 0
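The model on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the Emdros implementation: the class name `TextObject`, the helper `covers`, and the monad range chosen for the word are assumptions made for the example; only the feature names and values come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class TextObject:
    """A typed object that covers some monads and carries features."""
    object_type: str   # e.g. "word", "phrase", "clause_atom"
    monads: range      # the monads (atomic text positions) it covers
    features: dict = field(default_factory=dict)


# Monads are consecutive integers; the example verse has monads 1..11.
MONADS = range(1, 12)

# Toy objects. The monad range of the word is illustrative (hypothetical).
word = TextObject("word", range(2, 3), {"lexeme_utf8": "ראשית"})
clause_atom = TextObject(
    "clause_atom",
    range(1, 12),
    {"clause_atom_number": 1, "clause_atom_type": "xQtl", "indentation": 0},
)


def covers(outer: TextObject, inner: TextObject) -> bool:
    """True if every monad of `inner` is also a monad of `outer`."""
    return set(inner.monads) <= set(outer.monads)
```

With these definitions, `covers(clause_atom, word)` holds while `covers(word, clause_atom)` does not, which is the embedding relation that queries rely on.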

Page 8: Shebanq roma-2013-10-01

MQL query language

Topographic, i.e. the structure of the query expression mirrors the structure of the query results with respect to:

•  sequence

•  embedding
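The two relations named above, sequence and embedding, can be illustrated with a toy matcher. This is a sketch under stated assumptions, not how MQL is evaluated: objects are plain `(first_monad, last_monad, features)` tuples, all data values are made up, and `find_matches` hard-codes one query shape (a clause containing a phrase with a verb, followed at any distance by an object phrase).

```python
def embedded_in(inner, outer):
    """Embedding: `inner` lies entirely within the monad span of `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def precedes(a, b):
    """Sequence: `a` ends before `b` starts (any gap allowed)."""
    return a[1] < b[0]


def find_matches(clauses, phrases, words):
    """Toy analogue of [clause [phrase [word verb]] .. [phrase Objc]]."""
    hits = []
    for c in clauses:
        inside = [p for p in phrases if embedded_in(p, c)]
        for p1 in inside:
            has_verb = any(
                embedded_in(w, p1) and w[2].get("part_of_speech") == "verb"
                for w in words
            )
            for p2 in inside:
                is_object = p2[2].get("phrase_function") == "Objc"
                if has_verb and is_object and precedes(p1, p2):
                    hits.append((c[:2], p1[:2], p2[:2]))
    return hits


# Hypothetical toy data: one clause, three phrases, four words.
clauses = [(1, 6, {})]
phrases = [
    (1, 2, {"phrase_function": "Pred"}),
    (3, 4, {"phrase_function": "Subj"}),
    (5, 6, {"phrase_function": "Objc"}),
]
words = [
    (1, 1, {"part_of_speech": "conjunction"}),
    (2, 2, {"part_of_speech": "verb"}),
    (3, 3, {"part_of_speech": "noun"}),
    (5, 5, {"part_of_speech": "noun"}),
]
```

The nesting of the loops follows the nesting of the brackets in the query, and the order test follows the left-to-right order of the blocks; that correspondence is what "topographic" refers to.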

Page 9: Shebanq roma-2013-10-01

Example

SELECT ALL OBJECTS
WHERE
[Clause
    [Phrase
        [Word FOCUS
            part_of_speech = verb AND
            lexeme = "FJM["
        ]
    ]
    ..
    [Phrase FOCUS
        phrase_function = Objc OR
        phrase_function = IrpO
    ]
    ..
    [Phrase FOCUS
        phrase_function = Objc OR
        phrase_function = IrpO
    ]
]

Page 10: Shebanq roma-2013-10-01

3 (of 5) Sharing

Problem: how to share (intermediate) results of analysis

Solution: saving queries as annotations

Page 11: Shebanq roma-2013-10-01

Lock-in

scholarly-bibles.com

Stuttgart Electronic Study Bible

⇒ massive dissemination

But

⇒ not the right dynamics for tool development

Page 12: Shebanq roma-2013-10-01

Leiden: international workshop biblical scholarship

Desiderata:

new tool development

text transmission (variants)

linguistic analysis (features)

even combined!

a short history: 2012

leiden lorentz

Page 13: Shebanq roma-2013-10-01

Hebrew Text in the Archive

urn:nbn:nl:ui:13-ikjj-ek

Page 14: Shebanq roma-2013-10-01

Hebrew Text in the Archive

urn:nbn:nl:ui:13-ikjj-ek

how can the people annotate our work?

Page 15: Shebanq roma-2013-10-01

Research Data Cycle

Page 16: Shebanq roma-2013-10-01

Research Data Cycle

[Diagram labels]
Text transmission, tradition, editorial processes
Free University, theology faculty, server department, WIVU project
NWO projects
religious communities
theol. scholars
enlightened lay people
scholarly-bibles.com

Page 17: Shebanq roma-2013-10-01

Research Data Cycle

[Diagram labels]
Text transmission, tradition, editorial processes
Free University, theology faculty, server department, WIVU project
NWO projects
religious communities
theol. scholars
CLARIN SHEBANQ
linguists
Wider public: Annotation, Query Saving, via Linked Data
dig. hum
comp. hum
enlightened lay people
scholarly-bibles.com
Research Data Archiving: DANS

Page 18: Shebanq roma-2013-10-01

3 (of 5) Sharing (cont'd)

Solution: Queries As Annotations

Page 19: Shebanq roma-2013-10-01

queries-as-annotations

model | query | example
body | query instruction | SELECT ALL OBJECTS WHERE [Word FOCUS part_of_speech = verb AND lexeme = "שים"]
targets | query results in context | ו ישכם יעקב ב בקר ו יקח את ה אבן אשר שם מראשתיו ו ישם אתה מצבה ו יצק שמן על ראשה
annotation | published query | qu123 (just an identifier)
metadata | researcher, date created, date last run, research question | Janet Dyk; 2004-02-16; 2012-01-27; Can the verb שים have a double object? (article in Foundations for Syriac Lexicography)
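As a rough illustration of the mapping above, a saved query could be held as a small record with a body, targets and metadata. This is only a sketch: the function `make_query_annotation`, the dictionary keys and the target placeholder are hypothetical and do not follow any particular annotation vocabulary; the values are taken from the example column.

```python
import json


def make_query_annotation(identifier, mql, targets, metadata):
    """Bundle a query (body), its results (targets) and its metadata."""
    return {
        "id": identifier,                    # published query identifier
        "body": {"query_instruction": mql},  # the query text itself
        "targets": list(targets),            # query results in context
        "metadata": dict(metadata),
    }


annotation = make_query_annotation(
    "qu123",
    'SELECT ALL OBJECTS WHERE [Word FOCUS part_of_speech = verb AND lexeme = "שים"]',
    targets=["<reference to a result passage>"],  # hypothetical placeholder
    metadata={
        "researcher": "Janet Dyk",
        "date_created": "2004-02-16",
        "date_last_run": "2012-01-27",
        "research_question": "Can the verb שים have a double object?",
    },
)

# ensure_ascii=False keeps the Hebrew readable in the serialized form.
serialized = json.dumps(annotation, ensure_ascii=False, indent=2)
```

Keeping the query in the body and the results in the targets means that re-running the query only changes the targets and the "date last run" field, while the identifier stays stable.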

Page 20: Shebanq roma-2013-10-01

OpenAnnotation

openannotation.org

Page 21: Shebanq roma-2013-10-01

provenance

Page 22: Shebanq roma-2013-10-01

motivation

Page 23: Shebanq roma-2013-10-01

demonstrator

datanetworkservice.nl/qaa

Page 24: Shebanq roma-2013-10-01

demonstrator

datanetworkservice.nl/qaa

Page 25: Shebanq roma-2013-10-01

demonstrator

datanetworkservice.nl/qaa

Page 26: Shebanq roma-2013-10-01

demonstrator

datanetworkservice.nl/qaa

Page 27: Shebanq roma-2013-10-01

demonstrator

Page 28: Shebanq roma-2013-10-01

demonstrator

Page 29: Shebanq roma-2013-10-01

demonstrator

Page 30: Shebanq roma-2013-10-01

demonstrator

still missing:

saving queries

not semantic-web-enabled

sustainability

Page 31: Shebanq roma-2013-10-01

4 (of 5) Project

CLARIN-NL: SHEBANQ:

(A) Curation

(B) Demonstrator

Page 32: Shebanq roma-2013-10-01

SHEBANQ

System for Hebrew Text: ANnotations for Queries

CLARIN-NL project

data curation: LAF

demonstrator: query saver

#!/etc bc

s/g$/q/

Page 33: Shebanq roma-2013-10-01

Linguistic Annotation Framework

ISO 24612:2012

Nancy Ide, Laurent Romary

Page 34: Shebanq roma-2013-10-01
Page 35: Shebanq roma-2013-10-01
Page 36: Shebanq roma-2013-10-01
Page 37: Shebanq roma-2013-10-01
Page 38: Shebanq roma-2013-10-01

feature definitions

Page 39: Shebanq roma-2013-10-01

feature definitions

Page 40: Shebanq roma-2013-10-01

TEI ISO-FS schema

Page 41: Shebanq roma-2013-10-01

dcr:datcat on <fDecl> versus <f>

26,225,966 <f>s => 2.5 GB redundant attribute material
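The two figures on this slide imply a per-element overhead, which a short calculation makes concrete. The sketch assumes that "GB" means 10^9 bytes; the slide does not say which convention is used.

```python
# Figures from the slide.
F_COUNT = 26_225_966      # number of <f> elements
REDUNDANT_BYTES = 2.5e9   # 2.5 GB, assuming decimal gigabytes (assumption)

# Average redundant attribute material carried by each <f> element.
bytes_per_f = REDUNDANT_BYTES / F_COUNT
```

Under that assumption each <f> carries roughly 95 bytes of repeated attribute text, which is the cost that declaring the data category once on <fDecl> would avoid.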

Page 42: Shebanq roma-2013-10-01

4 (of 5) Project (cont'd)

CLARIN-NL: SHEBANQ: (B) Demonstrator

Page 43: Shebanq roma-2013-10-01

[Mock-up: query editor with results and a "Save this query" form]

Edit Query

select all objects where
[clause [phrase phrase_function = Objc [word FOCUS tense = infinitive_absolute] ]]

Execute
Executing query ...
Query executed

Query results (313 results, paged: Prev ... Next)

Columns: Passage, Text, Controls (view in context)

Passages shown: Gen 1:1, 2Chron 3:4, 1Sam 12:4, Ex 23:2

Sample texts shown:
Gen 1:1 בראשית ברא אלהים את השמים ואת הארץ׃
ויאמר חזקיהו מה אות כי אעלה בית יהוה׃

Save this query

Name: valency ברא
Researcher: Oliver Glanz
Date created: 2013-08-25
Date last run: 2013-08-25
Project: Data and Tradition
Institute: VU/Eep Talstra Centre for Bible and Computing
Reason: irregular valency of ברא
Comments: needs to be combined with query on אלהים

Save | Publish | Cancel

Page 44: Shebanq roma-2013-10-01

[Mock-up: saved query with its results and metadata]

Saved Query Results (313 results, paged: Prev ... Next)

Columns: Passage, Text, Controls (view in context)

Passages shown: Gen 1:1, 2Chron 3:4, 1Sam 12:4, Ex 23:2

Sample texts shown:
Gen 1:1 בראשית ברא אלהים את השמים ואת הארץ׃
ויאמר חזקיהו מה אות כי אעלה בית יהוה׃

Information on this query

Name: valency ברא
Researcher: Oliver Glanz
Date created: 2013-08-25
Date last run: 2013-08-25
Project: Data and Tradition
Institute: VU/Eep Talstra Centre for Bible and Computing
Reason: irregular valency of ברא
Comments: needs to be combined with query on אלהים

Query Info

MQL query text:
select all objects where
[clause [phrase phrase_function = Objc [word FOCUS tense = infinitive_absolute] ]]

Persistent Identifier: urn:nbn:nl:ui:13-scpm-ji
http://www.persistent-identifier.nl/?identifier=urn...

Page 45: Shebanq roma-2013-10-01

datanetworkservice.nl/qaa

Page 46: Shebanq roma-2013-10-01

SHEBANQ: implementing Q-a-A

Page 47: Shebanq roma-2013-10-01

5 (of 5) Towards new tools

•  LAF tools

•  or generic graph algorithms

•  Emdros tools

•  or generic database technology

•  Linked Data tools

•  or generic SPARQL queries

Page 48: Shebanq roma-2013-10-01

Side conditions

•  development close to the researchers

•  preferably in their own institutions

•  decent performance

•  within the scale of a laptop

•  usable to researchers

•  that is: non-programmers

•  persistence in mind

•  new results will be archived and re-enter the data cycle

Page 49: Shebanq roma-2013-10-01

thank you


slideshare.net/dirkroorda/

s/g$/q/

#!/etc bc

Eep Talstra Centre for Bible and Computer