Linked Open Government Data (LOGD): Ontology Usage Experimental Results
Second Presentation
Nooshin Allahyari
Outline
• Categorizing data providers
• Dataset collection
• Dataset characteristics
▫ Namespace
▫ Ontology Usage
▫ Annotation property
• Concept Coverage
• Case-Based Analysis
• Conclusion
Categorizing data providers
• US Government agencies
• Agencies are divided according to the US Federal Government Reference Model
• Each agency is in charge of publishing its related datasets
• The Data.gov catalog also provides a topic-based categorization
Dataset Collection
• All 25 datasets were collected from Data.gov
• Datasets are in RDF format
• Difficulties in running very large datasets
• Different tools used as the endpoint
▫ Virtuoso (commercial version) as SPARQL endpoint: easy to install; GUI; many visual tools; SQL, SQL tools, and connection tools
• Increasing the number of datasets for reliability
Namespace
• Same namespace usage across all datasets
Ontology Vocabulary Usage
• FEA Reference Model Ontology (RMO)
• Vocabulary related to government context
▫ General vocabulary: Country, State, City
▫ Government programs and services: Health Program, Cultural Program
Annotation Property
• Useful for providing additional information about datasets. All datasets have:
▫ rdfs:label
▫ rdfs:comment
▫ No language tag or metadata
• Some datasets from the Italy dataset catalog in TWC LOGD contain language tags.
Concept Coverage
• The same concepts appear in all datasets
• Metadata for the Data.gov wiki and TWC LOGD
Prefix    Concept
foaf      homepage
rdfs      isDefinedBy
dcterms   source
dgtwc     uses-property
dgtwc     number-of-triples
dgtwc     number-of-properties
dgtwc     number-of-entries
Concept Coverage
• General concepts related to government
• Low coverage of concepts
• Multi-name concepts
Concept               Coverage (percentage)
State                 48%
City                  32%
State-Abbreviation    16%
Region                12%
Zip                   12%
Country               8%
Country origin code   8%
Area code             8%
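As a rough check on the coverage figures, the percentages can be converted back into dataset counts. The sketch below assumes (the slide does not state this) that coverage is the share of the 25 collected datasets in which a concept appears; the names `TOTAL_DATASETS`, `coverage`, and `datasets_covering` are illustrative only.

```python
# Assumption: coverage = datasets containing the concept / 25 collected datasets.
TOTAL_DATASETS = 25

# Percentages copied from the coverage table on the slide.
coverage = {
    "State": 48,
    "City": 32,
    "State-Abbreviation": 16,
    "Region": 12,
    "Zip": 12,
    "Country": 8,
    "Country origin code": 8,
    "Area code": 8,
}


def datasets_covering(percent, total=TOTAL_DATASETS):
    """Convert a coverage percentage into a dataset count."""
    return percent * total / 100


counts = {concept: datasets_covering(p) for concept, p in coverage.items()}

for concept, n in counts.items():
    print(f"{concept}: {n:g} of {TOTAL_DATASETS} datasets")
```

Under that assumption every value comes out as a whole number (State in 12 datasets, City in 8, Country in 2), which is consistent with a denominator of 25.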
Case-Based Analysis
• Three datasets from the same agency in the same category
▫ Department of Veterans Affairs: dataset 1213, dataset 1288, dataset 1290
• The query results for each dataset show that all three have similar concepts: State, City, VISN, Station
Case-Based Analysis: Dataset 1288
• The query lists all stations with their VISN code in each city and determines the state in which each city is located:
SELECT DISTINCT ?city ?station ?visn ?st
WHERE {
  ?s <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#city> ?city .
  OPTIONAL { ?s <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#station> ?station }
  OPTIONAL { ?s <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#visn> ?visn }
  OPTIONAL { ?s <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#st> ?st }
}
State   VISN   Station   City
"NJ"    "3"    "561"     "East Orange"
"NY"    "3"    "620"     "Montrose"
"NY"    "3"    "630"     "New York Harbor"
"NY"    "3"    "632"     "Northport"
"DE"    "4"    "460"     "Wilmington"
"PA"    "4"    "503"     "Altoona"
"PA"    "4"    "529"     "Butler"
"WV"    "4"    "540"     "Clarksburg"
Case-Based Analysis: Dataset 1290
• The query lists all stations with their VISN code in each city and determines the state in which each city is located:
SELECT DISTINCT ?city ?station ?visn ?st
WHERE {
  ?s <http://www.data.gov/semantic/data/alpha/1290/dataset-1290.rdf#city> ?city .
  OPTIONAL { ?s <http://www.data.gov/semantic/data/alpha/1290/dataset-1290.rdf#station> ?station }
  OPTIONAL { ?s <http://www.data.gov/semantic/data/alpha/1290/dataset-1290.rdf#visn> ?visn }
  OPTIONAL { ?s <http://www.data.gov/semantic/data/alpha/1290/dataset-1290.rdf#st> ?st }
}
State   VISN   Station   City
"ME"    "1"    "402"     "Togus"
"VT"    "1"    "405"     "White River Junction"
"MA"    "1"    "518"     "Bedford"
"MA"    "1"    "523"     "West Roxbury"
"NH"    "1"    "608"     "Manchester"
"MA"    "1"    "631"     "Northampton"
"RI"    "1"    "650"     "Providence"
"CT"    "1"    "689"     "West Haven"
Case-Based Analysis: Dataset 1213
• The query lists the VISN code for each city and the state in which the city is located:
SELECT DISTINCT ?visn ?city ?state
WHERE {
  ?s <http://www.data.gov/semantic/data/alpha/1213/dataset-1213.rdf#visn> ?visn .
  ?s <http://www.data.gov/semantic/data/alpha/1213/dataset-1213.rdf#city> ?city .
  ?s <http://www.data.gov/semantic/data/alpha/1213/dataset-1213.rdf#state> ?state
}
State   VISN   City
"CT"    "1"    "West Haven"
"MA"    "1"    "Bedford"
"MA"    "1"    "West Roxbury"
"MA"    "1"    "Northampton"
"ME"    "1"    "Togus"
"NH"    "1"    "Manchester"
"RI"    "1"    "Providence"
"VT"    "1"    "White River Junction"
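The per-dataset queries above follow one pattern: every property URI is the dataset namespace plus a local name such as city, visn, or st. A minimal sketch of that pattern follows. The helper names `namespace` and `build_query` are hypothetical, and the namespace template is inferred from the URIs in the slide queries, not taken from any Data.gov documentation.

```python
def namespace(dataset_id):
    """Namespace pattern inferred from the query URIs on the slides (an assumption)."""
    return f"http://www.data.gov/semantic/data/alpha/{dataset_id}/dataset-{dataset_id}.rdf#"


def build_query(dataset_id, required, optional=()):
    """Build a SELECT DISTINCT query over the given property local names.

    `required` properties become plain triple patterns; `optional` ones are
    wrapped in OPTIONAL so that rows missing them are still returned.
    """
    ns = namespace(dataset_id)
    names = list(required) + list(optional)
    select = " ".join(f"?{n}" for n in names)
    lines = [f"SELECT DISTINCT {select}", "WHERE {"]
    for n in required:
        lines.append(f"  ?s <{ns}{n}> ?{n} .")
    for n in optional:
        lines.append(f"  OPTIONAL {{ ?s <{ns}{n}> ?{n} }}")
    lines.append("}")
    return "\n".join(lines)


# Same shape as the dataset 1288 query: city required, the rest optional.
query_1288 = build_query(1288, ["city"], ["station", "visn", "st"])
# Same shape as the dataset 1213 query: all three properties required.
query_1213 = build_query(1213, ["visn", "city", "state"])

print(query_1288)
print(query_1213)
```

Note that the state property has to be passed as st for dataset 1288 but as state for dataset 1213: the local names differ even though the concept is the same.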
Case-Based Analysis: Dataset 1206
• Dataset 1206 shows similar concepts:
SELECT DISTINCT ?state ?facilityname ?city ?visn
WHERE {
  ?s <http://www.data.gov/semantic/data/alpha/1206/dataset-1206.rdf#visn> ?visn .
  ?s <http://www.data.gov/semantic/data/alpha/1206/dataset-1206.rdf#state> ?state .
  ?s <http://www.data.gov/semantic/data/alpha/1206/dataset-1206.rdf#city> ?city .
  ?s <http://www.data.gov/semantic/data/alpha/1206/dataset-1206.rdf#facility_name> ?facilityname
}
VISN   State   Facility name                                      City
"1"    "CT"    "VA Connecticut HCS"                               "West Haven"
"1"    "MA"    "Edith Nourse Rogers Memorial Veterans Hospital"   "Bedford"
"1"    "MA"    "VA Boston HCSW Roxbury Brockton Jamaica Plns"     "West Roxbury"
"1"    "MA"    "VAMC"                                             "Northampton"
"1"    "ME"    "VAMC/RO"                                          "Togus"
"1"    "NH"    "VAMC"                                             "Manchester"
"1"    "RI"    "VAMC"                                             "Providence"
"1"    "VT"    "VAM/ROC"                                          "White River Junction"
Case-Based Analysis: Comparison
• The owl:sameAs property must be defined explicitly between similar properties in order to get query results across datasets:
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?state ?city
WHERE {
  GRAPH <http://localhost8890/vad/dataset1288> {
    ?s1 <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#st> ?state .
    ?s1 <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#city> ?city .
    <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#st> owl:sameAs <http://www.data.gov/semantic/data/alpha/1290/dataset-1290.rdf#st> .
    <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#city> owl:sameAs <http://www.data.gov/semantic/data/alpha/1290/dataset-1290.rdf#city> .
  }
  GRAPH <http://localhost8890/vad/dataset1290> {
    ?s2 <http://www.data.gov/semantic/data/alpha/1290/dataset-1290.rdf#st> ?st .
    ?s2 <http://www.data.gov/semantic/data/alpha/1290/dataset-1290.rdf#city> ?city .
    <http://www.data.gov/semantic/data/alpha/1290/dataset-1290.rdf#st> owl:sameAs <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#st> .
    <http://www.data.gov/semantic/data/alpha/1290/dataset-1290.rdf#city> owl:sameAs <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#city> .
  }
}
ORDER BY ?state
State   City
"CT"    "West Haven"
"MA"    "Bedford"
"MA"    "West Roxbury"
"MA"    "Northampton"
"ME"    "Togus"
"NH"    "Manchester"
"RI"    "Providence"
"VT"    "White River Junction"
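The need for an explicit mapping can also be illustrated outside SPARQL. Datasets 1288 and 1290 name their state property st, while datasets 1213 and 1206 name it state, so rows only line up once the local names are mapped to a shared key. The sketch below is an illustration in Python, not the method used in the experiment; `PROPERTY_ALIGNMENT`, `normalize`, and `as_pairs` are hypothetical names, and the sample rows are a small subset copied from the result tables on the slides.

```python
# Hypothetical alignment table: dataset-specific local name -> shared key.
PROPERTY_ALIGNMENT = {
    "st": "state",     # local name used by datasets 1288 and 1290
    "state": "state",  # local name used by datasets 1213 and 1206
    "city": "city",
}


def normalize(row):
    """Rename a row's keys to the shared keys, dropping unmapped properties."""
    return {PROPERTY_ALIGNMENT[k]: v for k, v in row.items() if k in PROPERTY_ALIGNMENT}


def as_pairs(rows):
    """Return the set of (state, city) pairs found in the normalized rows."""
    return {(r["state"], r["city"]) for r in map(normalize, rows)}


# Sample rows taken from the slide result tables (subset only).
rows_1290 = [{"st": "CT", "city": "West Haven"}, {"st": "ME", "city": "Togus"}]
rows_1213 = [{"state": "CT", "city": "West Haven"}, {"state": "NH", "city": "Manchester"}]

# After alignment, rows from both datasets can be compared on (state, city).
shared = as_pairs(rows_1290) & as_pairs(rows_1213)
print(sorted(shared))
```

Without the alignment table the two datasets share only the city key, which is the same gap the owl:sameAs statements are meant to close on the SPARQL side.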
Conclusion
• No government ontology has been used in the experimental datasets
• Weak vocabulary usage in US Government datasets
• Multiple vocabulary terms are used for the same concept
• Multiple vocabularies are used within the same government agency
• Lack of a well-defined, coherent, and consistent government ontology
Thank you