Linked Open Government Data (LOGD)

  • Published on
    18-Dec-2014


Transcript

<ul><li> 1. Linked Open Government Data (LOGD): Ontology Usage Experimental Results. Second Presentation. Nooshin Allahyari </li>
<li> 2. Outline: Categorizing data providers; Dataset collection; Dataset characteristics; Namespace; Ontology usage; Annotation property; Concept coverage; Case-based analysis; Conclusion </li>
<li> 3. Categorizing data providers: US government agencies. Agencies are divided according to the US Federal Government Reference Model. Each agency is in charge of publishing its related datasets. The Data.gov catalog also provides topic-related categorization. </li>
<li> 4. Outline: Categorizing data providers; Dataset collection; Dataset characteristics; Namespace; Ontology usage; Annotation property; Concept coverage; Case-based analysis; Conclusion </li>
<li> 5. Dataset Collection: All 25 datasets were collected from Data.gov. The datasets are in RDF format. Running huge datasets was difficult. Different tools were used as the endpoint. The commercial version of Virtuoso served as the SPARQL endpoint: easy to install; GUI; many visual tools; SQL, SQL tools, and connection tools. The number of datasets is being increased for reliability. </li>
<li> 6. Outline: Categorizing data providers; Dataset collection; Dataset composition characteristics; Namespace; Ontology usage; Annotation property; Concept coverage; Case-based analysis; Conclusion </li>
<li> 7. Namespace: The same namespace usage for all datasets. </li>
<li> 8. Ontology Vocabulary Usage: FEA Reference Model Ontology (RMO). Vocabulary related to the government context. General vocabulary: Country, State, City. Government programs and services: Health Program, Cultural Program. </li>
<li> 9. Annotation Property: Useful for providing additional information about datasets. All datasets have rdfs:label and rdfs:comment. No language tags or metadata. Some datasets from the Italy dataset catalog in TWC LOGD contain language tags. </li>
<li> 10. 
Outline: Categorizing data providers; Dataset collection; Dataset characteristics; Namespace; Ontology usage; Annotation property; Concept coverage; Case-based analysis; Conclusion </li>
<li> 11. Concept Coverage: The same concepts appear in all datasets. Metadata for the Data.gov wiki and TWC LOGD:
<table>
<tr><th>Prefix</th><th>Concept</th></tr>
<tr><td>foaf</td><td>Homepage</td></tr>
<tr><td>rdfs</td><td>isDefinedBy</td></tr>
<tr><td>dcterms</td><td>Source</td></tr>
<tr><td>dgtwc</td><td>uses-property</td></tr>
<tr><td>dgtwc</td><td>number-of-triples</td></tr>
<tr><td>dgtwc</td><td>number-of-properties</td></tr>
<tr><td>dgtwc</td><td>number-of-enteries</td></tr>
</table> </li>
<li> 12. Concept Coverage: General concepts related to government. Low coverage of concepts. Multi-name concepts.
<table>
<tr><th>Concept</th><th>Coverage (percentage)</th></tr>
<tr><td>State</td><td>48%</td></tr>
<tr><td>City</td><td>32%</td></tr>
<tr><td>State-Abbreviation</td><td>16%</td></tr>
<tr><td>Region</td><td>12%</td></tr>
<tr><td>Zip</td><td>12%</td></tr>
<tr><td>Country</td><td>8%</td></tr>
<tr><td>Country origin code</td><td>8%</td></tr>
<tr><td>Area code</td><td>8%</td></tr>
</table> </li>
<li> 13. Outline: Categorizing data providers; Dataset collection; Dataset characteristics; Namespace; Ontology usage; Annotation property; Concept coverage; Case-based analysis; Conclusion </li>
<li> 14. Case-Based Analysis: Three datasets from the same agency in the same category. Department of Veterans Affairs: dataset1213, dataset1288, dataset1290. The query results for each dataset show that all three have similar concepts: State, City, VISN, Station. </li>
<li> 15. Case-Based Analysis-1288: The query lists all stations with their specific code (VISN) in each city and determines the state in which each city is located:
SELECT DISTINCT ?city ?station ?visn ?st WHERE { ?s ?city OPTIONAL { ?s ?station } OPTIONAL { ?s ?visn } OPTIONAL { ?s ?st } }
<table>
<tr><th>State</th><th>VISN</th><th>Station</th><th>City</th></tr>
<tr><td>"NJ"</td><td>"3"</td><td>"561"</td><td>"East Orange"</td></tr>
<tr><td>"NY"</td><td>"3"</td><td>"620"</td><td>"Montrose"</td></tr>
<tr><td>"NY"</td><td>"3"</td><td>"630"</td><td>"New York Harbor"</td></tr>
<tr><td>"NY"</td><td>"3"</td><td>"632"</td><td>"Northport"</td></tr>
<tr><td>"DE"</td><td>"4"</td><td>"460"</td><td>"Wilmington"</td></tr>
<tr><td>"PA"</td><td>"4"</td><td>"503"</td><td>"Altoona"</td></tr>
<tr><td>"PA"</td><td>"4"</td><td>"529"</td><td>"Butler"</td></tr>
<tr><td>"WV"</td><td>"4"</td><td>"540"</td><td>"Clarksburg"</td></tr>
</table> </li>
<li> 16. 
Case-Based Analysis-1290: The query lists all stations with their specific code (VISN) in each city and determines the state in which each city is located:
SELECT DISTINCT ?city ?station ?visn ?st WHERE { ?s ?city OPTIONAL { ?s ?station } OPTIONAL { ?s ?visn } OPTIONAL { ?s ?st } }
<table>
<tr><th>State</th><th>VISN</th><th>Station</th><th>City</th></tr>
<tr><td>"ME"</td><td>"1"</td><td>"402"</td><td>"Togus"</td></tr>
<tr><td>"VT"</td><td>"1"</td><td>"405"</td><td>"White River Junction"</td></tr>
<tr><td>"MA"</td><td>"1"</td><td>"518"</td><td>"Bedford"</td></tr>
<tr><td>"MA"</td><td>"1"</td><td>"523"</td><td>"West Roxbury"</td></tr>
<tr><td>"NH"</td><td>"1"</td><td>"608"</td><td>"Manchester"</td></tr>
<tr><td>"MA"</td><td>"1"</td><td>"631"</td><td>"Northampton"</td></tr>
<tr><td>"RI"</td><td>"1"</td><td>"650"</td><td>"Providence"</td></tr>
<tr><td>"CT"</td><td>"1"</td><td>"689"</td><td>"West Haven"</td></tr>
</table> </li>
<li> 17. Case-Based Analysis-1213: The query lists all stations with their specific code (VISN) in each city and determines the state in which each city is located:
SELECT DISTINCT ?visn ?city ?state WHERE { ?s ?visn. ?s ?city. ?s ?state }
<table>
<tr><th>State</th><th>VISN</th><th>City</th></tr>
<tr><td>"CT"</td><td>"1"</td><td>"West Haven"</td></tr>
<tr><td>"MA"</td><td>"1"</td><td>"Bedford"</td></tr>
<tr><td>"MA"</td><td>"1"</td><td>"West Roxbury"</td></tr>
<tr><td>"MA"</td><td>"1"</td><td>"Northampton"</td></tr>
<tr><td>"ME"</td><td>"1"</td><td>"Togus"</td></tr>
<tr><td>"NH"</td><td>"1"</td><td>"Manchester"</td></tr>
<tr><td>"RI"</td><td>"1"</td><td>"Providence"</td></tr>
<tr><td>"VT"</td><td>"1"</td><td>"White River Junction"</td></tr>
</table> </li>
<li> 18. Case-Based Analysis-1206: Dataset 1206 similarities.
<table>
<tr><th>VISN</th><th>State</th><th>Facility-name</th><th>City</th></tr>
<tr><td>"1"</td><td>"CT"</td><td>"VA Connecticut HCS"</td><td>"West Haven"</td></tr>
<tr><td>"1"</td><td>"MA"</td><td>"Edith Nourse Rogers Memorial Veterans Hospital"</td><td>"Bedford"</td></tr>
<tr><td>"1"</td><td>"MA"</td><td>"VA Boston HCSW Roxbury Brockton Jamaica Plns"</td><td>"West Roxbury"</td></tr>
<tr><td>"1"</td><td>"MA"</td><td>"VAMC"</td><td>"Northampton"</td></tr>
<tr><td>"1"</td><td>"ME"</td><td>"VAMC/RO"</td><td>"Togus"</td></tr>
<tr><td>"1"</td><td>"NH"</td><td>"VAMC"</td><td>"Manchester"</td></tr>
<tr><td>"1"</td><td>"RI"</td><td>"VAMC"</td><td>"Providence"</td></tr>
<tr><td>"1"</td><td>"VT"</td><td>"VAM/ROC"</td><td>"White River Junction"</td></tr>
</table>
SELECT DISTINCT ?state ?facilityname ?city ?visn WHERE { ?s ?visn. ?s ?state. ?s ?city. ?s ?facilityname } </li>
<li> 19. Case-Based Analysis-Comparison: The owl:sameAs property must be defined explicitly for similar properties in order to get query results:
SELECT DISTINCT ?state ?city WHERE { GRAPH { ?s1 ?state. ?s1 ?city . owl:sameAs . http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#city owl:sameAs http://www.data.gov/semantic/data/alpha/1290/dataset-1290.rdf#city. } GRAPH { ?s2 &lt; ?st. ?s2 ?city. owl:sameAs . owl:sameAs . } } ORDER BY ?state
<table>
<tr><th>State</th><th>City</th></tr>
<tr><td>"CT"</td><td>"West Haven"</td></tr>
<tr><td>"MA"</td><td>"Bedford"</td></tr>
<tr><td>"MA"</td><td>"West Roxbury"</td></tr>
<tr><td>"MA"</td><td>"Northampton"</td></tr>
<tr><td>"ME"</td><td>"Togus"</td></tr>
<tr><td>"NH"</td><td>"Manchester"</td></tr>
<tr><td>"RI"</td><td>"Providence"</td></tr>
<tr><td>"VT"</td><td>"White River Junction"</td></tr>
</table> </li>
<li> 20. Outline: Categorizing data providers; Dataset collection; Dataset characteristics; Namespace; Ontology usage; Annotation property; Concept coverage; Case-based analysis; Conclusion </li>
<li> 21. Conclusion: No government ontology has been used in the experimental datasets. Weak vocabulary usage in the US Government. Multi-vocabulary usage for the same concept. Multi-vocabulary usage within the same government agency. Lack of a well-defined, coherent, and consistent government ontology. </li>
<li> 22. Thank you </li> </ul>
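
The comparison on slide 19 depends on explicitly linking the `city` properties of datasets 1288 and 1290 before one query can return values from both. The following is a minimal Python sketch of that idea, not the presenter's method: it treats triples as plain tuples and resolves linked predicates to one representative before filtering. The two predicate IRIs are the ones shown on slide 19 and the city values come from slides 15 and 16; the `ex:` subject identifiers and the helper names `canonical_map` and `values_for` are hypothetical.

```python
# Predicate IRIs as written on slide 19.
CITY_1288 = "http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#city"
CITY_1290 = "http://www.data.gov/semantic/data/alpha/1290/dataset-1290.rdf#city"

# (subject, predicate, object) triples. Subjects are hypothetical placeholders;
# the city names are taken from the result tables on slides 15 and 16.
triples = [
    ("ex:1288/row1", CITY_1288, "East Orange"),
    ("ex:1288/row2", CITY_1288, "Montrose"),
    ("ex:1290/row1", CITY_1290, "Togus"),
    ("ex:1290/row2", CITY_1290, "Bedford"),
]

# Explicit equivalence links, standing in for the owl:sameAs statements.
same_as = [(CITY_1288, CITY_1290)]


def canonical_map(links):
    """Map every linked IRI to one representative using a small union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a
    return {iri: find(iri) for iri in list(parent)}


def values_for(predicate, triples, links):
    """Return sorted objects of `predicate` or of any predicate linked to it."""
    canon = canonical_map(links)
    target = canon.get(predicate, predicate)
    return sorted(o for _, p, o in triples if canon.get(p, p) == target)


without_links = values_for(CITY_1288, triples, [])
with_links = values_for(CITY_1288, triples, same_as)
print(without_links)
print(with_links)
```

Without the link, only the cities from dataset 1288 are returned; with it, the cities from both datasets are. This mirrors the slide's point that differently named properties for the same concept stay disconnected until an equivalence is stated. OWL also provides `owl:equivalentProperty` for stating that two properties are equivalent.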