Quality issues, Evaluation, Metadata, Resolution / Levels of detail. Antti Jakobsson, Key Users Meeting, 15th Oct 2010, Brussels




How quality evaluation can be automated and handled in geographic information


1. Quality issues, Evaluation, Metadata, Resolution / Levels of detail. Antti Jakobsson, Key Users Meeting, 15th Oct 2010, Brussels.

2. Key successes of WP8: utilization of international and open standards; a common understanding of what quality means with respect to the target specifications and user requirements, and how to measure it; provision of these results in metadata; automation of the quality evaluation services.

3. Benefits. For data providers: early data error detection, faster product turnaround, reduced maintenance costs, consistent evaluation procedures, better harmonisation. For data consumers: improved spatial analysis, confident decision making, data that is trusted and usable.

4. ESDIN approach.

5. ESDIN approach to quality.

6. ESDIN approach to quality.

7. Quality spreadsheets (example: Geographical Names). The spreadsheet columns are the data quality elements and sub-elements: completeness (commission, omission); logical consistency (conceptual, domain, format and topological consistency); positional accuracy (absolute accuracy, relative accuracy, gridded data position accuracy); temporal accuracy (accuracy of a time measurement, temporal consistency, temporal validity); thematic accuracy (classification correctness, non-quantitative attribute correctness, quantitative attribute accuracy). The rows are the feature types and their attributes, e.g. NamedPlace (DQ basic measure error rate: Id 7; DQ basic measure error count: Id 10), inspireId (DQ basic measure error count: Id 16), name (GeographicalName).

8. Sampling / full inspection. The cells of DQ basic measures are colour coded; the colours indicate the evaluation procedure: attribute inspection by sampling according to the ISO 2859 series (yellow cell), variable inspection by sampling according to ISO 3951-1 (green cell), or full inspection (orange cell). ISO 2859 states the principles of testing sufficient items of the whole population by sampling. When the error ratios of data subsets are expressed as two integers, they can be summed up to a dataset error rate by dividing the total number of errors by the total count of items. If errors exist (error count > 0), the subset should be rejected and corrective action by the producer is needed; it is assumed that the number of errors found is quite small, and the customer may be tempted to make those few corrections themselves. ISO 3951 variable sampling gives reliable results on small sample sizes: CE95/LE95 is close enough to the upper limit (U) of the standard at AQL level 4, and ISO 3951 offers clear acceptance criteria based on the sample. Attributes are flagged as mandatory, voidable or optional according to the INSPIRE Data Specifications v3.
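The arithmetic behind combining sampled subsets is simple enough to show in a few lines. The sketch below only illustrates the rule described on slide 8; it is not ESDIN code, and the sample figures are invented: each subset's error ratio is kept as two integers, the dataset error rate is total errors divided by total items inspected, and any error count above zero rejects the subset.

```python
# Illustrative sketch of the slide 8 rule: per-subset error ratios kept as two
# integers (errors, items inspected), aggregated to one dataset error rate.
from typing import Iterable, Tuple

def dataset_error_rate(subsets: Iterable[Tuple[int, int]]) -> float:
    """Dataset error rate = total errors / total items inspected."""
    pairs = list(subsets)
    total_errors = sum(errors for errors, _ in pairs)
    total_items = sum(items for _, items in pairs)
    if total_items == 0:
        raise ValueError("no items inspected")
    return total_errors / total_items

def accept_subset(error_count: int) -> bool:
    """Zero-acceptance rule from the slide: any error rejects the subset."""
    return error_count == 0

# Invented example: three sampled subsets of a dataset.
samples = [(0, 200), (2, 150), (1, 180)]
print(dataset_error_rate(samples))              # 3 / 530 ≈ 0.0057
print([accept_subset(e) for e, _ in samples])   # [True, False, False]
```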
9. Relevant ISO/TS 19138 data quality measures, grouped by data quality element and sub-element, with the basic quality measure in brackets:
Completeness / commission: rate of excess items (error rate). Completeness / omission: rate of missing items (error rate).
Logical consistency / conceptual consistency: number of items not compliant with the rules of the conceptual schema (error count); number of invalid overlaps of surfaces, alias "overlapping surfaces" (error count). Logical consistency / domain consistency: number of items not in conformance with their value domain (error count). Logical consistency / format consistency: physical structure conflicts (error count). Logical consistency / topological consistency: number of faulty point-curve connections, alias "extraneous nodes" (error count); number of missing connections due to undershoots (error count); number of missing connections due to overshoots (error count); number of invalid slivers (error count); number of invalid self-intersect errors, alias "loops" (error count); number of invalid self-overlap errors, alias "kickbacks" (error count).
Positional accuracy / absolute or external accuracy: mean value of positional uncertainties (1D, 2D and 3D) (no basic measure applicable); linear map accuracy at 95 % significance level, alias "LMAS 95 %" (LE95 or LE95I, depending on the evaluation); circular error at 95 % significance level, alias "navigation accuracy" (CE95).
Thematic accuracy / classification correctness: misclassification rate (error rate). Thematic accuracy / non-quantitative attribute correctness: rate of incorrect attribute values (error rate). Thematic accuracy / quantitative attribute accuracy: attribute value uncertainty at 95 % significance level (LE95 or LE95(r), depending on the evaluation procedure).

10. Testing plans.

11. How to utilize the quality model: the quality model will be transformed into a rule set and conformance levels; the ELF specifications will include these for the NMCAs; automated tools will utilize the rules and conformance levels (illustrated in the sketch after slide 12).

12. Quality requirements / conformance levels: to set the requirements, use the quality measures and consider the nature of reality (feature vagueness, change rates, themes). Suggested guidance is given for positional accuracy, together with a suggestion on setting the classification of conformance levels.
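Slides 11 and 12 describe transforming the quality model into a rule set with conformance levels that automated tools can apply. The fragment below is a minimal sketch of one possible encoding; the ISO/TS 19138 measure IDs come from the deck, but the threshold values, class labels and key names are placeholders, not the ESDIN/ELF figures.

```python
# Illustrative only: one way to encode quality measures with per-class
# conformance thresholds. The thresholds are placeholders, not ESDIN/ELF values.

RULE_SET = {
    "completeness.omission": {
        "iso19138_id": 7,                 # rate of missing items
        "basic_measure": "error rate",
        "conformance_levels": {"A": 0.02, "B": 0.05, "C": 0.10},  # max rate per class
    },
    "logical_consistency.topological.undershoots": {
        "iso19138_id": 23,                # missing connections due to undershoots
        "basic_measure": "error count",
        "conformance_levels": {"A": 0, "B": 5, "C": 20},          # max count per class
    },
    "positional_accuracy.absolute.lmas95": {
        "iso19138_id": 36,                # linear map accuracy at 95 % significance level
        "basic_measure": "LE95",
        "conformance_levels": {"A": 1.0, "B": 2.5, "C": 5.0},     # metres (placeholder)
    },
}

def classify(measure_key: str, value: float) -> str:
    """Return the best conformance class whose threshold the value meets, else 'fail'."""
    levels = RULE_SET[measure_key]["conformance_levels"]
    for cls in ("A", "B", "C"):
        if value <= levels[cls]:
            return cls
    return "fail"

print(classify("positional_accuracy.absolute.lmas95", 2.1))  # -> "B"
```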
13. Setting conformance levels (examples): geometric accuracy is a critical and mostly well-defined characteristic of cadastral parcels, while geographical names, such as the name of a lake, do not have just one correct location; any location within the area of the lake is acceptable. Completeness of the transportation network is important to know and can be explicitly evaluated. Wetlands may be important areas in hydrography, but their existence or delineation can be hard to evaluate during a dry season.

14. Example: logical consistency.

15. Example: thematic accuracy.

16. Positional accuracy.

17. Quality evaluation process. Step 1: applying the data quality measure to the data to be checked; the procedure for this is described in the ISO 19113/19114 standards. Step 2: reporting the score for each measure in a report form. Step 3: comparing the result from step 2 to the defined conformance level. In addition, two further steps can be carried out. Step 4: summarising the conformance results into one result for each data quality element. Step 5: summarising the results from step 4 into one overall dataset result.

18. Aggregation of data quality conformance results. Aggregation where the measurements are on different scales and have different units: transform all the quantitative data quality results into conformance results using a set of conformance levels/classes (see the previous slides). Aggregation for inhomogeneous data: this can be done by simply reporting the lowest quality found in the most remote areas (see the "nature of reality" slide); another way (the one recommended here) is to use different conformance classifications for different kinds of area (urban, rural, remote) and then summarise based on the conformance score. To make this useful, a metadata description is needed giving the distribution between the kinds of area. Reporting details: the simplest way of reporting is to give one value for the dataset, which can be a simple passed/failed with a reference to the product specification; but doing a lot of work in quality assessment and reporting just one value can be considered an oversimplification. Giving quality statements as grades may be useful in steps 4 and 5 (see above and the sketch after slide 19).

19. Grading data example:
Excellent: only class A for all quality measures.
Very good: a majority of As, but also some Bs.
Good: a majority of Bs, some As, no Cs.
Adequate: only a very few Cs, the rest Bs and better.
Marginal: a majority of Cs, but also some Bs.
Not good: no measure reached class B (i.e. all measures in class C).
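As a rough illustration of steps 4 and 5 together with the grading table on slide 19 (not ESDIN tooling), the sketch below grades a set of per-measure conformance classes; the numeric reading of phrases such as "a majority of As" or "only a very few Cs" is an assumption made for the example.

```python
# Illustrative sketch of steps 4-5 and the slide 19 grading table.
# The interpretation of "majority" (> half) and "a very few" (<= 10 %) is assumed.
from collections import Counter
from typing import Dict, List

def grade(measure_classes: List[str]) -> str:
    """Map per-measure conformance classes (A/B/C) to an overall grade (slide 19)."""
    counts = Counter(measure_classes)
    n = len(measure_classes)
    a, b, c = counts["A"], counts["B"], counts["C"]
    if a == n:
        return "Excellent"      # only class A for all quality measures
    if c == 0 and a > n / 2:
        return "Very good"      # a majority of As, but also some Bs
    if c == 0 and b > n / 2:
        return "Good"           # a majority of Bs, some As, no Cs
    if 0 < c <= 0.1 * n:
        return "Adequate"       # only a very few Cs, the rest Bs and better
    if b > 0 and c > n / 2:
        return "Marginal"       # a majority of Cs but also some Bs
    if a == 0 and b == 0:
        return "Not good"       # no measure reached class B (all class C)
    return "Unclassified"       # combinations the slide 19 table does not cover

def grade_per_element(classes_by_element: Dict[str, List[str]]) -> Dict[str, str]:
    """Step 4: summarise the conformance results into one grade per data quality element."""
    return {element: grade(classes) for element, classes in classes_by_element.items()}

# Hypothetical per-measure conformance classes for one dataset.
results = {
    "completeness":        ["A", "A", "A"],
    "logical consistency": ["A", "A", "B", "B", "B"],
    "positional accuracy": ["B", "B", "B"],
}
print(grade_per_element(results))                                   # step 4
print(grade([c for classes in results.values() for c in classes]))  # step 5: overall grade
```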
20. ESDIN approach to quality.

21. Where can you utilize quality web services? If you are a data provider for an SDI: for quality control during production (automated), called here conformance testing (this includes edge matching and generalization), and for quality evaluation after production (semi-automated). If you are the SDI co-ordinator or data custodian: for a quality audit for process accreditation or data certification, doing either conformance testing and/or quality evaluation. If you are a customer or data user: to evaluate usability using metadata information.

22. Data Quality Evaluation Service architecture (diagram labels: rule sets and templates database, object-oriented geospatial rules engine, collaborative web-based rule authoring, web services interface, business rules, quality measures, data for evaluation supplied as a geospatial data file or via a Web Feature Service, SOAP over HTTP). Rule Builder: an intuitive user interface to author, agree and manage DQ measures. DQ Client Application: an accessible, easy-to-use, automatic Data Quality Evaluation Service. DQ Rules Engine: a W3C web services interface using open standards to describe and execute geospatial rule evaluation. Rule Repository: data quality rules, derived from and guided by the quality model.

23. DQ Rule Builder environment.

24. DQ Evaluation Service concept.

25. DQ Evaluation Report.

26. ESDIN approach to quality.

27. Metadata approach: metadata is needed for the discovery of datasets through metadata catalogues and registries; metadata is needed for the evaluation of those datasets, as to whether they are of sufficient quality to meet end users' needs; and metadata specific to the requirements of the ELF specifications is needed.
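To make the metadata approach concrete, the sketch below shows one possible, deliberately simple record of quality evaluation results that could travel alongside discovery metadata. The field names, identifiers and values are invented for the example; a real implementation would map them onto ISO 19115/19157 metadata elements rather than this ad hoc structure.

```python
# Illustrative only: a minimal, JSON-serialisable record of quality evaluation
# results accompanying a dataset. All field names and values are placeholders.
import json
from dataclasses import dataclass, asdict
from typing import List

@dataclass
class QualityResult:
    element: str              # e.g. "completeness"
    sub_element: str          # e.g. "omission"
    iso19138_measure_id: int  # measure Id from ISO/TS 19138
    value: float
    unit: str
    conformance_class: str    # e.g. "A", "B", "C" or "fail"

@dataclass
class QualityMetadata:
    dataset_id: str
    product_specification: str
    results: List[QualityResult]
    overall_grade: str

record = QualityMetadata(
    dataset_id="ex:admin-units-sample",
    product_specification="ELF specification (placeholder reference)",
    results=[
        QualityResult("completeness", "omission", 7, 0.018, "rate", "A"),
        QualityResult("positional accuracy", "absolute", 36, 2.1, "m (LE95)", "B"),
    ],
    overall_grade="Very good",
)

print(json.dumps(asdict(record), indent=2))
```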
28. Are we INSPIRE compliant? Yes. We suggest that some of the measures be changed in future editions of the INSPIRE data specifications; there are some mistakes in the current specification that should be corrected, and we also propose additional measures.

29. ESDIN/INSPIRE differences (Administrative Units). Comparison of the measures suggested by the INSPIRE Data Specification v3 Administrative Units with the ESDIN quality model (all at dataset level):
7.1.1 Completeness / commission: Id 3, rate of excess items / error rate. The same as ESDIN.
7.1.2 Completeness / omission: Id 7, rate of missing items / error rate. The same as ESDIN.
7.2.1.1 Logical consistency / topological consistency: Id 21*, number of faulty point-curve connections / error count. The same as ESDIN.
7.2.1.2 Logical consistency / topological consistency: Id 23, number of missing connections due to undershoots / error count. The same as ESDIN.
7.2.2 Logical consistency / conceptual consistency: Id 9, conceptual schema compliance / correctness indicator. ESDIN uses Id 10 (number of items not compliant with the rules of the conceptual schema / error count) instead; Id 9 is applicable only at single-instance level.
7.3.1 Positional accuracy / absolute or external positional accuracy: Id 28, mean value of positional uncertainties (1D, 2D and 3D) / not applicable. Not used; ESDIN uses Id 36 (linear map accuracy at 95 % significance level / LE95 or LE95I) instead.
* Id 21 in ISO 19138, but it has the incorrect Id 9 in the INSPIRE Data Specification AU.

30. Additional quality measures from ESDIN WP8 (all at dataset level):
Logical consistency / conceptual consistency: Id 11, number of invalid overlaps of surfaces / error count (also related to topological consistency).
Logical consistency / domain consistency: Id 16, number of items not in conformance with their value domain / error count.
Logical consistency / conceptual consistency: Id 10, number of items not compliant with the rules of the conceptual schema / error count.
Logical consistency / format consistency: Id 19, physical structure conflicts / error count.
Logical consistency / topological consistency: Id 25, number of invalid slivers / error count.
Logical consistency / topological consistency: Id 26, number of invalid self-intersect errors / error count.
Logical consistency / topological consistency: Id 27, number of invalid self-overlap errors / error count.
Positional accuracy / absolute or external positional accuracy: Id 36, linear map accuracy at 95 % significance level / LE95 or LE95I.
Thematic accuracy / classification correctness: Id 61, misclassification rate / error rate.
Thematic accuracy / non-quantitative attribute correctness: Id 67, rate of incorrect attribute values / error rate.

31. Resolution and levels of detail. Target scales: 1:2,500,000, 1:1,000,000, 1:500,000, 1:250,000, 1:100,000, 1:50,000, 1:25,000, 1:10,000, 1:5,000, 1:2,500. The target level of detail is shown for the Global, Regional and Master levels of detail, with different targets for urban, rural and mountainous areas.

32. Conclusions. It is important that INSPIRE provides a platform for data quality information: minimum data quality conformance levels should be set, with the ability to report other user-community-related conformance levels. Quality evaluation metadata should be available for automated conformance testing. We introduce a quality model that uses the same principles for all Annex I themes, and we will suggest this as a guideline for INSPIRE implementation. We introduce conformance levels that can be evaluated semi-automatically or automatically based on ISO standards. Automation of quality evaluation and conformance testing can be applied to all transformation-related workflows, including schema transformation, generalization and edge matching. There is significant saving potential in quality reporting and in the improvement of data.