TECHNIQUES FOR OPTIMIZING THE QUERY PERFORMANCE OF DISTRIBUTED XML DATABASE - NAHID NEGAR

TECHNIQUES FOR

OPTIMIZING THE QUERY PERFORMANCE

OF DISTRIBUTED XML DATABASE

- NAHID NEGAR

PROBLEM STATEMENT

• EXPLORING THE RESEARCH SCOPE FOR IMPROVING THE PERFORMANCE OF DISTRIBUTED QUERY PROCESSING FOR XML DATABASES.

• THE RESEARCH PAPER:

• DESCRIBES THE ISSUES AND CONSIDERATIONS FOR DISTRIBUTED XML QUERY PROCESSING.

• EXPLORES CLASSICAL QUERY OPTIMIZATION TECHNIQUES.

• PRESENTS SIMILAR RESEARCH WORK DONE BY OTHERS.

• ANALYZES THE RESEARCH SCOPE AND DIRECTIONS.

DISTRIBUTED XML DATABASE

• XML FILES ARE IDEAL FOR DESCRIBING SEMI-STRUCTURED DATA.

• WITH THE INCREASING AMOUNT OF DATA, XML DATABASES ARE EXPANDING [1].

• STORAGE OF A LARGE NUMBER OF XML FILES

• PRESERVING THE HIERARCHICAL FORMAT.

• DATA IS DISTRIBUTED OR FRAGMENTED ACROSS DIFFERENT LOCATIONS, POSSIBLY EVEN DIFFERENT GEOGRAPHIC LOCATIONS.

• DATA INTEGRATION IS NEEDED WHEN PROCESSING A QUERY ON A DISTRIBUTED DATABASE [2].

WHY A DISTRIBUTED XML DATABASE IS NEEDED [6]

• LOWER COSTS

• INCREASED SCALABILITY

• INCREASED AVAILABILITY

• DISTRIBUTION OF SOFTWARE MODULES

• NEW APPLICATIONS BASED ON DISTRIBUTION

• MARKET FORCES

XML DATABASE AND QUERY PROCESSING

• XML DDL – DTD

• XML SCHEMA - XSD

• XML DML

• XML QUERY LANGUAGES (FOR EXAMPLE, XQUERY)

• ATTRIBUTES OF AN XML DATABASE:

• MULTIPLE LEVELS OF VALIDITY

• ENTITIES AND URI

• TRANSFORMATIONS
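AS A SMALL ILLUSTRATION OF QUERYING XML DATA, THE SKETCH BELOW USES PYTHON'S STANDARD xml.etree.ElementTree MODULE, WHICH SUPPORTS ONLY A LIMITED SUBSET OF XPATH AND NOT XQUERY. THE SAMPLE DOCUMENT, ITS ELEMENT NAMES AND THE FUNCTION titles_for_year ARE ASSUMPTIONS MADE FOR THIS EXAMPLE.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are illustrative only.
CATALOG_XML = """
<catalog>
  <book id="b1" year="2005"><title>XML Basics</title><price>30</price></book>
  <book id="b2" year="2010"><title>Distributed Data</title><price>55</price></book>
  <book id="b3" year="2010"><title>Query Tuning</title><price>42</price></book>
</catalog>
"""


def titles_for_year(xml_text, year):
    """Return the titles of all books whose year attribute equals `year`."""
    root = ET.fromstring(xml_text.strip())
    # ElementTree supports simple attribute predicates such as [@year='2010'].
    books = root.findall(f"./book[@year='{year}']")
    return [book.findtext("title") for book in books]


if __name__ == "__main__":
    print(titles_for_year(CATALOG_XML, "2010"))
```

A FULL XQUERY ENGINE WOULD ALSO ALLOW JOINS, ORDERING AND RESULT CONSTRUCTION; THIS SKETCH ONLY SHOWS A PATH EXPRESSION WITH A PREDICATE.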

DISTRIBUTED XML QUERY PROCESSING CONSIDERATIONS [7]

• ARCHITECTURE OF DISTRIBUTED QUERY PROCESSING SYSTEMS

• CENTRALIZED VS. DISTRIBUTED PROCESSING OF DISTRIBUTED QUERIES

• STATIC VS. DYNAMIC QUERY PROCESSING

• DATA VS. QUERY SHIPPING
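THE DIFFERENCE BETWEEN DATA SHIPPING AND QUERY SHIPPING CAN BE SKETCHED AS FOLLOWS. THE Site CLASS, THE TWO FUNCTIONS AND THE SAMPLE DATA ARE HYPOTHETICAL, AND THE "BYTES SHIPPED" FIGURE IS ONLY THE LENGTH OF THE SERIALIZED TEXT, NOT A REAL NETWORK MEASUREMENT.

```python
import xml.etree.ElementTree as ET


class Site:
    """A hypothetical remote site holding one XML fragment as text."""

    def __init__(self, xml_text):
        self.xml_text = xml_text

    def ship_data(self):
        # Data shipping: the whole fragment travels to the client.
        return self.xml_text

    def run_query(self, path):
        # Query shipping: the path is evaluated at the site; only matches travel.
        root = ET.fromstring(self.xml_text)
        return [ET.tostring(e, encoding="unicode") for e in root.findall(path)]


def data_shipping(site, path):
    """Fetch the full fragment, then evaluate the path at the client."""
    shipped = site.ship_data()
    root = ET.fromstring(shipped)
    results = [ET.tostring(e, encoding="unicode") for e in root.findall(path)]
    return results, len(shipped)


def query_shipping(site, path):
    """Send the path to the site and receive only the matching elements."""
    results = site.run_query(path)
    return results, sum(len(r) for r in results)


SITE = Site(
    "<orders>"
    "<order status='open'><id>1</id></order>"
    "<order status='closed'><id>2</id></order>"
    "<order status='closed'><id>3</id></order>"
    "</orders>"
)
OPEN_ORDERS = "./order[@status='open']"
```

BOTH STRATEGIES RETURN THE SAME ANSWER; WITH A SELECTIVE QUERY, QUERY SHIPPING MOVES LESS TEXT, WHILE DATA SHIPPING MAY PAY OFF WHEN THE CLIENT CACHES THE FRAGMENT AND REUSES IT.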

DISTRIBUTED XML QUERY PROCESSING ISSUES [7]

• DIFFERENT QUERY PROCESSING CAPABILITIES OF THE DATA SOURCES

• UNAVAILABILITY OF STATISTICAL INFORMATION ON THE DATA SOURCES

• UNRELIABLE RESPONSE TIMES

• DATA REDUNDANCY

• TIME TO LAST VS. TIME TO FIRST ELEMENT
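THE TIME-TO-FIRST VS. TIME-TO-LAST TRADE-OFF CAN BE ILLUSTRATED WITH A PIPELINED (GENERATOR-BASED) EVALUATION. THIS IS A MINIMAL SKETCH; THE FUNCTION NAMES, THE fetch LOG AND THE SAMPLE FRAGMENTS ARE ASSUMPTIONS, AND "FETCHING" A FRAGMENT IS SIMULATED IN MEMORY.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments, each standing in for data held at a different site.
FRAGMENTS = [
    "<r><item>a</item><item>b</item></r>",
    "<r><item>c</item></r>",
    "<r><item>d</item></r>",
]


def fetch_fragments(fragment_texts, log):
    """Simulate fetching fragments one by one, recording each fetch in `log`."""
    for index, text in enumerate(fragment_texts):
        log.append(index)
        yield text


def stream_matches(fragments, path):
    """Pipelined plan: yield each match as soon as its fragment is parsed."""
    for xml_text in fragments:
        root = ET.fromstring(xml_text)
        for elem in root.findall(path):
            yield elem.text


def collect_matches(fragments, path):
    """Blocking plan: return results only after all fragments are processed."""
    return list(stream_matches(fragments, path))
```

WITH stream_matches THE FIRST RESULT IS AVAILABLE AFTER ONLY ONE FRAGMENT HAS BEEN FETCHED, WHEREAS collect_matches DELIVERS NOTHING UNTIL THE LAST FRAGMENT HAS BEEN PROCESSED.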

POPULAR PERFORMANCE IMPROVEMENT TECHNIQUES FOR DISTRIBUTED XML QUERY [6]

• SELECTIVITY: GIVE THE QUERY PLANNER THE ABILITY TO ESTIMATE SELECTIVITY

• SELECTION PUSHDOWN: PERFORM SELECTIONS AS SOON AS POSSIBLE IN THE QUERY TREE

• INCREMENTAL UPDATES: A MATERIALIZED VIEW IS UPDATED INCREMENTALLY TO REFLECT CHANGES IN THE UNDERLYING DATA

• VIEW QUERYING: QUERIES CAN BENEFIT FROM EXPLOITING EXISTING MATERIALIZED VIEWS

• QUERY CONTAINMENT: FIND THE COMMON SUB-QUERIES AND EXECUTE THOSE JUST ONCE
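TWO OF THE TECHNIQUES ABOVE, SELECTIVITY ESTIMATION AND SELECTION PUSHDOWN, CAN BE SKETCHED ON A TINY JOIN. THE DATA, THE ATTRIBUTE NAMES AND ALL FUNCTION NAMES ARE HYPOTHETICAL, AND THE SELECTIVITY IS COMPUTED EXACTLY FROM THE DATA RATHER THAN FROM STORED STATISTICS AS A REAL PLANNER WOULD DO.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments; attribute names are illustrative only.
CUSTOMERS = (
    "<customers><c id='1' country='BR'/><c id='2' country='NL'/>"
    "<c id='3' country='BR'/></customers>"
)
ORDERS = (
    "<orders><o cust='1' total='10'/><o cust='2' total='20'/>"
    "<o cust='2' total='5'/><o cust='3' total='7'/></orders>"
)

customers = ET.fromstring(CUSTOMERS).findall("c")
orders = ET.fromstring(ORDERS).findall("o")


def estimate_selectivity(elements, attr, value):
    """Fraction of elements for which attr == value."""
    return sum(1 for e in elements if e.get(attr) == value) / len(elements)


def join_then_select(customers, orders, country):
    """Naive plan: join everything, then filter. Returns (result, join size)."""
    joined = [(c, o) for c in customers for o in orders
              if c.get("id") == o.get("cust")]
    result = [(c.get("id"), o.get("total")) for c, o in joined
              if c.get("country") == country]
    return result, len(joined)


def select_then_join(customers, orders, country):
    """Pushdown plan: filter customers first, then join the smaller input."""
    selected = [c for c in customers if c.get("country") == country]
    joined = [(c, o) for c in selected for o in orders
              if c.get("id") == o.get("cust")]
    result = [(c.get("id"), o.get("total")) for c, o in joined]
    return result, len(joined)
```

BOTH PLANS PRODUCE THE SAME RESULT, BUT THE PUSHDOWN PLAN BUILDS A SMALLER INTERMEDIATE JOIN; IN A DISTRIBUTED SETTING THAT INTERMEDIATE RESULT IS WHAT WOULD HAVE TO BE SHIPPED BETWEEN SITES.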

APPROACHES TAKEN BY OTHERS

• AN OPTIMIZING QUERY PROCESSING WITH AN EFFECTIVE CACHING MECHANISM FOR DISTRIBUTED DATABASE [5]

• EFFICIENTLY PROCESSING XML QUERIES OVER FRAGMENTED REPOSITORIES WITH PARTIX [8]

• A METHODOLOGY FOR QUERY PROCESSING OVER DISTRIBUTED XML DATABASES [4]

• SCALABLE AND DISTRIBUTED PROCESSING OF SCIENTIFIC XML DATA [3]

AN OPTIMIZING QUERY PROCESSING WITH AN EFFECTIVE CACHING MECHANISM FOR DISTRIBUTED DATABASE [5]

• A DATABASE OPTIMIZATION FRAMEWORK IS DESCRIBED.

• THE SQL STATEMENT CONTAINS ELEMENTS WHICH ARE ACCEPTED BY AN XML-ORIENTED COMMON DATA.

• A HISTORICAL DATABASE AND QUERY-BASED CACHE REPLACEMENT ARE USED.

• AN XML DATABASE SYSTEM IS SUITABLE FOR IMPLEMENTING DATA ANALYSIS APPLICATIONS.

• A COMMON OPTIMIZATION QUERY PROCESSING MODEL IS ALSO USED.
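THE GENERAL IDEA OF CACHING QUERY RESULTS CAN BE SKETCHED AS BELOW. NOTE THAT THIS SKETCH USES A PLAIN LEAST-RECENTLY-USED (LRU) REPLACEMENT POLICY AS A STAND-IN; IT IS NOT THE HISTORY- AND QUERY-BASED REPLACEMENT SCHEME OF [5], WHOSE DETAILS ARE NOT GIVEN HERE. THE CLASS NAME QueryResultCache AND THE execute CALLBACK ARE ASSUMPTIONS.

```python
from collections import OrderedDict


class QueryResultCache:
    """Cache query results keyed by query text, evicting the LRU entry."""

    def __init__(self, capacity, execute):
        self.capacity = capacity
        self.execute = execute  # hypothetical callback that runs the real query
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def query(self, text):
        if text in self.entries:
            self.hits += 1
            # Mark this entry as most recently used.
            self.entries.move_to_end(text)
            return self.entries[text]
        self.misses += 1
        result = self.execute(text)
        self.entries[text] = result
        if len(self.entries) > self.capacity:
            # Drop the least recently used entry.
            self.entries.popitem(last=False)
        return result
```

A REPEATED QUERY IS ANSWERED FROM THE CACHE WITHOUT CONTACTING THE REMOTE SITES; A REAL SYSTEM WOULD ALSO NEED TO INVALIDATE ENTRIES WHEN THE UNDERLYING DATA CHANGES.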

EFFICIENTLY PROCESSING XML QUERIES OVER FRAGMENTED REPOSITORIES WITH PARTIX [8]

• THE DATA VOLUME OF XML REPOSITORIES AND THE RESPONSE TIME OF QUERY PROCESSING HAVE BECOME CRITICAL ISSUES.

• TRADITIONAL FRAGMENTATION DEFINITIONS DO NOT DIRECTLY APPLY TO XML DOCUMENTS.

• THE FOCUS IS ON HIGH PERFORMANCE OF XML DATA SERVERS.

• PARTIX IS USED FOR THE EXPERIMENTS.

A METHODOLOGY FOR QUERY PROCESSING OVER DISTRIBUTED XML DATABASES [4]

• PRESENTS A METHODOLOGY FOR PROCESSING XQUERY QUERIES OVER DISTRIBUTED XML DATABASES.

• THE TECHNIQUE CAN BE USED WITH HOMOGENEOUS XML DATABASES THAT ALLOW FRAGMENTATION.

• A MEDIATOR-BASED ARCHITECTURE WITH ADAPTERS ATTACHED TO THE REMOTE DATABASES IS PROPOSED.

• THREE TYPES OF FRAGMENTATION (HORIZONTAL, VERTICAL AND HYBRID) WERE USED IN SEVERAL EXPERIMENTS.
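HORIZONTAL AND VERTICAL FRAGMENTATION, AND A MEDIATOR THAT UNIONS PARTIAL ANSWERS, CAN BE SKETCHED AS FOLLOWS. THIS IS AN ILLUSTRATION UNDER SIMPLIFYING ASSUMPTIONS AND NOT THE METHODOLOGY OF [4]: ALL FUNCTION NAMES AND THE SAMPLE DOCUMENT ARE HYPOTHETICAL, FRAGMENTS LIVE IN MEMORY RATHER THAN ON REMOTE SITES, AND EVERY ITEM IS ASSUMED TO CARRY AN id ATTRIBUTE.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
STORE = (
    "<store>"
    "<item id='1' category='book'><name>XML Basics</name><price>30</price></item>"
    "<item id='2' category='cd'><name>Jazz Mix</name><price>12</price></item>"
    "<item id='3' category='book'><name>Query Tuning</name><price>42</price></item>"
    "</store>"
)


def horizontal_fragments(root, predicate):
    """Split the root's children into two fragments by a selection predicate."""
    frag_true, frag_false = ET.Element(root.tag), ET.Element(root.tag)
    for child in root:
        (frag_true if predicate(child) else frag_false).append(child)
    return frag_true, frag_false


def vertical_fragment(root, child_tag):
    """Project each child onto one sub-element, keeping its id for rejoining."""
    frag = ET.Element(root.tag)
    for child in root:
        item = ET.SubElement(frag, child.tag, {"id": child.get("id")})
        sub = child.find(child_tag)
        if sub is not None:
            item.append(sub)
    return frag


def mediator_query(fragments, path):
    """Send the same path to every fragment and union the partial answers."""
    results = []
    for frag in fragments:
        results.extend(e.text for e in frag.findall(path))
    return results
```

A HYBRID FRAGMENTATION WOULD COMBINE THE TWO, FOR EXAMPLE BY APPLYING vertical_fragment TO EACH HORIZONTAL FRAGMENT.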

SCALABLE AND DISTRIBUTED PROCESSING OF SCIENTIFIC XML DATA [3]

• APPLIES BIG DATA TECHNIQUES TO XML METADATA INDEXING FOR DISTRIBUTED XML DATABASES.

• MAPREDUCE PROCESSING IS INCORPORATED.

• DATASET PROCESSING IS CRITICAL TO ENSURE EFFECTIVE USE.

• AN AUTOMATED PROCESS CAN BE HELPFUL.

• THE PAPER TESTED PERFORMANCE USING TWO MAPREDUCE IMPLEMENTATIONS, APACHE HADOOP AND LEMO-MR.
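THE MAPREDUCE PATTERN ITSELF CAN BE SKETCHED IN PLAIN PYTHON. THIS SKETCH RUNS IN A SINGLE PROCESS AND USES NEITHER APACHE HADOOP NOR LEMO-MR; THE JOB (COUNTING ELEMENT TAGS ACROSS DOCUMENTS AS A VERY SIMPLE METADATA INDEX), THE FUNCTION NAMES AND THE SAMPLE DOCUMENTS ARE ASSUMPTIONS MADE FOR ILLUSTRATION.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical input documents.
DOCUMENTS = [
    "<obs><temp>1</temp><temp>2</temp></obs>",
    "<obs><wind>3</wind></obs>",
]


def map_phase(xml_text):
    """Emit (tag, 1) for every element in one document."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        yield elem.tag, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts for one tag."""
    return key, sum(values)


def run_job(documents):
    """Run map, shuffle and reduce sequentially over all documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
```

IN A REAL MAPREDUCE FRAMEWORK THE MAP CALLS WOULD RUN IN PARALLEL ON DIFFERENT NODES, AND THE SHUFFLE STEP WOULD MOVE DATA ACROSS THE NETWORK.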

RESEARCH SCOPE IN DISTRIBUTED XML QUERY PROCESSING PERFORMANCE

• STRUCTURED-NESS – HOW TO DETERMINE THE STRUCTURE AND THE INDEXES.

• SCHEMA HETEROGENEITY – HOW TO INTEGRATE HETEROGENEOUS SCHEMA.

• RELATION DEFINITION – HOW TO DEFINE RELATIONS AND COMPARISONS BETWEEN XML ELEMENTS.

• DATA SOURCE PROCESSING POWER - HOW TO DO DISTRIBUTED QUERY PROCESSING PLANNING

• ANSWER QUALITY – HOW TO PRODUCE AND VERIFY THE BEST RESULT.

• ANSWERING SPEED – HOW TO KEEP DB STATISTICS AND IMPROVE OPERATIONS.

• DATA SOURCE AND USER QUANTITY – HOW TO DESIGN PARALLEL QUERY PROCESSING ALGORITHMS.
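AS ONE POSSIBLE STARTING POINT FOR THE LAST ITEM, THE SKETCH BELOW EVALUATES THE SAME PATH ON SEVERAL SOURCES IN PARALLEL USING A THREAD POOL. THE FUNCTION NAMES AND SAMPLE SOURCES ARE HYPOTHETICAL, AND THE SOURCES ARE IN-MEMORY STRINGS; THREADS ARE USED ON THE ASSUMPTION THAT REAL SOURCES WOULD BE REMOTE AND THEREFORE I/O-BOUND.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data sources, each standing in for a remote XML database.
SOURCES = [
    "<r><item>a</item></r>",
    "<r><item>b</item></r>",
    "<r><item>c</item></r>",
]


def query_source(xml_text, path):
    """Evaluate the path on a single source."""
    root = ET.fromstring(xml_text)
    return [e.text for e in root.findall(path)]


def parallel_query(sources, path, max_workers=4):
    """Query all sources concurrently and concatenate the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the order of the inputs.
        partials = pool.map(lambda s: query_source(s, path), sources)
        return [value for part in partials for value in part]
```

THIS SKETCH WAITS FOR EVERY SOURCE; HANDLING SLOW OR UNAVAILABLE SOURCES (TIMEOUTS, PARTIAL ANSWERS) IS LEFT OPEN.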

CONCLUSION

• XML IS A WIDELY ACCEPTED AND WIDELY USED FORMAT FOR STORING DATA.

• WITH THE LARGE AMOUNT OF DATA PRODUCED AT DIFFERENT LOCATIONS, A DISTRIBUTED XML DATABASE IS OFTEN USED.

• IT IS IMPORTANT TO MAINTAIN REASONABLE PERFORMANCE FOR QUERY PROCESSING IN A DISTRIBUTED DATABASE.

• THE GOAL OF THE PAPER IS TO IDENTIFY THE RESEARCH SCOPE FOR IMPROVING DISTRIBUTED XML QUERY PROCESSING PERFORMANCE.

REFERENCES

• 1. G. FIGUEIREDO, V. BRAGANHOLO, M. MATTOSO, "PROCESSING QUERIES OVER DISTRIBUTED XML DATABASES," JOURNAL OF INFORMATION AND DATA MANAGEMENT, 1(3):455-470, OCTOBER 2010.

• 2. A. M. KULKARNI, J. THIRUNAVUKKARASU, P. S. PILLAI, S. S. SULEGAI, S. RAO "INSERTION AND QUERYING MECHANISM FOR A DISTRIBUTED XML DATABASE SYSTEM" IN: PROCEEDINGS OF THE 5TH ACM COMPUTE

• 3. E. DEDE, Z. FADIKA, C. GUPTA, M. GOVINDARAJU, "SCALABLE AND DISTRIBUTED PROCESSING OF SCIENTIFIC XML DATA", 2011 12TH IEEE/ACM INTERNATIONAL CONFERENCE ON GRID COMPUTING (GRID), VOL., NO.,

• 4. G. FIGUEIREDO, V. BRAGANHOLO, M. MATTOSO, "A METHODOLOGY FOR QUERY PROCESSING OVER DISTRIBUTED XML DATABASES" PROGRAMA DE ENGENHARIA DE SISTEMAS E COMPUTAR IM/UFRJ, BRAZIL

• 5. S. PRABHA, A.KANNAN, P.A. KUMAR, "AN OPTIMIZING QUERY PROCESSING WITH AN EFFECTIVE CACHING MECHANISM FOR DISTRIBUTED DATABASE"

• 6. D. KOSSMANN, "THE STATE OF THE ART IN DISTRIBUTED QUERY PROCESSING," ACM COMPUTING SURVEYS, VOL. 32, NO. 4, 2000, PP. 422-469.

• 7. M. SMILJANIĆ, H. BLANKEN, M. VAN KEULEN, W. JONKER, "DISTRIBUTED XML DATABASE SYSTEMS"

• 8. R. ANDRADE, G. RUBERG, A. BAIÃO, V. BRAGANHOLO, M. MATTOSO, "PARTIX: PROCESSING XQUERY QUERIES OVER FRAGMENTED XML REPOSITORIES," TECHNICAL REPORT ES-691, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING - COPPE/FEDERAL UNIVERSITY OF RIO DE JANEIRO, BRAZIL, DEPARTMENT OF APPLIED INFORMATICS - UNIRIO, BRAZIL, DEC. 2005.

• 9. J. SMITH AND P. WATSON. FAULT-TOLERANCE IN DISTRIBUTED QUERY PROCESSING. IN 9TH INTERNATIONAL DATABASE ENGINEERING AND APPLICATION SYMPOSIUM, 2005. IDEAS 2005., PAGES 329 – 338, JULY 2005.