advanced ansi sql data modeling


Advanced ANSI SQL Data Modeling and Structure Processing


For a complete listing of the Artech House Computer Science Library, turn to the back of this book.


Advanced ANSI SQL Data Modeling and Structure Processing

Michael M. David


Library of Congress Cataloging-in-Publication Data
David, Michael M.
Advanced ANSI SQL data modeling and structure processing / Michael M. David
p. cm. -- (Artech House computer library)
Includes bibliographical references and index.
ISBN 1-58053-038-9 (alk. paper)
1. SQL (Computer program language) 2. Data structures (Computer science)
I. Title. II. Series.
QA76.73.S67 D39 1999
005.75'65--dc21 99-33463 CIP

British Library Cataloguing in Publication Data
David, Michael M.
Advanced ANSI SQL data modeling and structure processing. -- (Artech House computing library)
1. SQL (Computer program language) 2. Data structures (Computer science)
I. Title
005.7'1262
ISBN 1-58053-038-9

Cover and text design by Darrell Judd

© 1999 ARTECH HOUSE, INC.
685 Canton Street
Norwood, MA 02062

All rights reserved. Printed and bound in the United States of America. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher.

All terms mentioned in this book that are known to be trademarks or service marks have been appropriately capitalized. Artech House cannot attest to the accuracy of this information. Use of a term in this book should not be regarded as affecting the validity of any trademark or service mark.

International Standard Book Number: 1-58053-038-9
Library of Congress Catalog Card Number: 99-33463
10 9 8 7 6 5 4 3 2 1


This book is dedicated to my family—Maggie, Jason, Stephanie, Alina, Luis, Philip, Alicia, and Aunt Faye


CONTENTS

Preface
Introduction

Part I: The Basics of the Relational Join Operation

1 Relational Join Introduction
1.1 Standard Inner Join Review
1.2 Problems with Relational Join Processing
1.3 Outer Join Review
1.4 Problems with Previous Outer Join Syntax
1.5 Conclusion

2 The ANSI SQL Join Operation
2.1 ANSI SQL Join Syntax
2.2 ANSI SQL Join Operation
2.3 ANSI SQL Join Does Not Follow the Cartesian Product Model
2.4 Determining ANSI SQL Join Associativity and Commutativity
2.5 What Outer Join Commutativity Is
2.6 What Outer Join Associativity Is
2.7 Hierarchictivity in Addition to Associativity and Commutativity
2.8 Conclusion

3 ANSI SQL Join Types and Their Operation
3.1 FULL Outer Join
3.2 One-Sided Outer Join
3.3 INNER Join
3.4 CROSS Join
3.5 UNION Join
3.6 Intermixing Join Types
3.7 Conclusion

4 Natural Joins
4.1 Explicit and Implicit Natural Joins
4.2 Multitable Natural Outer Joins
4.3 Natural One-Sided Outer Join
4.4 Natural FULL Outer Join
4.5 Natural Inner Joins
4.6 Intermixing Natural Join Types
4.7 Natural One-Sided Join Transformation
4.8 Conclusion

Part II: Outer Join Data Modeling and Structured Processing

5 Data Structure Review
5.1 The Power of Hierarchical Data Structures
5.2 Three-Tier Database Architecture
5.3 External and Internal Views
5.4 Conceptual View
5.5 Many-to-One and One-to-Many Relationships
5.6 Many-to-Many Relationships
5.7 Converting Network Structures to Hierarchical Structures
5.8 Relating Hierarchical Processing to Relational Processing
5.9 Physical Versus Logical Data Structures
5.10 Sibling Legs Query Semantics
5.11 Ordering of Data Structures Can Cause Their Restructuring
5.12 Data Structure Composition
5.13 Good Data Modeling Design Principles
5.14 Conclusion

6 Outer Join Does Data Modeling
6.1 SQL Data Modeling Using the Outer Join
6.2 ON Clause Data Modeling Join Condition Rules
6.3 Valid and Invalid ON Clause Data Modeling Examples
6.4 Valid and Invalid Data Modeling Results
6.5 Substructure Views
6.6 WHERE Clause Filtering with Data Structures
6.7 WHERE Clause Filtering with Substructures
6.8 Complex Data Modeling Example
6.9 Conclusion

7 Outer Join Data Modeling-Related Capabilities
7.1 Data Structure Filtering
7.2 Indirect Structure Linking
7.3 Nonhierarchical Join Type Support
7.4 Nonhierarchical Joining of Data Structures
7.5 Many-to-Many Data Modeling and Intersecting Data
7.6 Conclusion

8 More about Outer Join Data Modeling
8.1 Importance of SQL's Inherent Data Structure Processing Ability
8.2 Efficient Client/Server Data Structure Processing
8.3 Coding Data Modeling Outer Join Statements
8.4 Generation of Data Modeling Outer Join Statements
8.5 Hierarchical Data Structure Processing Empirical Proof
8.5.1 Hierarchical Control
8.5.2 Structure Control
8.6 Nonhierarchical Data Structure Processing Empirical Proof
8.7 Embedded Structured View Support Empirical Proof
8.8 Indirect Link Empirical Proof
8.9 SQL:1999 and Data Modeling
8.10 What Makes the ANSI Outer Join Unique for Data Modeling
8.11 Data Modeling with Old-Style Outer Joins
8.12 The New Role of the Inner Join Operation
8.13 Conclusion

Part III: New Capabilities Based on Outer Join Data Modeling

9 Data Structure Extraction (DSE) Technology
9.1 Extracting Data Structure Information from the Outer Join
9.2 DSE Example
9.3 Logical Table Example
9.4 Symmetric Linking of Data Structures Example
9.5 DSE Internal Logic
9.6 Why Vendors Need the DSE Technology
9.7 DSE Avoids Imposing Data Structures on SQL
9.8 Conclusion

10 Outer Join Advanced Capabilities
10.1 Database Navigation
10.2 Access Optimizations
10.3 Enterprise and Legacy Database Access
10.4 Open Database Access Interface
10.5 Seamless Value-Added Features
10.6 Data Warehouse Interface
10.7 Hierarchical Relational Processing
10.8 Object Relational Interface
10.9 View Update Capability
10.10 Multimedia Application Directory Support
10.11 Universal Data Access of Structured Data
10.12 The SQL XML Data Structure Connection
10.13 Conclusion

11 Outer Join Optimization
11.1 Join Table Reordering
11.2 Dynamic Shortening of the Access Path
11.3 Removal of Unnecessary Tables from Outer Join View
11.4 Increased Efficiency of Parallel Database Processing
11.5 Dynamic Rebuild to Pick up New SQL Features
11.6 Optimization of Nonrelational SQL Interfaces
11.7 Applying Hierarchical Optimizations to Network Structures
11.8 Shifting ON Clauses to the WHERE Clause
11.9 Conclusion

12 Hierarchical Relational Processor Prototype
12.1 Hierarchical Relational Prototype Operation
12.2 Basic Data Modeling
12.3 Many-to-Many Relationships
12.4 Embedded Views
12.5 View Optimization
12.6 Conclusion

13 Object/Relational Interface
13.1 Standardized SQL Interface
13.2 Data Modeling and Structure Processing
13.3 Data Abstraction and Reusability
13.4 Data Inheritance
13.5 Database Navigation, Efficiency, and Nonrelational Access
13.6 Late Binding and Polymorphism
13.7 Plug and Play
13.8 Conclusion

14 Nonrelational SQL-Based Universal Data Access
14.1 Structured Record Overview
14.2 SQL Structured Data Access Basics
14.3 Internal Navigation and Mapping of Structured Data
14.4 SQL-Based Universal Data Access of Structured Data
14.5 Handling Multiple Structure Formats within a File
14.6 Interfacing to Prerelational and Postrelational Data
14.7 The Importance of the View for Contiguous Data
14.8 Conclusion

Part IV: Miscellaneous Data Modeling Topics

15 Advanced Lower Structure Linking
15.1 Overview of Nonroot Lower Level Linking
15.2 Previous Nonroot Lower Level Linking Method
15.3 Semantics of Nonroot Lower Level Linking
15.4 Single Path Reference to Lower Structure
15.5 Multiple Path References to Lower Structure
15.6 Optimization Concerns for Nonroot Lower Level Linking
15.7 Using Lower Structure Linking with a View WHERE Clause
15.8 Restructuring the Data
15.9 Conclusion

16 Data Modeling Outer Join Generator
16.1 Product Overview
16.2 Operational Overview
16.3 Menu Overview
16.4 Adding a Structure Box
16.5 Specifying the Link Criteria
16.6 Specifying a Data Filter
16.7 Changing or Removing a Structure Box
16.8 Optimizing the Outer Join Statement and Data Structure
16.9 Saving, Retrieving, or Deleting a Stored Structure
16.10 Running the Outer Join Query
16.11 Loading the Database Data
16.12 Data Modeling Diagramming Symbols
16.13 Conclusion

17 Summary

Appendix A: Database Views Used in This Book
Notes on the Company Database Views
Notes on the Parts-Suppliers Views

Glossary
Bibliography
About the Author
Index


PREFACE

The many data modeling and related capabilities of the ANSI SQL join syntax and its outer join operation, added in SQL-92, are among SQL's biggest secrets today. Most of these capabilities are not generally known, if known at all. They are lying dormant, waiting to be utilized. Their full utilization can be extremely beneficial to all SQL programmers, DBAs, database designers, product developers, and product users. While these capabilities are free for the using, they can be tricky to use and are not documented in SQL reference books or SQL vendors' user manuals. The ANSI SQL join syntax is actually a flexible programming language with powerful data modeling and structure processing capabilities. These programming capabilities are also not described in any SQL book or vendor's manual. This book remedies this by thoroughly documenting the ANSI SQL join's data modeling language and its inherent data modeling operation, and by demonstrating its advanced capabilities for all database professionals to utilize.

Using this book, SQL beginners through experts will be able to immediately utilize this information in SQL systems that support the ANSI SQL outer join operation. The outer join technology presented can be safely applied because it is open and ANSI compatible, avoiding interface problems now and in the future. Since the inherent and direct processing of complex data structures is new to SQL, data structures, their semantics, and direct use with the ANSI SQL outer join are also well covered in this book to fully round out the outer join coverage and its many uses.

The ANSI SQL join has many different join types and a very flexible syntax for specifying them that can significantly control its operation and affect its join result. This makes outer joins difficult to use and prone to semantic errors. Many combinations of join types produce illogical structures that can produce ambiguous results. It is a complicated topic, and for these reasons there has not been a book or vendor manual on SQL that demonstrates or discusses anything more than very simple two-table outer joins. The outer join operation is just too complex a topic to deal with in a limited way.

The real power of the outer join is achieved when these advanced capabilities are used in outer joins involving three or more tables. This book instructs the ANSI SQL user how to perform powerful multitable outer joins by following the rules and principles set forth in this book. The data modeling capability of the outer join establishes a powerful data modeling framework and context that is utilized in this book to make constructing and understanding the effects and semantics of multitable outer joins very intuitive. This data modeling and structure processing ability can establish a default database standard or model for data modeling since it is supported completely by an ISO/ANSI standardized function.

Most SQL vendors now support, or will support in the near future, the ANSI SQL outer join operation in their products. These currently include SYBASE with Enterprise Server, IBM with DB2, Computer Associates with CA-Ingres, HP with ALLBASE, Tandem with NonStop SQL, NCR with Teradata SQL, and Microsoft with Access, SQL Server, and its popular ODBC standard. And since the ANSI SQL outer join is standardized, those SQL vendors who do not currently support it will eventually.

The SQL examples in this book have been designed so that the intended meaning of the query results is self-explanatory. This means there is usually no need to compare query output data in the examples against actual data in the database. There is a consistent set of familiar data structures used throughout the book (see Appendix A). In addition, if the structure is important to the example, it is shown again in the example. The query result columns are usually ordered by their structure so that they can be more easily interpreted based on the data structure. It is important to keep in mind that when comparing the results of queries, the column order of the values has no significance other than to emphasize the data structure that may be different between the results being compared.

There are two types of SQL examples used in this book: real examples and pseudo examples. The real examples are valid SQL and are used to show specific examples, while the pseudo examples are not necessarily complete or totally valid SQL; they are used when it is important to easily convey a general idea or principle. Often, the pseudo examples use table names such as T1, T2, or A, B, C, and may also use these conventions instead of column names to highlight that the importance is not the column name per se, but which table the column name belongs to. A pseudo SQL example may have the form of FROM A LEFT JOIN B ON A=B, where there may be no SELECT clause or fully qualified column names in the ON clause, or the ON clause may be written as ON Cond when Cond is not necessary to the concept being discussed.

This book is divided into four parts that are best read sequentially, though important points are repeated or referenced in the text when their understanding is necessary for the topic being covered. Part I covers the basics of the ANSI SQL outer join operation. Part II investigates the powerful data modeling and structure processing features that are inherent with the ANSI SQL outer join and are available for immediate use. Part III looks at new capabilities not previously possible in SQL that are now made possible by the outer join's data modeling capability. It examines the operation of several advanced features and applications that can be built around these advanced capabilities by SQL vendors. Part IV examines miscellaneous and advanced outer join data modeling topics.

Part I covers the basics of the relational join operation with a concentrated look at the more powerful and less familiar outer join operation. The inner join is the more common and simpler standard join. Chapter 1 introduces the inner and outer join operations and explains their basic functions and operations, and their strong and weak points. Chapter 2 defines the ANSI SQL outer join and discusses its operation. Chapter 3 goes into the many different types and features of the ANSI SQL outer joins and their specific operations. Chapter 4 concentrates on one specific optional feature of the outer join known as the natural join. This often overlooked and greatly misunderstood feature makes each outer join type operate in a different way, which is why it has its own chapter.

Part II covers in detail the inherent data modeling and structure processing capabilities of the ANSI SQL outer join operation. These are capabilities that relational database professionals can utilize immediately. Chapter 5 supplies the background necessary in data modeling and structure processing to understand the concepts presented in this book. Chapter 6 shows in detail how the ANSI SQL outer join operation can perform complex data modeling. Chapter 7 introduces additional data modeling-related capabilities. Chapter 8 supplies additional useful background information on the outer join's inherent data modeling abilities.

Part III documents advanced SQL capabilities made possible by the ANSI SQL outer join data modeling capability, which SQL vendors can offer to their users. Chapter 9 introduces the data structure extraction (DSE) technology that can be used to extract the data structure meta information naturally embedded in ANSI SQL outer join specifications. Chapter 10 identifies a number of advanced capabilities made possible by the data modeling capability of the ANSI SQL outer join. Chapter 11 describes the many powerful semantic SQL optimizations that are possible based on the data modeling information available from outer joins. Chapter 12 demonstrates a hierarchical relational database processor prototype that operates by utilizing the data structure information extracted from outer join statements. Chapter 13 presents an object relational interface that is also based on the data structure meta information extracted from outer join statements. Chapter 14 looks at SQL-based nonrelational universal data access frameworks and how outer join processing naturally fits in by using a structured data record interface as an example.

Part IV presents miscellaneous data modeling topics. Chapter 15 introduces a powerful seamless extension to the data modeling capability of the outer join that naturally allows substructures to be linked into the hierarchical structure without requiring that the linkage be based on their root table. And lastly, Chapter 16 presents the external design of a SQL outer join statement generator utility program that the SQL professional can use interactively to help in the design and generation of powerful outer join statements that directly model and process complex data structures.

There is one appendix. Appendix A covers the diagrams and descriptions of the database data structures used in the examples in this book.


INTRODUCTION

The outer join operation was introduced in the SQL-92 standard for ANSI SQL. Basically, it preserves data in a join operation so no data is lost when joining tables. The older standard join, known as the inner join, will lose data in a join when a row from one table does not find a match in the other table being joined. For example, joining a Department table with an Employee table using a standard inner join will lose all departments that do not have any employees. The same could be true of inner joining employees with their dependents found in the Dependent table. All employees who had no dependents would be dropped from the result. The outer join prevents this data loss and would preserve departments and employees in these examples.

To carry out this data preservation, the outer join has an important characteristic that the older inner join did not have: the order in which the joins are performed in an outer join can affect the result. This meant that the capability to control the join order had to be introduced into the syntax of the ANSI SQL join operation. This further meant that, since the order of the joins affects the result and the order can be controlled, each join can have its own join criteria specified at its join point to further utilize the extra join order control. So, this feature was also added to the ANSI SQL standard.

These added capabilities are significant to SQL. A cornerstone of SQL has always been that the order of joins does not matter, and SQL has always been a navigationless language, not requiring instruction from the user on how to navigate the database. The ANSI SQL join syntax and outer join capability change all of that. These previous restrictions are now lifted for the first time! This makes the ANSI SQL join syntax a very powerful, self-contained data modeling language with fantastic capabilities that can be utilized by users directly out of the box, and utilized by database product developers to add new features and capabilities to standard ANSI SQL. But if the ANSI SQL join is a self-contained data modeling language, how is it used, and what can it do? That's what this book is about.

There are many books on data modeling on the market, and some are even specific to a particular data modeling methodology. The big difference between a book on a data modeling methodology and SQL's data modeling language is that ANSI SQL's data modeling capability is not just another methodology per se. It is a complete data modeling language that can actually control SQL and its operation. This book is not proposing a SQL data modeling language, but defines how the one that inherently exists in ANSI SQL operates so that it can be utilized immediately after ANSI SQL is installed. This means that when a complex hierarchical data structure is modeled using the ANSI SQL join operation, the result reflects exactly the semantics of the data structure modeled.


PART I—THE BASICS OF THE RELATIONAL JOIN OPERATION

Part I covers the basics of the relational join operation with a concentrated look at the more complex and less known outer join operation. The inner join is the more common and simpler standard join. Chapter 1 introduces the inner and outer join operations and explains their basic functions and operations, and their strong and weak points. Chapter 2 defines the ANSI SQL outer join operation and discusses its main operation. Chapter 3 goes into the many different types and features of the ANSI SQL outer join operation and their specific operations. Chapter 4 concentrates on one specific optional feature of the join operation, the NATURAL option of the join. This feature makes each outer join type operate in a different way, which is why it has its own chapter.


1—Relational Join Introduction

In relational databases, data is stored in two-dimensional tables. These tables are arranged in rows and columns of data where each row can be thought of as a record and the columns are the data fields. For example, a given row would contain related data such as employee number, salary, and department number. Other rows in the table would contain these same types of information (attributes) for other employees.

An application database view usually requires multiple tables, because standard relational tables do not yet allow for variable repeating fields in a row. This is because standard relational databases require first normal form data. Thus, repeating data is supported by using additional tables to hold repeating values in multiple rows. Second and third normal form data modeling decisions can also account for related data being split across multiple tables, but these decisions relate to good database design and are not a requirement.

In relational terms, rows are also known as tuples. Each table column contains the same type of data (attributes), such as salary or department number. Every row needs to be uniquely identified by a primary-key field such as employee number or social security number. Rows can also contain non-unique key fields such as alternate and foreign keys, like a department number in the Employee table. These can be used to access a group of related rows, such as all employees for a given department.

A primary-key field in one table can be a foreign-key field in another table. This is the case in the familiar Department and Employee tables, where the department number in the Department table is its primary key, and in the Employee table the department number is the foreign key. A join operation is used to combine tables like the Department and Employee tables using a common key in both tables, such as the department number keys, to match the rows that will be combined.

1.1—Standard Inner Join Review

The standard join operation is known as the inner join. It horizontally combines two or more tables into a single working table or view. The matching of the rows over the same domain is controlled by the WHERE clause join condition as specified in this join statement: SELECT * FROM Department, Employee WHERE DeptNo=EmpDeptNo.

An inner join is performed in principle by logically performing the Cartesian product (generating all combinations of rows) of the tables and then applying the WHERE join condition, which specifies the join criteria such as DeptNo=EmpDeptNo. The WHERE join condition will remove all combinations of rows that do not satisfy the join criteria, leaving only those combined rows that link up properly (i.e., their keys match up); otherwise, in the SELECT statement in the paragraph above, each employee would remain joined to each department instead of only the department to which the employee belongs.
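The two logical steps just described can be tried out with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in SQL system, and the table contents (departments A and B, employees X, Y, and Z) are hypothetical sample data, not the data of the book's figures.

```python
import sqlite3

# Hypothetical sample data; only the table and key names follow the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT PRIMARY KEY, DeptName TEXT);
    CREATE TABLE Employee   (EmpNo TEXT PRIMARY KEY, EmpDeptNo TEXT);
    INSERT INTO Department VALUES ('A', 'DeptA'), ('B', 'DeptB');
    INSERT INTO Employee   VALUES ('X', 'B'), ('Y', 'C'), ('Z', 'B');
""")

# Step 1: the Cartesian product pairs every department with every employee.
product = conn.execute("SELECT * FROM Department, Employee").fetchall()

# Step 2: the WHERE join condition keeps only the pairs whose keys match.
joined = conn.execute(
    "SELECT DeptNo, EmpNo FROM Department, Employee "
    "WHERE DeptNo = EmpDeptNo ORDER BY EmpNo"
).fetchall()

print(len(product))  # 2 departments x 3 employees = 6 combinations
print(joined)        # only the combinations with matching keys remain
```

A real SQL system is free to compute the same result without materializing the full product; the sketch only mirrors the logical definition.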

One problematic characteristic or side effect of the inner join operation is that it will eliminate entire rows from the generated result table that fail any part of the join criteria conditions. Therefore, inner joining the Department table with the Employee table will always exclude both departments that have no employees and employees that do not belong to a department. This side effect of losing data is magnified when more than two tables are inner joined. For example, when inner joining the Department, Employee, and Dependent tables, a department whose employees have no dependents will have those employees excluded, which in turn will exclude the department from the result. This side effect, if not known, can often go unnoticed, producing undesirable results. The inner join example in Figure 1.1 demonstrates the data loss concepts presented here.

The example in Figure 1.1 demonstrates the inner joining of the Department table with the Employee table, producing the join result table shown. The data in the Department and Employee tables are also shown, demonstrating how department A's data and employee Y's data are excluded from the result because they have no matching row in the other table. The outer join operation described in Section 1.3 solves this problem of missing data. Also notice in Figure 1.1 that the replicated data, "DeptB 456," from the Department table was introduced into the join result table because relational tables have a flat two-dimensional structure.
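The cascading loss that occurs when three tables are inner joined can be sketched in the same way. This is an illustration under stated assumptions: sqlite3 stands in for the SQL system, and the Dependent table's columns (DepName, DepEmpNo) and all row values are hypothetical.

```python
import sqlite3

# Hypothetical data: department A has no employees, and department C has an
# employee (W) who has no dependents.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT PRIMARY KEY);
    CREATE TABLE Employee   (EmpNo TEXT PRIMARY KEY, EmpDeptNo TEXT);
    CREATE TABLE Dependent  (DepName TEXT PRIMARY KEY, DepEmpNo TEXT);
    INSERT INTO Department VALUES ('A'), ('B'), ('C');
    INSERT INTO Employee   VALUES ('X', 'B'), ('W', 'C');
    INSERT INTO Dependent  VALUES ('D1', 'X');
""")

three_way = conn.execute(
    "SELECT DeptNo, EmpNo, DepName "
    "FROM Department, Employee, Dependent "
    "WHERE DeptNo = EmpDeptNo AND EmpNo = DepEmpNo"
).fetchall()

# Departments that survive the three-table inner join.
kept_departments = {row[0] for row in three_way}

print(three_way)
print(kept_departments)
```

In this sample, department C is dropped even though it has an employee, because that employee fails the second join condition.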

Figure 1.1 Sample inner join of Department and Employee tables.

With the inner join, the order in which the tables are specified for joining does not affect the result. If the order of the table names in the inner join statement in Figure 1.1 were reversed, the result would remain the same. Because the order in which the table joins are processed has no effect on the result, internal optimizations can pick the most efficient join order for execution.

It is also worth mentioning that the WHERE clause can specify filtering criteria as well as join criteria, as in SELECT * FROM Department, Employee WHERE DeptNo=EmpDeptNo AND Salary >= 50000. In this case, the join operation also filters out result rows where the salary is less than 50,000.
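A small sketch of a WHERE clause that carries both a join condition and a filter follows. It assumes sqlite3 as the SQL system; the Salary column placement and all row values are hypothetical.

```python
import sqlite3

# Hypothetical data: employee Y earns enough but has no matching department,
# and employee Z has a matching department but earns too little.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT PRIMARY KEY, DeptName TEXT);
    CREATE TABLE Employee   (EmpNo TEXT PRIMARY KEY, EmpDeptNo TEXT,
                             Salary INTEGER);
    INSERT INTO Department VALUES ('A', 'DeptA'), ('B', 'DeptB');
    INSERT INTO Employee   VALUES ('X', 'B', 60000),
                                  ('Y', 'C', 70000),
                                  ('Z', 'B', 45000);
""")

rows = conn.execute(
    "SELECT DeptNo, EmpNo, Salary FROM Department, Employee "
    "WHERE DeptNo = EmpDeptNo "   # join criteria
    "AND Salary >= 50000"         # filtering criteria
).fetchall()

print(rows)
```

Only employee X satisfies both conditions: Y is removed by the join condition and Z by the salary filter.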

1.2—Problems with Relational Join Processing

The inner join result in Figure 1.1 demonstrates three problems: lost data, replicated data, and lack of data modeling. Lost data caused by unmatched rows (dangling tuples) is normal for relational database operation. It keeps the underlying operational principles mathematically sound. Unmatched rows present a problem in how to preserve them so that they are mathematically sound, operate consistently, and are unambiguous (which is discussed in the next section).

Replicated data also becomes necessary with relational data stored in two-dimensional tables. In the join result in Figure 1.1, department B's data is replicated so that any row taken in isolation has all the data required. Unfortunately, this can easily and unknowingly throw summaries off by introducing replicated values into the result.
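How replicated values can throw a summary off is easy to see in a sketch. It assumes sqlite3 as the SQL system and treats the department's numeric value as a hypothetical Budget column; the source does not say what that value represents.

```python
import sqlite3

# Hypothetical data: department B (Budget 456) has two employees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT PRIMARY KEY, DeptName TEXT,
                             Budget INTEGER);
    CREATE TABLE Employee   (EmpNo TEXT PRIMARY KEY, EmpDeptNo TEXT);
    INSERT INTO Department VALUES ('A', 'DeptA', 123), ('B', 'DeptB', 456);
    INSERT INTO Employee   VALUES ('X', 'B'), ('Z', 'B');
""")

# Summing directly over the Department table counts the value once.
true_total = conn.execute(
    "SELECT SUM(Budget) FROM Department WHERE DeptNo = 'B'"
).fetchone()[0]

# Summing over the join result counts it once per matching employee.
joined_total = conn.execute(
    "SELECT SUM(Budget) FROM Department, Employee WHERE DeptNo = EmpDeptNo"
).fetchone()[0]

print(true_total, joined_total)
```

The joined total is twice the true value because department B's row appears once for each of its two employees.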

Closely related to the replicated data problem is the lack of data modeling and data structure processing. This is demonstrated by the replicated data problem just discussed above. Data structure processing would not introduce replicated data values unless it is necessary to reflect the proper data structure (as will be demonstrated in Chapter 12). But as we saw earlier, there is no way in the inner join syntax to specify the data structure or to represent the data structure. When joining the Department table with the Employee table, there are two data structures possible, Department over Employee or Employee over Department. Each has its own distinct semantics, but neither can be represented in the inner join result of these two tables as demonstrated in Figure 1.1.

1.3—Outer Join Review

Lost data? Outer join to the rescue! The outer join operation preserves data from unmatched rows. This is done by replacing missing data with null values in the result table. When joining tables, they are joined two at a time. This means there are three choices for how to preserve data as the tables are joined: preserving data for the left table, preserving data for the right table, or preserving data for both tables. Correspondingly, these are known as LEFT joins, RIGHT joins, and FULL joins. LEFT and RIGHT joins are also known collectively as one-sided joins because they preserve data on only one side.
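The one-sided preservation described above can be sketched with sqlite3 as an assumed SQL system and hypothetical sample rows. Because older SQLite builds lack RIGHT and FULL joins, the sketch preserves each side in turn by swapping the table order of a LEFT join rather than using those keywords.

```python
import sqlite3

# Hypothetical data: department A has no employees, and employee Y refers
# to a department (C) that does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT PRIMARY KEY);
    CREATE TABLE Employee   (EmpNo TEXT PRIMARY KEY, EmpDeptNo TEXT);
    INSERT INTO Department VALUES ('A'), ('B');
    INSERT INTO Employee   VALUES ('X', 'B'), ('Y', 'C'), ('Z', 'B');
""")

# Preserve the Department side: department A survives with a null employee.
dept_preserved = conn.execute(
    "SELECT DeptNo, EmpNo FROM Department "
    "LEFT JOIN Employee ON DeptNo = EmpDeptNo ORDER BY DeptNo, EmpNo"
).fetchall()

# Preserve the Employee side: employee Y survives with a null department.
emp_preserved = conn.execute(
    "SELECT DeptNo, EmpNo FROM Employee "
    "LEFT JOIN Department ON DeptNo = EmpDeptNo ORDER BY EmpNo"
).fetchall()

print(dept_preserved)
print(emp_preserved)
```

Python's sqlite3 module reports SQL nulls as None, so the preserved rows show None in the columns of the table that had no match.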

As the tables are joined two at a time, the data-preserving effect of the outer join in the working set continues to influence the result as it progresses. This is because once a data value is preserved or not preserved (replaced as a null) and placed into the working set, this value is then accessed there when it is referenced. The major significance of this operational characteristic is that the order in which the tables are joined can affect the result of the join operation.

The outer join operation can be simulated using additional SELECT statements with UNION operations to regenerate the missing data and introduce it back into the result table. This is very inefficient, as is evident in Figure 1.2. While this example looks complex, it is simulating only a single one-sided outer join. A FULL join would involve twice the work, as in Figure 1.3. And when more than two tables are involved, the effort to recalculate the data to be added back into the result grows geometrically with each additional table, since all the previous operations need to be repeated for each outer joined table.
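A minimal sketch of such a simulation follows, in the spirit of the approach described above rather than a reproduction of the book's figures. It assumes sqlite3 as the SQL system, hypothetical sample rows, and a NOT EXISTS subquery as one possible way to regenerate the unmatched rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT PRIMARY KEY);
    CREATE TABLE Employee   (EmpNo TEXT PRIMARY KEY, EmpDeptNo TEXT);
    INSERT INTO Department VALUES ('A'), ('B');
    INSERT INTO Employee   VALUES ('X', 'B'), ('Y', 'C'), ('Z', 'B');
""")

# Simulation: inner join result, plus the unmatched departments padded
# with a null in the employee column.
simulated = conn.execute("""
    SELECT DeptNo, EmpNo FROM Department, Employee
    WHERE DeptNo = EmpDeptNo
    UNION ALL
    SELECT DeptNo, NULL FROM Department
    WHERE NOT EXISTS (SELECT 1 FROM Employee WHERE EmpDeptNo = DeptNo)
    ORDER BY 1, 2
""").fetchall()

# The same result from a single one-sided outer join.
outer = conn.execute("""
    SELECT DeptNo, EmpNo FROM Department
    LEFT JOIN Employee ON DeptNo = EmpDeptNo
    ORDER BY 1, 2
""").fetchall()

print(simulated == outer)
```

Even for two tables the simulation needs a second SELECT that revisits both tables, which hints at why the approach scales poorly as more tables are added.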

Outer joins can also be more difficult for the SQL system to optimize than inner joins. This is because with inner joins, the SQL system can freely change the table join order to reduce the number of table accesses by using the less populated tables to drive the first join operations. With outer joins, this is not as easy since changing the join order can affect the results. Fortunately, there are some interesting and powerful new optimizations that can be applied to outer joins. These are discussed in detail in Chapter 11.

Figure 1.2 Simulated one-sided outer join operation.

Figure 1.3 Simulated FULL outer join operation.

1.4—Problems with Previous Outer Join Syntax

Earlier implementations of the outer join operation before the SQL-92 standard were not standardized. Unfortunately, many of these implementations have remained in use even today. A common implementation used by these early outer join operations was to place a special symbol like an asterisk or plus sign by the table name reference in the FROM clause or column name in the WHERE clause. This special symbol would indicate that the associated table (or the other table in some implementations) was to be augmented with an all-null value row that would match a join criterion if all other rows in the table didn't match the row in the other table. This means that the unmatched row in the other table is preserved (which may seem confusing). The example in Figure 1.4 demonstrates a case where the Department table is preserved and the Employee table is not.

The example in Figure 1.4 demonstrates a one-sided join. This is because the Department table, represented in the WHERE clause by the DeptNo column, preserves data because the matching EmpDeptNo join column is flagged with an asterisk. This, as described above, causes the Employee table to be augmented with an all-null value row that will match any nonmatching row in DeptNo. FULL outer joins can also be specified by each join comparison column having its own asterisk, as in EmpDeptNo *=* DeptNo, which is demonstrated in Figure 1.5.

Notice that the result tables in Figures 1.4 and 1.5 have department A's data preserved even though there were no matching employees for it, and in Figure 1.5 employee Y was also preserved even though there was no matching department for it. This is the reason for the two null values representing the missing employee and department data in the join result. While this SQL example operates fine, there is a problem when more than two tables are being joined. The problem, as mentioned earlier, is that the table join order can affect the result when outer joins are involved, and these early outer join operations do not have a method of specifying or controlling the join order. This makes the result unpredictable when more than two tables are being joined. For example, the join statement in Figure 1.6 is ambiguous.

Figure 1.4 Early nonstandard one-sided outer join implementation example.

Figure 1.5 Early nonstandard FULL outer join implementation example.

Figure 1.6 Ambiguous early nonstandard outer join implementation example.

How is the SELECT statement in Figure 1.6 processed? Is the Department table outer joined with the Employee table first, or is the Employee table inner joined with the Dependent table first? The inner join is very destructive: if performed after the outer join, it can negate the data-preserving effect of the outer join. So, the join order can be very significant to the result, and there is no provision in this early nonstandard SQL syntax to control the join order.
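The effect of the two possible processing orders can be seen by pinning each order down explicitly. The sketch below is not the statement from Figure 1.6; it uses the standardized join syntax covered in the next chapter, run through Python's sqlite3 module, with assumed column names (EmpNo, DpndNo, DpndEmpNo) and assumed sample rows.

```python
import sqlite3

# Hypothetical data: department A has no employees.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Department (DeptNo TEXT);
    CREATE TABLE Employee (EmpNo TEXT, EmpDeptNo TEXT);
    CREATE TABLE Dependent (DpndNo TEXT, DpndEmpNo TEXT);
    INSERT INTO Department VALUES ('A'), ('B');
    INSERT INTO Employee VALUES ('X', 'B');
    INSERT INTO Dependent VALUES ('D1', 'X');
""")

# Order 1: the outer join is performed first, then the inner join.
# The inner join discards the preserved department A row.
outer_first = con.execute("""
    SELECT DeptNo, EmpNo, DpndNo
      FROM Department
      LEFT JOIN Employee ON DeptNo = EmpDeptNo
      INNER JOIN Dependent ON EmpNo = DpndEmpNo
""").fetchall()

# Order 2: the inner join is performed first, and its working set
# is then outer joined to Department, so department A survives.
inner_first = con.execute("""
    SELECT DeptNo, EmpNo, DpndNo
      FROM Department
      LEFT JOIN (Employee INNER JOIN Dependent ON EmpNo = DpndEmpNo)
        ON DeptNo = EmpDeptNo
""").fetchall()

print(outer_first)
print(inner_first)
```

With this data, the first order returns only the department B row, while the second also keeps department A with null employee and dependent values. A syntax that cannot state which order is intended cannot guarantee either result.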

1.5—Conclusion

Inner joins lose data when there is no matching data. Outer joins preserve unmatched data by padding the missing data columns with null values in the result. Their operation may be more costly than the inner join because of their more complex requirements. The first outer joins were not standardized, and operated ambiguously when three or more tables were joined. The ANSI SQL outer join is standardized, and its syntax is unambiguous, as will be shown in the next chapter.


2—The ANSI SQL Join Operation

The SQL-92 version of the ANSI SQL standard officially introduced an outer join operation. Much study went into the design of this outer join operation to correct the problems that had been identified in previous nonstandardized versions, which were covered in Chapter 1. The inner join is still the standard and default join operation. The syntax of the outer join has been seamlessly grafted onto the FROM clause, leaving the inner join operation backward compatible with existing SQL code.

2.1—ANSI SQL Join Syntax

The ANSI SQL outer join syntactical definition is shown in Figure 2.1. This definition is a simplified form of the FROM clause syntax that conveys the main features, format, and capabilities of the outer join operation. The ANSI SQL join syntax supplies more than the capabilities necessary to support the outer join. Most importantly, it supplies table join order control and join criteria for each table joined.

The outer join syntax in Figure 2.1 is fairly complex for standard SQL code and can be very difficult to use. The syntax definition is recursive, revolving around the Joined-Table specification. This syntax allows multiple tables or their working sets to be outer joined two at a time in a controlled order. The syntax design also influences the operation of the outer join by introducing what this book refers to as "nesting" to introduce additional tables and add control for table join order. This nesting can take place as left- and right-sided nesting of ANSI SQL join operations such as the LEFT, RIGHT, FULL, and INNER joins. Left-sided nesting occurs on the left side of outer join operations, and right-sided nesting occurs on the right side of outer join operations where tables are brought in by the recursive syntax. This is reflected in the outer join definition in Figure 2.1. For completeness' sake, the syntactical notations used in this outer join definition are specified in Figure 2.2.

Figure 2.1 Simplified ANSI SQL outer join syntax definition.

To simplify the ANSI SQL outer join definition in Figure 2.1, three versions of the joined table construct were specified. The first is the most standard and common syntax. In the second version, a NATURAL keyword eliminates the join specification. The third version is a CROSS join, which also does not use a join specification. The join specification with its ON or USING clause also controls nesting, which controls table join order. Since the CROSS join and natural joins using the NATURAL option do not use an ON or a USING clause to control nesting, parentheses can be used to control nesting and therefore table join order. Normally the table join order cannot be changed by the use of parentheses because the join order is determined by the ON and USING clauses. This is discussed further in Section 2.2.

The FROM clause of the outer join definition, FROM Table-Reference [, Table-Reference] ..., shown in Figure 2.1 allows multiple table references to be specified. At this top level, multiple table references are relationally joined using standard inner join logic, making this definition compatible with the standard inner join.

Figure 2.2 Outer join syntactical notations used in Figure 2.1.

The ANSI SQL outer join operation comes into play when a table reference contains a joined table specification. Coding more than one table reference at this top level when outer join operations are performed at the lower level is not desirable. This is because the data-losing properties of the inner join operation occurring at the top level would negate the data-preserving effects of the outer join at the lower level. For this reason, this particular syntax use will not be explored further in this book.

The order in which the tables are joined using the new outer join syntax is usually controlled by the nesting (recursive) syntax, which is not always straightforward. This is because it follows an order of join processing that is not always apparent with right-sided nesting (nesting occurring with the right table argument). Left-sided nesting is naturally processed left to right, but right-sided nesting in combination with left-to-right processing is not a straightforward process. It requires a stacking procedure to internally assist execution. The reason for this will become clear in the next section.

The join specification in Figure 2.1 can consist of an ON clause with a join condition, or a USING clause specifying one or more column names to be used for joining. Each column name that is specified with a USING clause must exist in both table inputs, and is used internally to form an equal join (equijoin). The ON and USING clauses specify the join criteria for their associated join operations. The USING clause turns the join operation into a natural join just as if the NATURAL option were specified. The NATURAL option and USING clause will be described further in Chapter 4.

Because tables and working sets are joined two at a time in a specific order, a single WHERE clause specifying join criteria that are logically applied after all tables are joined (see Chapter 1) does not work well with outer joins whose tables need to be joined in a specific order. What is needed, and supplied by the ANSI SQL outer join, is a clause like the ON or USING clause that specifies the join criteria at each join point. This also has the effect of separating join criteria specified on these clauses from data-filtering selection criteria specified on the WHERE clause. The column names that are referenced on an ON or USING clause must be found in the tables or working sets processed by their associated join operation. This is known as the columns being in the "scope of control."

Data-filtering criteria can also be specified on the ON clause. This achieves a finer level of filtering control than is possible with the WHERE clause, because the filtering affects only partial areas of the resulting rows rather than removing whole rows. This is covered further in Chapter 7.
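The difference between filtering on the ON clause and filtering on the WHERE clause can be illustrated with a small runnable sketch. It uses Python's sqlite3 module; the Salary column appears later in this chapter, while EmpNo and the sample rows are assumptions made for the example.

```python
import sqlite3

# Hypothetical data: each department has one employee,
# but only employee Y earns more than 50.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Department (DeptNo TEXT);
    CREATE TABLE Employee (EmpNo TEXT, EmpDeptNo TEXT, Salary INTEGER);
    INSERT INTO Department VALUES ('A'), ('B');
    INSERT INTO Employee VALUES ('X', 'A', 40), ('Y', 'B', 80);
""")

# Filter on the ON clause: a failing employee is nulled out,
# but the preserved Department side of the row remains.
on_filter = con.execute("""
    SELECT DeptNo, EmpNo, Salary
      FROM Department
      LEFT JOIN Employee ON DeptNo = EmpDeptNo AND Salary > 50
""").fetchall()

# Filter on the WHERE clause: applied after the join,
# so the whole row for department A is removed.
where_filter = con.execute("""
    SELECT DeptNo, EmpNo, Salary
      FROM Department
      LEFT JOIN Employee ON DeptNo = EmpDeptNo
     WHERE Salary > 50
""").fetchall()

print(on_filter)
print(where_filter)
```

With the ON clause filter, department A is still returned with null employee columns. With the WHERE clause filter, department A disappears entirely, because the filter is applied to complete rows after the join.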

If no join type is specified with a join operation, an inner join is assumed. The OUTER keyword is an optional informational keyword. The examples in this book will exclude the OUTER keyword in order to save space in the SQL examples. The JOIN keyword, while defined as required in the ANSI SQL specification, and therefore in the join syntax definition in Figure 2.1, is not necessary for the join syntax to be processed correctly. For this reason, many SQL implementations treat its use as optional. Taking advantage of this fact, some of the examples in this book may also exclude the JOIN keyword when example space is scarce.

2.2—ANSI SQL Join Operation

The outer join specification in Figure 2.3 joins the Department table with the Employee table while preserving data in the Department table. The working set produced from this operation is then LEFT joined with the Dependent table, preserving data in the working set. As you can see, this produces very powerful and controlled semantics. This LEFT outer join specification is an example of left-sided nesting that introduces tables left to right very naturally. Note that the first ON clause is not capable of accessing columns from the Dependent table since it has not been accessed yet and therefore is not in its scope of control. The second ON clause could access columns from the Department table because it has been accessed in the generation of the working set used as the left input of its associated LEFT join operation, and is therefore in its scope of control.

Figure 2.3 Example of LEFT outer join with left-sided nesting.
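The two-step processing of left-sided nesting can be followed in a short runnable sketch. This is an approximation of the kind of statement described for Figure 2.3, not the figure itself; it runs through Python's sqlite3 module, and the EmpNo, DpndNo, and DpndEmpNo columns and the sample rows are assumptions.

```python
import sqlite3

# Hypothetical data: department A has no employees,
# and employee Y has no dependents.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Department (DeptNo TEXT);
    CREATE TABLE Employee (EmpNo TEXT, EmpDeptNo TEXT);
    CREATE TABLE Dependent (DpndNo TEXT, DpndEmpNo TEXT);
    INSERT INTO Department VALUES ('A'), ('B');
    INSERT INTO Employee VALUES ('X', 'B'), ('Y', 'B');
    INSERT INTO Dependent VALUES ('D1', 'X');
""")

# Step 1 on its own: the working set of the first LEFT join,
# with the Department table preserved.
working_set = con.execute("""
    SELECT DeptNo, EmpNo
      FROM Department
      LEFT JOIN Employee ON DeptNo = EmpDeptNo
""").fetchall()

# The full statement: the working set from step 1 is the left input
# of the second LEFT join, so its rows are preserved in turn.
result = con.execute("""
    SELECT DeptNo, EmpNo, DpndNo
      FROM Department
      LEFT JOIN Employee  ON DeptNo = EmpDeptNo
      LEFT JOIN Dependent ON EmpNo = DpndEmpNo
""").fetchall()

print(working_set)
print(result)
```

Every row of the working set reappears in the final result, extended with a dependent where one exists and with a null where none does.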

The outer join specification shown in Figure 2.4 is an example of right-sided nesting. Parentheses are used in this example to emphasize join execution order, but have no effect because join order is controlled by the placement of ON clauses when they are present. Notice that the ON clause for the first LEFT join is actually delayed until after the second LEFT join is completely specified. This causes the latter join to be performed first, returning the result to the previous LEFT join as its right-sided input. This nesting can be carried to any depth. Note also that the first specified ON clause, associated with the second LEFT join operation, cannot reference columns in the Department table, since it has not been previously joined with either table input associated with the second join operation and is therefore not in its scope of control. This is because right-sided nesting outer joins like this one generate multiple working sets concurrently, each with a different scope of control associated with it. This is described further in Chapter 7.

One question you might be asking yourself is why anyone would construct such a nonintuitive and complex SQL statement as that specified in Figure 2.4 when it is fairly easy to avoid right-sided nesting by using left-sided nesting as in Figure 2.3. The answer is that sometimes this added flexibility is necessary to achieve the desired result. Right-sided nesting is also necessary to support embedded SQL views when they are expanded. For example, if the second SQL line in Figure 2.4 were replaced with a view reference representing the line, then expanding the view would reintroduce the right-sided nesting, and the outer join's syntax supports this for a seamless operation. This is demonstrated in Figure 2.5. This capability and the additional features enabled by it are described further in Chapter 7.
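The relationship between a view reference and right-sided nesting can be checked with a small runnable sketch. It is an illustration under assumptions rather than a copy of Figures 2.4 and 2.5: the view name EmpDpnd, the EmpNo, DpndNo, and DpndEmpNo columns, and the sample rows are all hypothetical, and the SQL is run through Python's sqlite3 module.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Department (DeptNo TEXT);
    CREATE TABLE Employee (EmpNo TEXT, EmpDeptNo TEXT);
    CREATE TABLE Dependent (DpndNo TEXT, DpndEmpNo TEXT);
    INSERT INTO Department VALUES ('A'), ('B');
    INSERT INTO Employee VALUES ('X', 'B'), ('Y', 'B');
    INSERT INTO Dependent VALUES ('D1', 'X');

    -- Hypothetical view standing in for the nested join.
    CREATE VIEW EmpDpnd AS
        SELECT * FROM Employee LEFT JOIN Dependent ON EmpNo = DpndEmpNo;
""")

# The view is used as the right input of the LEFT join.
with_view = con.execute("""
    SELECT DeptNo, EmpNo, DpndNo
      FROM Department
      LEFT JOIN EmpDpnd ON DeptNo = EmpDeptNo
""").fetchall()

# The same statement with the view written out in place:
# a right-sided nested join, with its own ON clause inside.
nested = con.execute("""
    SELECT DeptNo, EmpNo, DpndNo
      FROM Department
      LEFT JOIN (Employee LEFT JOIN Dependent ON EmpNo = DpndEmpNo)
        ON DeptNo = EmpDeptNo
""").fetchall()

print(with_view)
print(nested)
```

Both statements return the same rows, which is what a seamless view expansion requires. The parentheses in the second statement are there only to make the nesting explicit.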

As mentioned earlier in this chapter, joins with ON and USING clauses can't have their join order changed by the use of parentheses. Their join order is solely determined by the placement of ON or USING join criteria clauses. As proof of this, Figure 2.6 attempts to use parentheses to override the join order so that the Department and Employee tables are joined first. But this causes a syntax error since the ON clause for this join operation can't be isolated inside the range of these parentheses.

Figure 2.4 Example of LEFT outer join with right-sided nesting.

Figure 2.5 Embedded views cause right-sided nesting when expanded.

Figure 2.6 Invalid attempt to use parentheses to control join order.

This does not mean that parentheses can never be used with outer joins to control the join order. Parentheses can control the join order with join types like the CROSS join and outer joins that specify the NATURAL option, because they have no join criteria clause to get in the way. This means that parentheses are necessary to change the join order associated with the CROSS and natural joins to cause a change in the result. Take, for example, the SQL statement in Figure 2.7. Without the parentheses, this join statement first CROSS joins Table1 and Table2 and then joins the working set with Table3 using a LEFT join. This join order is changed by using parentheses, as is also shown in Figure 2.7. Using the parentheses shown, the LEFT join is performed first, left joining Table2 to Table3 before the CROSS join is performed. The CROSS join then uses the working set generated from the LEFT join as its right argument. This will usually produce a different result than without parentheses because of the mixture of different join types.

Figure 2.7 Valid use of parentheses to change default join order.

2.3—ANSI SQL Join Does Not Follow the Cartesian Product Model

It is interesting to note that the ANSI SQL outer join syntax does not follow the Cartesian product model for performing joins as documented in Chapter 1. This is particularly important for SQL vendors to realize because it frees up many SQL syntax restrictions, allowing more optimizations (see Chapter 11) and the elimination of much unnecessary replicated data (also discussed in Chapter 11).

The Cartesian product model is used as the processing model for performing joins. Basically, it produces the Cartesian product of all the tables being joined and then applies the WHERE restriction clause. The outer join operation has introduced the notion of an "extended" Cartesian product to account for the rows that are only partially filled because of the outer join's data preservation. These partially filled rows do not appear in a strict Cartesian product. The extended Cartesian product operates by augmenting the tables taking part in the outer join operation with a null row that will match a row from the other table when that row has no other match. This extended result is shown in Figure 2.8.

While the extended Cartesian product with its null augmented tables does allow for the partially filled rows produced by the outer join operation, it still cannot consistently produce the outer join result by applying the selection criteria after the extended Cartesian product of all the involved tables is formed. This is demonstrated in Figure 2.9, which relies on multiple ON clauses that operate at different times during the join operation to produce a result not derivable from the extended Cartesian product of all the involved tables. The first SQL statement in Figure 2.9 uses two filtering qualifications, Salary>50 and Salary>100, at different times during the join process. This effect cannot be duplicated with a single selection clause that is applied logically after all the extended join operations have been performed, as in the standard Cartesian product model. This means that the ON join clause must be logically applied at each join point. This additional flexibility in join processing is an extreme departure from standard relational processing, and opens the door to many far-reaching new possibilities.

Figure 2.8 Outer join result does not produce a strict Cartesian product subset.

Figure 2.9 Use of ON clause that is not possible in Cartesian product model.

2.4—Determining ANSI SQL Join Associativity and Commutativity

The associativity and commutativity properties are difficult to apply to ANSI SQL outer join operations because the outer join statement is not always a binary (dyadic) operation. These terms were meant to apply to binary operations such as addition, subtraction, multiplication, and division. The outer join operation is not always a binary operation since, in addition to accepting a left and right table input, it can require a third argument: the join criteria via the ON or USING clause. This presents a problem for defining associativity and commutativity for the outer join and reduces the ability to freely combine and utilize these properties. Normally, a statement that has both associative and commutative properties can be freely reordered in any fashion. The ON and USING clauses of the outer join will usually prevent this flexibility, as will be shown below. To establish associativity and commutativity, or the lack of them, the following two chapters use examples that attempt to disprove these properties, since disproving them is easier than proving them.


2.5—What Outer Join Commutativity Is

The commutative property refers to the ability to reverse the left and right table join arguments of a join operation without affecting the result. This is the only change allowed in this definition; the matching outer join ON clause must remain unmodified. Under this definition, the INNER, CROSS, UNION, and FULL joins are commutative in operation. Reversing their table input arguments will not change the data result. As can be expected, the one-sided (LEFT and RIGHT) joins are not commutative since reversing their table arguments logically changes a LEFT join into a RIGHT join and vice versa, making their semantics and results very different.
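The definition above can be tried out on the INNER and LEFT joins with a short runnable sketch. It uses Python's sqlite3 module; the EmpNo column and the sample rows are assumptions, and in each pair only the table arguments are reversed while the ON clause is left unmodified.

```python
import sqlite3

# Hypothetical data: department A has no employees,
# and employee Y has no matching department.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Department (DeptNo TEXT);
    CREATE TABLE Employee (EmpNo TEXT, EmpDeptNo TEXT);
    INSERT INTO Department VALUES ('A'), ('B');
    INSERT INTO Employee VALUES ('X', 'B'), ('Y', 'C');
""")

def run(from_clause):
    # The select list is fixed so that results compare column for column.
    return con.execute("SELECT DeptNo, EmpNo FROM " + from_clause).fetchall()

# INNER join with its table arguments in both orders.
inner_ab = run("Department INNER JOIN Employee ON DeptNo = EmpDeptNo")
inner_ba = run("Employee INNER JOIN Department ON DeptNo = EmpDeptNo")

# LEFT join with its table arguments in both orders.
left_ab = run("Department LEFT JOIN Employee ON DeptNo = EmpDeptNo")
left_ba = run("Employee LEFT JOIN Department ON DeptNo = EmpDeptNo")

print(inner_ab, inner_ba)
print(left_ab, left_ba)
```

The two INNER joins return the same single row. The two LEFT joins differ: the first preserves department A, while the second preserves employee Y instead.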

The lack of commutativity shown by the one-sided join can appear to change to commutativity when two or more one-sided joins are involved. This can be seen in Figure 2.10, which reverses the table arguments in the second join operation in the SQL examples without changing the result. This example does not disprove the principle just defined, that one-sided joins are not commutative. This is because the outer join's ON clauses in Figure 2.10 were also flipped around, thereby changing the semantics of the outer join operation, which in this case compensated for the tables being reversed.

2.6—What Outer Join Associativity Is

The associative property is also hard to apply to the ANSI SQL outer join since it deals with the ability to change the default table join processing precedence without affecting the result. In a binary outer join operation, this can be tested by using parentheses to change the join execution order. A characteristic of outer joins that require a join criteria clause is that their join order cannot be changed by using parentheses. To change the join order of these joins, the outer join statement must be rewritten because the position of the ON or USING join clause can affect the join order via right-sided nesting, as covered in Section 2.2 of this chapter. This means the definition of associativity for the ANSI SQL outer join includes respecifying the outer join to effect a change in the table join order precedence. This includes moving the ON clause but not modifying it, which would change the semantics. Unfortunately, these additional operations can reduce the significance of associativity used with the ANSI SQL outer join.

Figure 2.10 Multiple one-sided joins may appear commutative.

Nonassociativity is proven if any outer join statement containing all the same join type can be regrouped as a valid SQL statement that changes the join precedence to effect a change in the result. But changing the join order to change the join precedence is not always possible because of join criteria conditions and their scope of control, as shown in Figure 2.11. Not being able to change the join order should not be a reason to consider joins with ON clauses as nonassociative. Also, note that it would not be possible to test commutativity in the valid SQL statement in Figure 2.11 by reversing the B and C table arguments for the second LEFT join operation because it would also cause a scope of control error. This example and the others presented in this section have shown that associativity and commutativity of the outer join is a complex issue, and for this reason it is covered in detail in Chapters 3 and 4.

2.7—Hierarchictivity in Addition to Associativity and Commutativity

As shown above, it is not always possible to apply the associative and commutative properties to the ANSI SQL outer join operation's syntax and semantics. In future chapters, you will see that the outer join can be used to build hierarchical data structures. When building these data structures, the outer join follows hierarchical principles and properties. These hierarchical properties can be used in addition to associative and commutative properties. This means that while hierarchical data structures do not necessarily obey associative and commutative properties, they will obey hierarchical properties. In this book, this property has been termed "hierarchictivity" for lack of a better word.

This hierarchictivity property operates on a class of clearly defined outer joins that model hierarchical data structures (discussed in Chapter 3) that can be reordered without changing the result. The SQL example in Figure 2.12 demonstrates this hierarchictivity property. This example falls outside the range of associativity and commutativity since it actually reorders the join rather than just changing its join precedence, and reverses the table arguments of one-sided joins by moving the ON clause. Normally, the ability to reorder the joins requires both associative and commutative properties, and one-sided outer joins are not commutative as stated earlier. This example builds the same multileg hierarchical data structure in both SQL statements by reversing the construction of its legs. This does not change the semantics for hierarchical structures. This is one of many hierarchical properties that will be covered in Chapter 5. This example demonstrates that the hierarchictivity property can be useful in addition to associativity and commutativity when using outer joins.

Figure 2.11 It is not always possible to rewrite a query to change the join order.

Figure 2.12 Example of a hierarchical property.

2.8—Conclusion

The ANSI SQL outer join preserves data and corrects problems with earlier nonstandard outer joins. The ANSI SQL join syntax also has a separate ON or USING clause for each join type that requires them. These ON and USING clauses specify the join condition, and each use has its own scope of control. The ANSI SQL join syntax supports both the inner join and many other types of join operations (LEFT, RIGHT, FULL, CROSS, UNION), which can be combined in any order. Unfortunately, parentheses are sometimes necessary to control table join order, while at other times they can't be used. When parentheses can't be used, ON or USING clauses indirectly control table join order.

A new operational property, hierarchictivity, was introduced to apply to a class of outer joins that covers hierarchical structures. Other important topics covered in this chapter were the ANSI SQL join's right-sided nesting, its fine level of data-filtering capability, and the fact that the ANSI SQL outer join does not follow the Cartesian product model for generating its results. These topics will be covered and expanded on later.


3—ANSI SQL Join Types and Their Operation

There are two basic types of outer join operations, one-sided joins and FULL joins. One-sided ANSI SQL joins are either RIGHT or LEFT joins, which preserve data from unmatched rows on the side that their name signifies, while a FULL join preserves data on both sides. The discussion of these joins in this chapter does not include the influence of the NATURAL option, which is discussed in Chapter 4. This option has a significant effect on the outer join's operation. In addition to one-sided and FULL outer joins, the ANSI SQL standard supports other join types, including a CROSS join, UNION join, and INNER join. All of the join types mentioned here can be intermixed in a single join statement.

3.1—FULL Outer Join

FULL outer joins preserve data on both sides of the join operation, and for this reason are also known as symmetric outer joins. With both sides of the join being preserved, no data is lost because of unmatched rows. This implies that both tables carry equal weight. Because of this, FULL joins are usually used to join two or more tables based on a common primary key in all tables. An example is combining two customer information lists where many of the same customers are in each list and each list contains different information. Since both tables are preserved in a FULL join, it is commutative in operation. This means the placement of its two table operands does not affect the result, as shown in Figure 3.1.
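The customer list case can be sketched in runnable form. Because not every SQL engine version accepts FULL JOIN, this sketch deliberately simulates the FULL join with a LEFT join plus a UNION ALL of the unmatched rows from the other table, so it is a stand-in for the operation rather than the operation itself. The table names ListA and ListB, their columns, and the sample rows are all hypothetical, and the SQL is run through Python's sqlite3 module.

```python
import sqlite3

# Two hypothetical customer lists keyed by customer number.
# Customer 2 is in both lists; customers 1 and 3 are in one list each.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE ListA (CustA INTEGER, Phone TEXT);
    CREATE TABLE ListB (CustB INTEGER, Email TEXT);
    INSERT INTO ListA VALUES (1, '555-0101'), (2, '555-0102');
    INSERT INTO ListB VALUES (2, 'b@example.com'), (3, 'c@example.com');
""")

# Simulated "ListA FULL JOIN ListB": preserve ListA with a LEFT join,
# then add back the ListB rows that found no match.
a_full_b = con.execute("""
    SELECT COALESCE(CustA, CustB), Phone, Email
      FROM ListA LEFT JOIN ListB ON CustA = CustB
    UNION ALL
    SELECT CustB, NULL, Email
      FROM ListB
     WHERE NOT EXISTS (SELECT * FROM ListA WHERE CustA = CustB)
""").fetchall()

# The same simulation with the two table operands reversed.
b_full_a = con.execute("""
    SELECT COALESCE(CustA, CustB), Phone, Email
      FROM ListB LEFT JOIN ListA ON CustA = CustB
    UNION ALL
    SELECT CustA, Phone, NULL
      FROM ListA
     WHERE NOT EXISTS (SELECT * FROM ListB WHERE CustB = CustA)
""").fetchall()

print(sorted(a_full_b, key=lambda row: row[0]))
print(sorted(b_full_a, key=lambda row: row[0]))
```

Both operand orders return one row per customer, with no customer lost from either list, which is the commutative behavior described above.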

The ANSI SQL FULL outer join also operates associatively, as defined in Chapter 2. Since the FULL outer join is associative and commutative, the table join order, when more than two tables are being joined, can be changed without affecting the result. There are two reasons for this. First, the FULL join loses no data regardless of the table join order. Second, the ANSI SQL FULL outer join has a separate join clause for each join, which controls and limits the valid FULL joins that are possible. This was not true of the older, nonstandardized outer joins that were less associative in nature. The examples in Figure 3.2 demonstrate FULL outer joins where the table join order is changed without changing the result. Each table contains a row that will not be matched. The first join example joins the Department table to the Employee table first, while the second join example uses right-sided nesting (discussed in Chapter 2) to join the Employee table to the Dependent table before joining the Department table.

There is one situation where FULL outer joins may appear to be nonassociative, but this situation does not fit the definition of associativity and nonassociativity as described in Chapter 2. Many SQL books use this situation to prove that the outer join is nonassociative. This situation occurs when three or more tables are joined across a common domain (key value). This allows the opportunity to have more valid join combinations. In the SELECT statements in Figure 3.2, there are only two possible join combinations. If these tables were joined over one common domain, there would be three possible combinations, since Department and Dependent could also be joined directly. This is demonstrated in Figure 3.3, which joins all three tables over DeptNo. The third join statement in Figure 3.3 may produce different results than the two SQL statements above it since it has a different join condition than they do, this being DeptNo=DpndDeptNo. Even though DeptNo=EmpDeptNo and EmpDeptNo=DpndDeptNo, which intuitively means DeptNo=DpndDeptNo, this transitive logic does not hold up for the ANSI SQL join with its multiple ON clauses that are each processed separately.

Figure 3.1 The FULL outer join demonstrating its commutative behavior.

Figure 3.2 The FULL outer join demonstrating its associative behavior.

The FULL outer join examples in Figure 3.3 do not lose any data. This means all the results will contain the same data, but the way their rows are combined may be different because the third example in Figure 3.3 references different combinations of field locations, which can change the result in this situation. This is not a case of simply rewriting the outer join statement. In this case, a different join condition referring to a different table was used, which changes the semantics and the results. This is demonstrated in their results, also shown in Figure 3.3.

With FULL joins involving more than two tables joined across a common domain, you may notice, as in Figure 3.3, that the results may contain rows that could have been combined more efficiently to reduce the number of rows generated. For example, the first example results in Figure 3.3, where the rows had null values added by the join process, could be compressed into two rows without losing any data, as in the second set of results in Figure 3.3. The fact that the second set of results was more compressed was determined by the data and not the SQL statements alone. In this same situation, it is always possible to generate the most compressed result by using the NATURAL option of the FULL outer join, which is described in Chapter 4.

Figure 3.3 Misleading attempt to prove FULL join is nonassociative.

3.2—One-Sided Outer Join

One-sided joins are either LEFT joins or RIGHT joins. They are called one-sided because they preserve data on only one side, either the left side or the right side as their name indicates. The LEFT and RIGHT joins are actually different forms of the same operation, as shown in Figure 3.4. The LEFT join is the more natural one to use because it preserves data on the left side and processing occurs from left to right, using the more natural left-sided nesting. This allows a top-down specification to define a top-down execution, giving an intuitive definition and operation. The less intuitive RIGHT join may be useful for complex outer joins, but can usually be avoided by using the LEFT outer join.

Figure 3.4 LEFT and RIGHT joins are different forms of the same basic operation.

Since one-sided outer joins only preserve data on one side, they are noncommutative in operation. This means that the location of the two table input arguments makes a difference in the results, as shown in Figure 3.5. You can see that the results of the two LEFT joins have distinctly different semantics.

Figure 3.5 One-sided outer join is noncommutative.

Since one-sided outer joins only preserve data on one of the two sides (the dominant side), their result is hierarchical in nature. For example, Department LEFT JOIN Employee ON DeptNo=EmpDeptNo produces a result where Department table values can exist without a matching Employee table value, but Employee table values can't exist without a matching Department table value. This means that Department is hierarchically over Employee. When joining more than two tables, the effect can be extended as shown in Figure 3.6. In this SQL example, Department table values can exist without a matching Employee or Dependent table value. Employee table values can exist without a matching Dependent, but require a matching Department, and so on. This means that Department is hierarchically over Employee and Employee is hierarchically over Dependent. One-sided joins can also model nonhierarchical data structures, which will be covered in Chapter 6. Table join order and its effect on one-sided outer join operations involving three or more tables is a complex issue that will also be covered in further detail in Chapter 6, which deals with data modeling with the outer join.

Being hierarchical in nature, one-sided outer joins can build hierarchical structures top-down, as shown in Figure 3.6, or, by changing the join order, bottom-up, as shown in Figure 3.7. Because the one-sided outer join is hierarchical in nature, reordering the join from top-down to bottom-up execution does not change the result. If this is true, it would prove that the one-sided join is associative in operation, at least when defining hierarchical structures. The following examples will demonstrate that this is so.

Figure 3.6 One-sided outer joins are hierarchical in nature.

Figure 3.7 One-sided outer join can also build structures bottom-up.
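The regrouping claim can be checked with a small sketch. It assumes hypothetical data and column names, and it forces the bottom-up order with a parenthesized join, which is how SQLite expresses the nesting; the exact syntax used in Figure 3.7 may differ.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Department (DeptNo TEXT);
    CREATE TABLE Employee (EmpNo TEXT, EmpDeptNo TEXT);
    CREATE TABLE Dependent (DpndNo TEXT, DpndEmpNo TEXT);
    INSERT INTO Department VALUES ('D1'), ('D2');
    INSERT INTO Employee VALUES ('E1', 'D1'), ('E2', 'D1'), ('E3', 'D9');
    INSERT INTO Dependent VALUES ('P1', 'E1'), ('P2', 'E3');
""")

# Top-down: (Department LEFT JOIN Employee) LEFT JOIN Dependent.
top_down = set(con.execute("""
    SELECT DeptNo, EmpNo, DpndNo
    FROM Department
    LEFT JOIN Employee  ON DeptNo = EmpDeptNo
    LEFT JOIN Dependent ON EmpNo  = DpndEmpNo
"""))

# Bottom-up: Employee and Dependent are joined first, then placed under Department.
bottom_up = set(con.execute("""
    SELECT DeptNo, EmpNo, DpndNo
    FROM Department
    LEFT JOIN (Employee LEFT JOIN Dependent ON EmpNo = DpndEmpNo)
           ON DeptNo = EmpDeptNo
"""))

print(top_down == bottom_up)
```

Both groupings return the same three rows for this data, which is the associative behavior the text describes for hierarchical structures.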

The RIGHT outer join also builds hierarchical data structures, which is shown in Figure 3.8.

The RIGHT outer join naturally builds the hierarchical data structure bottom-up using left-sided nesting. As tables are added from the right, they take the top position since they are being preserved.

The one-sided outer join examples above demonstrate building a one-leg hierarchical data structure. The one-sided outer join can also build multileg data structures. The SQL examples in Figure 3.9 demonstrate a one-sided outer join operation building a multileg hierarchical structure. These examples use the same data and data relationships as the previous one-sided outer join examples, but produce different results. In these examples, the Employee table is directly over the Department and Dependent tables. Note that the legs of the structure can be added in any order. This characteristic of hierarchical structures will be discussed further in Chapter 5.
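A multileg structure of this kind can be sketched as below, with Employee as the root and Department and Dependent as its two legs. The data and column names are hypothetical, not those of Figure 3.9, and SQLite is used only to execute the statements.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Department (DeptNo TEXT);
    CREATE TABLE Employee (EmpNo TEXT, EmpDeptNo TEXT);
    CREATE TABLE Dependent (DpndNo TEXT, DpndEmpNo TEXT);
    INSERT INTO Department VALUES ('D1'), ('D2');
    INSERT INTO Employee VALUES ('E1', 'D1'), ('E2', 'D1'), ('E3', 'D9');
    INSERT INTO Dependent VALUES ('P1', 'E1'), ('P2', 'E3');
""")

# Department leg added first, then the Dependent leg.
dept_leg_first = set(con.execute("""
    SELECT EmpNo, DeptNo, DpndNo
    FROM Employee
    LEFT JOIN Department ON EmpDeptNo = DeptNo
    LEFT JOIN Dependent  ON EmpNo = DpndEmpNo
"""))

# Dependent leg added first, then the Department leg.
dpnd_leg_first = set(con.execute("""
    SELECT EmpNo, DeptNo, DpndNo
    FROM Employee
    LEFT JOIN Dependent  ON EmpNo = DpndEmpNo
    LEFT JOIN Department ON EmpDeptNo = DeptNo
"""))

print(dept_leg_first == dpnd_leg_first)
```

Every employee is preserved, including E3, which has a dependent but no matching department, and the order in which the two legs are attached makes no difference to the rows returned.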

Up until the multileg hierarchical example in Figure 3.9, the single-leg hierarchical structures shown in Figures 3.6–3.8 behaved associatively as defined in Chapter 2. The multileg structure in Figure 3.9 demonstrates that multiple legs of structures can be joined in any order without changing the result, but the rules for associativity and/or commutativity, as specified in Chapter 2, cannot be applied here to explain this behavior. This is because one-sided joins are not commutative, yet in this example changing the tables around in the join operations did not change the results. The principle of hierarchictivity as coined and defined in Chapter 2 can be applied to multileg hierarchical structures like this one as well as the single-leg hierarchical structures shown in Figures 3.6–3.8.

Figure 3.8 RIGHT outer join also builds hierarchical structures.

Figure 3.9 Multileg hierarchical data structure example.

The principles of hierarchictivity intuitively make sense, since one-sided joins are hierarchical in nature and hierarchical structures can be built top-down, bottom-up, left to right, right to left, or in any combination of these methods. These one-sided outer join operations can build very complex and powerful hierarchical data structures. Chapter 5 supplies a review of hierarchical data structures, and Chapter 6 describes in detail how to model these data structures using one-sided outer joins.

One-sided joins can also model complex structures that are not hierarchical structures. When these structures are used in applications, it may be difficult to predict their operation because they can lack unambiguous semantics. It is useful to see how this nonhierarchical modeling can occur through one-sided joins. This awareness can prevent the accidental use of nonhierarchical data structures. Figure 3.10 demonstrates a nonhierarchical structure being modeled. As is shown, this structure can be modeled in more than one way. While this structure resembles a network structure, it doesn't actually operate like one because the legs relate to each other hierarchically. In this structure, the Department table is hierarchically above the Dependent table. If an Employee row doesn't have a link to a Department row, then the unmatched Employee rows and their parent Dependent rows are excluded from the result. Other nonhierarchical structures can be created from complex ON clauses consisting of references to more than two tables. More information on these nonhierarchical structures can be found in Chapter 6.

Figure 3.10 Nonhierarchical one-sided join example.

Following the rules for assessing associativity specified in Chapter 2, the one-sided outer join does not operate nonassociatively, which makes it associative under our definition. This does not include intermixing LEFT and RIGHT joins, which may perform nonassociatively. The modeled nonhierarchical structure in Figure 3.10 will also produce a different result if the order in which its legs are joined is reversed. In this structure, the order of the legs has significance, but the table reordering required to accomplish this is outside the scope of associativity, which only includes regrouping.

3.3—INNER Join

The INNER join's older SQL-89 format is still valid in the newer SQL-92 ANSI SQL format. The newer INNER join format can be specified explicitly or by default if no join type is specified. This is shown in Figure 3.11. The INNER join does not preserve data on either side of the join operation. This enables ordering a series of INNER joins in any fashion without affecting the result. This means the INNER join operation is both commutative and associative.

Figure 3.11 Example of INNER join formats.

3.4—CROSS Join

The CROSS join is a basic operation. It is the same as an inner join with no join criteria, so that all combinations of the input table arguments are generated. This is the Cartesian product, which is not usually a very useful end product. The CROSS join is commutative and associative in operation, so the join order does not affect the result. The inner join can be used to simulate the CROSS join, as is shown in Figure 3.12, by specifying it so that the join criteria are always satisfied.
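The equivalence between a CROSS join and an inner join whose condition is always satisfied can be sketched as follows. The tables and rows are hypothetical, and ON 1 = 1 is just one assumed way of writing an always-true condition; Figure 3.12 may use a different one.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Department (DeptNo TEXT);
    CREATE TABLE Employee (EmpNo TEXT);
    INSERT INTO Department VALUES ('D1'), ('D2');
    INSERT INTO Employee VALUES ('E1'), ('E2'), ('E3');
""")

# Cartesian product: every Department row paired with every Employee row.
cross = sorted(con.execute(
    "SELECT DeptNo, EmpNo FROM Department CROSS JOIN Employee"))

# Inner join with a condition that is true for every pair of rows.
simulated = sorted(con.execute(
    "SELECT DeptNo, EmpNo FROM Department INNER JOIN Employee ON 1 = 1"))

print(len(cross), cross == simulated)
```

Two departments and three employees give six combinations under either formulation.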

3.5—UNION Join

The UNION join, also known as the outer union, is a new UNION operation that can be specified with the ANSI SQL join syntax. Like the CROSS join, it does not have an accompanying ON or USING clause. This operation is different from standard UNION operations in that the two tables being UNIONed can have different column formats, so that they cannot be placed directly under each other. The UNION join is performed by offsetting the rows of one table to the right with nulls that match the other table's format and reversing this procedure for the other table, whose rows are offset on the left side with nulls. Then the two tables can be UNIONed one on top of the other as shown in Figure 3.13. This outer UNION effect can also be performed by a FULL join by specifying the join criteria to never match, as also shown in Figure 3.13.

Figure 3.12 Example of the CROSS join operation.
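The null-padding procedure described above can be reproduced by hand. The sketch below is an emulation, not the UNION JOIN syntax itself, which SQLite (used here to run the SQL) does not accept: each table is padded with NULL columns in the other table's positions and the two padded sets are stacked with UNION ALL. Table contents and the DeptName and EmpName columns are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Department (DeptNo TEXT, DeptName TEXT);
    CREATE TABLE Employee (EmpNo TEXT, EmpName TEXT);
    INSERT INTO Department VALUES ('D1', 'Sales'), ('D2', 'Audit');
    INSERT INTO Employee VALUES ('E1', 'Ann'), ('E2', 'Bob'), ('E3', 'Cy');
""")

outer_union = list(con.execute("""
    -- Department rows keep their own columns; Employee positions are null.
    SELECT DeptNo, DeptName, NULL AS EmpNo, NULL AS EmpName FROM Department
    UNION ALL
    -- Employee rows are offset past the Department columns with nulls.
    SELECT NULL, NULL, EmpNo, EmpName FROM Employee
"""))

for row in outer_union:
    print(row)
```

No row combines Department and Employee data, so the result has one row per input row (five here), which is also what a FULL join with a never-matching condition would be expected to return.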

3.6—Intermixing Join Types

Intermixing of different join types in an ANSI SQL join specification is possible and makes the specification nonassociative, as you would suspect. There are two concerns when intermixing join types. First, care must be used when mixing join types that include join conditions with those that do not have join conditions. This complicates determining the join order for the user. This was discussed in Chapter 2. Second, care must be used when intermixing different join types because they have different levels of data preservation abilities and attributes that can conflict with each other, making their operation destructive and the result illogical. This is because some joins will remove data that was preserved by previous data-preserving joins, as shown in Figure 3.14. In these examples, a line is drawn through the rows that are created from the first join and then removed by the second join.

In both SQL examples in Figure 3.14, data preserved from the Department table when there is no matching row in the Employee table can still be lost if there is no matching row in the Dependent table. This is because in the first SQL example the inner join loses data from all sides, and in the second SQL example the RIGHT join loses data introduced from the left, which had been preserved by the preceding LEFT join. This is probably not desirable since the Department data was preserved for some purpose. Chapter 7 documents a powerful coding technique to prevent this destructive behavior when non-data-preserving (destructive) joins or intermixed join types must be used.

Figure 3.13 Example of a UNION join.

Figure 3.14 The intermixing of different join types can be destructive.
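The destructive effect of following a LEFT join with an INNER join can be sketched as below. The data and column names are hypothetical and only the inner join case is shown; the RIGHT join case is left out because older SQLite builds, used here to run the SQL, may not support RIGHT JOIN.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Department (DeptNo TEXT);
    CREATE TABLE Employee (EmpNo TEXT, EmpDeptNo TEXT);
    CREATE TABLE Dependent (DpndNo TEXT, DpndEmpNo TEXT);
    INSERT INTO Department VALUES ('D1'), ('D2');   -- D2 has no employees
    INSERT INTO Employee VALUES ('E1', 'D1'), ('E2', 'D1');
    INSERT INTO Dependent VALUES ('P1', 'E1');      -- only E1 has a dependent
""")

# Both joins preserve data: nothing from Department is lost.
preserving = set(con.execute("""
    SELECT DeptNo, EmpNo, DpndNo
    FROM Department
    LEFT JOIN Employee  ON DeptNo = EmpDeptNo
    LEFT JOIN Dependent ON EmpNo  = DpndEmpNo
"""))

# The trailing INNER join discards rows the LEFT join had preserved.
intermixed = set(con.execute("""
    SELECT DeptNo, EmpNo, DpndNo
    FROM Department
    LEFT JOIN Employee   ON DeptNo = EmpDeptNo
    INNER JOIN Dependent ON EmpNo  = DpndEmpNo
"""))

print(sorted(preserving - intermixed, key=str))
```

The rows printed are the ones a line would be drawn through in the style of Figure 3.14: department D2 and employee E2 were preserved by the first join and then removed by the second.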

3.7—Conclusion

This chapter has looked at all of the different ANSI SQL join types: the FULL, RIGHT, LEFT, CROSS, UNION, and INNER joins. Except for the INNER join, all of these joins also preserve rows when there are no matching rows.

The two types of outer joins, FULL and one-sided, while logically similar, behave very differently when three or more tables are being joined together. One-sided joins operate hierarchically, while FULL joins do not since they are symmetrical in operation.

Because the ON clause plays a major role with the outer join and greatly limits its ability to be freely regrouped, the FULL and one-sided joins behave associatively. This can change when the NATURAL option is used. The NATURAL option is documented in Chapter 4. Intermixing join types can also make FULL and one-sided joins operate nonassociatively.

Commutativity and associativity do not account for all the valid cases where the outer join specification can be rearranged and still produce the same result. To help account for these additional cases, the term hierarchictivity was introduced to cover the principles of hierarchical structures, which can also be applied to the reordering of one-sided outer join statements.


4—Natural Joins

Natural joins are INNER, FULL, and one-sided joins where the common named columns used in the join criteria are coalesced (turned into single column values) in the result. For example, when inner joining the Department and Employee tables over the common key value of the department number, DeptNo, it is usually convenient to have only one occurrence of the join key value in the result instead of two (or more) copies of the same key value. This assumes equal join (equijoin) conditions were used, and natural joins always use equal join conditions. Natural joins take on added significance with outer joins because of their data-preserving behavior. This introduces a situation where one side or the other side of the join condition's key values may be missing (null) from the result, making the key location unpredictable. In this case, the coalesced key values allow a single key location to be used for each row in the resulting table so it can be referenced easily and consistently. Depending on the situation, coalescing of the join columns and natural join processing can increase or decrease the associativity of outer joins across three or more tables that are under a common domain. This can significantly change the operation of the outer join, which is why it is being examined separately in this chapter.

4.1—Explicit and Implicit Natural Joins

In ANSI SQL, natural joins can be specified explicitly or implicitly. The explicit and implicit NATURAL options of the ANSI SQL syntax work in conjunction with the LEFT, RIGHT, FULL, and INNER join operations to coalesce the common named join column keys into single key values. As indicated in the outer join syntax in Figure 2.1, when the NATURAL keyword option is specified, the ON and USING clauses are not specified. This is because the join condition is automatically taken as the equal join between columns having the same name in the tables that are in the scope of control of the outer join operation being performed.

An implicit natural join does not specify the NATURAL keyword; the NATURAL option is indicated by coding the USING clause instead of an ON clause to indicate which columns are to be equijoined and coalesced. This is why this is also called a column name join. It assumes that the specified column names occur in both table inputs or their scope of control. This gives more control than the explicit natural join option by externally controlling the specification of which common named columns take part in the join condition. Just as in the explicit natural join, the column names that take part in the join condition are coalesced in the result. The example in Figure 4.1 demonstrates the explicit and implicit natural joins and how the column results are affected by natural joins. In this example, the explicit and implicit natural joins produce identical results, as you would expect.

The first SQL example in Figure 4.1 is a standard inner join statement that shows in its result two copies of the join condition key value 123. The next two SQL join examples demonstrate an explicit and implicit natural inner join. They are equivalent statements. In these examples, DeptNo is the key in the Department table (Dept) and a foreign key in the Employee table (EMP). This key is used to perform the join operation. Because this is an equijoin, the join condition column named DeptNo in each resulting row will always have the same DeptNo values and can be coalesced for convenience.

Figure 4.1 Explicit and implicit natural inner join example.
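The three statement forms discussed above can be compared side by side. This is a minimal sketch, not a copy of Figure 4.1: only the key value 123 and the Dept and Emp table names come from the text, while the DeptName and EmpNo columns and their values are hypothetical. The helper run is also just an illustrative name.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Dept (DeptNo INTEGER, DeptName TEXT);
    CREATE TABLE Emp (EmpNo INTEGER, DeptNo INTEGER);
    INSERT INTO Dept VALUES (123, 'Sales');
    INSERT INTO Emp VALUES (1, 123);
""")

def run(sql):
    """Return (column names, rows) for a query."""
    cur = con.execute(sql)
    return [d[0] for d in cur.description], cur.fetchall()

# Standard inner join: the key appears once per input table.
standard = run("SELECT * FROM Dept INNER JOIN Emp ON Dept.DeptNo = Emp.DeptNo")
# Explicit natural join: NATURAL keyword, no ON or USING clause.
explicit = run("SELECT * FROM Dept NATURAL INNER JOIN Emp")
# Implicit natural join: USING names the columns to equijoin and coalesce.
implicit = run("SELECT * FROM Dept INNER JOIN Emp USING (DeptNo)")

for name, (cols, rows) in [("standard", standard),
                           ("explicit", explicit),
                           ("implicit", implicit)]:
    print(name, cols, rows)
```

The standard join returns four columns with the key value 123 twice, while both natural forms return three columns with a single coalesced DeptNo.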

The NATURAL option, when applied to columns across two tables, does not affect the internal operation of the join. This is not the case for natural joins across three or more tables over a common column (domain). This is described directly below.

4.2—Multitable Natural Outer Joins

With the outer join, the NATURAL operation can have a significant effect on the results when the join involves more than two tables joined over a common named key. This is because the coalesced result in the working set continues to be referenced after the initial join operation. For example, in the explicit natural FULL join SELECT * FROM T1 NATURAL FULL JOIN T2 NATURAL FULL JOIN T3, the join condition for table T3 will reference its key columns from itself and the coalesced key column value produced from the previously coalesced key values of table T1 and table T2, which are stored in the working set. This is demonstrated visually in Figure 4.2, which uses the Coalesce function to simulate the operation of a natural join. The NATURAL option has a significant effect on the outer join, altering its operation and result. One-sided and FULL outer join operations are affected differently by this coalescing operation, as described below under one-sided and FULL outer joins.

The simulation of a multitable natural join, shown in Figure 4.2, applies to both the explicit and implicit natural joins. The implicit natural join, with its join requirements specified externally through the USING clause, operates exactly as specified. The explicit natural join's operation is driven internally by the column names that match from the tables being joined. The column names that match may seem obvious if you are familiar with the column names, but there is one situation where the explicit natural join may act nonassociatively that you should be aware of. This can happen when the common named columns are not in all of the tables being joined at each join point. This can cause the explicit natural join to operate differently depending on the table join order. This is demonstrated in Figure 4.3.

The two explicit natural FULL joins in Figure 4.3 demonstrate that the table join order can make a difference in the result when not all the tables have the same matching column names. In fact, the resulting data is not only arranged differently between columns; it is different. This is because the join columns are determined as the join statement is processed, driven by the table join order. The equivalent implicit natural join specifications in the example indicate how the explicit natural join will operate. Notice that the USING clause specifications in the equivalent implicit natural joins are different between the first and second examples, proving that the two explicit natural joins are not equivalent, making the explicit natural join nonassociative in this example.

Figure 4.2 Simulating the coalescing effect of the natural outer join.

Figure 4.3 Explicit natural join may act nonassociatively.

Let's take a closer look at the explicit natural join process in Figure 4.3. In the first explicit natural join example, tables T1 and T2 are joined first and the common named join column selected is X. When table T3 is joined to the working set, the common named columns selected are Z and X, which were also in the working set. This produced the first result shown. In the second explicit natural join example, tables T2 and T3 are joined first and the common named join column selected is Z. When table T1 is joined to the working set, the columns selected are X and Y, which were in the working set. This produced the second result shown. The results are different because the selected column names in these two examples are combined differently. In the first example, table T1 is joined using column X, and in the second example it is joined using columns Y and X.

4.3—Natural One-Sided Outer Join

Because of the data-preserving effect of one-sided joins joined across more than two tables with common join columns, one-sided join results can be affected by the natural join operation. With these one-sided joins, the results can no longer model hierarchical structures. This is because the coalesced value of the one-sided operation does not retain the chaining effect necessary to model hierarchical structures. With a standard one-sided join, for example, table T1 can reach table T2, and table T2 can then reach table T3. If table T1 cannot reach table T2, or table T2 cannot reach table T3, then table T3 cannot be reached. But when join key coalescing is performed, table T3 can be reached even if table T2 cannot be reached, because table T1's key value is used as a result of the coalescing operation. This behavior is not hierarchical in nature since table T3 can be reached from multiple paths: table T1 or table T2. The examples in Figure 4.4 demonstrate this behavior.

Notice in Figure 4.4 how the hierarchical LEFT join (the first join statement) goes down the structure in a chain fashion, joining on columns from tables T1 and T2, and then from tables T2 and T3. This means that as soon as a missing table row occurrence (or link) is encountered, the rest of the row will be null because the chain has been broken. The natural LEFT join does not support this chaining effect. Basically, the first table (T1) is always preserved and its key join value(s) remains in force because of the coalescing effect of the NATURAL option. This will increase the amount of data preserving that is possible based on table T1's key values, as can be seen in the inclusion of value T3C in the natural join result.

Figure 4.4 Natural LEFT joins are nonhierarchical.
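The difference between the chained LEFT join and the natural LEFT join can be sketched with three small tables sharing a key column. The column names K, A, B, and C and all row values are hypothetical; they are not the data of Figure 4.4.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE T1 (K INTEGER, A TEXT);
    CREATE TABLE T2 (K INTEGER, B TEXT);
    CREATE TABLE T3 (K INTEGER, C TEXT);
    INSERT INTO T1 VALUES (1, 'T1A'), (2, 'T1B'), (3, 'T1C');
    INSERT INTO T2 VALUES (1, 'T2A');               -- no T2 row for keys 2 and 3
    INSERT INTO T3 VALUES (1, 'T3A'), (2, 'T3B');
""")

# Hierarchical chain: T3 is reached only through T2.
chained = set(con.execute("""
    SELECT T1.K, A, B, C
    FROM T1
    LEFT JOIN T2 ON T1.K = T2.K
    LEFT JOIN T3 ON T2.K = T3.K
"""))

# Natural LEFT joins: the coalesced key (T1's value) is matched against T3.
natural = set(con.execute("""
    SELECT K, A, B, C
    FROM T1 NATURAL LEFT JOIN T2 NATURAL LEFT JOIN T3
"""))

print(sorted(natural - chained, key=str))
```

For key 2 the chain is broken at T2, so the chained join leaves the T3 column null, while the natural join still picks up T3B through T1's key value.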

After the lead table is processed in one-sided natural joins as in Figure 4.4, the join order of the other tables can be changed without affecting the result. This means that the first statement establishes the result, making the natural one-sided join nonassociative. This is proven in Figure 4.5, which demonstrates that changing the join order of a natural join can produce a different result.

4.4—Natural FULL Outer Join

FULL joins consisting of more than two tables across common named join columns open the possibility of generating results that can be affected by the NATURAL option. All FULL joins will preserve the total amount of data possible regardless of the order that the tables are joined in. This is because no data is lost. The effect that the NATURAL option has on the FULL outer join is to join the tables producing the fewest number of rows possible. It condenses the rows. This is because with coalesced data, there is always a non-null key available to match on, reducing the generation of null data and creating a predictable result. The examples in Figure 4.6 demonstrate this effect.

Figure 4.5 Natural LEFT joins are nonassociative.

The standard FULL join shown at the top of Figure 4.6 is not a natural join. Because of this, it is difficult to predict the order that the rows will be combined in, as shown in the first example. Using the explicit or implicit natural FULL join in the second example in Figure 4.6, the rows are condensed, more predictable, and easier to process, because with the NATURAL option there is always a fixed key position available to match on. Notice also that the result rows of the natural FULL join, excluding nulls, contain the same data as the standard FULL join. This, as explained above, is because no data is lost with a FULL join. Because of this condensing effect, the natural FULL join is associative in operation (except for the special situation concerning explicit natural joins documented in Section 4.2).

Figure 4.6 Natural FULL join producing the most condensed result.

Since the natural join produces the most condensed result, it also follows that the natural FULL join can be reordered in any manner without changing the result. This is demonstrated in Figure 4.7. There is another reason for this behavior, which applies here and in the inner join example in Figure 4.8. The natural FULL join and natural inner join are both commutative and associative in operation. By applying both these properties together, the SQL statement can be completely reordered in any fashion without changing the result.

4.5—Natural Inner Joins

The NATURAL option of the inner join does not produce any side effects, so a natural inner join and a standard inner join produce the same result except for the resulting coalesced values, as shown in Figure 4.1. This is because there is no data preserving occurring with inner joins, so the coalesced value of its join condition values is always the same as the values that make it up. There is never a case where one side is missing and the other side is not. Either both sides exist or both sides are missing. With inner joins, nulls cannot be introduced into the result from missing rows because this condition causes the entire row to be eliminated.

Figure 4.7 Natural FULL join is associative and supports reordering.

The natural inner join examples in Figure 4.8 demonstrate that the natural inner join can be completely reordered and it will not change the result. This behavior includes associativity. Because rows are so easily eliminated with inner joins, the example data was increased in this example from the previous examples to derive a result; otherwise, the inner joins in these examples would have produced empty results.

4.6—Intermixing Natural Join Types

Applying natural joins to different join types in a join statement is perfectly acceptable, with the same warnings already covered in Chapter 3, which discussed intermixing join types. Each natural join is executed in turn, leaving its coalesced result in a working set as input into the next natural join. So each natural join is executed in isolation when its execution turn comes up. This means the operation of intermixing natural join types is predictable and in some cases may even be useful.

This intermixing of natural join types can also include join types that do not include the NATURAL operation for the same reasons as explained above. This means having join types that do not include NATURAL operations does not interfere with the NATURAL operation of other natural joins in the join statement, or vice versa. Explicit and implicit natural joins can also be intermixed. Intermixing of natural join types is nonassociative. An example of this is shown in Figure 4.9.

Figure 4.8 A natural inner join is associative and supports reordering.

4.7—Natural One-Sided Join Transformation

The NATURAL one-sided join operation applied across multiple joins, as described in Section 4.3, has an interesting characteristic where the lead key value is propagated through the join operations. This characteristic prevents the normal hierarchical chaining operation that was shown in Chapter 3. But this characteristic does have a hierarchical mapping. This is demonstrated in Figure 4.10. Since the root key is propagated through the structure, all other elements are related directly and solely to the root, producing the structure shown. This also means that the natural one-sided join specification can be transformed into a more intuitive non-natural one-sided SQL specification that more directly models the structure. This is also shown in Figure 4.10.

The SQL transformation in Figure 4.10 is from a series of natural one-sided joins to a series of non-natural one-sided joins. The only difference in these two join specifications is that the join keys are coalesced into a single column value in the NATURAL join specification, and the join keys are not coalesced in the non-natural join. But in the non-natural join, the join key from the first preserved join table contains the same value as in the natural join result, so it should be treated as the coalesced key.

Figure 4.9 Intermixing natural join types is nonassociative.

Figure 4.10 Natural one-sided outer join transformation.

The fact that this natural one-sided outer join transformation is possible also points out that the natural feature for one-sided outer joins does not offer any additional capabilities beyond the one-sided outer join operation. This means it can be avoided by using the more intuitive non-natural one-sided outer join.
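The transformation can be sketched as a pair of queries that should return the same rows: a series of natural LEFT joins, and a series of ordinary LEFT joins in which every ON clause refers back to the root table's key. The tables, columns, and values are hypothetical and are not those of Figure 4.10.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE T1 (K INTEGER, A TEXT);
    CREATE TABLE T2 (K INTEGER, B TEXT);
    CREATE TABLE T3 (K INTEGER, C TEXT);
    INSERT INTO T1 VALUES (1, 'T1A'), (2, 'T1B'), (3, 'T1C');
    INSERT INTO T2 VALUES (1, 'T2A');
    INSERT INTO T3 VALUES (1, 'T3A'), (2, 'T3B');
""")

# Natural one-sided joins: the root key is propagated by coalescing.
natural = set(con.execute("""
    SELECT K, A, B, C
    FROM T1 NATURAL LEFT JOIN T2 NATURAL LEFT JOIN T3
"""))

# Transformed form: T2 and T3 are each related directly to the root T1.
root_keyed = set(con.execute("""
    SELECT T1.K, A, B, C
    FROM T1
    LEFT JOIN T2 ON T1.K = T2.K
    LEFT JOIN T3 ON T1.K = T3.K
"""))

print(natural == root_keyed)
```

In the transformed query T1.K plays the role of the coalesced key, and the structure reads directly as T1 with two independent legs.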

4.8—Conclusion

The NATURAL join option takes on new meaning with outer joins because it can significantly affect the results of outer joins. This occurs when more than two tables are natural outer joined across a commonly named column. The natural outer join operation guarantees that there is always a coalesced key column value available to join with any of the following tables to be joined. This changes the operation of one-sided outer joins and FULL outer joins. With one-sided outer joins, it can cause more data to be preserved and change their operation to be nonassociative. With FULL outer joins, the NATURAL option can produce more condensed and predictable results having fewer rows while containing the same data, and it remains associative in operation except for one case: explicit natural joins can behave nonassociatively when not all of the tables have the same commonly named columns consistently across the natural join.


PART II: OUTER JOIN DATA MODELING AND STRUCTURED PROCESSING

Part II documents in detail the inherent data modeling and structure-processing capabilities of the ANSI SQL outer join operation. These are capabilities that outer join users can utilize immediately. Chapter 5 supplies a background in data modeling and data structure processing. Chapter 6 shows in detail how the ANSI SQL outer join operation can perform complex data modeling. Chapter 7 introduces new data modeling-related features. Chapter 8 supplies further information on the outer join's data modeling capabilities.


5—Data Structure Review

Working with SQL and its lack of data modeling, relational database professionals may have a tendency to forget about data structures and their inherent capabilities. This chapter serves as a short review of the data structures, data modeling, and data structure processing necessary to understand the outer join's data modeling and structure-processing capabilities identified and demonstrated in this book.

5.1—The Power of Hierarchical Data Structures

Hierarchical structures, unlike network structures, contain only one path to each data item in the structure, which can be seen in Figure 5.1. This makes them unambiguous and singular in meaning. Unambiguous structures have powerful semantics that can implicitly control the data processing of the data structure. This is primarily what controls the nonprocedural operation of fourth-generation (declarative) languages (4GLs) and gives them their self-navigating and nonprocedural processing ability.

Since data structures are not unique to relational databases, the term segment is often used to refer to a group of singularly related data analogous to a relational data table. This term will be used instead of table when a more generic term is called for.

Both of the data structures in the Department and Employee views in Figure 5.1 are comprised of the same tables and the same relationships, yet they have very different structures. Different structures mean they have different semantics, which produce different results. In the Department view, an employee and his or her dependents cannot exist if they are not associated with a department (i.e., Bill is missing). This is not the case in the Employee view, which has the opposite semantics that prevent department DeptC from existing since it has no employees associated with it. This situation is possible if an entire department is outsourced. In the Department view, DeptC can still exist and can have a budget and other information associated with it.

Ignoring which fields are present and their column order in Figure 5.1, notice that the Department and Employee views' data appear to handle replicated data differently. Hierarchically higher level values control (or own) lower level values, as shown in both data view displays. Most obvious is that replicated data is totally eliminated in the Department view. To represent this in the data display, a blank field means that the last value printed in that column is still valid (unless a dash appears, which means the value is missing). Replication of the department name is not necessary since any given department can have many employees in this view and shouldn't need repeating for each employee occurrence. The structured output represents the actual data in the view. This is WYSIWYG ("What You See Is What You Get") display processing based on the semantics of the data structure.

Over in the Employee view in Figure 5.1 you will notice that DeptA is replicated when the next employee, Mary, is introduced in the display. This follows the semantics of the Employee view, where the Employee segment is hierarchically over the Department segment so that each employee has its own department occurrence. This view's WYSIWYG display is also valid, showing the correct replication (notice that employee Mike, with two dependents, did not cause a replication). Knowledge of the data structure will further improve the usefulness and application of this intuitively formatted data.

Figure 5.1 Two application views with the same relationships and their data.

The data displays of the Department and Employee views in Figure 5.1 represent the semantics of their data structures. For example, if you were to divide up both views' data into separate structured records using the root value as the record key, each view would still reflect the same data value occurrence counts (cardinality) shown. This verifies that the controlled replicated values are correct.

Most fourth-generation query languages that operate on hierarchical structures are self-navigating, following the data structure, and are controlled by the semantics of the data structure. This makes them intuitive and powerful. They follow rules based on parentage and sibling segment (multileg) operation derived from the hierarchical semantics. Parentage rules can affect processing by controlling internal looping ranges. Sibling segments are different data paths directly under the same (common) parent, such as the Department and Dependent paths in the Employee view in Figure 5.1. The segment occurrences in each of the paths do not correspond in a one-to-one fashion; they are related only by their common parent (in this case, Employee) and are otherwise independent of each other. The left-to-right positioning of segments under a common parent is not significant. In the Employee view in Figure 5.1, the Dependent and Department segments could be reversed without changing the semantics or results.

Combining the above fourth-generation semantics with the Employee view in Figure 5.1, for example, data selection based on a given department value from the Department leg and displaying dependents from the Dependent leg will select all dependents under the active common parent Employee. Using the Employee view in Figure 5.1, SELECT Dpnd FROM EmployeeView WHERE Dept="DeptA" will in this case display all dependents (Jason, Jane, and Sam) from department DeptA. This query works by satisfying the selection criteria to determine the active common parent(s), Mike and Mary from the Employee table, which controls the range of selected data: Jason and Jane under Mike, and Sam under Mary. This cycle is repeated until all selection criteria in the database have been tested.
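The query above can be approximated in plain SQL by defining the Employee view with one-sided outer joins. This is only a sketch: the names and relationships follow the description of Figure 5.1 in the text, but the column names Emp, EmpDept, and DpndEmp, and the choice to give Bill a null department, are assumptions, and a flat SQL view does not reproduce the structured display of a fourth-generation language.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Department (Dept TEXT);
    CREATE TABLE Employee (Emp TEXT, EmpDept TEXT);
    CREATE TABLE Dependent (Dpnd TEXT, DpndEmp TEXT);
    INSERT INTO Department VALUES ('DeptA'), ('DeptC');   -- DeptC has no employees
    INSERT INTO Employee VALUES ('Mike', 'DeptA'), ('Mary', 'DeptA'),
                                ('Bill', NULL);           -- Bill has no department
    INSERT INTO Dependent VALUES ('Jason', 'Mike'), ('Jane', 'Mike'),
                                 ('Sam', 'Mary');

    -- Employee is the root; Department and Dependent are sibling legs.
    CREATE VIEW EmployeeView AS
    SELECT Emp, Dept, Dpnd
    FROM Employee
    LEFT JOIN Department ON EmpDept = Dept
    LEFT JOIN Dependent  ON Emp = DpndEmp;
""")

# Select on the Department leg, display from the Dependent leg.
dependents = {row[0] for row in con.execute(
    "SELECT Dpnd FROM EmployeeView WHERE Dept = 'DeptA'")}

employees = {row[0] for row in con.execute("SELECT Emp FROM EmployeeView")}
departments = {row[0] for row in con.execute("SELECT Dept FROM EmployeeView")}

print(sorted(dependents))
```

The selection on DeptA qualifies the common parents Mike and Mary, and the dependents under them are returned. Bill remains in the view without a department, while DeptC is absent because no employee sits above it.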

5.2—Three-Tier Database Architecture

The three-tier schema approach to database modeling and design consists of three levels of views that define all aspects of how the database is stored and how it can be accessed. These three view levels are the external view, the conceptual view, and the internal or physical view, which are used respectively by the user, the DBA, and the database system. This is shown visually in Figure 5.2. These three levels allow for a much greater level of database flexibility than if they were not used. Unfortunately, relational databases do not inherently support this, but by following good database design, it can be supported externally.

5.3—External and Internal Views

The external view is how an application perceives the database, and for this reason it is also known as the application view. Different applications can view the same database in different ways. For example, the Employee and Department views shown in Figure 5.1 are composed of the same tables and relationships, but present very different structures, semantics, and associated data. Application views have to be unambiguous, and for this reason they use the hierarchical data model.

Internal views represent and control how the tables and data are physically stored and related in storage. External views and conceptual views (covered in the next section) are logical views. They bear no relationship to how the data in the database is actually stored and related.

5.4—Conceptual View

The conceptual (or global) view is usually a network structure representing all the possible valid or necessary relationships that are required in the database. Being a network, this structure is ambiguous by itself since a given data element may be accessed from more than one path, with each path having different semantics. The conceptual view in Figure 5.3 encompasses the Department and Employee application views.

Figure 5.2 Three-tier database architecture.

Figure 5.3 Conceptual view that encompasses the Department and Employee views.

The conceptual view logically lies between the external and internal views, and is used to control how the external and internal views are related or mapped to one another. The conceptual view logically separates the external and internal structures, allowing the internal view to change without changing the external views, and allowing the external views to change without changing the internal view. This adds greatly to data and structure independence and database flexibility, and reduces maintenance requirements.

5.5—Many-to-One and One-to-Many Relationships

Many-to-one (M to 1) and one-to-many (1 to M) relationships are the main types of data relationships that deal with occurrence count (cardinality) of data items in application data structures. Their names describe their relationship. The employee-to-department relationship is a many-to-one relationship because many employees can have the same department. In a department-to-employee relationship, the relationship is one-to-many because one department can have many employees. This can be seen in Figure 5.4.

One-to-many and many-to-one relationships are hierarchical. As such, they follow the same behavior as was documented in Section 5.1, which described hierarchical data structures and their structured data display. This is reflected in Figure 5.4.

5.6—Many-to-Many Relationships

Notice that one-to-many and many-to-one data structures are the same basic relationships turned around. One implies the other. This is also true of a many-to-many (M to M) relationship like parts and suppliers. One part can have many suppliers and one supplier can have many parts. In a hierarchical environment, many-to-many relationships look like a one-to-many relationship in either direction, but in reality, they exhibit characteristics of both. Examine the many-to-many relationships and their data in Figure 5.5.

Figure 5.4 WYSIWYG display of many-to-one and one-to-many relationships.

In Figure 5.5, the structured output of the many-to-many Parts and Suppliers views appears to show one-to-many relationships. But if you look closely, you will notice that the data results in the second data column of both views (the many occurrence side) also have repeating data somewhere in the column. This is a characteristic of many-to-one relationships, proving that a many-to-many relationship has characteristics of both one-to-many and many-to-one relationships. But this many-to-one characteristic can usually be overlooked without consequences, so that many-to-many relationships can be viewed primarily as one-to-many relationships, since this is the emphasis of the semantics, as the visual WYSIWYG structured display in Figure 5.5 demonstrates.

Figure 5.5 Example data views of a many-to-many relationship.

Many-to-many relationships in relational databases require an "association" table to contain the relationships that can simultaneously relate tables as many-to-one relationships in both directions. This is shown in Figure 5.5. Normally, the association table operation can be transparent to the result, as also shown in Figure 5.5.

In Figure 5.6, you will notice the inclusion of prices in the Parts and Suppliers data views. The interesting thing here is that each supplier can have a different price for a specific product. Where should the price be stored? It is stored in the association table at its intersection point of Supplier and Part, and is therefore referred to as intersecting data. In a structured database or structured display, as in Figure 5.6, this intersecting data can be logically viewed as being associated with the lower level relation: Part in the Suppliers view and Supplier in the Parts view. The lower level is the only level that can logically accommodate intersecting data without causing replicated data.
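As a minimal sketch of an association table carrying intersecting data, the following uses Python's standard sqlite3 module. The table names, column names, part names, and prices (stored as integer cents) are all assumptions made for illustration; they are not the data of Figure 5.6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Supplier (SuppNo INTEGER PRIMARY KEY, SuppName TEXT);
    CREATE TABLE Part (PartNo INTEGER PRIMARY KEY, PartName TEXT);
    -- Association table: one row per supplier/part pairing.
    -- PriceCents is the intersecting data stored at that pairing.
    CREATE TABLE SupplierPart (
        SuppNo INTEGER REFERENCES Supplier(SuppNo),
        PartNo INTEGER REFERENCES Part(PartNo),
        PriceCents INTEGER,
        PRIMARY KEY (SuppNo, PartNo)
    );
    INSERT INTO Supplier VALUES (1, 'SuppA'), (2, 'SuppB');
    INSERT INTO Part VALUES (10, 'Bolt'), (20, 'Nut');
    INSERT INTO SupplierPart VALUES (1, 10, 25), (1, 20, 10), (2, 10, 30);
""")

# Suppliers view: Supplier over Part, price shown with the lower level (Part).
suppliers_view = conn.execute("""
    SELECT SuppName, PartName, PriceCents
    FROM Supplier
    JOIN SupplierPart ON Supplier.SuppNo = SupplierPart.SuppNo
    JOIN Part ON Part.PartNo = SupplierPart.PartNo
    ORDER BY SuppName, PartName
""").fetchall()

# Parts view: Part over Supplier, price shown with the lower level (Supplier).
parts_view = conn.execute("""
    SELECT PartName, SuppName, PriceCents
    FROM Part
    JOIN SupplierPart ON Part.PartNo = SupplierPart.PartNo
    JOIN Supplier ON Supplier.SuppNo = SupplierPart.SuppNo
    ORDER BY PartName, SuppName
""").fetchall()

print(suppliers_view)
print(parts_view)
```

Both views read the same three association rows, so the same price appears once per pairing in either direction, and Bolt carries a different price under each supplier.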

5.7—Converting Network Structures to Hierarchical Structures

It is often desirable to have the same table in multiple locations of a hierarchical data structure. For example, the same Employee table may be referenced for department manager and product manager, causing a network type structure. For an application view, this causes problems because network structures are ambiguous, as was explained in Section 5.1. The simple solution is to rename the multiply referenced table so that it logically becomes different tables in the hierarchical data structure, allowing the semantics of the data structure to become unambiguous, as shown in Figure 5.7.

5.8—Relating Hierarchical Processing to Relational Processing

With relational databases, the first normal form storage requirement forces the use of flat tables. Because of this, the Cartesian product is necessary for joins to satisfy join processing by producing all combinations of the join rows, as shown in Figure 5.8. All combinations are also necessary for sibling segments (separate legs of the hierarchy). This is because sibling segments or tables are not directly related to each other on a row-by-row basis and all combinations of the rows are necessary to simulate independent processing of the legs so they can be accessed in any order or combination.

Figure 5.6 Example data of many-to-many relationship and intersecting data.

Figure 5.7 Converting a network structure to a hierarchical structure.

In Figure 5.8, we can see how the Cartesian product effect can explode the join result when one-to-many relationships cause multiple keys to match in both tables, such as Key1 in this example. This exploded result becomes necessary because standard relational data is forced into using flat two-dimensional tables, so the result table has to be exploded to hold the results. This becomes particularly important in selecting or filtering data based on data from two or more tables, as in the WHERE clause WHERE Alpha="B" AND Numeric=1 applied to the data result in Figure 5.8. Locating the table row result with an Alpha value of B and a Numeric value of 1 requires exploding the result rather than joining the tables in a simple parallel join method, which would not produce a row with these values since they are on different occurrences.
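The difference between the exploded join and a simple parallel pairing can be sketched with Python's sqlite3 module. The table names TableA and TableB, the column names, and the four sample rows are assumptions for illustration (the numeric column is called Num here rather than Numeric); the parallel pairing is simulated with zip.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (KeyVal TEXT, Alpha TEXT);
    CREATE TABLE TableB (KeyVal TEXT, Num INTEGER);
    -- Key1 occurs twice in each table, so the join explodes to 2 x 2 rows.
    INSERT INTO TableA VALUES ('Key1', 'A'), ('Key1', 'B');
    INSERT INTO TableB VALUES ('Key1', 1), ('Key1', 2);
""")

# Relational join: every combination of the matching rows.
joined = conn.execute("""
    SELECT Alpha, Num
    FROM TableA JOIN TableB ON TableA.KeyVal = TableB.KeyVal
    ORDER BY Alpha, Num
""").fetchall()

# The filter finds its row only because all combinations were generated.
filtered = conn.execute("""
    SELECT Alpha, Num
    FROM TableA JOIN TableB ON TableA.KeyVal = TableB.KeyVal
    WHERE Alpha = 'B' AND Num = 1
""").fetchall()

# Parallel pairing: first occurrence with first, second with second.
alphas = [row[0] for row in conn.execute("SELECT Alpha FROM TableA ORDER BY Alpha")]
nums = [row[0] for row in conn.execute("SELECT Num FROM TableB ORDER BY Num")]
parallel = list(zip(alphas, nums))

print(joined)
print(filtered)
print(parallel)
```

The pair (B, 1) exists in the exploded join but not in the parallel pairing, which is why the row-at-a-time WHERE test depends on the Cartesian product.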

Figure 5.8 Cartesian product effect.

Applying this Cartesian product effect to the joining of the Department, Employee, and Dependent tables produces a flat, tabular SQL table structure, as shown in Figure 5.9.

Notice that with the flattened first normal form structure in Figure 5.9, the same hierarchical processing as was used in Section 5.1 is achieved by processing each row one at a time. No looping or navigation is necessary since all combinations have been generated and exist in the rows. This means that the same query used for hierarchical access in Section 5.1 can be used in this case to achieve the same data results with the flattened structure shown in Figure 5.9. This query was SELECT Dpnd FROM DeptEmp WHERE Dept="DeptA", which will display all dependents (Jason, Jane, and Sam) from department DeptA. While this example produces the same results as the identical query in Section 5.1, flat structures like the one in Figure 5.9 will often produce replicated data in the result. This is the result of the replicated data introduced in the creation of the flat structure, as described in Chapter 1 and shown in Figure 5.9. This can be seen in the query SELECT Dept FROM DeptEmp WHERE Dept="DeptA", which when applied to the data in Figure 5.9 will replicate the value DeptA three times, once for each row that is present.
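The two queries above can be tried against a small flattened view using Python's sqlite3 module. This is a sketch under assumptions: the base tables, their column names, the extra department DeptB, and the definition of DeptEmp as a view over two LEFT joins are illustrative choices, with sample rows picked to match the names mentioned in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (Dept TEXT);
    CREATE TABLE Employee (Emp TEXT, EmpDept TEXT);
    CREATE TABLE Dependent (Dpnd TEXT, DpndEmp TEXT);
    INSERT INTO Department VALUES ('DeptA'), ('DeptB');
    INSERT INTO Employee VALUES ('Mike', 'DeptA'), ('Mary', 'DeptA');
    INSERT INTO Dependent VALUES ('Jason', 'Mike'), ('Jane', 'Mike'), ('Sam', 'Mary');
    -- Flattened structure: one row per Department/Employee/Dependent combination.
    CREATE VIEW DeptEmp AS
        SELECT Dept, Emp, Dpnd
        FROM Department
        LEFT JOIN Employee ON Dept = EmpDept
        LEFT JOIN Dependent ON Emp = DpndEmp;
""")

# Same query as the hierarchical case: each flat row is tested on its own.
dependents = [row[0] for row in conn.execute(
    "SELECT Dpnd FROM DeptEmp WHERE Dept = 'DeptA' ORDER BY Dpnd")]

# Selecting the parent column exposes the replication in the flat rows.
depts = [row[0] for row in conn.execute(
    "SELECT Dept FROM DeptEmp WHERE Dept = 'DeptA'")]

print(dependents)
print(depts)
```

With this sample data DeptA is returned once per flat row, so the single department value appears three times.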

5.9—Physical Versus Logical Data Structures

Physical data structures are fixed structures that can't be changed easily, if at all. Their relationships are based on physical pointers or physical juxtapositioning, as is the case with structured file records. On the other hand, logical data structures, like relational structures, use data values that can create linkages dynamically. This allows them to be very flexible in specifying their data structures. Outside of these differences, there need be no basic differences in how these structure types are navigated and processed. At the lower level, logical structures may require additional structure comprehension logic.

Figure 5.9 Data structure relationship to Cartesian product.

SQL is a suitable language for the processing of physical and logical data structures. A limitation imposed on SQL is its Cartesian product processing model. This can introduce problems in determining the logical data structure, which relies on data values for this purpose. This means that if you are not careful in formulating your queries, invalid results can occur, often unnoticed. This does not happen in physical views, which always represent their actual structure correctly. This is shown in Figure 5.10, where there are two employees with the same name in the same department, but this fact is lost in the logical database view because the structure is determined by data. While this error could be corrected by taking the count using a unique key, the fact is that the physical data view is not subject to this error situation.
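One way this kind of miscount can arise is sketched below with Python's sqlite3 module. The table layout, the EmpNo key, and the employee names are assumptions for illustration and are not taken from Figure 5.10; the point is only that a count driven by data values can merge two occurrences that a unique key keeps apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (EmpNo INTEGER PRIMARY KEY, EmpName TEXT, Dept TEXT);
    -- Two different employees share the name Smith in DeptA.
    INSERT INTO Employee VALUES
        (1, 'Smith', 'DeptA'),
        (2, 'Smith', 'DeptA'),
        (3, 'Jones', 'DeptA');
""")

# Count driven by data values: the two Smiths collapse into one.
count_by_name = conn.execute(
    "SELECT COUNT(DISTINCT EmpName) FROM Employee WHERE Dept = 'DeptA'"
).fetchone()[0]

# Count driven by the unique key: every employee occurrence is counted.
count_by_key = conn.execute(
    "SELECT COUNT(DISTINCT EmpNo) FROM Employee WHERE Dept = 'DeptA'"
).fetchone()[0]

print(count_by_name, count_by_key)
```

Counting on the unique key is the correction mentioned in the text; a physical structure would hold three distinct segment occurrences regardless of their values.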

5.10—Sibling Legs Query Semantics

Since sibling legs do not correspond directly to one another, but are related through their common parent, their semantics are more complex than what has been discussed previously. In the data structure in Figure 5.11, the parent Div for division has two sibling legs, Prod for product and Dept for department. Each has multiple occurrences of data. What happens if a query qualifies a search from one of these sibling legs and selects data from the other sibling leg, as shown in the query in Figure 5.11? The semantics dictate that if one data occurrence is qualified from one leg, then all data occurrences from the other sibling leg are selected. This is also depicted in the query's structured data output in Figure 5.11. While these exact semantics may seem a bit arbitrary, they are actually backed up by the same query applying the relational Cartesian product processing model, also shown in Figure 5.11.

Figure 5.10 Physical and logical views can produce different results.

Figure 5.11 Multileg data selection semantics example.

Another example of multileg semantics is when multiple legs are used in the selection criteria as in Figure 5.12. In this example, the WHERE clause Dept="DeptY" AND Prod="ProdA" is used to qualify a selection where at least one entry in the Product leg is ProdA and at least one occurrence in the Department leg is DeptY. This example also selects the data that is included in the qualification criteria, so this data is also filtered. This means that only values ProdA and DeptY are selected from their respective common parent Div1. Notice how the Cartesian product model can support this processing one row at a time as performed by relational processing. If the AND operator in the WHERE clause were changed to an OR operator, the Cartesian product processing would select rows with a Product value of ProdA or rows with a Department value of DeptY. This produces the correct semantics even though replicated values are also produced because of the Cartesian product effect. This is shown in Figure 5.13.

Figure 5.12 Multileg AND selection qualification semantics example.

As an important point on semantics, both conditions of an OR operation, as in the SQL from Figure 5.13, have to be tested even if the first condition tests true. In this query, the first selection condition, Dept="DeptY", is true, but the outcome is still affected by the second selection condition, Prod="ProdA", which enables DeptX values to be displayed. This can be verified by comparing this result to the result of the query in Figure 5.11, which only tests for the selection condition Dept="DeptY" and therefore filters out DeptX values. The OR Prod="ProdA" portion of the above query selection condition in Figure 5.13 matches ProdA values, which will qualify their sibling segment values and introduce them into the result, such as the value DeptX. If this still seems illogical, consider that the results from each condition alone when combined (such as through an OR operation) would contain a union set of results such as in Figure 5.13.
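The three multileg cases (one leg qualified, both legs with AND, both legs with OR) can be compared row by row over the Cartesian product using Python's sqlite3 module. The schema below is an assumption for illustration: a Division parent with two occurrences in each sibling leg, with column names invented here (DeptName and ProdName stand in for the Dept and Prod fields of the text), and a small helper function run that is also hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Division (DivName TEXT);
    CREATE TABLE Prod (ProdName TEXT, ProdDiv TEXT);
    CREATE TABLE Dept (DeptName TEXT, DeptDiv TEXT);
    INSERT INTO Division VALUES ('Div1');
    INSERT INTO Prod VALUES ('ProdA', 'Div1'), ('ProdB', 'Div1');
    INSERT INTO Dept VALUES ('DeptX', 'Div1'), ('DeptY', 'Div1');
""")

# Both sibling legs hang off Division, so the join yields every
# Prod/Dept combination under Div1 (2 x 2 = 4 rows before filtering).
QUERY = """
    SELECT ProdName, DeptName
    FROM Division
    LEFT JOIN Prod ON DivName = ProdDiv
    LEFT JOIN Dept ON DivName = DeptDiv
    WHERE {where}
    ORDER BY ProdName, DeptName
"""


def run(where):
    """Apply one WHERE condition to the flattened two-leg structure."""
    return conn.execute(QUERY.format(where=where)).fetchall()


one_leg = run("DeptName = 'DeptY'")
and_rows = run("DeptName = 'DeptY' AND ProdName = 'ProdA'")
or_rows = run("DeptName = 'DeptY' OR ProdName = 'ProdA'")

print(one_leg)
print(and_rows)
print(or_rows)
```

Qualifying only the Department leg keeps every product, the AND form keeps the single ProdA/DeptY row, and the OR form reintroduces DeptX through the ProdA rows, which is the union behavior described above.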

5.11—Ordering of Data Structures Can Cause Their Restructuring

When a physical data structure is ordered (sorted) against its natural structure by not following its path, the structure is changed to that of the list of fields to be ordered. Ordering a physical structure like the one in Figure 5.14 requires that the structure first be flattened so that it can be sorted. This will convert physical structures to logical structures. After flattening a data structure, the ordering of it will affect its structure, as shown in Figure 5.14.

Since relational databases use logical structures, the ordering effect shown in Figure 5.14 does not normally have to be a concern. But with the one-sided outer join and its inherent hierarchical ordering shown in Chapter 3, there may be some concern about going against the inherent data structure produced by the outer join since there may be a semantics conflict.

Figure 5.13 Multileg OR selection qualification semantics example.

Figure 5.14 Ordering can cause restructuring.

5.12—Data Structure Composition

Data structures are composed of records that include segments that consist of data fields. To explain from the bottom up, fields are grouped into contiguous segments. The fields in a segment are related closely by data content such as name, street number, city, state, and ZIP code, and represent a given segment type. Fields in a segment do not repeat, but segments can. These are called segment occurrences. Segment types are related in a fixed hierarchical data structure as in Figure 5.15. The top segment type is known as the root segment. One occurrence of a root segment, its related segment types, and their segment occurrences are known as a structured record.

This data structure definition fits into the common notion of a file containing variable-length structured records where each record is composed of segments that are arranged into a hierarchical data structure. Relational databases as used in this book to model data structures can also fit naturally into this definition. A relational database can be thought of as being composed of structured records, where the segment types represent the different tables and their segment occurrences represent the rows of the tables as shown in Figure 5.16. These structured files are supported directly by COBOL and by the C language (with some variable segment occurrence limitations). More detailed information on structured records can be found in Chapter 14.

Figure 5.15 Data structure composition.

Figure 5.16 Relational data structure composition.

5.13—Good Data Modeling Design Principles

Ideally, data modeling is the defining of data structures whose semantics reflect the defined data model. In this regard, good data modeling design is important to data structure definition. The problems with nonhierarchical structures were covered earlier in this chapter; here, we will concern ourselves with basic normalization rules. These rules help avoid insertion, deletion, and update anomalies, and increase and support data independence through increased use of joins. This means that they also affect data structures with similar problems and advantages. The basic normalization rules are numbered from first to third normal form. Usually, these rules are specified in a building block fashion where third normal form includes second normal form and second normal form includes first normal form; we will forgo this requirement as explained below.

Except for first normal form, these basic normalization rules are about good database design principles, which are normally associated with relational databases but are also very applicable to segments of structured databases where segments are analogous to rows of tables. First normal form is a restriction for SQL tables that forbids the use of repeating fields because of their fixed two-dimensional format. This is not necessarily a good database design principle, only a relational design constraint. This SQL restriction has been eroding, with established SQL vendors starting to support tables within tables, known as nested relational support.

Second normal form does not permit any partial key dependencies. A nonkey field (column or attribute) must not be functionally dependent on a field that is only part of the primary key. In other words, every nonkey field is fully dependent on the primary key. Third normal form requires every nonkey field to be nontransitively dependent on the primary key. This means all fields are directly dependent on the primary key. To correct these potential design problems, the offending fields should be moved into another table or segment where they obey these database design rules.

These basic normalization rules may not be enough to satisfy a good database design. Improper database design could still produce a condition known as lossy decomposition, introduced by the basic normalization process that breaks tables apart. Imagine breaking a table into two tables based on ZIP code instead of account number. When these tables are reconstructed by a join operation, this condition introduces extraneous rows that were not in the original table. This has the effect of obfuscating the semantics of the valid rows, resulting in a loss of information. To solve this problem, a lossless join property is needed that can be supplied by advanced normalization forms, known as Boyce-Codd normal form, fourth normal form, and fifth normal form. The first three basic normal forms explained above removed dependencies. In these advanced normal forms, advanced dependencies that rely on superkeys are used to support lossless joins. Superkeys are composite keys that when broken down still uniquely identify a row. This eliminates the introduction of extraneous data when tables are joined.
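The ZIP code example can be worked through with Python's sqlite3 module. This is a sketch under assumptions: the Account table, its columns, and its two rows are invented for illustration, and the decomposition is performed simply by projecting columns into new tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Account (AcctNo INTEGER PRIMARY KEY, Zip TEXT, Owner TEXT);
    -- Two accounts that happen to share one ZIP code.
    INSERT INTO Account VALUES (1, '02062', 'Ann'), (2, '02062', 'Bob');

    -- Decomposition on ZIP code (not a key): lossy.
    CREATE TABLE AcctZip AS SELECT AcctNo, Zip FROM Account;
    CREATE TABLE ZipOwner AS SELECT Zip, Owner FROM Account;

    -- Decomposition on account number (the key): lossless.
    CREATE TABLE AcctOwner AS SELECT AcctNo, Owner FROM Account;
""")

original = conn.execute(
    "SELECT AcctNo, Zip, Owner FROM Account ORDER BY AcctNo, Owner").fetchall()

# Rejoining on ZIP pairs every account with every owner in that ZIP.
lossy = conn.execute("""
    SELECT a.AcctNo, a.Zip, z.Owner
    FROM AcctZip AS a JOIN ZipOwner AS z ON a.Zip = z.Zip
    ORDER BY a.AcctNo, z.Owner
""").fetchall()

# Rejoining on the key reproduces the original rows exactly.
lossless = conn.execute("""
    SELECT a.AcctNo, a.Zip, o.Owner
    FROM AcctZip AS a JOIN AcctOwner AS o ON a.AcctNo = o.AcctNo
    ORDER BY a.AcctNo, o.Owner
""").fetchall()

extraneous = [row for row in lossy if row not in original]
print(original)
print(lossy)
print(extraneous)
```

No rows are missing from the lossy result; the information is lost because the valid rows can no longer be told apart from the extraneous ones.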

5.14—Conclusion

This chapter has identified and discussed the elements involved with data modeling. These were three-tier database architecture with its application views and conceptual model; data relationships such as one-to-many, many-to-one, and many-to-many; data structures such as hierarchical and network; data structure processing as it relates to relational processing; the semantics of multileg data structures; and good database design principles.

Network structures are necessary for the definition of the conceptual data model, which needs the ability to define many different data views for the same database (tables). However, if network data structures are used as application views, there can be problems because data values in the structure can be reached from multiple paths, making the view ambiguous. This allows invalid assumptions to be made by nonprocedural languages. This is not true of hierarchical data structures, which are singular in meaning. This makes their semantics very powerful in the nonprocedural processing of data structures.

Many-to-many relationships are not directly supported in relational databases, and require the use of an association table. While this will involve additional SQL to process the intersecting table, it does provide the opportunity to support intersecting data.

The Cartesian product is used in relational processing to enable flat two-dimensional structures to be processed in a structured manner. There are side effects caused by this process in the form of replicated values introduced to fill the flat structure. This can hide the data structure and throw summaries off. This was also shown when the difference between physical and logical data views was covered earlier in this chapter. Also related to these last two items is ordering the database view against its inherent data structure, which was also discussed.

6—Outer Join Does Data Modeling

Previous standard versions of SQL have not supported the capability to perform data modeling and complex data structure processing. ANSI SQL does not officially claim to support data modeling and structure processing either. But ANSI SQL does inherently support data modeling and data structure processing through its new outer join operation. With knowledge about this capability and instruction on how to use it, SQL users and vendors can take advantage of this powerful capability.

6.1—SQL Data Modeling Using the Outer Join

Back in Chapter 2, it was shown how one-sided (LEFT and RIGHT) joins are hierarchical in nature because they preserve unmatched rows in one table and not the other. In a LEFT join, the left table is preserved so that the left table is hierarchically over the right table. For example, in the LEFT join Department LEFT JOIN Employee ON DeptNo=EmpDeptNo, departments can occur without any matching employees, and employees cannot exist without a matching department. These semantics precisely define the basic building blocks for constructing a hierarchical data structure.

In one-sided joins involving more than two tables, the hierarchical effect described above is extended such that Department LEFT JOIN Employee ON DeptNo=EmpDeptNo LEFT JOIN Dependent ON EmpNo=DpndEmpNo produces the hierarchical structure shown in the Department view in Figure 6.1. This is a simple one-leg hierarchy. But the outer join can also model and process multileg (complex) data structures as in the Employee view, also shown in Figure 6.1. With the basic modeling capabilities shown in these data structures, any hierarchical data structure can be modeled.

Figure 6.1 Different outer join data structures comprised of the same relationships.

The relationships depicted in the Department view in Figure 6.1 are one-to-many. One department has many employees, and one employee can have many dependents. In the Employee view in Figure 6.1, the department to employee one-to-many relationship shown in the Department view has been flipped around to define an employee to department many-to-one relationship.

Both of the structures in Figure 6.1 use the same tables and the same relationships to derive different structures with different semantics. This is shown in the differing query results in Figure 6.1 where department DeptC with no employees can't exist in the Employee view, and employee Bill can't exist in the Department view because he has no department designation. What triggers this difference? Since the join relationships are identical, it wasn't directly any of the ON clauses. It was the initial LEFT join that reversed the Department and Employee table arguments from the Department view, putting Employee over Department. This in effect transformed the structure into the multileg structure shown in the Employee view in Figure 6.1. This is because the Employee table is now hierarchically above the Department and Dependent tables and is directly related to both of them through their ON clauses. This demonstrates that ON clauses are also of importance by controlling the link (join) points between the data structures.
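The two views can be reproduced on a small scale with Python's sqlite3 module, which supports the LEFT JOIN and ON syntax used in this chapter. The sample rows are assumptions for illustration (only DeptC and Bill are taken from the text), and the employee name is used as the EmpNo value to keep the output readable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT);
    CREATE TABLE Employee (EmpNo TEXT, EmpDeptNo TEXT);
    CREATE TABLE Dependent (DpndName TEXT, DpndEmpNo TEXT);
    INSERT INTO Department VALUES ('DeptA'), ('DeptC');         -- DeptC has no employees
    INSERT INTO Employee VALUES ('Mike', 'DeptA'), ('Bill', NULL); -- Bill has no department
    INSERT INTO Dependent VALUES ('Jason', 'Mike');
""")

# Department view: Department over Employee over Dependent (one leg).
department_view = conn.execute("""
    SELECT DeptNo, EmpNo, DpndName
    FROM Department
    LEFT JOIN Employee ON DeptNo = EmpDeptNo
    LEFT JOIN Dependent ON EmpNo = DpndEmpNo
    ORDER BY DeptNo
""").fetchall()

# Employee view: same ON clauses, but Employee is now the root, with
# Department and Dependent as sibling legs beneath it.
employee_view = conn.execute("""
    SELECT EmpNo, DeptNo, DpndName
    FROM Employee
    LEFT JOIN Department ON DeptNo = EmpDeptNo
    LEFT JOIN Dependent ON EmpNo = DpndEmpNo
    ORDER BY EmpNo
""").fetchall()

print(department_view)
print(employee_view)
```

Only the order of the first two table arguments differs between the two statements, yet DeptC appears only in the Department view and Bill only in the Employee view.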

This flexible data modeling and data structure processing is possible through a combination of the one-sided outer joins and the individualized join criteria specified for each join relationship via the ON clause. The one-sided outer join controls the hierarchical layering of tables, while the ON clause controls the relationships or pathways between them.

Natural one-sided outer joins should not be used to model hierarchical structures because they do not directly model hierarchical structures as described in Chapter 4. But if they are used, they can be transformed to nonnatural one-sided joins, as described in Chapter 4, and then processed. This is an optional feature, and is not necessary to perform complete data modeling.

Using the ON clause, concatenated keys and path qualification can also be supported. With a concatenated key, a key can be composed of subfields (multiple columns), for example, ON DeptNo1=EmpDeptNo1 AND DeptNo2=EmpDeptNo2. This has the effect of concatenating a two-part key and comparing the parts as one unified key. With path qualification, the join criteria can also reference fields further up the path from the point being linked. For example, when linking Dependent with Employee in the Department view in Figure 6.1, the following link criteria are valid: ON EmpNo=DpndEmpNo AND DpndVal=DeptVal. Notice that the referenced DeptVal column is at a higher hierarchical level than the actual link point. Determining the link point is described in the next section.
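Both ON clause forms can be exercised together with Python's sqlite3 module. The column names follow the ON clauses quoted above, but the table layouts and every data value are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo1 TEXT, DeptNo2 INTEGER, DeptVal TEXT);
    CREATE TABLE Employee (EmpNo TEXT, EmpDeptNo1 TEXT, EmpDeptNo2 INTEGER);
    CREATE TABLE Dependent (DpndName TEXT, DpndEmpNo TEXT, DpndVal TEXT);
    INSERT INTO Department VALUES ('East', 1, 'X'), ('East', 2, 'Y');
    INSERT INTO Employee VALUES ('Mike', 'East', 1), ('Mary', 'East', 2);
    INSERT INTO Dependent VALUES
        ('Jason', 'Mike', 'X'), ('Jane', 'Mike', 'Y'), ('Sam', 'Mary', 'Y');
""")

rows = conn.execute("""
    SELECT DeptNo1, DeptNo2, EmpNo, DpndName
    FROM Department
    -- Concatenated key: both parts must match for the link to be made.
    LEFT JOIN Employee
        ON DeptNo1 = EmpDeptNo1 AND DeptNo2 = EmpDeptNo2
    -- Path qualification: the link point is Employee, but the condition
    -- also references DeptVal, one level further up the path.
    LEFT JOIN Dependent
        ON EmpNo = DpndEmpNo AND DpndVal = DeptVal
    ORDER BY DeptNo2
""").fetchall()

print(rows)
```

In this sample Jane is linked to Mike by key but is excluded because her DpndVal does not match the DeptVal of Mike's department, while Mike himself is still preserved by the LEFT join.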

The minimum outer join requirements for data modeling and data structure processing are support for the ANSI SQL LEFT join and the ON clause. To fully support subviews composed of outer join structures, right-sided nesting (see Chapter 2) must also be supported. This means that SQL view names can also be specified on the right side. Subviews specified on the left side of the outer join operation require no special processing.

Using the ANSI SQL outer join, network structures can usually be converted to hierarchical structures. This is accomplished by renaming tables that have multiple entry points in the structure and including them in the structure so that no single logical table has more than one entry point in the structure. Figure 6.2 demonstrates how a network structure can be rewritten as a hierarchical data structure using SQL renaming.

The SQL that defines the network structure in Figure 6.2 is ambiguous since table X can be accessed from more than one path (via table B or table C), making its meaning and semantics unstable. Each path has its own distinct meaning, and the result can reflect either one. There may be situations where these semantics are exactly what you may desire, but the unambiguous power of the hierarchical structure (see Chapter 5) cannot be utilized in these rare cases.
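The renaming technique can be sketched with Python's sqlite3 module using table aliases. The tables A, B, C, and X follow the names used in the text, but their columns, the sample rows, and the exact join conditions are assumptions and are not the SQL of Figure 6.2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (aKey TEXT);
    CREATE TABLE B (bKey TEXT, aKey TEXT, xKey TEXT);
    CREATE TABLE C (cKey TEXT, aKey TEXT, xKey TEXT);
    CREATE TABLE X (xKey TEXT);
    INSERT INTO A VALUES ('a1');
    INSERT INTO B VALUES ('b1', 'a1', 'x1');
    INSERT INTO C VALUES ('c1', 'a1', 'x2');
    INSERT INTO X VALUES ('x1'), ('x2');
""")

# Network form: X has two entry points (via B or via C), so a result
# row does not say which path produced its X value.
network_rows = conn.execute("""
    SELECT B.bKey, C.cKey, X.xKey
    FROM A
    LEFT JOIN B ON B.aKey = A.aKey
    LEFT JOIN C ON C.aKey = A.aKey
    LEFT JOIN X ON X.xKey = B.xKey OR X.xKey = C.xKey
    ORDER BY X.xKey
""").fetchall()

# Hierarchical form: X is renamed once per path (XB under B, XC under C),
# so each logical table has exactly one entry point.
hierarchical_rows = conn.execute("""
    SELECT B.bKey, XB.xKey, C.cKey, XC.xKey
    FROM A
    LEFT JOIN B ON B.aKey = A.aKey
    LEFT JOIN X AS XB ON XB.xKey = B.xKey
    LEFT JOIN C ON C.aKey = A.aKey
    LEFT JOIN X AS XC ON XC.xKey = C.xKey
""").fetchall()

print(network_rows)
print(hierarchical_rows)
```

In the hierarchical form each X value sits in its own column under the table that reached it, so the meaning of every value is fixed by its position in the structure.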

Figure 6.2 Converting network SQL structures to hierarchical SQL structures.

6.2—ON Clause Data Modeling Join Condition Rules

As demonstrated in Figure 6.2, there is a right way and a wrong way to join (or link) tables to specify a valid hierarchical structure. Invalid structures are usually caused by the incorrect use of AND and OR operators in the ON condition. If the join condition rules are not followed, invalid or illogical structures can be created that may produce inconsistent results. These rules pertain to linking (joining) an upper structure to a lower structure when using a one-sided outer join operation. In the case of a LEFT join, the higher structure is the structure on the left side; in the case of a RIGHT join, the higher structure is on the right side.

Normally, building a hierarchical data structure is performed top-down, where the lower level table argument is usually a structure consisting of one table since tables are being introduced and linked to the top structure one table at a time. The lower level structure can also be composed of multiple tables, as in Figure 6.3. These multitable subviews will be described in more detail in Chapter 7.

The following three basic ON clause join condition rules apply to each ON clause join condition in outer join statements that are modeling hierarchical structures.

The first rule specifies that the top and bottom structures must both be referenced in each ON clause join condition or subcondition (described below). This is necessary to specify a complete path from the upper structure's link point to the lower structure's link point. The link point is a specific table in the upper and lower structures determined by the specification of the ON clause join condition that joins (or links) the upper and lower structures. The determination of the link points is specified in the second and third ON clause join condition rules described directly below.

Figure 6.3 Example of breaking link rule three to build a hierarchical structure.

The second rule applies to the top structure. In the top structure, only one single path can be referenced from the link point up the path to the root. Referencing multiple paths using AND or OR operators creates an ambiguous network structure, as demonstrated in the network structure in Figure 6.2. When using AND and OR conditions in the ON clause, OR clauses create subclauses that can consist of AND operations. When referencing multiple locations along a path in the upper level structure, the lowest table referenced in each OR subcondition becomes the link point, and the link point in each OR subclause must specify the same link point table; otherwise, a network or illogical structure is created. When the link point in the upper level structure is not the lowest level point on its path, a new leg of the structure is created. This can be seen in Figure 6.1 when in the Employee view the Dependent table is joined to the Employee table, forming a multileg structure.

The third and final rule applies to the bottom structure. In the bottom structure, only the root (top) table can be referenced. This is necessary to preserve the top-down processing of hierarchical structures that is normally expected. While breaking this rule may limit some of the advantages of a strict hierarchy, it is possible to link to a lower level structure based on table columns below the root of the lower structure. Regardless of which table or tables are referenced below the root, the root table should still be treated as the bottom structure link point, as demonstrated in Figure 6.3. The exact semantics of this unconventional hierarchical structure will be covered in Chapter 15, but up until then this text assumes that the third linking rule is always obeyed.

6.3—Valid and Invalid ON Clause Data Modeling Examples

In the network structure in Figure 6.4, there is an example of how an OR clause can cause a network structure to be created. In determining the upper structure link point, one side of the OR isolates table B and the other side isolates table A. Since table B and table A are from different legs, table C can sometimes be reached from one leg or the other, making it a network structure, which is ambiguous for an application view.

The second ON clause for the hierarchical structure in Figure 6.4 demonstrates how the AND clause can be used to qualify the path further up. The second ON clause goes with the second LEFT join, which links table C to table B. The lowest referenced table in the upper level structure's selected leg (table B) is determined as the link point. But as shown here, a higher level table on the path (table A) can also be referenced to further qualify the link condition without altering the link point.
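The AND qualification described above can be sketched with a small runnable script. This is only an illustration under stated assumptions: the tables a, b, and c, their columns, and the sample rows are hypothetical and are not taken from Figure 6.4, and SQLite (through Python's sqlite3 module) is used merely as a convenient engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical three-level path: a over b over c
    CREATE TABLE a (a_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE b (b_id INTEGER PRIMARY KEY, a_id INTEGER);
    CREATE TABLE c (c_id INTEGER PRIMARY KEY, b_id INTEGER);
    INSERT INTO a VALUES (1, 'East'), (2, 'West');
    INSERT INTO b VALUES (10, 1), (20, 2);
    INSERT INTO c VALUES (100, 10), (200, 20);
""")

# Table b is the link point for c (lowest table referenced on the path).
# The extra reference to a.region only qualifies the link further.
rows = conn.execute("""
    SELECT a.a_id, b.b_id, c.c_id
    FROM a
    LEFT JOIN b ON a.a_id = b.a_id
    LEFT JOIN c ON b.b_id = c.b_id AND a.region = 'East'
    ORDER BY a.a_id
""").fetchall()

for row in rows:
    print(row)
```

In this sketch, c stays linked under b in both cases; the reference to a only decides whether the c data appears for a given path, and the a and b data are preserved either way.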

In the first structure in Figure 6.5, there is an example of how an AND clause can cause an invalid structure to be created. In this example, X is reachable only from both paths at the same time because of the AND operator. While the form of this structure resembles a network structure as shown in Figures 6.2 and 6.4, it does not behave as a typical network structure. Its behavior can be considered illogical. Again, this does not mean that there is not some possible use for the semantics of this structure.

The second ON clause for the hierarchical structure in Figure 6.5 demonstrates how the OR operator can be used to specify a choice of two OR subconditions because each OR subcondition isolates the same two link points: tables B and C. The reference to table A in the upper structure is disregarded in determining the link point since table B is at a lower level. This example also demonstrates that the join condition does not always have to compare two columns directly to each other (e.g., C="X" AND B="Y"). The link can be satisfied as long as each subclause references a table from each structure and satisfies the join condition rules described in Section 6.2.

Figure 6.4 The difference between OR and AND operators when linking structures.

Figure 6.5 Valid and invalid AND operator use.

6.4—Valid and Invalid Data Modeling Results

In Section 6.3, we saw how to create valid and invalid application data structures; examining the results produced by them can be very useful and insightful. The example in Figure 6.6 demonstrates the effect of a network structure with multiple paths to data. Each path has its own semantics (meaning), which can produce a combination result that can be ambiguous, as shown in Figure 6.6. Path 1 represents the managers for a selected project. Path 2 represents the managers for a department. As discussed in Chapter 5, network data structures taken on their own are ambiguous. This means a self-navigating 4GL database like SQL would also produce an ambiguous result since it is free to take either path to the data (as shown in Figure 6.6), which combines managers for both products and departments.

We saw in Section 6.1 that ambiguous network structures can also be respecified in ANSI SQL as nonambiguous hierarchical data structures. Using this conversion technique, the example in Figure 6.7 was changed from the ambiguous network structure shown in the example in Figure 6.6 to a nonambiguous hierarchical data structure. This hierarchical data structure prevents the ambiguous single result of both managers of products and managers of departments produced in Figure 6.6, allowing the two separate nonambiguous results of managers of products and managers of departments shown in Figure 6.7. These results are not possible by default in the ambiguous network view of Figure 6.6. They are possible in the hierarchical structure of Figure 6.7 because each path is kept separate, allowing the paths to be queried separately.

Figure 6.6 Network structure produces ambiguous results.

6.5—Substructure Views

The syntax and semantics of the ANSI SQL outer join inherently and seamlessly support stored substructure views. Substructure views can be specified anywhere a table can. These stored views can be used to form larger data structures. The result of these combined substructures follows the hierarchical semantics as dictated by the newly formed structure. When linking these substructures, the same rules apply as those defined earlier in this chapter for building structures. In particular, the ON clause rules in Section 6.2 must be followed.

Figure 6.7 Network structure converted to hierarchy produces unambiguous results.

As mentioned in Chapter 2, right-sided nesting is required to support stored structured views, or more precisely the ability of the outer join syntax to support the simultaneous building and handling of multiple data structures. Take for example: (A LEFT JOIN B ON A=B) LEFT JOIN (C LEFT JOIN D ON C=D) ON B=C. The parentheses have been added to make the outer join statement clearer, but are unnecessary since the join order is controlled by the placement of the ON clauses (see Chapter 2). The join operations in parentheses are performed first, forming separate structures, each stored in a different working set before they are combined into one structure following the last, rightmost ON clause. The LEFT join operations enclosed in the parentheses can be thought of as two stored structured views that have been expanded into their representative SQL when inline expansion is used by the SQL system.

When the inline expansion of the stored structured views occurred in the above SQL, notice what happened to the rightmost ON clause. It got pushed to the right, causing right-sided nesting. Fortunately, the ANSI SQL syntax handles this situation properly to support inline expansion. With stored structured views, this right-sided nesting occurs transparently, so the SQL programmer need not normally be concerned with right-sided nesting. The transparency of this operation is demonstrated in Figure 6.8.
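The nested statement discussed above, (A LEFT JOIN B ON A=B) LEFT JOIN (C LEFT JOIN D ON C=D) ON B=C, can be tried out with the following minimal sketch. The column names (a_ref, b_ref, c_ref) and the sample rows are hypothetical, and SQLite is used only as a convenient engine; the parentheses are written explicitly here rather than relying on ON clause placement, since not every SQL system accepts the unparenthesized form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a_id INTEGER PRIMARY KEY);
    CREATE TABLE b (b_id INTEGER PRIMARY KEY, a_ref INTEGER);
    CREATE TABLE c (c_id INTEGER PRIMARY KEY, b_ref INTEGER);
    CREATE TABLE d (d_id INTEGER PRIMARY KEY, c_ref INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (10, 1);
    INSERT INTO c VALUES (100, 10), (101, 10);
    INSERT INTO d VALUES (1000, 100);
""")

# Two structures (a over b, c over d) built separately, then combined
# by the last, rightmost ON clause.
nested = conn.execute("""
    SELECT a.a_id, b.b_id, c.c_id, d.d_id
    FROM a LEFT JOIN b ON a.a_id = b.a_ref
    LEFT JOIN (c LEFT JOIN d ON c.c_id = d.c_ref) ON b.b_id = c.b_ref
    ORDER BY a.a_id, c.c_id
""").fetchall()

# The same hierarchy built strictly top-down, one table at a time.
top_down = conn.execute("""
    SELECT a.a_id, b.b_id, c.c_id, d.d_id
    FROM a
    LEFT JOIN b ON a.a_id = b.a_ref
    LEFT JOIN c ON b.b_id = c.b_ref
    LEFT JOIN d ON c.c_id = d.c_ref
    ORDER BY a.a_id, c.c_id
""").fetchall()

print(nested == top_down)
```

With this sample data both statements return the same rows, which is consistent with the point that the order in which a hierarchical structure is built does not change its semantics.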

The Department view's SQL in Figure 6.8 demonstrates how the embedded subview EmpView is expanded to define the Department data structure. While the semantics of the expanded Department SQL are the same as the depicted Department structure, the order in which the joins are performed is now from the bottom up instead of from the top down. The semantics remain the same because hierarchical structures can be built up, down, or in any order without changing their semantics, as described in Chapter 3. There is one caveat when building a structure upwards: when the ON clause references a field further up the structure than the upper link point, the upper level structure must contain all references at the time of the join. This should not present a problem for stored views since they should only be referencing columns in their own view domain. Since the stored subview is expanded or materialized when invoked, any recent changes to the subview are automatically in effect. So, the support of subviews is very useful and important. Structured views embedded within structured views are also naturally supported; this is covered in Chapter 8.

Figure 6.8 Embedded structure view expansion.

6.6—WHERE Clause Filtering with Data Structures

Before the existence of the ANSI SQL join operation, the WHERE clause had two functions: to specify the join criteria and to specify selection criteria. With the ANSI SQL join, the ON clauses are used to specify the join criteria, while the WHERE clause is used primarily to specify the selection criteria. This does not change when the outer join is used to perform data modeling. The WHERE clause filters the data structure; it can be specified with a stored view and/or at the time of the view invocation.

As you would expect, ON clauses cannot be specified on join view invocations, so the WHERE clause is the only way to influence query operation at the time of view invocation. This does not take away from the outer join's data modeling capability; in fact, it strengthens it because the data structure of a stored view cannot be changed when invoked, thereby protecting its integrity. In this way, the stored structure view can only be filtered with the specification of a WHERE clause, which cannot change the structure of the data being filtered.

The WHERE clause operates on the records or rows of the view. It identifies data that is selected along with all of its associated data in the record or row. For example, the WHERE clause in Figure 6.9 applied to the employee data from the Employee view in Figure 6.1 selects only the rows, in their entirety, that contain employees of department DeptA; all other rows are discarded. For more information on data structure filtering semantics, refer to Chapter 5 and Chapter 7.

6.7—WHERE Clause Filtering with Substructures

Normally, WHERE clauses with stored substructures are not needed and are not recommended except for the one case explained below. ON clauses can be used to specify most filtering requirements for substructures. WHERE clauses that filter data based on filtering criteria from below the root of the substructure present a problem because they are not following strict hierarchical rules. This is because the entire path length is filtered by the WHERE clause, so higher level data is deleted based on values from lower structure levels. While not generally recommended, this situation can be hierarchically handled by following special operational precautions, which are discussed in Chapter 15.

Figure 6.9 WHERE clause filtering works with data structures.

ON clauses for hierarchical substructure views cannot be used to filter the root of the structure because ON clause filtering of hierarchical structures only affects the lower structure. In this situation, a WHERE clause can be specified in the stored substructure view to filter the root level based on the root values. This is shown in the EmpView in Figure 6.10. This filtering operation can be automatically moved to the ON clause that controls the linking of this substructure when it is processed. This transformation allows the substructure to be integrated seamlessly into the overall structure, and allows a top-to-bottom processing order to process the substructure. This is also shown in Figure 6.10.

Figure 6.10 WHERE clause transformation for filtering substructure root.

In Figure 6.10, moving the WHERE clause data filter of the subview higher up to the ON clause of the join that controls linking the subview works because the filtering applies to the total subview, just as the WHERE clause would have.
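The transformation described above can be sketched as follows. This is not the SQL of Figure 6.10: the view name emp_view, the status column, and all sample rows are hypothetical, and SQLite is used only as a convenient engine. The sketch compares a subview that filters its own root with a WHERE clause against the expanded form in which that filter has been moved to the linking ON clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_no TEXT PRIMARY KEY);
    CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, dept_no TEXT, status TEXT);
    CREATE TABLE dependent (emp_id INTEGER, dpnd_name TEXT);
    INSERT INTO dept VALUES ('D1'), ('D2');
    INSERT INTO emp VALUES (1, 'D1', 'Active'), (2, 'D1', 'Inactive'),
                           (3, 'D2', 'Inactive');
    INSERT INTO dependent VALUES (1, 'Sam'), (2, 'Ann');

    -- Substructure view: emp over dependent, root filtered by WHERE
    CREATE VIEW emp_view AS
        SELECT emp.emp_id, emp.dept_no, dependent.dpnd_name
        FROM emp LEFT JOIN dependent ON emp.emp_id = dependent.emp_id
        WHERE emp.status = 'Active';
""")

# Invoking the stored subview under dept.
via_view = conn.execute("""
    SELECT dept.dept_no, v.emp_id, v.dpnd_name
    FROM dept LEFT JOIN emp_view v ON dept.dept_no = v.dept_no
    ORDER BY dept.dept_no, v.emp_id
""").fetchall()

# Expanded top-down form: the root filter now sits in the linking ON clause.
moved_to_on = conn.execute("""
    SELECT dept.dept_no, emp.emp_id, dependent.dpnd_name
    FROM dept
    LEFT JOIN emp ON dept.dept_no = emp.dept_no AND emp.status = 'Active'
    LEFT JOIN dependent ON emp.emp_id = dependent.emp_id
    ORDER BY dept.dept_no, emp.emp_id
""").fetchall()

# For contrast: leaving the filter in an outer WHERE clause removes dept D2.
outer_where = conn.execute("""
    SELECT dept.dept_no, emp.emp_id, dependent.dpnd_name
    FROM dept
    LEFT JOIN emp ON dept.dept_no = emp.dept_no
    LEFT JOIN dependent ON emp.emp_id = dependent.emp_id
    WHERE emp.status = 'Active'
    ORDER BY dept.dept_no, emp.emp_id
""").fetchall()

print(via_view == moved_to_on, outer_where)
```

With this sample data the first two statements agree, while the third loses department D2 because a WHERE clause on the combined structure filters entire rows.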

6.8—Complex Data Modeling Example

So far, we have been using the fairly simple Department/Employee database to demonstrate how the SQL-92 join operation can perform data modeling. The multimedia book example in Figure 6.11 is a more complex data modeling example on a different subject matter. It demonstrates that a hierarchical data model of any complexity can be easily modeled with the SQL-92 join operation, and that it will continue to obey hierarchical semantic principles.

6.9—Conclusion

Building hierarchical data structures and the structured processing of them is possible with the one-sided outer join operation. This building of hierarchical data structures or combining of hierarchical data structures involves two operations. First, the placement or specification of which structure is hierarchically over the other, and second, the specification of the pathway from the link points from the upper structure to the lower structure. The first operation is accomplished using a LEFT or RIGHT outer join that places one structure hierarchically above the other, and the second operation, specifying pathways, is accomplished by ON clauses. Both of these operations are required to model hierarchical data structures. Data structures modeled in such a fashion can still be filtered by the inclusion of a WHERE clause in the data structure definition and/or view invocation.

Figure 6.11 Multimedia book data modeling example.

Amazingly, the syntax of the ANSI SQL join operation naturally supports the use of substructure views as standard SQL views. These structured subviews can be used anywhere a table can be specified to combine with other structures to form larger data structures. These substructure views can also be embedded in other structure views.

Also shown in this chapter was the capability for the outer join operation to create ambiguous network data structures and illogical structures. While these structures do not have the same powerful semantics as hierarchical data structures, they still may be useful in certain specialized situations that the user may have. Unfortunately, when these structures are used, it is usually by accident. The knowledge of how to construct hierarchical structures can also prevent ambiguous and illogical structures from being built unintentionally.

7—Outer Join Data Modeling—Related Capabilities

This chapter covers powerful capabilities and features that inherently accompany or enhance the ANSI SQL outer join data modeling capability. Because they are inherent, they are automatically available for database professionals to use if they know that they exist and how to use them.

7.1—Data Structure Filtering

The inherent data modeling capability of the outer join also supports data filtering that operates by naturally following the semantics of the outer join specified data structure. This gives the data structure filtering capability a very fine filtering control. Normally, filtering criteria such as DpndStatus="Active" are specified in the WHERE clause. But when data modeling is being performed by the outer join, data filtering criteria can be specified in the ON clause along with the join criteria. When this is done, the ON clause not only specifies how its upper and lower structures are linked, but also the data filtering criteria. This filtering affects only the lower level structure being joined; the upper (main) structure is not affected. In this way, its operation follows the semantics of the data structure. The big difference between ON clause filtering and WHERE clause filtering is that WHERE clause data filtering removes entire rows while ON clause filtering operates only on specific portions of rows. This can be seen in Figure 7.1.

Figure 7.1 ON clause versus WHERE clause data filtering.

The purpose of Figure 7.1 is to demonstrate the difference between ON clause and WHERE clause data filtering. It does this by first showing the Employee structure and its data, which is listed to its right. Underneath this are two outer join SELECT statements that both model the Employee structure above it. The first outer join statement uses WHERE clause filtering and the second outer join statement to its right uses ON clause filtering. Both filtering examples remove inactive dependents who are not currently covered under the company's medical benefits. The WHERE clause filtering removes entire rows since it is performed logically after the complete row is assembled. The ON clause neatly filters specific paths in the data structure, preserving all other unrelated data. In this example, Mary's dependent son Sam is currently inactive for medical coverage and he is filtered out, while the rest of the unrelated data for this row is preserved. This is not true for WHERE clause data filtering, which also causes Mary's entire row to be removed.

The unrelated data not affected by the ON clause filtering in Figure 7.1 is employee data, which is above the dependent data, and the department data, which is in an unrelated leg of the data structure. If the Dependent table had other tables under it, then these tables could be affected by the ON clause filtering, as you would expect. This follows the semantics of the Employee data structure, making it useful for specifying business rules.
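The difference between the two filtering styles can be reproduced with a minimal sketch. Mary, Sam, and the DpndStatus criterion come from the discussion above; the table layouts, the second employee (John), his dependent (Amy), and the department values are hypothetical, and the statements are not copied from Figure 7.1. SQLite is used only as a convenient engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, dept_no TEXT);
    CREATE TABLE department (dept_no TEXT PRIMARY KEY);
    CREATE TABLE dependent (emp_id INTEGER, dpnd_name TEXT, dpnd_status TEXT);
    INSERT INTO employee VALUES (1, 'Mary', 'DeptA'), (2, 'John', 'DeptB');
    INSERT INTO department VALUES ('DeptA'), ('DeptB');
    INSERT INTO dependent VALUES (1, 'Sam', 'Inactive'), (2, 'Amy', 'Active');
""")

# WHERE clause filtering: applied after the full row is assembled,
# so Mary's entire row disappears along with Sam.
where_rows = conn.execute("""
    SELECT e.emp_name, d.dept_no, p.dpnd_name
    FROM employee e
    LEFT JOIN department d ON e.dept_no = d.dept_no
    LEFT JOIN dependent p ON e.emp_id = p.emp_id
    WHERE p.dpnd_status = 'Active'
    ORDER BY e.emp_id
""").fetchall()

# ON clause filtering: only the Dependent leg is filtered,
# so Mary and her department data are preserved.
on_rows = conn.execute("""
    SELECT e.emp_name, d.dept_no, p.dpnd_name
    FROM employee e
    LEFT JOIN department d ON e.dept_no = d.dept_no
    LEFT JOIN dependent p ON e.emp_id = p.emp_id AND p.dpnd_status = 'Active'
    ORDER BY e.emp_id
""").fetchall()

print(where_rows)
print(on_rows)
```

In this sketch the ON clause version keeps Mary's row with an empty dependent position, while the WHERE clause version returns only John's row.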

The ON clause rules for building hierarchical structures that were defined in Chapter 6 must still be observed when supplying ON clause data-filtering criteria. Basically, this means that any tables referenced by the ON clause filtering criteria must be limited to the root of the lower level structure or any tables from the link point up the path to the root. In this way, the data filtering criteria cannot inadvertently affect the link points that would change the structure being modeled and its semantics.

7.2—Indirect Structure Linking

In some cases, it may be desirable to link a table or substructure under a table in the upper structure that can't be directly linked to. This can be accomplished using an indirect link, for example, linking Dependent to Department, which is linked under Employee. In this case, Dependent is linked to Employee, but indirectly through Department, which means the department for an employee must exist for the dependents of that employee to exist. As shown in Figure 7.2, this is done using an existence test for Department since Dependent is not directly related to Department.
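One way to picture the indirect link is the sketch below. The SQL of Figure 7.2 is not reproduced in this text, so the IS NOT NULL test on the Department key is an assumed way of expressing the existence test, and the table layouts and sample rows (including the missing department DeptZ) are hypothetical. SQLite is used only as a convenient engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, dept_no TEXT);
    CREATE TABLE department (dept_no TEXT PRIMARY KEY);
    CREATE TABLE dependent (emp_id INTEGER, dpnd_name TEXT);
    INSERT INTO employee VALUES (1, 'Mary', 'DeptA'), (2, 'John', 'DeptZ');
    INSERT INTO department VALUES ('DeptA');       -- DeptZ does not exist
    INSERT INTO dependent VALUES (1, 'Sam'), (2, 'Amy');
""")

# Dependent is matched on the Employee key, but the assumed existence test
# on Department places it logically under Department.
rows = conn.execute("""
    SELECT e.emp_name, d.dept_no, p.dpnd_name
    FROM employee e
    LEFT JOIN department d ON e.dept_no = d.dept_no
    LEFT JOIN dependent p ON e.emp_id = p.emp_id AND d.dept_no IS NOT NULL
    ORDER BY e.emp_id
""").fetchall()

for row in rows:
    print(row)
```

Here John's dependent does not appear because John's department row is missing, which matches the stated behavior that the department must exist for the dependents to exist.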

7.3—Nonhierarchical Join Type Support

Hierarchical structures are very useful. Their single-minded semantics allow powerful assumptions to be made, like those utilized in fourth generation languages. But there are times when nonhierarchical join operations like the inner and FULL joins are necessary, and it would be useful if they could be incorporated into the modeled hierarchical data structure. For example, take two separate Employee tables that would be useful if FULL outer or inner joined and placed into a hierarchical structure as a single logical table.

Figure 7.2 Indirect linking of Dependent under Department.

Logical tables can be created as temporary tables in a previous step and introduced into the structure. Unfortunately, these temporary tables cannot take advantage of the semantic capabilities of hierarchical structures. For example, the optimizations covered in Chapter 11 would not be able to optimize the joins performed in a previous step. But performing inline nonhierarchical joins while building a hierarchical structure can invalidate the structure, turning it into a nonhierarchical structure with unstable application semantics, as described in Chapter 5. Such a nonhierarchical structure is defined in Figure 7.3 from a combination of LEFT and FULL joins.

In Figure 7.3, EmpY becomes a second entry point in the data structure, invalidating the hierarchical data structure. If an inner join were used instead of the FULL outer join, it could also cause the removal of the Dept segment, which would be logically above it.

There turns out to be a solution to the problem of incorporating nonhierarchical, symmetric join types into the hierarchical model being built. The solution again rests with right-sided nesting, which was discussed in Chapter 6, to support stored and embedded structured views. When left-sided nesting is intermixed with right nesting, we also determined in Chapter 6 that multiple separate structures were temporarily formed. When a new structure was created, the current one being built was put on hold and sheltered from the effects of joins to the active structure. This technique can be used to perform nonhierarchical joins without invalidating the hierarchical structure(s) being built. This is demonstrated in Figure 7.4.

Figure 7.3 Invalid hierarchical data structure example.

Figure 7.4 Hierarchical hybrid structure with logical nonhierarchical table.

The FULL outer join operation performed in Figure 7.4 is sheltered from invalidating currently existing hierarchical structures because of the strategic use of right-sided nesting. The FULL join operation that is highlighted in Figure 7.4 is performed in isolation. In this example, the FULL outer join could also have been an INNER or UNION join. These operations are symmetrical in operation, making their data modeling ability neutral in nature: both sides carry equal data-preserving ability. This means these operations form a single, flat logical object, like EmpX|EmpY in the diagram in Figure 7.4. This is why this object can be viewed as a single logical table. These logical tables can be composed of more than two tables by using left-sided nesting when building the logical table. And finally, more than one logical table can be incorporated into a hierarchical structure. These concepts are demonstrated in Figure 7.5.
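The sheltering effect can be sketched with the INNER join variant mentioned above. The table names dept, empx, and empy echo the Dept, EmpX, and EmpY names used in this section, but the columns, the sample rows, and the statements themselves are hypothetical and are not taken from Figure 7.4. SQLite is used only as a convenient engine, with the nesting written as explicit parentheses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_no TEXT PRIMARY KEY);
    CREATE TABLE empx (x_emp_id INTEGER PRIMARY KEY, dept_no TEXT);
    CREATE TABLE empy (y_emp_id INTEGER PRIMARY KEY, emp_name TEXT);
    INSERT INTO dept VALUES ('D1'), ('D2');
    INSERT INTO empx VALUES (1, 'D1'), (2, 'D2');
    INSERT INTO empy VALUES (1, 'Ann');            -- employee 2 is only in empx
""")

# Sheltered: the inner join forms a logical table first (right-sided nesting),
# and only then is it placed under dept by the LEFT join.
sheltered = conn.execute("""
    SELECT dept.dept_no, empx.x_emp_id, empy.emp_name
    FROM dept
    LEFT JOIN (empx INNER JOIN empy ON empx.x_emp_id = empy.y_emp_id)
           ON dept.dept_no = empx.dept_no
    ORDER BY dept.dept_no
""").fetchall()

# Unsheltered: the inner join acts on the structure built so far
# and removes the dept data above it.
unsheltered = conn.execute("""
    SELECT dept.dept_no, empx.x_emp_id, empy.emp_name
    FROM dept
    LEFT JOIN empx ON dept.dept_no = empx.dept_no
    INNER JOIN empy ON empx.x_emp_id = empy.y_emp_id
    ORDER BY dept.dept_no
""").fetchall()

print(sheltered)
print(unsheltered)
```

In this sketch department D2 survives only in the sheltered form, which illustrates why the symmetric join has to be isolated from the structure above it.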

When creating logical tables with the INNER or FULL join operation, it is usually desirable to have one fixed key location per logical table. This can be easily accomplished using the NATURAL or USING option, which was described in Chapter 4. This is demonstrated in Figure 7.6. The parentheses are used for readability in this example; they do not affect the join order.

As described in Chapter 4, the NATURAL option used with any type of join operation will not allow the modeling of hierarchical data structures. But used with right-sided nesting, as shown in Figure 7.6, its nonhierarchical operation used with symmetric joins is also sheltered from the hierarchical structure being built.

It is also possible to use a logical table as the root of a structure. This is shown in Figure 7.7. In this example, the root logical table is not being protected by right-sided nesting because it is specified on the left side. Right-sided nesting is not necessary in this case because the root logical table is defined first in the SQL statement, so no sheltering is necessary since there is no other structure in existence or active to be affected. The SQL example in Figure 7.7 also demonstrates by its complex use of AND and OR operators that logical tables follow the same linking rules and capabilities as standard tables.

Figure 7.5 Complex hybrid hierarchical structure with multiple logical tables.

Figure 7.6 NATURAL logical table example.

The example in Figure 7.7 may raise some concerns that logical tables or substructures in general, when specified on the left, may be subject to interference from or cause interference to other structures that come into contact with them on their left side. If true, this would make their use unpredictable or unstable, reducing their usefulness. This, however, is definitely not the case. While left-sided nonhierarchical structures may appear as a possible future danger, they will not affect other structures or tables even when these other structures are introduced from the left. This is because the structures added to the left naturally use right-sided nesting. For example, table X LEFT joined to A INNER JOIN B ON A=B LEFT JOIN C ON B=C produces X LEFT JOIN A INNER JOIN B ON A=B LEFT JOIN C ON B=C ON X=A, causing table X to remain preserved and uninfluenced by the destructive inner join operation on its right side. This natural syntax enables the free, safe, and seamless use of substructures (which includes logical tables) under all current and future syntactical situations in which they may be used.

Figure 7.7 Logical table as root of data structure.

While intermixing nonhierarchical symmetric joins (FULL, INNER, and UNION) is not associative in operation, logical tables can intermix these different join types. The result is still a flat structure, but it does carry with it more meaningful semantics than a flat structure derived using a uniform symmetric join type. An example is shown in Figure 7.8.

It's very useful to realize that these logical tables can be easily produced by isolating the logical table in a stored SQL view because the expansion processing of it automatically creates right-sided nesting. We have previously seen this in Chapter 6, with a view expansion of a structured view that is combined or embedded within another SQL structure definition. Figure 7.9 demonstrates an example of a view comprising a logical table being expanded. As in any other stored view, there are many additional advantages to placing logical tables in stored views, such as reuse and data abstraction.

7.4—Nonhierarchical Joining of Data Structures

Multitable data structures, just like the single tables described in Section 7.3, can also be joined nonhierarchically using symmetric joins, such as the FULL outer join and the inner join, to form a valid hierarchical data structure. Everything described in Section 7.3 for joining single tables also applies to joining data structures, with one additional requirement: only the root tables of the data structures can be joined together, which is accomplished by only referencing columns from the root tables for the join criteria. This is demonstrated in Figure 7.10.

Figure 7.8 Intermixing symmetric join types in logical tables.

Figure 7.9 Embedded logical table in view expansion.

Figure 7.10 demonstrates two structures being FULL outer joined. As can be seen in these examples, structures naturally form the proper protected environment needed for nonhierarchical joins as described in Section 7.3. These can be expanded views of data structures or structures built in place, which is equivalent to the expanded structure views as shown in Figure 7.10. Also shown in Figure 7.10 is the expanded SQL rewritten to be more efficiently executed by avoiding throwaway tuples. This is accomplished by performing the FULL outer join first, as shown.

While the nonhierarchical example in Figure 7.10 uses a FULL outer join to link the data structures, it could have also been an inner join. While these symmetric operations both produce the same valid hierarchical structure, the semantics as far as the resulting data content are different, as you would expect. The inner join removes both structures being linked unless both exist, while the FULL outer join will preserve data structures even if they have no matching data structure.

Linking symmetrically at the root level causes no invalidating of the hierarchical data structure. Applying nonhierarchical linking at structure levels lower than their root produces nonhierarchical data structures. Inner joins can cause data loss further up the data structure, which invalidates the data structure, and a FULL outer join can cause only the lower structure to be preserved, which also forms an invalid structure. These situations are both avoided by joining the data substructures only at their roots. This is also the most natural and common way to join two data structures symmetrically (nonhierarchically).

Figure 7.10 Symmetric joining of data structures.

Single tables can also be nonhierarchically joined to data structures. Since a single table is actually a data structure consisting of one table with its only table as the root table, it can be joined nonhierarchically to a multitable structure following the same requirements stated above for joining data structures nonhierarchically.

The capability to perform symmetric joins when modeling hierarchical data structures is quite useful and an important feature for hierarchical data modeling. Figure 7.11 demonstrates the usefulness of symmetric joins in modeling hierarchical data structures. The first data structure in Figure 7.11 does not use a symmetric join in modeling a structure with two Employee tables. It uses the Department table to join the two Employee tables. This introduces a number of problems, such as two separate Employee tables to access with (possibly) different employees in each. There is also another side effect of having the Employee tables joined by their common department, causing an unnecessary data explosion with rows that contain employee data from different employees.

The second data structure and its defining SQL in Figure 7.11 solve the problems introduced from the first data structure that were noted above. The Employee tables are naturally FULL outer joined, preserving all data from both tables and creating one unique key for each row result produced. And this logical table result is placed in the data structure hierarchically in the correct position without invalidating the data structure. This correctly matches up the Employee tables without exploding the data or generating extraneous, incorrectly matched employee rows while still correctly organizing the employees under their department. This also allows the joining of the Dependent and Project tables to the structure by a match from either of the Employee tables, producing a more consistent and accurate structure.

Figure 7.11 Symmetric join synchronizes legs of hierarchical structure.

7.5—Many-to-Many Data Modeling and Intersecting Data

Many-to-many data relationships such as the well-known Parts-Suppliers database can be hierarchically modeled as either a Parts-over-Suppliers or Suppliers-over-Parts relationship. These many-to-many relationships require an association table to create hierarchical one-to-many relationships in both directions. These many-to-many relationships were first described in Chapter 5.

The outer join hierarchical modeling of many-to-many relationships is shown in Figure 7.12. As shown in the structure diagrams in this figure, the association table (PSX) used in the SQL specification will appear transparent, as it should. This is also the case if intersecting data from the association table, such as prices of parts from each supplier, is selected, which will logically appear as data from the lower level table. An example of intersecting data use can be found in Chapter 12.
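A Parts-Suppliers structure of this kind can be sketched as below. The association table name psx follows the PSX name used above; the other table layouts, the price column, and the sample rows are hypothetical, and the join shape is an assumed plain chain of LEFT joins rather than the exact SQL of Figure 7.12. SQLite is used only as a convenient engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part (part_no TEXT PRIMARY KEY);
    CREATE TABLE supplier (supp_no TEXT PRIMARY KEY);
    CREATE TABLE psx (part_no TEXT, supp_no TEXT, price INTEGER);
    INSERT INTO part VALUES ('P1'), ('P2'), ('P3');
    INSERT INTO supplier VALUES ('S1'), ('S2');
    INSERT INTO psx VALUES ('P1', 'S1', 5), ('P1', 'S2', 6), ('P2', 'S1', 7);
""")

# Parts over Suppliers: psx is walked through but not selected as a level,
# and its price column reads as data of the lower (supplier) level.
parts_over_suppliers = conn.execute("""
    SELECT part.part_no, supplier.supp_no, psx.price
    FROM part
    LEFT JOIN psx ON part.part_no = psx.part_no
    LEFT JOIN supplier ON psx.supp_no = supplier.supp_no
    ORDER BY part.part_no, supplier.supp_no
""").fetchall()

# Suppliers over Parts: the same association table, walked the other way.
suppliers_over_parts = conn.execute("""
    SELECT supplier.supp_no, part.part_no, psx.price
    FROM supplier
    LEFT JOIN psx ON supplier.supp_no = psx.supp_no
    LEFT JOIN part ON psx.part_no = part.part_no
    ORDER BY supplier.supp_no, part.part_no
""").fetchall()

print(parts_over_suppliers)
print(suppliers_over_parts)
```

Part P3 has no supplier in the sample data, so it is preserved in the Parts-over-Suppliers result with empty lower level values.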

7.6—Conclusion

From the information supplied in this chapter and the preceding chapter, it should be clear that the ANSI SQL join operation with its flexible syntax and powerful outer join operation can be used or programmed to accomplish tasks requiring complex semantics. The outer join can be used to model both hierarchical and nonhierarchical data structures. Hierarchical data structures are advantageous because they have singular meaning, which makes their semantics unambiguous and for this reason better suited for application use. Nonhierarchical structures, such as network structures, are not generally recommended for application view use, but may still be useful in applications with very specific requirements as long as the SQL programmer is aware of their unstable or ambiguous semantics.

There has been sufficient information supplied in these last two chapters to enable the design and construction of a hierarchical, network, or hybrid data structure using the ANSI SQL join operation. The LEFT and RIGHT outer joins are hierarchical operations and are used to model a hierarchical data structure. The INNER and FULL joins are symmetric joins that do not model hierarchical data structures, and can in fact invalidate hierarchical structures. It was shown how these symmetric operations can be used to form logical tables that can be safely and seamlessly introduced into a hierarchical structure being modeled without invalidating it by using right-sided nesting. Similarly, it was shown how to symmetrically link data structures so they maintain a valid hierarchical data structure.

Figure 7.12 Outer join modeling of a many-to-many relationship.

Besides modeling data structures, the ANSI SQL join syntax also seamlessly supports a fine level of data filtering that precisely filters data, following the defined hierarchical data structure. To help with the coding of ANSI SQL data modeling joins and features like the fine data filtering capability, Chapter 8 describes a procedure that can help automate this process.

It was also shown how many-to-many relationships can be seamlessly modeled. Using all the capabilities documented in this and the previous chapter, any hierarchical data structure can be modeled.

8—More about Outer Join Data Modeling

This chapter examines the significance of the ANSI SQL outer join's data modeling and structure-processing ability to SQL, which did not previously support this capability. It also examines how these outer join data modeling statements can be generated, and their efficiency. This chapter also presents empirical proof that the outer join does enable and support data modeling and structure processing as presented in this book.

8.1—Importance of SQL's Inherent Data Structure Processing Ability

The ANSI SQL outer join's natural data modeling and structure processing capability establishes SQL's ability to inherently perform complex data structure processing. This processing is not arbitrarily defined, but is a direct result of the ANSI standard outer join's inherent data modeling syntax and semantics. This data modeling and structure processing capability, and the fact that it is an ANSI standard, establishes the ANSI SQL outer join as a standardized SQL method for performing data modeling and structure processing. It is important for SQL vendors and designers to realize that any data modeling features added to their SQL or the SQL:1999 standard will not work if they conflict with SQL's inherent support of data modeling through the outer join. This natural and open data modeling capability also establishes a seamless and compatible integration path from SQL databases to non-SQL databases, and vice versa. This is also aided by the fact that the outer join operation is not hindered by having to follow the old inner join's Cartesian product model of operation as described in Chapter 2.

8.2—Efficient Client/Server Data Structure Processing

In a client/server environment, SQL data structure processing is usually performed on the client platform. This causes a lot of unnecessary data to be transmitted from the database on the server to the client, and it often requires procedural code on the client, with multiple SQL calls to the database on the server. When the outer join operation inherently performs the data structure processing, the work is done entirely on the server where the database resides, increasing efficiency and decreasing network traffic.

8.3—Coding Data Modeling Outer Join Statements

Data structure processing outer join statements can be coded by walking down the data structure from top to bottom and left to right, starting with SELECT * FROM Root-Table-Name. As each table or logical table (see Chapter 7) is reached, add LEFT JOIN Table-Name ON Join-Cond. This is visually demonstrated in Figure 8.1. The ON join condition links the lower level table to the join point in the upper structure. The exact join rules were specified in Chapter 6. Logical tables, if any are specified in the data structure, are expanded after the data structure has been walked through. This is demonstrated in Figure 8.2.

Figure 8.1 Coding data modeling outer joins from structure diagrams.

Figure 8.2 Coding outer join statements that use logical tables.

8.4—Generation of Data Modeling Outer Join Statements

Outer join statements can easily be generated automatically from data structure meta information sources such as ER (entity relationship) diagrams, or directly from users (see Chapter 14). Just as in Section 8.3, the outer join statement should be generated following the structure top to bottom, left to right. If the data structure meta information does not already have the metadata in this order (which is highly unlikely), it should be set to this order first. This will ensure that the outer join statements are generated in the most efficient manner, which is discussed in Chapter 11. Right-sided nesting can be used to define logical tables that do not conform to strict hierarchical definition. This allows these nonhierarchical definitions to be defined without invalidating the hierarchical structure being built, as shown in Figure 8.2.
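The walk described in Sections 8.3 and 8.4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Node class and the generate_outer_join function are hypothetical names, the structure is assumed to be held as a simple tree, and "top to bottom, left to right" is read here as a depth-first (preorder) walk. Logical tables and right-sided nesting are not handled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One table in the structure; join_cond links it to its parent (unused for the root)."""
    table: str
    join_cond: str = ""
    children: List["Node"] = field(default_factory=list)


def generate_outer_join(root: Node) -> str:
    """Build the statement by visiting each table top to bottom, left to right."""
    parts = [f"SELECT * FROM {root.table}"]

    def walk(node: Node) -> None:
        for child in node.children:      # left to right
            parts.append(f"LEFT JOIN {child.table} ON {child.join_cond}")
            walk(child)                  # descend before moving to the next sibling

    walk(root)
    return " ".join(parts)


# The Department view: Department over Employee over Dependent.
department_view = Node("Department", children=[
    Node("Employee", "DeptNo = EmpDeptNo", children=[
        Node("Dependent", "EmpNo = DpndEmpNo"),
    ]),
])

statement = generate_outer_join(department_view)
print(statement)
```

Because every parent is emitted before its children, each ON clause only refers to tables that have already been joined.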

8.5—Hierarchical Data Structure Processing Empirical Proof

By using the interrelationships in the Department-Employee database, it can be shown that the semantics of the ANSI SQL outer join operation can exactly parallel the semantics of hierarchical data models. This enables it to perform complex data modeling and data structure processing. The Department and Employee data views in Figure 8.3, and their data tables, are taken from the Department-Employee database composed of the Department, Employee, and Dependent tables. This database will be used to prove that the outer join can inherently perform data modeling and structure processing.

Figure 8.3 Department and Employee outer join SQL views.

8.5.1—Hierarchical Control

The following progression of outer join examples follows the outer join's operation as described above. The first two examples demonstrate a simple hierarchical modeling operation and show that it works for one-to-many as well as many-to-one relationships.

The outer join specification Department LEFT JOIN Employee ON DeptNo = EmpDeptNo creates the one-to-many hierarchical relationship of Department over Employee because:

• Department can exist if no matching Employee(s) present.

• Employee(s) cannot exist if no matching Department found.

• One-to-many relationship supported:

· One Department can match many Employee(s).

· One missing Department can cause many missing Employees.

The outer join specification Employee LEFT JOIN Department ON DeptNo = EmpDeptNo creates the many-to-one hierarchical relationship of Employee over Department because:

• Employee(s) can exist if they have no matching Department.

• Department cannot exist if no matching Employee(s) exists.

• Many-to-one relationship supported:

· Many Employee(s) can match the same Department.

· Each missing Employee causes one Department occurrence to be missing.
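The two lists above can be checked on a small data set. The following sketch uses Python's built-in sqlite3 module. The sample rows, and the choice of SQLite as the engine, are assumptions made for illustration; only the table names and the DeptNo, EmpNo, and EmpDeptNo columns come from the text.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Department (DeptNo INTEGER);
    CREATE TABLE Employee   (EmpNo INTEGER, EmpDeptNo INTEGER);
    -- Department 20 has no employees.
    INSERT INTO Department VALUES (10), (20);
    -- Employee 3 refers to department 99, which does not exist.
    INSERT INTO Employee VALUES (1, 10), (2, 10), (3, 99);
""")

# Department over Employee (one-to-many).
dept_over_emp = con.execute("""
    SELECT DeptNo, EmpNo
    FROM Department LEFT JOIN Employee ON DeptNo = EmpDeptNo
    ORDER BY DeptNo, EmpNo
""").fetchall()

# Employee over Department (many-to-one).
emp_over_dept = con.execute("""
    SELECT EmpNo, DeptNo
    FROM Employee LEFT JOIN Department ON DeptNo = EmpDeptNo
    ORDER BY EmpNo
""").fetchall()

print(dept_over_emp)  # [(10, 1), (10, 2), (20, None)]
print(emp_over_dept)  # [(1, 10), (2, 10), (3, None)]
```

In the first result, department 20 survives without employees while employee 3 disappears because its department is missing. In the second, employee 3 survives without a department while department 20 disappears because no employee refers to it.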

8.5.2—Structure Control

The next two examples demonstrate structure control for modeling the Department and Employee views defined earlier, and when processed they will follow the same semantics. Notice the multiple ON clauses in each outer join specification; they specify how the structure is linked.

The outer join specification Department LEFT JOIN Employee ON DeptNo = EmpDeptNo LEFT JOIN Dependent ON EmpNo = DpndEmpNo creates the Department view.

• Department is linked directly over Employee (via its ON clause).

• Employee is (then) linked directly over Dependent (via its ON clause).

Proof:

• Dependent can exist only if a matching Department and Employee exist.

• Employee and Dependent exist only if a matching Department exists.

The outer join specification Employee LEFT JOIN Department ON DeptNo = EmpDeptNo LEFT JOIN Dependent ON EmpNo = DpndEmpNo creates the Employee view.

• Employee is linked directly over Department (via ON clause).

• Employee is (also) linked directly over Dependent (via ON clause).

Proof:

• Department and Dependent can only exist with a matching Employee.

• Department and Dependent are not dependent on one another:

· Department can exist without a Dependent.

· Dependent can exist without a Department.

Notice in the outer join proof directly above that the Dependent table was joined after the Department table was joined, but that in this case these two tables are on different paths and cannot influence each other. This is because the Dependent table was joined to the Employee table and not the Department table; therefore, it doesn't rely on the Department table's existence even though it was joined in a later join operation.
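Both views can be run side by side to observe the behavior claimed in the two proofs. This is a sketch using Python's sqlite3 module. The sample rows and the DpndName column are assumptions added for illustration; the join specifications are the ones given in this section.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Department (DeptNo INTEGER);
    CREATE TABLE Employee   (EmpNo INTEGER, EmpDeptNo INTEGER);
    CREATE TABLE Dependent  (DpndName TEXT, DpndEmpNo INTEGER);
    INSERT INTO Department VALUES (10), (20);
    -- Employee 3 has no matching department; employee 2 has no dependents.
    INSERT INTO Employee VALUES (1, 10), (2, 10), (3, 99);
    INSERT INTO Dependent VALUES ('Ann', 1), ('Bob', 3);
""")

# Department view: Department over Employee over Dependent.
department_view = con.execute("""
    SELECT DeptNo, EmpNo, DpndName
    FROM Department
    LEFT JOIN Employee  ON DeptNo = EmpDeptNo
    LEFT JOIN Dependent ON EmpNo = DpndEmpNo
    ORDER BY DeptNo, EmpNo
""").fetchall()

# Employee view: Employee over both Department and Dependent.
employee_view = con.execute("""
    SELECT EmpNo, DeptNo, DpndName
    FROM Employee
    LEFT JOIN Department ON DeptNo = EmpDeptNo
    LEFT JOIN Dependent  ON EmpNo = DpndEmpNo
    ORDER BY EmpNo
""").fetchall()

print(department_view)  # [(10, 1, 'Ann'), (10, 2, None), (20, None, None)]
print(employee_view)    # [(1, 10, 'Ann'), (2, 10, None), (3, None, 'Bob')]
```

In the Department view, Bob is lost along with employee 3 because no Department sits above them. In the Employee view, Bob appears without a Department and department 10 appears without a Dependent, which shows that the two lower tables lie on separate paths.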

While the example data structures used in this section do not show many-to-many relationships directly, many-to-many relationships (see Chapter 5) are composed of many-to-one and one-to-many relationships, which were described in this section. It is therefore not necessary to show examples of many-to-many relationships.

8.6—Nonhierarchical Data Structure Processing Empirical Proof

Nonhierarchical join operations such as FULL, INNER, and UNION joins do not model hierarchical data structures, which means they can invalidate the hierarchical structures they are used in. A solution is to isolate and shelter their use using right-sided nesting as described in Chapter 7, which treats their use as logical tables. These logical tables are composed of symmetric joins that make their structure flat, which is also necessary to preserve the validity of the hierarchical structure. An example is T1 LEFT JOIN TX UNION JOIN TY ON T1 = TX LEFT JOIN T2 ON TY = T2.

• Table T1 and its LEFT join are put on hold, waiting until the matching ON clause is ready for processing. During this time, T1's working set cannot be modified.

• While waiting for table T1 and its LEFT join's matching ON clause, tables TX and TY are UNIONed in isolation. Since the UNION operation is symmetric, the resulting structure is neutral and not hierarchical, making it a valid logical table.

• When table T1's matching LEFT join ON clause is reached, T1 is LEFT joined to the logical table, which is a result of the UNION that was processed in the interim. This places T1 hierarchically over the UNIONed result.

• Finally, the above structure is LEFT joined over table T2, linking table T2 to the TX | TY logical table.

Proof:

• Table T1 can exist if logical table TX | TY or table T2 does not exist.

• The logical table cannot exist if no T1 occurrence matches it.

• T2 cannot exist if no logical table occurrence matches it.

It is worth repeating here that logical tables do not have to be specified inline as shown above; they can be specified as views, which are easier to specify and more flexible for reuse. For example, the logical table view used above can be defined as the view TX UNION JOIN TY, which can be easily embedded when needed, as in T1 LEFT JOIN LogicalTableView ON T1 = TX LEFT JOIN T2 ON TY = T2, which expands to be identical to the logical table in the proof above. This means that this and other embedded logical views are also proven by the above proof, as are symmetric substructure joins, which also utilize logical tables to perform their nonhierarchical join operation.
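The sheltering effect of right-sided nesting can be observed with a runnable sketch. Two substitutions are made here and should be read as assumptions: SQLite has no UNION JOIN, so an INNER join stands in as the symmetric operation, and the nesting is written with parentheses because SQLite does not accept a delayed ON clause. The column names k1, kx, ky, and k2 and the sample rows are also illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE T1 (k1 INTEGER);
    CREATE TABLE TX (kx INTEGER);
    CREATE TABLE TY (ky INTEGER);
    CREATE TABLE T2 (k2 INTEGER);
    INSERT INTO T1 VALUES (1), (2);
    INSERT INTO TX VALUES (1), (2);
    INSERT INTO TY VALUES (1);       -- no TY row matches TX row 2
    INSERT INTO T2 VALUES (1);
""")

# Symmetric join isolated as a logical table on the right side of the LEFT join.
nested = con.execute("""
    SELECT k1, kx, ky, k2
    FROM T1
    LEFT JOIN (TX INNER JOIN TY ON kx = ky) ON k1 = kx
    LEFT JOIN T2 ON ky = k2
    ORDER BY k1
""").fetchall()

# Same joins written flat: the INNER join now acts on T1 as well.
flat = con.execute("""
    SELECT k1, kx, ky, k2
    FROM T1
    LEFT JOIN TX ON k1 = kx
    INNER JOIN TY ON kx = ky
    LEFT JOIN T2 ON ky = k2
    ORDER BY k1
""").fetchall()

print(nested)  # [(1, 1, 1, 1), (2, None, None, None)]
print(flat)    # [(1, 1, 1, 1)]
```

In the nested form, T1 row 2 is preserved even though the logical table has no occurrence for it. In the flat form, the symmetric join removes that root row, which is the kind of invalidation the nesting is meant to prevent.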

8.7—Embedded Structured View Support Empirical Proof

As explained in Chapter 7, structured views can be seamlessly embedded to form larger structures. It was also shown that logical tables could also be seamlessly embedded. It was stated that structured and logical table views within views are also inherently supported. Let's look at some examples and see why they work. The first example in Figure 8.4 examines embedded left-sided nesting, which occurs with views specified on the left side of the join operation; later examples examine right-sided views.

The first example in Figure 8.4 demonstrates the basic left-sided view source replacement (view expansion) that produces left-sided nesting. As this demonstrates, left-sided nesting is naturally processed left to right without requiring any special internal operations such as table argument stacking for LIFO processing. The second example demonstrates how this natural left-to-right processing handles nested left-sided views, processing them in LIFO fashion (the last nested view source replacement is the first to be processed). This preserves the data modeling semantics of each view, allowing logical table views to be specified on the left side where they can't affect the data structure.

Figure 8.4 Example of nested left-sided view expansion.

Let's now examine some examples of right-sided view source replacement and see how and why it works. Right-sided nesting occurs when views are expanded on the right side of the join operation. The first example in Figure 8.5 demonstrates the basic right-sided view replacement, which produces right-sided nesting. As this example demonstrates, right-sided nesting is not processed left to right, but requires postfix processing and argument stacking, changing the processing order to right to left. This stacking processing will be discussed in further detail in Chapter 9, Section 9.4. The second example demonstrates how this right-sided processing is handled in nested right-sided views. The stacking creates a protected environment that preserves the data modeling semantics of each view, allowing logical table views to also be specified on the right side.

Notice in the second (nested view) examples in Figures 8.4 and 8.5 that the innermost nested views of both are processed first. In Figure 8.4, left-sided views expand their view source to the left as the nested views are expanded when encountered in the nesting processing. This causes them to be executed naturally in LIFO order, as can be plainly seen in the second example in Figure 8.4. In the second example in Figure 8.5, the right-sided expanded views were also executed in reverse (LIFO) order, not because of their placement as in Figure 8.4, but because of right-sided nesting. Right-sided nesting controls execution order by placement of the ON clause, as was first described in Chapter 2 and later in Chapter 7.

Figure 8.5 Example of nested right-sided view expansion.

8.8—Indirect Link Empirical Proof

The next example demonstrates structure control for modeling an indirect link (described in Chapter 6). When processed, it will follow the semantics shown in the data model display below. Notice the existence test used to accomplish the indirect link.

The outer join specification Employee LEFT JOIN Department ON DeptNo = EmpDeptNo LEFT JOIN Dependent ON EmpNo = DpndEmpNo AND DeptNo IS NOT NULL creates this special Employee view.

• Employee is linked directly over Department (via its ON clause).

• Dependent is (then) linked indirectly under Department (via its ON clause existence test).

• Department can exist only if a matching Employee exists.

• Dependent can exist only if a matching Employee exists and a Department exists for the matching Employee.
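The effect of the existence test can be seen by running the indirect-link specification on sample data. As before, this is a sketch with Python's sqlite3 module. The rows and the DpndName column are illustrative assumptions, and the existence test is written as DeptNo IS NOT NULL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Department (DeptNo INTEGER);
    CREATE TABLE Employee   (EmpNo INTEGER, EmpDeptNo INTEGER);
    CREATE TABLE Dependent  (DpndName TEXT, DpndEmpNo INTEGER);
    INSERT INTO Department VALUES (10);
    -- Employee 3 has a dependent but no matching department.
    INSERT INTO Employee VALUES (1, 10), (2, 10), (3, 99);
    INSERT INTO Dependent VALUES ('Ann', 1), ('Bob', 3);
""")

indirect_view = con.execute("""
    SELECT EmpNo, DeptNo, DpndName
    FROM Employee
    LEFT JOIN Department ON DeptNo = EmpDeptNo
    LEFT JOIN Dependent  ON EmpNo = DpndEmpNo AND DeptNo IS NOT NULL
    ORDER BY EmpNo
""").fetchall()

print(indirect_view)  # [(1, 10, 'Ann'), (2, 10, None), (3, None, None)]
```

Bob matches employee 3 on EmpNo, but he is excluded because employee 3 has no Department. Without the existence test, the last row would be (3, None, 'Bob').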

8.9—SQL:1999 and Data Modeling

SQL:1999 is the next ANSI release of SQL, planned for ratification in 1999. This new release is known as the object/relational version of SQL. Adding an object-oriented flavor, it introduces support for abstract data types (ADTs) through the addition of user-defined types (UDTs) and user-defined functions (UDFs). These constructs allow abstract data types to be defined, stored, and processed in SQL. UDFs are externally defined functions that can be invoked by SQL to process UDTs. Private and commercial object libraries can be created to handle and process objects such as multimedia video or medical ADTs that define MRI and X-ray objects, allowing SQL to store and process these powerful and useful new data types.

UDTs can also represent complex data types such as hierarchical structures, and UDFs can process these complex data types. The creation of these complex and abstract data types is performed external to SQL. This method of complex data structure processing does offer an alternative to data modeling and structure processing using the ANSI join operation. UDTs and UDFs are useful for representing and processing less formal, more abstract data types. These tend to be by their nature more specialized static objects. On the other hand, processing data structures using SQL join operations is useful for defining and processing general-purpose hierarchical data structures that can be specified and built in real time if necessary and from many data sources. And since the ANSI SQL join is ANSI standard, the data modeling enabled by it will be available across SQL systems, which is not necessarily true of the UDF structure processing procedures that are not standardized.

SQL:1999 has also introduced the capability to store nested, hierarchically structured data in a row using the new composite types ROW and ARRAY. Because these structures are stored in a single row, the semantics of the data structure cannot be fully utilized by SQL in a nonprocedural way. There are also other limitations of this hierarchical data storage. The structure is fixed, losing its data independence, and substructures cannot be joined to form larger structures.

SQL:1999 is not the only object query language being designed and put forth as a standard. OQL is a database object query language that supports the ODMG model. ODMG is an object model put forth by the Object Data Management Group to supply a standard for object databases. It is a separate standard from SQL:1999, though it does utilize many aspects of SQL. In fact, OQL is based heavily on ANSI SQL. OQL does not support the ANSI SQL outer join facility that supports data modeling, but relies on ODMG's Object Definition Language (ODL), which includes a schema definition capability.

SQL:1999 and ODMG appear as competitive object standardization efforts. SQL:1999 starts from SQL and moves toward objects, while ODMG starts from an object point of view and moves toward SQL and other database platforms. In this regard, ODMG can be thought of as a standard for supporting the heterogeneous processing of multiple platforms. This should enable ODL's language-independent data modeling capability and SQL:1999's data modeling capabilities to be freely mapped to one another.

8.10—What Makes the ANSI Outer Join Unique for Data Modeling

Besides being standardized, the newer outer join operation has two operational characteristics that make it very different from the older nonstandardized outer join. The first characteristic is found in the outer join's flexible syntax that allows it to specify the table join order, and the second is its ability to specify the join criteria at each join point. These capabilities were added because it was found that the table join order can influence the result of outer join operations. This makes the newer standardized outer join more powerful, with the capability to specify data structures with the most complex semantics.

With the flexibility to specify the table join order, nonhierarchical, symmetric join operations such as FULL, INNER, or UNION can be utilized in the construction of hierarchical data structures to form flat virtual tables. The use of nonhierarchical join types is described in Chapter 7. The ability to specify the join criteria at each join point can become necessary when qualifying joins based on values further up the path from the join point. This could lead to a conflicting join clause if placed in a single WHERE clause, as demonstrated in Figure 8.6. These two fairly new capabilities therefore significantly extend SQL's standard operation, allowing the definition of data structures with extremely complex semantics not possible otherwise.
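A small experiment shows why a qualification on a value further up the path belongs in the ON clause at the join point. This sketch is not a reproduction of Figure 8.6. The DeptName and DpndName columns, the sample rows, and the rule that only Sales employees have their dependents attached are all assumptions chosen for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Department (DeptNo INTEGER, DeptName TEXT);
    CREATE TABLE Employee   (EmpNo INTEGER, EmpDeptNo INTEGER);
    CREATE TABLE Dependent  (DpndName TEXT, DpndEmpNo INTEGER);
    INSERT INTO Department VALUES (10, 'Sales'), (20, 'Research');
    INSERT INTO Employee VALUES (1, 10), (2, 20);
    INSERT INTO Dependent VALUES ('Ann', 1), ('Bob', 2);
""")

# Qualification placed at the join point: it limits only the Dependent link.
on_rows = con.execute("""
    SELECT DeptNo, EmpNo, DpndName
    FROM Department
    LEFT JOIN Employee  ON DeptNo = EmpDeptNo
    LEFT JOIN Dependent ON EmpNo = DpndEmpNo AND DeptName = 'Sales'
    ORDER BY DeptNo
""").fetchall()

# Same qualification moved to the WHERE clause: it filters the whole result.
where_rows = con.execute("""
    SELECT DeptNo, EmpNo, DpndName
    FROM Department
    LEFT JOIN Employee  ON DeptNo = EmpDeptNo
    LEFT JOIN Dependent ON EmpNo = DpndEmpNo
    WHERE DeptName = 'Sales'
    ORDER BY DeptNo
""").fetchall()

print(on_rows)     # [(10, 1, 'Ann'), (20, 2, None)]
print(where_rows)  # [(10, 1, 'Ann')]
```

With the test in the ON clause, the Research department and its employee are preserved and only the Dependent link is suppressed. In the WHERE clause, the same test removes the Research path entirely.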

8.11—Data Modeling with Old-Style Outer Joins

It is worth noting that ANSI outer joins that model hierarchical data structures, and that do not require the features unique to the ANSI SQL outer join described in Section 8.10, can be converted to old-style outer joins. This is shown in Figure 8.7, where the Department and Employee hierarchical views have been converted to old-style joins. The plus sign is used in the WHERE clause to specify the table to be preserved.

The data modeling using old-style outer joins in Figure 8.7 is possible because hierarchical structures can be built in any order, top to bottom, bottom to top, or any combination of these two, as demonstrated in Chapter 3. Because of this, the old-style outer joins, which are not capable of specifying the join table order, are capable of modeling simple hierarchical structures. These are one-sided outer joins that do not include symmetric join operations, and whose WHERE clause join conditions must unambiguously define the hierarchical links between link point tables (see Figure 8.6 for an ambiguous WHERE clause example). This data modeling SQL join statement is also not as obvious as the equivalent ANSI SQL join statement. These old-style outer joins can be easily translated into ANSI SQL joins.

Figure 8.6 WHERE clause cannot replace all ON clause uses.

Figure 8.7 Old-style outer joins can perform limited data modeling.

8.12—The New Role of the Inner Join Operation

Originally, the inner join operation was used in every join condition; there was no other choice available. A semantically neutral structure was always produced, whether this was desired or not. With the addition of one-sided outer join operations (LEFT and RIGHT), which specify hierarchical relationships, inner joins take on a new meaning and use. They no longer should be used without regard to data relationships or data structures. With one-sided joins specifying hierarchical relationships, inner joins should only be used to specify relationships that are truly meant to represent equal or balanced relationships. This will produce semantically structured results that accurately reflect the semantics of the data being accessed, which produces more accurate results. So, inner joins have been elevated from not being able to definitively specify a relationship to being able to unambiguously specify an equal or balanced relationship.

8.13—Conclusion

This chapter has presented empirical proof that outer join statements can perform data modeling and structure processing, and demonstrated that views containing structures and logical tables can be used seamlessly in building and modeling complex data structures. It pointed out that because this data modeling capability is possible with standard SQL statements, it can be used safely, can maintain its usefulness when SQL:1999 arrives, and can also become a default standard for database data modeling. It was shown how data modeling outer joins can be generated by constructing them while following the hierarchical data structure, and that it was possible to use older nonstandard-style outer joins to model simple data structures. Finally, this chapter discussed the importance of SQL's inherent data structure processing ability, and how the inner join's role and proper use has changed with the addition of the outer join.

PART III—NEW CAPABILITIES BASED ON OUTER JOIN DATA MODELING

Part III describes advanced SQL capabilities made possible by the ANSI SQL outer join data modeling capability that SQL vendors can offer to users. Chapter 9 introduces the data structure extraction (DSE) technology used to extract the data structure information naturally embedded in ANSI SQL outer join statements. Chapter 10 identifies a number of advanced capabilities made possible by the data modeling capability of the ANSI SQL outer join. Chapter 11 describes the many powerful semantic SQL optimizations that are possible based on the data modeling information available from outer joins. Chapter 12 demonstrates a hierarchical relational processor prototype that operates by utilizing the data structure information from outer join statements. Chapter 13 presents an object relational interface that is based on the data structure information from outer join specifications. Chapter 14 looks at nonrelational SQL-based universal data access frameworks and how outer join processing naturally fits in by using a structured data record interface as an example.

9—Data Structure Extraction (DSE) Technology

CompuAid, a company affiliated with the author, has been researching the ANSI SQL join operation for a number of years. It realized that the outer join operation, which is part of the SQL standard, combines with the powerful ANSI SQL syntax to produce powerful data modeling and data structure processing capabilities. Since SQL previously had no inherent data modeling and data structure processing capabilities, CompuAid also realized this would be of significant benefit to users and vendors if recognized, understood, and properly utilized. To aid in this effort, CompuAid developed the technology described in this chapter.

9.1—Extracting Data Structure Information from the Outer Join

After researching and documenting the ANSI SQL join and its data modeling and data structure processing capabilities, CompuAid developed and patented a data structure extraction (DSE) technology and software. This technology dynamically recovers the data modeling metadata embedded in outer join specifications. This technology makes it possible for SQL vendors to utilize the powerful ANSI SQL join syntax and semantics to support advanced new capabilities not previously possible. The following chapters demonstrate examples of the technology described in this chapter. The hierarchical relational processor example in Chapter 12 is taken from its actual implementation.

A very valuable characteristic of this DSE technology is that it recovers very useful semantic information that is naturally present in standard ANSI SQL join specifications. Using this freely available information, advanced capabilities are possible. These include advanced semantic optimizations, intuitive multitable updates, truly transparent and seamless access to legacy and nonrelational databases, increased flexibility and accuracy in reporting capabilities, and important object-oriented database capabilities such as database navigation and data inheritance. These capabilities are discussed further in Chapter 10, and can result in competitive advantages that are ANSI SQL compatible, consistent with relational technology, and require little or no additional effort on the part of the user.

9.2—DSE Example

The example in Figure 9.1 demonstrates the DSE software processing a complex ANSI SQL outer join statement. It accepts the SQL statement, producing the extracted data structure meta information in table form. The data structure diagram in this example is not produced by the DSE algorithm, but is supplied to help you visualize the data structure. The processed SQL statement in this example is a complex ANSI SQL join specification that contains a combination of left- and right-sided nesting to demonstrate that this complex syntax can be handled properly by the DSE technology.

As shown in Figure 9.1, the DSE technology extracts and presents in table form the data structure meta information that is naturally embedded in ANSI SQL join specifications. The ANSI SQL join is incredibly rich in syntax and processing options, allowing the user the flexibility to combine tables of data in any way necessary to produce the desired semantic result. This results in complex data structures being modeled even though the ANSI SQL join programmer may not realize that he or she is performing data modeling.

Figure 9.1 SQL DSE example.

The DSE technology dynamically determines the data structure by analyzing and interpreting how the outer join statement has been specified, taking into account the table relationships used and general hierarchical data structure concepts and principles that were discussed in Chapters 5 and 6. This data structure extraction is accomplished with no additional or supplemental information supplied by the programmer or SQL system other than what is normally available. This makes capabilities supported by the DSE technology seamless and transparent. The DSE technology also detects invalid structures (see Chapter 6), and can operate dynamically for use with ad hoc (i.e., interactive) and object-oriented uses (i.e., late binding).

9.3—Logical Table Example

To support logical tables, the DSE prototype is extended to represent a logical table in the data structure by modifying its data structure meta information output table while keeping it compatible with the standard format. To define a logical table in the DSE prototype's output, the structure level indication of the first table in the logical table is set as usual to its hierarchical Structure Level in the data structure. The other tables in the logical table have their Structure Levels set to zero. This indicates and delimits a logical table entry.

The Parent No. of the first table to be joined in a logical table points to the logical table's parent in the hierarchical structure being defined. The Parent No. for the other tables in the logical table specifies the table in the logical table that directly precedes their joining. This indicates the logical table's join table order, which may be important for nonhierarchical logical tables. As shown in Figure 9.2, the tables in a logical table are stored contiguously and in the order they are joined. With this method of specifying logical tables, more than one logical table can be represented in a data structure.
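The encoding just described can be sketched as a small Python structure. This is not the DSE prototype's actual output format, which appears only in Figure 9.2. The MetaRow type, the table_no field, the use of 0 as the root's Parent No., and the sample rows (including the assumption that T2 points at TY because its ON clause tests TY) are all hypothetical. Only the Structure Level and Parent No. conventions come from the text.

```python
from typing import List, NamedTuple


class MetaRow(NamedTuple):
    table_no: int          # hypothetical row identifier
    table: str
    structure_level: int   # 0 marks a non-first member of a logical table
    parent_no: int         # 0 is assumed here to mean "no parent" (the root)


def group_logical_tables(rows: List[MetaRow]) -> List[List[str]]:
    """Group contiguous rows into structure entries.

    A row with a nonzero Structure Level opens a new entry; each following
    row with Structure Level 0 belongs to the entry opened before it.
    """
    entries: List[List[str]] = []
    for row in rows:
        if row.structure_level != 0 or not entries:
            entries.append([row.table])
        else:
            entries[-1].append(row.table)
    return entries


# Assumed encoding of T1 over the logical table TX | TY over T2.
rows = [
    MetaRow(1, "T1", 1, 0),
    MetaRow(2, "TX", 2, 1),   # first table of the logical table: real level, parent is T1
    MetaRow(3, "TY", 0, 2),   # other member: level 0, parent is the table joined before it
    MetaRow(4, "T2", 3, 3),   # assumed to hang under TY
]

entries = group_logical_tables(rows)
print(entries)  # [['T1'], ['TX', 'TY'], ['T2']]
```

Because members of a logical table are stored contiguously, a single pass over the rows is enough to recover every logical table, however many the structure contains.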

9.4—Symmetric Linking of Data Structures Example

Similar to the way logical tables can be formed by symmetric join operations as shown in Section 9.3, data substructures can also be joined symmetrically, as documented in Chapter 7. This is demonstrated in Figure 9.3. In this example, the substructures are built inline, but they could have been expanded in the same fashion as if they were referenced stored structure views. Because substructures that are symmetrically joined can only be linked at their root table, the example in Figure 9.3 covers the only situation possible for this type of linking. Notice how the generated hierarchical structure meta information remains top-down, indicating that linking of the root substructure tables X and M can be performed before their associated substructures are built. So, this symmetric data structure join is represented in the structure meta information the same way that the logical table was in Section 9.3.

Figure 9.2 Logical table DSE example.

Figure 9.3 Symmetric data structure linking DSE example.

9.5—DSE Internal Logic

As should be apparent by now, the ANSI SQL outer join has the syntax and semantics necessary to define and process complex data structures. This includes ON clauses, which specify the join condition at each join point. Extracting the data structure meta information from the complex syntax and semantics used to define data structures requires parsing the join statement and mapping the data structure as the statement is processed. The LEFT and RIGHT joins specify the hierarchy between the two table arguments, and the ON clauses specify the link point between the two table arguments. With LEFT joins, the left table argument has the upper position, and with RIGHT joins, the right table argument has the upper position.

As mentioned many times already, right-sided nesting triggered by delaying ON clauses requires stacking the join table arguments and join type. When an ON clause is encountered while parsing the join statement, its matching right and left table arguments on top of the stack are linked using the ON clause criteria as defined in Chapter 6. At times during the parsing process, multiple separate structures can be defined because of right-sided nesting, which starts a new substructure and working set to contain it, as described in Chapter 7. But at the completion of parsing the join statement, all the separate structures will have been combined so that only one structure will have been mapped. This mapped structure is then represented in table form, as shown in Figures 9.1 to 9.3.
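The stacking logic described above can be illustrated with a toy extractor. This is a simplified sketch and not CompuAid's DSE algorithm. It assumes the join statement has already been tokenized into a hypothetical form in which each ON clause is reduced to the two table names it links, it handles only LEFT and RIGHT joins, and it performs none of the Chapter 6 validity checks.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical pre-tokenized form of a join specification:
#   ("TABLE", name) | ("JOIN", "LEFT" or "RIGHT") | ("ON", table_a, table_b)
Token = Tuple[str, ...]


class Structure:
    """A (sub)structure mapped so far: its root table and every table it holds."""

    def __init__(self, root: str) -> None:
        self.root = root
        self.tables: Set[str] = {root}


def extract_parents(tokens: List[Token]) -> Dict[str, str]:
    """Return {child table: parent table} for a statement of one-sided joins."""
    parents: Dict[str, str] = {}
    stack: list = []                      # holds Structure objects and join types
    for token in tokens:
        kind = token[0]
        if kind == "TABLE":
            stack.append(Structure(token[1]))
        elif kind == "JOIN":
            stack.append(token[1])
        elif kind == "ON":
            # The ON clause closes the join whose arguments are on top of the stack.
            right, join_type, left = stack.pop(), stack.pop(), stack.pop()
            upper, lower = (left, right) if join_type == "LEFT" else (right, left)
            link = next((t for t in token[1:] if t in upper.tables), None)
            if link is None:
                raise ValueError(f"ON clause {token} does not reference the upper structure")
            parents[lower.root] = link    # the lower structure hangs under the link table
            upper.tables |= lower.tables
            stack.append(upper)           # the combined structure replaces both arguments
        else:
            raise ValueError(f"unknown token {token}")
    if len(stack) != 1:
        raise ValueError("statement did not reduce to a single structure")
    return parents


# Department LEFT JOIN Employee ON ... LEFT JOIN Dependent ON ... (left-sided nesting)
department_view = [
    ("TABLE", "Department"), ("JOIN", "LEFT"), ("TABLE", "Employee"),
    ("ON", "Department", "Employee"),
    ("JOIN", "LEFT"), ("TABLE", "Dependent"),
    ("ON", "Employee", "Dependent"),
]

# T1 LEFT JOIN TX LEFT JOIN TY ON TX = TY ON T1 = TX (right-sided nesting, delayed ON)
nested = [
    ("TABLE", "T1"), ("JOIN", "LEFT"), ("TABLE", "TX"),
    ("JOIN", "LEFT"), ("TABLE", "TY"),
    ("ON", "TX", "TY"),
    ("ON", "T1", "TX"),
]

department_parents = extract_parents(department_view)
nested_parents = extract_parents(nested)
print(department_parents)  # {'Employee': 'Department', 'Dependent': 'Employee'}
print(nested_parents)      # {'TY': 'TX', 'TX': 'T1'}
```

In the nested case, the TX and TY substructure is mapped first while T1 waits on the stack, and the two separate structures are combined into one when the delayed ON clause is reached.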

When a symmetric join operation such as a FULL, INNER, or CROSS join is detected, the existence of logical tables and symmetrically joined data structures is checked. If found, they are processed as described in Sections 9.3 and 9.4 to produce a valid hierarchical data structure. All tables joined in a logical table are given the same hierarchical level number, which identifies a flat logical table. Symmetrically joined substructures are reordered so their root-level symmetric join is performed first, making it a logical table and defined as just stated above. With this logical table in place, symmetrically joined structures do not require any other special definition in the produced data structure meta information.

9.6—Why Vendors Need the DSE Technology

Adding new features and capabilities to SQL products to differentiate them from other similar products on the market is a necessity for SQL product vendors, but presents the problem of introducing nonstandardized, proprietary SQL. The DSE technology is a building block technology that allows the easy addition of powerful new ANSI SQL-compatible features and capabilities that eliminate or greatly reduce this non-SQL standardization problem. It can also significantly help with the problem of poor efficiency with the ANSI SQL outer join operation, and in many cases can bring its efficiency up to that of the older standard inner join. Outer join specifications with questionable (i.e., ambiguous) data structure semantics are also detected. Lastly, with this data structure meta information freely available, it makes good business sense to put it to use.

9.7—DSE Avoids Imposing Data Structures on SQL

The concept and technique of using SQL for universal data access is quite well accepted and utilized. This includes using SQL to access pre- and postrelational data. Flat nonrelational structures do not present a problem, but structured nonrelational structures do introduce the problems of data mapping and database navigation, which require access to data structure meta information. Up until the availability of the DSE technology, specifying or communicating the data structure meta information to a SQL-based nonrelational processor had to be performed externally to the SQL access request.

This method of externally supplying the data structure meta information has two obvious problems. First, its specification and transport are proprietary. Second, it does not necessarily reflect the true semantics of the SQL it is supposed to be modeling. This is because the SQL specification is often limited to inner joins, which can only model flat data structures. This results in a mismatch between the flat SQL-defined structure and the very structured externally supplied data structure meta information, preventing a totally seamless interface. If the SQL specification is composed of outer joins that are modeling the true physical data structure, the externally supplied data structure meta information is not necessary. This is because the DSE technology can automatically supply this meta information when needed and do it using a standard ANSI SQL solution. This naturally extends the plug-and-play capabilities of standardized SQL.

There is a third, less obvious problem lurking when imposing a data structure on a SQL specification. This occurs when the SQL specification contains one-sided outer join operations that do not model the externally supplied data structure meta information. In this case, there can be a conflict between the externally supplied data structure meta information and the data structure being naturally modeled by the SQL specification. This will produce semantics that do not match either the SQL specification or the imposed externally supplied data structure meta information. This mismatch will often produce erroneous results. The best solution all the way around is to use the natural data modeling capability of outer joins and the DSE technology to supply the data structure meta information wherever and whenever it is needed. Since the DSE technology is deriving the data structure meta information directly from the SQL, its data structure meta information is always accurate, with little or no chance for error.

9.8—Conclusion

The DSE technology developed by CompuAid proved that it is possible to dynamically extract the data structure meta information embedded in ANSI SQL join specifications. These hierarchical data structures can also utilize nonhierarchical, symmetric join operations in their definition to support logical tables and symmetric substructure joins. What makes this technology unique is that it is fully ANSI SQL compatible (both syntactically and semantically), which enables SQL features not previously possible with standard relational databases. It was also shown why this technology offers the best solution to supplying data structure meta information to SQL-based data access drivers and processors.

The following chapters will demonstrate how this dynamically supplied meta information provided by the DSE technology can be utilized to create new products and features. These features include powerful semantic optimizations, seamless legacy access, object capabilities, postrelational processing, and plug-and-play capabilities.

10—Outer Join Advanced Capabilities

This chapter presents advanced capabilities that SQL vendors can implement for their users by utilizing the data modeling and data structure processing capabilities of the ANSI SQL outer join operation. The advanced capabilities are made possible by dynamically extracting the data structure meta information from ANSI SQL outer join specifications. This data structure meta information is free information, placed in the outer join specification either knowingly or unknowingly by the programmer of the outer join specification. It can be extracted for the SQL product's use by a DSE procedure like the one documented in Chapter 9. With this information, the advanced database capabilities covered in this chapter are possible.

10.1—Database Navigation

Database navigation is not useful by itself, but is required to accomplish many of the advanced capabilities presented in this chapter. Database navigation is the ability to move through the database utilizing its data structure. With relational databases, this is not necessary since they are navigationless, not requiring manual navigation. In other words, the database system automatically navigates for the user, which is standard for fourth-generation languages (4GLs) like SQL. There is a tradeoff with navigationless access: you lose control, but the access can still be highly optimized.

Obtaining the meta information extracted from the outer join specification enables navigational instructions to be generated for nonrelational access, as demonstrated in Figure 10.1. These navigation instructions can be optimized since the entire portion of the data structure being accessed can be determined before being accessed. These navigational instructions can be used to access any database that supports hierarchical access. The extracted data structure can be a logical structure composed of more than one physical type of database so that support for disparate heterogeneous databases and enterprise-wide access is also possible. When navigating physical databases, the order of sibling legs, such as B before C in Figure 10.1, may be important. It is useful to realize that the database navigation process described here can be performed dynamically.
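As a rough illustration of generating navigation instructions from an extracted structure, the sketch below walks a small hierarchy top-down and emits one step per table, preserving sibling order. The structure (A over B and C) is assumed from the sibling-leg remark above, and the instruction strings `GET ROOT` and `GET CHILD` are hypothetical placeholders, not the call syntax of any real nonrelational database.

```python
# Hypothetical extracted structure: parent -> ordered list of child legs.
STRUCTURE = {"A": ["B", "C"], "B": [], "C": []}

def navigation_plan(structure, root):
    """Return pre-order navigation steps for the structure under `root`."""
    steps = [f"GET ROOT {root}"]

    def visit(parent):
        for child in structure[parent]:  # sibling order is preserved (B before C)
            steps.append(f"GET CHILD {child} OF {parent}")
            visit(child)

    visit(root)
    return steps

plan = navigation_plan(STRUCTURE, "A")
```

Because the whole structure is known before any access takes place, a plan like this can be inspected and trimmed before it is executed.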

10.2—Access Optimizations

The data structure semantics that are derived by the extracted data structure meta information from the outer join specification can be used by the database engine to perform many powerful semantic optimizations that are not possible otherwise. The most significant is the dynamic removal of unnecessary tables from outer join views based on which table columns are selected at view invocation. This is demonstrated in Figure 10.2, where the dashed blocks represent tables that do not require access. This optimization is not possible for inner join views, which must always access each table in the view, but it is possible for outer join views, taking into consideration where each table in the view is located in the data structure. This optimized view capability dynamically "downsizes" outer join views, so there is never a penalty for including too many tables in a view. In fact, this feature should reduce the number of views necessary, making life easier for database professionals and end users querying the database. This and many other powerful outer join optimizations are covered further in Chapter 11.

Figure 10.1 The outer join can enable universal database navigation and access.

Figure 10.2 Outer join view dynamic optimization based on selection criteria.

10.3—Enterprise and Legacy Database Access

The outer join syntax is not limited or tied to relational databases. By using the database navigation ability described earlier in Section 10.1, enterprise, legacy, and postrelational databases can be accessed in any combination by utilizing the data modeling capabilities of the ANSI SQL outer join syntax. This is demonstrated in Figure 10.3, and can be performed dynamically via user interaction to support ad hoc queries. Since the outer join can precisely define hierarchical structures, only one-to-one mapping is necessary to access hierarchical nonrelational databases, allowing efficient and truly seamless access. And since the data structure definition can be specified dynamically using the outer join syntax, and supplied dynamically by the DSE procedure, no external predefined data structure definition is necessary. With the data structure meta information in hand, nonrelational database calls or language statements can be dynamically constructed and performed. This was demonstrated in Figure 10.1. For more detailed information on nonrelational access, see Chapter 14.

Nonrelational data access can actually be made more efficient using SQL. Since SQL is a 4GL, also known descriptively as a declarative language, its access statements do not instruct how to access the database, but rather what is desired from the database. This means that all the information needed to know how to access the database is known beforehand, allowing an efficient global access strategy to be developed. Because of this, very efficient access can be achieved, as in the example in Figure 10.2, which can also be applied to nonrelational databases. Nonrelational optimized SQL access is described in more detail in Chapter 11, and nonrelational heterogeneous SQL access is described further in Chapter 14.

Figure 10.3 Disparate database access is possible with the outer join.

10.4—Open Database Access Interface

The ANSI SQL outer join operation makes a powerful "open" database access interface because it is supported by most SQL vendors, it is standardized, and its syntax is free to use. It can also perform complex ad hoc data structure processing and define access for most database types, and it automatically carries the data structure meta information within it, making it very useful for database access over the Internet. These features make the data structure meta information available to all procedures that process the outer join, as illustrated in Figure 10.4. By carrying the data structure meta information within it, the outer join interface avoids passing this information around using an arbitrary method and format. This also enables the standardization of powerful plug-compatible database components, allowing data structure meta information to be mixed and matched.

10.5—Seamless Value-Added Features

The data structure modeling capability of the ANSI SQL outer join can support many value-added features in SQL that are based on the data structure specified by the outer join operation. These include more accurate aggregate functions that can occur anywhere in the data structure and do not include replicated data values in the results, more flexible aggregate operations where the range of input columns is controlled naturally by the data structure, and easing of syntax limitations. An example of more flexible and accurate syntax is shown in Figure 10.5. Summary results are taken at multiple locations in the data structure, and the WHERE and HAVING clauses allow a two-level filtering where rows can be filtered before being summed and then filtered on their summed value. Additionally, the use of this advanced summary processing in the HAVING clause has avoided the need for a nested SELECT statement.
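The effect of replicated data values on an aggregate can be seen with a small experiment. The sketch below uses Python's built-in sqlite3 module; the tables, columns, and data are hypothetical and are not the example from Figure 10.5. A flat outer join replicates an employee row once per dependent, which inflates a salary total, while a total taken at the Employee level of the structure counts each employee once. The second query only emulates that structure-aware behavior with a nested SELECT, which is exactly the workaround the text says a structure-aware engine could avoid.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Employee  (EmpNo INTEGER PRIMARY KEY, EmpName TEXT, Salary INTEGER);
CREATE TABLE Dependent (DpndNo INTEGER PRIMARY KEY, DpndEmpNo INTEGER, DpndName TEXT);
INSERT INTO Employee  VALUES (1, 'Mike', 100), (2, 'Ann', 200);
INSERT INTO Dependent VALUES (10, 1, 'Sam'), (11, 1, 'Kim');
""")

# Flat result: Mike's row appears once per dependent, so his salary is summed twice.
flat_total = con.execute("""
    SELECT SUM(Salary)
    FROM Employee LEFT JOIN Dependent ON EmpNo = DpndEmpNo
""").fetchone()[0]

# Emulated structure-aware total: one Salary value per Employee occurrence.
structured_total = con.execute("""
    SELECT SUM(Salary)
    FROM (SELECT DISTINCT EmpNo, Salary
          FROM Employee LEFT JOIN Dependent ON EmpNo = DpndEmpNo)
""").fetchone()[0]
```

With this data the flat total is 400 and the structure-aware total is 300, the true sum of the two salaries.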

Figure 10.4 Outer join open database access interface.

Figure 10.5 Multiple summaries taken at different locations in the data structure.

10.6—Data Warehouse Interface

Data warehouse access requires ad hoc data modeling and data structure processing. What data warehousing is missing is a standardized access interface that can supply this. The outer join syntax can supply this and do it dynamically via dynamic SQL, as indicated in Figure 10.6. With the outer join's enterprise access capability discussed in Section 10.3, the data warehouse can be composed of nonrelational databases, too.

10.7—Hierarchical Relational Processing

Hierarchical relational processing is the processing by SQL of relational and nonrelational data in a structured hierarchical fashion, such as COBOL's processing of structured data records. Normally, this required the data to be stored in a non-first normal form (structured or nested format), doing away with relational's flat two-dimensional table limitation. Unfortunately, this meant that the data structure was fixed and had to be defined beforehand. But with the outer join's data modeling and structure processing ability, this hierarchical relational processing can also be performed on standard SQL systems by processing standard first normal form tables as hierarchical data structures, and without requiring that the data structure be predefined. The outer join specification can specify and hierarchically process any possible hierarchical data structure that relational data tables and fixed nonrelational databases can logically define. This feature can be considered data structure independence.

Figure 10.6 The outer join can access unlimited views from data warehouse repository.

Outer join hierarchical relational processing operates seamlessly, and precisely matches its defined hierarchical semantics. This hierarchical relational processing can perform powerful semantic operations, avoid unnecessary data replications, support advanced summary functions, produce more accurate and flexible summary operations, and display the data in a structured WYSIWYG format that accurately reflects its data structure, as shown in Figure 10.7. If this sounds too good to be true, a prototype using the DSE technology described in Chapter 9 was built, and live examples from it are shown in Chapter 12.

10.8—Object Relational Interface

One of the main problems with object databases has been the lack of a standardized object database interface. A standard and familiar relational database interface would make an excellent interface except for its total lack of data modeling and data structure processing ability, which is an important requirement for object databases. With the outer join's data modeling and structure processing capability added, it would make an excellent standardized and familiar object interface, such as the one shown in Figure 10.8.

Figure 10.7 Hierarchical relational display compared to standard SQL display.

Besides being able to read and write complex relational and nonrelational data structures directly, avoiding relational-to-object mapping, an object relational outer join interface can also support dynamic specification of the data structure through dynamic execution. This enables late binding and polymorphism, support of data abstraction, reuse through its substructure view support (described in Chapter 7), and the support of legacy database access as described earlier in Section 10.3. The outer join object relational interface is covered in more detail in Chapter 13.

10.9—View Update Capability

Updating of join views is not usually supported in SQL. This is because multiple tables are involved, making the join operation ambiguous for updating since its join result is usually exploded because of the Cartesian product effect. This makes it very difficult to know how to apply the result back to the underlying base tables. But when the outer join is used to define valid hierarchical data structures, it can be possible to update multitable views unambiguously and intuitively by following the unambiguous semantics of hierarchical data structures. This also means that these same update semantics can be applied seamlessly across a heterogeneous logical database composed of relational and nonrelational databases.

Figure 10.8 Object relational interface can read and write structured data.

An example of why the inner join view has difficulty being updated can be seen in an inner join view consisting of the Department and Employee tables. Updating this view is very difficult because of its ambiguous semantics. If a department is deleted, are the employees also deleted? What happens if an employee is deleted? Don't be influenced by any meaning attached to the table names; try renaming the tables X and Y. The reason for this ambiguity is that there are no data structure semantics associated with the inner join. This was described in Chapter 1. In contrast are hierarchical views, which can be created by outer joins, such as those in Figures 10.9 and 10.10.

Updating outer join views where the Department table is hierarchically over the Employee table or the Employee table is hierarchically over the Department table is not ambiguous. In Figure 10.9, the effects of deleting a department in these two outer join views are intuitive. In the Department view, the associated employees and dependents would also be deleted along with the department. In the Employee view, only the affected department would be deleted. In Figure 10.10, deleting an employee in the same two views as Figure 10.9 has a different effect, which is also intuitive. In the Department view, the employee and the associated dependents would be deleted, not the associated department. In the Employee view, the employee and the associated department and dependents would be deleted. All of these update operations use the outer join's defined hierarchical semantics, which are intuitive and fairly universal.
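The delete rule described above (a delete removes the target and everything hierarchically below it in the view, and nothing above it) can be sketched in a few lines. The dictionaries below are an assumed encoding of the two views at the table level; a real implementation would apply the same rule to individual row occurrences, and the function name `delete_cascade` is illustrative only.

```python
def delete_cascade(view, node):
    """Return the node plus everything hierarchically below it in the view."""
    doomed = [node]
    for child in view.get(node, []):
        doomed.extend(delete_cascade(view, child))
    return doomed

# Two outer join views over the same tables, encoded as parent -> children.
DEPARTMENT_VIEW = {"Department": ["Employee"], "Employee": ["Dependent"]}
EMPLOYEE_VIEW = {"Employee": ["Department", "Dependent"]}

dept_in_dept_view = delete_cascade(DEPARTMENT_VIEW, "Department")
dept_in_emp_view = delete_cascade(EMPLOYEE_VIEW, "Department")
emp_in_dept_view = delete_cascade(DEPARTMENT_VIEW, "Employee")
emp_in_emp_view = delete_cascade(EMPLOYEE_VIEW, "Employee")
```

The four results line up with the four cases in the text: the same delete request affects different tables depending only on which table the view places on top.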

10.10—Multimedia Application Directory Support

Multimedia databases are more than standard databases with multimedia features and capabilities. Multimedia databases are specialized. Their purpose is to aid in the support of multimedia-centric applications such as interactive kiosks. This support extends not only to multimedia storage and playback, but also to the production of the multimedia application, which can be extensive, consisting of media acquisition, classification, and organization. To support these functions, a hierarchical directory or modeling system is necessary to catalog and organize the multitude of multimedia audio and video clips. Since multimedia applications are usually interactive and user-driven, the flexibility of a hierarchical structure organization is necessary.

Figure 10.9 Deleting a department from different views produces different results.

Figure 10.10 Deleting an employee from different views produces different results.

As an example of such a multimedia application, Figure 10.11 shows the database model and SQL definition of a video book. This book can be viewed sequentially at several different academic levels, or as a reference using hyperlinks from the contents or index to access the stored multimedia data.

The application view in Figure 10.11 is an example of a simplified multimedia application view. Its design allows for both the organized production of the multimedia application and for the flexible interactive operation (i.e., playback) of the application. A clip shown in the data model is usually made up of a sequential series of video frames, and a scene can be made up of a series of clips. A section can be made up of a number of scenes, and a chapter is composed of a number of sections.

This data model allows the flexibility of rearranging portions of the video very easily, and the access can be very efficient regardless of the number of tables because of the outer join optimizations (covered in Section 10.2 and later in Chapter 11). This model is general enough to handle many different multimedia books, and they can be easily modified without having to change the application that processes the data. For example, chapters and scenes can be added, moved, or deleted without changing the multimedia application. Multimedia databases supply this data independence. When multimedia applications lack a database, the data structure is buried in the application, where its value is lost. Multimedia databases organize multimedia around a data model, making it available to many applications, thereby avoiding the time-consuming production phase and increasing reuse of resources.

Figure 10.11 Multimedia book hierarchical directory example.

Multimedia authoring systems that assist the user in building interactive multimedia applications are missing this type of multimedia database capability. One reason for this is that they use only a single unchangeable operational metaphor. One such metaphor casts the author of the multimedia application as the director of a play, manipulating the multimedia components as the cast and props around the stage, which is the screen. This works fine if the metaphor matches the application, but can be awkward when it does not. A solution is to integrate a multimedia database as described above into the multimedia authoring system and use the data model defined by the author as the operational metaphor. In this way, the operational metaphor and the defined data model are tightly integrated, as are the playback and production components.

This dynamic data modeling metaphor ability becomes more important when it is realized that multimedia data is just a small subset of a larger classification of data, known as abstract data or abstract data types (ADTs). Multimedia databases and authoring systems can easily store and utilize all forms of abstract data types, such as fingerprints, X-rays, EKGs, and MRIs. Applications based on these abstract data types can be very different from multimedia applications, but can still be data modeled in their own unique way using the data modeling capability shown in Figure 10.11.

10.11—Universal Data Access of Structured Data

Universal data access has become quite popular with its standard platforms such as OLE-DB, ODBC, and JDBC. These platforms are SQL-based, using SQL as the database interface language. What is missing from these platforms is a standard way to supply structured meta data, which is required for database navigation. The solution is to use the supplied SQL itself to supply the data structure meta information. This not only supplies the data structure automatically, but the mapping is always accurate and in an efficient one-to-one mapping. This method utilizes the enterprise and legacy access, and open database access interfacing capabilities described earlier in Sections 10.3 and 10.4.

The diagram in Figure 10.12 demonstrates graphically how the data structure meta information is automatically passed from the universal data access platform to the data provider component that performs the structured data access. The data provider component uses the data structure extraction technology described in Chapter 9 to retrieve the data structure meta information from the SQL specification. Chapter 14 goes into this topic in more detail.

It is important to realize that the ANSI SQL join data modeling capability is based totally on the outer join's standard syntax and semantics. This data modeling capability exists inherently in the ANSI/ISO SQL standard, and is operating automatically all the time. This means that any other approach used to supply the data structure of a SQL query could be in conflict with the data modeling occurring naturally with externally supplied outer join specifications, and this could produce incorrect results.

This data structure conflict can be eliminated by generating data modeling SQL from the externally supplied data definition, thereby introducing SQL that accurately models the data structure, and from which the data structure can be extracted at any time and location. The diagram in Figure 10.12 demonstrates this system design.

Figure 10.12 Integrating external data definitions with data modeling SQL.

10.12—The SQL XML Data Structure Connection

The Internet's HTML protocol is evolving into XML. The Extensible Markup Language protocol has many new capabilities, including the storing and processing of structured data. The format for data documents is in the form of hierarchical structures. This means that structured data in databases and structured data in XML Web sites can be moved back and forth using ANSI SQL with its join data modeling capability. This is shown in Figure 10.13. Notice that the data is stored with its meta structure definition. Any hierarchical structure can be specified with an XML definition. The Employee view was chosen in this example to demonstrate how multiple legs and multiple levels can be specified. The elements of the XML definition are nested by following the hierarchical structure.
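To make the nesting rule concrete, here is a minimal sketch that turns one occurrence of an Employee view (Employee over the Department and Dependent legs) into nested XML using Python's standard xml.etree.ElementTree module. The element names, column names, and data are assumptions for illustration and are not taken from Figure 10.13.

```python
import xml.etree.ElementTree as ET

# Hypothetical occurrence of the Employee view: Employee over Department and Dependent.
employee = {
    "EmpName": "Mike",
    "Department": [{"DeptName": "Sales"}],
    "Dependent": [{"DpndName": "Sam"}, {"DpndName": "Kim"}],
}

def to_xml(tag, occurrence):
    """Nest child legs inside the parent element, following the hierarchy."""
    elem = ET.Element(tag)
    for key, value in occurrence.items():
        if isinstance(value, list):      # a child leg: one element per occurrence
            for child in value:
                elem.append(to_xml(key, child))
        else:                            # a plain column value
            ET.SubElement(elem, key).text = str(value)
    return elem

xml_text = ET.tostring(to_xml("Employee", employee), encoding="unicode")
```

Each leg becomes a run of sibling elements under its parent, so the two dependents appear as two `Dependent` elements inside the single `Employee` element rather than as two replicated employee rows.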

XML and SQL's ability to define and process hierarchical structured data has two powerful uses. The first is used today to dynamically transfer data from databases into Web sites. This technique is greatly improved by SQL's ability to dynamically transfer structured data from any combination of database sources into an XML Web site, where it can be utilized as structured data by XML. As shown in Figure 10.14, SQL is invoked by the browser to transfer data into the Web site in XML format.

The second use of SQL for XML Web sites is a new capability made possible by XML. It is the capability to treat XML Web sites as databases, where SQL can access their structured data along with other databases for retrieval or even update, as shown in Figure 10.15. This means that static XML Web sites do not have to be a closed system limited to Web browsers. They can be open to disparate and heterogeneous database access controlled by SQL. After all, they are a form of database.

XML structured data is hierarchically structured, usually contiguous, data. For this reason, it is analogous to structured data stored in files as records and can be accessed in the same fashion. SQL-based structured data access is shown in Chapter 14 and can be easily adapted to handle XML data.

XML data defining a hierarchically structured document or data located in a Web page can be considered a contiguous structured record that we will call "a structured Web record." This structured Web record has data structure control information embedded in the data just as a structured file record does. Like a structured record in a file, a structured Web record can be combined with other types of database data to form a larger heterogeneous hierarchical structure.

Structured records are located or addressed by a root-key field value. This can be accomplished with structured Web records by assigning their root-key field value as the Web page URL address. In this way, a structured Web page can be directly addressed by SQL or joined to from other record types in the heterogeneous virtual structure using their foreign-key field value.

Figure 10.13 Structured data can be moved accurately between SQL and XML.

Figure 10.14 SQL can move structured data dynamically into an XML Web site.

Figure 10.15 SQL can treat XML Web sites like any other database.

10.13—Conclusion

The data structure meta information that is extracted by the DSE technology is extremely valuable. It has the potential of supporting many powerful new SQL features and capabilities not previously possible. Many of these were identified in this chapter, such as optimization, object relational interface support, view update capability, hierarchical relational processing, seamless legacy database access, and direct access to XML Web sites. The main enabler of these capabilities is the database navigation and processing of data structures. While these are global solutions, there is also the potential for specific solutions or features that can extend or complement individualized products.

11—Outer Join Optimization

The ANSI SQL join operation is more difficult to optimize with its ON clauses and outer join operations than the simpler common inner join. With the common inner join, its tables can be freely reordered to best optimize access. With the ANSI SQL join, this ability is constrained by its ON clauses. Working within the constraints of the ON clauses, INNER and FULL joins can each be reordered in any order because they are both commutative and associative in operation. The one-sided outer join is not commutative; its tables cannot be freely reordered. But hierarchictivity can play a role in optimization. This chapter explores the hierarchical semantics of the one-sided outer join for use in optimization.

11.1—Join Table Reordering

With the outer join, some table reordering is possible and recommended for efficiency. Take for example the Department view, which can be built top-down or bottom-up. Normally, hierarchical structures are built top-down, but when subviews are used, as were shown in Chapter 7, right-sided nesting can cause the structure to be built bottom-up. Top-down execution is more efficient than bottom-up execution because bottom-up execution can cause throwaways. Throwaways are rows that are retrieved into the working set and then later discarded. For example, using the data structure shown in Figure 11.1, throwaways occur when the Dependent table is joined with the Employee table and the result is then joined with the Department table, where unmatched employees are discarded with their dependents. These dependents are throwaways.

Figure 11.1 Join table reordering optimization example.

Throwaways are avoided when the structure is processed top-down since unmatched employees are discarded before their dependents are retrieved and stored. While subviews may cause throwaways, the SQL engine is free to rewrite the expanded query before its execution to change the join table order from bottom-up to top-down, as shown in Figure 11.1.
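The cost difference between the two build orders can be illustrated by counting how many dependent rows each one retrieves. This is a simplified in-memory sketch with made-up data and assumed column names (only DeptNo and EmpDeptNo appear in the book's examples); it models the Department view, where an employee without a matching department does not belong in the result.

```python
DEPARTMENTS = [{"DeptNo": 1}]
EMPLOYEES = [{"EmpNo": 10, "EmpDeptNo": 1},
             {"EmpNo": 20, "EmpDeptNo": 9}]   # employee 20 matches no department
DEPENDENTS = [{"DpndEmpNo": 10}, {"DpndEmpNo": 20}, {"DpndEmpNo": 20}]

def bottom_up():
    """Join Employee with Dependent first, then with Department.

    Returns (dependents retrieved, dependents thrown away).
    """
    dept_nos = {dept["DeptNo"] for dept in DEPARTMENTS}
    retrieved = [(emp, dep) for emp in EMPLOYEES for dep in DEPENDENTS
                 if dep["DpndEmpNo"] == emp["EmpNo"]]
    kept = [(emp, dep) for emp, dep in retrieved if emp["EmpDeptNo"] in dept_nos]
    return len(retrieved), len(retrieved) - len(kept)

def top_down():
    """Start at Department; read dependents only for employees that matched."""
    retrieved = 0
    for dept in DEPARTMENTS:
        for emp in EMPLOYEES:
            if emp["EmpDeptNo"] != dept["DeptNo"]:
                continue  # unmatched employee is skipped before its dependents are read
            retrieved += sum(1 for dep in DEPENDENTS if dep["DpndEmpNo"] == emp["EmpNo"])
    return retrieved
```

With this data the bottom-up order retrieves three dependents and discards two of them, while the top-down order retrieves only the one dependent that ends up in the result.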

11.2—Dynamic Shortening of the Access Path

Dynamic shortening of the access path is an optimization that should automatically be performed along with the join table reordering optimization specified in Section 11.1. This optimization works when the data structure is being processed top to bottom, which it will be if the table reordering has been performed as described above. Dynamic path shortening occurs when a hierarchical active path runs out of data before reaching its end. In this case, access further down the path can be skipped for the current parent occurrence. For example, in the Department view shown in Figure 11.1, this can occur when a department has no employees since it makes no sense to go any further down the active path after dependents. Furthermore, this path can have multiple subpaths that can also be eliminated. Figure 11.2 demonstrates this dynamic path shortening.
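A small sketch of path shortening during a top-down walk is given below. The structure, keys, and the `DATA` lookup table are hypothetical; the point is only that when a leg returns no occurrences for the current parent, no access is attempted for anything below that leg.

```python
STRUCTURE = {"Department": ["Employee"], "Employee": ["Dependent"], "Dependent": []}

# Hypothetical child occurrences keyed by (child table, parent key).
DATA = {
    ("Employee", "D1"): ["E1"],
    ("Dependent", "E1"): ["P1", "P2"],
    # Department D2 has no employees, so nothing is stored below it.
}

def walk(table, key, lookups):
    """Visit one occurrence top-down, recording every access attempt."""
    for child in STRUCTURE[table]:
        lookups.append((child, key))                   # one access for this leg
        for child_key in DATA.get((child, key), []):   # empty leg: the path ends here
            walk(child, child_key, lookups)

lookups = []
for dept in ["D1", "D2"]:
    walk("Department", dept, lookups)
```

For department D2 the only access recorded is the Employee lookup; the Dependent table is never touched for that parent occurrence.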

11.3—Removal of Unnecessary Tables from Outer Join View

When a SQL inner join view is invoked, all tables in the view must be accessed to generate the result table. This happens regardless of which columns are specified for retrieval when the view is invoked. This is necessary because the materialized view (the data that represents the view) on which the view invocation is based is always affected by all tables in the inner join view. This is because missing data anywhere in the inner join will cause unmatched rows to be removed. This was discussed back in Chapter 1 where Figure 1.1 showed that an inner join composed of the Department and Employee tables would not contain departments that had no employees. This means that if this view, call it DeptEmpView, was invoked as in SELECT DeptName FROM DeptEmpView, only DeptNames for departments that had employees would be selected. This result required that the Employee table be accessed, even though no data was selected from it. If this was not the desired result, then this view should not have been used and the Department table should have been accessed directly.

Figure 11.2 Dynamic path shortening.

The necessity of accessing all tables in a view is a requirement for the way inner joins use the Cartesian product model for processing joins, as described in Chapter 1. This is not necessary for outer joins that generate hierarchical structures. ANSI SQL outer joins operate differently than inner joins as described in Chapter 2.

Outer join views that model hierarchical structures do not always need to access all tables in the view when invoked. Take for example the outer join view DeptEmpView, defined as SELECT * FROM Department LEFT JOIN Employee ON DeptNo=EmpDeptNo. When this view is invoked as SELECT DeptName FROM DeptEmpView, the Employee table is not referenced and does not need to be accessed. This is because, in the semantics of this hierarchical data structure, the Employee table is at a lower level than the selected table Department. This means that the Employee table cannot affect the Department table, and therefore does not need to be accessed.
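The claim that the Employee table cannot affect which departments are selected from the outer join view can be checked with a small experiment using Python's built-in sqlite3 module. The view DeptEmpView and the columns DeptNo, DeptName, and EmpDeptNo follow the text; the inner join view name DeptEmpInner, the remaining columns, and the data are assumptions. The results are compared as sets because a standard engine such as SQLite still returns one DeptName row per employee, whereas the optimization described here would return each department once.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Department (DeptNo INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Employee   (EmpNo INTEGER PRIMARY KEY, EmpName TEXT, EmpDeptNo INTEGER);
INSERT INTO Department VALUES (1, 'Sales'), (2, 'Legal');   -- Legal has no employees
INSERT INTO Employee   VALUES (10, 'Mike', 1), (11, 'Ann', 1);
CREATE VIEW DeptEmpInner AS
    SELECT * FROM Department INNER JOIN Employee ON DeptNo = EmpDeptNo;
CREATE VIEW DeptEmpView AS
    SELECT * FROM Department LEFT JOIN Employee ON DeptNo = EmpDeptNo;
""")

def names(sql):
    """Run a one-column query and return the distinct values as a set."""
    return {row[0] for row in con.execute(sql)}

inner_names = names("SELECT DeptName FROM DeptEmpInner")   # depends on Employee data
outer_names = names("SELECT DeptName FROM DeptEmpView")    # does not
base_names = names("SELECT DeptName FROM Department")
```

The outer join view yields the same set of department names as reading the Department table alone, which is what makes skipping the Employee table safe; the inner join view loses the department that has no employees.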

Any hierarchical structure access, no matter how complex, defined by outer joins can apply this powerful view optimization. This is performed by eliminating tables from access consideration that are not referenced in the query and are not on a path to a referenced table in the query. A reference in an ON clause alone does not count as a reference for this purpose, since the ON clause is only used if access of the table is necessary and so cannot affect the query by itself. This optimization is based on the modeled hierarchical data structure and the columns specified at the time of the view invocation. This is not new. Hierarchical access logic dictates this behavior. The true test of this is that this logic derives the same data results as if all the tables were accessed. This is demonstrated in Figure 11.3.
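The elimination rule just stated reduces to a short computation: keep every referenced table and every table on the path from the root down to it, and drop the rest. The sketch below assumes the structure is available as a child-to-parent map, which is a hypothetical representation rather than the DSE procedure's actual output.

```python
def tables_to_access(parent_of, referenced):
    """Keep each referenced table plus every ancestor on its path to the root."""
    needed = set()
    for table in referenced:
        while table is not None and table not in needed:
            needed.add(table)
            table = parent_of[table]
    return needed

# Hypothetical Department view: child -> parent (the root maps to None).
PARENT_OF = {"Department": None, "Employee": "Department", "Dependent": "Employee"}

only_dept = tables_to_access(PARENT_OF, {"Department"})
dept_and_dpnd = tables_to_access(PARENT_OF, {"Department", "Dependent"})
```

Selecting only Department columns leaves a single table to access, while selecting a Dependent column pulls Employee back in because it lies on the path between the root and Dependent.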

There is an additional beneficial side effect of this optimization: it helps eliminate unnecessary replicated rows. These replicated rows are introduced by accessing unnecessary tables. This means that the optimized result is more semantically correct than the unoptimized result. For example, in the outer join DeptEmpView example described earlier in this section, the unoptimized view invocation would replicate the department's name (DeptName) for each employee in the department even though no Employee columns were selected. The optimized invocation would not replicate department names since no access to the Employee table was needed. This is also shown in Figure 11.3.
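The replication effect can be reproduced with a small sketch using Python's standard sqlite3 module. The sample rows are assumptions made for illustration, and the "optimized" query is written by hand to show the result an optimizer applying this rule would produce; it is not a claim about what SQLite itself does.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Department (DeptNo INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Employee (EmpNo INTEGER PRIMARY KEY, EmpName TEXT, EmpDeptNo INTEGER);
INSERT INTO Department VALUES (1, 'HR'), (2, 'Acct'), (3, 'MIS');
INSERT INTO Employee VALUES (10, 'Mike', 1), (11, 'Mary', 1), (12, 'John', 2);
CREATE VIEW DeptEmpView AS
    SELECT * FROM Department LEFT JOIN Employee ON DeptNo = EmpDeptNo;
""")

# Unoptimized: the join is performed, so DeptName repeats once per employee.
unoptimized = [row[0] for row in con.execute(
    "SELECT DeptName FROM DeptEmpView ORDER BY DeptNo")]

# Hand-applied optimization: Employee is not referenced, so it is not accessed.
optimized = [row[0] for row in con.execute(
    "SELECT DeptName FROM Department ORDER BY DeptNo")]

print(unoptimized)
print(optimized)
```

Both queries return the same set of department names; only the unneeded replication of HR differs.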

The two examples in Figure 11.3 demonstrate view optimization applied to two different SQL views of the same data and relationships. The data structure diagrams shown reflect the structure of the SQL outer join view definitions and data that were originally defined in Figure 6.1. For the Department and Employee views, the dotted lines in the data structure diagrams in Figure 11.3 represent areas of the structures that can be eliminated from access based on the view selection criteria shown directly above the diagrams. Data enclosed in a dotted box represents unnecessary replicated data that is removed when optimization is applied. This duplicate removal is more semantically controlled than SQL's duplicate row value removal option.

Figure 11.3 Outer join view optimizations can produce more accurate results.

In the examples shown, replicated data is produced because employee Mike has two dependents, causing Mike to be in the virtual view twice when using the old inner join Cartesian product access model (see Chapter 2). Without optimization, this replication is confusing since dependents are of no significance in either query, and therefore should not affect the result. Note that these example data views are small; larger views offer a much greater opportunity for optimization.

Other benefits of outer join view optimization are that it does not penalize the user for picking a view that is too large, and that large views can eliminate the need for many small views, making life easier for end users and DBAs.

11.4—Increased Efficiency of Parallel Database Processing

Chapter 6 demonstrated that the legs of a hierarchical structure have separate semantics because they are independent of each other.

This not only implies that the tables can be processed in any order; for parallel processing it also means that these legs can be processed in parallel with no coordination between them. This can significantly increase asynchronous processing (pipelining in this example), as can be seen in Figure 11.4.
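As a rough illustration of leg independence, the sketch below processes two sibling legs concurrently and combines them only at the end. The tables A, B, and C, their rows, and the helper fetch_leg are all hypothetical; a real database engine would parallelize at a much lower level.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed data: root table A with two sibling legs, B and C, keyed by A's key.
A = [1, 2]
B = [(1, "b1"), (1, "b2"), (2, "b3")]
C = [(1, "c1")]

def fetch_leg(rows):
    """Group one leg's rows by parent key; nothing from another leg is needed."""
    grouped = {}
    for parent_key, value in rows:
        grouped.setdefault(parent_key, []).append(value)
    return grouped

# Each leg runs in its own worker with no coordination between them.
with ThreadPoolExecutor(max_workers=2) as pool:
    leg_b, leg_c = pool.map(fetch_leg, [B, C])

# The legs meet only when the structure is assembled under the root.
structure = {a: {"B": leg_b.get(a, []), "C": leg_c.get(a, [])} for a in A}
print(structure)
```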

11.5—Dynamic Rebuild to Pick up New SQL Features

Besides internal optimizations, new SQL releases may add language functions that can also be used to improve performance. Utilizing these new external functions requires modifying existing SQL code, usually by hand. In SQL:1999, these functions, which can be user-defined functions, can be navigation functions that access tables through other tables to avoid the need to join them. For example, the first outer join example in Figure 11.5, which models the structure diagram in Figure 11.4, selects only a column from the lower level table C. This SQL statement can be rewritten to avoid unnecessary join operations, as in the bottom SQL example in Figure 11.5, by using a navigation function that uses the data structure meta information extracted from the original query so that it only returns keys that exist in the structure.

Figure 11.4 Parallel processing of hierarchical sibling legs is always possible.

This optimization still conforms to the semantics of the structure shown in Figure 11.4 and operates seamlessly because it continues to follow and obey the hierarchical semantics of the outer join. Using outer join data modeling today can allow for the capability of automatically utilizing future features (like this one) as they are introduced into SQL systems. This is achieved by database system software that dynamically rewrites the SQL specification to use the new functionality. This capability, with its dynamic operation, also allows it to be applied to ad hoc queries where it could not be accomplished otherwise, since the selected columns are not known beforehand.

11.6—Optimization of Nonrelational SQL Interfaces

Procedural code is known for its efficiency, but when nonrelational databases are involved, nonprocedural fourth-generation languages (4GLs) can actually achieve similar levels of optimization. This is because with database 4GLs like SQL, the data structure (via the outer join) and desired processing requirements are known up front, allowing a very high level of optimization. Instead of optimizing small pieces of database logic procedurally without much knowledge of what is going to be needed, nonprocedural optimization can optimize globally and react quickly to change its global access logic. With databases, each database access saved eliminates millions of instruction cycles and hardware wait time.

Figure 11.5 Automatic SQL rewrite to take advantage of future SQL capabilities.

SQL access of procedural databases like IBM's IMS, which requires manual navigation from point to point, is a good example of how nonprocedural access can actually improve database access efficiency. As stated above, because SQL is nonprocedural, the total requirements are known up front, so the access can be globally planned. With IMS, this means path calls can be used to reduce the number of calls necessary by reading and writing entire paths down the hierarchical structure being accessed. Global planning can also dynamically choose the best strategy for database positioning, navigation, and access. These optimizations are demonstrated in Figure 11.6, where IMS segment types A and B bypass direct access until a qualifying record is located. The semantics of this query are defined in Chapter 5.

A further optimization approach that can reap even greater efficiency with IMS and possibly other navigational databases is to go under the covers and bypass their standard procedural user interface, which limits the full global optimization possible. This optimization strategy again relies on the fact that the processing requirements are known up front because of the nonprocedural outer join data modeling semantics. This under-the-covers processing is already performed for IMS by a number of database packages on the market. For IMS, this is done by accessing its underlying VSAM and ISAM access methods directly. Using this access technique, SQL can actually process IMS databases more efficiently than is possible using the standard IMS interface directly.

A final note about nonrelational SQL access: all the optimizations for SQL database access described in this chapter can also be applied to nonrelational access. This is because they are based on data structure semantics, making them generic access optimizations.

Figure 11.6 Outer join query can be translated to very efficient IMS access code.

11.7—Applying Hierarchical Optimizations to Network Structures

As we have seen throughout this book, network application structures can have multiple paths to data, and for this reason they can be ambiguous. Most of the hierarchical optimizations covered in this chapter are still possible. The data structure diagram in Figure 11.7 is a network application structure as defined in Chapter 6. Table D in this structure is at a network junction point where two or more paths come together, forcing the processing of the paths to synchronize. This may limit some optimizations.

After mapping a network structure from an outer join statement, a network structure such as the one in Figure 11.7 can be reordered top to bottom for efficiency, as shown in Section 11.1. Parallel processing is still possible, as described in Section 11.4, but the network junction points are sync points that may retard parallel processing. Dynamic rebuild, as discussed in Section 11.5, is also possible with additional code to support these sync points.

Dynamic path shortening, as described in Section 11.2, can still operate on network structures that contain paths with network junction points. Terminating a path early does not prevent the paths below the junction point from being accessed through another active path that feeds into that junction. For example, in Figure 11.7, path D to E may be accessed via path B even when path C has been shortened. This makes sense, since path D to E requires separate access from every path entering it (unless dynamically shortened): each entering path matches different key link values used in the join operation, which can produce different results in path D to E, depending on the path values entering it.

Figure 11.7 Outer join network structures have junction points.

The removal of unnecessary tables from invoked views is also possible with network views. This can have the effect of actually removing network junction points, which can turn a network structure into a valid hierarchical structure dynamically. For example, if tables D and E are not referenced in the network structure in Figure 11.7 (as documented in Section 11.3), then tables D and E are eliminated from the materialized view, creating a valid hierarchical structure and enabling all the benefits that go with it, as described in Chapter 5. This is demonstrated in Figure 11.8.

The optimizations shown in Figure 11.8 will also apply for network structures where the network junction points are linked to multiple paths using AND logic instead of OR logic. This structure, while similar, is not actually a network structure, and is described in Chapter 6.
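The conversion from network to hierarchy can be sketched as a pruning step over a parent map. The shape used here (A over B and C, both feeding junction D, with D over E) is an assumption about Figure 11.7, and the helper names prune and is_hierarchy are hypothetical.

```python
def prune(parents_of, referenced):
    """Keep the referenced tables and all of their ancestors; drop the rest."""
    keep, stack = set(), list(referenced)
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents_of[node])
    return {t: list(ps) for t, ps in parents_of.items() if t in keep}

def is_hierarchy(parents_of):
    """A structure is hierarchical when no table has more than one parent."""
    return all(len(ps) <= 1 for ps in parents_of.values())

# Assumed network: D is a junction point with two parents, B and C.
network = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["D"]}

pruned = prune(network, {"A", "B", "C"})
print(is_hierarchy(network), is_hierarchy(pruned))
```

Once D and E are not referenced, the junction point disappears and what remains has a single parent per table.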

11.8—Shifting ON Clauses to the WHERE Clause

Since the WHERE clause has been around a lot longer than the ON clause, there is a tendency for SQL optimization to move ON clauses, or portions of them, to the WHERE clause when possible. This is probably a good strategy since the WHERE clause probably has much more optimization logic than the newer ON clause. When a query has both a WHERE clause and ON clauses, the similarity of these two types of selection clauses creates the opportunity for this kind of optimization. But whatever the case for optimization, it must be done with care because ON clauses can specify complex semantics while the WHERE clause is limited in this area, so the result may not always be the same. As an example, Figure 11.9 is performing an optimization where the ON clauses are transferred to the WHERE clause.

Figure 11.8 Network structure optimized and converted to hierarchical structure.

This example moves all of the ON clauses' join criteria to the WHERE clause, thereby effectively changing the outer join query to an easier-to-optimize inner join query. The converted outer join query now performs the inner join of the three tables involved and then filters the result using the WHERE clause criteria. In this case, the WHERE clause in Figure 11.9 is based on the lowest level table, Dpnd, which means any missing data for table Dpnd would be filtered out. This further implies that missing data for table Emp would be filtered out and so on up the path. This logically turns the query into an inner join since no data is actually being preserved: only complete rows that match the selection criteria are selected.

If the WHERE clause in Figure 11.9 specified a filter on table Emp instead of table Dpnd, the optimization shown could not have been performed, since it would remove the data preserved below the table Emp level when table Emp passed the filtering test. This leads one to believe this inner join optimization can only work when the WHERE clause is filtering at the lowest level. But this is only partially correct. To see why, examine the SQL optimization in Figure 11.10.
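The two cases discussed so far can be checked with a short sketch using Python's sqlite3 module. The table layouts, column names, and rows are assumptions chosen for illustration, since the actual statements of Figure 11.9 are not reproduced here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Dept (DeptNo INTEGER, DeptName TEXT);
CREATE TABLE Emp  (EmpNo INTEGER, EmpName TEXT, EmpDeptNo INTEGER);
CREATE TABLE Dpnd (DpndName TEXT, DpndEmpNo INTEGER);
INSERT INTO Dept VALUES (1, 'HR'), (2, 'MIS');
INSERT INTO Emp  VALUES (10, 'Mike', 1), (11, 'Mary', 1);
INSERT INTO Dpnd VALUES ('Ann', 10), ('Bob', 10);
""")

OUTER = """SELECT DeptName, EmpName, DpndName
           FROM Dept LEFT JOIN Emp  ON DeptNo = EmpDeptNo
                     LEFT JOIN Dpnd ON EmpNo = DpndEmpNo
           WHERE {filter}"""
# Same query with the ON criteria shifted into the WHERE clause (an inner join).
INNER = """SELECT DeptName, EmpName, DpndName
           FROM Dept, Emp, Dpnd
           WHERE DeptNo = EmpDeptNo AND EmpNo = DpndEmpNo AND {filter}"""

def run(template, flt):
    return con.execute(template.format(filter=flt)).fetchall()

lowest = "DpndName = 'Ann'"   # filter on the lowest level table
middle = "EmpName = 'Mary'"   # filter on Emp; Mary has no dependents

print(run(OUTER, lowest), run(INNER, lowest))   # same result
print(run(OUTER, middle), run(INNER, middle))   # Mary's row is lost by the inner join
```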

In Figure 11.10, the WHERE clause is at the lowest level in the data structure and the filtering data is contained in the last table joined, table Dpnd. The problem here is that while table Dpnd is at the lowest level, there are other legs in the structure. Table Dept is on another leg, and if the query were changed to an inner join, no data would be preserved when table Dept did not match a table Emp row occurrence. As we learned earlier in Chapter 5 on data structures, sibling legs are independent of one another. This means what occurs in one leg should not influence the other. Converting the outer join in Figure 11.10 to an inner join changes the semantics such that what happens in one leg can affect all the other legs. This changes the result of the query. Performing these types of optimizations therefore requires analyzing the semantics of the outer join queries very carefully.
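A minimal sketch of the sibling-leg problem follows, again with Python's sqlite3 module. The layout is an assumption about Figure 11.10: Emp is the root with two legs, Dept and Dpnd, and the sample employee Irv has a dependent but no department.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Emp  (EmpNo INTEGER, EmpName TEXT, EmpDeptNo INTEGER);
CREATE TABLE Dept (DeptNo INTEGER, DeptName TEXT);
CREATE TABLE Dpnd (DpndName TEXT, DpndEmpNo INTEGER);
INSERT INTO Dept VALUES (1, 'HR');
INSERT INTO Emp  VALUES (10, 'Mike', 1), (13, 'Irv', NULL);
INSERT INTO Dpnd VALUES ('Ann', 10), ('Ben', 13);
""")

# Outer join: the Dept leg and the Dpnd leg are independent of each other.
outer_rows = con.execute("""
    SELECT EmpName, DeptName, DpndName
    FROM Emp LEFT JOIN Dept ON EmpDeptNo = DeptNo
             LEFT JOIN Dpnd ON EmpNo = DpndEmpNo
    WHERE DpndName = 'Ben'""").fetchall()

# Inner join rewrite: a missing Dept row now removes the whole Emp row.
inner_rows = con.execute("""
    SELECT EmpName, DeptName, DpndName
    FROM Emp, Dept, Dpnd
    WHERE EmpDeptNo = DeptNo AND EmpNo = DpndEmpNo
      AND DpndName = 'Ben'""").fetchall()

print(outer_rows)
print(inner_rows)
```

The filter is at the lowest level in both queries, yet the results differ because the unmatched Dept leg eliminates Irv in the inner join version.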

Figure 11.9 Shifting ON clauses to the WHERE clause for optimization.

Figure 11.10 Invalid example of shifting ON clauses to the WHERE clause.

11.9—Conclusion

This chapter has presented powerful semantic optimizations that are enabled by the outer join data modeling ability. Without the outer join optimizations presented in this chapter, the outer join will operate less efficiently than the inner join. This will prevent many users and vendors from utilizing this powerful operation. But if the outer join optimizations presented here are utilized, the efficiency of the outer join could equal or even surpass that of the inner join in many cases. This means that the outer join, with all of its powerful capabilities, can be comparable in efficiency to the inner join.

It was also demonstrated that outer join view optimization could convert a network structure into a hierarchical structure, thereby enabling all the features and capabilities available to hierarchical structures.

The optimizations presented in this chapter demonstrate the value of data modeling and the importance of the capability to determine the data model defined by outer joins. The data model represents the semantics of the data and makes it easier to determine the consequences of changing the SQL to optimize SQL queries.

12—Hierarchical Relational Processor Prototype

With ANSI SQL having the capability to inherently process hierarchical structures, it is no longer necessary to force all data into a flat structure that obscures the data structure and unnecessarily replicates data. If the data is being modeled hierarchically, it can be processed directly in this more powerful form by using outer join specifications that directly model the data structure and execution paths.

The examples in this chapter show the operation of an ANSI SQL-based hierarchical relational database processor prototype that is driven by the inherent data modeling capability of the ANSI SQL outer join. It utilizes the DSE technology, described in Chapter 9, to dynamically extract the data structure meta information naturally present in outer join specifications. This freely available information is used to control the hierarchical heterogeneous processing of relational and nonrelational data. It produces a hierarchical WYSIWYG display that conforms to the underlying data structure of the SQL query request. The results are semantically more accurate than those of standard SQL processing.

This new hierarchical processing prototype does not require that the data be in a fixed format or that the data structure be predefined. The data can be stored in standard first normal form relational tables, flat files, or hierarchical prerelational or postrelational databases such as an IMS legacy database or a nested relational database. The data structure can be specified dynamically, giving it data structure independence that is lacking in standard universal data access systems.

12.1—Hierarchical Relational Prototype Operation

Hierarchical relational databases access and process data in non-first normal form (structured format). This eliminates having to flatten the data into first normal form (table format) as standard relational systems do. This flattening of the data can introduce unnecessary replicated data. By not having to flatten the data, hierarchical relational processing can preserve the data structure so that all aggregate and summary operations will be accurate and can be controlled with more flexibility. This is reflected in the structured format used by the hierarchical relational processor to display its output. In this structured output format, a blank data field indicates that the previous column value is still in effect. A dash inserted in a field indicates the data is missing; this prevents a missing data value from inadvertently being taken as the previous column value.
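The display convention (blank for "previous value still in effect", dash for "missing") can be illustrated with a small formatter. This is only a sketch of the output convention: it starts from already-flattened rows, which the prototype itself avoids, and the function name render and the sample rows are hypothetical.

```python
def render(rows, width=6):
    """Format rows as structured text: blank = same as above, '-' = missing."""
    lines, previous = [], None
    for row in rows:
        cells = []
        same_prefix = previous is not None
        for i, value in enumerate(row):
            # A value is blanked only while every column to its left also repeats.
            same_prefix = same_prefix and value == previous[i]
            if same_prefix:
                cells.append("")
            else:
                cells.append("-" if value is None else str(value))
        lines.append(" ".join(f"{c:<{width}}" for c in cells).rstrip())
        previous = row
    return lines

rows = [("HR", "Mike", "Ann"), ("HR", "Mike", "Bob"),
        ("HR", "Mary", None), ("MIS", None, None)]
for line in render(rows):
    print(line)
```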

The first entry of each example is the outer join specification that is processed directly by the SQL hierarchical relational prototype. The prototype then extracts the data structure meta information embedded in the outer join specification using the DSE technology described in Chapter 9, and displays its metadata structure information in table form. This metadata structure information includes an outer join semantic optimization indication, which is flagged under the Access column when a table in the data structure does not require access.

Lastly, using the data structure meta information supplied from the outer join specification, the prototype accesses its internal first normal form relational database in a manner that will produce the structured data results shown in WYSIWYG format. This hierarchical relational processing can be implemented in any standard SQL system, relying only on the data structure meta information supplied from outer join operations.

12.2—Basic Data Modeling

The examples in Figures 12.1 and 12.2 demonstrate the basic data modeling capabilities of the ANSI SQL outer join. They show how the hierarchical relational prototype using the DSE technology can process standard relational data in a hierarchical fashion. In these examples, three tables (Department, Employee, and Dependent) are joined in different ways using the same relationships to form two different data structures involving one-to-many and many-to-one relationships. Notice in the query outputs that there is no unnecessary data replication. All the data replications are accurate regardless of what data structure level the data is at or if there are multiple legs in the data structure as in Figure 12.2. This allows aggregate operations applied anywhere in the data structure to be accurate. While the example in Figure 12.2 does show replicated data (HR and Acct), this correctly reflects the many-to-one data structure relationship of Employee over Department and its semantics (i.e., many employees have the same department). Notice further that these replication occurrences are correct: in a standard relational first normal form result, HR would have been replicated three times instead of the correct two.

Besides the two different data structures in Figures 12.1 and 12.2, there is also a difference with the data values displayed or not displayed in the two examples. The first example's query output in Figure 12.1 includes a department named MIS while the second example does not. The second example's query output in Figure 12.2 includes an employee named Irv with a dependent named Ben, while the first example in Figure 12.1 does not. These differences are properly reflected in the semantics of the data structures involved. The MIS department isn't included in the example's query output in Figure 12.2 because this query models an Employee view (Employee over Department and Dependent), and there are no employees in the MIS department. The employee Irv and his dependent Ben aren't included in the first example's query output in Figure 12.1 because this query models a Department view (Department over Employee over Dependent) and Irv and his dependent Ben do not belong to any known department. This was covered in Chapter 5.
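The difference between the two views can be reproduced on a standard SQL engine with Python's sqlite3 module. The column names and most rows are assumptions (only MIS, Irv, and Ben come from the text above), and the flat result shown here is the ordinary relational output, not the prototype's structured display.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Department (DeptNo INTEGER, DeptName TEXT);
CREATE TABLE Employee  (EmpNo INTEGER, EmpName TEXT, EmpDeptNo INTEGER);
CREATE TABLE Dependent (DpndName TEXT, DpndEmpNo INTEGER);
INSERT INTO Department VALUES (1, 'HR'), (2, 'Acct'), (3, 'MIS');
INSERT INTO Employee  VALUES (10, 'Mike', 1), (11, 'Mary', 1),
                             (12, 'John', 2), (13, 'Irv', NULL);
INSERT INTO Dependent VALUES ('Ann', 10), ('Ben', 13);
""")

# Department view: Department over Employee over Dependent.
dept_view = con.execute("""
    SELECT DeptName, EmpName, DpndName
    FROM Department LEFT JOIN Employee  ON DeptNo = EmpDeptNo
                    LEFT JOIN Dependent ON EmpNo = DpndEmpNo""").fetchall()

# Employee view: Employee over Department and Dependent (two legs).
emp_view = con.execute("""
    SELECT EmpName, DeptName, DpndName
    FROM Employee LEFT JOIN Department ON EmpDeptNo = DeptNo
                  LEFT JOIN Dependent  ON EmpNo = DpndEmpNo""").fetchall()

print(dept_view)
print(emp_view)
```

The table placed at the root of the structure is the one whose rows are always preserved, which is why MIS survives only in the Department view and Irv only in the Employee view.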

Figure 12.1 Department view processed by hierarchical relational processor.

Figure 12.2 Employee view processed by hierarchical relational processor.

12.3—Many-to-Many Relationships

The examples in Figures 12.3 and 12.4 operate on a Parts and Suppliers many-to-many relationship, described in Chapter 7. In this relationship, one supplier can have many parts and one part can have many suppliers. This does not present a problem for hierarchical relational processing and both data structures in the examples in Figures 12.3 and 12.4 produce a hierarchically structured (many-to-many) result. Most texts on data modeling state that many-to-many relationships form one-to-many hierarchical relationships. A many-to-many relationship is actually a combination of many-to-one and one-to-many. In the one-to-many portion replications are suppressed, while in the many-to-one portion they are not. In the example in Figure 12.3 (Parts over Suppliers), parts are not replicated but suppliers are (P1 occurs once while S1 occurs three times, each related to a different parent value). In a true one-to-many relationship, the lower level values will not repeat across their parent values as in this many-to-many relationship example.

It is worth noting that many-to-one relationships are found naturally in the database and do not require special considerations for processing or printing. One-to-many relationships, however, need special handling when processing and displaying because the data is nested.

Many-to-many relationships require the use of an association table as described in Chapter 7. The association table used in the SQL examples in Figures 12.3 and 12.4 is PartSupplier, and is shown in Figure 12.5. It contains keys (Part, Supplier) from both sides of the relationship to maintain the many-to-many relationship in both directions. In the example in Figure 12.3 (Parts over Suppliers), the association table is transparent in the result because no column from this table is requested for display.

The Suppliers over Parts example in Figure 12.4 does reference the association table to include the QNT (quantity) column. This value is known as intersecting data because its data is meaningful at the point of intersection (i.e., the quantity of a given part for a given supplier), as also explained in Chapter 7. This intersecting data appears to be a value associated with the Parts table, since values in the association table will always appear to come from the lower level table, as shown in Figure 12.4.
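A Suppliers over Parts query with intersecting data might look like the following sketch, run here through Python's sqlite3 module. Only the PartSupplier table and its Part, Supplier, and QNT columns come from the text; the Suppliers and Parts table layouts and all rows are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Suppliers (SuppNo TEXT);
CREATE TABLE Parts (PartNo TEXT);
CREATE TABLE PartSupplier (Part TEXT, Supplier TEXT, QNT INTEGER);
INSERT INTO Suppliers VALUES ('S1'), ('S2'), ('S3');
INSERT INTO Parts VALUES ('P1'), ('P2'), ('P3');
INSERT INTO PartSupplier VALUES ('P1', 'S1', 100), ('P2', 'S1', 200),
                                ('P1', 'S2', 300);
""")

# Suppliers over Parts, reached through the association table.
rows = con.execute("""
    SELECT SuppNo, PartNo, QNT
    FROM Suppliers LEFT JOIN PartSupplier ON SuppNo = Supplier
                   LEFT JOIN Parts        ON Part = PartNo
    ORDER BY SuppNo, PartNo""").fetchall()

for row in rows:
    print(row)
```

QNT is stored in the association table but reads as if it belonged to each part listed under a supplier, which is the intersecting data behavior described above.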

12.4—Embedded Views

The example in Figure 12.6 demonstrates that stored views containing outer join defined data structures can be seamlessly combined to form larger data structures using the same standard ANSI SQL outer join syntax already demonstrated. The hierarchical relational prototype identifies stored queries by their view name. They are printed out when expanded, as shown in Figure 12.6. The example in Figure 12.6 uses two views shown earlier in this chapter, the Supplier view (Suppliers over Parts) and the Department view (Department over Employee over Dependent). In this case, the Supplier view is joined over the Department view using the DeptSuppNo column in the Department table. Notice that this combined data structure properly reflects its new structure, the replication counts are accurate, and the data displayed is consistent with the previously shown data structures in this chapter.

Figure 12.3 Part/Supplier view processed by hierarchical relational prototype.

Figure 12.4 Supplier/Part view processed by hierarchical relational prototype.

Figure 12.5 Association table used in many-to-many relationship.

12.5—View Optimization

The final example in Figure 12.7 demonstrates a powerful and very useful optimization for stored views described in detail in Chapter 11. It significantly enhances the operation and usefulness of SQL's new outer join data structure processing capability. It often happens that a stored view is used where it is not necessary to access all the tables defined for the desired result. With standard inner join views, it is always necessary that all tables in the view be accessed. This not only results in more overhead, but often in incorrect results caused by accessing unneeded tables, which in turn can cause replicated data values and lost data. With outer join views, this unnecessary data access can be avoided.

Figure 12.6 Expanded view example.

The example in Figure 12.7 is identical to the previous example in Figure 12.6, except that in this example no data is selected from the Dependent table. In this case, the DSE prototype determines from the semantics of the data structure that the Dependent table does not need to be accessed (see the Access column in the data structure table in Figure 12.7). Notice that the result of the SQL query statement in this example, without the Dependent data and access to the Dependent table, remains consistent with the previous example. This shows that the optimization works in this situation.

Figure 12.7 View optimization example.

12.6—Conclusion

This chapter has demonstrated an innovative SQL processor prototype that operates on disparate heterogeneous data in a high-level hierarchical manner. Previously, SQL processing of disparate heterogeneous data always used the lowest common denominator structure: the flat structure. With ANSI SQL's capability to directly model and process hierarchical structures, there is no longer a need to map structured data into a flat structure when hierarchical structures are being modeled. Besides the ease and efficiency of one-to-one mapping, the powerful hierarchical semantics of the modeled data structure are maintained and utilized.

The live hierarchical SQL examples presented in this chapter demonstrate a number of things about the DSE technology. First, the DSE software operates as expected: it does extract the data structure meta information embedded in the outer join. Second, it can be utilized to develop products like the hierarchical relational processor that would not be possible otherwise with standard SQL. Third, and most importantly, the data modeling technology behind the DSE software is valid and does work. This means the outer join does indeed inherently support the data modeling of complex data structures consisting of multiple legs, and one-to-many, many-to-one, and many-to-many relationships. Fourth, this technology is useful and viable.


13—Object/Relational Interface

The outer join's object/relational interface capability is the best showcase for the features and capabilities of the outer join. It uses all the inherent features and attributes of the outer join and the advanced capabilities made possible by the DSE technology described in Chapter 9. But the most powerful operation at work is the interaction and synergism of these capabilities. These capabilities and their interrelationships are represented in Figure 13.1. This chapter will cover each capability and attribute in the diagram and explain its function, importance, and interaction with those capabilities it enhances. Other object/relational capabilities introduced in SQL:1999 are described in Chapter 8.

This chapter covers each object feature shown in the diagram in Figure 13.1, one or more times. At the top of the diagram, the ANSI SQL outer join operation supplies its object-enabling capabilities and attributes: standardization through ANSI, dynamic operation, and a powerful data modeling capability that enables complex data structure processing.

13.1—Standardized SQL Interface

One of the biggest stumbling blocks for object databases (ODBMS) was the lack of a standardized interface that supports the features shown in Figure 13.1. After all, investing time and money in a nonstandardized database is very risky. The ANSI SQL outer join operation is standardized. Most agree that if there were such an object interface, a familiar relational syntax would be widely accepted. Again, the outer join fits the bill.

Figure 13.1 Object/relational capabilities and their outer join derivation.

13.2—Data Modeling and Structure Processing

One of the biggest, if not the biggest, missing capabilities hampering object/relational interfaces is the lack of complex data modeling and structure processing capability in the relational model. The relational model has previously had no inherent data modeling capabilities. This capability is extremely important to object databases that deal with complex objects. Many other capabilities such as blobs (binary large objects), user-defined data types, and functions can be easily added to the relational model. But up until now, data modeling and its related capabilities could not be seamlessly added since it did not fit naturally into the relational model.

With the ANSI SQL outer join operation, seamless complex data modeling and structure processing now become possible. As demonstrated in Chapter 6, this powerful capability is performed inherently in SQL, resulting in direct and seamless processing of complex data structures. This capability can be further enhanced by the outer join DSE procedure discussed in Chapter 9. This procedure dynamically extracts and makes available to the SQL engine the inherent data structure meta information embedded in outer join statements. This enables the direct support of many other capabilities and attributes of an object/relational database. These are data inheritance, efficiency, database navigation, nonrelational database access, reusability, and data abstraction. Figure 13.2 depicts one way that SQL, via the ANSI SQL outer join, can be seamlessly integrated with an object database to help supply these object capabilities.

The example in Figure 13.2 demonstrates how SQL, utilizing the powerful ANSI SQL outer join syntax and semantics, can be used to model in parallel hierarchical data structures defined in memory by programming languages. Then, by utilizing the data structure meta information recovered from the outer join specification, the data can be seamlessly transferred between the database and structured storage. The data can be retrieved from any database source (see Figure 13.1); it does not have to be relational. In memory, the data can be navigated and manipulated procedurally by any object language and then written back out automatically to its native database. This database access is very efficient since the entire data view is known beforehand and can be retrieved more efficiently than with multiple procedural calls.

13.3—Data Abstraction and Reusability

Embedded SQL view structures (that is, views containing data substructures) can be combined to form bigger structures by simply joining them using standard ANSI SQL join syntax. This was shown in Chapter 7, and is depicted in Figure 13.3 where the Emp view is being used to create two larger views, EmpDept and DeptEmp views. This capability is important because it increases reusability and data abstraction. By breaking out common substructure portions as SQL views like the Emp view shown in Figure 13.3, reusability is enhanced since replication is reduced and can be controlled more easily.
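The reuse of a substructure view can be sketched with Python's sqlite3 module. The view names EmpView, DeptEmp, and EmpDept follow the views named above, but their definitions, the column names, and the sample rows are assumptions, since Figure 13.3 is not reproduced here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Department (DeptNo INTEGER, DeptName TEXT);
CREATE TABLE Employee  (EmpNo INTEGER, EmpName TEXT, EmpDeptNo INTEGER);
CREATE TABLE Dependent (DpndName TEXT, DpndEmpNo INTEGER);
INSERT INTO Department VALUES (1, 'HR'), (2, 'MIS');
INSERT INTO Employee  VALUES (10, 'Mike', 1), (13, 'Irv', NULL);
INSERT INTO Dependent VALUES ('Ann', 10), ('Bob', 10);

-- Common substructure: Employee over Dependent.
CREATE VIEW EmpView AS
    SELECT * FROM Employee LEFT JOIN Dependent ON EmpNo = DpndEmpNo;
-- Two larger structures built from the same substructure view.
CREATE VIEW DeptEmp AS
    SELECT * FROM Department LEFT JOIN EmpView ON DeptNo = EmpDeptNo;
CREATE VIEW EmpDept AS
    SELECT * FROM EmpView LEFT JOIN Department ON EmpDeptNo = DeptNo;
""")

dept_emp = con.execute(
    "SELECT DeptName, EmpName, DpndName FROM DeptEmp").fetchall()
emp_dept = con.execute(
    "SELECT EmpName, DpndName, DeptName FROM EmpDept").fetchall()
print(dept_emp)
print(emp_dept)
```

The Employee to Dependent join is written once in EmpView and reused by both larger views.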

Data abstraction is also increased since this substructure view capability hides the complexities of data structures, because the data modeling SQL is hidden in the view. Structured subviews are not only useful for data abstraction and reusability, but can be applied to all forms of database access and to the inheritance described in Section 13.4. Because of the outer join optimizations described in Chapter 11, they do not necessarily add inefficiencies.

Figure 13.2 Object/relational interface transfers data to and from structured memory.

Figure 13.3 Data abstraction and reusability with substructures.

13.4—Data Inheritance

Data inheritance is made possible by the hierarchical nature of data modeling and the ability of the outer join's data structure views to join data structures. Data inheritance is shown in Figure 13.4, which demonstrates how tables can be seamlessly designed so that common portions of their data can be grouped together into objects to be more easily shared in an object environment. For example, Employee and Dependent (tables or classes) share the same type of personal information, such as birthdate, sex, and address. Using data modeling, this personal information can be moved out of the Employee and Dependent tables and stored separately in a Person table, to be transparently combined with the Employee and Dependent tables in views. These views represent the complete Employee and Dependent data. This data inheritance capability also adds to the reusability of the data because it can reduce multiple copies of data.

The Emp View and DpndView structured views shown in Figure 13.4 are hierarchical as represented in the diagram, indicating they would be combined with a LEFT outer join. Another possibility that may give more desirable results, depending on the situation, is to join the tables using a FULL natural outer join to create a logical table, as described in Chapter 7. In this case, the COALESCE function can be very useful for data inheritance when the same data types exist in both tables and one or the other needs to be used or overridden, for example, COALESCE(Person.Birthdate, Employee.Birthdate). In this way, Birthdate would be supplied if it existed in either table, and if it existed in both tables, the Birthdate value from the Person table would be used since it is the first one specified in the COALESCE function.

Figure 13.4 Data inheritance supported in SQL by structured views.
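The COALESCE-based inheritance described in this section can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table layouts, the PersonId link column, the EmpView name, and the sample rows are all hypothetical, and a LEFT outer join is used rather than the FULL natural outer join so that the sketch runs on older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables: shared personal data lives in Person.
CREATE TABLE Person   (PersonId INTEGER PRIMARY KEY, Birthdate TEXT, Address TEXT);
CREATE TABLE Employee (EmpNo INTEGER PRIMARY KEY, PersonId INTEGER,
                       EmpName TEXT, Birthdate TEXT);

INSERT INTO Person VALUES (1, '1960-05-01', '12 Elm St'), (2, NULL, '9 Oak Ave');
INSERT INTO Employee VALUES (100, 1,    'Ann', '1961-01-01'),
                            (101, 2,    'Bob', '1970-07-07'),
                            (102, NULL, 'Cy',  NULL);

-- The view presents the complete Employee data; Person.Birthdate wins
-- when both tables supply a value because it is listed first.
CREATE VIEW EmpView AS
  SELECT e.EmpNo AS EmpNo, e.EmpName AS EmpName,
         COALESCE(p.Birthdate, e.Birthdate) AS Birthdate,
         p.Address AS Address
  FROM Employee e LEFT JOIN Person p ON p.PersonId = e.PersonId;
""")

rows = conn.execute(
    "SELECT EmpNo, EmpName, Birthdate, Address FROM EmpView ORDER BY EmpNo"
).fetchall()
for row in rows:
    print(row)
```

Ann's birthdate comes from Person (both tables have one), Bob's falls back to Employee (Person holds NULL), and Cy has no Person row, so the inherited columns are NULL.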

13.5—Database Navigation, Efficiency, and Nonrelational Access

Object databases need the flexibility and control to navigate the database structure. Knowledge of the hierarchical data structure being accessed by the outer join supplies this database navigation information. This was covered in Chapter 10. Normally in applications, database navigation is supplied procedurally, one instruction at a time. With a nonprocedural 4GL like SQL, it is all supplied up front, allowing for greater optimization and efficiency when specifying database access operations. This allows combining several access operations into one for more efficiency. With database access, nonprocedural access is usually more efficient than procedural and can be optimized for each specific use.

As indicated above, database navigation information allows for the generation of database access operations. These access operations can also be for postrelational databases such as nested relational and object databases, legacy databases such as IBM's IMS, enterprise access across many types of databases, and data warehouse databases requiring flexible structured access. These different types of access procedures are all seamless because there is a direct mapping possible with the outer join's inherent data modeling ability. This in turn allows for truly seamless and direct disparate and heterogeneous accessing. This also adds database abstraction since the user does not have to be aware of the type of database being accessed. These nonrelational database access capabilities were covered in Chapter 10.

The semantics of the data structures modeled by outer joins offer an excellent opportunity for optimization. These optimizations were disclosed in Chapter 11. They all offer efficiency, but they also increase reusability and data abstraction. This is because view optimization (described in Chapter 11) removes unnecessary tables from the view when invoked. This means the user doesn't have to be concerned about using the most limited view available for the query. One large view can serve for many smaller subviews. This increases data abstraction for the user and helps reusability by allowing one view to be used efficiently in many applications. Efficiency is derived from the possible semantic optimizations and database navigation that supplies the means to implement the optimizations.
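The idea of removing unnecessary tables from a view can be sketched as a simple tree pruning. This is not the Chapter 11 algorithm, whose rules are not reproduced here. It assumes, for illustration only, that a table can be dropped when neither it nor any of its descendants is referenced by the query. The PARENT map and the function name are hypothetical.

```python
# Hypothetical hierarchical view: child table -> parent table (None marks the root).
PARENT = {
    "Division": None,
    "Department": "Division",
    "Employee": "Department",
    "Dependent": "Employee",
    "Product": "Division",
}

def tables_needed(parent, referenced):
    """Return the referenced tables plus every ancestor on their path to the root."""
    needed = set()
    for table in referenced:
        node = table
        # Stop early once we reach a node whose ancestors were already added.
        while node is not None and node not in needed:
            needed.add(node)
            node = parent[node]
    return needed

print(sorted(tables_needed(PARENT, {"Product"})))   # ['Division', 'Product']
print(sorted(tables_needed(PARENT, {"Employee"})))  # ['Department', 'Division', 'Employee']
```

Under this assumption, a query that selects only Product data never touches Department, Employee, or Dependent, even though the view defines all five tables.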

The optimizations utilize the hierarchical structure modeled by the outer join so they will also work seamlessly on nonrelational databases. Another optimization that offers powerful capabilities for object databases is the dynamic rewriting of outer join requests that can automatically utilize advanced capabilities in the underlying database system as they become available. This was described in Chapter 11 and is shown in Figure 13.5. These include SQL:1999 object capabilities and functions that can be used to perform direct navigation to bypass costly joins. This means that SQL outer join views do not have to be associated with slow, join-bound processing. This can improve the performance of inheritance, described in Section 13.4, so that it becomes practical to use. Since data modeling and structure processing can be improved by outer join optimizations, all capabilities that depend on them are likewise improved.

13.6—Late Binding and Polymorphism

The outer join and the DSE technology can operate dynamically. This has added value for the capabilities already discussed in this chapter, especially to the object database operation. It allows all the capabilities shown in Figure 13.1 to operate when initiated interactively, and it enhances many of their operations. Optimizations can be determined and performed at run time when dynamic access request requirements are known. Reusability is reinforced when views are invoked dynamically and transparently optimized because it is no longer necessary to have as many views. Warehouse database access can support decision support (DSS) by supporting ad hoc requests specified at run time.

Figure 13.5 SQL:1999 navigation can avoid joins while maintaining view semantics.

But most importantly for object use, it enables late binding and polymorphism. An example of late binding and polymorphism for the outer join is that it allows different access methods and data structures to be dynamically linked and accessed, as shown in Figure 13.6. Late binding allows the data structure to be specified at run time. Polymorphism allows the same outer join statement to process different types of databases to satisfy the request, and this happens at run time thanks to late binding. This combination can be used to support plug-and-play capabilities, as shown in Figure 13.7.

13.7—Plug and Play

Utilizing the outer join's late binding and polymorphic capabilities described in Section 13.6, it is possible to easily create plug-and-play database components. These plug-and-play components enable applications to specify complex database requirements using a neutral database modeling and access language such as SQL with its ANSI SQL join operation. Because of the late binding ability, the database components can be plugged in without reconfiguration. The polymorphic capability enables disparate database types to also be plugged in without any reconfiguration.

Figure 13.6 Examples of late binding and polymorphism.

Figure 13.7 Plug and play.

13.8—Conclusion

The data modeling and data structure processing ability of the outer join coupled with the data structure meta information extraction technology (Chapter 9) can produce the capabilities and attributes shown in Figure 13.1. These capabilities interact with each other to produce features that are more powerful than when taken alone. Used together, they help make a very powerful object/relational interface that has the capabilities required of an object database and at the same time has the features and characteristics of a relational interface.

The capabilities presented in this chapter were not accomplished by grafting on new features that do not meld with relational operation, or by arbitrarily defining new semantics for SQL. The ANSI SQL outer join operation inherently and seamlessly supplies the framework for the capabilities discussed and shown in this chapter.

14—Nonrelational SQL-Based Universal Data Access

There are a number of universal data access frameworks that are becoming very popular. These include OLE DB, ODBC, and JDBC. Most of these universal data access frameworks have one thing in common: they use SQL as the database interface. This presents a problem when using these frameworks to access nonrelational databases. Most nonrelational databases are hierarchical in structure, so interfacing seamlessly to them using SQL presents a problem. The outer join's data modeling ability can supply the solution to SQL-based universal data access. To demonstrate this, this chapter presents a method that works with all SQL-based universal data access frameworks to seamlessly process structured data records such as those used by COBOL, C, and 4GLs. This process can also be applied to other forms of hierarchically structured data such as XML data.

Structured record processing is usually the last legacy type access that is implemented by SQL-based universal data access products. Because of the way the structured data is contiguously stored in structured records, SQL has had a difficult task interpreting its makeup and mapping it to a relational data structure. This chapter will show how the ANSI outer join operation can naturally map these hierarchical structures and how their contiguous structure makeup can be accessed seamlessly by standard SQL-based universal access frameworks. Some SQL products are starting to support nested relations, where a given column of a table can itself contain multiple rows and columns of data. These nested relations can form hierarchical structures very similar to structured records, and for this reason can be processed in a similar fashion to that shown in this chapter.

14.1—Structured Record Overview

Structured records are hierarchical data structures that are stored contiguously in program memory and also when written to storage. Structured records are used inherently by programming languages like COBOL and C that can seamlessly map these structures with their standard data definition syntax. COBOL can support variable occurring segments while C is limited to fixed occurring segments, but both can model multileg hierarchical data structures. These structured data records are also used heavily by 4GLs to store and transfer hierarchical data structures from place to place.

The composition of structured records is fairly standard except for slack bytes that can be added for boundary alignment by different programming languages. The example in Figure 14.1 demonstrates how COBOL defines structured data and how it is represented in memory or on file, where it can be read into memory, modified, and written out again.

Figure 14.1 View of a variable-length contiguous structured data record.

Variable-occurring segments use count fields defined in their parent segment to indicate their number of occurrences. Fixed-occurring segments do not need to store their occurrence count in the record, since it is fixed and can be kept in the data definition.

The structured record in Figure 14.1 consists entirely of variable-occurrence repeating segments. These variable-occurrence segment types require a count field stored in the data record for each separate sequence of these occurrences under their parent segment. This is necessary because the occurrence count can be different for each parent occurrence. Fixed-occurrence counts can also be specified for segments. They do not require a count field in the data because there are always the same number of occurrences reserved in the record. The fixed-occurrence count is contained in the meta data that defines the record format. An example is shown in Figure 14.2, where the Emp segment type has been defined as fixed (i.e., 20 Emp Occurs 2 Times). Notice that a fixed-occurrence count does not represent the actual number of data occurrences, only that there are a fixed number of segment blocks, some of which may not be used, as shown in Figure 14.2.
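The role of count fields in a variable-occurrence record can be illustrated with a small sketch using Python's struct module. The layout here is an assumption made for the example and is not the layout in Figure 14.1: a Dept root holds a DeptNo and an Emp count, and each Emp holds an EmpNo and its own Dpnd count, all as 2-byte big-endian integers with no slack bytes.

```python
import struct

def pack_record(dept_no, emps):
    """Build a contiguous record; emps is a list of (emp_no, [dpnd_no, ...]) pairs."""
    out = struct.pack(">HH", dept_no, len(emps))        # parent stores the Emp count
    for emp_no, dpnds in emps:
        out += struct.pack(">HH", emp_no, len(dpnds))   # each Emp stores its Dpnd count
        for dpnd_no in dpnds:
            out += struct.pack(">H", dpnd_no)
    return out

def unpack_record(buf):
    """Walk the record front to back, using the stored counts to find each segment."""
    offset = 0
    dept_no, emp_count = struct.unpack_from(">HH", buf, offset)
    offset += 4
    emps = []
    for _ in range(emp_count):
        emp_no, dpnd_count = struct.unpack_from(">HH", buf, offset)
        offset += 4
        dpnds = []
        for _ in range(dpnd_count):
            (dpnd_no,) = struct.unpack_from(">H", buf, offset)
            offset += 2
            dpnds.append(dpnd_no)
        emps.append((emp_no, dpnds))
    return dept_no, emps

record = pack_record(10, [(1, [11, 12]), (2, [])])
print(len(record))            # 16 bytes: 4 (Dept) + 8 (Emp 1 and two Dpnd) + 4 (Emp 2)
print(unpack_record(record))  # (10, [(1, [11, 12]), (2, [])])
```

Because the Dpnd count differs per Emp occurrence, the start of the second Emp can only be found by reading the first Emp's count, which is why the count must be stored in the record itself.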

Figure 14.2 View of a structured data record with "fixed occurs" Emp segment.

14.2—SQL Structured Data Access Basics

The outer join syntax can be used to define a view of the hierarchical structure for a structured data record so it can be seamlessly accessed. This can be performed by defining each segment type of the structured record as a relational table. Then, whenever the structured record is queried by SQL, either by itself or as part of a larger structure, the outer join structured record view is used to define the structured record portion of the logical view. Figure 14.3 demonstrates this.

Since structured data segments are contiguous, they do not need or usually contain unique and foreign keys for linking. These missing keys are added to the SQL view definition in Figure 14.3 as virtual surrogate keys that are processed by the structured record processor, which is described later in Section 14.4. To define the structured record accurately, the structured record SQL view must specify its legs in the same order they occur in the physical data structure. This is not necessary in a logical hierarchical structure, but may be required in a physical structure for navigation.

All SQL access to the structured record is performed through the outer join view that defines it in its entirety. This has the advantage that this view is the only view necessary for accessing the structured data record. Because of the SQL optimization documented in Chapter 11, Section 11.3, this view always eliminates unnecessary table accesses for each specific use of the view. This means there is never a penalty for using this global view.

Figure 14.3 Using hierarchical SQL view to access structured data.

14.3—Internal Navigation and Mapping of Structured Data

To access a structured data record, it must be first mapped so that all segment types are easily accessible and their occurrences can be navigated. In order to map the structured data record, its data definition is necessary. This data definition describes the hierarchical data structure, its segments, and their hierarchical level and relationships to other segment types in the structure. As stated previously, fixed segment occurrence counts are stored in the data definition, while variable segment occurrence counts are stored in the data record. The pseudo code in Figure 14.4 uses the hierarchical order (top to bottom, left to right) and physical database hierarchical level of the segment definitions in the data structure definition to drive the mapping and segment decomposition process.

The pseudo code in Figure 14.4 has a couple of optimizations for bypassing the storing of unnecessary segment occurrences. These are possible when the data is for read-only purposes and will not be updated. Another optimization that is possible is to hold off invoking this segment decomposition routine until after the root segment for the active record is processed. This is possible because the root segment will be processed first, before the lower level segments of the record are required. The root segment is the leading segment and is accessible without performing the segment decomposition routine. The reason that this is an optimization is that very often the root segment contains record selection or join qualification criteria that may cause bypassing of further processing of the record, and this optimization will avoid the process of decomposing the record.

If the structured record is to be updated, including inserting of segment occurrences, the structured record must also be moved into a hierarchically linked structure, or at least expanded while it is being mapped. This will allow for the insertion of segment occurrences. Writing an updated structured record back out is accomplished by first compressing it back into a contiguous structured record. This process is much easier than expanding the data structure, since it has already been mapped.

It is worth noting that languages that can define hierarchical structures, like COBOL, C, or XML, have the procedural flexibility to define structures that do not conform to good structure definition principles. These can cause problems for mapping procedures like the one in Figure 14.4. The most important rule to observe when defining a hierarchical structure is to keep each segment's data definition contiguous. This means that once a lower level child segment type is defined, it should indicate the end of the parent segment. Any remaining segment data is ambiguous to the structure definition process.

Figure 14.4 Pseudo code to decompose and map a structured record.
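A definition-driven decomposition of the kind this section describes can be sketched as follows. This is not the pseudo code of Figure 14.4, which is not reproduced here. It is a minimal illustration under assumed conventions: each segment's own fields come first, followed by one variable-occurrence count per child type, and the DEFINITION layout, field formats, and table shapes are all hypothetical. Each decomposed row receives a virtual surrogate key and its parent's key.

```python
import struct

# Hypothetical data definition, in hierarchical order (top to bottom, left to right).
# The trailing values of each segment are the occurrence counts of its children.
DEFINITION = {
    "name": "Division", "fmt": ">HBB",          # DivNo, DeptCount, ProdCount
    "children": [
        {"name": "Department", "fmt": ">HB",    # DeptNo, EmpCount
         "children": [{"name": "Employee", "fmt": ">H", "children": []}]},
        {"name": "Product", "fmt": ">H", "children": []},
    ],
}

def decompose(buf, seg, offset, parent_key, tables):
    """Map one segment occurrence at offset, recurse into its children,
    and return the offset just past everything that was consumed."""
    values = struct.unpack_from(seg["fmt"], buf, offset)
    offset += struct.calcsize(seg["fmt"])
    n_children = len(seg["children"])
    fields = values[:len(values) - n_children]
    counts = values[len(values) - n_children:]
    rows = tables.setdefault(seg["name"], [])
    key = len(rows) + 1                          # virtual surrogate key
    rows.append((key, parent_key) + fields)
    for child, count in zip(seg["children"], counts):
        for _ in range(count):
            offset = decompose(buf, child, offset, key, tables)
    return offset

record = (
    struct.pack(">HBB", 1, 2, 1)                 # Division 1: 2 departments, 1 product
    + struct.pack(">HB", 10, 2) + struct.pack(">H", 100) + struct.pack(">H", 101)
    + struct.pack(">HB", 20, 0)                  # Department 20 has no employees
    + struct.pack(">H", 7)                       # Product 7
)
tables = {}
end = decompose(record, DEFINITION, 0, None, tables)
for name, rows in tables.items():
    print(name, rows)
```

Note that the Product segment can only be located after every Department and Employee occurrence has been stepped over, since their variable counts determine where the Product data begins.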


14.4—SQL-Based Universal Data Access of Structured Data

Almost all standard universal data access interfaces use SQL as the database access language. These include OLE DB, ODBC, and JDBC interfaces. Since these interfaces use SQL, access to nonrelational databases is not a straightforward procedure. Structured data records present an additional access problem because of their contiguous format. The data access middleware design in Figure 14.5 uses a two-step process to interface structured records seamlessly to SQL-based universal data access interfaces. The structured record processor box in the diagram moves the data between the structured data record and the intermediate tables using the data structure metadata extracted from outer join specifications to navigate the structured record. The data provider component moves the data between the intermediate tables and the universal data access interfaces (e.g., OLE DB). By using these intermediate virtual tables, any order of SQL requests from the universal data access interfaces can be handled in a direct fashion, including updates. With the outer join modeling the structured data record, this method produces a truly seamless interface process with the SQL-based universal data access interfaces.

Because structured records on file are more easily addressed through their root segment, this can affect processing of SQL WHERE and ON clauses that reference data in lower level segments in structured records. For root references, the structured record processor in Figure 14.5 can directly address the required structured records on file, while for lower level references it will have to sequentially search through the selected structured records' contents unless a secondary index was used.

Figure 14.5 Interfacing to SQL-based universal data access middleware.

14.5—Handling Multiple Structure Formats within a File

Files that contain structured records may also contain multiple record formats that are interspersed in the file. These structured records will have a field in their root segment that distinguishes the different record types in the file. Applications can handle these different record formats by testing this format indication in the root segment and then using the proper structure overlay to process it. A similar technique can be used for SQL queries to ensure that only records of a specific format are processed by selecting on the format indication. This is usually appropriate for queries since a query usually requires only one format at a time. This format selection process can be specified as in: SELECT EmpNo FROM StructuredView WHERE DeptNo=123 AND StructuredFormat=2. In this example, the DeptNo and StructuredFormat fields are located in the root segment. This technique works because the structured record can be retrieved and its root segment tested without the need to decompose the structured record, as discussed in Section 14.3.
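Testing the root segment before decomposing a record can be sketched as follows. The root layout (a 1-byte StructuredFormat followed by a 2-byte DeptNo), the function name, and the sample records are assumptions made for this illustration only.

```python
import struct

ROOT_FMT = ">BH"  # hypothetical root layout: StructuredFormat, DeptNo

def select_records(records, dept_no, structured_format):
    """Yield only the records whose root segment passes the selection test.
    Nothing beyond the fixed-position root fields is read here."""
    for rec in records:
        fmt_code, rec_dept = struct.unpack_from(ROOT_FMT, rec, 0)
        if fmt_code == structured_format and rec_dept == dept_no:
            yield rec  # only these would go on to full decomposition

records = [
    struct.pack(ROOT_FMT, 2, 123) + b"format-2 lower segments",
    struct.pack(ROOT_FMT, 1, 123) + b"format-1 lower segments",
    struct.pack(ROOT_FMT, 2, 456) + b"another department",
]
selected = list(select_records(records, dept_no=123, structured_format=2))
print(len(selected))  # 1
```

The two rejected records are never decomposed, which mirrors the WHERE DeptNo=123 AND StructuredFormat=2 selection in the query above.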

14.6—Interfacing to Prerelational and Postrelational Data

Interestingly, prerelational and postrelational systems are very similar. They both process complex hierarchical data models, while conventional relational databases use simple two-dimensional data tables and result structures. In this regard, prerelational and postrelational systems have similar tasks to perform in order to process them using SQL requests. This means that they too can be processed in a similar fashion to structured records, as is demonstrated in Figure 14.6, which replaces the structured data record processor in Figure 14.5 with a nested relational processor. This could also have been an IMS database or any other hierarchical database (see Chapter 11 for an IMS example).

14.7—The Importance of the View for Contiguous Data

With contiguous data, as described earlier, the entire contiguous data structure must be known to handle all possible data access requests. This is because it may be necessary to navigate across unnecessary and unrelated data to get to required data. For example, in our structure example shown in Figure 14.3, in order to access Product data it is necessary to navigate over Division, Department, and Employee data. It is usually necessary to navigate over Division data in this data request because it is on a structure path to Product data, but Department and Employee data are not on a path to Product data, yet they still require navigating across. This is because they physically precede the Product data in the contiguous structured record, so the Product data starts at a variable location in the record, and locating it requires understanding the data structure of the preceding data. This explains why the entire structure of the contiguous data structure is necessary to access it.

Figure 14.6 Interfacing universal data access to nested relational structures.

One of the advantages of data modeling SQL is that the data structure meta information necessary to access data structures is contained in the same SQL used to access it. Contiguous data structures may present a problem in this case since the entire data structure is necessary to access them and usually only the portion of the data structure necessary to access the required data is defined in the access SQL. The solution to this problem is to supply one global view of the contiguous structure and require that it be used for all access of the contiguous data structure. This may seem to cause a problem where the overdefinition will cause unnecessary processing and storing of data. This is not the case, because of the structured view optimization described in Chapter 11. This optimization eliminates unnecessary processing of pathways specified in the SQL specification. This also means that having one SQL view definition for any type of structure will always work without imposing any additional processing.

Utilizing the SQL view as a global application definition for structured data, as described above, offers the opportunity for the SQL view definition to contain the required meta information necessary for the access of the defined physical hierarchical nonrelational structure. This access will be performed by the access method for this database type. In this way, the SQL that makes up the global view is the logical (application) data structure while the physical data structure information is stored in the view definition. The amount of the physical structure that will require accessing is determined by the data that is selected for accessing or processing. Physical network data structures can be handled by having a global view definition for each global hierarchical view derived from the physical network view. How the physical nonrelational meta information is obtained, stored, and utilized is outside of the ANSI SQL specification, keeping this SQL-based nonrelational structured access ANSI standard.

14.8—Conclusion

Structured data is used frequently in third- and fourth-generation languages and in object applications. When these structured data blocks are written out to a file, they become structured records. This chapter has shown how these structured records can be seamlessly processed by SQL. In order to demonstrate this, it was shown how structured records are composed and decomposed for access. It was then shown how SQL processing can seamlessly map to and from a decomposed structured record. Finally, it was shown how SQL structured record access can be implemented seamlessly using current SQL-based universal data access protocols such as OLE DB, ODBC, and JDBC. This structured data example was used because it can be easily adapted to operate with all other physical forms of hierarchical data.

PART IV—MISCELLANEOUS DATA MODELING TOPICS

Part IV presents some miscellaneous data modeling topics that can be implemented to improve the data modeling performed by the outer join operation. Chapter 15 introduces a powerful extension to the outer join data modeling procedure that allows the linking of a substructure not based on its root table. Chapter 16 presents the external design of a SQL outer join statement generator utility that automates the generation of powerful outer joins that model and process complex data structures.

15—Advanced Lower Structure Linking

Advanced lower structure linking applies to hierarchically linking to the lower structure in a way that is not covered in the linking rules specified in Chapter 6. Normally when linking to the lower structure, the root of the lower structure is the only link point that can be referenced. This creates a valid hierarchy, and one that can be built top to bottom as would normally be expected for a hierarchy. But there may be times when it is desirable to link to an existing lower level structure not based on its root. This is actually possible, and it will form a valid logical hierarchical structure with hierarchical semantics that are seamlessly compatible with standard SQL view processing.

15.1—Overview of Nonroot Lower Level Linking

As stated above, it is often convenient and necessary to link to an existing lower level data structure by referencing nonroot segments in the lower structure. This is possible and will form a valid hierarchical structure with hierarchical semantics, but may require special processing precautions because hierarchical structures built in this manner cannot always be processed in a strict top-to-bottom fashion. This advanced linking process is shown in Figure 15.1. Its special processing requirements will be covered in this chapter.

Figure 15.1 demonstrates, as first pointed out in Chapter 6, that when linking below the root segment of a lower level structure, the root-level segment remains the lower level structure link point. This rule is supported by the fact that the Department segment used in the lower level link criteria is itself dependent on the Division segment's existence, as shown in the example in Figure 15.1. This means that the Division segment has to be linked to the Department segment before the Manager segment is linked to the lower structure, which semantically follows the expanded SQL syntax used in these situations. This logically makes the lower level structure root the link point since all segments under it are dependent on it. This also means that hierarchical top-down processing is not always possible with this linking method.

Figure 15.1 Example of nonroot-level linking of bottom structure.

15.2—Previous Nonroot Lower Level Linking Method

Some prerelational systems supported linking to lower level substructures using a nonroot-level reference point. The easiest way to handle this for prerelational systems was to make the reference point of the lower level structure the link point that caused the substructure to be inverted around the link point. This also causes all other paths originating from the root segment of the lower structure to be discarded. An example of this is shown in Figure 15.2.

This approach to linking to a lower level structure changes the lower level structure itself, and thus its semantics also change. For example, in the resulting structure, Division no longer affects Department and Product is removed. So this is probably not the best approach to take if another, more seamless approach is available. This approach of linking to a nonroot-level link point in SQL does not emulate SQL's natural join syntax and semantics.

15.3—Semantics of Nonroot Lower Level Linking

Nonroot lower level structure linking can also be performed using multiple link points as long as they originate from a single upper level structure link point as defined in linking rule two in Chapter 6. An example of this operation with its data structure diagram and SQL is shown in Figure 15.3. Even with multiple paths to the lower structure, the root of the lower data structure is semantically the link point and the standard SQL outer join semantically and operationally supports this derived data structure. The lower level structure, which is usually built before it is joined, is filtered when joined according to the link criteria. This is the same process that occurs when structures are built bottom-up and throwaways (retrieved row discards) occur, as was described in Chapter 11. In the example in Figure 15.3, the Division view is filtered according to the Manager link value as it is linked. This means that as each manager is linked to the Division view, only the Department and/or Product for which that particular employee is manager is preserved. This is a simplified description, expanded further below.

Figure 15.2 Example of old method of performing nonroot-level linking.

Figure 15.3 Multiple path nonroot reference to lower structure.

To understand multiple path nonroot references, it is easier if single path references are understood first. If the SQL ON clause in Figure 15.3 did not specify an AND or OR operator, so that there is only one link criterion (say, a Department comparison), the Manager link would only be made on a Department match, with all other nonmatching Departments filtered out. But no Products would be filtered since there would be no filtering criteria specified for them. These semantics are intuitive, unambiguous, and useful.

The SQL statement in Figure 15.3 does use an OR to link managers to the lower structure based on whether they are a department manager or a product manager, creating a multiple path reference. If a manager is neither, then he or she will not be linked to the lower structure. If a manager is a department manager, he or she will be linked to the lower structure with all other nonmatching Departments filtered out, but with no Products filtered out. If a manager is a product manager, the reverse is true; he or she will be linked to the lower structure with all other nonmatching Products filtered out, but with no Departments filtered out. If a manager is a manager of a department and a product, then he or she will be linked to the matching lower structure and no filtering of the Department and Product will occur. This is consistent with the one-sided matches just described and follows the natural hierarchical sibling leg query filtering semantics described in Chapter 5.

If the SQL ON clause in Figure 15.3 specified an AND operator instead of an OR operator, then a multiple path link would only match a situation where the employee was both a manager for a product and a manager for a department in the same division, and all other managers and products would be filtered out. The manager would have to be a department and product manager from the same Division because of the common parent rule, also described in Chapter 5.
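The OR and AND linking semantics described above can be sketched as a small filter over one division's sibling legs. The division contents below are hypothetical (the data of Figure 15.4 is not reproduced here), and the AND branch assumes that only the matching Departments and Products are preserved.

```python
# Hypothetical division: department -> manager and product -> manager.
DIVISION = {
    "departments": {"DeptA": "Mike", "DeptB": "Don", "DeptC": "Ralph"},
    "products": {"ProdX": "Jim", "ProdY": "Mike"},
}

def link(manager, division, op):
    """Return the (departments, products) preserved when this manager is linked
    to the division, or None if the manager is not linked at all."""
    depts = [d for d, m in division["departments"].items() if m == manager]
    prods = [p for p, m in division["products"].items() if m == manager]
    all_depts = list(division["departments"])
    all_prods = list(division["products"])
    if op == "AND":
        # Both legs must match under the same division (common parent).
        return (depts, prods) if depts and prods else None
    # OR: a match on one leg filters that leg only; the sibling leg is untouched.
    if depts and prods:
        return all_depts, all_prods
    if depts:
        return depts, all_prods
    if prods:
        return all_depts, prods
    return None

for name in ("Mike", "Ralph", "Jim", "Ann"):
    print(name, link(name, DIVISION, "OR"))
print("Mike AND", link("Mike", DIVISION, "AND"))
```

With OR, Ralph keeps only DeptC but every product, Jim keeps only ProdX but every department, and Mike (who matches on both legs) keeps everything. With AND, only Mike is linked.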

To see why the different semantics described above make sense and why SQL and structured data follow these semantics, producing data results that support these semantics, some query examples will be examined. The data in Figure 15.4 will be used in these queries that appear in the next sections. The data results are presented both in a structured format and a relational flat, two-dimensional format, which uses the Cartesian product to represent the data in this form. There are sibling segment paths in the data results to demonstrate their semantic operation.

15.4—Single Path Reference to Lower Structure

A single path reference below the root to a lower level structure can consist of a single reference or multiple ANDed references along a single path in the lower structure. In the latter case, this can include the root of the lower structure. Figure 15.5 shows an example of linking to a lower level structure using a single reference below the root. Single or multiple references ANDed along a path operate on the same semantic filtering principles, so this example should suffice in all single path cases. This example's results and the others in this chapter use a structured format to emphasize the data structure being displayed.

Figure 15.4 Data used in following nonroot linking examples.

The SQL query statement in Figure 15.5 hierarchically links the upper level structure consisting of only the Manager table to the lower level DivView structure. This link is based on the lower level structure's DeptMgr data field located below the root of the lower level structure, which creates the hierarchical structure and data shown in Figure 15.5; the associated semantics were described in Section 15.3. DeptB is filtered out since its department manager Don is not in the Manager table. Along with DeptB, its Employees are also filtered out, as you would expect. The last result in Figure 15.5 lists manager Jim with no other data since Jim is a product manager and not a department manager, and the linking was based on department managers. Notice that all the other data on the nonfiltered paths are not filtered out. This structured result also reflects the same result (minus the replicated data) applied relationally, as can be seen by applying the link criteria to each row in the Cartesian product in Figure 15.4.

Figure 15.5 Single path nonroot reference to lower structure data example.
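The relational side of this single path link can be tried out with a small sketch. It assumes hypothetical table layouts and sample rows (they are not the contents of Figure 15.4, and the Employee table is left out for brevity), uses Python's built-in `sqlite3` module as a stand-in database, and returns the flat, replicated form of the result rather than the structured form.

```python
import sqlite3

# Hypothetical tables and rows, invented for this sketch (not the Figure 15.4 data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Manager    (MgrName TEXT);
CREATE TABLE Division   (DivName TEXT);
CREATE TABLE Department (DeptName TEXT, DivName TEXT, DeptMgr TEXT);
CREATE TABLE Product    (ProdName TEXT, DivName TEXT, ProdMgr TEXT);

INSERT INTO Manager    VALUES ('Mike'), ('Ralph'), ('Jim');
INSERT INTO Division   VALUES ('DivA');
INSERT INTO Department VALUES ('DeptA', 'DivA', 'Mike'),
                              ('DeptB', 'DivA', 'Don'),
                              ('DeptC', 'DivA', 'Ralph');
INSERT INTO Product    VALUES ('ProdX', 'DivA', 'Jim'),
                              ('ProdY', 'DivA', 'Mike');

-- Lower level structure: Division is the root, Department and Product are sibling legs.
CREATE VIEW DivView AS
SELECT Division.DivName    AS DivName,
       Department.DeptName AS DeptName, Department.DeptMgr AS DeptMgr,
       Product.ProdName    AS ProdName, Product.ProdMgr    AS ProdMgr
FROM Division
LEFT JOIN Department ON Department.DivName = Division.DivName
LEFT JOIN Product    ON Product.DivName    = Division.DivName;
""")

# Link the Manager table to DivView below its root, on the DeptMgr field only.
rows = conn.execute("""
SELECT Manager.MgrName, DivView.DivName, DivView.DeptName, DivView.ProdName
FROM Manager
LEFT JOIN DivView ON Manager.MgrName = DivView.DeptMgr
ORDER BY 1, 3, 4
""").fetchall()

for row in rows:
    print(row)
```

With this sample data, DeptB never appears because its manager is not in the Manager table, Jim is preserved with no lower level data, and both products stay attached to every surviving department.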

15.5—Multiple Path References to Lower Structure

A more complex lower level linking occurs when multiple paths to the lower level structure are used. While multiple path lower level linking does create a valid hierarchical structure, the results may appear ambiguous, depending on the use of the data. The result may not fit its intended use, which can usually be corrected by using a single path reference, but sometimes a multiple path reference may be what is needed.

The SQL query statement in Figure 15.6 hierarchically links the upper level structure consisting of only the Manager table to the lower level DivView structure. This link is based on the lower level structure's DeptMgr or ProdMgr data fields located below the root and on different paths of the lower level structure, creating the hierarchical structure and result shown. Since manager Mike is both a department and product manager, no Departments or Products are filtered out since a match in product manager includes all Departments and a match in department manager includes all Products. Manager Ralph matches with DeptC only, thereby filtering out other Departments, but not Products. Manager Jim only matches with product X, thus filtering out other Products but not Departments. As stated previously, the multiple path semantics demonstrated here were covered in Chapter 5 under sibling leg semantics.

This structured result also reflects the same result applied relationally, as can be seen by applying the link criteria to each row in the Cartesian product in Figure 15.4. This result may seem ambiguous since in some cases Products are filtered and in other cases Departments are filtered. But it does link the structure to the DivView structure hierarchically and may be useful if the filtered values are not used in summaries unless they match the resulting semantics.

Figure 15.6 Multiple path nonroot reference to lower structure data example.
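The difference between ORed and ANDed multiple path link criteria can be checked relationally with a small sketch. The table layouts and rows are hypothetical (they are not the Figure 15.4 data), the Employee table is omitted, and Python's built-in `sqlite3` module stands in for the database. The helper names `linked_rows` and `kept` are illustrative only.

```python
import sqlite3

# Hypothetical tables and rows, invented for this sketch (not the Figure 15.4 data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Manager    (MgrName TEXT);
CREATE TABLE Division   (DivName TEXT);
CREATE TABLE Department (DeptName TEXT, DivName TEXT, DeptMgr TEXT);
CREATE TABLE Product    (ProdName TEXT, DivName TEXT, ProdMgr TEXT);

INSERT INTO Manager    VALUES ('Mike'), ('Ralph'), ('Jim');
INSERT INTO Division   VALUES ('DivA');
INSERT INTO Department VALUES ('DeptA', 'DivA', 'Mike'),
                              ('DeptB', 'DivA', 'Don'),
                              ('DeptC', 'DivA', 'Ralph');
INSERT INTO Product    VALUES ('ProdX', 'DivA', 'Jim'),
                              ('ProdY', 'DivA', 'Mike');

CREATE VIEW DivView AS
SELECT Division.DivName    AS DivName,
       Department.DeptName AS DeptName, Department.DeptMgr AS DeptMgr,
       Product.ProdName    AS ProdName, Product.ProdMgr    AS ProdMgr
FROM Division
LEFT JOIN Department ON Department.DivName = Division.DivName
LEFT JOIN Product    ON Product.DivName    = Division.DivName;
""")


def linked_rows(on_clause):
    """Link Manager to DivView with the given ON criteria (flat relational result)."""
    sql = f"""
    SELECT Manager.MgrName, DivView.DeptName, DivView.ProdName
    FROM Manager LEFT JOIN DivView ON {on_clause}
    ORDER BY 1, 2, 3
    """
    return conn.execute(sql).fetchall()


def kept(rows, manager, col):
    """Distinct non-null values of column index `col` that survive for one manager."""
    return sorted({r[col] for r in rows if r[0] == manager and r[col] is not None})


or_rows = linked_rows("MgrName = DeptMgr OR MgrName = ProdMgr")
and_rows = linked_rows("MgrName = DeptMgr AND MgrName = ProdMgr")

for mgr in ("Mike", "Ralph", "Jim"):
    print(mgr, "departments:", kept(or_rows, mgr, 1), "products:", kept(or_rows, mgr, 2))
print("ANDed criteria:", and_rows)
```

With this sample data, the ORed link keeps every department and product for Mike, only DeptC for Ralph, and only ProdX for Jim. The ANDed link matches only Mike, and only on the one department and product pair that he manages under the same division.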

A final word about multiple paths and sibling path semantics. The Division view (DivView) in Figure 15.3 was used to demonstrate multiple path semantics using the Department and Product tables. These semantics were first described in Chapter 5, which documented how sibling leg semantics relied on the "common parent" domain to determine and control the semantics. The common parent of the Department and Product segments is the Division segment, which also happens to be the root segment of the Division structure. Note that this is a coincidence: the root of a structure does not automatically operate as a common parent. This means that semantics of multiple path lower level references could become complex, with many different common parents occurring at different locations in the structure. While the internal semantics of multiple path lower level structure references may be complex and the results may seem ambiguous, the result is logically and relationally sound, and can be intuitive once the user is familiar with OR logic semantics.


15.6—Optimization Concerns for Nonroot Lower Level Linking

The optimizations specified in Chapter 11 can still be performed, but when nonroot lower level linking is used, additional requirements need to be imposed on a case-by-case basis based on hierarchical semantics. Top-down optimization as described in Chapter 11 is limited. In the SQL query in Figure 15.3, for example, the Division segment must be joined to the Department and Product segments before it can be joined to the Manager segment. This can also affect view optimization, described in Chapter 11. This optimization can still be performed, but will have to be adapted to sometimes access link criteria points even if they are not on a path requiring access. In the example in Figure 15.7, the Department table is the only table containing selected data. Normally, the Employee and Product tables would not require access since they are not on a path to selected data. However, indirectly the Product table is on a path to the required Department data, since the Division table relies on it to be linked with the Manager table. Thus, removing it from access could change the result.
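The adapted access rule can be sketched as a small set computation. This is only an illustration under stated assumptions: the structure is a hypothetical parent map, the link criteria are assumed to reference the Department and Product tables (as an ORed DeptMgr and ProdMgr link would), and the function names are invented here rather than taken from the book.

```python
def path_to_root(box, parent):
    """Return the box together with all of its ancestors up to the root."""
    path = []
    while box is not None:
        path.append(box)
        box = parent[box]
    return path


def tables_requiring_access(parent, selected, link_refs):
    """Tables on a path to selected data, plus tables the link criteria depend on."""
    needed = set()
    for box in set(selected) | set(link_refs):
        needed.update(path_to_root(box, parent))
    return needed


# Hypothetical structure: Manager linked above a Division view (child -> parent).
parent = {
    "Manager": None,
    "Division": "Manager",
    "Department": "Division",
    "Employee": "Department",
    "Product": "Division",
}
selected = {"Department"}               # only Department columns are selected
link_refs = {"Department", "Product"}   # assumed link: DeptMgr OR ProdMgr

naive = tables_requiring_access(parent, selected, set())
adapted = tables_requiring_access(parent, selected, link_refs)
print(sorted(naive))    # Product would be dropped here
print(sorted(adapted))  # Product is retained because the link depends on it
```

In both cases the Employee table can be dropped, since it holds no selected data and plays no part in the link.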

15.7—Using Lower Structure Linking with a View WHERE Clause

In Chapter 6 it was shown how structured subviews could contain WHERE clauses to filter the data in their view. Because of the way WHERE clauses operate on the entire structure, as explained in Chapter 7, using them with subviews presents problems, in particular, the filtering of higher level data based on lower level data. This results in a nonhierarchical form of processing, logically requiring bottom-up processing. For this reason, Chapter 6 suggested limiting view WHERE clause processing to the root segment of the view. This allowed the view to be filtered based on its root, while keeping the processing standard.

Figure 15.7 View optimization needs to adapt for nonroot-level linking.

View WHERE clause processing using lower level filtering criteria is another form of advanced lower level structure linking as described in this chapter. This chapter shows that it does form a valid hierarchical structure and can be processed taking into consideration its special processing requirements. Figure 15.8 demonstrates how this view WHERE clause processing with lower level references results in the same processing requirements and filtering results as described earlier in this chapter. The WHERE clause transformation to ON clause shown in this example was first shown in Chapter 6. In that example, the WHERE clause transformation was limited to join criteria for the root table or segment. With lower level linking, this limitation is relaxed to include tables and segments from the root down to the lower level link segment.

Figure 15.8 Advanced lower structure view WHERE clause processing.


15.8—Restructuring the Data

While lower level linking, as described in this chapter, avoids restructuring the data, which is normally desirable, it may sometimes be desirable to dynamically alter the data structure of fixed-format nonrelational physical structures. This is possible by comparing the structure in the SQL used for access with the physical structure definition of the data being accessed. The physical structured data is then restructured to match the specified SQL access structure. This is demonstrated in Figure 15.9. Since physical legacy structures are fixed, this is the only method available for restructuring.

Access to legacy data usually involves the heterogeneous access of a virtual (or logical) view composed of multiple fixed-format legacy data structures. These multiple legacy fixed structures are free to be linked together in any fashion using the access SQL. With this in mind, and to keep restructuring manageable, restructuring should be applied to single physical structures in isolation from other structures. This means that restructuring cannot be applied across physical structures.

Figure 15.9 Example of dynamic restructuring.


The restructuring model of operation breaks apart the physical structure in a virtual record into its constituent segment parts to resemble tables. For contiguous structures, which do not otherwise need them, foreign-key values are placed into each segment. These tables have data independence and can be easily restructured by naturally performing the access SQL specification. This restructuring can cause data to be replicated or removed as necessary to produce the correct semantics.

The amount of restructuring possible is limited to the structure of the physical data structure. This means that the physical data structure acts as the conceptual (or global) data view from which all application views can be derived. This is important to realize, because your result is limited to this global view. If your physical view is an Employee view and your access view is a Department view, your result may not be what you expect. Instead of having one Department structure for each department, containing all of its employees, there will be one Department structure per employee. It would be possible to improve on this result, as described, by restructuring the entire database all at one time instead of a record at a time, but this may not be feasible.
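The Employee-to-Department case can be illustrated with a short sketch. The record layout, the generated `RecId` key, and the function names are all hypothetical. The sketch only contrasts restructuring one physical record at a time with restructuring the whole set of records at once.

```python
from collections import defaultdict

# Hypothetical physical records: Employee is the root segment, Department its child.
employee_records = [
    {"EmpName": "Ann", "Department": {"DeptName": "Sales"}},
    {"EmpName": "Bob", "Department": {"DeptName": "Sales"}},
    {"EmpName": "Cal", "Department": {"DeptName": "Audit"}},
]


def split_into_segments(record, rec_id):
    """Break one record into table-like segments that share a generated key."""
    employee = {"RecId": rec_id, "EmpName": record["EmpName"]}
    department = {"RecId": rec_id, "DeptName": record["Department"]["DeptName"]}
    return employee, department


def restructure_record(record, rec_id):
    """Invert one record into a Department-over-Employee structure."""
    employee, department = split_into_segments(record, rec_id)
    return {"DeptName": department["DeptName"], "Employees": [employee["EmpName"]]}


def restructure_all(records):
    """Invert all records together, grouping employees under each department."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["Department"]["DeptName"]].append(record["EmpName"])
    return [{"DeptName": name, "Employees": emps} for name, emps in grouped.items()]


per_record = [restructure_record(r, i) for i, r in enumerate(employee_records)]
whole_database = restructure_all(employee_records)

print(per_record)       # one Department structure per employee
print(whole_database)   # one Department structure per department
```

The record-at-a-time form yields two separate Sales structures, which is the limitation described above. The whole-database form merges them, at the cost of reading every record before producing a result.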

15.9—Conclusion

Nonroot lower level structure linking is a powerful capability that extends the outer join's data modeling capability. While it does not follow hierarchical processing rules precisely, it does generate hierarchical structures with hierarchically correct semantics, and extends this hierarchical data modeling ability automatically and naturally to relational and nonrelational database processing. There is a tradeoff for this extended hierarchical data modeling ability: it may require more complex processing procedures in order to take advantage of the advanced capabilities described in Part III of this book. It was also shown how a fixed data structure can be dynamically restructured to satisfy an SQL request with a different structure than the fixed structure being accessed.


16—Data Modeling Outer Join Generator

Constructing complex ANSI SQL outer join statements can be very difficult and error prone. This chapter presents an easily adaptable generic external design of a successfully prototyped utility tool that automates the construction of powerful data modeling outer join statements as documented in this book. This is performed by assisting the user in constructing simple-to-complex hierarchical diagrams that are automatically transformed into outer join statements. This greatly reduces the complexity of advanced outer join construction in an intuitive manner.

Less and less SQL is being written by hand because automatic SQL generation is becoming more and more commonplace and expected. This product proves that complex data modeling outer joins can also be generated easily.

16.1—Product Overview

Using input from the user in the form of a structure diagram, this product constructs complex ANSI SQL outer join statements that automatically model and process complex data structures composed of standard relational tables. This is possible because the ANSI SQL outer join's powerful syntax and data-preserving capabilities allow it to naturally model complex hierarchical data structures. This results in the outer join's operation following the semantics dictated by the data structure defined in the outer join's specification.

This product has integrated many of the advanced capabilities and features described and documented in this book. These include complex data modeling, embedded structured views, views within views, support for nonhierarchical join operations, semantic optimizations, advanced data filtering, and illogical structured view detection. These are covered later in this document.

The hierarchical relational database processor from Chapter 12 has also been designed into this product to allow the testing and verification of the generated outer join data modeling statements. Users can add their own tables and data to the database. The database engine processes relational data in a nested (structured) manner to demonstrate the advanced outer join capabilities that are possible from the vendor utilizing this technology as described in this book.

16.2—Operational Overview

To aid the user in supplying the necessary data structure information, this software utility communicates with the user through a GUI with a series of menus for adding and changing diagram boxes, as shown in Figure 16.1. This interface allows the user to interactively diagram the required data structure using the menus to build and change the data structure diagram. As the diagram progresses, it is displayed along with the outer join statement that models the data structure.

Figure 16.1 Example of screen display.


Each table in the data structure is represented as a rectangular box identified here as a structure box that is hierarchically linked with other structure boxes using the user-supplied information. Each box contains a unique box name. This name is its associated table name unless this table is used more than once in the data structure. If this is the case, then this box is given a unique alias name that should represent its meaning for each use of the associated table in the data structure. Clicking on a box will show all the database information associated with it, such as column names and table join criteria, and allow it to be changed.
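The idea of turning structure boxes into an outer join statement can be shown with a minimal sketch. It is not the design of the prototyped utility: the `StructureBox` class, its fields, and `generate_outer_join` are hypothetical names. It assumes a plain left-to-right chain of LEFT JOINs is an acceptable way to express a structure built top to bottom, and it treats each box's link criteria as ready-made ON clause text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StructureBox:
    table: str                    # associated table name
    parent: Optional[str] = None  # parent box name; None only for the root box
    link: Optional[str] = None    # link criteria, used verbatim as the ON clause
    alias: Optional[str] = None   # needed when the table is used more than once

    @property
    def name(self) -> str:
        """The unique box name: the alias if given, otherwise the table name."""
        return self.alias or self.table


def generate_outer_join(boxes: List[StructureBox]) -> str:
    """Emit one LEFT JOIN per nonroot box, in the order the boxes were added."""
    seen = set()
    lines = ["SELECT *"]
    for box in boxes:
        if box.name in seen:
            raise ValueError(f"box name {box.name!r} is not unique; supply an alias")
        table_ref = f"{box.table} AS {box.alias}" if box.alias else box.table
        if not seen:
            if box.parent is not None:
                raise ValueError("the first box must be the root")
            lines.append(f"FROM {table_ref}")
        else:
            if box.parent not in seen or not box.link:
                raise ValueError(f"box {box.name!r} needs an existing parent and link criteria")
            lines.append(f"LEFT JOIN {table_ref} ON {box.link}")
        seen.add(box.name)
    return "\n".join(lines)


# Hypothetical structure; Employee is used twice, so its second use gets an alias.
boxes = [
    StructureBox("Department"),
    StructureBox("Employee", parent="Department", link="Employee.DeptNo = Department.DeptNo"),
    StructureBox("Dependent", parent="Employee", link="Dependent.EmpNo = Employee.EmpNo"),
    StructureBox("Employee", parent="Department", link="Mgr.EmpNo = Department.MgrNo", alias="Mgr"),
]
sql = generate_outer_join(boxes)
print(sql)
```

Because every box must name a parent that is already in the structure, each ON clause can only refer to tables that appear earlier in the statement.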

The displayed data structure is stored in memory. All data structure operations specified via the GUI are performed on the data structure stored in memory. Only one data structure is stored in memory at a time. The structure in memory is redisplayed with its associated outer join statement after each data structure operation. This means the current data structure and its outer join statement are usually displayed unless a query is run or a help request has used the screen in the interim. In this case, any of the menus will cause the data structure to be redisplayed after their operation completes. The optimization menu that optimizes the outer join can also be used to directly redisplay the data structure if immediate display is necessary.

The capability to combine tables using nonhierarchical join operations such as FULL and INNER join logic into a logical table and introduce it into the hierarchical data structure is supported. The data structure can support any number of these logical tables. Normally, UNION, FULL, and INNER joins that do not perform hierarchical data modeling would invalidate the hierarchical data structure being constructed. But isolated properly in the outer join statement being built, these nonhierarchical joins can be controlled so that they do not invalidate the data structure being modeled. These logical tables were discussed in Chapter 7.

A database engine and default test database are included to test the execution of the generated outer join using the Run menu to see if its semantics will match the desired results. The default database is loaded into memory when this data modeling utility is invoked. The user can add new tables and data to this default database or specify their own database using the Load menu. If tables in the data structure being modeled are in the active test database in memory, then their column names do not need to be specified when defining the structure using the user interface. If this is the case, the column names will be automatically taken from the database tables in memory.

Once a complex data structure has been defined, the user can specify in the Optimization menu which columns are referenced or needed for a particular use of the structure, and the displayed structure and generated outer join statement will be optimized for this particular use of the data structure. Only hierarchical outer join data structures such as the one displayed have the ability to remove tables from the view without affecting the underlying structure. See Chapter 11 for documentation on this optimization.
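A small experiment suggests why this kind of table removal is safe for outer join structures but not for inner joins. The tables and rows are hypothetical, Python's built-in `sqlite3` module stands in for the database, and replicated rows are collapsed into a set, which roughly corresponds to a structured result without replicated data.

```python
import sqlite3

# Hypothetical data: the Audit department has no employees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (DeptName TEXT);
CREATE TABLE Employee   (EmpName TEXT, DeptName TEXT);
INSERT INTO Department VALUES ('Sales'), ('Audit');
INSERT INTO Employee   VALUES ('Ann', 'Sales'), ('Bob', 'Sales');
""")


def dept_names(sql):
    """Distinct department names returned by a query, ignoring replicated rows."""
    return sorted({row[0] for row in conn.execute(sql)})


# Only Department columns are selected, so Employee is a candidate for removal.
outer_full = dept_names("""
    SELECT Department.DeptName
    FROM Department LEFT JOIN Employee ON Employee.DeptName = Department.DeptName
""")
inner_full = dept_names("""
    SELECT Department.DeptName
    FROM Department INNER JOIN Employee ON Employee.DeptName = Department.DeptName
""")
pruned = dept_names("SELECT DeptName FROM Department")

print(outer_full == pruned)  # outer join: dropping Employee changes nothing
print(inner_full == pruned)  # inner join: Employee was filtering Department
```

In the outer join case the Employee table never removes a Department row, so leaving it out of the statement does not change which departments are returned.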

Structures can be saved to a file, retrieved, modified, or deleted using the View menu. Retrieved structures can combine their structure with the structure in memory. These saved structures can also be referenced in data structures using embedded view names so that they are accessed automatically at run time when the outer join statement is generated. This aids reusability and has the advantage that if the referenced view is modified, the modification takes effect for all queries that reference it directly or indirectly when they are loaded into memory. These embedded views will be identified by view name in the data structure diagram, with their box shadowed for emphasis. If embedded views have been identified as SQL views when stored, then the SQL statement being generated will only use the SQL view name, as you would expect. These features are specified by using the View menu and were documented in Chapter 7.

When the optimization feature has determined that tables in the active structure do not require access, they are removed from the generated join statement but remain in the displayed data structure with their boxes connected by dashed lines to indicate they do not require access. Embedded view references in memory can be expanded in the structure diagram using the Expand option in the Optimization menu. These expanded views are indicated in the displayed structure by dotted boxes. They cannot be modified since the changes would not affect their stored view and they would be out of sync.

Very finely layered data filtering is possible with the outer join and its data modeling capability. As the user specifies the data structure, a data filter can be applied at each link point based on column values in the table being linked and/or column values along the path up to the root. This data filtering only affects the lower link point table and the tables that follow it in the data structure. This level of data filtering is not possible with the WHERE clause, which only filters entire rows. This is documented in Chapter 7.
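The difference between a filter applied at the link point and a filter in the WHERE clause can be seen in a short sketch. The tables, rows, and salary threshold are hypothetical, and Python's built-in `sqlite3` module stands in for the database.

```python
import sqlite3

# Hypothetical data for the sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (DeptName TEXT);
CREATE TABLE Employee   (EmpName TEXT, DeptName TEXT, Salary INTEGER);
INSERT INTO Department VALUES ('Sales'), ('Audit');
INSERT INTO Employee   VALUES ('Ann', 'Sales', 60000),
                              ('Bob', 'Sales', 40000),
                              ('Cal', 'Audit', 30000);
""")

# Filter at the link point: only the Employee level (and anything below it) is filtered.
on_filtered = conn.execute("""
    SELECT Department.DeptName, Employee.EmpName
    FROM Department
    LEFT JOIN Employee ON Employee.DeptName = Department.DeptName
                      AND Employee.Salary > 50000
    ORDER BY Department.DeptName
""").fetchall()

# Filter in the WHERE clause: entire rows are removed, including the Department data.
where_filtered = conn.execute("""
    SELECT Department.DeptName, Employee.EmpName
    FROM Department
    LEFT JOIN Employee ON Employee.DeptName = Department.DeptName
    WHERE Employee.Salary > 50000
    ORDER BY Department.DeptName
""").fetchall()

print(on_filtered)
print(where_filtered)
```

With the filter in the ON clause, the Audit department is preserved with no employees. With the filter in the WHERE clause, the Audit department disappears from the result altogether.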

There is a Help menu available in the main menu for an overview of the utility tool and its overall operation, and there is a Help menu available in each pull-down menu that explains its operation and may supply contextual information on error conditions.

16.3—Menu Overview

As shown in Figure 16.1, there are six main menus for (1) adding a structure box; (2) changing or removing a structure box; (3) optimizing and displaying the outer join statement and data structure; (4) saving, retrieving, or deleting a stored structure view; (5) running the outer join query; and (6) loading database data. These menu choices are available from the main selection menu. The Add menu can also be popped up by clicking on an empty spot on the screen, and the Change menu can be popped up by clicking on a structure box. The Main Bar menu also has Help and Exit menu selections. This Help menu gives an overview of the product and its menus. The Exit menu will end the program; the user will be given a chance to save the current structure.

16.4—Adding a Structure Box

Building and defining the data structure is done through the menu in Figure 16.2 by adding and linking one structure box at a time to the bottom of the data structure in memory. This menu can also be invoked by clicking on an empty spot on the screen.

For the first data structure box to be entered, the bottom portion of this menu box under the dotted line is not displayed. No parent table information or link information is necessary since this box should be the root box.

If the table name is not unique for the structure, the user must also enter a unique box name; otherwise, the table name will be used for the box name. When no column names are supplied, the column names are looked up in the active database if loaded in memory. If the table can't be located in the active database, the user must supply the column names.

Figure 16.2 Adding a structure box.


The Parent Box Name field must be specified to explicitly specify the link point box in the upper structure that the new box is to be linked to. The required link criteria must support and reflect the link point specification. The link criteria can be explicitly supplied as described below or the user can be led through it a step at a time so there is no chance for error. This process is known here as autohelp.

The link criteria are made up of relationships between columns in the lower structure box (the one currently being added) and columns along a single path in the upper structure currently in memory. The link point box in the upper structure (identified as the parent box) is the lowest level box along the specified path. A path is specified using conditions connected by AND and OR logical connectors to reference columns along a single path. If a column name is not unique in the structure being built, its box name must be included as a prefix to the column name, separated by a dot. This will uniquely identify the column.

OR conditions normally have a lower precedence than ANDs. This means ANDs are evaluated before ORs, allowing them to be separately grouped around ORs. Each of these ANDed groups must specify the same link point box. Parentheses and constants will be supported. Constants are used for computed link points such as "A>5 and B>4" for computing a link point of B to A when there is no direct physical link. For more information on link rules, see Chapter 6.
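The link point rules can be sketched as a small validation routine. This is one possible reading of the rules, not the utility's actual checking logic. The upper structure is a hypothetical parent map, each ANDed group is reduced to the list of upper structure boxes it references, and all function names are invented for illustration.

```python
def path_to_root(box, parent):
    """Return the box and its ancestors, lowest level first."""
    path = [box]
    while parent[box] is not None:
        box = parent[box]
        path.append(box)
    return path


def link_point(group, parent):
    """Lowest level box referenced by one ANDed group; all references must share its path."""
    lowest = max(group, key=lambda box: len(path_to_root(box, parent)))
    off_path = set(group) - set(path_to_root(lowest, parent))
    if off_path:
        raise ValueError(f"references {sorted(off_path)} are not on the path of {lowest}")
    return lowest


def validate_link(and_groups, parent_box, parent):
    """Every ANDed group (the groups are ORed together) must yield the same link point box."""
    for group in and_groups:
        point = link_point(group, parent)
        if point != parent_box:
            raise ValueError(f"group {group} links to {point}, not {parent_box}")
    return True


def is_valid(and_groups, parent_box, parent):
    try:
        return validate_link(and_groups, parent_box, parent)
    except ValueError:
        return False


# Hypothetical upper structure (child -> parent).
upper = {"Division": None, "Department": "Division",
         "Employee": "Department", "Product": "Division"}

# (Division AND Department) OR (Department): both groups link at Department.
print(is_valid([["Division", "Department"], ["Department"]], "Department", upper))
# Department AND Product: the references sit on two different paths.
print(is_valid([["Department", "Product"]], "Department", upper))
# (Department) OR (Division): the groups disagree on the link point.
print(is_valid([["Department"], ["Division"]], "Department", upper))
```

Reducing each condition to the boxes it references keeps the sketch short; a fuller version would also have to parse the conditions and honor parentheses.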

When specifying a data structure, it is also possible to specify a nonhierarchically joined series of tables that is treated as one logical table in the data structure. These nonhierarchical logical tables are defined as separate views and are used the same as any view, which the system will treat as a single table. All tables in a logical table view specify a join type of INNER, FULL, or UNION, and this join type should remain constant for the logical table view definition. If views are expanded in the displayed data structure using the Expand setting in the Optimization menu, logical table views will appear as boxes attached together horizontally in join order. For more information on logical tables, see Chapter 7.

Data-filtering criteria can be specified anywhere along the path identified by the link criteria. These data-filtering criteria will filter the table data identified by the box being linked based on any of its column values and/or others along the path to the root. This supplies a data-filtering capability not obtainable through the WHERE clause. The data-filtering clause can be explicitly specified or supplied through an autohelp facility. For more information on data filtering, see Chapter 7.

Since the structure is built top to bottom, if a new root box is to be appended to the top of the current structure in memory, the structure in memory should be saved to file with the Clear Memory option specified using the View menu. The new root box is then added and the original structure is retrieved from file specifying a link to the new root structure box.

Possible errors detected include invalid data structure errors caused by not following the rules above (also see Chapter 6) and undefined column names, usually caused by misspelled column names.

16.5—Specifying the Link Criteria

The menu shown in Figure 16.3 is invoked for help in building the link criteria. It is invoked in the Add and Change menus. It walks the user through building the link criteria one condition at a time. It will prevent an invalid data structure link condition from being specified.

Column lists 1 and 2 contain the columns or values that can be selected for comparison when building the link criteria. These column lists may vary during the building of the link condition to enforce the parent box specification. The user selects one column from column list 1, one column from column list 2, and one comparison operator. The first selected column is compared to the second selected column item using the selected comparison operator. By default, the equality operator is already selected. If additional conditions are necessary, the user selects AND or OR; otherwise, the user selects End, which is selected by default.

Figure 16.3 Specifying the link criteria.


The Parent Box Name field must be specified at the time this Help menu is invoked. This will ensure that the link criteria correctly reflect the data structure diagram by enabling the Help menu to guide the user in specifying the correct link criteria; otherwise, deriving the parent box from the link criteria would be very error prone.

16.6—Specifying a Data Filter

The menu shown in Figure 16.4 is invoked for help in building the data-filter criteria. It is invoked in the Add and Change menus. It walks the user through building the filtering criteria one condition at a time. This prevents an invalid filter criterion from being specified.

The Column List fields are the fields that can be selected. The user selects one column from the column list, one comparison operator, and then specifies a literal. By default, the equality operator is already selected. If additional conditions are necessary, the user selects AND or OR; otherwise, the user selects End, which is already selected by default.

This menu will be invoked after the join criteria have been processed. This will ensure that the filter criteria do not inadvertently influence the join criteria. This is done to verify that what the user considers a filtering condition is truly a filtering condition and does not influence the link criteria in any way.

Figure 16.4 Specifying a data filter.


16.7—Changing or Removing a Structure Box

The Change menu shown in Figure 16.5 is used to modify the active data structure by changing or removing a structure box. It can be invoked directly by (1) positioning on the displayed structure box to be changed or removed and clicking the mouse, or (2) by selecting Change from the Main Bar menu and then specifying the structure box name to be changed or removed on the following intervening pull-down menu.

Changing a stored view cannot be done directly. The stored view must first be loaded into memory and then resaved to perform a permanent change.

As stated above, the menu in Figure 16.6 can be displayed with its current column names by directly clicking on the structure box to be changed or removed, or by first selecting this option from the Main Bar menu and supplying the structure box name in the menu in Figure 16.5. If Remove or Remove All is specified in the menu in Figure 16.5, the removal is performed without requiring the menu in Figure 16.6. Remove All will remove all structure boxes on a path under the specified structure.

Figure 16.5 Indicating a box name for a change or remove operation.

Figure 16.6 Changing or removing a structure box.

To insert a structure box in a path, first add it under its insert point, then change the parent reference of the lower structure of the insert point to reference the inserted structure box. For structure box changes, if Link Criteria or Filter Criteria is clicked on, Help will be invoked. Help for Link Criteria and Filter Criteria was described in the directions for the Add menu.

If the table name is not unique to the structure, the user must enter a unique box name; otherwise, the table name will be used for the box name. If no column names are supplied, the column names are looked up in the active database in memory if one has been loaded using the Load menu. If there is no active database or the table cannot be located in the active database, the user must supply the column names.

The link criteria are made up of relationships between columns in the lower structure box (the one currently being added) and columns along a single path in the upper structure currently in memory. The specified link point box in the upper structure (identified as the parent box) is the lowest level box along the specified path. A path is specified using conditions connected by ANDs and ORs to reference columns along a single path. If a column name is not unique in the structure being built, include its box name as a prefix to the column name, separated by a dot. This will uniquely identify the column. Additional information on the Change menu items can be found under the Add menu description.

When a stored structure view is retrieved into memory because of a view reference, the effect is different if the view is a SQL view, which can be indicated in the bottom of the menu by supplying or changing the matching SQL view. In this case, the constructed outer join statement uses the SQL view reference name instead of the actual tables that make up the view. This allows the generated statement to take advantage of the actual SQL view stored in the SQL database. This supplied table name allows an alias name to be assigned to the SQL view, enabling the same view to be used more than once in a data structure by avoiding column-naming conflicts.

16.8—Optimizing the Outer Join Statement and Data Structure

The menu shown in Figure 16.7 is used to optimize or display the modeled data structure and the ANSI SQL outer join statement that defines it. If specific column names are specified for selection using this menu, then the displayed structure and outer join statement can be optimized to only access the tables necessary to retrieve the selected columns.

Figure 16.7 Optimizing the outer join statement and data structure.

The column list contains all columns in the defined structure. Click on the columns that are to be selected. The column list can also be used to turn off optimization by selecting *(all). Columns can also be deselected by clicking on them again.

Selected column names are highlighted and are remembered between invocations of the Optimization and Run menus unless the Add or Change menu is used. The Run menu can also permit the selection of columns to optimize the structure and outer join statement when the query is run. If the Add or Change menu is used, the selected columns default back to *(all) to avoid selected column conflicts caused by these menus.

With outer join views, the removal of tables from the view is based on which data is selected. This is possible when the data in the removed tables does not affect the result.

The displayed optimized structure uses two types of connecting lines: solid and dotted. Dotted connecting lines connect areas of the structure that do not require access because of optimization. Views that are loaded using the Load menu are displayed in a condensed form, as one box with double lines. These condensed boxes can be expanded using the Expand option in the Optimization menu. When expanded, the boxes that make them up are drawn with dotted lines. Clicking the Expand option again will recondense them.

16.9—Saving, Retrieving, or Deleting a Stored Structure

The menu shown in Figure 16.8 performs three different functions associated with the data structures constructed by the user using this utility tool. The active data structure in memory can be saved to file. The specified data structure can be retrieved from a file into memory and linked under the current structure in memory if one exists, and a data structure stored on file can be deleted.

When linking the retrieved structure under the in-memory structure, the link condition is made up of link relationships between columns in the root box of the lower structure (the one currently being added) and columns along a single path in the upper structure currently in memory. The join point box in the upper structure (specified as the parent box) is the lowest level box along the specified path. A path is specified using conditions connected by AND operators to reference tables along a single path. If the retrieved structure is to be symmetrically linked to the in-memory root, then specify the join type.

OR operators are allowed and normally have a lower precedence than ANDs. This means ANDs are evaluated before ORs, allowing them to be separately grouped within ORs. All OR singular conditions and multigroup AND conditions within OR groups must specify the same join point table. Parentheses and constants are supported. Autohelp for specifying a linking or filter criteria can be invoked by clicking on Link Criteria or Filter Criteria.

If memory does not contain an active data structure when this menu is invoked, then the bottom portion of the menu is left off. If the path is not supplied, the current directory is used to locate or store a file by the same name as the structure name. If retrieving a structure that is not to be linked to the structure in memory, the Clear Memory option must be selected to remove the current structure in memory, if one exists.

Figure 16.8 Saving, retrieving, or deleting a stored structure.

When an embedded stored structure is referenced rather than loaded in a data structure, the stored structure is represented in the data structure as a double-lined box with the name of the stored structure. In the SQL statement, all the tables that make up the stored structure are present. Referenced stored structures are retrieved each time the structure making the reference is loaded into memory, and referenced stored structures can be nested. Referenced structures in the displayed structure can be expanded to show all of their tables and relationships using the Expand option in the Optimization menu, but this expanded portion of the displayed structure cannot be modified since the changes would not be reflected in the stored view on file.

If SQL view is indicated when saving a structure, it indicates that the stored structure is also defined to your SQL database as a view by the same name. When this structure is retrieved into memory by loading or embedded reference, it affects the outer join statement being built such that only the SQL view name will be referenced in the SQL statement instead of all the tables in the structure. This is because the tables not referenced explicitly are defined in the SQL view being referenced and will be expanded by the SQL system.

The alias suffix needs to be specified when a stored structure is loaded into a structure more than once. The suffix is used to change the table name and column names by appending the suffix to them so that there are no duplicate names in the structure. For SQL views, the view name is modified in the data structure and used as an alias in the SQL statement. If the retrieved structure is not a SQL view, the table names are modified. In either case, the columns of the affected tables are changed to their qualified names using the alias suffix value as the table name qualifier.
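One possible reading of this renaming rule can be sketched in a few lines. The function `apply_alias_suffix`, the table names, and the column names below are hypothetical, and the exact form of the qualified names produced by the utility is an assumption.

```python
def apply_alias_suffix(tables, suffix):
    """Return a copy of `tables` with every table renamed by appending `suffix`.

    `tables` maps a table name to its list of column names. Each column is
    rewritten as a qualified name whose qualifier is the suffixed table name
    (an assumed interpretation of the rule, not the utility's actual code).
    """
    renamed = {}
    for table, columns in tables.items():
        alias = f"{table}{suffix}"
        renamed[alias] = [f"{alias}.{column}" for column in columns]
    return renamed


# Hypothetical stored structure loaded a second time with suffix "2".
stored = {"Employee": ["EmpID", "EmpName"], "Dependent": ["DepID", "EmpID"]}
second_copy = apply_alias_suffix(stored, "2")
```

Because every name in the second copy carries the suffix, the two copies can coexist in one structure without duplicate table or column names.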

Saving a data structure to a file does not automatically remove the data structure in memory. If a substructure root table name is specified when saving a structure, then only the area of the structure starting at the identified root table is saved. This allows only a portion of the structure to be saved.

Deleting a structure requires no specifications other than indicating a delete operation and structure name, and optionally the path name. The delete operation will remove this view definition from the indicated file.

16.10—Running the Outer Join Query

Running the query will cause the current outer join to display the identified columns. The WYSIWYG display option shown in Figure 16.9, which is on by default, will cause the data to be displayed as structured data, based on the structure specified by the outer join statement. This will remove unnecessary replicated data values. If turned off, the data will be displayed in standard first normal form.

Figure 16.9 Running the outer join query.
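As a rough illustration of what the structured display does, the sketch below takes flat first normal form rows and blanks out values that merely repeat the row above. It assumes a single-leg hierarchy whose rows are already ordered parent first; the function name and sample data are hypothetical and are not taken from the utility.

```python
def suppress_replicated(rows):
    """Blank out leading values that repeat the row directly above.

    Scanning left to right, a value is hidden only while every column to its
    left is also unchanged, so a new parent value re-displays its children.
    """
    shown, previous = [], None
    for row in rows:
        out, same_prefix = [], previous is not None
        for i, value in enumerate(row):
            same_prefix = same_prefix and value == previous[i]
            out.append("" if same_prefix else value)
        shown.append(tuple(out))
        previous = row
    return shown


# Flat (first normal form) result: department, employee, product.
flat = [
    ("Sales", "Adams", "Laptop"),
    ("Sales", "Adams", "Phone"),
    ("Sales", "Baker", "Laptop"),
    ("Support", "Clark", None),
]
structured = suppress_replicated(flat)
```

In the structured form, "Sales" and "Adams" each appear once, while "Laptop" is shown again under Baker because it belongs to a different parent.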

The database tables being accessed by this query must have been loaded before this query is run. The default database is loaded when this application is first invoked; otherwise, the Load menu can be used to load additional data and databases.

If the displayed outer join statement and its associated structure are currently optimized from this menu or the Optimization menu, the column names specified from either menu will be highlighted for default selection in this menu.

Unless overridden by the optimization indicator, optimization will be invoked so that unnecessary tables are not accessed. This indicator is used to see what external difference, if any, turning optimization off makes.

16.11—Loading the Database Data

Loading the database data using the menu in Figure 16.10 will load tables and their data from a file that contains the data and database meta information into memory. It is necessary that the tables to be queried are loaded before the outer join query is run. Having the tables loaded while the structure is being built will also avoid having to specify column names. Building the structure and outer join does not require the database to be loaded.

Figure 16.10 Loading the database data.

If no file name is supplied, then the default test database file name is used (DSGDB.DAT). The database is automatically loaded from the default file when the application is first invoked. It can be modified so that this new data will also be loaded automatically. If Append is indicated, then the loaded database data will be added to the database information currently in memory, and Overlay will replace all the currently loaded database information.

All data is stored on the input file as characters. All values on the file are comma-separated. The first piece of information for each table is the table name. The second piece of information is the number of columns in the table followed by each column name. Then the data follows for each column in turn, repeating until the value "*END*" occurs. At this point, the input stream can end or repeat with information about the next table, starting with its table name. This procedure is repeated until "*FIN*" is encountered.
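The layout described above can be read with a short token-stream parser. The following is a minimal sketch, not the utility's loader. It assumes that the end markers are spelled with ASCII asterisks, that line breaks are equivalent to commas, and that no value is empty; the function name and the sample tables are hypothetical.

```python
def parse_database(text):
    """Parse the comma-separated table stream into {table: (columns, rows)}.

    Assumptions: newlines act as separators, empty values do not occur, and
    every table's data is terminated by the "*END*" marker.
    """
    tokens = [t.strip() for t in text.replace("\n", ",").split(",") if t.strip()]
    tables, pos = {}, 0
    while pos < len(tokens) and tokens[pos] != "*FIN*":
        name = tokens[pos]                       # table name
        ncols = int(tokens[pos + 1])             # number of columns
        columns = tokens[pos + 2 : pos + 2 + ncols]
        pos += 2 + ncols
        values = []
        while tokens[pos] != "*END*":            # data values, one row after another
            values.append(tokens[pos])
            pos += 1
        pos += 1                                 # skip the *END* marker
        rows = [tuple(values[i : i + ncols]) for i in range(0, len(values), ncols)]
        tables[name] = (columns, rows)
    return tables


sample = """Dept,2,DeptNo,DeptName,
10,Sales,20,Support,*END*,
Emp,2,EmpNo,DeptNo,
1,10,2,10,3,20,*END*,*FIN*"""
database = parse_database(sample)
```

All values stay as character strings, matching the statement that the file stores everything as characters; any type conversion would be a separate step.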

Database system parameters are recorded in the file DSGDB.SYS. These control the maximum number of tables, columns, and data items in the total database and the maximum number of tables in the data structure.

16.12—Data Modeling Diagramming Symbols

The join data modeling diagramming symbols in Figure 16.11 will be used to diagram all the valid join relationships in hierarchical data modeling diagrams used in this data modeling utility. The NATURAL one-sided LEFT and RIGHT outer joins are not listed below because they can be transformed into standard hierarchical LEFT and RIGHT joins, as demonstrated in Chapter 4. Symmetric joins such as FULL and INNER joins create logical flat tables that are represented horizontally as shown below. These horizontal diagram blocks are drawn in the order of execution, left to right, as shown below. While not suggested, symmetric join types can be intermixed as shown below. CROSS and UNION joins are not represented below because they can be specified as inner joins in the case of the CROSS join and FULL joins in the case of UNION joins.

Figure 16.11 Join data modeling diagramming symbols.

16.13—Conclusion

This chapter has presented a data modeling tool that can easily construct sophisticated outer joins that model complex data structures. The use of a SQL tool like this one will allow database professionals to utilize the many advanced capabilities of the outer join that are not currently being utilized. It is hoped that this in turn will prompt SQL vendors to take advantage of the outer join's inherent data modeling capabilities that are not being utilized.

17—Summary

The ANSI SQL join is an ANSI standard operation with a powerful syntax and outer join operation whose real capabilities have not been fully understood or realized. This book's purpose is to remedy this situation. Many of these capabilities are based on the outer join's inherent ability to model and process complex data structures. This book proved that this powerful data modeling capability does exist and demonstrated that this ability can be harnessed in a completely self-contained ANSI standard SQL facility. Using the outer join to perform data modeling was explained carefully so that any hierarchical data structure could be modeled, including some possibly useful network and hybrid structures.

The flexible syntax of the ANSI SQL join and its outer join operation is what enables a singular, unambiguous hierarchical application view to be defined and utilized. These data modeling capabilities can be immediately used by the user or can be utilized by SQL vendors to support advanced new capabilities. The data structure meta information is embedded transparently in the outer join syntax that defines the view. This information is automatically available in ANSI outer joins. By extracting and utilizing this meta information from the outer join syntax, SQL vendors can provide ANSI standard features that have not been previously possible with standard SQL. Many of these advanced features were discussed in the book, such as nested relational processing and object relational database interfacing.

The real power of the ANSI outer join appears when three or more tables are joined. This is because changing the order in which the tables are specified influences the result. This is a dramatic and significant departure for relational databases, and the effects and capabilities of this have not previously been fully studied, understood, or documented. Understanding these effects opens a whole new realm of possibilities for relational processing. For example, when more than two tables are being outer joined, natural joins are not just a shorthand specification but actually change the basic operation of the outer join for each basic type of join. This was also examined closely.

In this book, the power of hierarchical data structures was described, along with the inherent semantics of these structures. These structures are unambiguous, making them excellent for application views. This book also demonstrates how the flat Cartesian product model produces the same semantics as its comparable hierarchical model when processing relational database queries. This has an important significance for seamless SQL access to heterogeneous and legacy databases, which is also shown in the book.

These outer join capabilities are relatively new to standard ANSI SQL, and are available to be used if users know how. It is hoped that this book has helped show the way.

APPENDIX A—DATABASE VIEWS USED IN THIS BOOK

Figure A.1 Company database relationships.

Notes on the Company Database Views

The Manager, ProdMgr, and DeptMgr table names used in some of the views from the company database are alias names for the Employee table. The Employee and Department views contain the same tables and use the same relationships. For this reason they are used often to demonstrate how outer join data modeling can model different views. The Division view has two legs that both contain multiple occurrences and for this reason it is used to demonstrate the semantics of multiple legs. The Network structure is used to show that an ambiguous network type structure can also be modeled by the outer join operation.

Notes on the Parts-Suppliers Views

The Parts and Suppliers views from the Parts-Suppliers database are used to demonstrate how many-to-many relationships can be modeled with the outer join and how the semantics of these views behave. The Association table, which is needed to correlate the many-to-many relationship, is still physically present in both views, but its use is transparent to its operation and is therefore not shown in the views. The Association table can also hold intersecting data, which is data that is related to both tables (Parts and Suppliers) at their intersection point, which is why it must be stored in the association table at the intersection point of the path between these two tables. The price of a part is intersecting data since the price is not only determined by the part but also by the supplier of that part. Intersecting data will logically appear to be part of the bottom table in many-to-many application views such as the Parts and Suppliers views shown here. This also maintains the Association table's transparency in the views.
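The Parts view can be sketched with two LEFT joins that pass through the association table. This is an illustrative example run through Python's built-in sqlite3 module; the table name PartSupp, all column names, and the sample rows are assumptions, not the book's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parts (PartNo INTEGER PRIMARY KEY, PartName TEXT);
    CREATE TABLE Suppliers (SuppNo INTEGER PRIMARY KEY, SuppName TEXT);
    -- Association table: one row per part/supplier pairing, plus the
    -- intersecting Price, which depends on both the part and the supplier.
    CREATE TABLE PartSupp (PartNo INTEGER, SuppNo INTEGER, Price REAL,
                           PRIMARY KEY (PartNo, SuppNo));
    INSERT INTO Parts VALUES (1, 'Bolt'), (2, 'Nut'), (3, 'Gear');
    INSERT INTO Suppliers VALUES (10, 'Acme'), (20, 'Zenith');
    INSERT INTO PartSupp VALUES (1, 10, 0.25), (1, 20, 0.30), (2, 10, 0.10);
""")

# Parts view: Parts over Suppliers. The association table is used only inside
# the join; Price surfaces as if it belonged to the lower (Suppliers) level.
parts_view = conn.execute("""
    SELECT p.PartName, s.SuppName, ps.Price
    FROM Parts p
    LEFT JOIN PartSupp ps ON ps.PartNo = p.PartNo
    LEFT JOIN Suppliers s ON s.SuppNo = ps.SuppNo
    ORDER BY p.PartNo, s.SuppNo
""").fetchall()
```

The Suppliers view would use the same three tables with Suppliers as the root, so Price would then appear alongside the part beneath each supplier.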

Figure A.2 Parts-Suppliers M-to-M relationships.

GLOSSARY

The terminology listed here is used in this book and the definitions supplied pertain to their use in this book.

A

ADT
An ADT is an abstract data type. This is the capability added to SQL:1999 that allows user-defined complex data types to be defined in SQL. They contain an internal structure that can consist of multiple attributes and the logic that can operate on it.

Access path
The access path refers to a navigation path in a hierarchical structure from the root table of the structure to the table requiring access. This path must be followed when accessing a table by accessing each table along the path to the required table in order to maintain the semantics of the data structure.

Ad hoc query
An ad hoc query is a database query specified interactively. This means that the database query does not require being predefined to the database system processing the query. In relational systems, this will require dynamic SQL query processing.

Alternate key
An alternate key is a column or field in a relational table or record that can also be used as the key besides its primary key. As such, this key probably is not unique among other rows or records in the table or file. A foreign key can be considered an alternate key. The alternate key is usually the "many" side of a one-to-many relationship.

Ambiguous semantics
Semantics are about meaning. Ambiguous semantics are semantics that have more than one possible meaning. These meanings can be conflicting. Semantics should be singular in meaning to be most useful. Nonhierarchical structures can have ambiguous semantics, which will produce ambiguous results.

Ambiguous structures
Data structures such as network structures have ambiguous semantics when used to represent a singular view of the data. These structures do not have a singular meaning because data values in the structure can usually be reached from multiple paths, with each path representing different semantics or meaning.

Application view
The application view is how the application visualizes the structure of the database. Complex structures should be hierarchical because hierarchical structures are unambiguous in meaning. This enhances the usefulness of the semantics of hierarchical structures for application use. With application views, applications can share views, and databases can support many different application views.

Association table
Association tables are used in relational databases to maintain many-to-many data relationships such as the relationship between Parts and Suppliers. This relationship can operate in either direction as a one-to-many relationship: Part over Suppliers and Supplier over Parts. Both directions cannot be maintained with just the Parts and Suppliers tables, so an association table is used between the Parts and Suppliers tables to maintain the one-to-many relationships in both directions when performing the necessary joins.

Associative operation
An associative operation is one where the operation's execution order can be changed, within the limits of not altering the physical ordering of the operations, without changing the result. This is usually tested with the aid of parentheses. Addition and multiplication are associative in operation, while subtraction and division are not. For example, with multiplication 5 * 2 * 4 equals 5 * (2 * 4), while with subtraction 5 - 3 - 1 does not equal 5 - (3 - 1).

B

Bottom-up processing/execution
Bottom-up processing of outer join hierarchical structures involves their construction by building them from the bottom of the structure upwards. This can change the normal table join order and relies on the hierarchical capabilities of the one-sided outer join operation that builds the structure.

Business rules
The operational rules of a business can be embedded into the database using stored procedures and triggers. Triggers can turn the database into an active database by having it automatically act on the rules by invoking the stored procedures. This process can be further enhanced by the refined data-filtering capability of the outer join, which allows the database to better represent the rules.

C

Cardinality
Cardinality is a relational term for the number of rows in a table or result.

Cartesian product
A Cartesian product is the result when two tables are joined with no WHERE clause. Each row of one table is joined with every row of the other table, creating all combinations. For this reason, the result is referred to as being exploded.
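The "exploded" result is easy to observe. The sketch below uses Python's built-in sqlite3 module with two hypothetical tables and compares the comma-style join with no WHERE clause against the ANSI CROSS join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Color (Name TEXT);
    CREATE TABLE Size (Label TEXT);
    INSERT INTO Color VALUES ('red'), ('green'), ('blue');
    INSERT INTO Size VALUES ('S'), ('L');
""")

# Joining with no WHERE clause pairs every Color row with every Size row.
comma_join = conn.execute("SELECT Name, Label FROM Color, Size").fetchall()

# The ANSI CROSS join states the same Cartesian product explicitly.
cross_join = conn.execute(
    "SELECT Name, Label FROM Color CROSS JOIN Size"
).fetchall()
```

The cardinality of the result is the product of the input cardinalities, here 3 times 2.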

Coalescing
Coalescing is the processing of non-null key values and null key join values under the same domain to return a single valid key value representing the non-null value amongst them. This has special significance for outer joins where null key values can be produced because of their data-preserving ability.
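A small sketch can show coalescing at work on a two-sided outer join, again using Python's built-in sqlite3 module. The tables A and B and their columns are hypothetical. The FULL join is emulated here as the union of two LEFT joins so that the example also runs on SQLite versions without FULL JOIN support; that emulation is a choice of this sketch, not something prescribed by the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (K INTEGER, AVal TEXT);
    CREATE TABLE B (K INTEGER, BVal TEXT);
    INSERT INTO A VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO B VALUES (2, 'b2'), (3, 'b3');
""")

# Either side's key can be null in a preserved row; COALESCE returns
# whichever key is non-null so each row carries one valid key value.
rows = conn.execute("""
    SELECT COALESCE(A.K, B.K) AS K, AVal, BVal
    FROM A LEFT JOIN B ON A.K = B.K
    UNION
    SELECT COALESCE(A.K, B.K) AS K, AVal, BVal
    FROM B LEFT JOIN A ON A.K = B.K
    ORDER BY K
""").fetchall()
```

Without the COALESCE, the row preserved from B would show a null in A.K even though the key value 3 is known.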

Column
A column or attribute is a relational term for a field that is defined in a table.

Common parent
Common parent refers to the lowest level table or segment in a data structure that is a common link point to two or more sibling legs of the data structure.
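Finding the common parent amounts to locating the lowest shared ancestor of two tables. A minimal sketch follows, assuming the hierarchy is held as a mapping from each table to its single parent; the function names and the sample hierarchy are hypothetical.

```python
def ancestors(parent_of, table):
    """Return `table` followed by each of its ancestors up to the root."""
    chain = [table]
    while parent_of[chain[-1]] is not None:
        chain.append(parent_of[chain[-1]])
    return chain


def common_parent(parent_of, first, second):
    """Return the lowest table lying on the paths to both `first` and `second`."""
    seen = set(ancestors(parent_of, first))
    for table in ancestors(parent_of, second):   # walks upward from `second`
        if table in seen:
            return table
    return None


# Hypothetical hierarchy: each table maps to its single parent (None = root).
parent_of = {
    "Division": None,
    "Department": "Division",
    "Product": "Division",
    "Employee": "Department",
    "Project": "Department",
    "Dependent": "Employee",
}
```

For example, the Dependent and Project legs meet at Department, while the Dependent and Product legs meet only at the root, Division.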

Commutative operation
A commutative binary operation is one in which two input arguments can be switched around without affecting the results. Addition and multiplication are commutative, while subtraction and division are not. For example, with addition 5 + 6 equals 6 + 5, while with subtraction 4 - 2 does not equal 2 - 4.

Complex data modeling
Complex data modeling, used in the context of this book, applies to the ability to construct hierarchical data structures that contain multiple legs by using the outer join operation. Multiple legs add another level of capabilities and complexity to the principles involved in defining data structures with the outer join and to the semantics associated with these data structures.

Composite key
A composite key is a key made up of multiple fields that, taken all together, produce a unique key. It is usually used when it is necessary to construct a unique key value when no single column in the table represents a unique key.

Concatenated key
A concatenated key, as used in this book, is a composite key where the fields usually consist of foreign keys that define a hierarchical path upwards to the root segment.

Conceptual view
A conceptual view is a view or schema that defines all possible data and their valid relationships in a database so that all required application views can be defined from it. As such, a conceptual view requires a network structure to define it because of the high probability of intersecting paths. A conceptual view sits between the internal and external views and acts as a level of indirection between the two.

CROSS join
The CROSS join is one of ANSI SQL's join types. It creates a basic inner join Cartesian product result, and as such it does not use or require a join condition, so no ON or USING clause is used with it.

D

Dangling tuple
Dangling tuples are the rows that are not matched in join operations. In inner joins they are discarded, and with outer joins they can be preserved in the result by padding the unmatched rows with nulls.
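The difference in how the two join types treat a dangling tuple can be seen with one unmatched row. The sketch below uses Python's built-in sqlite3 module; the Dept and Emp tables and their data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dept (DeptNo INTEGER, DeptName TEXT);
    CREATE TABLE Emp (EmpName TEXT, DeptNo INTEGER);
    INSERT INTO Dept VALUES (10, 'Sales'), (20, 'Support');
    INSERT INTO Emp VALUES ('Adams', 10);   -- Support has no employees
""")

query = """
    SELECT DeptName, EmpName
    FROM Dept {join} Emp ON Emp.DeptNo = Dept.DeptNo
    ORDER BY Dept.DeptNo
"""

# The inner join discards the dangling Support row.
inner_rows = conn.execute(query.format(join="INNER JOIN")).fetchall()

# The LEFT join preserves it, padding the Emp columns with nulls.
outer_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()
```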

Data abstraction
Data abstraction is the hiding of the complexity of the data. In this book a good example would be a stored structured data view whose use helps hide the complexity of the data structure.

Data definition
A data definition is a definition of the characteristics of data items in the database. This includes, but is not limited to, the data type, size, number of occurrences, and structure relationship of a data item to other data items in the database.

Data filtering
Data filtering, as used in this book, is the process of selectively removing unwanted data from the query result. It is specified on the WHERE or ON clause, but should not be considered part of the data modeling link criteria. The data-filtering process operates differently when specified on the WHERE clause than on the ON clause. Also see ON clause filtering and WHERE clause filtering.
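The different behavior of the two filtering positions shows up as soon as an outer join preserves a row. The following sketch uses Python's built-in sqlite3 module; the tables, the Salary column, and the threshold are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dept (DeptNo INTEGER, DeptName TEXT);
    CREATE TABLE Emp (EmpName TEXT, DeptNo INTEGER, Salary INTEGER);
    INSERT INTO Dept VALUES (10, 'Sales'), (20, 'Support');
    INSERT INTO Emp VALUES ('Adams', 10, 50), ('Baker', 20, 90);
""")

# Filter on the ON clause: applied while matching, so Sales is still
# preserved, just without an employee that passes the filter.
on_filtered = conn.execute("""
    SELECT DeptName, EmpName
    FROM Dept LEFT JOIN Emp
      ON Emp.DeptNo = Dept.DeptNo AND Emp.Salary > 60
    ORDER BY Dept.DeptNo
""").fetchall()

# Filter on the WHERE clause: applied to the joined result, so any row
# failing the filter (including its Dept data) is removed entirely.
where_filtered = conn.execute("""
    SELECT DeptName, EmpName
    FROM Dept LEFT JOIN Emp ON Emp.DeptNo = Dept.DeptNo
    WHERE Emp.Salary > 60
    ORDER BY Dept.DeptNo
""").fetchall()
```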

Data independence
Data independence is the characteristic that usually enables data to be easily combined into an unlimited number of different structures. Without this property, data cannot easily be combined to form different combinations of data. This property requires the normalizing of relational data by breaking it up into multiple tables following the rules of normalization.

Data inheritance
Data inheritance is the process of acquiring characteristics and functions from a higher level class. In the case of an outer join structured view, this involves inheriting higher level data when it is included in a structured view so it can be used in multiple views.

Data modeling
Data modeling is the ability and process of specifying and constructing complex data structures that represent specific semantics. In SQL, this can be performed with the ANSI SQL outer join operation, which can inherently define and process complex data structures.

Data structure extraction (DSE) technology
The DSE technology extracts the data structure meta information from outer joins that model data structures. This meta information contains a detailed description of the data structure from which powerful and useful semantics can be derived.

Data structure meta information
Meta information, also known as metadata, is information about information. Data structure meta information is information about the data structure, such as a detailed description of its structure and data relationships.

Data structure processing
The ability of the database engine to process a complex data structure and take advantage of its semantics, which are dictated by its complex structure relationships.

Data warehouse
A data warehouse is an out-of-production storehouse of a company's past and possibly its present data used for performing all forms of analysis. For this reason, this data needs to be combined (data modeled) in infinite ways, and processed in an ad hoc, what-if, interactive manner.

Database navigation
See Navigation.

Database segment
See Segment.

Database restructuring
See Restructuring.

Database root
See Root.

Declarative language
See Nonprocedural language.

Denormalization
Denormalization is the process of prejoining normalized data and saving the result as an unnormalized table. This is a deliberate data design decision. This avoids the overhead of performing the join operation each time the query that requires the data is used. The disadvantage of unnormalized data is that its data independence is lost.

Derived table
See Temporary table.

Disparate database access
Disparate heterogeneous database access is the accessing and processing of different database types in a logical database view. This can include the combined processing of relational and nonrelational databases. Also see Heterogeneous database access.

Domain
Domain in relational terms usually applies to columns in one or more tables that have the same use and meaning, and therefore the same range of values. Thus, when joining two tables, their join columns should be in the same domain.

DSE technology
See Data structure extraction technology.

Dynamic path shortening
Dynamic path shortening is a database access optimization used in outer join processing where the active access path can be dynamically shortened at the current path position when missing data is encountered. This is significant to the outer join operation since missing data is not usually a reason to stop processing with the outer join.

Dynamic rebuild/rewrite
Dynamic rebuild or rewrite is an SQL optimization where the SQL query can be dynamically rewritten at execution time to be more efficient or to take advantage of the latest features in the current SQL system. With the outer join containing meta information about the data structure being processed, there are significant possibilities for semantic optimizations to be applied dynamically. These include applying powerful new SQL:1999 features as they become available in the active SQL processor.

Dynamic SQL specification
Dynamic SQL specification is the ability to construct SQL query statements at run time. This enables SQL queries to be specified in an ad hoc, interactive fashion, without requiring predefinition. This capability is automatically extended to data modeling by the ANSI SQL outer join operation.
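Because the data structure is expressed entirely in the join syntax, a structure chosen at run time can be turned into a statement at run time. The following is a minimal sketch of that idea in Python with the built-in sqlite3 module; the function `build_outer_join`, the structure description format, and the sample tables are all hypothetical.

```python
import sqlite3


def build_outer_join(root, links, columns):
    """Assemble a LEFT JOIN statement from a run-time structure description.

    `links` lists (child_table, on_condition) pairs in top-down order.
    Identifiers are interpolated directly, so they must come from a trusted
    source such as the database catalog, never from raw user input.
    """
    sql = f"SELECT {', '.join(columns)} FROM {root}"
    for child, condition in links:
        sql += f" LEFT JOIN {child} ON {condition}"
    return sql


statement = build_outer_join(
    "Dept",
    [("Emp", "Emp.DeptNo = Dept.DeptNo")],
    ["Dept.DeptName", "Emp.EmpName"],
)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dept (DeptNo INTEGER, DeptName TEXT);
    CREATE TABLE Emp (EmpName TEXT, DeptNo INTEGER);
    INSERT INTO Dept VALUES (10, 'Sales'), (20, 'Support');
    INSERT INTO Emp VALUES ('Adams', 10);
""")
rows = conn.execute(statement + " ORDER BY Dept.DeptNo").fetchall()
```

Adding another level to the structure only means appending another pair to `links`; nothing has to be predefined to the database.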

E

Embedded view
Embedded view is the capability to nest views by placing views within views. This nesting capability seamlessly supports structured data views containing data structures defined by the outer join operation.

Enterprise access
Enterprise access is the ability of an application or database system to seamlessly access all databases in the corporate enterprise regardless of the database types or database locations involved.

Entity relationship diagram
An entity relationship (ER) diagram is a network structure diagram that depicts all of the data entities, their relationships, and their relationship types (i.e., one-to-many, many-to-one, many-to-many) in a database.

Equal join
An equal join is just that, a relational join that uses an equality operation to relate the tables. An equal join is also known in relational terms as an equijoin.

Equijoin
An equijoin is a fancy term for an equal join, which is a relational join that uses an equality operation to relate the tables.

Expanded view
Expanded views are embedded SQL views whose name reference is replaced with its representative SQL code so that the query can be processed (parsed).

Explicit natural join
An explicit natural join is a term coined in this book for ANSI SQL natural joins that are specified by using the NATURAL keyword, hence the use of the term explicit.

Extended Cartesian product
A relational Cartesian product produces all combinations of rows from two relational tables. An extended Cartesian product, as used in this book, operates by augmenting each table with an all-null row that is also joined with every unmatched row when performing the Cartesian product. This result can be used to validate or define the operation and semantics of outer join operations.

External view
An external view is one of the three types of views that comprise the three-tier model for database architecture, these being the internal, external, and conceptual views. The external view is the view that the application and user of an application have of the database. For this reason, it is also known as the application view. With application views, applications can share views, and databases can support many application views.

F

Field
A field in relational terms is a column in a table, data element, or attribute.

First normal form
First normal form doesn't permit relational tables to contain repeating data types or groups in a single row. Repeating data should be placed in another table where each occurrence of the repeating data is placed in a different row. This allows a table to remain a flat two-dimensional structure.

Fixed-occurring fields
Fixed-occurring fields are fields that can occur multiple times in a record. They are called fixed because the amount of space required to contain the fixed-occurrence fields is reserved in the record whether it is used or not. This means that a fixed-occurring field can contain a variable number of data fields, but is still considered fixed because it always uses the same fixed amount of storage space.

Flat file
A flat file is a file that has a fixed, unvarying format. It has no variable-occurring fields, but can have fixed-occurring fields. In this way, each record is of the same length. Also see Variable-occurring fields and Fixed-occurring fields.

Flat structure
A flat structure is a two-dimensional data structure, the same as a relational table or data structure in first normal form.

Flattening
Flattening a data structure means taking a multilevel structure such as a hierarchical structure and converting it into a flat, two-dimensional first normal form table. A side effect of this flattening is losing data structure information (semantic loss) and introducing replicated data values to fill out the flat structure.
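The replication that flattening introduces can be seen in a small sketch. It assumes a single-leg hierarchy represented as nested (value, children) pairs; the representation, the function names, and the sample data are hypothetical and serve only to illustrate the definition.

```python
def flatten_paths(node, prefix=()):
    """Return one tuple per root-to-leaf path of a (value, children) tree."""
    value, children = node
    row = prefix + (value,)
    if not children:
        return [row]
    rows = []
    for child in children:
        rows.extend(flatten_paths(child, row))
    return rows


def to_flat_table(tree):
    """Pad every path with None so all rows have the same number of columns."""
    rows = flatten_paths(tree)
    width = max(len(row) for row in rows)
    return [row + (None,) * (width - len(row)) for row in rows]


# Department over employees over products.
tree = ("Sales", [("Adams", [("Laptop", []), ("Phone", [])]), ("Baker", [])])
flat_table = to_flat_table(tree)
```

In the flat table, "Sales" is replicated on every row and "Adams" on two of them, and nothing in the rows themselves records which column was the parent of which.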

Foreign key
A foreign key is an alternate key in one or more tables that relates to a primary key in another table.

Fourth-generation language
See Nonprocedural language.

FULL join
A FULL join is an outer join type that preserves data on both sides of the join operation when rows are not matched up. Unmatched rows are padded with null values.

H

Heterogeneous database access
Heterogeneous database access is the accessing of different physical databases, possibly from different vendors, as if they were one logical database.

Hierarchical data structure
Hierarchical data structures are multilevel data structures where the tables (nodes) at each level only have one related parent. This means the tables have only one pathway leading to them from the next higher level table directly above them. This results in hierarchical structures only having a single path from the root of the structure to any data item, making their semantics unambiguous and powerful.

Hierarchical processing
The term hierarchical processing, as used in this book, is the processing of hierarchical modeled structures as hierarchical structures, so that the useful semantics of these structures can be utilized. This means that SQL-based processing of hierarchical modeled relational and nonrelational tables can be performed in non-first normal form to avoid flattening the data structures, which would cause semantic loss.

Hierarchictivity
A term coined in this book to describe transformational principles of hierarchical structures that are not covered by commutative and associative principles.

Hybrid structure
A hybrid data structure, as used in this book, is a complex data structure composed of hierarchical outer joins (i.e., one-sided outer joins) and nonhierarchical joins (i.e., inner and FULL outer joins). This combination of join operations can create hybrid data structures with very complex semantics that are outside the semantics of hierarchical and network structures. These may be useful in very specific cases.

I

Illogical structure
An illogical or invalid structure, as used in this book, is a nonhierarchical data structure constructed by hierarchical outer joins (i.e., one-sided outer joins) that do not follow the join linking rules for creating hierarchical structures. The semantics of these structures are often ambiguous, but may be useful in very specific cases.

Implicit natural join
An implicit natural join is a term coined in this book for ANSI SQL natural joins that are specified by using the USING clause instead of the ON clause, which implies that a natural join is to be performed; hence the use of the term implicit.
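One visible effect of both natural join forms is that the shared join column appears once in the result instead of twice. The sketch below checks this with Python's built-in sqlite3 module; the tables are hypothetical, and the column handling shown is SQLite's, which follows the standard behavior for USING and NATURAL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dept (DeptNo INTEGER, DeptName TEXT);
    CREATE TABLE Emp (DeptNo INTEGER, EmpName TEXT);
""")


def result_columns(sql):
    """Return the column names a query produces."""
    return [column[0] for column in conn.execute(sql).description]


# ON clause: both DeptNo columns are kept in the result.
on_columns = result_columns(
    "SELECT * FROM Dept LEFT JOIN Emp ON Emp.DeptNo = Dept.DeptNo"
)

# USING clause (implicit natural join): DeptNo appears once.
using_columns = result_columns("SELECT * FROM Dept LEFT JOIN Emp USING (DeptNo)")

# NATURAL keyword (explicit natural join): same result columns here,
# because DeptNo is the only column name the two tables share.
natural_columns = result_columns("SELECT * FROM Dept NATURAL LEFT JOIN Emp")
```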

IMS
IMS is IBM's popular legacy hierarchical database management system.

Inheritance
See Data inheritance.

Inline view expansion
See Expanded view.

Inner join
The inner join is the standard default join. It does not preserve unmatched data rows under any circumstances.

Internal view
An internal view is one of the three types of views that comprise the three-tier model for database architecture, these being the internal, external, and conceptual views. The internal view is the view that the database system has of how the data is physically stored in the database.

Intersecting data
Intersecting data is additional data that is stored in an association table along with the association data. An association table holds the relationships between two tables that have a many-to-many relationship, such as Parts/Suppliers. The intersecting data is uniquely related to the associated data in each row at the intersection point. An example of intersecting data is the price of a part from a specific supplier.

Invalid structure
See Illogical structure.

J

JDBC
JDBC is the Java Database Connectivity API. It uses SQL as the database interface language. It is an open database connection standard that can be used in the Java programming environment.

Join table order
Join table order can be controlled in the outer join specification. This table join control is important in an outer join operation since it can affect the result.

Join table reordering
Join table reordering is the process of altering the table join order to optimize the execution of the outer join operation. This cannot be done indiscriminately, since changing the table join order can affect the results of the outer join operation. Analyzing the data structures defined by the outer join operation and understanding its semantics is one way of determining when and how table join order can be optimized without changing the result.
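A three-table example shows why join order cannot be changed indiscriminately. The sketch below uses Python's built-in sqlite3 module with hypothetical tables. In the second query a derived table stands in for a nested join specification, which is a portability choice of this sketch; the point is only that joining Emp to Dependent before joining to Dept gives a different result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dept (DeptNo INTEGER, DeptName TEXT);
    CREATE TABLE Emp (EmpNo INTEGER, EmpName TEXT, DeptNo INTEGER);
    CREATE TABLE Dependent (DepName TEXT, EmpNo INTEGER);
    INSERT INTO Dept VALUES (10, 'Sales'), (20, 'Support');
    INSERT INTO Emp VALUES (1, 'Adams', 10);
    INSERT INTO Dependent VALUES ('Kim', 1);
""")

# Left to right: the LEFT join preserves Support, but the following inner
# join then discards it because its Emp columns are null.
left_to_right = conn.execute("""
    SELECT DeptName, EmpName, DepName
    FROM Dept
    LEFT JOIN Emp ON Emp.DeptNo = Dept.DeptNo
    INNER JOIN Dependent ON Dependent.EmpNo = Emp.EmpNo
    ORDER BY Dept.DeptNo
""").fetchall()

# Inner join first: Emp and Dependent are combined, then LEFT joined under
# Dept, so Support survives with nulls.
inner_first = conn.execute("""
    SELECT DeptName, EmpName, DepName
    FROM Dept
    LEFT JOIN (
        SELECT Emp.DeptNo AS DeptNo, EmpName, DepName
        FROM Emp INNER JOIN Dependent ON Dependent.EmpNo = Emp.EmpNo
    ) AS EmpDep ON EmpDep.DeptNo = Dept.DeptNo
    ORDER BY Dept.DeptNo
""").fetchall()
```

The same three tables and the same join conditions produce one row in the first case and two in the second, so an optimizer may only reorder when the structure's semantics show the result is unaffected.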

L

Late binding

Late binding is the ability to specify which methods are used at run time. Late binding with outer join data modeling is the ability of the database application to accept different data structures that can be specified and processed at run time.


LEFT join

The LEFT join operation is an outer join that preserves unmatched rows from the table specified on the left side of the join operation.
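A short sketch, assuming hypothetical dept and emp tables, contrasts the LEFT join with the inner join on the same data. The Research department has no employees, so only the LEFT join keeps it, padding the missing employee column with a null (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER, dept_name TEXT);
    CREATE TABLE emp  (emp_id INTEGER, dept_id INTEGER, emp_name TEXT);
    INSERT INTO dept VALUES (10, 'Sales'), (20, 'Research');
    INSERT INTO emp  VALUES (1, 10, 'Ann');
""")

# Inner join: the unmatched Research row is lost.
inner_rows = conn.execute("""
    SELECT d.dept_name, e.emp_name
    FROM dept d INNER JOIN emp e ON e.dept_id = d.dept_id
    ORDER BY d.dept_id
""").fetchall()

# LEFT join: the unmatched Research row is preserved with a null.
left_rows = conn.execute("""
    SELECT d.dept_name, e.emp_name
    FROM dept d LEFT JOIN emp e ON e.dept_id = d.dept_id
    ORDER BY d.dept_id
""").fetchall()

print(inner_rows)
print(left_rows)
```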

Left-sided nesting

Left-sided nesting, as used in this book, is the natural, intuitive way of specifying more than two tables in an ANSI SQL join specification. Tables are introduced left to right in the outer join specification. This is in contrast to right-sided nesting.

Leg

A leg is a pathway in the data structure, including the data that is stored along its path.

Legacy database

The term legacy database applies to any prerelational database that is still in existence, or any prerelational database system that is still in operation.

Link points

The link points, as used in this book, are the two tables, one in the upper and one in the lower data structure, that are linked by a pathway when a data structure is being built using the outer join operation and its ON clause join specification. The ON clause join specification specifies the link points.

Linking

Linking is the process defined in this book for specifying a pathway between two structures that combines them into a single structure.

Logical structure

Logical structures, unlike physical structures, rely on data values and their logical relationships to define the data structure. They do not rely on physical data juxtaposition or physical pointers to form the data structure.

Logical table

A logical table, as used in this book, is a simple intuitive construct supported by the outer join data modeling procedure. It seamlessly supports a series of powerful nonhierarchical, symmetric join operations as a single, flat logical table in the hierarchical model without invalidating the hierarchical structure. These symmetric operations include the INNER join, FULL join, UNION join, and NATURAL join operations.

Lost data

See Missing data.

M

Many-to-many relationships

Many-to-many relationships are relationships used in data modeling where both sides of the relationship can have multiple occurrences. The classic example is the Parts/Suppliers relationship where one part can be carried by multiple suppliers, and one supplier can carry multiple parts.

Many-to-one relationships

Many-to-one relationships are relationships used in data modeling where the upper level of the relationship has many occurrences and the lower level has only a single occurrence. The classic example is the Employee-to-Department relationship where many employees can have the same department.

Meta information

Meta information, also known as metadata, is information about information. When used with data structures, as in this book, it pertains to information about the data structure, such as its description.

Metadata

See Meta information.

Materialized view

Materialization of a view is the process of generating the view's value as a temporary table to replace the view in the processing of a query.

Missing data

Missing data, also known as lost data, is the data that is lost in an inner join when rows of the tables being joined do not match with any other rows. Missing data can also occur with one-sided joins on the side that is not being preserved.

Multidimensional database

A multidimensional database is an OLAP database organized and controlled around multiple dimensions to accommodate very efficient access and the manipulation and correlation of large amounts of data.

Multileg structure

A multileg structure is a complex hierarchical structure with multiple legs. If any table or node in a hierarchical structure has more than one pathway exiting it, it defines more than one leg. Multiple legs in hierarchical data structures significantly increase their semantics and complicate their construction principles, which is why multileg structures are considered complex structures in this book.

Multimedia database

A multimedia database or MMDBMS is not just a database with multimedia features, but a specialized database whose purpose is to support multimedia applications and their functions.


N

Natural join

The natural option is applied to an equal join operation. It causes the matching values of the join's commonly named join keys to be coalesced into a single value in the result. With outer joins, this feature can affect their operation, since these coalesced values are stored, updated, and accessed in the working set as the join operation progresses through other natural join operations in multitable joins that are under a common join domain.

Navigation

Database navigation is the process of locating a data record anywhere in the database structure. Not all databases support procedural navigation. Relational databases, for example, are self-navigating and operate transparently. The ANSI SQL outer join arguably does allow some level of procedural navigation, since it can specify the order in which the tables are joined, which can affect the result.

Nested display

A nested display is one in which the data is displayed in a "What You See Is What You Get" (WYSIWYG) structured format. This format preserves the data structure and its semantics so that the data can be displayed intuitively, showing its structure.
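One possible way to produce such a display from flat join output is sketched below. The rows stand for the result of a hypothetical department-to-employee LEFT join. The function name nested_lines and the indentation style are assumptions made for illustration, not a format defined in this book.

```python
from itertools import groupby

# Flat rows as an outer join might return them: (department, employee).
# Rows for the same department are assumed to be adjacent.
flat_rows = [
    ("Sales", "Ann"),
    ("Sales", "Bob"),
    ("Research", None),   # department preserved with no employees
]

def nested_lines(rows):
    """Render each parent once, with its children indented beneath it."""
    lines = []
    for dept, group in groupby(rows, key=lambda row: row[0]):
        lines.append(dept)
        for _, emp in group:
            if emp is not None:      # skip null padding from the outer join
                lines.append("    " + emp)
    return lines

display = nested_lines(flat_rows)
print("\n".join(display))
```

Each department value is printed once rather than once per employee, so the replicated parent data of the flat result does not obscure the structure.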

Nested relational processing

Nested relational processing is the structured processing derived from processing nested relations (tables within tables). Structured processing operates by following the physical structure to take advantage of its semantics.

Network structure

Unlike hierarchical data structures, network data structures can have multiple paths to the data stored in them. Like the hierarchical structure, the network structure has specific uses. If the data can be reached from more than one path in a network structure, it makes the semantics of the data ambiguous from the point of view of an application. This limits its usefulness as a view for applications. But a network structure is necessary to define a conceptual view with its capability to define intersecting paths that can define all possible data relationships in a database.

Non-first normal form

In relational terms, non-first normal form means that tables can support structured or nested data with repeating data (multiple occurrences of data in a single column). This form of relational data can be processed by a nested relational processor. The first normal form requirement is not a requirement for good database design or even a relational requirement; it is a requirement imposed by SQL and its need for flat, two-dimensional tables.


Nonhierarchical join support

See Logical table.

Nonprocedural language

Nonprocedural languages are also known as fourth-generation languages or declarative languages. The term declarative language comes from the fact that with nonprocedural languages it is not necessary to specify how to perform a task; it is only necessary to specify what the task is to accomplish.

Nonrelational database

A nonrelational database is any database that is not a relational database. These include legacy and postrelational databases.

Normalization

Normalization is the process of designing a database following at least the first three database normalization rules for good database design. All of these normalization rules require or rely on breaking the data apart and storing the data in multiple tables or segments to increase their data independence. The join operation is used to combine the data back together when and as it is needed.

Null

Nulls are padding values that are used to represent missing data in outer join results. Nulls are also used to represent unknown or nonapplicable values when data is entered into a relational table.

O

Object relational interface

An object relational interface is a relational interface to an object database that helps integrate the two technologies. To be successful, this interface needs to support standard SQL. Unfortunately, previous versions of SQL did not support data modeling, which is a very necessary component of object databases. Now, with the support of the ANSI SQL outer join, SQL can inherently perform complex data structure processing, making it an excellent candidate for an object relational database interface.

ODBC

ODBC is the Open Database Connectivity API standard put forth by Microsoft Corporation. It uses SQL as the database interface language.

OID

OID is an object identifier used in object databases and object relational databases to refer to a specific instance of an object.

OLE DB

OLE DB (OLE Database) is a COM (Component Object Model)-based technology put forth by Microsoft Corporation for accessing data stored in various ways. It can use SQL or other kinds of database commands to access the data.


ON clause

The ON clause is used with the ANSI SQL outer join operation to specify the join criteria for each table being joined in the join specification. The ON clause supplies greater control over outer joining tables than is possible through a single WHERE clause, which shows that it has usefulness beyond the WHERE clause and gives reason for its existence. The ON clause is also important in performing data modeling.

ON clause filtering

The ON clause is used with the ANSI SQL outer join operation to specify the join criteria for each table being joined, but it can also specify data filtering. Filtering in the ON clause allows more control and a more precise level of data filtering than filtering in the WHERE clause.
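The difference in filtering level can be sketched with the same salary condition placed first in the ON clause and then in the WHERE clause. The dept and emp tables, the salary column, and the threshold are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER, dept_name TEXT);
    CREATE TABLE emp  (emp_id INTEGER, dept_id INTEGER, emp_name TEXT, salary INTEGER);
    INSERT INTO dept VALUES (10, 'Sales'), (20, 'Research');
    INSERT INTO emp  VALUES (1, 10, 'Ann', 50000), (2, 20, 'Bob', 30000);
""")

# Filter in the ON clause: only the emp side is filtered,
# so Research is preserved with a null employee.
on_filtered = conn.execute("""
    SELECT d.dept_name, e.emp_name
    FROM dept d
    LEFT JOIN emp e ON e.dept_id = d.dept_id AND e.salary > 40000
    ORDER BY d.dept_id
""").fetchall()

# Filter in the WHERE clause: applied to the whole joined row,
# so the Research row is removed entirely.
where_filtered = conn.execute("""
    SELECT d.dept_name, e.emp_name
    FROM dept d
    LEFT JOIN emp e ON e.dept_id = d.dept_id
    WHERE e.salary > 40000
    ORDER BY d.dept_id
""").fetchall()

print(on_filtered)
print(where_filtered)
```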

One-sided join

The one-sided join is the LEFT or RIGHT join. These are known as one-sided joins because they preserve data only on one side, known as the dominant side.

One-to-many relationships

One-to-many relationships are relationships in data modeling where the upper level of the relationship has only one occurrence and the lower level has many related occurrences. The classic example of this is the Department-to-Employee relationship where each department can have many employees.

Open database interface

An open database interface is a database interface that is not proprietary, is freely available to all potential users, and supplies access to most common database types.

OR subclause

OR subclauses, as used in this book, are the join conditions specified between OR operators in a join condition. These are used in determining the link points between data structures being linked.

OR subcondition

An OR subcondition, as used in this book, is one of the join conditions that makes up an OR subclause and is used in determining the link points between the data structures being linked.

Outer join

The outer join operation is used to preserve data that does not find a match in a join operation.

P

Parallel processing

Parallel processing is when different pathways or parts of the SQL statement or its specified tables can be processed concurrently. Since sibling legs of a hierarchical structure are independent of one another, they can be accessed concurrently under their common parent with no side effects.


Parent

A parent is the next higher level table or node in the data structure. In a hierarchical structure, parents are important because their children cannot exist without them.

Parentheses use

Sometimes, parentheses can be used to override the default table join order of outer joins; other times, parentheses cannot be used to change the table join order. This is because when the ON and USING clauses are present, they control the join order and cannot be overridden by parentheses. Some join types do not use an ON or USING clause, and the explicit NATURAL option removes the necessity of ON and USING clauses for join types that normally use them. In these cases, parentheses must be used to change the join order. Parentheses can also be used to emphasize the default table join order, but they still must be correctly placed.

Path

A path is a series of connected nodes in a data structure. In a relational database, these nodes are tables, while in a nonrelational database they can be flat files or segments.

Path qualification

Path qualification is when the join conditions of an ON clause also reference higher level tables or nodes up the path from the link point of the upper level structure being linked. This adds additional qualifications to the active join operation based on the path already established above the table or structure being joined.
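One reading of path qualification is sketched below with a hypothetical three-level structure, dept over emp over dependent. The ON clause that links dependent to emp also references dept.location, a table higher up the path, so dependents are attached only along paths that pass through a California department. The tables, columns, and condition are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept      (dept_id INTEGER, location TEXT);
    CREATE TABLE emp       (emp_id INTEGER, dept_id INTEGER, emp_name TEXT);
    CREATE TABLE dependent (dep_name TEXT, emp_id INTEGER);
    INSERT INTO dept      VALUES (10, 'CA'), (20, 'NY');
    INSERT INTO emp       VALUES (1, 10, 'Ann'), (2, 20, 'Bob');
    INSERT INTO dependent VALUES ('Kim', 1), ('Lee', 2);
""")

rows = conn.execute("""
    SELECT d.location, e.emp_name, x.dep_name
    FROM dept d
    LEFT JOIN emp e ON e.dept_id = d.dept_id
    -- The link point is emp; d.location is the path qualification.
    LEFT JOIN dependent x ON x.emp_id = e.emp_id AND d.location = 'CA'
    ORDER BY d.dept_id
""").fetchall()

print(rows)
```

Bob's dependent exists in the table but is not linked, because the path above Bob does not satisfy the added qualification. Bob himself is still preserved by the LEFT join.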

Path shortening

See Dynamic path shortening.

Pathway

A pathway, as used in this book, is a route from one table or segment to another in a hierarchical structure defined by the join ON clause. No two pathways can lead to the same lower level table or segment in a hierarchical structure.

Physical structure

Physical structures, unlike logical structures, rely on physical data juxtaposition or physical pointers to form the data structure. They do not rely on data values and their logical relationships to define the data structure.

Pipelining

Pipelining with relational databases is an optimization where parts of multitable joins can be processed in parallel to increase computer efficiency and reduce execution time.


Plug and play

Plug and play, as used in this book, is the ability of different software (and hardware) components to be attached, introduced, or plugged into an operational system without requiring manual configuration or reconfiguration.

Polymorphism

Polymorphism allows the same method name to be used for many methods. Polymorphism, as used in this book, is the capability that allows the same outer join modeling statement or view to process disparate heterogeneous databases.

Postrelational database

A postrelational database is the next generation of relational database, one with extended relational features such as nested relational processing.

Primary key

A primary key is a database key that uniquely identifies a record or a row in a file or table and is identified to the database as a primary key. It is usually required for a row or record of a database.

Procedural language

A procedural language is another name for a third-generation language. With procedural languages, the programmer must specify, step by step, how the programming task is to be performed.

Pseudo code

Pseudo code is high-level programming code that is used in some of the examples in this book. It may not be totally complete, but it is complete enough to easily convey the principles being demonstrated.

Q

Query rewrite/rebuild

See Dynamic rewrite/rebuild.

R

Read-ahead

Read-ahead is a database access optimization technique that reads data before it may be needed in order to take advantage of current access optimization opportunities that may not be available when the data is required.

Replicated data

Replicated data is data that is replicated when a data structure is flattened into a two-dimensional table structure in order to keep the structure flat and to preserve the data structure. This replicated data can throw summaries off, and has the potential to obscure the data structure. Replicated data is not the same as duplicate data, whose identical data is semantically correct.
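The effect on summaries can be seen in a small sketch. It assumes a hypothetical department with a budget and two sibling legs, emp and project. Flattening the structure replicates the department row once for every employee/project combination, so a naive total of the budget column is inflated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept    (dept_id INTEGER, budget INTEGER);
    CREATE TABLE emp     (emp_name TEXT, dept_id INTEGER);
    CREATE TABLE project (proj_name TEXT, dept_id INTEGER);
    INSERT INTO dept    VALUES (10, 1000);
    INSERT INTO emp     VALUES ('Ann', 10), ('Bob', 10);
    INSERT INTO project VALUES ('P1', 10), ('P2', 10);
""")

# Flattening a two-leg structure: 1 department x 2 employees x 2 projects.
flat_rows = conn.execute("""
    SELECT d.budget, e.emp_name, p.proj_name
    FROM dept d
    LEFT JOIN emp e     ON e.dept_id = d.dept_id
    LEFT JOIN project p ON p.dept_id = d.dept_id
""").fetchall()

# Summing over the flattened rows counts the replicated budget four times.
flattened_sum = sum(row[0] for row in flat_rows)

# Summing over the dept table alone gives the intended total.
true_sum = conn.execute("SELECT SUM(budget) FROM dept").fetchone()[0]

print(len(flat_rows), flattened_sum, true_sum)
```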


Restructuring

Restructuring, as used in this book, is the changing of a database structure along with its active data. While this will not change the values of the data, it can change the occurrence counts of the data items.

Reusability

Reusability with the outer join is its ability to define structured data views with substructure views so they can be shared many times in other structures. This also has the advantage that changes to the substructure can be easily or automatically propagated to all of the structures it is used in.

RIGHT join

The RIGHT join operation is an outer join that preserves unmatched data from the table specified on the right side of the join operation.

Right-sided nesting

Right-sided nesting, as used in this book, is performed when tables are introduced right to left in the join specification. It is not as natural or intuitive a way to introduce multiple tables as left-sided nesting, since right-sided nesting requires a stacking process to handle the nesting that occurs. Right-sided nesting automatically occurs with subviews specified on the right side.

Root

The root of a hierarchical structure is the topmost table or segment in the structure. Since a hierarchical structure is an upside-down tree, it makes sense that the starting table or segment is called the root. All access to a hierarchical structure originates from the root.

Row

Relational tables are made up of horizontal rows and vertical columns. The relational name for a row is a tuple. A row is analogous to a record in a flat file.

S

Scope of control

Each specific ANSI SQL join operation joins two working sets or tables. This means that each table referenced by an ON clause during a join operation must be a member of one of the two working sets being joined. This ON clause range of acceptable table references is known as the scope of control.

Segment

A segment, as used in this book, is a contiguous block of closely related data such as a single row of data in a relational table. A segment type is a particular named class of segment. A structured record is made up of different segment types and their occurrences that are linked into a hierarchical structure. The term segment is a holdover term from legacy databases and is still a useful generic term for describing database structure building blocks.


Semantic loss

Semantic loss, as used in this book, occurs when semantic structure information is lost from a structure or platform conversion. This occurs when hierarchical or object structured data is converted to flat relational data, losing the explicit relationships in the data. Once lost, the semantic information cannot automatically be recovered by reversing the conversion.

Semantic optimizations

Semantic optimizations are powerful optimizations based on the semantics of the data structure being accessed. They can be very high level optimizations where a single optimization can logically remove one or more tables from a query instead of optimizing accesses on an access-by-access basis.

Sibling legs

Sibling legs are parallel legs that are related indirectly through a common parent table or segment. These legs are not directly related; they are separate and independent. One leg can exist without the other. They have no row-by-row correlation. This has specific consequences for the semantics of the data structure.

SQL-92

SQL-92 is the ANSI SQL standard, ratified in 1992, that introduced the outer join operation.

SQL:1999

Since SQL3 is expected to be ratified in 1999, it is already being referred to as SQL:1999. Also see SQL3.

SQL3

SQL3 is the object standard for relational databases. It has not been ratified yet, but its features and capabilities are showing up. These include support for abstract data types (ADTs), user-defined functions (UDFs), and user-defined types (UDTs). Also see SQL:1999.

Star schema

A star schema is a schema or view in a star pattern that is used frequently in multidimensional databases. It consists of a large fact table with a number of dimension tables that index it.
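A star schema can be sketched in miniature as follows. The fact table sales_fact and the two dimension tables, product_dim and date_dim, are hypothetical, as are their columns and data. The query joins the fact table outward to each dimension and then aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables describe the points of the star.
    CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE date_dim    (date_id INTEGER PRIMARY KEY, sale_year INTEGER);
    -- The fact table at the center holds one row per sale.
    CREATE TABLE sales_fact  (date_id INTEGER, product_id INTEGER, amount INTEGER);
    INSERT INTO product_dim VALUES (1, 'Bolt'), (2, 'Nut');
    INSERT INTO date_dim    VALUES (1, 1998), (2, 1999);
    INSERT INTO sales_fact  VALUES (1, 1, 100), (2, 1, 70), (2, 2, 50);
""")

totals_1999 = conn.execute("""
    SELECT p.product_name, SUM(f.amount)
    FROM sales_fact f
    JOIN product_dim p ON p.product_id = f.product_id
    JOIN date_dim d    ON d.date_id = f.date_id
    WHERE d.sale_year = 1999
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()

print(totals_1999)
```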

Structured data record

Structured data records are hierarchical data structures that are stored contiguously with no embedded structure pointers. They are used inherently by third- and fourth-generation languages that support structured data.

Structured database processing

See Hierarchical processing.

Structured query output

See Nested display.


Substructure view

Substructure views are views that contain data structures that can be seamlessly embedded in SQL statements and structured views to create larger views.

Surrogate key

A complaint often leveled against relational keys is that they serve two purposes. They represent data, and also represent a relationship used to join tables, which can be problematic at times. A surrogate key is a key that only serves to represent physical relationships between rows or segments used in join or link operations; its value does not represent data.

Symmetric join

INNER and FULL joins are referred to as symmetric joins because they are commutative in operation. They produce the same results when left and right table inputs are reversed. Having balanced inputs, they produce a flat, balanced structure.
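Commutativity can be checked with a short sketch that reverses the table inputs. The dept and emp tables are hypothetical. Only the INNER join is tested for symmetry here, with a LEFT join as the one-sided contrast; the FULL join is left out as an assumption about portability, since not every SQLite build supports it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER, dept_name TEXT);
    CREATE TABLE emp  (emp_name TEXT, dept_id INTEGER);
    INSERT INTO dept VALUES (10, 'Sales'), (20, 'Research');
    INSERT INTO emp  VALUES ('Ann', 10), ('Cy', 30);  -- Cy's department is missing
""")

def rows(sql):
    """Run a query and return its rows as an order-independent set."""
    return set(conn.execute(sql).fetchall())

select = "SELECT d.dept_name, e.emp_name FROM "
cond = " ON e.dept_id = d.dept_id"

# Symmetric: reversing the inputs of an inner join gives the same rows.
inner_ab = rows(select + "dept d INNER JOIN emp e" + cond)
inner_ba = rows(select + "emp e INNER JOIN dept d" + cond)

# One-sided: reversing the inputs of a LEFT join changes which side is preserved.
left_ab = rows(select + "dept d LEFT JOIN emp e" + cond)
left_ba = rows(select + "emp e LEFT JOIN dept d" + cond)

print(inner_ab == inner_ba, left_ab == left_ba)
```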

T

Table join order

See Join table order.

Tabular structure

A tabular structure is a flat two-dimensional table structure with rows and columns, such as a relational table.

Temporary table

A temporary table is a derived table produced and referenced during the processing of a query or the materialization of a view and then automatically deleted when the query completes.

Throwaways

Throwaways, as used in this book, are data rows retrieved in performing a join specification that are later discarded in the same join operation because of unmatched rows.

Top-down processing/execution

The term top-down processing, as used in this book, is used when the defined hierarchical structure is accessed from its top towards the bottom. This is also the best way to perform join operations needed to process a hierarchical data structure since it avoids unnecessary accesses, known as throwaways.

Tuple

A tuple is the relational term for a row of a table and is also analogous to a record of a flat file.

U

Unambiguous semantics

Unambiguous semantics are semantics with only one meaning or interpretation. Hierarchical data structures have unambiguous semantics because they are singular in nature, having only one path to any value. This makes their semantics unambiguous, making them very useful and powerful.

UNION join

A UNION join is also called an outer union. It unions two tables that have different formats. This operation was probably included in the join syntax because it can easily be simulated by the FULL join, as in T1 FULL JOIN T2 ON 1>2.
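The effect of a join condition that is never true can be sketched as follows. Because FULL JOIN support varies between SQLite versions, this sketch substitutes two LEFT joins combined with UNION ALL for the FULL join; that substitution, and the tables t1 and t2 with their columns, are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER, b TEXT);
    CREATE TABLE t2 (c TEXT);
    INSERT INTO t1 VALUES (1, 'x');
    INSERT INTO t2 VALUES ('p'), ('q');
""")

# ON 1 > 2 never matches, so each LEFT join preserves every row of its
# left table and pads the other table's columns with nulls.
outer_union = conn.execute("""
    SELECT t1.a, t1.b, t2.c FROM t1 LEFT JOIN t2 ON 1 > 2
    UNION ALL
    SELECT t1.a, t1.b, t2.c FROM t2 LEFT JOIN t1 ON 1 > 2
""").fetchall()

for row in outer_union:
    print(row)
```

Every row of both tables appears exactly once, and no row carries data from both tables, which is the outer union described above.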

Universal data access

Universal data access (UDA) frameworks are used by products to support access to all forms and types of data and databases. They utilize standard database platforms such as OLE DB, ODBC, and JDBC to enable easy interfacing to all forms of data and databases. While these platforms are SQL based, not all UDA platforms are.

Unnormalized

Unnormalized data is data that has not been normalized. If it is left unnormalized on purpose for access efficiency, a better term is denormalized data.

URL

A URL is a Uniform Resource Locator, which is another name for an Internet address. The basic format is Protocol://Resource.Server.Category/Directory/Object.

USING clause

The USING clause is used instead of the ON clause to specify that an implicit natural join option is to be applied to the join operation. Its argument is different from that of the ON clause, requiring only column names.

V

Variable-occurring fields

Variable-occurring fields are database fields that can occur a variable number of times and take up only the space necessary to store the variable-occurring fields. Because they only take up the space required to hold the data, variable-occurring fields require an occurrence count to be embedded with them in the data, adding slightly to the storage overhead.

View materialization

See Materialized view.

View expansion

See Expanded view.

View optimization

View optimization is a powerful outer join semantic optimization that can dynamically exclude tables in a view from access based on which columns are specified at view invocation. This means there is never a penalty for using an outer join hierarchical view that contains more tables than are needed. This also means that the number of required views can be reduced, since one large view can do the job as well as many small ones.

View update capability

View update is the capability to update multitable join views. This has always presented a problem because of the lack of semantics caused by the flattened Cartesian product when multiple tables are joined in a view.

Views-within-views

See Embedded views.

Virtual key

A virtual key is a logical key that does not physically exist in a row or record, but is used to retrieve it and is inserted when the row or record is retrieved into storage to act as its key. This can be the case when the key exists in an index and does not exist in the row or record that is indexed.

Virtual view/database

A virtual view or database is a logical view or database that includes many databases that can be accessed under a logical database view to enable the databases to operate as a single database for this view.

W

WHERE clause filtering

In addition to join criteria, WHERE clauses can specify data filtering criteria. When data filtering is specified in the WHERE clause, it is applied to the entire result row, so a row that fails the filtering criteria is filtered out in its entirety. This is not the case with ON clause filtering, which allows for a finer level of filtering.

Working sets

A working set is an area of memory or storage used to hold active results while a query is being processed. Since the ANSI SQL-92 join syntax allows the join order to be controlled, multiple concurrent working sets may be required during the processing of a query.

WYSIWYG

See Nested display.

X

XML

XML stands for Extensible Markup Language, which is used with the development of Web sites. XML, with its many new capabilities, will replace HTML. This book is concerned with XML's ability to describe data in any shape or form. It has the potential to restore the Internet's focus to internal content rather than external format, allowing it to be used more like a standard database. XML also retains and expands HTML capabilities that define how data is to be displayed.


BIBLIOGRAPHY

Bhargava, G., et al., "Efficient Processing of Outer Joins and Aggregate Functions," 12th International Conference on Data Engineering, IEEE, 1996, pp. 441–449.

Bobak, A. R., Data Modeling and Design for Today's Architectures, Norwood, MA: Artech House, Inc., 1997.

Cattell, R. G. G., Object Database Standard: ODMG 2.0, San Francisco, CA: Morgan Kaufmann Publishers, Inc., 1997.

Celko, J., Joe Celko's SQL for Smarties: Advanced SQL Programming, San Francisco, CA: Morgan Kaufmann Publishers, Inc., 1995, Chapter 16, Section 2: Outer Joins, pp. 185–198.

Chen, A.L.P., "Outerjoin Optimization in Multidatabase Systems," Proceedings of the Second International Symposium on Databases in Parallel and Distributed Systems, 1990, pp. 211–218.

Date, C. J., "The Outer Join," Proceedings of the Second International Conference on Databases, Cambridge, England, Sept. 1983, pp. 76–106.

Date, C. J., Relational Database, Selected Writings, Menlo Park, CA: Addison-Wesley Publishing Company, Inc., 1986, Chapter 12: Why Is It So Difficult to Provide a Relational Interface to IMS?, pp. 241–257; Chapter 16: The Outer Join, pp. 335–366; Chapter 17: Updating Views, pp. 367–394.


Date, C. J., Relational Database, Selected Writings, 1989–1991, Menlo Park, CA: Addison-Wesley Publishing Company, Inc., 1992, Chapter 19: Watch Out for Outer Join, pp. 311–333.

David, M. M., "4GLs, 5GLs, and the Database," DATABASE Programming Design Magazine, Oct. 1988, pp. 42–49.

David, M. M., "Advanced Capabilities of the Outer Join," ACM SIGMOD Record, Vol. 21, No. 1, March 1992, pp. 65–70.

David, M. M., "The ANSI-92 Outer Join, A Powerful New Open Database Access Interface," Relational Database Journal, Nov. 1995, pp. 17–19.

David, M. M., "Heterogeneous Database Processing, A 4GL Case Study," DATABASE Programming Design Magazine, March 1991, pp. 27–34.

David, M. M., "Heterogeneous Database Processing of SQL, IMS and VSAM Databases," DATABASE Programming Design Magazine, April 1991, pp. 49–54.

David, M. M., "The Ins and Outs of the Outer Join," DATABASE Programming Design Magazine, Feb. 1990, pp. 35–43.

David, M. M., "Multimedia Databases Through the Looking Glass," DATABASE Programming Design Magazine, May 1997, pp. 26–35.

David, M. M., "The Outer Limits of the Relational Join," DATA BASE Management Magazine, Oct. 1991, pp. 12–14.

David, M. M., "The Relational Outer Join as an ODBMS Interface," DBMS Magazine, Jan. 1993, pp. 73–78.

David, M. M., "SQL-Based XML Structured Data Access," WEB Techniques Magazine, June 1999.

David, M. M., "Universal Data Access: Fulfilling the Promise," DATABASE Programming and Design Online, Dec. 1998.

Decorte, G., et al., "An Object Oriented Model for Capturing Data Semantics," IEEE Computer Society Press, Feb. 1992, pp. 126–135.

Galindo-Legaria, C., Algebraic Optimization of Outerjoin Queries, Ph.D. dissertation, Center for Research in Computing Technology, Harvard University, Cambridge, MA, 1992.


Galindo-Legaria, C., and A. Rosenthal, "How to Extend a Conventional Optimizer to Handle One- and Two-Sided Outerjoin," IEEE Proceedings of Data Engineering, 1992, pp. 402–409.

Groff, J. R., and P. N. Weinberg, Using SQL, Berkeley, CA: Osborne McGraw-Hill, 1990, Chapter 7, Multi-Table Queries (Joins), pp. 141–178.

Ju, P., Databases On The Web: Designing and Programming for Network Access, New York: M&T Books, 1997.

Korth, H. F., and A. Silberschatz, Database System Concepts, McGraw-Hill, Inc., 1991, Chapter 9, Section 6, Join Strategies for Parallel Processors, pp. 301–304, Chapter 14, Section 14.2, The Nested Relational Model, pp. 458–459.

Lee, B. S., and G. Wiederhold, "Outer Joins and Filters for Instantiating Objects from Relational Databases Through Views," IEEE Transactions on Knowledge and Data Engineering, Vol. 6, No. 1, Feb. 1994, pp. 108–119.

Melton, J., and A. R. Simon, Understanding the New SQL: A Complete Guide, San Mateo, CA: Morgan Kaufmann Publishers, Inc., 1993, Chapter 8, Working with Multiple Tables: The Relational Operators, pp. 149–173.

Orfali, R., and D. Harkey, Client/Server Programming with JAVA and CORBA, New York: John Wiley & Sons, Inc., 1997.

Reiner, D., and A. Rosenthal, "Extending the Algebraic Framework of Query Processing to Handle Outerjoins," Proceedings 10th International Conference on Very Large Data Bases, Singapore, Aug. 1984.

Reingruber, M. C., and W. W. Gregory, The Data Modeling Handbook, A Best-Practice Approach to Building Quality Data Models, New York: John Wiley & Sons, Inc., 1994.

Rosenthal, A., and C. Galindo-Legaria, "Query Graphs, Implementing Trees, and Freely-Reorderable Outerjoins," ACM SIGMOD, 1990, pp. 291–299.

Saracco, C. M., Universal Database Management: A Guide to Object/Relational Technology, San Francisco, CA: Morgan Kaufmann Publishers, Inc., 1998.

Simon, A. R., Strategic Database Technology: Management for the Year 2000, San Francisco, CA: Morgan Kaufmann Publishers, Inc., 1995.

St. Laurent, S., XML: A Primer, MIS Press, 1998.


Stonebraker, M., Object-Relational DBMS, The Next Great Wave, San Francisco, CA: Morgan Kaufmann Publishers, Inc., 1996.

Ullman, J. D., Principles of Database and Knowledge-Base Systems, Volume II: The New Technologies, Computer Science Press, Inc., 1989, Chapter 17, The Universal Relation as a User Interface, pp. 1026–1069.

Unidata, Nested Relational Database Technology, Unidata Inc., 1993.


ABOUT THE AUTHOR

Michael M. David is a principal at CompuAid in Santa Monica, California, where he researches new technologies and designs commercial database tools. Previously, he was a consulting staff scientist at Teradata Corporation where he designed database utilities, and before that he was a senior software designer at Sterling Software's Answer Division where he designed 4GL universal data access products. He has over 15 years of experience designing and developing commercial database tools and utilities that operate seamlessly across disparate and heterogeneous database environments. He has written articles about his research and development activities for Database Programming and Design Magazine, DBMS Magazine, DATA BASE Management Magazine, Relational Database Journal, ACM SIGMOD Record, InfoDB Journal, and WEB Techniques Magazine.

Noticing the powerful inherent data modeling capability of the ANSI SQL join operation, he has thoroughly studied and researched this operation and published his findings. This research included developing a technology for extracting and utilizing the valuable data structure meta information naturally embedded in ANSI SQL join specifications that model complex data structures. He can be reached at [email protected]

Page 235

INDEX

A

Abstract data types (ADTs), 102, 126, 207

Access optimizations, 118-19

Access paths

defined, 207

dynamic shortening of, 132

Adding a Structure Box, 189-91

Filter Criteria option, 190

illustration, 189

Link Criteria option, 190

Parent Box Name field, 190

See also Data modeling outer join generator

Ad hoc queries, 207

Alternate key, 207

Ambiguous semantics, 208

Ambiguous structures, 208

AND operation, 62, 176

incorrect use of, 70

logical tables and, 86

multiple path referencing with, 71

OR operation vs., 72

precedence, 190, 196

ANSI SQL joins, 11-22

associativity, 18

benefits, 21

Cartesian product model and, 17-18

commutativity, 18

defined, 11, 201

inner joins, 31-32

join intermixing, 33-34

nesting, 11-12

power of, 201-2

syntax, 11-14, 21, 201

types, 23-35

type specification, 14

See also Outer joins

Application view, 52, 125, 208

ARRAY composite type, 103

Association table, 57, 147, 148, 208

Associativity

ANSI SQL FULL outer join, 24, 25

CROSS join, 32

defined, 19-20, 208

determining, 18

join order and, 20

natural FULL join, 44

natural inner join, 45

See also Commutativity

B

Bottom-up processing/execution, 208

Business rules, 209

C

Cardinality, 209

Cartesian product, 66

data structure relationship to, 59

defined, 209

extended, 213

Cartesian product effect, 57, 58, 61

Page 236

Cartesian product effect (continued)

applying, 59

illustrated, 58

Cartesian product model, 17-18, 60

ON clause and, 18

outer join syntax and, 17

use of, 17

Change or Remove a Structure Box, 193-94

Changing or Removing a Structure Box

illustrated, 193


Table Name field, 194


Coalescing

defined, 209

effect simulation, 40

Columns, 3, 209

Common parent, 209

Commutativity

CROSS join, 32

defined, 19, 209

determining, 18

FULL outer join, 23-24

lack of, 19

multiple one-sided joins and, 19

See also Associativity

Composite key, 210

CompuAid, 109, 115

Concatenated key, 210

Conceptual view, 54-55

defined, 54, 210

illustrated, 55

See also Views

Contiguous data

structured, 164

variable-length, 162

view for, 168-70

CROSS join

defined, 12, 32, 210

example, 32

join order and, 16, 32

simulating, 32


D

Dangling tuple, 210

Data abstraction, 155-56, 210

Database access

disparate, 212

enterprise, 119-20, 213

homogeneous, 215

legacy, 119-20, 217

open, interface, 120, 121

optimizations, 118-19

with outer join, 119-20

SQL structured, 164-65

universal, 127

Database navigation, 117-18, 157-58

defined, 117, 219

instructions, 117-18

outer join and, 118

Data inheritance, 156-57, 211

Data modeling, xx, 64-65, 67-106

with ANSI outer join, 103-4

coding, outer join statements, 94-95

complex, 79, 209

defined, 64, 211

diagramming symbols, 199-200

external data definition integration with, 127

features added to SQL/SQL3 standard, 93

flexible, 69

generation of, outer join statements, 95

hierarchical relational processor prototype, 144-46

many-to-many, 90-91

meta information, 169

object relational interface, 154-55

with old-style outer joins, 104-5

SQL: 1999 and, 102-3

valid/invalid results, 73-74

Data modeling outer join generator, 185-200

Add menu, 189

Change menu, 189, 193

database data, loading, 198-99

database engine, 187

data filtering, 188, 190

data filters, specifying, 192-93

data structure operations, 187

data structure optimization, 194

default test database, 187

defined, 185

Page 237

diagramming symbols, 199-200

display screen example, 186

embedded views, 188

Exit menu, 189

features, 185-86

Help menu, 189, 192

interface, 186

link criteria, specifying, 191-92

menu overview, 188-89

operational overview, 186-88

Optimization menu, 187, 198

outer join optimization, 194-95

outer join query, running, 197-98

product overview, 185-86

stored structures, saving/retrieving/deleting, 195-97

structure boxes, adding, 189-91

structure boxes, changing/removing, 193-94

View menu, 188, 191

Data structure extraction (DSE) technology, 107, 109-15, 143

as building block technology, 114

characteristics, 109-10

data structure determination, 111

defined, 109, 211

development of, 109, 115

example, 110-11

imposing data structures on SQL and, 114-15

internal logic, 113

logical table example, 111

prototype, 149

software, 150, 151

symmetric linking example, 111-12

using, 109-10

vendor need for, 113-14

Data structures, 51-66

Cartesian product relationship, 59

client/server processing, 94

composition of, 63-64

conceptual view, 54-55

data modeling, 64-65

external/internal views, 54

flattening, 214

flexible, processing, 69

imposing, on SQL specification, 114-15

many-to-many relationship, 55-57

many-to-one relationship, 55

meta information, 211

multiple formats within file, 168

network to hierarchical conversion, 57

nonhierarchical, 64, 87-90

one-to-many relationship, 55

optimizing, 194-95

ordering of, 62-63

outer join, 68

physical vs. logical, 59-60

processing, 211

relational, composition, 63-64

restructuring of, 62-63

semantics, 157

specifying, 190

symmetric joining of, 89

three-tier architecture, 53-54

WHERE clause filtering with, 77

See also Data modeling; Hierarchical data structures

Data warehouse

defined, 211

interface, 121

outer join view access, 122

Denormalization, 212

Diagramming symbols, 199-200

Domains, 212

Dynamic path shortening, 132

defined, 132, 212

illustrated, 133

on network structures, 138

Dynamic rebuild/rewrite, 135-36, 212

E

Embedded views, 16

data modeling outer join generator, 188

defined, 213

empirical proof, 99-101

hierarchical relational prototype, 148

logical tables, 88

See also Views

Empirical proofs

embedded structured view support, 99-101

hierarchical structure processing, 95-98

indirect link, 101-2

Page 238

Empirical proofs (continued)

nonhierarchical structure processing, 98-99

Enterprise database access, 119-20, 213

Entity relationship diagram, 213

Equal join, 213

Expanded view, 149, 213

Explicit natural joins, 37-39

defined, 213

nonassociative behavior, 40

See also Natural joins

External view, 54, 213-14

F

Fields

defined, 214

fixed-occurring, 214

foreign-key, 3, 214

primary-key, 3

in segments, 63

variable-occurring, 227

See also Records

Filtering

data modeling outer join generator, 188, 190

data structure, 81-83

defined, 210

ON clause vs. WHERE clause, 92

WHERE clause, with data structures, 77

WHERE clause, with subviews, 77-79

Flat structures, 214

Foreign keys, 3, 214

Fourth generation languages (4GLs), 51, 53, 117, 214

FROM clause

of outer join definition, 12

table name reference, 7

FULL outer join, 6, 7

ANSI SQL, 23-26

associativity property, 24, 25

commutative behavior, 23-24

data preservation, 42

defined, 23, 214

diagramming symbols, 200

logical tables with, 85

natural, 39, 42-44

nonstandard implementation example, 8

simulated, 7


H

Hierarchical data structures

building, 70

defined, 215

hierarchical control, 96-97

hybrid, 85, 86, 215

invalid example, 84

legs synchronization of, 90

network structure conversion to, 57, 58

power of, 51-53, 202

processing empirical proof, 95-98

relating, to relational processing, 57-59

structure control, 97

See also Data structures

Hierarchical relational processing, 121-22

defined, 122, 215

display, 122

Hierarchical relational processor prototype, 143-51

conclusion, 150-51


Department view processed by, 145

embedded views, 148

Employee view processed by, 146

many-to-many relationships, 146-48

operation, 144

Part/Supplier view processed by, 147

Supplier/Part view processed by, 147

view optimization, 148-50

Hierarchictivity, 20-21, 22, 35

defined, 20, 215

example, 21

multileg, structure, 30

one-sided outer join, 28

principles, 30

RIGHT outer join, 29

I

Illogical structures, 215

Implicit natural joins, 37-39

defined, 215

example, 38

NATURAL keyword and, 38

See also Natural joins

IMS

access code, 137

defined, 215

Page 239

legacy database, 143

Indirect linking, 83

empirical proof, 101-2

illustrated, 83

See also Linking

Inner joins

ANSI SQL, 31-32

defined, 216


formats example, 32

logical tables with, 85

lost data and, 5-6

natural, 44-45

new role of, 105

performing, 4

problematic characteristic of, 4

review, 4-5

sample, 5

table order, 5

Intermixing joins, 33-34

difficulty of, 33

examples, 34

natural types, 45-46

nonhierarchical, 87

Internal view, 54, 216

Intersecting data, 216

J

JOIN keyword, 14

Join operations. See Inner joins; Outer joins

Join order

associativity and, 20

defined, 216

ON/USING clauses and, 16

parentheses and, 16-17

precedence, 20

specification flexibility, 103

L

Late binding, 158-59

defined, 159, 216

example, 159

LEFT outer joins, 14-15

defined, 26, 217

diagramming symbol, 200

illustrated, 27

with left-sided nesting, 14

natural, 41-42

with right-sided nesting, 15

specification, 14


Left-sided nesting, 12, 14, 99

defined, 217

view expansion example, 100

See also Nesting

Legacy database access, 119-20, 217

Linking

defined, 217

indirect, 83, 101-2

lower structure, 173-83

nonhierarchical, 88

symmetrical, 88, 111-12

Loading the database data, 198-99

Append option, 199

illustrated, 199


Logical data structures, 59-60

defined, 217

physical structure conversion to, 62

view results, 60


Logical tables, 84

defined, 217

DSE example, 111

embedded, in view expansion, 88

hierarchical hybrid structure with, 85

with INNER/FULL join operation, 85

multiple, 86

NATURAL example, 86

as root structure, 85, 86

See also Tables

Lower structure linking, 173-83

advanced, 173-74

example data, 177

multiple path references, 178-79

nonroot, 173-76

restructuring and, 182-83

single path reference, 176-78

with view WHERE clause, 180-81

See also Linking

M

Many-to-many (M to M) relationships, 55-57, 66

association table, 57, 147, 148

characteristics, 56

Page 240

Many-to-many (M to M) relationships (continued)

data example, 57


defined, 146, 217-18

example data views, 56

hierarchical relational processor prototype, 146-48

Parts-Suppliers, 205

Many-to-one (M to 1) relationships, 55

behavior, 55

defined, 218

WYSIWYG display, 56

Meta information, 218

Multidimensional databases, 218

Multilegs

AND selection qualification semantics example, 61

data selection semantics example, 61

defined, 218

OR selection qualification semantics example, 62

Multimedia authoring systems, 126

Multimedia databases

ADTs and, 126

defined, 218

directory support, 124-26

hierarchical directory example, 126

purpose of, 124

Multiple path references, 178-79

Multitable natural outer joins, 39-41

N

Natural joins, 37-47

coalescing effect simulation, 40

condensed result, 44

defined, 12, 219


explicit, 37-39, 213

FULL, 39, 42-44

implicit, 37-39, 215

inner, 44-45

intermixing, 45-46

join order and, 16

LEFT, 41-42

multitable, 39-41

one-sided, 41-42

one-sided, transformation, 46-47

with outer joins, 37

types of, 37


Nesting, 11

defined, 219

left-sided, 12, 14, 99, 217

natural join, 12

right-sided, 15, 100, 224

Network structures, 57, 58, 65, 69

ambiguous result, 73-74

applying hierarchical optimizations to, 138-39

conversion, ambiguous result, 75

conversion, to hierarchical structures, 57, 58, 65

defined, 219

dynamic path shortening on, 138

junction points, 138

optimized, 139

SQL definition, 69, 70

Nonhierarchical joins

data structures and, 87-90

intermixing, 87

one-sided, 30-31

processing empirical proof, 98-99

type support, 83-87

Non-natural joins, 46

Nonprocedural language, 220

Nonrelational databases, 220

Nonroot lower level linking, 173-76

multiple path structure, 175

optimization concerns, 180

overview, 173-74

previous method, 174

semantics, 174-76

structure, 174

view optimization needs for, 180

See also Lower structure linking

Normalization rules, 64-65

defined, 220

first normal form, 64-65, 214

second normal form, 65

third normal form, 65

Nulls, 220

O

Object relational interface, 122-23, 153-60

capabilities, 154

Page 241

data abstraction and reusability, 155-56

data inheritance, 156-57

data modeling and structure processing, 154-55

data navigation, 157-58

defined, 220

illustrated, 123

late binding and polymorphism, 158-59

outer join, 123

outer join derivation, 154

plug and play, 159-60

structured memory storage, 155

ODBC (Open Database Connectivity), 220

ODMG model, 103

ON clause, 13

in Cartesian product model, 18

defined, 221

filtering, 82-84, 221

first rule, 70-71

for hierarchical substructure views, 78

join condition rules, 70-71

join order and, 15

join relationship via, 69

second rule, 71

shifting, to WHERE clauses, 139-41

specification, 13

third rule, 71

valid/invalid use of, 72-73

One-sided outer joins

ANSI SQL, 26-31

defined, 26, 221

hierarchical behavior, 28

natural, 41-42

natural, transformation, 46-47

noncommutative behavior, 27

nonhierarchical, 30-31

non-natural, 46

nonstandard implementation example, 8

one-leg hierarchical data structure, 29

simulated, 7

See also LEFT outer join; RIGHT outer join

One-to-many (1 to M) relationships, 55

behavior, 55

defined, 221

handling considerations, 146

WYSIWYG display, 56

Optimize and Display Outer Join and Structure, 194-95

OR operation, 62, 176

AND operation vs., 72

defined, 221

incorrect use of, 70

logical tables and, 86

multiple path referencing with, 71

precedence, 190, 196

Outer join data modeling, 67-106

coding, statements, 94-95

generation of, statements, 95

minimum requirements, 69

new capabilities based on, 107-70

related capabilities, 81-92

See also Data modeling

Outer join optimization, 131-41, 194-95

dynamic rebuild and, 135-36, 212

dynamic shortening of access path and, 132

hierarchical, to network structures, 138-39

join table reordering and, 131-32

of nonrelational SQL interfaces, 136-38

parallel database processing, 135

shifting ON clauses to WHERE clauses and, 139-41

unnecessary table removal, 132-35

Outer joins

advanced capabilities, 117-30

ambiguous nonstandard implementation example, 9

ANSI, uniqueness for data modeling, 103-4

associativity, 18, 19-20

commutativity, 18-19

CROSS, 12, 32, 210

database access with, 119-20

data structures, 68

defined, 6, 221


FULL, 6, 7, 8, 23-26

introduction, xix

LEFT, 14-15, 26-27

multimedia application directory support, 124-26

natural, 12, 37-47

Page 242

Outer joins (continued)

object relational interface, 122-23

old style, data modeling and, 104-5

one-sided, 6, 7, 8, 26-31

previous syntax problems, 7-9

query, 197-98

review, 6-7

RIGHT, 26-27, 29

simulating, 6-7

SQL XML data structure connection, 128-30

syntactical notations, 13

syntax, 11-14, 21

syntax definition, 12

universal data access, 127

value-added feature support, 120-21

view update capability, 123-24

See also Outer join optimization

OUTER keyword, 14

P

Parallel database processing, 135, 136, 138, 221

Parentheses

join order and, 16-17

use, 222

Paths

access, 132, 207

defined, 222

dynamic shortening, 132-33, 138, 212

multiple, references, 176-78

qualification, 222

Physical data structures, 59-60

conversion, 62

defined, 222

view results, 60


Pipelining, 222

Plug and play, 159-60, 223

Polymorphism, 158-59

defined, 159, 223

example, 159

Postrelational data, 168, 223

Prerelational data, 168

Primary-key fields, 3

Procedural language, 223

Q

Queries

ad hoc, 207

outer join, running, 197-98

sibling legs, 60-62

R

Read-ahead technique, 223

Records, 63

defined, 63

structured, 162-63, 225

Relational databases, 63

Relationships

many-to-many, 55-57, 66, 146-48

many-to-one, 55, 56, 218

ON clause and, 69

one-to-many, 55, 56, 146, 221

Restructuring, 182-83

defined, 224

dynamic, 182

model, 183

physical structures and, 182

possible amount, 183

Reusability, 155-56, 158-59, 224

RIGHT outer joins

defined, 26-27, 224


hierarchical structures, 29

illustrated, 27


Right-sided nesting, 15, 100

defined, 224

triggered by delaying ON clauses, 113

view expansion example, 101

See also Nesting

ROW composite type, 103

Rows, 3

defined, 224

unmatched, 7-8

See also Tables

Running the Outer Join Query, 197-98

illustrated, 198

WYSIWYG Display option, 197-98

S

Saving, Retrieving, Deleting a Stored Structure, 195-97

Alias Suffix field, 197

Filter Criteria option, 196

illustrated, 196

Page 243



Scope of control, 224

Segments

defined, 224

fields in, 63

variable-occurrence repeating, 163

SELECT statements, 6, 9

Semantics, 157

ambiguous, 208

loss, 225

nonroot lower level linking, 174-76

optimizations, 225

unambiguous, 226-27

Sibling legs

correspondence, 60

defined, 225

query semantics, 60-62

Specifying a Data Filter, 192

Specify the Link Criteria, 191-92

Column lists, 192

illustrated, 192


SQL: 1999

ADT support, 102

data modeling and, 102-3

defined, 102, 225

ODMG and, 103

ROW/ARRAY composite types, 103

UDTs, 102

SQL

access of procedural databases, 137

dynamic, specification, 212-13

interfaces, nonrelational, 136-38

interfaces, standardized, 153-54

SQL3, 225

SQL-92, 225

Star schema, 225

Stored structures, 195-97

deleting, 197

embedded, 197

referenced, 197

retrieving, 197

saving, 197

Structure boxes

adding, 189-91

changing, 193-94

inserting, in paths, 194

removing, 193-94


Structured data

access, 164-65

contiguous, 164

hierarchical SQL view to access, 164

internal navigation of, 165-66

mapping of, 165-66

SQL-based universal data access of, 167-68

Structured records, 162-63

accessing, 165

composition, 162

decompose/map pseudo code, 166

defined, 162, 225

updating, 165

variable-length contiguous, 162

variable-occurrence repeating segments, 163

See also Structured data

Substructure views, 74-76

data abstraction with, 155-56

defined, 226

reusability with, 155-56

WHERE clause filtering with, 77-79

Surrogate key, 226

Symmetrical linking, 88, 111-12

Symmetric joins, 88-90

defined, 226

illustrated, 89

synchronizes legs of hierarchical structure, 90

T

Tables

association, 57, 147, 148, 208

columns, 3, 7

defined, 3

joined order, 13

logical, 84, 217

nonhierarchically joined, 89

order, 216

reordering, 131-32, 216

rows, 3, 7

Page 244

Tables (continued)

temporary, 226

unnecessary, removing, 132-35

virtual, 103

Tabular structure, 226

Three-tier database architecture, 53-54

Tuples, 3, 226

U

Unambiguous semantics, 226-27

UNION joins, 32-33

defined, 32, 227

example, 33

performing, 32-33

Universal data access, 127, 161-70

contiguous data view and, 168-70

defined, 227

interfacing, to middleware, 167

interfacing, to nested relational structures, 169

interfacing, to prerelational/postrelational data, 168

multiple structure formats and, 168

nonrelational SQL-based, 161-70

of structured data, 167-68

User-defined functions (UDFs), 102

User-defined types (UDTs), 102

USING clause, 13

defined, 227

join order and, 15

specification, 13

V

Value-added features, 120-21

Views

application, 52, 125, 208

conceptual, 54-55, 210

contiguous data and, 168-70

embedded, 16, 76, 88, 99-101, 148

expanded, 149, 213

external, 54, 213-14

as global application definition, 169

internal, 54, 216

materialized, 218

optimizations, 134-35, 148-50, 180-81, 227-28

outer join, 132-35

Parts-Suppliers, 205

removing unnecessary tables from, 132-35

substructure, 74-76, 226

this book, 204

updating, 123-24, 228

virtual, 228

WHERE clause processing, 181

Virtual keys, 228

W

WHERE clause, 5, 7

AND operator, 61

filtering, 82-83, 228

filtering with data structures, 77

filtering with substructures, 77-79

lower level linking with, 180-81

shifting ON clause to, 139-41

specification, 13

X

XML (Extensible Markup Language), 128-30

capabilities, 128

defined, 228

definition elements, 128

SQL and, 128-30

Web sites, 128, 129

Recent Titles in the Artech House Computing Library

Advanced ANSI SQL Data Modeling and Structure Processing, Michael M. David

Authentication Systems for Secure Networks, Rolf Oppliger

Business Process Implementation for IT Professionals and Managers, Robert B. Walford

Client/Server Computing: Architecture, Applications, and Distributed Systems Management, Bruce Elbert and Bobby Martyna

Computer-Mediated Communications: Multimedia Applications, Rob Walters

Computer Telephony Integration, Second Edition, Rob Walters

Data Modeling and Design for Today's Architectures, Angelo Bobak

Data Quality for the Information Age, Thomas C. Redman

Data Warehousing and Data Mining for Telecommunications, Rob Mattison

Designing Web Software, Stan Magee and Leonard L. Tripp

Distributed and Multi-Database Systems, Angelo R. Bobak

Electronic Payment Systems, Donal O'Mahony, Michael Peirce, and Hitesh Tewari

Future Codes: Essays in Advanced Computer Technology and the Law, Curtis E.A. Karnow

A Guide to Programming Languages: Overview and Comparison, Ruknet Cezzar

Guide to Software Engineering Standards and Specifications, Stan Magee and Leonard L. Tripp

Internet and Intranet Security, Rolf Oppliger

Internet and Digital Libraries: The International Dimension, Jack Kessler

Managing Computer Networks: A Case-Based Reasoning Approach, Lundy Lewis

Metadata Management for Information Control and Business Success, Guy Tozer

Practical Guide to Software Engineering Models, John W. Horch

Practical Process Simulation Using Object-Oriented Techniques and C++, José Garrido

Risk Management Process for Software Engineering Models, Marian Myerson

Secure Electronic Transactions: Introduction and Technical Reference, Larry Loeb

Software Process Improvement With CMM, Joseph Raynus

Software Verification and Validation: A Practitioner's Guide, Steven R. Rakitin

Solving the 2000 Crisis, Patrick McDermott

User-Centered Information Design for Improved Software Usability, Pradeep Henry

For further information on these and other Artech House titles, including previously out-of-print books now available through the In-Print-Forever (IPF) program, contact:

Artech House
685 Canton Street
Norwood, MA 02062
Phone: 781-769-9750
Fax: 781-769-6334
e-mail: [email protected]

Artech House
46 Gillingham Street
London SW1V 1AH UK
Phone: +44 (0)20 7596-8750
Fax: +44 (0)20 7630-0166
e-mail: [email protected]

Find us on the World Wide Web at: www.artechhouse.com
