PostgreSQL: Introduction and Conceptsdb.ucsd.edu/static/cse132b-sp01/Postgress.pdf · PostgreSQL: Introduction and Concepts Bruce Momjian April 3, 2000. ... Appendix D contains a

000100020003000400050006000700080009001000110012001300140015001600170018001900200021002200230024002500260027002800290030003100320033003400350036003700380039004000410042004300440045004600470048004900500051005200530054005500560057005800590060006100620063006400650066

PostgreSQL:Introduction

andConcepts

Bruce Momjian

April 3, 2000

006700680069007000710072007300740075007600770078007900800081008200830084008500860087008800890090009100920093009400950096009700980099010001010102010301040105010601070108010901100111011201130114011501160117011801190120012101220123012401250126012701280129013001310132

ii

013301340135013601370138013901400141014201430144014501460147014801490150015101520153015401550156015701580159016001610162016301640165016601670168016901700171017201730174017501760177017801790180018101820183018401850186018701880189019001910192019301940195019601970198

Note to Reviewers

The material on these pages is a work in progress, tentatively titled, PostgreSQL: Introduction and Concepts,to be published in 2000, ©Addison–Wesley. Posted with permission of the publisher. All rights reserved.

I have completed my first draft through chapter 10. Later chapters may not be spell checked, proofread,or critiqued. You are seeing it as it is being written.

I am interested in any comments you may have, including typographic errors, places with not enoughdetail or too much detail, missing topics, extraneous topics, confusing sentences, poor word choice, etc. ThePDF version has numbers appearing in the margins to allow you to easily refer to specific lines in the book.People reading the web version may refer to specific URL’S. Please mention the date of April 3, 2000 whenreferring to this document. You may contact me at mailto:[email protected].

A current copy may be retrieved from http://www.postgresql.org/docs/awbook.html. Also, it is availablefrom the POSTGRESQL FAQ’s and Documentation page, http://www.postgresql.org/docs. It is updatedautomatically every night. This book is set in Bitstream Century Old Style, 11 point.

Keep in mind that this is to be printed as a book. In the PDF version, diagrams may not appear on thesame pages that refer to them. They will appear on the facing page when printed in book format.

iii

mailto:[email protected]

http://www.postgresql.org/docs/awbook.html

http://www.postgresql.org/docs

019902000201020202030204020502060207020802090210021102120213021402150216021702180219022002210222022302240225022602270228022902300231023202330234023502360237023802390240024102420243024402450246024702480249025002510252025302540255025602570258025902600261026202630264

iv NOTE TO REVIEWERS

026502660267026802690270027102720273027402750276027702780279028002810282028302840285028602870288028902900291029202930294029502960297029802990300030103020303030403050306030703080309031003110312031303140315031603170318031903200321032203230324032503260327032803290330

Foreword

Most research projects never leave the academic environment. Occasionally, exceptional ones survive thetransition from the university to the real world and go on to become a phenomenon. POSTGRESQL is one ofthose projects. Its popularity and success is a testament to the dedication and hard work of the POSTGRESQLglobal development team. Developing an advanced database system is no small feat. Maintaining andenhancing an inherited code base is even more challenging. The POSTGRESQL team has not only managedto improve the quality and usability of the system, but to spread its use among the Internet user community.This book is a major milestone in the history of the project.

POSTGRES95, later renamed POSTGRESQL, started out as a pet project to overhaul POSTGRES. POSTGRES isa novel and feature-rich database system created by many students and staff at the UNIVERSITY OF CALIFORNIA

AT BERKELEY. Our goal was to keep the powerful and useful features while trimming down the bloat causedby much experimentation and research. We had a lot of fun reworking the internals. At the time, we hadno idea where we were going with the project. The POSTGRES95 exercise was not research, but simplya bit of engineering housecleaning. By the spring of 1995, it occurred to us that there was a need for anopen-source SQL-based multi-user database in the Internet user community. Our first release was met withgreat enthusiasm. We are very pleased to see the project continuing.

Obtaining information about a complex system like POSTGRESQL is a great barrier to its adoption. Thisbook fills a critical gap in the documentation of the project. This book provides an excellent overview ofthe system. It covers a wide range of topics from the basics to the more advanced and unique features ofPOSTGRESQL.

In writing this book, Bruce Momjian has drawn on his experience in helping beginners with POSTGRESQL.The text is easy to understand and full of practical tips. Momjian captures database concepts using simpleand easy to understand language. He also presents numerous real life examples throughout the book. Hedoes an outstanding job and covers many advanced POSTGRESQL topics. Enjoy reading the book and havefun exploring POSTGRESQL! It is our hope this book will not only teach you about using PostgreSQL but alsoinspire you to delve into its innards and contribute to the ongoing POSTGRESQL development effort.

JOLLY CHEN and ANDREW YU, co-authors of POSTGRES95

v

033103320333033403350336033703380339034003410342034303440345034603470348034903500351035203530354035503560357035803590360036103620363036403650366036703680369037003710372037303740375037603770378037903800381038203830384038503860387038803890390039103920393039403950396

vi FOREWORD

039703980399040004010402040304040405040604070408040904100411041204130414041504160417041804190420042104220423042404250426042704280429043004310432043304340435043604370438043904400441044204430444044504460447044804490450045104520453045404550456045704580459046004610462

Preface

This book is about POSTGRESQL, the most advanced open source database. From its origins in academia,POSTGRESQL has moved to the Internet with explosive growth. It is hard to believe the advances during thepast three years under the guidance of a team of world-wide Internet developers. This book is a testamentto their vision, and to the success POSTGRESQL has become.

The book is designed to lead the reader from their first database query through the complex queriesneeded to solve real-world problems. No knowledge of database theory or practice is required. Basicknowledge of operating system capabilities is expected, like the ability to type at an operating systemprompt.

The book starts with a short history of POSTGRESQL. It leads the reader through their first query, andteaches the most important database commands. Common problems are covered early, like placing quotesinside quoted strings. This should prevent users from getting stuck with queries that fail. I have seen manybug reports in the past few years, and try to cover the common pitfalls.

With a firm foundation established, additional commands are introduced. Finally, specialty chaptersoutline complex topics like multi-user control and performance. While coverage of these complex topics isnot exhaustive, I try to show common real-world problems and their solutions.

I do not shy away from complex topics, like index types and table join methods. If these topics areconfusing, it is sufficient to skim these chapters and continue. Start using the database, and return to thesetopics later.

At each step, the purpose of the commands is clearly illustrated. I want readers to understand more thanquery syntax. I want them to know why each command is valuable, so they will use the proper commands intheir real-world database applications.

A novice should read the entire book, while skimming over the later chapters. The complex natureof database systems should not prevent readers from getting started. Test databases are safe way to tryqueries. As readers gain more experience, the later chapters will start to make sense. Experienced databaseusers can skip the chapters on basic SQL functionality. The cross-referencing of sections should allow you toquickly move from general to more specific information.

Appendix D contains a copy of the POSTGRESQL reference manual which should be consulted anytimeyou are having trouble with query syntax. Also, I should mention the excellent documentation that is partof POSTGRESQL. The documentation covers many complex topics. It includes much POSTGRESQL-specificfunctionality that cannot be covered in a book of this length. I refer to sections of the documentation in thistext where appropriate.

The website for this book is located at http://www.postgresql.org/docs/awbook.html.

vii

http://www.postgresql.org/docs/awbook.html

046304640465046604670468046904700471047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000501050205030504050505060507050805090510051105120513051405150516051705180519052005210522052305240525052605270528

viii PREFACE

052905300531053205330534053505360537053805390540054105420543054405450546054705480549055005510552055305540555055605570558055905600561056205630564056505660567056805690570057105720573057405750576057705780579058005810582058305840585058605870588058905900591059205930594

Acknowledgements

Update this page with current information before publication.

POSTGRESQL and this book would not be possible without the talented and hard-working members of thePOSTGRESQL Global Development Team. They took source code that could have become just another aban-doned project, and turned it into the open source alternative to commercial database systems. POSTGRESQLis a shining example of Internet community development.

Steering

• Fournier, Marc G. in Wolfville, Nova Scotia, Canada ([email protected]) coordinates the whole effortand provides the server and administers our primary web site, mailing lists, ftp site, and source coderepository.

• Lane, Tom in Pittsburgh, PA, USA ([email protected]) has performed many PostgreSQL improve-ments. He has worked on the optimizer and a variety of complex issues.

• Lockhart, Thomas G. in Pasadena, California, USA ([email protected]) works on documen-tation, data types, particularly date/time and geometric objects, and on SQL standards compatibility.

• Mikheev, Vadim B. in Krasnoyarsk, Russia ([email protected]) does large projects, like vacuum, subselects,triggers, and multi-version concurrency control (MVCC).

• Momjian, Bruce in Philadelphia, Pennsylvania, USA ([email protected]) maintains FAQ andTODO lists, code cleanup, some patch application, makes training materials, and some coding.

• Wieck, Jan in Hamburg, Germany ([email protected]) overhauled the query rewrite rule system, wroteour procedural languages PL/pgSQL and PL/Tcl and added the NUMERIC/DECIMAL type.

Major Developers

• Cain, D’Arcy J.M. in Toronto, Ontario, Canada ([email protected]) worked on the Tcl interface, Py-GreSQL, and the INET type.

• Dal Zotto, Massimo near Trento, Italy ([email protected]) has done locking code and other improvements.

• Elphick, Oliver in Newport, Isle of Wight, UK ([email protected]) maintains the PostgreSQL package forDebian GNU/Linux.

• Horak, Daniel near Pilzen, Czech Republic ([email protected]) did the WinNT port of PostgreSQL(using the Cygwin environment).

• Inoue, Hiroshi in Fukui, Japan ([email protected]) improved btree index access.

ix

059505960597059805990600060106020603060406050606060706080609061006110612061306140615061606170618061906200621062206230624062506260627062806290630063106320633063406350636063706380639064006410642064306440645064606470648064906500651065206530654065506560657065806590660

x ACKNOWLEDGEMENTS

• Ishii, Tatsuo in Zushi, Kanagawa, Japan ([email protected]) handles multi-byte foreign language supportand porting issues.

• Martin, Dr. Andrew C.R. in London, England ([email protected]) helped in the Linux and IrixFAQ’s including some patches to the PostgreSQL code.

• Mergl, Edmund in Stuttgart, Germany ([email protected]) created and maintains pgsql_perl5. Healso created DBD-Pg which is available via CPAN.

• Meskes, Michael in Dusseldorf, Germany ([email protected]) handles multi-byte foreign lan-guage support, and maintains ecpg.

• Mount, Peter in Maidstone, Kent, United Kingdom ([email protected]) has done the Java JDBCInterface.

• Nikolaidis, Byron in Baltimore, Maryland ([email protected]) rewrote and maintains theODBC interface for Windows.

• Owen, Lamar in Pisgah Forest, North Carolina, USA ([email protected]) RPM package maintainer.

• Teodorescu, Constantin in Braila, Romania ([email protected]) has done the PgAccess DB Interface.

• Thyni, Göran in Kiruna, Sweden ([email protected]) has worked on the UNIX socket code.

Non-code contributors

• Bartunov, Oleg in Moscow, Russia ([email protected]) introduced the locale support.

• Vielhaber, Vince near Detroit, Michigan, USA ([email protected]) maintains our website.

All developers listed in alphabetical order.

066106620663066406650666066706680669067006710672067306740675067606770678067906800681068206830684068506860687068806890690069106920693069406950696069706980699070007010702070307040705070607070708070907100711071207130714071507160717071807190720072107220723072407250726

Contents

Note to Reviewers iii

Foreword v

Preface vii

Acknowledgements ix

1 History of POSTGRESQL 11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 UNIVERSITY OF CALIFORNIA AT BERKELEY . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.3 Development Leaves BERKELEY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.4 POSTGRESQL Global Development Team . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.5 Open Source Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 Issuing Database Commands 52.1 Starting a Database Session . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52.2 Controlling a Session . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62.3 Getting Help . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72.4 Exiting a Session . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

3 Basic SQL Commands 93.1 Relational Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93.2 Creating Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103.3 Adding Data with INSERT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113.4 Viewing Data with SELECT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123.5 Selecting Specific Rows with WHERE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133.6 Removing Data with DELETE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133.7 Modifying Data with UPDATE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143.8 Sorting Data with ORDER BY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143.9 Destroying Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

4 Customizing Queries 194.1 Data types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194.2 Quotes Inside Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194.3 Using NULL Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214.4 Controlling DEFAULT Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234.5 Column Labels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

xi

072707280729073007310732073307340735073607370738073907400741074207430744074507460747074807490750075107520753075407550756075707580759076007610762076307640765076607670768076907700771077207730774077507760777077807790780078107820783078407850786078707880789079007910792

xii CONTENTS

4.6 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244.7 AND/OR Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254.8 Range of Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274.9 LIKE Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274.10 Regular Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294.11 CASE Clause . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324.12 Distinct Rows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324.13 Functions and Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334.14 SET, SHOW, and RESET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

5 SQL Aggregates 395.1 Aggregates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395.2 Using GROUP BY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425.3 Using HAVING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425.4 Query Tips . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

6 Joining Tables 456.1 Table and Column References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 456.2 Joined Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 456.3 Creating Joined Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476.4 Performing Joins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506.5 Three and Four Table Joins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 516.6 Additional Join Possibilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526.7 Choosing a Join Key . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 556.8 One-to-Many Joins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566.9 Unjoined tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566.10 Table Aliases and Self-Joins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566.11 Non-Equijoins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 586.12 Ordering Multiple Parts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 586.13 Primary and Foreign Keys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 606.14 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

7 Numbering Rows 637.1 Object Identification Numbers (OIDs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 637.2 Object Identification Number Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 647.3 Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 657.4 Creating Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 657.5 Using Sequences to Number Rows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 677.6 Serial Column Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 677.7 Manually Numbering Rows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 697.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

8 Combining SELECTs 718.1 UNION, EXCEPT, INTERSECT Clauses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 718.2 Subqueries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 748.3 Outer Joins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 818.4 Subqueries in Non-SELECT Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 818.5 UPDATE with FROM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 838.6 Inserting Data Using SELECT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

079307940795079607970798079908000801080208030804080508060807080808090810081108120813081408150816081708180819082008210822082308240825082608270828082908300831083208330834083508360837083808390840084108420843084408450846084708480849085008510852085308540855085608570858

CONTENTS xiii

8.7 Creating Tables Using SELECT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 848.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

9 Data Types 859.1 Purpose of Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 859.2 Installed Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 859.3 Type Conversion and Casting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 899.4 Support Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 899.5 Support Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 899.6 Support Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 929.7 Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 929.8 Large Objects(BLOBS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 949.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

10 Transactions and Locks 9510.1 Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9510.2 Multi-Statement Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9510.3 Visibility of Committed Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9710.4 Read Committed and Serializable Isolation Levels . . . . . . . . . . . . . . . . . . . . . . . . 9810.5 Locking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9910.6 Deadlocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10110.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

11 Performance 10311.1 Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10311.2 Unique Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10411.3 Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10411.4 Vacuum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10511.5 Vacuum Analyze . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10511.6 EXPLAIN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10511.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

12 Controlling Results 10912.1 LIMIT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10912.2 Cursors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

13 Table Management 11113.1 Temporary Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11113.2 ALTER TABLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11113.3 Inheritance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11113.4 Views . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11113.5 Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11113.6 Triggers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11113.7 Primary/Foreign Keys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11113.8 NOTIFY and LISTEN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

14 Importing and Exporting Data 11314.1 COPY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

085908600861086208630864086508660867086808690870087108720873087408750876087708780879088008810882088308840885088608870888088908900891089208930894089508960897089808990900090109020903090409050906090709080909091009110912091309140915091609170918091909200921092209230924

xiv CONTENTS

15 Extending POSTGRESQL 11515.1 User-Defined Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11515.2 User-Defined Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11515.3 User-Defined Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

16 Programming Interfaces 11716.1 Need for a Programming Lanuage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11816.2 C Language API (LIBPQ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11816.3 Embedded C (ECPG) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11816.4 C++ (LIBPQ++) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11816.5 JAVA (JDBC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11816.6 ODBC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11816.7 PERL (PGSQL_PERL5) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11816.8 TCL/TK (LIBPGTCL) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11816.9 PYTHON (PYGRESQL) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11816.10Web Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11816.11Server-side Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

17 User Applications 11917.1 PSQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11917.2 PGACCESS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

18 POSTGRESQL Administration 12318.1 Creating Users and Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12318.2 Backup and Restore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12318.3 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12318.4 Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12318.5 Customization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12318.6 System Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12318.7 Access Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12318.8 Internationalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

A Additional Resources 125A.1 Frequently Asked Questions (FAQ’S) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125A.2 Mailing List Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125A.3 Supplied Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125A.4 Commercial Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125A.5 Modifying the Source Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

B Installation 127B.1 Getting POSTGRESQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127B.2 Compiling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127B.3 Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127B.4 Starting the Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127B.5 Creating a Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

C PostgreSQL Non-Standard Features by Chapter 129

D Reference Manual 131

092509260927092809290930093109320933093409350936093709380939094009410942094309440945094609470948094909500951095209530954095509560957095809590960096109620963096409650966096709680969097009710972097309740975097609770978097909800981098209830984098509860987098809890990

CONTENTS xv

Bibliography 133

Index 133

099109920993099409950996099709980999100010011002100310041005100610071008100910101011101210131014101510161017101810191020102110221023102410251026102710281029103010311032103310341035103610371038103910401041104210431044104510461047104810491050105110521053105410551056

xvi CONTENTS

105710581059106010611062106310641065106610671068106910701071107210731074107510761077107810791080108110821083108410851086108710881089109010911092109310941095109610971098109911001101110211031104110511061107110811091110111111121113111411151116111711181119112011211122

List of Figures

2.1 psql session startup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62.2 My first SQL query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62.3 Multi-line query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72.4 Backslash-p demo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

3.1 Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93.2 Create table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103.3 Example of backslash-d . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113.4 Insert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123.5 My first SELECT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123.6 My first WHERE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133.7 More complex WHERE clause . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133.8 A single cell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133.9 A block of cells . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143.10 Comparing string fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143.11 DELETE example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153.12 My first UPDATE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153.13 Use of ORDER BY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163.14 Reverse ORDER BY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163.15 Use of ORDER BY and WHERE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

4.1 Example of common data types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204.2 Insertion of specific columns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214.3 NULL handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224.4 Comparison of NULL fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224.5 NULLs and blank strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234.6 Using DEFAULTs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244.7 Controlling column labels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244.8 Computation using a column label . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254.9 Comment styles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254.10 New friends . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254.11 WHERE test for Sandy Gleason . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264.12 Friends in New Jersey and Pennsylvania . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264.13 Mixing ANDs and ORs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264.14 Properly mixing ANDs and ORs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274.15 Selecting a range of values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284.16 Firstname begins with D. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284.17 Regular expression sample queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304.18 Complex regular expression queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

xvii

112311241125112611271128112911301131113211331134113511361137113811391140114111421143114411451146114711481149115011511152115311541155115611571158115911601161116211631164116511661167116811691170117111721173117411751176117711781179118011811182118311841185118611871188

xviii LIST OF FIGURES

4.19 CASE example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324.20 Complex CASE example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334.21 DISTINCT prevents duplicates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344.22 Function examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354.23 Operator examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364.24 SHOW and RESET examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

5.1 Aggregate examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405.2 Aggregates and NULLs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415.3 Aggregate with GROUP BY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425.4 GROUP BY on two columns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435.5 HAVING usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

6.1 Qualified column names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466.2 Joining tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466.3 Creation of company tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486.4 Insertion into company tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496.5 Finding customer name using two queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506.6 Finding customer name using one query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506.7 Finding order number for customer name . . . . . . . . . . . . . . . . . . . . . . . . . . . . 516.8 Three-table join . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 516.9 Four-table join . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526.10 Employees who have taken orders for customers. . . . . . . . . . . . . . . . . . . . . . . . . 536.11 Joining customer and employee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536.12 Joining part and employee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 546.13 Statename table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 546.14 Using a customer code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 556.15 One-to-many join . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 576.16 Unjoined tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 576.17 Using table aliases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 576.18 Examples of self-joins using table aliases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 586.19 Non-equijoins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 596.20 New salesorder table for multiple parts per order . . . . . . . . . . . . . . . . . . . . . . . . 596.21 Orderpart table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 606.22 Queries involving orderpart table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

7.1 OID test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 647.2 Columns with OIDs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 647.3 Examples of sequence function use . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 667.4 Numbering customer rows using a sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . 677.5 Customer table using SERIAL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

8.1 Combining two columns with UNION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 718.2 Combining two tables with UNION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 728.3 UNION with duplicates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 738.4 UNION ALL with duplicates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 738.5 EXCEPT restricts output from the first SELECT . . . . . . . . . . . . . . . . . . . . . . . . . . 738.6 INTERSECT returns only duplicated rows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 748.7 Friends not in Dick Gleason’s state . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

118911901191119211931194119511961197119811991200120112021203120412051206120712081209121012111212121312141215121612171218121912201221122212231224122512261227122812291230123112321233123412351236123712381239124012411242124312441245124612471248124912501251125212531254

LIST OF FIGURES xix

8.8 Subqueries can replace some joins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 768.9 Correlated subquery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 778.10 Employees who took orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 788.11 Customers who have no orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 798.12 IN query rewritten using ANY and EXISTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 808.13 NOT IN query rewritten using ALL and EXISTS . . . . . . . . . . . . . . . . . . . . . . . . . . 818.14 Simulating outer joins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 828.15 Subqueries with UPDATE and DELETE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 828.16 UPDATE the order_date . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 838.17 Using SELECT with INSERT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 838.18 Table creation with SELECT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

9.1 Error generated by undefined function/type combination. . . . . . . . . . . . . . . . . . . . . 919.2 Error generated by undefined operator/type combination . . . . . . . . . . . . . . . . . . . . 929.3 Creation of array columns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 929.4 Using arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 939.5 Using large images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

10.1 INSERT with no explicit transaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9510.2 INSERT with explicit transaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9610.3 Two INSERTs in a single transaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9610.4 Multi-statement transaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9610.5 Transaction rollback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9710.6 Read-committed isolation level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9810.7 Serializable isolation level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9910.8 SELECT with no locking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10010.9 SELECT…FOR UPDATE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

11.1 Example of CREATE INDEX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10311.2 Example of a unique index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10411.3 Using EXPLAIN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10511.4 More complex EXPLAIN examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10611.5 EXPLAIN example using joins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

125512561257125812591260126112621263126412651266126712681269127012711272127312741275127612771278127912801281128212831284128512861287128812891290129112921293129412951296129712981299130013011302130313041305130613071308130913101311131213131314131513161317131813191320

xx LIST OF FIGURES

132113221323132413251326132713281329133013311332133313341335133613371338133913401341134213431344134513461347134813491350135113521353135413551356135713581359136013611362136313641365136613671368136913701371137213731374137513761377137813791380138113821383138413851386

List of Tables

3.1 Table friend . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

4.1 Common data types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194.2 Comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274.3 LIKE comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284.4 Regular expression operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294.5 Regular expression special characters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294.6 Regular expression examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314.7 SET options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344.8 DATESTYLE output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

5.1 Aggregates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

7.1 Sequence number access functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

9.1 POSTGRESQL data types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 869.2 Geometric types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 889.3 Common functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 909.4 Common operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 919.5 Common variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

10.1 Visibility of single-query transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9710.2 Visibility using multi-query transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9810.3 Waiting for a lock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10010.4 Deadlock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

16.1 Interface summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

17.1 psql query buffer commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12017.2 psql listing commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12017.3 psql output commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12017.4 psql special capabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12117.5 psql command-line arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

xxi

138713881389139013911392139313941395139613971398139914001401140214031404140514061407140814091410141114121413141414151416141714181419142014211422142314241425142614271428142914301431143214331434143514361437143814391440144114421443144414451446144714481449145014511452

xxii LIST OF TABLES

145314541455145614571458145914601461146214631464146514661467146814691470147114721473147414751476147714781479148014811482148314841485148614871488148914901491149214931494149514961497149814991500150115021503150415051506150715081509151015111512151315141515151615171518

Chapter 1

History of POSTGRESQL

1.1 Introduction

POSTGRESQL is the most advanced open source database server. In this chapter, you will learn aboutdatabases, open source software, and the history of POSTGRESQL.

There are three basic office productivity applications: word processors, spreadsheets, and databases. Wordprocessors produce text documents critical to any business. Spreadsheets are used for financial calculationsand analysis. Databases are used primarily for data storage and retrieval. You can use a word processor ora spreadsheet to store small amounts of data. However, with large volumes of data or data that must beretrieved and updated frequently, databases are the best choice. Databases allow orderly data storage, rapiddata retrieval, and complex data analysis, as you will see in the coming chapters.

1.2 UNIVERSITY OF CALIFORNIA AT BERKELEY

POSTGRESQL’S ancestor was INGRES, developed at the UNIVERSITY OF CALIFORNIA AT BERKELEY (1977–1985).The INGRES code was taken and enhanced by RELATIONAL TECHNOLOGIES/INGRES CORPORATION1, whichproduced one of the first commercially successful relational database servers. Also at Berkeley, MICHAEL

STONEBRAKER lead a team to develop an object-relational database server called POSTGRES (1986–1994). ThePOSTGRES code was taken by ILLUSTRA2 and developed into a commercial product. Two Berkeley graduatestudents, JOLLY CHEN and ANDREW YU, added SQL capabilities to POSTGRES, and called it POSTGRES95(1994–1995). They left Berkeley, but Chen continued maintaining POSTGRES95, which had an active mailinglist.

1.3 Development Leaves BERKELEY

In the summer of 1996, it became clear that the demand for an open source SQL database server was great,and a team was formed to continue development. MARC G. FOURNIER, Toronto, Canada, offered to host themailing list, and provide a server to host the source tree. One thousand mailing list subscribers were movedto the new list. A server was configured, giving a few people login accounts to apply patches to the sourcecode using cvs.3.

1Ingres Corp. was later purchased by Computer Associates.2Illustra was later purchased by Informix and integrated into Informix’s Universal Server.3cvs sychronizes access by developers to shared program files.

1

151915201521152215231524152515261527152815291530153115321533153415351536153715381539154015411542154315441545154615471548154915501551155215531554155515561557155815591560156115621563156415651566156715681569157015711572157315741575157615771578157915801581158215831584

2 CHAPTER 1. HISTORY OF POSTGRESQL

JOLLY CHEN had stated, "This project needs a few people with lots of time, not many people with a littletime." With 250,000 lines of C4 code, we understood what he meant. In the early days, there were fourmajor people involved, MARC FOURNIER, THOMAS LOCKHART in Pasadena, California, VADIM MIKHEEV inKrasnoyarsk, Russia, and myself in Philadelphia, Pennsylvania. We all had full-time jobs, so we were doingthis in our spare time. It certainly was a challenge.

Our first goal was to scour the old mailing list, evaluating patches that had been posted to fix variousproblems. The system was quite fragile then, and not easily understood. During the first six months ofdevelopment, there was fear that a patch would break the system, and we would be unable to correct theproblem. Many bug reports had us scratching our heads, trying to figure out not only what was wrong, buthow the system even performed many functions.

We inherited a huge installed base. A typical bug report was, "When I do this, it crashes the database."We had a whole list of them. It became clear that some organization was needed. Most bug reports requiredsignificant research to fix, and many were duplicates, so our TODO list reported every buggy SQL query. Ithelped us identify our bugs, and made users aware of them too, cutting down on duplicate bug reports.

We had many eager developers, but the learning curve in understanding how the back-end worked wassignificant. Many developers got involved in the edges of the source code, like language interfaces or databasetools, where things were easier to understand. Other developers focused on specific problem queries, tryingto locate the source of the bug. It was amazing to see that many bugs were fixed with just one line of Ccode. POSTGRES had evolved in an academic environment, and had not been exposed to the full spectrum ofreal-world queries. During that period, there was talk of adding features, but the instability of the systemmade bug fixing our major focus.

1.4 POSTGRESQL Global Development Team

In late 1996, we changed the name from POSTGRES95 to POSTGRESQL. It is a mouthful, but honors theBerkeley name and SQL capabilities. We started distributing the source code using remote cvs, whichallowed people to keep up-to-date copies of the development tree without downloading an entire set of filesevery day.

Releases were every 3–5 months. This consisted of 2–3 months of development, one month of betatesting, a major release, and a few weeks to issue sub-releases to correct serious bugs. We were nevertempted to follow a more aggressive schedule with more releases. A database server is not like a wordprocessor or a game, where you can easily restart it if there is a problem. Databases are multi-user, and lockuser data inside the database, so we must make our software as reliable as possible.

Development of source code of this scale and complexity is not for the novice. We had trouble gettingdevelopers interested in a project with such a steep learning curve. However, our civilized atmosphere, andour improved reliability and performance, finally helped attract the experienced talent we needed.

Getting our developers the knowledge they needed to assist with POSTGRESQL was clearly a priority.We had a TODO list that outlined what needed to be done, but with 250,000 lines of code, taking on any TODO

item was a major project. We realized developer education would pay major benefits in helping people getstarted. We wrote a detailed flowchart of the back-end modules.5 We wrote a developers’ FAQ6, to describesome of the common questions of POSTGRESQL developers. With this, developers became more productiveat fixing bugs and adding features.

The source code we inherited from Berkeley was very modular. However, most Berkeley coders usedPOSTGRESQL as a test bed for research projects. Improving existing code was not a priority. Their coding

4C is a popular computer language first developed in the 1970’s.5All the files mentioned in this chapter are available as part of the POSTGRESQL distribution, or at http://www.postgresql.org/docs.6Frequently Asked Questions

http://www.postgresql.org/docs

158515861587158815891590159115921593159415951596159715981599160016011602160316041605160616071608160916101611161216131614161516161617161816191620162116221623162416251626162716281629163016311632163316341635163616371638163916401641164216431644164516461647164816491650

1.5. OPEN SOURCE SOFTWARE 3

styles were also quite varied.We wrote a tool to reformat the entire source tree in a consistent manner. We wrote a script to find

functions that could be marked as static7, or never-called functions that could be removed completely. Theseare run just before each release. A release checklist reminds us of the items to be changed for each release.

As we gained knowledge of the code, we were able to perform more complicated fixes and featureadditions. We redesigned poorly structured code. We moved into a mode where each release had major newfeatures, instead of just bug fixes. We improved SQL conformance, added sub-selects, improved locking, andadded missing SQL functionality. We added commercial-style telephone support.

The Usenet discussion group archives started touting us. In the previous year, we searched for POST-GRESQL, and found many people were recommending other databases, even though we were addressinguser concerns as rapidly as possible. One year later, many people were recommending us to users whoneeded transaction support, complex queries, commercial-grade SQL support, complex data types, and reli-ability. This more clearly portrayed our strengths. Other databases were recommended when speed wasthe overriding concern. REDHAT’S shipment of POSTGRESQL as part of their LINUX8 distribution quicklymultiplied our user base.

Every release is now a major improvement over the last. Our global development team now has mastery ofthe source code we inherited from Berkeley. Finally, every module is understood by at least one developmentteam member. We are now easily adding major features, thanks to the increasing size and experience of ourworld-wide development team.

1.5 Open Source Software

POSTGRESQL is open source software. The term open source software often confuses people. With commercialsoftware, a company hires programmers, develops a product, and sells it to users. With Internet communi-cation, there are new possibilities. In open source software, there is no company. Capable programmers withinterest and some free time get together via the Internet and exchange ideas. Someone writes a programand puts it in a place everyone can access. Other programmers join and make changes. When the programis sufficiently functional, they advertise the program’s availability to other Internet users. Users find bugsor missing features and report them back to the developers, who enhance the program.

It sounds like an unworkable cycle, but in fact it has several advantages:

• A company structure is not required, so there is no overhead and no economic restrictions.

• Program development is not limited to a hired programming staff, but taps the capabilities and experi-ence of a large pool of Internet programmers.

• User feedback is facilitated, allowing program testing by a large number of users in a short period oftime.

• Program enhancements can be rapidly distributed to users.

7A static function is a function that is used by only one program file.8Linux is a popular UNIX-like, open source operating system.

165116521653165416551656165716581659166016611662166316641665166616671668166916701671167216731674167516761677167816791680168116821683168416851686168716881689169016911692169316941695169616971698169917001701170217031704170517061707170817091710171117121713171417151716

4 CHAPTER 1. HISTORY OF POSTGRESQL

171717181719172017211722172317241725172617271728172917301731173217331734173517361737173817391740174117421743174417451746174717481749175017511752175317541755175617571758175917601761176217631764176517661767176817691770177117721773177417751776177717781779178017811782

Chapter 2

Issuing Database Commands

At this point, the book assumes you have:

• POSTGRESQL installed

• POSTGRESQL server running

• You are a configured POSTGRESQL user

• You have created a database called test.

If not, please see appendix B.In this chapter, you will learn how to connect to the database server, and issue simple commands to the

POSTGRESQL server.

2.1 Starting a Database Session

POSTGRESQL uses a client/server model of communication. That means that a POSTGRESQL server continuallyruns, waiting for client requests. The server processes the request and returns the result to the client.

Choosing an Interface

Because the POSTGRESQL server runs as an independent process on the computer, there is no way for a userto interact with it directly. Instead, there are client applications designed specifically for user interaction.This chapter shows you how to interact with POSTGRESQL using the psql interface. Additional interfacesare covered in Chapter 16.

Choosing a Database

Each POSTGRESQL server controls access to a number of databases. Databases are storage areas used bythe server to partition information. For example, a typical installation may have a production database, usedto keep all information about a company. They may also have a training database, used for training andtesting purposes. They may have private databases, used by individuals to store personal information. Forthis exercise, we will assume you have created an empty database called test. If this is not the case, seesection B.5.

5

178317841785178617871788178917901791179217931794179517961797179817991800180118021803180418051806180718081809181018111812181318141815181618171818181918201821182218231824182518261827182818291830183118321833183418351836183718381839184018411842184318441845184618471848

6 CHAPTER 2. ISSUING DATABASE COMMANDS

Starting a Session

To start a psql session and connect to the test database, type psql test at the command prompt. Your outputshould look similar to figure 2.1. Remember, the operating system command prompt is case-sensitive, soyou must type this in all lowercase.1

$ psql testWelcome to psql, the PostgreSQL interactive terminal.

Type: \copyright for distribution terms\h for help with SQL commands\? for help on internal slash commands\g or terminate with semicolon to execute query\q to quit

test=>

Figure 2.1: psql session startup

2.2 Controlling a Session

Congratulations. You have successfully connected to the POSTGRESQL server. You can now issue commands,and receive replies from the server. Let’s try one. Type SELECT CURRENT_USER; and press Enter (see figure 2.2).If you make a mistake, just press backspace and retype. This should show your login name underneath the

test=> SELECT CURRENT_USER;getpgusername-------------postgres(1 row)

test=>

Figure 2.2: My first SQL query

dashed line. In the example, the login name postgres is shown. The word getpgusername is a column label.The server is also reporting that it has returned one row of data. The line test=> tells you that the server isdone and is waiting for your next database query.

Let’s try another one. At the test=> prompt, type SELECT CURRENT_TIMESTAMP; and press Enter. It shouldshow the current date and time. Each time you execute the query, the server will report the current time toyou.

Typing in the Query Buffer

Typing in the query buffer is similar to typing at an operating system command prompt. However, at anoperating system command prompt, Enter completes each command. In psql, commands are completed only

1A few operating systems are case-insensitive.

184918501851185218531854185518561857185818591860186118621863186418651866186718681869187018711872187318741875187618771878187918801881188218831884188518861887188818891890189118921893189418951896189718981899190019011902190319041905190619071908190919101911191219131914

2.3. GETTING HELP 7

when you enter a semicolon (;) or backslash-g (\g). Here’s a good example. Let’s do SELECT 1 + 3; but ina different way. See figure 2.3.2 Notice the query is spread over three lines. Notice the prompt changed

test=> SELECTtest-> 1 + 3test-> ;?column?--------

4(1 row)

test=>

Figure 2.3: Multi-line query

from => on the first line to -> on the second line to indicate the query was being continued. The semicolontold psql to send the query to the server. We could easily have replaced the semicolon with backslash-g.I don’t recommend you type queries as ugly as this one, but longer queries will benefit from the ability tospread them over multiple lines. You might notice the query is in uppercase. Unless you are typing a stringin quotes, the POSTGRESQL server doesn’t care whether words are uppercase or lowercase. For stylisticreasons, I recommend you enter words special to POSTGRESQL in uppercase.

Try some queries on your own involving arithmetic. Each computation must start with the word SELECT,then your computation, and finally a semicolon or backslash-g to finish. For example, SELECT 4 * 10; wouldreturn 40. Addition is performed using plus (+), subtraction using minus (-), multiplication using asterisk(*), and division using forward slash (/).

If you have readline3 installed, psql will even allow you to use your arrow keys. Your left and right arrowkeys allow you to move around, and the up and down arrows retrieve previously typed queries.

Displaying the Query Buffer

You can continue typing indefinitely, until you use a semicolon or backslash-g. Everything you type willbe buffered by psql until you are ready to send the query. If you use backslash-p (\p), you see everythingaccumulated in the query buffer. In figure 2.4, three lines of text are accumulated and displayed by the userusing backslash-p. After display, we use backslash-g to execute the query which returns the value 21. Thiscomes in handy with long queries.

Erasing the Query Buffer

If you don’t like what you have typed, use backslash-r (\r) to reset or erase the buffer.

2.3 Getting Help

You might ask, “Are these backslash commands documented anywhere?” If you look at figure 2.1, you willsee the answer is printed every time psql starts. Backslash-? (\?) prints all valid backslash commands.Backslash-h displays help for SQL commands. SQL commands are covered in the next chapter.

2Don’t be concerned about ?column?. We will cover that in section 4.7.3Readline is an open-source library that allows powerful command-line editing.

191519161917191819191920192119221923192419251926192719281929193019311932193319341935193619371938193919401941194219431944194519461947194819491950195119521953195419551956195719581959196019611962196319641965196619671968196919701971197219731974197519761977197819791980

8 CHAPTER 2. ISSUING DATABASE COMMANDS

test=> SELECTtest-> 2 * 10 + 1test-> \pSELECT2 * 10 + 1test-> \g?column?---------

21(1 row)

test=>

Figure 2.4: Backslash-p demo

2.4 Exiting a Session

This chapter would not be complete without showing you how to exit psql. Use backslash-q (\q) to quit thesession. Backslash-q exits psql. Backslash g (go), p (print), r (reset), and q (quit) should be all you need for awhile. Section 17.1 has more information about psql.

198119821983198419851986198719881989199019911992199319941995199619971998199920002001200220032004200520062007200820092010201120122013201420152016201720182019202020212022202320242025202620272028202920302031203220332034203520362037203820392040204120422043204420452046

Chapter 3

Basic SQL Commands

SQL stands for Structured Query Language. It is the most common way of communicating with databaseservers, and is supported by almost all database systems. In this chapter, you will learn about relationaldatabase systems and how to issue the most important SQL commands.

3.1 Relational Databases

As I mentioned in section 1.1, the purpose of a database is rapid data storage and retrieval. Today, mostdatabase systems are relational databases. While the term relational database has a mathematical foundation,in practice it means that all data stored in the database is arranged in a uniform structure.

In figure 3.1, you see the database server with access to three databases, test, demo, and finance. You

Database Server

��

��

��

��

��

��

Database Test��

��

Database Demo

��

��

��

��

Database Finance

Figure 3.1: Databases

could issue the command psql finance and be connected to the finance database. You have already dealt withthis in chapter 2. Using psql, you chose to connect to database test with the command psql test. To see alist of databases available at your site, type psql -l. The first column lists the database names. However,you may not have permission to connect to them.

You might ask, “What are those black rectangles in the databases?” Those are tables. Tables are thefoundation of a relational database management system (RDBMS). As I mentioned earlier, databases store

9

204720482049205020512052205320542055205620572058205920602061206220632064206520662067206820692070207120722073207420752076207720782079208020812082208320842085208620872088208920902091209220932094209520962097209820992100210121022103210421052106210721082109211021112112

10 CHAPTER 3. BASIC SQL COMMANDS

data. Those tables are where data is stored in a database. Each table has a name defined by the person whocreated it.

Let’s look at a single table called friend in table 3.1. You can easily see how tables are used to store data.

FirstName LastName City State AgeMike Nichols Tampa FL 19Cindy Anderson Denver CO 23Sam Jackson Allentown PA 22

Table 3.1: Table friend

Each friend is listed as a separate row in the table. The table records five pieces of information about eachfriend, firstname, lastname, city, state, and age.1

Each friend is on a separate row. Each column contains the same type of information. This is the type ofstructure that makes relational databases successful. Relational databases allow you to select certain rowsof data, certain columns of data, or certain cells. You could select the entire row for Mike, the entire columnfor City, or a specific cell like Denver. There are synonyms for the terms table, row, and column. Table is moreformally referred to as a relation or class, row as record or tuple, and column as field or attribute.

3.2 Creating Tables

Let’s create our own table and call it friend. The psql statement to create the table is shown in figure 3.2.You don’t have to type it exactly like that. You could have used all lowercase, or you could have written it in

test=> CREATE TABLE friend (test(> firstname CHAR(15),test(> lastname CHAR(20),test(> city CHAR(15),test(> state CHAR(2),test(> age INTEGERtest(> )\gCREATE

Figure 3.2: Create table

one long line, and it would have worked just the same.Let’s look at it from the top down. The words CREATE TABLE have special meaning to the database

server. They indicate that the next request from the user is to create a table. You will find most SQL

requests can be quickly identified by the first few words. The rest of the request has a specific format that isunderstood by the database server. While capitalization and spacing are optional, the format for a query mustbe followed. Otherwise, the database server will issue an error such as parser: parse error at or near"pencil", meaning the database server got confused near the word pencil. In such a case, the manual page forthe command should be consulted and the query reissued in the proper format. A copy of the POSTGRESQLmanual pages appear in appendix D.

The CREATE TABLE command follows a specific format. First, the two words CREATE TABLE, then the tablename, then an open parenthesis, then a list of column names and their types, followed by a close parenthesis.

1In a real-world database, the person’s birth date would be stored and not the person’s age. Age has to be updated every timethe person has a birthday. A person’s age can be computed when needed from a birth date field.

211321142115211621172118211921202121212221232124212521262127212821292130213121322133213421352136213721382139214021412142214321442145214621472148214921502151215221532154215521562157215821592160216121622163216421652166216721682169217021712172217321742175217621772178

3.3. ADDING DATA WITH INSERT 11

The important part of this query is between the parentheses. You will notice there are five lines there. Thefirst line, firstname CHAR(15), represents the first column of the table to create. The word firstname is thename of the first column, and the text CHAR(15) indicates the column type and length. The CHAR(15) meansthe first column of every row holds up to 15 characters. The second column is called lastname and holdsup to 20 characters. Columns of type char hold characters of a specified length. User-supplied characterstrings2 that do not fill the entire length of the field are right-padded with blanks. Columns city and state aresimilar. The final column, age, is different. It is not a CHAR() column. It is an INTEGER column. It holds wholenumbers, not characters. Even if there were 5,000 friends in the table, you can be certain that there are nonames appearing in the age column, only whole numbers. It is this structure that helps databases to be fastand reliable.

POSTGRESQL supports more column types than just char() and integer. However, in this chapter we willuse only these two. Sections 4.1 and 9.2 cover column types in more detail.

Create some tables yourself now. Only use letters for your table and column names. Don’t use anynumbers, punctuation, or spaces at this time.

The \d command allows you to see information about a specific table, or a list of all table names in thecurrent database. To see information about a specific table, type \d followed by the name of the table. Forexample, to see the column names and types of your new friend table in psql, type \d friend. Figure 3.3shows this. If you use \d with no table name after it, you will see a list of all table names in the database.

test=> \d friendTable "friend"

Attribute | Type | Extra-----------+----------+-------firstname | char(15) |lastname | char(20) |city | char(15) |state | char(2) |age | int4 |

Figure 3.3: Example of backslash-d

3.3 Adding Data with INSERT

Let’s continue toward the goal of making a table exactly like the friend table in table 3.1. We have the tablecreated, but there is no data/friends in it. You add data into a table with the INSERT command. Just as CREATE

TABLE has a specific format that must be followed, INSERT has a specific format too. You can see the formatin figure 3.4. First, you must use single quotes around the character strings. Double quotes will not work.Spacing and capitalization are optional, except inside the single quotes. Inside them, the text is taken asliteral, so any capitalization will be stored in the database exactly as you specify. If you type too many quotes,you might get to a point where your backslash commands don’t work anymore, and your prompt will appearas test’>. Notice the single-quote before the greater-than sign. Just type another single quote to get out ofthis mode, use \r to clear the query buffer and start again. Notice that the 19 doesn’t have quotes. It doesn’tneed them because the column is a numeric column, not a character column. When you do your inserts, besure to match each piece of data to the receiving column. Use the INSERT query in figure 3.4 as a sample andcomplete the insertion of the three friends shown in table 3.1.

2A character string is a group of characters strung together.

217921802181218221832184218521862187218821892190219121922193219421952196219721982199220022012202220322042205220622072208220922102211221222132214221522162217221822192220222122222223222422252226222722282229223022312232223322342235223622372238223922402241224222432244


test=> INSERT INTO friend VALUES (test(> ’Mike’,test(> ’Nichols’,test(> ’Tampa’,test(> ’FL’,test(> 19test(> );INSERT 18720 1

Figure 3.4: Insert

3.4 Viewing Data with SELECT

You have seen how to store data in the database. Let’s show you how to retrieve data. Surprisingly, thereis only one command to get data out of the database, and that command is SELECT. You have already usedSELECT in your first database query in figure 2.2 on page 6. SELECT has many variations. We are going to useit to show the rows in the table friend. The query is shown in figure 3.5. In this case, I put the entire query

test=> SELECT * FROM friend;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Mike |Nichols |Tampa |FL | 19Cindy |Anderson |Denver |CO | 23Sam |Jackson |Allentown |PA | 22(3 rows)

Figure 3.5: My first SELECT

on one line. That’s fine. As queries get longer, breaking them into multiple lines helps make things clearer.

Let’s look at this in detail. First, we have the word SELECT, followed by an asterisk (*), then the wordFROM, and our table name friend, and a semicolon to execute the query. The SELECT starts our command,and tells the database server what is coming next. The * tells the server we want all the columns from thetable. The FROM friend tells which table we want to see. So, we have said we want all (*) columns from ourtable friend, and indeed, that is what is displayed. It should have the same data as table 3.1 on page 10.

As I mentioned, SELECT has a large number of variations, and we will look at a few of them now. First,suppose you want to retrieve only one of the columns from the friend table. You might already suspect thatthe asterisk (*) has to be changed in the query. If you replace the asterisk (*) with one of the column names,you will see only that column. Try SELECT city FROM friend. You can choose any of the columns. You caneven choose multiple columns, by separating the names with a comma. For example, to see first and lastnames only, use SELECT firstname, lastname FROM friend. Try a few more SELECT commands until you getcomfortable. If you specify a name that is not a valid column name, you will get an error message, ERROR:attribute ’mycolname’ not found. If you try selecting from a table that does not exist, you will get an errormessage like ERROR: Relation ’mytablename’ does not exist. POSTGRESQL is using the formal relationaldatabase terms relation and attribute in these error messages.

224522462247224822492250225122522253225422552256225722582259226022612262226322642265226622672268226922702271227222732274227522762277227822792280228122822283228422852286228722882289229022912292229322942295229622972298229923002301230223032304230523062307230823092310

3.5. SELECTING SPECIFIC ROWS WITH WHERE 13

3.5 Selecting Specific Rows with WHERE

Let’s take the next step in controlling the output of SELECT. In the previous section, we showed how to selectonly certain columns from the table. Now, we will show how to select only certain rows. The additional thingneeded to do this is the WHERE clause. Without a WHERE clause, every row is returned.

The WHERE clause goes right after the FROM clause. In the WHERE clause, you specify the rows you wantreturned, as shown in figure 3.6. The query returns the rows that have an age column equal to 23. Figure 3.7

test=> SELECT * FROM friend WHERE age = 23;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Cindy |Anderson |Denver |CO | 23(1 row)

Figure 3.6: My first WHERE

shows a more complex example that returns two rows. You can combine the column restrictions and the row

test=> SELECT * FROM friend WHERE age <= 22;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Mike |Nichols |Tampa |FL | 19Sam |Jackson |Allentown |PA | 22(2 rows)

Figure 3.7: More complex WHERE clause

restrictions in a single query, allowing you to select any single cell, or a block of cells. See figures 3.8 and 3.9.

test=> SELECT lastname FROM friend WHERE age = 22;lastname--------------------Jackson(1 row)

Figure 3.8: A single cell

Try using one of the other columns in the WHERE clause. Up to this point, we have made only comparisonson the age column. The age column is integer. The only tricky part about the other columns is that theyare char() columns, so you have to put the comparison value in single quotes. You also have to match thecapitalization exactly. See figure 3.10. If you had compared the firstname column to ’SAM’ or ’sam’, it wouldhave returned no rows.

Try a few more until you are comfortable.

3.6 Removing Data with DELETE

We know how to add data to the database. Now we learn how to remove it. Removal is quite simple. TheDELETE command can quickly remove any or all rows from a table. The command DELETE FROM friend will

231123122313231423152316231723182319232023212322232323242325232623272328232923302331233223332334233523362337233823392340234123422343234423452346234723482349235023512352235323542355235623572358235923602361236223632364236523662367236823692370237123722373237423752376


test=> SELECT city, state FROM friend WHERE age >= 21;city |state---------------+-----Denver |COAllentown |PA(2 rows)

Figure 3.9: A block of cells

test=> SELECT * FROM friend WHERE firstname = ’Sam’;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Sam |Jackson |Allentown |PA | 22(1 row)

Figure 3.10: Comparing string fields

delete all rows from the table friend. The query DELETE FROM friend WHERE age = 19 will remove only thoserows that have an age column equal to 19.

Here is a good exercise. INSERT a row into the friend table, use SELECT to verify the row has beenproperly added, then use DELETE to remove the row. This combines the things you learned in the previoussections. Figure 3.11 shows an example.

3.7 Modifying Data with UPDATE

How do you modify data already in the database? You could use DELETE to remove the row, then INSERT toinsert a new row, but that is quite inefficient. The UPDATE command allows you to update data already in thedatabase. It follows a format similar to the previous commands.

Mike had a birthday, so we want to update his age in this table. Figure 3.12 shows an example. Theexample shows the word UPDATE, the table name friend, followed by SET, then the column name, the equalssign (=), and the new value. The WHERE clause restricts the number of rows affected by the update, as inDELETE. Without a WHERE clause, all rows are updated.

Notice that the Mike row has moved to the end of the list. The next section will show you how to controlthe order of the row display.

3.8 Sorting Data with ORDER BY

In a SELECT query, rows are displayed in an undetermined order. If you want to guarantee the rows arereturned from SELECT in a specific order, you need to add the ORDER BY clause to the end of the SELECT.Figure 3.13 shows the use of ORDER BY. You can reverse the order by adding DESC, as seen in figure 3.14.If the query were to use a WHERE clause too, the ORDER BY would appear after the WHERE clause, as infigure 3.15.

You can ORDER BY more than one column by specifying multiple column names or labels, separated bycommas. It would sort by the first column specified. For rows with equal values in the first column, itwould sort based on the second column specified. Of course, this does not make sense in the friend examplebecause all column values are unique.

237723782379238023812382238323842385238623872388238923902391239223932394239523962397239823992400240124022403240424052406240724082409241024112412241324142415241624172418241924202421242224232424242524262427242824292430243124322433243424352436243724382439244024412442

3.8. SORTING DATA WITH ORDER BY 15

test=> SELECT * FROM friend;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Mike |Nichols |Tampa |FL | 19Cindy |Anderson |Denver |CO | 23Sam |Jackson |Allentown |PA | 22(3 rows)

test=> INSERT INTO friend VALUES (’Jim’, ’Barnes’, ’Ocean City’,’NJ’, 25);INSERT 18880 1test=> SELECT * FROM friend;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Mike |Nichols |Tampa |FL | 19Cindy |Anderson |Denver |CO | 23Sam |Jackson |Allentown |PA | 22Jim |Barnes |Ocean City |NJ | 25(4 rows)

test=> DELETE FROM friend WHERE lastname = ’Barnes’;DELETE 1test=> SELECT * FROM friend;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Mike |Nichols |Tampa |FL | 19Cindy |Anderson |Denver |CO | 23Sam |Jackson |Allentown |PA | 22(3 rows)

Figure 3.11: DELETE example

test=> UPDATE friend SET age = 20 WHERE firstname = ’Mike’;UPDATE 1test=> SELECT * FROM friend;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Cindy |Anderson |Denver |CO | 23Sam |Jackson |Allentown |PA | 22Mike |Nichols |Tampa |FL | 20(3 rows)

Figure 3.12: My first UPDATE

244324442445244624472448244924502451245224532454245524562457245824592460246124622463246424652466246724682469247024712472247324742475247624772478247924802481248224832484248524862487248824892490249124922493249424952496249724982499250025012502250325042505250625072508


test=> SELECT * FROM friend ORDER BY state;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Cindy |Anderson |Denver |CO | 23Mike |Nichols |Tampa |FL | 20Sam |Jackson |Allentown |PA | 22(3 rows)

Figure 3.13: Use of ORDER BY

test=> SELECT * FROM friend ORDER BY age DESC;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Cindy |Anderson |Denver |CO | 23Sam |Jackson |Allentown |PA | 22Mike |Nichols |Tampa |FL | 20(3 rows)

Figure 3.14: Reverse ORDER BY

test=> SELECT * FROM friend WHERE age >= 21 ORDER BY firstname;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Cindy |Anderson |Denver |CO | 23Sam |Jackson |Allentown |PA | 22(2 rows)

Figure 3.15: Use of ORDER BY and WHERE

250925102511251225132514251525162517251825192520252125222523252425252526252725282529253025312532253325342535253625372538253925402541254225432544254525462547254825492550255125522553255425552556255725582559256025612562256325642565256625672568256925702571257225732574

3.9. DESTROYING TABLES 17

3.9 Destroying Tables

This chapter would not be complete without showing how to destroy tables. It is accomplished using the DROP

TABLE command. The command DROP TABLE friend will remove the friend table. Both the table structureand the data contained in the table will be erased. We will be using the friend table in the next chapter, so Idon’t recommend you remove the table at this time. Remember, to remove only the data in the table, withoutremoving the table structure itself, use DELETE.

3.10 Summary

In this chapter, we have shown you the basic operations of any database:

• Table creation (CREATE TABLE)

• Table destruction (DROP TABLE)

• Adding (INSERT)

• Displaying(SELECT)

• Removing (DELETE)

• Replacing (UPDATE)

257525762577257825792580258125822583258425852586258725882589259025912592259325942595259625972598259926002601260226032604260526062607260826092610261126122613261426152616261726182619262026212622262326242625262626272628262926302631263226332634263526362637263826392640


264126422643264426452646264726482649265026512652265326542655265626572658265926602661266226632664266526662667266826692670267126722673267426752676267726782679268026812682268326842685268626872688268926902691269226932694269526962697269826992700270127022703270427052706

Chapter 4

Customizing Queries

This chapter will illustrate additional capabilities of the basic SQL commands.

4.1 Data types

Table 4.1 shows the most common column data types. Figure 4.1 shows queries using these types. There

Category Type Descriptioncharacter string char(length) blank-padded string, fixed storage length

varchar(length) variable storage lengthnumber integer integer, +/–2 billion range

float floating point number, 15-digit precisionnumeric(precision, decimal) number with user-defined precision and decimal location

date/time date datetime timetimestamp date and time

Table 4.1: Common data types

is table creation, INSERT, and SELECT. There are a few things of interest in this example. First, notice howthe numbers do not require quotes, while character strings, dates, and times require them. Also note thetimestamp column displays its value in the standard UNIX date1 format. It also displays the time zone.

The final SELECT uses psql’s \x display mode.2 Without the \x, the SELECT would have displayed toomuch information to fit on one line. The fields would have wrapped around the edge of the display, making ithard to read. The columns would still line up, but there would be other data in the way. Of course, anothersolution to field wrapping is to select fewer columns. Remember, you can select any columns from the tablein any order.

Section 9.2 covers column types in more detail.

4.2 Quotes Inside Text

Suppose you want to insert the name O’Donnell. You might be tempted to enter this in psql as ’O’Donnell’,but this will not work. The presence of a single quote inside a single-quoted string generates a parse error.

1This is the format generated by typing the command date at the UNIX command prompt.2See section 17.1 for a full list of the psql backslash commands.

19

270727082709271027112712271327142715271627172718271927202721272227232724272527262727272827292730273127322733273427352736273727382739274027412742274327442745274627472748274927502751275227532754275527562757275827592760276127622763276427652766276727682769277027712772

20 CHAPTER 4. CUSTOMIZING QUERIES

test=> CREATE TABLE alltypes (state CHAR(2),name CHAR(30),children INTEGER,distance FLOAT,budget NUMERIC(16,2),born DATE,checkin TIME,started TIMESTAMP

);CREATEtest=> INSERT INTO alltypestest-> VALUES (test(> ’PA’,test(> ’Hilda Blairwood’,test(> 3,test(> 10.7,test(> 4308.20,test(> ’9/21/74’,test(> ’9:00’,test(> ’07/03/1996 10:30:00’);INSERT 18544 1test=> SELECT state, name, children, distance, budget FROM alltypes;state|name |children|distance| budget-----+------------------------------+--------+--------+-------PA |Hilda Blairwood | 3| 10.7|4308.20(1 row)

test=> SELECT born, checkin, started FROM alltypes;born|checkin |started

----------+--------+----------------------------09-21-1974|09:00:00|Wed Jul 03 10:30:00 1996 EDT(1 row)

test=> \xExpanded display is on.test=> SELECT * FROM alltypes;-[ RECORD 1 ]----------------------------state | PAname | Hilda Blairwoodchildren | 3distance | 10.7budget | 4308.20born | 09-21-1974checkin | 09:00:00started | Wed Jul 03 10:30:00 1996 EDT

Figure 4.1: Example of common data types

277327742775277627772778277927802781278227832784278527862787278827892790279127922793279427952796279727982799280028012802280328042805280628072808280928102811281228132814281528162817281828192820282128222823282428252826282728282829283028312832283328342835283628372838

4.3. USING NULL VALUES 21

One way to place a single quote inside a single-quoted string is to use two quotes together like this, ’O’ ’Don-nell’.3 Two single quotes inside a single-quoted string cause one single quote to be generated. Another wayis to use a backslash like this, ’O\’Donnell’. The backslash escapes the single quote character.

4.3 Using NULL Values

Let’s return to the INSERT statement described in section 3.3 on page 11. We will continue to use the friendtable from the previous chapter. In figure 3.4, we specified a value for friend column. Suppose we wanted toinsert a new row, but did not want to supply data for all the columns, i.e. we want to insert information aboutMark, but we don’t know Mark’s age.

Figure 4.2 shows this. After the table name, we have column names in parentheses. These columns will

test=> INSERT INTO friend (firstname, lastname, city, state)test-> VALUES (’Mark’, ’Middleton’, ’Indianapolis’, ’IN’);INSERT 18881 1

Figure 4.2: Insertion of specific columns

be assigned, in order, to the supplied data values. If we were supplying data for all columns, we wouldn’tneed to name them. In this example, we must name the columns. The table has five columns, but we areonly supplying four data values.

The column we did not assign was age. The interesting question is, “What is in the age cell for Mark?”.The answer is that the age cell contains a NULL value.

NULL is a special value that is valid in any column. It is used when a valid entry for a field is not known ornot applicable. In the previous example, we wanted to add Mark to the database but we didn’t know his age.It is hard to imagine what numeric value could be used for Mark’s age column. Zero or minus-one would bestrange age values. NULL is the perfect value for his age.

Suppose we had a spouse column. What value should be used if someone is not married? A NULL valuewould be the proper value for that field. If there were a wedding_anniversary column, unmarried peoplewould have a NULL value in that field. NULL values are very useful. Before databases supported NULL values,users would put special values in columns, like -1 for unknown numbers and 1/1/1900 for unknown dates.NULLs are much clearer.

NULLs have a special behavior in comparisons. Look at figure 4.3. First, notice the age column for Markis empty. It is really a NULL. In the next query, because NULL values are unknown, the NULL row does notappear in the output. The third query really confuses people.4 Why doesn’t the Mark row appear? The ageis NULL or unknown, meaning the database doesn’t know if it equals 99 or not, so it doesn’t guess. It refusesto print it. In fact, there is no comparison that will produce the NULL row, except the last query shown. Thetests IS NULL and IS NOT NULL are designed specifically to test for the existence of NULL values. NULLs oftenconfuse new users. Remember, if you are making comparisons on columns that could contain NULL values,you must test for them specifically.

Figure 4.4 shows an example. We have inserted Jack, but the city and state were not known, so they areset to NULL. The next query’s WHERE comparison is contrived, but illustrative. Because city and state areboth NULL, you might suspect that the Jack row would be returned. However, because NULL means unknown,there is no way to know if the two NULL values are equal. Again, POSTGRESQL doesn’t guess, and refuses toprint it.

3That is not a double qoute between the O and D. Those are two single quotes.4The <> means not equal.

283928402841284228432844284528462847284828492850285128522853285428552856285728582859286028612862286328642865286628672868286928702871287228732874287528762877287828792880288128822883288428852886288728882889289028912892289328942895289628972898289929002901290229032904


test=> SELECT * FROM friend;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Cindy |Anderson |Denver |CO | 23Sam |Jackson |Allentown |PA | 22Mike |Nichols |Tampa |FL | 20Mark |Middleton |Indianapolis |IN |(4 rows)

test=> SELECT * FROM friend WHERE age > 0;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Cindy |Anderson |Denver |CO | 23Sam |Jackson |Allentown |PA | 22Mike |Nichols |Tampa |FL | 20(3 rows)

test=> SELECT * FROM friend WHERE age <> 99;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Cindy |Anderson |Denver |CO | 23Sam |Jackson |Allentown |PA | 22Mike |Nichols |Tampa |FL | 20(3 rows)

test=> SELECT * FROM friend WHERE age IS NULL;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Mark |Middleton |Indianapolis |IN |(1 row)

Figure 4.3: NULL handling

test=> INSERT INTO friendtest-> VALUES (’Jack’, ’Burger’, NULL, NULL, 27);INSERT 19053 1test=> SELECT * FROM friend WHERE city = state;firstname|lastname|city|state|age---------+--------+----+-----+---(0 rows)

Figure 4.4: Comparison of NULL fields

290529062907290829092910291129122913291429152916291729182919292029212922292329242925292629272928292929302931293229332934293529362937293829392940294129422943294429452946294729482949295029512952295329542955295629572958295929602961296229632964296529662967296829692970

4.4. CONTROLLING DEFAULT VALUES 23

There is one more issue with NULLs that needs clarification. In character columns, a NULL is not the sameas a zero length field. That means that the string ’’ and NULL are different. Figure 4.5 shows an example ofthis. There are no valid numeric and date blank values, but a character string can be blank. When viewed

test=> CREATE TABLE nulltest (name CHAR(20), spouse CHAR(20));CREATEtest=> INSERT INTO nulltest VALUES (’Andy’, ’’);INSERT 18986 1test=> INSERT INTO nulltest VALUES (’Tom’, NULL);INSERT 18987 1test=> SELECT * FROM nulltest;

name | spouse----------------------+----------------------Andy |Tom |(2 rows)

test=> SELECT * FROM nulltest WHERE spouse = ’’;name | spouse

----------------------+----------------------Andy |(1 row)

test=> SELECT * FROM nulltest WHERE spouse IS NULL;name | spouse

----------------------+--------Tom |(1 row)

Figure 4.5: NULLs and blank strings

in psql, any numeric field that is blank has to contain a NULL because there is no blank number. However,there are blank strings, so blank strings and NULLs are displayed the same in psql. However, they are notthe same, so be careful not to confuse the meaning of NULLs in character fields.

4.4 Controlling DEFAULT Values

As we learned in the previous section, columns not specified in an INSERT statement are given NULL values.This can be changed using the DEFAULT keyword. When creating a table, next to each column type, you canuse the keyword DEFAULT and then a value. The value will be used anytime the column value is not suppliedin an INSERT. If no DEFAULT is defined, a NULL is used for the column. Figure 4.6 shows a typical use ofdefault values. The default for the timestamp column is actually a call to an internal POSTGRESQL variablethat returns the current date and time. If any value is supplied for a field with a default, that value is usedinstead.

297129722973297429752976297729782979298029812982298329842985298629872988298929902991299229932994299529962997299829993000300130023003300430053006300730083009301030113012301330143015301630173018301930203021302230233024302530263027302830293030303130323033303430353036


test=> CREATE TABLE account (test(> name CHAR(20),test(> balance NUMERIC(16,2) DEFAULT 0,test(> active CHAR(1) DEFAULT ’Y’,test(> created TIMESTAMP DEFAULT CURRENT_TIMESTAMPtest(> );CREATEtest=> INSERT INTO account (name)test-> VALUES (’Federated Builders’);INSERT 19023 1test=> SELECT * FROM account;

name | balance | active | created----------------------+---------+--------+------------------------------Federated Builders | 0.00 | Y | Sat Nov 13 13:50:15 1994 EST(1 row)

Figure 4.6: Using DEFAULTs

4.5 Column Labels

You might have noticed the text that appears at the top of each column in the SELECT output. That is calledthe column label. Usually, the label is the name of the column being selected. However, you can control whattext appears at the top of each column by using the AS keyword. For example, figure 4.7 replaces the defaultcolumn label firstname with the column label buddy. You might have noticed that the query in figure 2.3 on

test=> SELECT firstname AS buddy FROM friend;buddy---------------CindySamMikeMark(4 rows)

Figure 4.7: Controlling column labels

page 7 has the column label ?column?. The database server returns this label when there is no suitable label.In that case, the result of an addition doesn’t have an appropriate label. Figure 4.8 shows the same querywith an appropriate label added using AS.

4.6 Comments

POSTGRESQL allows you to place any text into psql for use as comments. There are two comment styles.The presence of two dashes (- -) marks all text to the end of the line as a comment. POSTGRESQL alsounderstand C-style comments, where the comment begins with slash-asterisk (/*) and ends with asterisk-slash (*/). Figure 4.9 shows these comment styles. Notice how the multi-line comment is marked by a psql

303730383039304030413042304330443045304630473048304930503051305230533054305530563057305830593060306130623063306430653066306730683069307030713072307330743075307630773078307930803081308230833084308530863087308830893090309130923093309430953096309730983099310031013102

4.7. AND/OR USAGE 25

test=> SELECT 1 + 3 AS total;total-----

4(1 row)

Figure 4.8: Computation using a column label

command prompt of *>. It is a reminder you are in a multi-line comment, just as -> is a reminder you are ina multi-line statement, and ’> is a reminder you are in a multi-line quoted string.

test=> -- a single line commenttest=> /* a multi-linetest*> comment */

Figure 4.9: Comment styles

4.7 AND/OR Usage

Up to this point, there have been only simple WHERE clause tests. In the next few sections, we will showhow to do more complex WHERE clause testing.

Complex WHERE clause tests are done by connecting simple tests using the words AND and OR. Forillustration, I have loaded the friend table with new people in figure 4.10. Selecting certain rows from the

test=> SELECT * FROM friend ORDER BY firstname\gfirstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Dean |Yeager |Plymouth |MA | 24Dick |Gleason |Ocean City |NJ | 19Ned |Millstone |Cedar Creek |MD | 27Sandy |Gleason |Ocean City |NJ | 25Sandy |Weber |Boston |MA | 33Victor |Tabor |Williamsport |PA | 22(6 rows)

Figure 4.10: New friends

table will require more complex WHERE conditions. For example, if we wanted to select Sandy Gleason byname, it would be difficult with only one comparison in the WHERE clause. If we tested for firstname =’Sandy’, we would select both Sandy Gleason and Sandy Weber. If we tested for lastname = ’Gleason’, wewould get both Sandy Gleason and her brother Dick Gleason. The proper way is to use AND to join tests ofboth firstname and lastname. The proper query is shown in figure 4.11. The AND joins the two comparisonswe need.

A similar comparison could be done to select friends living in Cedar Creek, Maryland. There could beother friends living in Cedar Creek, Ohio, so the comparison city = ’Cedar Creek’ is not enough. The propertest is city = ’Cedar Creek’ AND state = ’MD’.

310331043105310631073108310931103111311231133114311531163117311831193120312131223123312431253126312731283129313031313132313331343135313631373138313931403141314231433144314531463147314831493150315131523153315431553156315731583159316031613162316331643165316631673168


test=> SELECT * FROM friendtest-> WHERE firstname = ’Sandy’ AND lastname = ’Gleason’;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Sandy |Gleason |Ocean City |NJ | 25(1 row)

Figure 4.11: WHERE test for Sandy Gleason

Another complex test would be to select people who are in the state of New Jersey (NJ) or Pennsylvania(PA). Such a comparison requires the use of OR. The test state = ’NJ’ OR state = ’PA’ would return thedesired rows, as shown in figure 4.12.

test=> SELECT * FROM friendtest-> WHERE state = ’NJ’ OR state = ’PA’ ORDER BY firstname;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Dick |Gleason |Ocean City |NJ | 19Sandy |Gleason |Ocean City |NJ | 25Victor |Tabor |Williamsport |PA | 22(3 rows)

Figure 4.12: Friends in New Jersey and Pennsylvania

An unlimited number of ANDs and ORs can be linked together to perform complex comparison tests. WhenANDs are linked with other ANDs, there is no possibility for confusion. The same is true of ORs. However,when ANDs and ORs are both used in the same query, the results can be confusing. Figure 4.13 shows sucha case. You might suspect that it would return rows with firstname equal to Victor and state equals PA or

test=> SELECT * FROM friendtest-> WHERE firstname = ’Victor’ AND state = ’PA’ OR state = ’NJ’test-> ORDER BY firstname;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Dick |Gleason |Ocean City |NJ | 19Sandy |Gleason |Ocean City |NJ | 25Victor |Tabor |Williamsport |PA | 22(3 rows)

Figure 4.13: Mixing ANDs and ORs

NJ. In fact, the query returns rows with firstname equal to Victor and state equals PA, or state equals NJ. Inthis case, AND is evaluated first, then OR. When mixing ANDs and ORs, it is best to collect the ANDs and ORsinto common groups using parentheses. Figure 4.14 shows the proper way to enter this query. Withoutparentheses, it is very difficult to understand a query with mixed ANDs and ORs.

316931703171317231733174317531763177317831793180318131823183318431853186318731883189319031913192319331943195319631973198319932003201320232033204320532063207320832093210321132123213321432153216321732183219322032213222322332243225322632273228322932303231323232333234

4.8. RANGE OF VALUES 27

test=> SELECT * FROM friendtest-> WHERE firstname = ’Victor’ AND (state = ’PA’ OR state = ’NJ’)test-> ORDER BY firstname;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Victor |Tabor |Williamsport |PA | 22(1 rows)

Figure 4.14: Properly mixing ANDs and ORs

Comparison Operatorless than <less than or equal <=equal =greater than or equal >=greater than >not equal <> or !=

Table 4.2: Comparisons

4.8 Range of Values

Suppose we wanted to see all friends who had ages between 22 and 25. Figure 4.15 shows two queries thatproduce this result. The first query uses AND to perform two comparisons that both must be true. We used<= and >= so the age comparisons included the limiting ages of 22 and 25. If we used < and > the ages 22and 25 would not have been included in the output. The second query uses BETWEEN to generate the samecomparison. BETWEEN comparisons include the limiting values in the result.

4.9 LIKE Comparison

Greater-than and less-than comparisons are possible, as shown in table 4.2. Even more complex comparisonsare available. Users often need to compare character strings to see if they match a certain pattern. Forexample, sometimes they only want fields that begin with a certain letter, or contain a certain word. TheLIKE keyword allows such comparisons. The query in figure 4.16 returns rows where the firstname beginswith D. The percent (%) is interpreted to mean any characters can follow the D. The query performs the testfirstname LIKE ’D%’.

The test firstname LIKE ’%D%’ returns rows where firstname contains a D anywhere in the field, not justat the beginning. The effect of the having a % before and after a character allows the character to appearanywhere in the string.

More complex tests can be performed with LIKE, as shown in table 4.3. While percent (%) matches anunlimited number of characters, the underscore (_) matches only a single character. The underscore allowsany single character to appear in its position. To test if a field does not match a pattern, use NOT LIKE. Totest for an actual percent sign (%), use %%. An actual underscore (_) is tested with two underscores.

Attempting to find all character fields that end with a certain character can be difficult. For char()columns, like firstname, there are trailing spaces that make such trailing comparisons difficult with LIKE.Other character column types don’t use trailing spaces. Those can use the test colname LIKE ’%g’to find allrows that end with g. See section 9.2 for complete coverage on character data types.

323532363237323832393240324132423243324432453246324732483249325032513252325332543255325632573258325932603261326232633264326532663267326832693270327132723273327432753276327732783279328032813282328332843285328632873288328932903291329232933294329532963297329832993300


test=> SELECT *test-> FROM friendtest-> WHERE age >= 22 AND age <= 25;

firstname | lastname | city | state | age-----------------+----------------------+-----------------+-------+-----Dean | Yeager | Plymouth | MA | 24Sandy | Gleason | Ocean City | NJ | 25Victor | Tabor | Williamsport | PA | 22(3 rows)

test=> SELECT *test-> FROM friendtest-> WHERE age BETWEEN 22 AND 25;

firstname | lastname | city | state | age-----------------+----------------------+-----------------+-------+-----Dean | Yeager | Plymouth | MA | 24Sandy | Gleason | Ocean City | NJ | 25Victor | Tabor | Williamsport | PA | 22(3 rows)

Figure 4.15: Selecting a range of values

test=> SELECT * FROM friend WHERE firstname LIKE ’D%’;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Dean |Yeager |Plymouth |MA | 24Dick |Gleason |Ocean City |NJ | 19(2 rows)

Figure 4.16: Firstname begins with D.

Comparison Operationbegins with D LIKE ’D%’contains a D LIKE ’%D%’has D in second position LIKE ’_D%’begins with D and contains e LIKE ’D%e%’begins with D, contains e, then f LIKE ’D%e%f%’begins with non-D NOT LIKE ’D%’

Table 4.3: LIKE comparison

330133023303330433053306330733083309331033113312331333143315331633173318331933203321332233233324332533263327332833293330333133323333333433353336333733383339334033413342334333443345334633473348334933503351335233533354335533563357335833593360336133623363336433653366

4.10. REGULAR EXPRESSIONS 29

4.10 Regular Expressions

Regular expressions allow more powerful comparisons than the more standard LIKE and NOT LIKE. Regularexpression comparisons are a unique feature of POSTGRESQL. They are very common in UNIX, such as in theUNIX grep command.5

Table 4.4 shows the regular expression operators and table 4.5 shows the regular expression special

Comparison Operatorregular expression ˜regular expression, case insensitive ˜*not equal to regular expression !ñot equal to regular expression, case insensitive !˜*

Table 4.4: Regular expression operators

characters. Note that the caret (ˆ) has a different meaning outside and inside square brackets ([ ]). While

Test Special Charactersstart ênd $any single character .set of characters [ccc]set of characters not equal [ˆccc]range of characters [c-c]range of characters not equal [ˆc-c]zero or one of previous character ?zero or multiple of previous characters *one or multiple of previous characters +OR operator |

Table 4.5: Regular expression special characters

regular expressions are powerful, they are complex to create. Table 4.6 shows some examples. Figure 4.17shows examples of queries using regular expressions. For a description, see the comment above each query.

Figure 4.18 shows two more complex regular expressions. The first query shows the way to properlytest for a trailing n. Because char() columns have trailing space to fill the column, you need to test for possibletrailing spaces. See section 9.2 for complete coverage on character data types. The second query might besurprising. Some think it returns rows that do not contain an S. Instead, the query returns all rows that haveany character that is not an S. Sandy contains characters that are not S, such as a, n, d, and y, so that row isreturned. The test would only prevent rows containing only S’s from being printed.

You can test for the literal characters listed in table 4.5. For example, to test for a dollar sign, use \$.To test for an asterisk, use \*. The backslash removes any special meaning from the character that followsit. To test for a literal backslash, use two backslashes, like \\. This is different from LIKE special characterliteral handling, where %% was used to test for a literal percent sign.

Because regular expressions have a powerful special character command set, creating them can bedifficult. Try some queries on the friend table until you are comfortable with regular expression comparisons.

5Actually, POSTGRESQL regular expressions are like egrep extended regular expressions.

336733683369337033713372337333743375337633773378337933803381338233833384338533863387338833893390339133923393339433953396339733983399340034013402340334043405340634073408340934103411341234133414341534163417341834193420342134223423342434253426342734283429343034313432


test=> SELECT * FROM friend ORDER BY firstname;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Dean |Yeager |Plymouth |MA | 24Dick |Gleason |Ocean City |NJ | 19Ned |Millstone |Cedar Creek |MD | 27Sandy |Gleason |Ocean City |NJ | 25Sandy |Weber |Boston |MA | 33Victor |Tabor |Williamsport |PA | 22(6 rows)

test=> -- firstname begins with ’S’test=> SELECT * FROM friend WHERE firstname ˜ ’ˆS’ ORDER BY firstname;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Sandy |Gleason |Ocean City |NJ | 25Sandy |Weber |Boston |MA | 33(2 rows)

test=> -- firstname has an e in the second positiontest=> SELECT * FROM friend WHERE firstname ˜ ’ˆ.e’ ORDER BY firstname;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Dean |Yeager |Plymouth |MA | 24Ned |Millstone |Cedar Creek |MD | 27(2 rows)test=> -- firstname contains b, B, c or Ctest=> SELECT * FROM friend WHERE firstname ˜* ’[bc]’ ORDER BY firstname;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Dick |Gleason |Ocean City |NJ | 19Victor |Tabor |Williamsport |PA | 22(2 rows)

test=> -- firstname does not contain s or Stest=> SELECT * FROM friend WHERE firstname !˜* ’s’ ORDER BY firstname;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Dean |Yeager |Plymouth |MA | 24Dick |Gleason |Ocean City |NJ | 19Ned |Millstone |Cedar Creek |MD | 27Victor |Tabor |Williamsport |PA | 22(4 rows)

Figure 4.17: Regular expression sample queries

343334343435343634373438343934403441344234433444344534463447344834493450345134523453345434553456345734583459346034613462346334643465346634673468346934703471347234733474347534763477347834793480348134823483348434853486348734883489349034913492349334943495349634973498

4.10. REGULAR EXPRESSIONS 31

Test Operationbegins with D ˜ ’ˆD’contains D ˜ ’D’D in second position ˜ ’ˆ.D’begins with D and contains e ˜ ’ˆD.*e’begins with D, contains e, and then f ˜ ’D.*e.*f’contains A, B, C, or D ˜ ’[A-D]’ or ˜ ’[ABCD]’contains A or a ˜* ’a’ or ˜ ’[Aa]’does not contain D !˜ ’D’does not begin with D !˜ ’ˆD’ or ˜ ’ˆ[ˆD]’begins with D, with one optional leading space ˜ ’ˆ ?D’begins with D , with optional leading spaces ˜ ’ˆ *D’begins with D, with at least one leading space ˜ ’ˆ +D’ends with G, with optional trailing spaces ˜ ’G *$’

Table 4.6: Regular expression examples

test=> -- firstname ends with ntest=> SELECT * FROM friend WHERE firstname ˜ ’n *$’ ORDER BY firstname;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Dean |Yeager |Plymouth |MA | 24(1 row)

test=> -- firstname contains a non-S charactertest=> SELECT * FROM friend WHERE firstname ˜ ’[ˆS]’ ORDER BY firstname;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Dean |Yeager |Plymouth |MA | 24Dick |Gleason |Ocean City |NJ | 19Ned |Millstone |Cedar Creek |MD | 27Sandy |Gleason |Ocean City |NJ | 25Sandy |Weber |Boston |MA | 33Victor |Tabor |Williamsport |PA | 22(6 rows)

Figure 4.18: Complex regular expression queries

349935003501350235033504350535063507350835093510351135123513351435153516351735183519352035213522352335243525352635273528352935303531353235333534353535363537353835393540354135423543354435453546354735483549355035513552355335543555355635573558355935603561356235633564


4.11 CASE Clause

Many programming languages have conditional statements, stating if condition is true then do-something,else do-something-else. This allows execution of statements based on some condition. While SQL is not aprocedural programming language, it does allow conditional control over what data is returned from a query.The WHERE clause uses comparisons to control row selection. The CASE statement allows comparisons incolumn output. Figure 4.19 shows a query using CASE to create a new output column showing adult or minoras appropriate, based on the age field. Of course, the values adult and minor do not appear in the table friend.

test=> SELECT firstname,test-> age,test-> CASEtest-> WHEN age >= 21test-> THEN ’adult’test-> ELSE ’minor’test-> ENDtest-> FROM friend;firstname |age|case---------------+---+-----Dean | 24|adultDick | 19|minorNed | 27|adultSandy | 25|adultSandy | 33|adultVictor | 22|adult(6 rows)

Figure 4.19: CASE example

The CASE clause allows the creation of those conditional strings.

A more complex example is shown in figure 4.20. In this example, there are multiple WHEN clauses. TheAS clause is used to label the column with the word distance. Though I have shown only SELECT examples,CASE can be used in UPDATE and other complex situations. CASE allows the creation of conditional values,which can be used for output or for further processing in the same query. CASE values only exist inside asingle query, so they can’t be used outside the query that defines them.

4.12 Distinct Rows

It is often desirable to return the results of a query with no duplicates. The keyword DISTINCT preventsduplicates from being returned. Figure 4.21 shows the use of the DISTINCT keyword to prevent duplicatestates and duplicate city and state combinations. Notice DISTINCT operates only on the columns selected inthe query. It does not compare non-selected columns when determining uniqueness. Section 5.2 shows howcounts can be generated for each of the distinct values.

356535663567356835693570357135723573357435753576357735783579358035813582358335843585358635873588358935903591359235933594359535963597359835993600360136023603360436053606360736083609361036113612361336143615361636173618361936203621362236233624362536263627362836293630

4.13. FUNCTIONS AND OPERATORS 33

SELECT firstname,state,CASE

WHEN state = ’PA’ THEN ’close’WHEN state = ’NJ’ OR state = ’MD’ THEN ’far’ELSE ’very far’

END AS distanceFROM friend;firstname |state|distance---------------+-----+--------Dean |MA |very farDick |NJ |farNed |MD |farSandy |NJ |farSandy |MA |very farVictor |PA |close(6 rows)

Figure 4.20: Complex CASE example

4.13 Functions and Operators

There are a large number of functions and operators available in POSTGRESQL. Function calls take zero, one,or more arguments and return a single value. You can list all functions and their arguments using psql’s\df command. You can use psql’s \dd command to display comments about any specific function or group offunctions, as shown in figure 4.22.

Operators differ from functions in the following ways:

• Operators are symbols, not names

• Always take two arguments

• Arguments appear to the left and right of the operator symbol

For example, + is an operator that takes one argument on the left and one on the right, and returns their sum.Psql’s \do command lists all POSTGRESQL operators and their arguments. Figure 4.23 shows operator listingsand their use. The standard arithmetic operators: addition (+), subtraction (-), multiplication (*), division(/), modulo/remainder (%), and exponentiation (ˆ) honor the standard precedence rules. Exponentiationis done first, multiplication, division, and modulo are second, and addition and subtraction are performedlast. Parentheses can be used to alter this precedence. Other operators are evaluated left-to-right, unlessparentheses are present.

4.14 SET, SHOW, and RESET

The SET command allows the changing of various POSTGRESQL parameters. The changes remain in effectfor the duration of the database connection. Table 4.7 shows various parameters that can be controlled withSET.

DATESTYLE controls the appearance of dates when printed in psql as seen in table 4.8. It controls the

363136323633363436353636363736383639364036413642364336443645364636473648364936503651365236533654365536563657365836593660366136623663366436653666366736683669367036713672367336743675367636773678367936803681368236833684368536863687368836893690369136923693369436953696


test=> SELECT state FROM friend ORDER BY state;state-------MAMAMDNJNJPA(6 rows)

test=> SELECT DISTINCT state FROM friend ORDER BY state;state-------MAMDNJPA(4 rows)

test=> SELECT DISTINCT city, state FROM friend ORDER BY state, city;city | state

-----------------+-------Boston | MAPlymouth | MACedar Creek | MDOcean City | NJWilliamsport | PA(5 rows)

Figure 4.21: DISTINCT prevents duplicates

Function SET optionDatestyle DATESTYLE TO ’POSTGRES’|’SQL’|’ISO’|’GERMAN’|’US’|’NONEUROPEAN’|’EUROPEAN’Timezone TIMEZONE TO ’value’Client encodings CLIENT_ENCODING|NAMES TO encodingServer encodings SERVER_ENCODING TO encodingTransaction isolation level TRANSACTION ISOLATION LEVEL ’SERIALIZABLE’|’READ COMMITTED’Optimizer heap cost COST_HEAP TO #Optimizer index cost COST_INDEX TO #Genetic Query Optimizer (GEQO) GEQO TO ’ON[=#]’|’OFF’Key Set Query Optimizer (KSQO) KSQO TO ’ON’|’OFF’Enable trace flags PG_OPTIONS TO ’value’

Table 4.7: SET options

369736983699370037013702370337043705370637073708370937103711371237133714371537163717371837193720372137223723372437253726372737283729373037313732373337343735373637373738373937403741374237433744374537463747374837493750375137523753375437553756375737583759376037613762

4.14. SET, SHOW, AND RESET 35

test=> \dfList of functions

Result | Function | Arguments-----------+---------------------+-----------------------------_bpchar | _bpchar | _bpchar int4_varchar | _varchar | _varchar int4numeric | abs | numeric…

test=> \df intList of functions

Result | Function | Arguments---------+---------------+------------------------int4 | int | int4int2 | int2 | float4…

test=> \df upperList of functions

Result | Function | Arguments--------+----------+-----------text | upper | text(1 row)test=> \dd upper

Object descriptionsName | What | Description-------+----------+-------------upper | function | uppercase(1 row)

test=> SELECT upper(’jacket’);upper--------JACKET

test=> SELECT sqrt(2.0); -- square rootsqrt

--------------1.4142135624(1 row)

Figure 4.22: Function examples

376337643765376637673768376937703771377237733774377537763777377837793780378137823783378437853786378737883789379037913792379337943795379637973798379938003801380238033804380538063807380838093810381138123813381438153816381738183819382038213822382338243825382638273828


test=> \doList of operators

Op | Left arg | Right arg | Result-----+------------+------------+-----------! | int4 | | int4!! | | int4 | int4!!= | int4 | name | bool…

test=> \do /List of operators

Op | Left arg | Right arg | Result----+----------+-----------+----------/ | box | point | box/ | char | char | char…

test=> \do ˆList of operators

Op | Left arg | Right arg | Result----+----------+-----------+--------ˆ | float8 | float8 | float8(1 row)

test=> \dd Ôbject descriptions

Name | What | Description------+----------+----------------ˆ | operator | exponentiation(1 row)

test=> SELECT 2 + 3 ˆ 4;?column?----------

83(1 row)

Figure 4.23: Operator examples

382938303831383238333834383538363837383838393840384138423843384438453846384738483849385038513852385338543855385638573858385938603861386238633864386538663867386838693870387138723873387438753876387738783879388038813882388338843885388638873888388938903891389238933894

4.14. SET, SHOW, AND RESET 37

Output forStyle Optional Ordering February 1, 1983

POSTGRES us or NONEUROPEAN 02/01/1983POSTGRES EUROPEAN 01/02/1983SQL US or NONEUROPEAN 02-01-1983SQL EUROPEAN 01-02-1983ISO 1983-02-01German 01.02.1983

Table 4.8: DATESTYLE output

format (slashes, dashes, or year first), and the display of the month first (US) or day first (European). Thecommand SET DATESTYLE TO ’SQL,US’ would most likely be used by users in the USA, while Europeans mightprefer SET DATESTYLE TO ’POSTGRES,EUROPEAN’. The ISO DATESTYLE and GERMAN DATESTYLE are not affectedby any of the other options.

TIMEZONE defaults to the timezone of the server or the PGTZ environment variable. The psql client mightbe in a different timezone, and SET TIMEZONE allows this to be changed inside psql. TRANSACTION ISOLATION

will be covered in section 10.3.The other options are for developer use during debugging sessions. The SET manual page covers them

in more detail.The SHOW command is used to display current database session parameters. RESET allows session

parameters to be reset to their default values. Figure 4.24 shows an example of this.6

test=> SHOW DATESTYLE;NOTICE: DateStyle is Postgres with US (NonEuropean) conventionsSHOW VARIABLEtest=> SET DATESTYLE TO ’SQL, EUROPEAN’;SET VARIABLEtest=> SHOW DATESTYLE;NOTICE: DateStyle is SQL with European conventionsSHOW VARIABLEtest=> RESET DATESTYLE;RESET VARIABLEtest=> SHOW DATESTYLE;NOTICE: DateStyle is Postgres with US (NonEuropean) conventionsSHOW VARIABLE

Figure 4.24: SHOW and RESET examples

6Your site defaults may be different.

389538963897389838993900390139023903390439053906390739083909391039113912391339143915391639173918391939203921392239233924392539263927392839293930393139323933393439353936393739383939394039413942394339443945394639473948394939503951395239533954395539563957395839593960


396139623963396439653966396739683969397039713972397339743975397639773978397939803981398239833984398539863987398839893990399139923993399439953996399739983999400040014002400340044005400640074008400940104011401240134014401540164017401840194020402140224023402440254026

Chapter 5

SQL Aggregates

Users often require the ability to summarize database information. Instead of seeing all rows, they want justa count or total. This is called aggregation or gathering together. This chapter deals with POSTGRESQL’sability to generate summarized database information using aggregates.

5.1 Aggregates

There are five aggregates outlined in table 5.1. COUNT operates on entire rows. The others operate on

Aggregate FunctionCOUNT(*) count of rowsSUM(colname) column totalMAX(colname) column maximumMIN(colname) column minimumAVG(colname) column average

Table 5.1: Aggregates

specific columns. Figure 5.1 shows examples of aggregate queries.

Aggregates can be combined with the WHERE clause to produce more complex results. The query SELECTAVG(age) FROM friend WHERE age >= 21 computes the average age of people age 21 or older. This preventsDick Gleason from being included in the average computation because he is younger than 21. The columnlabel defaults to the name of the aggregate. You can use AS to change it, as shown in section 4.5.

NULLs are not processed by most aggregates, like MAX(), SUM(), and AVG(). If a column is NULL, it isskipped and the result is not affected by any NULL values. However, if a column contains only NULL values,the result is NULL, not zero. COUNT(*) is different. It does count NULLs because it is looking at entire rowsby using the asterisk(*). It is not looking at individual columns like the other aggregates. To find the COUNT

of all non-NULL values in a certain column, use COUNT(columnname).

Figure 5.2 illustrates aggregate handling of NULLs. First, a single row containing a NULL column is usedto show aggregates returning NULL results. Two versions of COUNT on a NULL column are shown. NoticeCOUNT never returns a NULL value. Then, a single non-NULL row is inserted, and the results shown. Noticethe AVG() of 3 and NULL is 3, not 1.5, illustrating the NULL is not involved in the average computation.

39

402740284029403040314032403340344035403640374038403940404041404240434044404540464047404840494050405140524053405440554056405740584059406040614062406340644065406640674068406940704071407240734074407540764077407840794080408140824083408440854086408740884089409040914092

40 CHAPTER 5. SQL AGGREGATES

test=> SELECT * FROM friend ORDER BY firstname;firstname |lastname |city |state|age---------------+--------------------+---------------+-----+---Dean |Yeager |Plymouth |MA | 24Dick |Gleason |Ocean City |NJ | 19Ned |Millstone |Cedar Creek |MD | 27Sandy |Gleason |Ocean City |NJ | 25Sandy |Weber |Boston |MA | 33Victor |Tabor |Williamsport |PA | 22(6 rows)

test=> SELECT COUNT(*) FROM friend;count-----

6(1 row)

test=> SELECT SUM(age) FROM friend;sum---150(1 row)

test=> SELECT MAX(age) FROM friend;max---33(1 row)

test=> SELECT MIN(age) FROM friend;min---19(1 row)

test=> SELECT AVG(age) FROM friend;avg---25(1 row)

Figure 5.1: Aggregate examples

409340944095409640974098409941004101410241034104410541064107410841094110411141124113411441154116411741184119412041214122412341244125412641274128412941304131413241334134413541364137413841394140414141424143414441454146414741484149415041514152415341544155415641574158

5.1. AGGREGATES 41

test=> CREATE TABLE aggtest (col INTEGER);CREATEtest=> INSERT INTO aggtest VALUES (NULL);INSERT 18826 1test=> SELECT SUM(col) FROM aggtest;sum-----

(1 row)

test=> SELECT MAX(col) FROM aggtest;max-----

(1 row)

test=> SELECT COUNT(*) FROM aggtest;count-------

1(1 row)

test=> SELECT COUNT(col) FROM aggtest;count-------

0(1 row)

test=> INSERT INTO aggtest VALUES (3);INSERT 18827 1test=> SELECT AVG(col) FROM aggtest;avg-----

3(1 row)

test=> SELECT COUNT(*) FROM aggtest;count-------

2(1 row)

test=> SELECT COUNT(col) FROM aggtest;count-------

1(1 row)

Figure 5.2: Aggregates and NULLs

415941604161416241634164416541664167416841694170417141724173417441754176417741784179418041814182418341844185418641874188418941904191419241934194419541964197419841994200420142024203420442054206420742084209421042114212421342144215421642174218421942204221422242234224


5.2 Using GROUP BY

Simple aggregates return one row as a result. It is often desirable to apply an aggregate to groups of rows.Queries using aggregates with GROUP BY have the aggregate applied to rows grouped by another column inthe table. For example, SELECT COUNT(*) FROM friend returns the total number of rows in the table. Thequery in figure 5.3 shows the use of GROUP BY to generate a count of the number of people in each state.COUNT(*) is not applied to the entire table at once. With GROUP BY, the table is split up into groups by state,and COUNT(*) is applied to each group.

test=> SELECT state, COUNT(*)test-> FROM friendtest-> GROUP BY state;state|count-----+-----MA | 2MD | 1NJ | 2PA | 1(4 rows)

test=> SELECT state, MIN(age), MAX(age), AVG(age)test-> FROM friendtest-> GROUP BY statetest-> ORDER BY 4 DESC;state|min|max|avg-----+---+---+---MA | 24| 33| 28MD | 27| 27| 27NJ | 19| 25| 22PA | 22| 22| 22(4 rows)

Figure 5.3: Aggregate with GROUP BY

The second query shows the minimum, maximum, and average age of the people in each state. It alsoshows an ORDER BY on the aggregate column. Because the column is the fourth column in the result, you canidentify the column by the number 4. Doing ORDER BY avg would have worked too. You can GROUP BY morethan one column, as shown in figure 5.4.

GROUP BY collects all NULL values into a single group. Psql’s \da command lists all the aggregatessupported by POSTGRESQL

5.3 Using HAVING

There is one more aggregate capability that is often overlooked. It is the HAVING clause. HAVING allowsyou to perform conditional tests on aggregate values. It is often used with GROUP BY. With HAVING, you caninclude or exclude groups based on the aggregate value for that group. For example, suppose you want toknow all the states where there is more than one friend. Looking at the first query in figure 5.3, you can seeexactly which states have more than one friend. HAVING allows you to programmatically test on the count

422542264227422842294230423142324233423442354236423742384239424042414242424342444245424642474248424942504251425242534254425542564257425842594260426142624263426442654266426742684269427042714272427342744275427642774278427942804281428242834284428542864287428842894290

5.4. QUERY TIPS 43

test=> SELECT city, state, COUNT(*)test-> FROM friendtest-> GROUP BY state, city;test-> ORDER BY 1, 2

city | state | count-----------------+-------+-------Boston | MA | 1Cedar Creek | MD | 1Ocean City | NJ | 2Plymouth | MA | 1Williamsport | PA | 1(5 rows)

Figure 5.4: GROUP BY on two columns

column, as shown in figure 5.5. Aggregates can’t be used in a WHERE clause. They are valid only inside

test=> SELECT state, COUNT(*)test-> FROM friendtest-> GROUP BY statetest-> HAVING COUNT(*) > 1;state|count-----+-----MA | 2NJ | 2(2 rows)

Figure 5.5: HAVING usage

HAVING.

5.4 Query Tips

In figures 5.3 and 5.5, the queries are spread over several lines. When a query has several clauses, likeFROM, WHERE, and GROUP BY, it is best to place each clause on a separate line. It makes queries easier tounderstand. Clear queries also use appropriate capitalization.

In a test database, it isn’t a problem if you make a mistake. In a live, production database, one incorrectquery can cause great difficulties. It takes five seconds to issue an erroneous query, and sometimes fivedays to recover from it. Double-check your queries before executing them. This is especially important forUPDATE, DELETE, and INSERT queries because they modify the database. Also, before performing UPDATE

or DELETE, do a SELECT or SELECT COUNT(*) with the same WHERE clause. Make sure the SELECT result isreasonable before doing the UPDATE or DELETE.

429142924293429442954296429742984299430043014302430343044305430643074308430943104311431243134314431543164317431843194320432143224323432443254326432743284329433043314332433343344335433643374338433943404341434243434344434543464347434843494350435143524353435443554356


435743584359436043614362436343644365436643674368436943704371437243734374437543764377437843794380438143824383438443854386438743884389439043914392439343944395439643974398439944004401440244034404440544064407440844094410441144124413441444154416441744184419442044214422

Chapter 6

Joining Tables

This chapter shows how to store data using multiple tables. Multi-table storage and multi-table queries arefundamental to relational databases.

We start this chapter with table and column references. These are important in multi-table queries.Then, we cover the advantages of splitting data across multiple tables. Next, we introduce an example basedon a mail order company, showing table creation, insertion, and queries using joins. Finally, we explorevarious join types.

6.1 Table and Column References

Before dealing with joins, there is one important feature that must be mentioned. Up to this point, all querieshave involved a single table. With multiple tables in a query, column names get confusing. Unless you arefamiliar with each table, it is difficult to know which column names belong to which tables. Sometimes twotables have the same column name. For these reasons, SQL allows you to fully qualify column names bypreceding the column name with the table name. An example of table name prefixing is shown in figure 6.1.The first query has unqualified column names. The second is the same query, but with fully qualified columnnames. A period separates the table name from the column name.

The final query shows another feature. Instead of specifying the table name, you can create a table aliasto take the place of the table name in the query. The alias name follows the table name in the FROM clause.In this example, f is used as an alias for the friend table. While these features are not important in singletable queries, they are useful in multi-table queries.

6.2 Joined Tables

In our friend example, splitting data into multiple tables makes little sense. However, in cases where we mustrecord information about a variety of things, multiple tables have benefits. Consider a company that sellsparts to customers through the mail. The database has to record information about many things: customers,employees, sales orders, and parts. It is obvious a single table cannot hold the different types of informationin an organized manner. Therefore, we create four tables: customer, employee, salesorder, and part. However,putting information in different tables causes problems. How do we record which sales orders belong to whichcustomers? How do we record the parts for the sales orders? How do we record which employee receivedthe sales order? The answer is to assign unique numbers to every customer, employee, and part. When wewant to record the customer in the salesorder table, we put the customer’s number in the salesorder table.When we want to record which employee took the order, we put the employee’s number in the salesordertable. When we want to record which part has been ordered, we put the part number in the salesorder table.

45

442344244425442644274428442944304431443244334434443544364437443844394440444144424443444444454446444744484449445044514452445344544455445644574458445944604461446244634464446544664467446844694470447144724473447444754476447744784479448044814482448344844485448644874488

46 CHAPTER 6. JOINING TABLES

test=> SELECT firstname FROM friend WHERE state = ’PA’;firstname---------------Victor(1 row)

test=> SELECT friend.firstname FROM friend WHERE friend.state = ’PA’;firstname---------------Victor(1 row)

test=> SELECT f.firstname FROM friend f WHERE f.state = ’PA’;firstname---------------Victor(1 row)

Figure 6.1: Qualified column names

Breaking up the information into separate tables allows us to keep detailed information about customers,employees, and parts. It also allows us to refer to those specific entries as many times as needed by using aunique number. This is illustrated in figure 6.2.

Customer Employee Part

Salesorder

Figure 6.2: Joining tables

People might question whether it is necessary to use separate tables. While not necessary, it is often agood idea. Without having a separate customer table, every piece of information about a customer would haveto be stored in the salesorder table every time a salesorder row was added. The customer’s name, telephonenumber, address, and other information would have to be repeated. Any change in customer information,like a change in telephone number, would have to be performed in all places that information is stored. Witha customer table, the information is stored in one place, and each salesorder points to the customer table. Thisis more efficient, and allows easier administration and data maintenance. The advantages of using multipletables are:

448944904491449244934494449544964497449844994500450145024503450445054506450745084509451045114512451345144515451645174518451945204521452245234524452545264527452845294530453145324533453445354536453745384539454045414542454345444545454645474548454945504551455245534554

6.3. CREATING JOINED TABLES 47

• Easier data modification

• Easier data lookup

• Data stored in only one place

• Less storage space required

The only time duplicate data need not be moved to a separate table is when all of these are true:

• Time required to perform a join is prohibitive.

• Data lookup is unnecessary.

• Duplicate data requires little storage space.

• Data is very unlikely to change.

The customer, employee, part, and salesorder example clearly benefits from multiple tables.1

6.3 Creating Joined Tables

Figure 6.3 shows the SQL statements needed to create those tables.2 The customer, employee, and part tableseach have a column to hold their unique identification numbers. The salesorder3 table has columns to holdthe customer, employee, and part numbers associated with the sales order. For the sake of simplicity, we willassume that each salesorder contains only one part number.

We have used underscores(_) to allow multiple words in column names, i.e. customer_id. This is common.You could enter the column as CustomerId, but POSTGRESQL converts all identifiers, like column and tablenames, to lowercase, so the actual column name becomes customerid, which is not very clear. You can’t putspaces in table or column names either unless you put double quotes(") around the name like "customer id".Double quotes also preserve any capitalization you supply. If you decide to use this feature, you have to putdouble quotes around the table or column name every time you reference it. This can be cumbersome.

Keep in mind that all table and column names not protected by double quotes should be made up of onlyletters, numbers, and the underscore character. Each name must start with a letter, not a number. Don’tuse punctuation, except underscore, in your names either. For example, address, office, and zipcode9 are validnames, while 2pair and my# are not.

The example also shows the existence of a column named customer_id in two tables. This is done becausethe two columns contain the same type of number, a customer identification number. Naming them the sameclearly shows which columns join the tables together. If you wanted to use unique names, you could name thecolumn salesorder_customer_id or sales_cust_id. This makes the column names unique, but still documentsthe columns to be joined.

Figure 6.4 shows the insertion of a row into the customer, employee, and part tables. It also shows theinsertion of a row into the salesorder table, using the same customer, employee, and part numbers to link thesalesorder row to the other rows we inserted.4 For simplicity, we will use only a single row per table.

1The process of distributing data across multiple tables to prevent redundancy is called data normalization.2In the real-world, the name columns would be much longer, perhaps char(60) or char(180). You should base the length on the

longest name you may ever wish to store. I am using short names so they display properly in the examples.3A table can not be called order. Order is a reserved keyword, for use in the ORDER BY clause. Reserved keywords are not

available as table or column names.4Technically, the column customer.customer_id is a primary key because it is the unique key for each customer row. The column

salesorder.customer_id is a foreign key because it points to another table’s primary key. This is covered in more detail in section 6.13.

455545564557455845594560456145624563456445654566456745684569457045714572457345744575457645774578457945804581458245834584458545864587458845894590459145924593459445954596459745984599460046014602460346044605460646074608460946104611461246134614461546164617461846194620


test=> CREATE TABLE customer (test(> customer_id INTEGER,test(> name CHAR(30),test(> telephone CHAR(20),test(> street CHAR(40),test(> city CHAR(25),test(> state CHAR(2),test(> zipcode CHAR(10),test(> country CHAR(20)test(> )\gCREATEtest=> CREATE TABLE employee (test(> employee_id INTEGER,test(> name CHAR(30),test(> hire_date DATEtest(> )\gCREATEtest=> CREATE TABLE part (test(> part_id INTEGER,test(> name CHAR(30),test(> cost NUMERIC(8,2)test(> )\gCREATEtest=> CREATE TABLE salesorder (test(> order_id INTEGER,test(> customer_id INTEGER, -- joins to customer.customer_idtest(> employee_id INTEGER, -- joins to employee.employee_idtest(> part_id INTEGER, -- joins to part.part_idtest(> order_date DATE,test(> ship_date DATE,test(> payment NUMERIC(8,2)test(> )\gCREATE

Figure 6.3: Creation of company tables

462146224623462446254626462746284629463046314632463346344635463646374638463946404641464246434644464546464647464846494650465146524653465446554656465746584659466046614662466346644665466646674668466946704671467246734674467546764677467846794680468146824683468446854686

6.3. CREATING JOINED TABLES 49

test=> INSERT INTO customer VALUES (test(> 648,test(> ’Fleer Gearworks, Inc.’,test(> ’1-610-555-782’,test(> ’830 Winding Way’,test(> ’Millersville’,test(> ’AL’,test(> ’35041’,test(> ’USA’test(> )\gINSERT 18838 1test=> INSERT INTO employee VALUES (test(> 24,test(> ’Lee Meyers’,test(> ’10/16/1989’test(> )\gINSERT 18839 1test=> INSERT INTO part VALUES (test(> 153,test(> ’Garage Door Spring’,test(> 18.39test(> )\gINSERT 18840 1test=> INSERT INTO salesorder VALUES(test(> 14673,test(> 648,test(> 24,test(> 153,test(> ’7/19/1994’,test(> ’7/28/1994’,test(> 18.39test(> )\gINSERT 18841 1

Figure 6.4: Insertion into company tables

468746884689469046914692469346944695469646974698469947004701470247034704470547064707470847094710471147124713471447154716471747184719472047214722472347244725472647274728472947304731473247334734473547364737473847394740474147424743474447454746474747484749475047514752


6.4 Performing Joins

With data spread across multiple tables, an important issue is how to retrieve the data. Figure 6.5 shows howto find the customer name for a given order number. It uses two queries. The first gets the customer_id for

test=> SELECT customer_id FROM salesorder WHERE order_id = 14673\gcustomer_id-----------

648(1 row)

test=> SELECT name FROM customer WHERE customer_id = 648\gname

--------------------------------Fleer Gearworks, Inc.(1 row)

Figure 6.5: Finding customer name using two queries

order number 14673. The user then uses the returned customer identification number of 648 in the WHERE

clause of the next query. That query finds the customer name record where the customer_id equals 648.We can call this two query approach a manual join, because the user manually took the result from the firstquery and placed that number into the WHERE clause of the second query.

Fortunately, relational databases can perform this join automatically. Figure 6.6 shows the same join asfigure 6.5 but in a single query. This query shows all the elements necessary to perform the join of two

test=> SELECT customer.name -- returnstest-> FROM customer, salesorder -- tables involvedtest-> WHERE customer.customer_id = salesorder.customer_id AND -- jointest-> salesorder.order_id = 14673\g -- restriction

name--------------------------------Fleer Gearworks, Inc.(1 row)

Figure 6.6: Finding customer name using one query

tables:

• The two tables involved in the join are specified in the FROM clause.

• The two columns needed to perform the join are specified as equal in the WHERE clause.

• The salesorder table’s order number is tested in the WHERE clause.

• The customer table’s customer name is returned from the SELECT.

Internally, the database performs the join by:

• salesorder.order_id = 14673: Find that row in the salesorder table

475347544755475647574758475947604761476247634764476547664767476847694770477147724773477447754776477747784779478047814782478347844785478647874788478947904791479247934794479547964797479847994800480148024803480448054806480748084809481048114812481348144815481648174818

6.5. THREE AND FOUR TABLE JOINS 51

• salesorder.customer_id = customer.customer_id: From the row just found, get the customer_id. Findthe equal customer_id in the customer table.

• customer.name: Return name from the customer table.

You can see the database is performing the same steps as our manual join, but much faster.Notice that figure 6.6 qualifies each column name by prefixing it with the table name, as discussed in

section 6.1. While such prefixing is optional in many cases, in this example it is required because the columncustomer_id exists in both tables mentioned in the FROM clause, customer and salesorder. If this were notdone, the query would generate an error: ERROR: Column ’customer_id’ is ambiguous.

You can perform the join in the opposite direction too. In the previous query, the order number is supplied,and the customer name is returned. In figure 6.7, the customer name is supplied, and the order numberreturned. I have switched the order of items in the FROM clause and in the WHERE clause. The ordering of

test=> SELECT salesorder.order_idtest-> FROM salesorder, customertest-> WHERE customer.name = ’Fleer Gearworks, Inc.’ ANDtest-> salesorder.customer_id = customer.customer_idtest-> \gorder_id----------

14673(1 row)

Figure 6.7: Finding order number for customer name

items is not important in those clauses.

6.5 Three and Four Table Joins

You can perform a three-table join as shown in figure 6.8. The first printed column is the customer name.

test=> SELECT customer.name, employee.nametest-> FROM salesorder, customer, employeetest-> WHERE salesorder.customer_id = customer.customer_id ANDtest-> salesorder.employee_id = employee.employee_id ANDtest-> salesorder.order_id = 14673

name | name--------------------------------+--------------------------------Fleer Gearworks, Inc. | Lee Meyers(1 row)

Figure 6.8: Three-table join

The second column is the employee name. Both columns are labeled name. You could use AS to give thecolumns unique labels. Figure 6.9 shows a four-table join, using AS to make each column label unique. Thefour-table join matches the arrows in figure 6.2, with the arrows of the salesorder table pointing to the otherthree tables.

481948204821482248234824482548264827482848294830483148324833483448354836483748384839484048414842484348444845484648474848484948504851485248534854485548564857485848594860486148624863486448654866486748684869487048714872487348744875487648774878487948804881488248834884


test=> SELECT customer.name AS customer_name,test-> employee.name AS employee_name,test-> part.name AS part_nametest-> FROM salesorder, customer, employee, parttest-> WHERE salesorder.customer_id = customer.customer_id ANDtest-> salesorder.employee_id = employee.employee_id ANDtest-> salesorder.part_id = part.part_id ANDtest-> salesorder.order_id = 14673

customer_name | employee_name | part_name--------------------------------+--------------------------------+---------------------Fleer Gearworks, Inc. | Lee Meyers | Garage Door Spring(1 row)

Figure 6.9: Four-table join

Joins can be performed among tables that are only indirectly related. Suppose you wish to find employeeswho have taken orders for each customer. Figure 6.10 shows such a query. Notice that the query displays justthe customer and employee tables. The salesorder table is used to join the two tables but is not displayed. TheDISTINCT keyword is used because multiple orders taken by the same employee for the same customer wouldmake that employee appear more than once, which was not desired. The second query uses an aggregate toreturn a count for each unique customer, employee pair.

Up to this point, we have had only a single row in each table. As an exercise, add additional customer,employee, and part rows, and add salesorder rows that join to these new entries. You can use figure 6.4 as anexample. You can use any identification numbers you wish. Try the queries already shown in this chapterwith your new data.

6.6 Additional Join Possibilities

At this point, all joins have involved the salesorder table in some form. Suppose we wanted to assign anemployee to manage each customer account. If we add an employee_id column to the customer table, thecolumn could store the identification number of the employee assigned to manage the customer’s account.Figure 6.11 shows how to perform the join between customer and employee tables. The first query finds theemployee name assigned to manage customer number 648. The second query shows the customer namesmanaged by employee 24. Notice the salesorder table is not involved in this query.

Suppose you wanted to assign an employee to be responsible for answering detailed questions aboutparts. Add an employee_id column to the part table, place valid employee identifiers in the column, andperform similar queries as shown in figure 6.12. Adding columns to existing tables is covered in section 13.2.

There are cases where a join could be performed with the state column. For example, to check statecodes for validity5, a statecode table could be created with all valid state codes. An application could checkthe state code entered by the user, and report an error if the state code is not in the statecode table. Anotherexample would be the need to print the full state name in queries. State names could be stored in a separatetable and joined when the full state name is desired. Figure 6.13 shows an example of a statename table. Thisshows two more uses for additional tables:

• Check codes against a list of valid values, i.e. only allow valid state codes5The United States Postal Service has assigned a unique two-letter code to each U.S. state.

488548864887488848894890489148924893489448954896489748984899490049014902490349044905490649074908490949104911491249134914491549164917491849194920492149224923492449254926492749284929493049314932493349344935493649374938493949404941494249434944494549464947494849494950

6.6. ADDITIONAL JOIN POSSIBILITIES 53

test=> SELECT DISTINCT customer.name, employee.nametest-> FROM customer, employee, salesordertest-> WHERE customer.customer_id = salesorder.customer_id andtest-> salesorder.employee_id = employee.employee_idtest-> ORDER BY customer.name, employee.nametest-> \g

name | name--------------------------------+--------------------------------Fleer Gearworks, Inc. | Lee Meyers(1 row)

test=> SELECT DISTINCT customer.name, employee.name, COUNT(*)test-> FROM customer, employee, salesordertest-> WHERE customer.customer_id = salesorder.customer_id andtest-> salesorder.employee_id = employee.employee_idtest-> GROUP BY customer.name, employee.nametest-> ORDER BY customer.name, employee.nametest-> \g

name | name | count--------------------------------+--------------------------------+-------Fleer Gearworks, Inc. | Lee Meyers | 1(1 row)

Figure 6.10: Employees who have taken orders for customers.

test=> SELECT employee.nametest-> FROM customer, employeetest-> WHERE customer.employee_id = employee.employee_id ANDtest-> customer.customer_id = 648

test=> SELECT customer.nametest-> FROM customer, employeetest-> WHERE customer.employee_id = employee.employee_id ANDtest-> employee.employee_id = 24test-> ORDER BY customer.name

Figure 6.11: Joining customer and employee

495149524953495449554956495749584959496049614962496349644965496649674968496949704971497249734974497549764977497849794980498149824983498449854986498749884989499049914992499349944995499649974998499950005001500250035004500550065007500850095010501150125013501450155016


test=> -- find the employee assigned to part number 14673test=> SELECT employee.nametest-> FROM part, employeetest-> WHERE part.employee_id = employee.employee_id ANDtest-> part.part_id = 153

test=> -- find the parts assigned to employee 24test=> SELECT part.nametest-> FROM part, employeetest-> WHERE part.employee_id = employee.employee_id ANDtest-> employee.employee_id = 24test-> ORDER BY name

Figure 6.12: Joining part and employee

test=> CREATE TABLE statename (state CHAR(2),test(> name CHAR(30)test(> )\gCREATEtest=> INSERT INTO statename VALUES (’AL’, ’Alabama’);INSERT 18934 1…

test=> SELECT statename.name AS customer_statenametest-> FROM customer, statenametest-> WHERE customer.customer_id = 648 ANDtest-> customer.state = statename.state

Figure 6.13: Statename table

501750185019502050215022502350245025502650275028502950305031503250335034503550365037503850395040504150425043504450455046504750485049505050515052505350545055505650575058505950605061506250635064506550665067506850695070507150725073507450755076507750785079508050815082

6.7. CHOOSING A JOIN KEY 55

• Store code descriptions, i.e. state code and state name

6.7 Choosing a Join Key

The join key is the value used to link entries between tables. For example, in figure 6.4, 648 is the customerkey, appearing in the customer table to uniquely identify the row, and in the salesorder table to refer to thatspecific customer row.

Some people might question whether an identification number is needed. Should the customer name beused as a join key? Using the customer name as the join key is not good because:

• Numbers are less likely to be entered incorrectly.

• Two customers with the same name would be impossible to distinguish in a join.

• If the customer name changes, all references to that name would have to change.

• Numeric joins are more efficient than long character string joins.

• Numbers require less storage than characters strings.

In the statename table, the two-letter state code is probably a good join key because:

• Two letter codes are easy for users to remember and enter.

• State codes are always unique.

• State codes do not change.

• Short two-letter codes are not significantly slower than integers in joins.

• Two-letter codes do not require significantly more storage than integers.

There are basically two choices for join keys, identification numbers and short character codes. If an itemis referred to repeatedly, it is best to use a short character code as a join key. You can display this key tousers and allow them to refer to customers and employees using codes. Users prefer to identify items byshort, fixed-length character codes containing numbers and letters. For example, customers can be identifiedby six-character codes, FLE001, employees by their initials, BAW, and parts by five-character codes, E7245.Codes are easy to use and remember. In many cases, users can choose the codes, as long as they are unique.

It is possible to allow users to enter short character codes and still use identification numbers as joinkeys. This is done by adding a code column to the table. For the customer table, a new column called code canbe added to hold the customer code. When the user enters a customer code, the query can find the customerid assigned to the customer code, and use that customer id in joins with other tables. Figure 6.14 shows aquery using a customer code to find all order numbers for that customer.

test=> SELECT order_idtest-> FROM customer, salesordertest-> WHERE customer.code = ’FLE001’ ANDtest-> customer.customer_id = salesorder.customer_id

Figure 6.14: Using a customer code

In some cases, identification numbers are fine and codes unnecessary:

508350845085508650875088508950905091509250935094509550965097509850995100510151025103510451055106510751085109511051115112511351145115511651175118511951205121512251235124512551265127512851295130513151325133513451355136513751385139514051415142514351445145514651475148


• Items with short lifespans, i.e. order numbers

• Items without appropriate codes, i.e. payroll batch numbers

• Items used internally and not referenced by users

Defining codes for such values would be useless. It is better to allow the database to assign a unique numberto each item. The next chapter covers database support for assigning unique identifiers.

There is no universal rule about when to choose codes or identification numbers. U.S. states are clearlybetter keyed on codes, because there are only 50 U.S. states, the codes are short, unique, and well known bymost users. At the other extreme, order numbers are best used without codes because there are too manyof them and codes would be of little use.

6.8 One-to-Many Joins

Up to this point, when two tables were joined, one row in the first table matched exactly one row in thesecond table. making the joins one-to-one joins. Imagine if there were more than one salesorder row fora customer id. Multiple order numbers would be printed. That would be a one-to-many join, where onecustomer row joins to more than one salesorder row. Suppose there were no orders made by a customer.Even though there was a valid customer row, if there were no salesorder row for that customer identificationnumber, no rows would be returned. We could call that a one-to-none join.6

Figure 6.15 shows an example. Because the animal table’s 507 rabbit row join to three rows in thevegetable table, the rabbit row is duplicated three times in the output. This is a one-to-many join. There isno join for the 508 cat row in vegetable table, so the 508 cat row does not appear in the output. This is anexample of a one-to-none join.

6.9 Unjoined tables

When joining tables, it is necessary to join each table mentioned in the FROM clause by specifying joins in theWHERE clause. If you list a table name in the FROM clause, but fail to join it in the WHERE clause, the effectis to mark that table as unjoined. This causes it to be paired with every row in the query result. Figure 6.16illustrates this effect using tables from firgure 6.15. The SELECT does not join any column from animal to anycolumn in vegetable, causing every value in animal to be paired with every value in vegetable. This effect iscalled a Cartesian product and is usually not intended. When a query returns many more rows than expected,look for an unjoined table in the query.

6.10 Table Aliases and Self-Joins

In section 6.1, you saw how to refer to specific tables in the FROM clause using a shorter name. Figure 6.17shows a rewrite of the query in figure 6.14 using aliases. A c is used as an alias for the customer table, and sis used as an alias for the salesorder table. Table aliases are handy in these cases.

However, with table aliases, you can even join a table to itself. Such joins are called self-joins. Thesame table is given two different alias names. Each alias then represents a different instance of the table.This might seem like a concept of questionable utility, but it can prove useful. Figure 6.18 shows practicalexamples.

6Many database servers support a special type of join called an outer join that allows non-joined data to appear in the query.Unfortunately, POSTGRESQL does not support outer joins at this time.

514951505151515251535154515551565157515851595160516151625163516451655166516751685169517051715172517351745175517651775178517951805181518251835184518551865187518851895190519151925193519451955196519751985199520052015202520352045205520652075208520952105211521252135214

6.10. TABLE ALIASES AND SELF-JOINS 57

test=> SELECT * FROM animal;animal_id | name-----------+-----------------

507 | rabbit508 | cat

(2 rows)

test=> SELECT * FROM vegetable;animal_id | name-----------+-----------------

507 | lettuce507 | carrot507 | nuts

(3 rows)

test=> SELECT *test-> FROM animal, vegetabletest-> WHERE animal.animal_id = vegetable.animal_id;animal_id | name | animal_id | name-----------+-----------------+-----------+-----------------

507 | rabbit | 507 | lettuce507 | rabbit | 507 | carrot507 | rabbit | 507 | nuts

(3 rows)

Figure 6.15: One-to-many join

test=> SELECT *test-> FROM animal, vegetable;animal_id | name | animal_id | name-----------+-----------------+-----------+-----------------

507 | rabbit | 507 | lettuce508 | cat | 507 | lettuce507 | rabbit | 507 | carrot508 | cat | 507 | carrot507 | rabbit | 507 | nuts508 | cat | 507 | nuts

(6 rows)

Figure 6.16: Unjoined tables

test=> SELECT order_idtest-> FROM customer c, salesorder stest-> WHERE c.code = ’FLE001’ ANDtest-> c.customer_id = s.customer_id

Figure 6.17: Using table aliases

521552165217521852195220522152225223522452255226522752285229523052315232523352345235523652375238523952405241524252435244524552465247524852495250525152525253525452555256525752585259526052615262526352645265526652675268526952705271527252735274527552765277527852795280


test=> SELECT c2.nametest-> FROM customer c, customer c2test-> WHERE c.customer_id = 648 ANDtest-> c.zipcode = c2.zipcode

test=> SELECT c2.name, s.order_idtest-> FROM customer c, customer c2, salesorder stest-> WHERE c.customer_id = 648 ANDtest-> c.zipcode = c2.zipcode ANDtest-> c2.customer_id = s.customer_id ANDtest-> c2.customer_id <> 648test-> \g

test=> SELECT c2.name, s.order_id, p.nametest-> FROM customer c, customer c2, salesorder s, part ptest-> WHERE c.customer_id = 648 ANDtest-> c.zipcode = c2.zipcode ANDtest-> c2.customer_id = s.customer_id ANDtest-> s.part_id = p.part_id ANDtest-> c2.customer_id <> 648

Figure 6.18: Examples of self-joins using table aliases

The first query finds all customers in the same zipcode as customer number 648. The second finds allcustomers in the same zipcode as customer number 648. It then finds the order numbers placed by thosecustomers. We have restricted the c2 table’s customer identification number to not equal 648 because wedon’t want customer 648 to appear in the result. The third query goes farther by retrieving the part numbersassociated with those orders.

6.11 Non-Equijoins

Equijoins are the most common type of join. They use equality comparisons (=) to join tables. Figure 6.19shows our first non-equijoin. The first query is a non-equijoin because it uses a not-equal (<>) comparisonto perform the join. It returns all customers not in the same country as customer number 648. The secondquery uses less-than (<) to perform the join. Instead of finding equal values to join, all rows greater-than thecolumn’s value are joined. The query returns the all employees hired after employee number 24. The thirdquery uses greater-than (>) in a similar way. The query returns all parts that cost less than part number153. Non-equijoins are not used very often, but there are certain queries that can only be performed usingnon-equijoins.

6.12 Ordering Multiple Parts

Our customer, employee, part, and salesorder example has a serious limitation. It allows only one part perorder. In the real world, this would never be acceptable. Having covered many complex join topics in thischapter, a more complete database layout can be created to allow multiple parts per order.

528152825283528452855286528752885289529052915292529352945295529652975298529953005301530253035304530553065307530853095310531153125313531453155316531753185319532053215322532353245325532653275328532953305331533253335334533553365337533853395340534153425343534453455346

6.12. ORDERING MULTIPLE PARTS 59

test=> SELECT c2.nametest-> FROM customer c, customer c2test-> WHERE c.customer_id = 648 ANDtest-> c.country <> c2.countrytest-> ORDER BY c2.nametest-> \g

test=> SELECT e2.name, e2.hire_datetest-> FROM employee e, employee e2test-> WHERE e.employee_id = 24 ANDtest-> e.hire_date < e2.hire_datetest-> ORDER BY e2.hire_date, e2.nametest-> \g

test=> SELECT p2.name, p2.costtest-> FROM part p, part p2test-> WHERE p.part_id = 153 ANDtest-> p.cost > p2.costtest-> ORDER BY p2.cost

Figure 6.19: Non-equijoins

Figure 6.20 shows a new version of the salesorder table. Notice that the part_id column has been removed.The customer, employee, and part tables remain unchanged.

test=> CREATE TABLE salesorder (test(> order_id INTEGER,test(> customer_id INTEGER, -- joins to customer.customer_idtest(> employee_id INTEGER, -- joins to employee.employee_idtest(> order_date DATE,test(> ship_date DATE,test(> payment NUMERIC(8,2)test(> )\gCREATE

Figure 6.20: New salesorder table for multiple parts per order

Figure 6.21 shows a new table, orderpart. This table is needed because the original salesorder table couldhold only one part number per order. Instead of putting the part_id in the salesorder table, the orderpart tablewill hold one row for each part number ordered. If five part numbers are in order number 15398, there willbe five rows in the orderpart table with order_id equal to 15398.

We have also added a quantity column. If someone orders seven of the same part number, we put onlyone row in the orderpart table, but set the quantity field equal to 7. We have used DEFAULT to set the quantityto one if no quantity is supplied.

Notice there is no price field in the orderpart table. This is because the price is stored in the part table.Anytime the price is needed, a join is performed to get the price. This allows a part’s price to be changed in

534753485349535053515352535353545355535653575358535953605361536253635364536553665367536853695370537153725373537453755376537753785379538053815382538353845385538653875388538953905391539253935394539553965397539853995400540154025403540454055406540754085409541054115412


one place, and all references to it automatically updated.7

test=> CREATE TABLE orderpart(test(> order_id INTEGER,test(> part_id INTEGER,test(> quantity INTEGER DEFAULT 1test(> )\gCREATE

Figure 6.21: Orderpart table

This new table layout illustrates the master / detail use of tables. The salesorder table is the mastertable because it holds information common to each order, such as customer and employee identifiers, andorder date. The orderpart table is the detail table because it contains the specific parts making up the order.Master/detail tables are a common use of multiple tables.

Figure 6.22 shows a variety of queries using the new orderpart table. The queries are of increasingcomplexity. The first query already contains the order number of interest, so there is no reason to use thesalesorder table. It goes directly to the orderpart table to find the parts making up the order, and joins to thepart table for part descriptions. The second query does not have the order number. It only has the customerid and order date. It must use the salesorder table to find the order number, and then join to the orderpartand part tables to get order quantities and part information. The third query does not have the customer id,but instead must join to the customer table to get the customer_id for use with the other tables. Notice eachquery displays more columns to the user. The final query computes the total cost of the order. It uses anaggregate to SUM cost times (*) quantity for each part in the order.

6.13 Primary and Foreign Keys

A join is performed by comparing two columns, like customer.customer_id and salesorder.customer_id. Cus-tomer.customer_id is called a primary key because it is the unique (primary) identifier for the customer table.Salesorder.customer_id is called a foreign key because it holds a key to another (foreign) table.

6.14 Summary

Previous chapters covered query tasks. This chapter dealt with technique — the technique of creating anorderly data layout using multiple tables. Acquiring this skill takes practice. Expect to redesign your firsttable layout many times as you improve it.

Good data layout can make your job easier. Bad data layout can make queries a nightmare. As you createyour first real-world tables, you will soon learn to identify good and bad data designs. Continually reviewyour table structures. Refer to this chapter again for ideas. Don’t be afraid to redesign everything. Redesignis hard, but when it is done properly, queries are much easier to craft.

Relational databases excel in their ability to relate and compare data. Tables can be joined and analyzedin ways never anticipated. With good data layout and the power of SQL, you can retrieve an unlimited amountof information from your database.

7In our example, changing part.price would change the price on previous orders of the part. This would be inaccurate. In thereal-world, there would have to be a partprice table to store the part number, price, and effective date for the price.

541354145415541654175418541954205421542254235424542554265427542854295430543154325433543454355436543754385439544054415442544354445445544654475448544954505451545254535454545554565457545854595460546154625463546454655466546754685469547054715472547354745475547654775478

6.14. SUMMARY 61

test=> -- first querytest=> SELECT part.nametest-> FROM orderpart, parttest-> WHERE orderpart.part_id = part.part_id ANDtest-> orderpart.order_id = 15398

test=> -- second querytest=> SELECT part.name, orderpart.quantitytest-> FROM salesorder, orderpart, parttest-> WHERE salesorder.customer_id = 648 ANDtest-> salesorder.order_date = ’7/19/1994’ ANDtest-> salesorder.order_id = orderpart.order_id ANDtest-> orderpart.part_id = part.part_id

test=> -- third querytest=> SELECT part.name, part.cost, orderpart.quantitytest-> FROM customer, salesorder, orderpart, parttest-> WHERE customer.name = ’Fleer Gearworks, Inc.’ ANDtest-> salesorder.order_date = ’7/19/1994’ ANDtest-> salesorder.customer_id = customer.customer_id ANDtest-> salesorder.order_id = orderpart.order_id ANDtest-> orderpart.part_id = part.part_id

test=> -- fourth querytest=> SELECT SUM(part.cost * orderpart.quantity)test-> FROM customer, salesorder, orderpart, parttest-> WHERE customer.name = ’Fleer Gearworks, Inc.’ ANDtest-> salesorder.order_date = ’7/19/1994’ ANDtest-> salesorder.customer_id = customer.customer_id ANDtest-> salesorder.order_id = orderpart.order_id ANDtest-> orderpart.part_id = part.part_id

Figure 6.22: Queries involving orderpart table

547954805481548254835484548554865487548854895490549154925493549454955496549754985499550055015502550355045505550655075508550955105511551255135514551555165517551855195520552155225523552455255526552755285529553055315532553355345535553655375538553955405541554255435544


554555465547554855495550555155525553555455555556555755585559556055615562556355645565556655675568556955705571557255735574557555765577557855795580558155825583558455855586558755885589559055915592559355945595559655975598559956005601560256035604560556065607560856095610

Chapter 7

Numbering Rows

Unique identification numbers and short character codes allow reference to specific rows in a table. Theywere used extensively in the previous chapter. The customer table had a customer_id column that held aunique identification number for each customer. The employee and part tables had similar uniquely numberedcolumns. Those columns were important for joins to those tables

While unique character codes must be supplied by users, unique row numbers can be generated auto-matically using two methods. This chapter shows how to uniquely number rows in POSTGRESQL.

7.1 Object Identification Numbers (OIDs)

Every row in POSTGRESQL is assigned a unique, normally invisible number called an object identificationnumber or OID. When the software is initialized with initdb,1 a counter is created and set to approximatelyseventeen-thousand.2 The counter is used to uniquely number every row. Databases can be created anddestroyed, but the counter continues to increase. The counter is used by all databases, so object identificationnumbers are always unique. No two rows in any table or in any database have the same object id.3

You have seen object identification numbers already. Object identification numbers are displayed afterevery INSERT statement. If you look back at figure 3.4 on page 12, you will see the line INSERT 18720 1.INSERT is the command that was executed, 18720 is the object identification number assigned to the insertedrow, and 1 is the number of rows inserted. A similar line appears after every INSERT statement. Figure 6.4on page 49 shows sequential object identification numbers assigned by consecutive INSERT statements.

Normally, a row’s object identification number is displayed only by INSERT queries. However, if the OID

is specified by a non-INSERT query, it will be displayed, as shown in figure 7.1. The SELECT has accessed thenormally invisible OID column. The OID displayed by the INSERT and the OID displayed by the SELECT are thesame.

Even though no OID column is mentioned in CREATE TABLE statements, every POSTGRESQL table hasan invisible column called OID. The column only appears if you specifically access it.4 The query SELECT *FROM table_name does not display the OID column. SELECT OID, * FROM table_name will display it.

Object identification numbers can be used as primary and foreign key values in joins. Since every rowhas a unique object id, there is no need for a separate column to hold the row’s unique number.

For example, in the previous chapter there was a column called customer.customer_id. This column heldthe customer number. It uniquely identified each row. However, we could have used the row’s object

1See section B.3 for a description of initdb.2Values less than this are reserved for internal use.3Technically, OID’S are unique among all databases sharing a common data directory tree.4There are several other invisible columns. The POSTGRESQL manuals cover their meaning and use.

63

561156125613561456155616561756185619562056215622562356245625562656275628562956305631563256335634563556365637563856395640564156425643564456455646564756485649565056515652565356545655565656575658565956605661566256635664566556665667566856695670567156725673567456755676

64 CHAPTER 7. NUMBERING ROWS

test=> CREATE TABLE oidtest(age INTEGER);CREATEtest=> INSERT INTO oidtest VALUES (7);INSERT 18697 1test=> SELECT oid, age from oidtest;oid | age

-------+-----18697 | 7(1 row)

Figure 7.1: OID test

identification number as the unique number for each row. Then, there would be no need to create the columncustomer.customer_id. Customer.oid would be the unique customer number.

With this change, a similar change would be needed in the salesorder table. We would rename salesor-der.customer_id to salesorder.customer_oid because the column now refers to an OID. The column type shouldbe changed also. Salesorder.customer_id was defined as type INTEGER. The new salesorder.customer_oidcolumn would hold the OID of the customer who made the order. For this reason, we would change thecolumn type from INTEGER to OID. Figure 7.2 shows a new version of the salesorder table using each row’sOID as a join key.

test=> CREATE TABLE salesorder (test(> order_id INTEGER,test(> customer_oid OID, -- joins to customer.oidtest(> employee_oid OID, -- joins to employee.oidtest(> part_oid OID, -- joins to part.oid…

Figure 7.2: Columns with OIDs

A column of type OID is similar to an INTEGER column, but defining it as type OID documents that thecolumn holds OID values. Don’t confuse a column of type OID with a the column named OID. Every row has anormally invisible column named OID. A row can have zero, one, or more user-defined columns of type OID.

A column of type OID is not automatically assigned any special value from the database. Only the columnnamed OID is specially assigned during INSERT.

Also, the order_id column in the salesorder table could be eliminated. The salesorder.oid column couldrepresent the unique order number.

7.2 Object Identification Number Limitations

Object identification numbers have some limitations:

• Object identification numbers are normally not sequential in a table.

• Once assigned during INSERT, object identification numbers cannot be modified.

• By default, backups do not record the normally invisible column named OID.

567756785679568056815682568356845685568656875688568956905691569256935694569556965697569856995700570157025703570457055706570757085709571057115712571357145715571657175718571957205721572257235724572557265727572857295730573157325733573457355736573757385739574057415742

7.3. SEQUENCES 65

The global nature of object identification assignment means most OIDs in a table are not sequential. Forexample, if you insert a customer today, and another one tomorrow, the two customers will not get sequentialOIDs. The two customer OIDs could differ by thousands. This is because INSERTs into other tables betweenthe two customer inserts increment the object counter. If the OID is not visible to users, this is not a problem.Non-sequential numbering does not affect query processing. However, if users see and enter these numbers,it might seem strange customer identification numbers are not sequential and have large gaps in numbering.

An OID is assigned to every row during INSERT. UPDATE cannot modify the system-generated OID of arow.

When performing database backups, the system-generated OID of each row is normally not backed up. Aflag must be added to enable the backup of OIDs. See section 18.2 for details.

7.3 Sequences

POSTGRESQL has another way of uniquely numbering rows. They are called sequences. Sequences are namedcounters created by users. After creation, the sequence can be assigned to a table as a column default. Usingsequences, unique numbers can be automatically assigned during INSERT.

The advantage of sequences is that there are no gaps in numeric assignment, as happens with OIDs.5

Sequences are ideal as user-visible identification numbers. If a customer is created today, and anothertomorrow, the two customers will have sequential numbers. This is because no other table shares thesequence counter.

Sequence numbers are unique only within a single table. For example, if a table has a unique rownumbered 937, another table might have a row numbered 937 also, assigned by a different sequence counter.

7.4 Creating Sequences

Sequences are not created automatically like OIDs. You must create sequences using the CREATE SEQUENCE

command. Three functions control the sequence counter. They are listed in table 7.1.

Function Actionnextval(’name’) Returns the next available sequence number, and updates the countercurrval(’name’) Returns the sequence number from the previous nextval() callsetval(’name’,newval) Sets the sequence number counter to the specified value

Table 7.1: Sequence number access functions

Figure 7.3 shows an example of sequence creation and sequence function usage. The first commandcreates the sequence. Then, various sequence functions are called. Note the SELECTs do not have a FROM

clause. Sequence function calls are not directly tied to any table. This figure shows that:

• nextval() returns ever increasing values

• currval() returns the previous sequence value without incrementing

• setval() sets the sequence counter to a new value

Currval() returns the sequence number assigned by a prior nextval() call in the current session. It is notaffected by nextval() calls of other users. This allows reliable retrieval of nextval()-assigned values in latterqueries.

5This is not completely true. Gaps can occur if a query is assigned a sequence number as part of an aborted transaction. Seesection 10.2 for a description of aborted transactions.

574357445745574657475748574957505751575257535754575557565757575857595760576157625763576457655766576757685769577057715772577357745775577657775778577957805781578257835784578557865787578857895790579157925793579457955796579757985799580058015802580358045805580658075808


test=> CREATE SEQUENCE functest_seq;CREATEtest=> SELECT nextval(’functest_seq’);nextval---------

1(1 row)

test=> SELECT nextval(’functest_seq’);nextval---------

2(1 row)

test=> SELECT currval(’functest_seq’);currval---------

2(1 row)

test=> SELECT setval(’functest_seq’, 100);setval--------

100(1 row)

test=> SELECT nextval(’functest_seq’);nextval---------

101(1 row)

Figure 7.3: Examples of sequence function use

580958105811581258135814581558165817581858195820582158225823582458255826582758285829583058315832583358345835583658375838583958405841584258435844584558465847584858495850585158525853585458555856585758585859586058615862586358645865586658675868586958705871587258735874

7.5. USING SEQUENCES TO NUMBER ROWS 67

7.5 Using Sequences to Number Rows

Configuring a sequence to uniquely number rows involves several steps:

• Create the sequence.

• Create the table, defining nextval() as the column default.

• During INSERT, do not supply a value for the sequenced column, or use nextval().

Figure 7.4 shows the use of a sequence for unique row numbering in the customer table. The first state-

test=> CREATE SEQUENCE customer_seq;CREATEtest=> CREATE TABLE customer (test(> customer_id INTEGER DEFAULT nextval(’customer_seq’),test(> name CHAR(30)test(> );CREATEtest=> INSERT INTO customer VALUES (nextval(’customer_seq’), ’Bread Makers’);INSERT 19004 1test=> INSERT INTO customer (name) VALUES (’Wax Carvers’);INSERT 19005 1test=> INSERT INTO customer (name) VALUES (’Pipe Fitters’);INSERT 19008 1test=> SELECT * FROM customer;customer_id | name-------------+--------------------------------

1 | Bread Makers2 | Pipe Fitters3 | Wax Carvers

(3 rows)

Figure 7.4: Numbering customer rows using a sequence

ment creates a sequence counter named customer_seq. The second command creates the customer table,and defines nextval(’customer_seq’) as the default for the customer_id column. The first INSERT manuallysupplies the sequence value for the column. The nextval(’customer_seq’) function call will return the nextavailable sequence number, and increment the sequence counter. The second and third INSERTs allow thenextval(’customer_seq’) DEFAULT be used for the customer_id column. Remember, a column’s DEFAULT valueis used only when a value is not supplied by an INSERT statement. This is covered in section 4.4. The SELECT

shows the sequence has sequentially numbered the customer rows.

7.6 Serial Column Type

There is an easier way to use sequences. If you define a column of type SERIAL, a sequence will beautomatically created, and a proper DEFAULT assigned to the column. Figure 7.5 shows an example of this.Do not be concerned about the index lines in the figure. Indexing is covered in section 11.1.

587558765877587858795880588158825883588458855886588758885889589058915892589358945895589658975898589959005901590259035904590559065907590859095910591159125913591459155916591759185919592059215922592359245925592659275928592959305931593259335934593559365937593859395940


test=> CREATE TABLE customer (test(> customer_id SERIAL,test(> name CHAR(30)test(> );NOTICE: CREATE TABLE will create implicit sequence ’customer_customer_id_-seq’ for SERIAL column ’customer.customer_id’NOTICE: CREATE TABLE/UNIQUE will create implicit index ’customer_customer_id_-key’ for table ’customer’CREATEtest=> \d customer

Table "customer"Attribute | Type | Extra

-------------+----------+------------------------------------------------------------customer_id | int4 | not null default nextval(’customer_customer_id_seq’::text)name | char(30) |Index: customer_customer_id_keytest=> INSERT INTO customer (name) VALUES (’Car Wash’);INSERT 19152 1test=> SELECT * FROM customer;customer_id | name-------------+--------------------------------

1 | Car Wash(1 row)

Figure 7.5: Customer table using SERIAL

594159425943594459455946594759485949595059515952595359545955595659575958595959605961596259635964596559665967596859695970597159725973597459755976597759785979598059815982598359845985598659875988598959905991599259935994599559965997599859996000600160026003600460056006

7.7. MANUALLY NUMBERING ROWS 69

7.7 Manually Numbering Rows

Some people wonder why OIDs and sequences are needed. Why can’t a database user just find the highestnumber in use, add one, and use that as the new unique row number? There are several reasons why OIDsand sequences are preferred:

• Performance

• Concurrency

• Standardization

First, it is usually slow to scan all numbers currently in use to find the next available number. Using acounter in a separate location is faster. Second, there is the problem of concurrency. If one user gets thehighest number, and another user is looking for the highest number at the same time, the two users mightchoose the same next available highest number. Of course, if this happens, the number would not be unique.Such concurrency problems do not occur when using OIDs or sequences. Third, it is more reliable to usedatabase-supplied unique number generation than to generate unique numbers manually.

7.8 Summary

Both OIDs and sequences allow the automatic unique numbering of rows. OIDs are always created andnumbered, while sequences require more work to configure. Both are valuable tools for uniquely numberingrows.

600760086009601060116012601360146015601660176018601960206021602260236024602560266027602860296030603160326033603460356036603760386039604060416042604360446045604660476048604960506051605260536054605560566057605860596060606160626063606460656066606760686069607060716072


607360746075607660776078607960806081608260836084608560866087608860896090609160926093609460956096609760986099610061016102610361046105610661076108610961106111611261136114611561166117611861196120612161226123612461256126612761286129613061316132613361346135613661376138

Chapter 8

Combining SELECTs

This book has covered various topics like regular expressions, aggregates, and joins. These are powerfulSQL features that allow the construction of complex queries. However, in some cases, even these tools arenot enough. This chapter shows how SELECTs can be combined to create even more powerful queries.

8.1 UNION, EXCEPT, INTERSECT Clauses

Sometimes a single SELECT statement can not produce the desired result. UNION, EXCEPT, and INTERSECT

allow SELECT statements to be chained together, allowing more complex queries to be constructed.For example, suppose we want to output the friend table’s firstname and lastname in the same column.

Normally two queries would be required, one for each column. However, with UNION, the output of twoSELECTs can be combined in a single query, as shown in figure 8.1. The query combines two columns into a

test=> SELECT firstnametest-> FROM friendtest-> UNIONtest-> SELECT lastnametest-> FROM friendtest-> ORDER BY 1;

firstname----------------------DeanDickGleasonMillstoneNedSandyTaborVictorWeberYeager(10 rows)

Figure 8.1: Combining two columns with UNION

single output column.

71

613961406141614261436144614561466147614861496150615161526153615461556156615761586159616061616162616361646165616661676168616961706171617261736174617561766177617861796180618161826183618461856186618761886189619061916192619361946195619661976198619962006201620262036204

72 CHAPTER 8. COMBINING SELECTS

UNION allows an unlimited number of SELECT statements to be combined to produce a single result.Each SELECT must return the same number of columns. If the first SELECT returns two columns, the otherSELECTs must return two columns. The column types must be similar also. If the first SELECT returns anINTEGER value in the first column, the other SELECTs must return an INTEGER in their first columns.

With UNION, an ORDER BY clause can be used only at the end of the last SELECT. The ordering applies tothe output of the entire query. In the previous figure 8.1, the ORDER BY clause specifies the ordering columnby number. Instead of a number, we could use ORDER BY firstname because UNION’s output labels are thesame as the column labels of the first SELECT.

As another example, suppose we have two tables that hold information about various animals. One tableholds information about aquatic animals, and another contains information about terrestrial animals. Twoseparate tables are used because each table records information specific to a class of animal. The aquatic_-animal table holds information meaningful only for aquatic animals, like preferred water temperature. Theterrestrial_animal table holds information meaningful only for terrestrial animals, like running speed. Wecould have put the animals in the same table, but it was clearer to keep them separate. In most cases, wedeal with the animal types separately.

However, suppose we need to list all the animals, both aquatic and terrestrial. There is no single SELECT

that will show animals from both tables. We can not join the tables because there is no join key. Joining isnot desired. We want rows from the terrestrial_animal table and the aquatic_animal table output together ina single column. Figure 8.2 shows how these two tables can be combined with UNION.

test=> INSERT INTO terrestrial_animal (name) VALUES (’tiger’);INSERT 19122 1test=> INSERT INTO aquatic_animal (name) VALUES (’swordfish’);INSERT 19123 1test=> SELECT nametest-> FROM aquatic_animaltest-> UNIONtest-> SELECT nametest-> FROM terrestrial_animal;

name--------------------------------swordfishtiger(2 rows)

Figure 8.2: Combining two tables with UNION

By default, UNION prevents duplicate rows from being displayed. For example, figure 8.3 inserts penguininto both tables. However, penguin is not duplicated in the output. To preserve duplicates, you must useUNION ALL, as shown in figure 8.4.

You can do more complex things when chaining SELECTs. EXCEPT allows all rows to be returned from thefirst SELECT except rows that also appear in the second SELECT. Figure 8.5 shows an EXCEPT query. While theaquatic_animal table contains swordfish and penguin, the query returns only swordfish. Penguin is excludedfrom the output because it is returned by the second query. While UNION adds rows to the first SELECT,EXCEPT subtracts rows from the first SELECT.

INTERSECT returns only rows generated by all SELECTs. Figure 8.6 uses INTERSECT and displays onlypenguin. While several animals are returned by the two SELECTs, only penguin is returned by both SELECTs.

Any number of SELECTs can be linked using these methods. The previous examples allowed multiple

620562066207620862096210621162126213621462156216621762186219622062216222622362246225622662276228622962306231623262336234623562366237623862396240624162426243624462456246624762486249625062516252625362546255625662576258625962606261626262636264626562666267626862696270

8.1. UNION, EXCEPT, INTERSECT CLAUSES 73

test=> INSERT INTO aquatic_animal (name) VALUES (’penguin’);INSERT 19124 1test=> INSERT INTO terrestrial_animal (name) VALUES (’penguin’);INSERT 19125 1test=> SELECT nametest-> FROM aquatic_animaltest-> UNIONtest-> SELECT nametest-> FROM terrestrial_animal;

name--------------------------------penguinswordfishtiger(3 rows)

Figure 8.3: UNION with duplicates

test=> SELECT nametest-> FROM aquatic_animaltest-> UNION ALLtest-> SELECT nametest-> FROM terrestrial_animal;

name--------------------------------swordfishpenguintigerpenguin(4 rows)

Figure 8.4: UNION ALL with duplicates

test=> SELECT nametest-> FROM aquatic_animaltest-> EXCEPTtest-> SELECT nametest-> FROM terrestrial_animal;

name--------------------------------swordfish(1 row)

Figure 8.5: EXCEPT restricts output from the first SELECT

627162726273627462756276627762786279628062816282628362846285628662876288628962906291629262936294629562966297629862996300630163026303630463056306630763086309631063116312631363146315631663176318631963206321632263236324632563266327632863296330633163326333633463356336


test=> SELECT nametest-> FROM aquatic_animaltest-> INTERSECTtest-> SELECT nametest-> FROM terrestrial_animal;

name--------------------------------penguin(1 row)

Figure 8.6: INTERSECT returns only duplicated rows

columns to populate a single result column. Without the ability to chain SELECTs using UNION, EXCEPT, andINTERSECT, it would be impossible to generate the desired results. SELECT chaining can do other sophisticatedthings, like joining a column to one table in the first SELECT, and joining the same column to another table inthe second SELECT.

8.2 Subqueries

Subqueries are similar to SELECT chaining. While SELECT chaining combines SELECTs on the same level in aquery, subqueries allow SELECTs to be embedded inside other queries. Subqueries can:

• Take the place of a constant in a comparison

• Take the place of a constant yet vary based on the row being processed

• Return a list of values for use in a comparison

Subqueries as Constants

A subquery, also called a subselect, can take the place of a constant in a query. Though a constant neverchanges, a subquery’s value is recomputed every time the query is executed.

As an example, we will use the friend table shown in figure 8.7. Suppose we want to find friends whoare not in the same state as Dick Gleason. We could place his state in the query using the constant string’NJ’, but if he moves to another state, the query would have to be changed. Using his state column is morereliable.

The figure shows two ways to generate the correct result. One query uses a self-join to do the comparisonto Dick Gleason’s state. The last query uses a subquery which returns his state as ’NJ’. This value is used bythe upper query. The subquery has taken the place of a constant. Unlike a constant, the value is recomputedevery time the query is executed.

Though we have used table aliases in the subquery for clarity, they are not required. A column namewith no table specification is automatically paired with a table in the current subquery. If no matching tableis found in the current subquery, higher parts of the query are searched for a match. State, firstname, andlastname in the subquery refer to the instance of the friend table in the subquery. The same column namesin the upper query automatically refer to the friend instance in the upper query. If a column name matchestwo tables in the same subquery, an error is returned indicating the column is ambiguous.

Subqueries can eliminate table joins also. For example, consider the parts order company in figures 6.3and 6.4 on page 48. To find the customer name for order number 14673, we join the salesorder and customer

633763386339634063416342634363446345634663476348634963506351635263536354635563566357635863596360636163626363636463656366636763686369637063716372637363746375637663776378637963806381638263836384638563866387638863896390639163926393639463956396639763986399640064016402

8.2. SUBQUERIES 75

test=> SELECT * FROM friend;firstname | lastname | city | state | age

-----------------+----------------------+-----------------+-------+-----Dean | Yeager | Plymouth | MA | 24Dick | Gleason | Ocean City | NJ | 19Ned | Millstone | Cedar Creek | MD | 27Sandy | Gleason | Ocean City | NJ | 25Sandy | Weber | Boston | MA | 33Victor | Tabor | Williamsport | PA | 22(6 rows)

test=> SELECT f1.firstname, f1.lastname, f1.statetest-> FROM friend f1, friend f2test-> WHERE f1.state <> f2.state ANDtest-> f2.firstname = ’Dick’ ANDtest-> f2.lastname = ’Gleason’test-> ORDER BY firstname, lastname;

firstname | lastname | state-----------------+----------------------+-------Dean | Yeager | MANed | Millstone | MDSandy | Weber | MAVictor | Tabor | PA(4 rows)

test=> SELECT f1.firstname, f1.lastname, f1.statetest-> FROM friend f1test-> WHERE f1.state <> (test(> SELECT f2.statetest(> FROM friend f2test(> WHERE f2.firstname = ’Dick’ ANDtest(> f2.lastname = ’Gleason’test(> )test-> ORDER BY firstname, lastname;

firstname | lastname | state-----------------+----------------------+-------Dean | Yeager | MANed | Millstone | MDSandy | Weber | MAVictor | Tabor | PA(4 rows)

Figure 8.7: Friends not in Dick Gleason’s state

640364046405640664076408640964106411641264136414641564166417641864196420642164226423642464256426642764286429643064316432643364346435643664376438643964406441644264436444644564466447644864496450645164526453645464556456645764586459646064616462646364646465646664676468


tables. This is shown as the first query in figure 8.8. The second query does not have a join, but instead gets

test=> SELECT nametest-> FROM customer, salesordertest-> WHERE customer.customer_id = salesorder.customer_id ANDtest-> salesorder.order_id = 14673;


test=> SELECT nametest-> FROM customertest-> WHERE customer.customer_id = (test(> SELECT salesorder.customer_idtest(> FROM salesordertest(> WHERE order_id = 14673test(> );


Figure 8.8: Subqueries can replace some joins

the customer_id from a subquery. In general, if a table is involved in only one join, and no columns from thetable appear in the query result, the join can be eliminated and the table moved to a subquery.

In this example, we have specified salesorder.customer_id and customer.customer_id to clearly indicate thetables being referenced. However, this is not required. We could have used only customer_id in both places.POSTGRESQL finds the first table in the same subquery or higher that contains a matching column name.

Subqueries can be used anywhere a computed value is needed. A subquery has its own FROM and WHERE

clauses. It can have its own aggregates, GROUP BY, and HAVING. Its only interaction with the upper query isthe value it returns. This allows sophisticated comparisons that would be difficult if the subquery’s clauseshad to be combined with those of the upper query.

Subqueries as Correlated Values

While subqueries can act as constants in queries, subqueries can also act as correlated values. Correlatedvalues vary based on the row being processed. A normal subquery is evaluated once and its value used bythe upper query. In a correlated subquery, the subquery is evaluated repeatedly for every row processed.

For example, suppose you want to know the name of your oldest friend in each state. You can do thiswith HAVING and table aliases, as shown in the first query of figure 8.9. Another way is to execute a subqueryfor each row which finds the maximum age for that state. If the maximum age equals the age of the currentrow, the row is output, as shown in the second query. The query references the friend table two times, usingaliases f1 and f2. The upper query uses f1. The subquery uses f2. The correlating specification is WHEREf1.state = f2.state. This makes it a correlated subquery because the subquery references a column fromthe upper query. Such a subquery cannot be evaluated once and the same result used for all rows. It mustbe evaluated for every row because the upper column value can change.

646964706471647264736474647564766477647864796480648164826483648464856486648764886489649064916492649364946495649664976498649965006501650265036504650565066507650865096510651165126513651465156516651765186519652065216522652365246525652665276528652965306531653265336534

8.2. SUBQUERIES 77

test=> SELECT f1.firstname, f1.lastname, f1.agetest-> FROM friends f1, friends f2test-> WHERE f1.state = f2.statetest-> GROUP BY f2.state, f1.firstname, f1.lastname, f1.agetest-> HAVING f1.age = max(f2.age)test-> ORDER BY firstname, lastname;

firstname | lastname | age-----------------+----------------------+-----Ned | Millstone | 27Sandy | Gleason | 25Sandy | Weber | 33Victor | Tabor | 22(4 rows)

test=> SELECT f1.firstname, f1.lastname, f1.agetest-> FROM friends f1test-> WHERE age = (test(> SELECT MAX(f2.age)test(> FROM friends f2test(> WHERE f1.state = f2.statetest(> )test-> ORDER BY firstname, lastname;

firstname | lastname | age-----------------+----------------------+-----Ned | Millstone | 27Sandy | Gleason | 25Sandy | Weber | 33Victor | Tabor | 22(4 rows)

Figure 8.9: Correlated subquery

653565366537653865396540654165426543654465456546654765486549655065516552655365546555655665576558655965606561656265636564656565666567656865696570657165726573657465756576657765786579658065816582658365846585658665876588658965906591659265936594659565966597659865996600


Subqueries as List of Values

The previous subqueries returned one row of data to the upper query. If any of the previous subqueriesreturned more than one row, an error would be generated: ERROR: More than one tuple returned by asubselect used as an expression. However, it is possible to use subqueries returning multiple rows.

Normal comparison operators like equal and less-than expect a single value on the left and on theright. Two special comparisons, IN and NOT IN, allow multiple values to appear on the right-hand side. Forexample, the test col IN (1,2,3,4) compares col against four values. If col equals any of the four values,the comparison will return true. The test col NOT IN (1,2,3,4) will will return true if col does not equal anyof the four values.

An unlimited number of values can be specified on the right-hand side of an IN or NOT IN comparison. Inaddition, instead of constants, a subquery can be placed on the right-hand side. The subquery can returnmultiple rows. The subquery is evaluated, and its output used like a list of constant values.

Suppose we want all employees who took sales orders on a certain date. We could perform the query twoways. We could join the employee and salesorder tables, as shown in the first query of figure 8.10. The second

test=> SELECT DISTINCT employee.nametest-> FROM employee, salesordertest-> WHERE employee.employee_id = salesorder.employee_id ANDtest-> salesorder.order_date = ’7/19/1994’;

name--------------------------------Lee Meyers(1 row)

test=> SELECT nametest-> FROM employeetest-> WHERE employee_id IN (test(> SELECT employee_idtest(> FROM salesordertest(> WHERE order_date = ’7/19/1994’test(> );

name--------------------------------Lee Meyers(1 row)

Figure 8.10: Employees who took orders

query uses a subquery. The subquery is evaluated, and generates a list of values used by IN to performthe comparison. The subquery is possible because the salesorder table is involved in a single join, and nocolumns from the salesorder table are returned by the query.

A NOT IN comparison returns true if a column’s value is not found. For example, suppose we want to seeall customers who have never ordered a product. We need to find the customers who have no sales orders.This cannot be done with a join. We need an anti-join, because we want to find all customer rows that donot join to any salesorder row. Figure 8.11 shows the query. The subquery returns a list of customer_idsrepresenting all customers who have placed orders. The upper query returns all customer names where thecustomer_id does not appear in the subquery output.

660166026603660466056606660766086609661066116612661366146615661666176618661966206621662266236624662566266627662866296630663166326633663466356636663766386639664066416642664366446645664666476648664966506651665266536654665566566657665866596660666166626663666466656666

8.2. SUBQUERIES 79

test=> SELECT nametest-> FROM customertest-> WHERE customer_id NOT IN (test(> SELECT customer_idtest(> FROM salesordertest(> );name------(0 rows)

Figure 8.11: Customers who have no orders

NOT IN and Subqueries with NULLs

If a NOT IN subquery returns a NULL row, the NOT IN comparison always returns false. This is becauseNOT IN requires the upper column to be not equal to every value returned by the subquery. Every inequalitycomparison must return true. However, all comparisons with NULL return false, even inequality comparisons,so NOT IN returns false. NULL comparisons are covered in section 4.3.

We can prevent NULLs from reaching the upper query by adding IS NOT NULL to the subquery. As anexample, in figure 8.11, if there were any NULL customer_id values, the query would return no rows. We canprevent this by adding WHERE customer_id IS NOT NULL to the subquery.

An IN subquery does not have this problem with NULLs because IN will return true if it finds any trueequality comparison. NOT IN must find all inequality comparison to be true.

There is another way to analyze subqueries returning NULLs. Suppose a subquery returns three rows,1, 2, and NULL. The test uppercol NOT IN (subquery) expands to uppercol NOT IN (1,2, NULL). This furtherexpands to uppercol <> 1 AND uppercol <> 2 AND uppercol <> NULL. The last comparison with NULL is falsebecause all comparisons with NULL are false, even not equal comparisons. AND returns false if any of itscomparisons return false. Therefore, the NOT IN comparison returns false.

If the test used IN, the comparison would be uppercol = 1 OR uppercol = 2 OR uppercol = NULL. Whilethe last comparison is false, OR will return true if any of the comparisons is true. It does not require them allto be true like AND.

Subqueries Returning Multiple Columns

Most subqueries return a single column to the upper query. However, it is possible to handle subqueriesreturning more than one column. For example, the test WHERE (7, 3) IN (SELECT col1, col2 FROM subtable)returns true if the subquery returns a row with 7 in the first column, and 3 in the second column. The test WHERE(uppercol1, uppercol2) IN (SELECT col1, col2 FROM subtable) performs equality comparisons between theupper two columns and the subquery’s two columns. This allows multiple columns in the upper query to becompared with multiple columns in the subquery. Of course, the number of values specified on the left of IN

or NOT IN must be the same as the number of columns returned by the subquery.

ANY, ALL, and EXISTS Clauses

IN and NOT IN are special cases of the more generic subquery clauses ANY, ALL, and EXISTS. ANY will returntrue if the comparison operator is true for any value in the subquery. The test col < ANY(5,7,9) returns trueif col is less than any of the three values. ALL requires all subquery values to compare as true, so col <

666766686669667066716672667366746675667666776678667966806681668266836684668566866687668866896690669166926693669466956696669766986699670067016702670367046705670667076708670967106711671267136714671567166717671867196720672167226723672467256726672767286729673067316732


ALL(5,7,9) returns true if col is less than all three values. IN is the same as = ANY, and NOT IN is the sameas <> ALL.

Normally, you can use operators like equal and greater-than only with subqueries returning one row.With ANY and ALL, comparisons can be made with subqueries returning multiple rows. They allow you tospecify whether any or all of the subquery values must compare as true.

EXISTS returns true if the subquery returns any rows, and NOT EXISTS returns true if the subquery returnsno rows. By using a correlated subquery, EXISTS allows complex comparisons of upper query values insidethe subquery. For example, two upper query variables can be compared in the subquery’s WHERE clause.EXISTS and NOT EXISTS do not allow values on their left-hand sides, so it does not matter which columns arereturned by the subquery.

For example, figure 8.12 shows the IN subquery from figure 8.10 and the query rewritten using ANY andEXISTS. Notice the EXISTS subquery uses a correlated subquery to join the employee_id columns of the two

SELECT nameFROM employeeWHERE employee_id IN (

SELECT employee_idFROM salesorderWHERE order_date = ’7/19/1994’);

SELECT nameFROM employeeWHERE employee_id = ANY (

SELECT employee_idFROM salesorderWHERE order_date = ’7/19/1994’);

SELECT nameFROM employeeWHERE EXISTS (

SELECT employee_idFROM salesorderWHERE salesorder.employee_id = employee.employee_id AND

order_date = ’7/19/1994’);

Figure 8.12: IN query rewritten using ANY and EXISTS

tables. Figure 8.13 shows the NOT IN query from figure 8.11 and the query rewritten using ALL and NOT

EXISTS.

Summary

A subquery can represent a fixed value, a correlated value, or a list of values. An unlimited number ofsubqueries can be used. Subqueries can be nested inside other subqueries.

673367346735673667376738673967406741674267436744674567466747674867496750675167526753675467556756675767586759676067616762676367646765676667676768676967706771677267736774677567766777677867796780678167826783678467856786678767886789679067916792679367946795679667976798

8.3. OUTER JOINS 81

SELECT nameFROM customerWHERE customer_id NOT IN (

SELECT customer_idFROM salesorder);

SELECT nameFROM customerWHERE customer_id <> ALL (

SELECT customer_idFROM salesorder);

SELECT nameFROM customerWHERE NOT EXISTS (

SELECT customer_idFROM salesorderWHERE salesorder.customer_id = customer.customer_id

);

Figure 8.13: NOT IN query rewritten using ALL and EXISTS

In some cases, subqueries simply allow an additional way to phrase a query. In others, a subquery is theonly way to produce the desired result.

8.3 Outer Joins

An outer join is like a normal join, except special handling is performed to prevent unjoined rows frombeing suppressed in the result. For example, in the join customer.customer_id = salesorder.customer_id, onlycustomers that have sales orders appear in the result. If a customer has no sales orders, he is suppressedfrom the output. However, if the salesorder table is used in an outer join, the result will include all customers.The customer and salesorder tables are joined and output, plus one row for every unjoined customer is output.In the query, any reference to salesorders columns for these unjoined customers returns NULL.

As of POSTGRESQL 7.0, outer joins are not supported. They can be simulated using subqueries and UNION

ALL, as shown in figure 8.14. The first SELECT performs a normal join of the customer and salesorder tables.The second SELECT displays all customer who have no orders, and displays NULL as their order number.

8.4 Subqueries in Non-SELECT Queries

Subqueries can be used in UPDATE and DELETE statements also. Figure 8.15 shows two examples. Thefirst query deletes all customers who have no sales orders. The second query sets the ship_date equal to’11/16/96’ for all orders made by customer Fleer Gearworks, Inc. The numbers after DELETE and UPDATE

indicate the number of rows affected by the queries.

679968006801680268036804680568066807680868096810681168126813681468156816681768186819682068216822682368246825682668276828682968306831683268336834683568366837683868396840684168426843684468456846684768486849685068516852685368546855685668576858685968606861686268636864


SELECT name, order_idFROM customer, salesorderWHERE customer.customer_id = salesorder.customer_idUNION ALLSELECT name, NULLFROM customerWHERE customer.customer_id NOT IN (SELECT customer_id FROM salesorder)ORDER BY name

Figure 8.14: Simulating outer joins

test=> DELETE FROM customertest-> WHERE customer_id NOT IN (test(> SELECT customer_idtest(> FROM salesordertest(> );DELETE 0test=> UPDATE salesordertest-> SET ship_date = ’11/16/96’test-> WHERE customer_id = (test(> SELECT customer_idtest(> FROM customertest(> WHERE name = ’Fleer Gearworks, Inc.’test(> );UPDATE 1

Figure 8.15: Subqueries with UPDATE and DELETE

686568666867686868696870687168726873687468756876687768786879688068816882688368846885688668876888688968906891689268936894689568966897689868996900690169026903690469056906690769086909691069116912691369146915691669176918691969206921692269236924692569266927692869296930

8.5. UPDATE WITH FROM 83

8.5 UPDATE with FROM

UPDATE can have an optional FROM clause, which allows joins to other tables. The FROM clause also allowsthe use of columns from other tables in the SET clause. With this capability, columns can be updated withdata from other tables.

Suppose we want to update the salesorder table’s order_date column. For some reason, some orders existin the system that have order_dates earlier than the hire_date of the employee who recorded the sale. Forthese rows, we wish to set the order_date equal to the employee’s hire_date. Figure 8.16 shows this query.

test=> UPDATE salesordertest-> SET order_date = employee.hire_datetest-> FROM employeetest-> WHERE salesorder.employee_id = employee.employee_id AND

salesorder.order_date < employee.hire_date;UPDATE 0

Figure 8.16: UPDATE the order_date

The FROM clause allows the use of the employee table in the WHERE and SET clauses. While UPDATE canuse subqueries to control which data rows are updated, only the FROM clause allows columns from othertables to be used in the SET clause.

8.6 Inserting Data Using SELECT

Up to this point, every INSERT statement has inserted a single row. Each INSERT had a VALUES clause listingthe constants to be inserted. However, there is a second form of the INSERT statement. It allows the outputof a SELECT to be used to insert values into a table.

Suppose we wish to add all our friends from the friend table to the customer table. Figure 8.17 shows thatinstead of a VALUES clause, INSERT can use the output of SELECT to insert data into the table. Each column

test=> INSERT INTO customer (name, city, state, country)test-> SELECT firstname || ’ ’ || lastname, city, state, ’USA’test-> FROM friend;INSERT 0 6

Figure 8.17: Using SELECT with INSERT

of the SELECT matches a receiving column in the INSERT. Column names and character string constants canbe used in the SELECT output. The line INSERT 0 6 shows six rows were inserted into the customer table. Azero object identifier is returned because more than one row was inserted.

Inserting into the customer name column presents an interesting challenge. The friend table stores firstand last names in separate columns. The customer table has a single name column. The only solution isto combine the firstname and lastname columns, with a space between them. For example, a firstname of’Dean’ and lastname of ’Yeager’ must be inserted into customer.name as ’Dean Yeager’. This is possible usingthe || operator. It allows character strings to be joined together to form a single string, a process calledconcatenation. In this example, firstname, space (’ ’), and lastname are joined using ||.

693169326933693469356936693769386939694069416942694369446945694669476948694969506951695269536954695569566957695869596960696169626963696469656966696769686969697069716972697369746975697669776978697969806981698269836984698569866987698869896990699169926993699469956996


8.7 Creating Tables Using SELECT

In addition to inserting into existing tables, SELECT has an INTO clause that can create a table and place all itsoutput into the new table. For example, suppose we want to create a new table called newfriend just like ourfriend table, but without an age column. This is easily done with the query in figure 8.18. The SELECT…INTO

test=> SELECT firstname, lastname, city, statetest-> INTO newfriendtest-> FROM friend;SELECTtest=> \d newfriend

Table "newfriend"Attribute | Type | Extra-----------+----------+-------firstname | char(15) |lastname | char(20) |city | char(15) |state | char(2) |

test=> SELECT * FROM newfriend;firstname | lastname | city | state

-----------------+----------------------+-----------------+-------Dean | Yeager | Plymouth | MADick | Gleason | Ocean City | NJNed | Millstone | Cedar Creek | MDSandy | Gleason | Ocean City | NJSandy | Weber | Boston | MAVictor | Tabor | Williamsport | PA(6 rows)

Figure 8.18: Table creation with SELECT

query:

• Creates a table called newfriend

• Uses SELECT’s column labels to name the columns of the new table

• Uses SELECT’s column types as the column types of the new table

SELECT…INTO is CREATE TABLE and SELECT combined in a single statement. The AS clause can be used tochange the column labels and thus control the column names in the new table. The other queries in thefigure show the new table’s structure and contents.

8.8 Summary

This section introduced a variety of powerful SQL features. It takes experience to know when and how to usethem.

With so many capabilities, it can be difficult to construct queries. Your goal should be to make queries asclear as possible.

699769986999700070017002700370047005700670077008700970107011701270137014701570167017701870197020702170227023702470257026702770287029703070317032703370347035703670377038703970407041704270437044704570467047704870497050705170527053705470557056705770587059706070617062

Chapter 9

Data Types

Data types have been used in previous chapters.This chapter covers them in detail.

9.1 Purpose of Data Types

It is tempting to think databases would be easier to use if there was only one data type – a type that couldhold any type of information: numbers, character strings, or dates. While a single data type would certainlymake table creation simpler, there are definite advantages to having different data types:

Consistent Results Columns of a uniform type produce consistent results. Displaying, ordering, aggre-gates, and joins deliver consistent results. There is no conflict about how different types are comparedor displayed. Selecting from an INTEGER column always yields INTEGER values.

Data Validation Columns of a uniform type accept only properly formated data. Invalid data is rejected. Acolumn of type INTEGER will reject a DATE value.

Compact Storage Columns of a uniform type are stored more compactly.

Performance Columns of a uniform type are processed more quickly.

For these reasons, each column in a relational database can hold only one type of data. Data types can not bemixed within a column.

This limitation can cause some difficulties. For example, in our friend table, there is an age column oftype INTEGER. Only whole numbers can be placed in that column. The values “I will ask for his age soon” or“She will not tell me her age” can not be placed in that column. NULL can represent “I don’t know her age.”The solution is to create an age_comments column of type CHAR() to hold comments which can not be placedin the age field.

9.2 Installed Types

POSTGRESQL supports a large number of data types, as shown in table 9.1. Except for the numeric types, allentered values must be surrounded by single quotes.

85

706370647065706670677068706970707071707270737074707570767077707870797080708170827083708470857086708770887089709070917092709370947095709670977098709971007101710271037104710571067107710871097110711171127113711471157116711771187119712071217122712371247125712671277128

86 CHAPTER 9. DATA TYPES

Category Type DescriptionCharacter string TEXT variable storage length

VARCHAR(length) variable storage length with maximum lengthCHAR(length) fixed storage length, blank-padded to length, internally BPCHAR

Numeric INTEGER integer,�

2 billion range, internally INT4INT2 integer,

�32 thousand range

INT8 integer,��

rangeOID object identifierNUMERIC(precision, decimal) number, user-defined precision and decimal locationFLOAT floating-point number, 15-digit precision, internally FLOAT8FLOAT4 floating-point number, 6-digit precision

Temporal DATE dateTIME timeTIMESTAMP date and timeINTERVAL interval of time

Logical BOOL boolean, true or falseGeometric POINT point

LSEG line segmentPATH list of pointsBOX rectangleCIRCLE circlePOLYGON polygon

Network INET IP address with optional netmaskCIDR IP network addressMACADDR Ethernet MAC address

Table 9.1: POSTGRESQL data types

712971307131713271337134713571367137713871397140714171427143714471457146714771487149715071517152715371547155715671577158715971607161716271637164716571667167716871697170717171727173717471757176717771787179718071817182718371847185718671877188718971907191719271937194

9.2. INSTALLED TYPES 87

Character String

Character string types are the most commonly used data types. They can hold any sequence of letters, digits,punctuation, and other valid ASCII characters.1 Typical character strings are names, descriptions, and mailingaddresses. Any value can be stored in a character string. However, character strings should be used onlywhen other data types are inappropriate. The other types provide data validation, more compact storage, andbetter performance.

There are three character string data types: TEXT, VARCHAR(length), and CHAR(length). TEXT allowsan unlimited number of characters to be stored. VARCHAR(length) limits the length of the field to lengthcharacters. Both TEXT and VARCHAR() store only the number of characters in the string. CHAR(length) issimilar to VARCHAR(), except it always stores exactly length characters. It pads the value with trailing spacesto the specified length. It provides slightly faster access than TEXT or VARCHAR().

Understanding why character string types are different from other data types can be difficult. For example,you can store 763 as a character string. In this case, you are storing the symbols 7, 6, and 3, not the numericvalue 763. You can’t add a number to the character string 763 because it doesn’t make sense to add a numberto three symbols. Similarly, the character string 3/8/1992 is eight symbols starting with 3 and ending with2. If you store it in a character string data type, it is not a date. You can not sort it with other values andexpect them to be in chronological order. The string 1/4/1998 is less than 3/6/1992 when these are sorted ascharacter strings because 1 is less than 3.

This illustrates why the other data types are valuable. The other types have a predefined format for theirdata, and can do more appropriate operations on the stored information.

Still, there is nothing wrong with storing numbers or dates in character strings when appropriate. Thestreet address 100 Maple Avenue is best stored in a character string type, even though a number is part ofthe street address. It makes no sense to store the street number in a separate INTEGER field. Also, partnumbers like G8223-9 must be stored in character strings because of the G and dash. In fact, part numbersthat are always five digits, like 32911 or 00413 should be stored in character strings too. They are not realnumbers, but symbols. Leading zeros can not be displayed by INTEGER fields, but are easily displayed incharacter strings.

Numeric

Numeric types allow the storage of numbers. The numeric types are: INTEGER, INT2, INT8, OID, NUMERIC(),FLOAT, and FLOAT4.

INTEGER, INT2, and INT8 store whole numbers of various ranges. Larger ranges require more storage,i.e. INT8 requires twice the storage of INTEGER, and is slower.

OID is used to store POSTGRESQL object identifiers. While INTEGER could be used for this purpose, OID

helps document the meaning of the value stored in the column.NUMERIC(precision, decimal) allows user-defined digits of precision, rounded to decimal places. This type

is slower than the other numeric types.FLOAT and FLOAT4 allow storage of floating-point values. Numbers are stored using fifteen (FLOAT) or

six (FLOAT4) digits of precision. The location of the decimal point is stored separately, so large values like4.78145e+32 can be represented. FLOAT and FLOAT4 are fast and have compact storage, but can produceimprecise rounding during computations. When complete accuracy of floating point values is required,NUMERIC() should be used.

1ASCII is the standard encoding used to map symbols to values. For example, uppercase A maps to the internal value 65.Lowercase a maps to the value 97. Period (.) maps to 46. Space maps to 32.

719571967197719871997200720172027203720472057206720772087209721072117212721372147215721672177218721972207221722272237224722572267227722872297230723172327233723472357236723772387239724072417242724372447245724672477248724972507251725272537254725572567257725872597260


Temporal

Temporal types allow storage of date, time, and time interval information. While these can be stored incharacter strings, it is better to use temporal types, for reasons outlined earlier in this chapter.

The four temporal types are: DATE, TIME, TIMESTAMP, and INTERVAL. DATE allows storage of a single dateconsisting of year, month, and day. The format used to input and display dates is controlled by the DATESTYLE

setting covered in section 4.14 on page 33. TIME allows storage of hour, minute, and second, separated bycolons. TIMESTAMP represents storage of both date and time, i.e. 2000-7-12 17:34:29. INTERVAL representsan interval of time, like 5 hours or 7 days. INTERVAL values are often generated by subtracting two TIMESTAMP

values to find the elapsed time. For example, 1996–12–15 19:00:40 minus 1996–12–8 14:00:10 results in anINTERVAL value of 7 05:00:30, which is seven days, five hours, and thirty seconds. Temporal types can alsohandle timezone designations.

Logical

The only logical type is BOOLEAN. A BOOLEAN field can store only true or false, and of course NULL too. Youcan input true as true, t, yes, y, or 1. False can be input as false, f, no, n, or 0. While true and false can be inputin a variety of ways, true is always output as t and false as f.

Geometric

The geometric types allow storage of geometric primitives. The geometric types are: POINT, LSEG, PATH,BOX, CIRCLE, and POLYGON. Table 9.2 shows the geometric types and typical values.

Types Example NotesPOINT (2,7) (x,y) coordinatesLSEG [(0,0),(1,3)] start and stop points of line segmentPATH ((0,0),(3,0),(4,5),(1,6)) ( ) is a closed path, [ ] is an open pathBox (1,1),(3,3) opposite corner points of a rectangleCIRCLE <(1,2),60> center point and radiusPOLYGON ((3,1),(3,3),(1,0)) points form closed polygon

Table 9.2: Geometric types

Network

The network types are: INET, CIDR, and MACADDR. INET allows storage of an IP address, with or withouta netmask. A typical INET value with netmask is 172.20.90.150 255.255.255.0. CIDR stores IP networkaddresses. It allows a subnet mask to specify the size of the network segment. A typical CIDR value is172.20.90.150/24. MACADDR stores MAC (Media Access Control) addresses. These are assigned to Ethernetnetwork cards at the time of manufacture. A typical MACADDR value is 0:50:4:1d:f6:db.

Internal

There are a variety of types used internally. Psql’s \dT command shows all data types.

726172627263726472657266726772687269727072717272727372747275727672777278727972807281728272837284728572867287728872897290729172927293729472957296729772987299730073017302730373047305730673077308730973107311731273137314731573167317731873197320732173227323732473257326

9.3. TYPE CONVERSION AND CASTING 89

9.3 Type Conversion and Casting

In most cases, values of one type are converted to another type automatically. In rare circumstances whereyou need to explicitly convert one type to another, you can use a function to perform the conversion.Conversion functions have the same names as data types. To convert a value to type INTEGER, use thefunction called integer()., To convert a column date_col of type DATE to type TEXT, use text(): text(date_col).You can also perform type casting using double-colons or the CAST operator, i.e. date_col::text or CAST (datecolAS TEXT).

9.4 Support Functions

Functions allows access to specialized routines from SQL. Functions take one or more arguments, and returna result.

Suppose you want to uppercase a value or column. There is no command for uppercase, but there isa function that will do it. POSTGRESQL has a function called upper. Upper takes a single string argument,and returns the argument in uppercase. The function call upper(col) calls the function upper with col as itsargument, and returns col in uppercase. Figure 9.4 shows an example of the use of the upper function.

test=> SELECT * FROM functest;name------Judy(1 row)

test=> SELECT upper(name) FROM functest;upper-------JUDY(1 row)

Example of a function call

There are many functions available. Table 9.3 shows the most common ones, organized by the data typesthey support. Psql’s \df shows all defined functions and their arguments. Section 17.1 has information aboutall psql commands.

If you call a function with a type for which it is not defined, you will get an error, as shown in the firstquery of figure 9.1. In the first query, the 732 is a number, not a character string. The second query converts732 to a character string so the length can be computed.

9.5 Support Operators

Operators are similar to functions, and are covered in section 4.13 on page 33. Table 9.4 shows the mostcommon operators. Psql’s \do shows all defined operators and their arguments.

All data types have the standard comparison operators <, <=, =, >=, >, and <>. Not all operator/typecombinations are defined. For example, if you try to add two DATE values, you will get an error, as shown inthe first query of figure 9.2.

732773287329733073317332733373347335733673377338733973407341734273437344734573467347734873497350735173527353735473557356735773587359736073617362736373647365736673677368736973707371737273737374737573767377737873797380738173827383738473857386738773887389739073917392


Type Function Example ReturnsCharacter length() length(col) length of colString character_length() character_length(col) length of col, same as length()

octet_length() octet_length(col) length of col, including multi-byte overheadtrim() trim(col) col with leading and trailing spaces removedtrim(BOTH…) trim(BOTH, col) same as trim()trim(LEADING…) trim(LEADING col) col with leading spaces removedtrim(TRAILING…) trim(TRAILING col) col with trailing spaces removedtrim(…FROM…) trim(str FROM col) col with leading and trailing str removedrpad() rpad(col, len) col padded on the right to len charactersrpad() rpad(col, len, str) col padded on the right using strlpad() lpad(col, len) col padded on the left to len characterslpad() lpad(col, len, str) col padded on the left using strupper() upper(col) col uppercasedlower() lower(col) col lowercasedinitcap() initcap(col) col with the first letter capitalizedstrpos() strpos(col, str) position of str in colposition() position(str IN col) same as strpos()substr() substr(col, pos) col starting at position possubstring(…FROM…) substring(col FROM pos) same as substr() abovesubstr() substr(col, pos, len) col starting at position pos for length lensubstring(…FROM…FOR…) substring(col FROM pos FOR len) same as substr() abovetranslate() translate(col, from, to) col with from changed to toto_number() to_number(col, mask) convert col to NUMERIC() based on maskto_date to_date(col, mask) convert col to DATE based on maskto_timestamp to_timestamp(col, mask) convert col to TIMESTAMP based on mask

Numeric round() round(col) round to an integerround() round(col, len) NUMERIC() col rounded to len decimal placestrunc() trunc(col) truncate to an integertrunc() trunc(col, len) NUMERIC() col truncated to len decimal placesabs() abs(col) absolute valuefactorial() factorial(col) factorialsqrt() sqrt(col) square rootcbrt() cbrt(col) cube rootexp() exp(col) exponentialln() ln(col) natural logarithmlog() log(log) base-10 logarithmto_char() to_char(col, mask) convert col to a string based on mask

Temporal date_part() date_part(units, col) units part of colextract(…FROM…) extract(units FROM col) same as date_part()date_trunc() date_trunc(units, col ) col rounded to unitsisfinite() isfinite(col) BOOLEAN indicating if col is a valid datenow() now() TIMESTAMP representing current date and timetimeofday() timeofday() string showing date/time in UNIX formatoverlaps() overlaps(c1, c2, c3, c4) BOOLEAN indicating if col’s overlap in timeto_char() to_char(col, mask) convert col to string based on mask

Geometric see psql’s \df for a list of geometric functionsNetwork broadcast() broadcast(col) broadcast address of col

host() host(col) host address of colnetmask() netmask(col) netmask of colmasklen() masklen(col) mask length of colnetwork() network(col) network address of col

NULL nullif() nullif(col1, col2) return NULL if col1 equals col2, else return col1coalesce() coalesce(col1, col2,…) return first non-NULL argument

Table 9.3: Common functions

739373947395739673977398739974007401740274037404740574067407740874097410741174127413741474157416741774187419742074217422742374247425742674277428742974307431743274337434743574367437743874397440744174427443744474457446744774487449745074517452745374547455745674577458

9.5. SUPPORT OPERATORS 91

test=> SELECT length(755);ERROR: Function ’length(int4)’ does not exist

Unable to identify a function which satisfies the given argument typesYou will have to retype your query using explicit typecasts

test=> SELECT length(text(755));length--------

3(1 row)

Figure 9.1: Error generated by undefined function/type combination.

Type Function Example ReturnsCharacter || col1 || col2 append col2 on to the end of col1String ˜ col ˜ pattern BOOLEAN, col matches regular expression pattern

!˜ col !˜ pattern BOOLEAN, col does not match regular expression pattern˜* col ˜* pattern same as ˜, but case-insensitive!˜* col !˜* pattern same as !˜, but case-insensitive˜˜ col ˜˜ pattern BOOLEAN, col matches LIKE patternLIKE col LIKE pattern same as ˜˜!˜˜ col !˜˜ pattern BOOLEAN, col does not match LIKE patternNOT LIKE col NOT LIKE pattern same as !˜˜

Numeric ! !col factorial+ col1 + col2 addition– col1 – col2 subtraction* col1 * col2 multiplication/ col1 / col2 division% col1 % col2 remainder/moduloˆ col1 ˆ col2 col1 raised to the power of col2

Temporal + col1 + col2 addition of temporal values– col1 – col2 subtraction of temporal values(…) OVERLAPS (…) (c1, c2) OVERLAPS (c3,c4) BOOLEAN indicating col’s overlap in time

Geometric see psql’s \do for a list of geometric operatorsNetwork << col1 << col2 BOOLEAN indicating if col1 is a subnet of col2

<<= col1 <<= col2 BOOLEAN indicating if col1 is equal or a subnet of col2>> col1 >> col2 BOOLEAN indicating if col1 is a supernet of col2>>= col1 >>= col2 BOOLEAN indicating if col1 is equal or a supernet of col2

Table 9.4: Common operators

745974607461746274637464746574667467746874697470747174727473747474757476747774787479748074817482748374847485748674877488748974907491749274937494749574967497749874997500750175027503750475057506750775087509751075117512751375147515751675177518751975207521752275237524


test=> SELECT date(’1/1/1992’) + date(’1/1/1993’);ERROR: Unable to identify an operator ’+’ for types ’date’ and ’date’

You will have to retype this query using an explicit casttest=> SELECT date(’1/1/1992’) + interval(’1 year’);

?column?------------------------1993-01-01 00:00:00-05(1 row)

test=> SELECT timestamp(’1/1/1992’) + ’1 year’;?column?

------------------------1993-01-01 00:00:00-05(1 row)

Figure 9.2: Error generated by undefined operator/type combination

9.6 Support Variables

There are several defined variables. These are shown in table 9.5.

Meaning MeaningCURRENT_DATE current dateCURRENT_TIME current timeCURRENT_TIMESTAMP current date and timeCURRENT_USER user connected to the database

Table 9.5: Common variables

9.7 Arrays

Arrays allow a column to store several simple data values. You can store one-dimensional arrays, two-dimensional arrays, or arrays with any number of dimensions.

An array column is created like an ordinary column, except brackets are used to specify the dimensions ofthe array. The number of dimensions and size of each dimension are for documentation purposes only. Valuesthat do not match the dimensions specified at column creation are not rejected. Figure 9.3 creates a tablewith one-, two-, and three-dimensional INTEGER columns. The first and last columns have sizes specified.

test=> CREATE TABLE array_test (test(> col1 INTEGER[5],test(> col2 INTEGER[][],test(> col3 INTEGER[2][2][]test(> );CREATE

Figure 9.3: Creation of array columns

752575267527752875297530753175327533753475357536753775387539754075417542754375447545754675477548754975507551755275537554755575567557755875597560756175627563756475657566756775687569757075717572757375747575757675777578757975807581758275837584758575867587758875897590

9.7. ARRAYS 93

The first column is a one-dimensional array, also called a list or vector. Values inserted into that columnlook like {3,10,9,32,24} or {20,8,9,1,4}. Each value is a list of integers, surrounded by curly braces. Thesecond column, col2, is a two-dimensional array. Typical values for this column are {{2,9,3},{4,3,5}} or{{18,6},{32,5}}. Notice double braces are used. The outer brace surrounds two one-dimensional arrays.You can think of it as a matrix, with the first one-dimensional array representing the first row of the array,and the second representing the second row of the array. Commas separate the individual elements, andeach pair of braces. The third column of the array_test table is a three-dimensional array, holding values like{{{3,1},{1,9}},{{4,5},{8,2}}}. This is a three-dimensional matrix made up of two 2

�2 matrices. Arrays of

any size can be constructed.

Figure 9.4 shows a query inserting values into array_test, and several queries selecting data from thetable. Brackets are used to access individual array elements.

test=> INSERT INTO array_test VALUES (test(> ’{1,2,3,4,5}’,test(> ’{{1,2},{3,4}}’,test(> ’{{{1,2},{3,4}},{{5,6}, {7,8}}}’test(> );INSERT 52694 1test=> SELECT * FROM array_test;

col1 | col2 | col3-------------+---------------+-------------------------------{1,2,3,4,5} | {{1,2},{3,4}} | {{{1,2},{3,4}},{{5,6},{7,8}}}(1 row)

test=> SELECT col1[4] FROM array_test;col1------

4(1 row)

test=> SELECT col2[2][1] FROM array_test;col2------

3(1 row)

test=> SELECT col3[1][2][2] FROM array_test;col3------

4(1 row)

Figure 9.4: Using arrays

Any data type can be used as an array. If individual elements of the array are accessed frequently orupdated, it is better to use separate columns or tables rather than arrays.

759175927593759475957596759775987599760076017602760376047605760676077608760976107611761276137614761576167617761876197620762176227623762476257626762776287629763076317632763376347635763676377638763976407641764276437644764576467647764876497650765176527653765476557656


9.8 Large Objects(BLOBS)

POSTGRESQL can not store values of more than several thousand bytes using the above data types, nor canbinary data be easily entered within single quotes. Large objects, also called Binary Large Objects or BLOBS,are used to store very large values and binary data.

Large objects allow storage of any operating system file, like images or large text files, directly intothe database. You load the file into the database using lo_import(), and retrieve the file from the databaseusing lo_export(). Figure 9.5 shows an example that stores a fruit name and image. Lo_import() stores

test=> CREATE TABLE fruit (name CHAR(30), image OID);CREATEtest=> INSERT INTO fruittest-> VALUES (’peach’, lo_import(’/usr/images/peach.jpg’));INSERT 27111 1test=> SELECT lo_export(fruit.image, ’/tmp/outimage.jpg’)test-> FROM fruittest-> WHERE name = ’peach’;lo_export-----------

1(1 row)

Figure 9.5: Using large images

/usr/images/peach.jpg into the database and returns an OID value that is inserted into fruit.image. Lo_export()uses the OID value stored in fruit.image to find the large object stored in the database, and places the imageinto the new file /tmp/outimage.jpg. The 1 returned by lo_export() indicates a successful export.

Full pathnames must be used with large objects because the database server is running in a differentdirectory than the psql client. Files are imported and exported by the postgres user, so postgres must havepermission to read the file for lo_import(), and directory write permission for lo_export().

9.9 Summary

Care should be used when choosing data types. The many data types give users great flexibility. Wisedecisions about column names and types give the database structure and consistency. It also improvesperformance and allows efficient data storage. Don’t choose types hastily — you will regret it later.

765776587659766076617662766376647665766676677668766976707671767276737674767576767677767876797680768176827683768476857686768776887689769076917692769376947695769676977698769977007701770277037704770577067707770877097710771177127713771477157716771777187719772077217722

Chapter 10

Transactions and Locks

Up to this point, we have used POSTGRESQL as a sophisticated filing cabinet. However, a database is muchmore. It allows users to view and modify information simultaneously. It helps ensure data integrity. Thischapter explores these database capabilities.

10.1 Transactions

Though you may not have heard the term transaction before, you have already used them. Every SQL

query is executed in a transaction. Transactions give databases an all-or-nothing capability when makingmodifications.

For example, suppose the query UPDATE trans_test SET col = 3 is in the process of modifying 700 rows.And suppose, after it has modified 200 rows, the user types control-C, or the computer reset button is pressed.When the user looks at trans_test, he will see that none of the rows have been updated.

This might surprise you. Because 200 of the 700 rows had already updated, you might suspect 200 rowshad been modified. However, POSTGRESQL uses transactions to guarantee queries are either completed, orhave no effect.

This feature is valuable. Suppose you were executing a query to add $500 to everyone’s salary. Andsuppose you kicked the power cord out of the wall while the update was happening. Without transactions, thequery may have updated half the salaries, but not the rest. It would be difficult to know where the UPDATE

stopped. You would wonder, “Which rows were updated, and which ones were not?” You can’t just re-executethe query, because some people have already received their $500 increase. With transactions, you can checkto see if any of the rows were updated. If one was updated, they all were updated. If not, simply re-executethe query.

10.2 Multi-Statement Transactions

By default, each SQL query runs in its own transaction. Figures 10.1 and 10.2 show two identical queries.

test=> INSERT INTO trans_test VALUES (1);INSERT 130057 1

Figure 10.1: INSERT with no explicit transaction

Figure 10.1 shows a typical INSERT query. Before POSTGRESQL starts the INSERT, it begins a transaction. Itperforms the INSERT, then commits the transaction. This is done automatically for any query with no explicit

95

772377247725772677277728772977307731773277337734773577367737773877397740774177427743774477457746774777487749775077517752775377547755775677577758775977607761776277637764776577667767776877697770777177727773777477757776777777787779778077817782778377847785778677877788

96 CHAPTER 10. TRANSACTIONS AND LOCKS

test=> BEGIN WORK;BEGINtest=> INSERT INTO trans_test VALUES (1);INSERT 130058 1test=> COMMIT WORK;COMMIT

Figure 10.2: INSERT with explicit transaction

transaction. Figure 10.2 shows an INSERT using an explicit transaction. BEGIN WORK starts the transaction,and COMMIT WORK commits the transaction. The only difference between the two queries is that there is animplied BEGIN WORK…COMMIT WORK surrounding first INSERT.

Even more valuable is the ability to bind multiple queries into a single transaction. When this is done,either all the queries execute to completion, or none of them have any effect. For example, figure 10.3 showstwo INSERTs in a transaction. PostgreSQL guarantees either both INSERTs succeed, or none of them.

test=> BEGIN WORK;BEGINtest=> INSERT INTO trans_test VALUES (1);INSERT 130059 1test=> INSERT INTO trans_test VALUES (2);INSERT 130060 1test=> COMMIT WORK;COMMIT

Figure 10.3: Two INSERTs in a single transaction

For a more complicated example, suppose you have a table of bank account balances, and suppose youwish to transfer $100 from one account to another account. This is performed using two queries — anUPDATE to subtract $100 from one account, and an UPDATE to add $100 to another account. The UPDATEsshould either both complete, or none of them. If the first UPDATE completes but not the second, the $100would disappear from the bank records. It would have been subtracted from one account, but never added.Such errors are very hard to find. Multi-statement transactions prevent them from happening. Figure 10.4shows the two queries bound into a single transaction. The transaction forces POSTGRESQL to perform the

test=> BEGIN WORK;BEGINtest=> UPDATE bankacct SET balance = balance - 100 WHERE acctno = ’82021’;UPDATE 1test=> UPDATE bankacct SET balance = balance + 100 WHERE acctno = ’96814’;UPDATE 1test=> COMMIT WORK;COMMIT

Figure 10.4: Multi-statement transaction

queries as a single operation.

778977907791779277937794779577967797779877997800780178027803780478057806780778087809781078117812781378147815781678177818781978207821782278237824782578267827782878297830783178327833783478357836783778387839784078417842784378447845784678477848784978507851785278537854

10.3. VISIBILITY OF COMMITTED TRANSACTIONS 97

When you begin a transaction with BEGIN WORK, you don’t have to commit it using COMMIT WORK. Youcan close the transaction with ROLLBACK WORK and the transaction will be discarded. The database is left asthough the transaction had never been executed. In figure 10.5, the current transaction is rolled back, causingthe DELETE have no effect. Also, if any query inside a multi-statement transaction can not be executed due

test=> INSERT INTO rollback_test VALUES (1);INSERT 19369 1test=> BEGIN WORK;BEGINtest=> DELETE FROM rollback_test;DELETE 1test=> ROLLBACK WORK;ROLLBACKtest=> SELECT * FROM rollback_test;x---1(1 row)

Figure 10.5: Transaction rollback

to an error, the entire transaction is automatically rolled back.

10.3 Visibility of Committed Transactions

Though we have focused on the all-or-nothing nature of transactions, they have other important benefits.Only committed transactions are visible to users. Though the current users sees his changes, other usersdo not see them until the transaction is committed.

For example, figure 10.1 shows two users issuing queries using the default mode in which every statementis in its own transaction. Figure 10.2 shows the same query with user 1 using a multi-query transaction. User

User 1 User 2 NotesSELECT (*) FROM trans_test returns 0

INSERT INTO trans_test VALUES (1) add row to trans_testSELECT (*) FROM trans_test returns 1

SELECT (*) FROM trans_test returns 1

Table 10.1: Visibility of single-query transactions

1 sees the changes made by his transaction. However, user 2 does not see the changes until user 1 commitsthe transaction.

This is another advantage of transactions. They insulate users from seeing uncommitted transactions.Users never see a partially committed view of the database.

As another example, consider the bank account query where we transfered $100 from one bank accountto another. Suppose we were calculating the total amount of money in all bank accounts at the same timethe $100 was being transfered. If we did not see a consistent view of the database, we could have seen the$100 removed from the account, but not see the $100 added. Our bank account total would be wrong. Aconsistent database view means we either see the $100 in its original account, or we see it in its new account.

785578567857785878597860786178627863786478657866786778687869787078717872787378747875787678777878787978807881788278837884788578867887788878897890789178927893789478957896789778987899790079017902790379047905790679077908790979107911791279137914791579167917791879197920


User 1 User 2 NotesBEGIN WORK User 1 starts a transaction

SELECT (*) FROM trans_test returns 0INSERT INTO trans_test VALUES (1) add row to trans_testSELECT (*) FROM trans_test returns 1

SELECT (*) FROM trans_test returns 0COMMIT WORK

SELECT (*) FROM trans_test returns 1

Table 10.2: Visibility using multi-query transactions

Without this feature, we would have to make sure no one was making bank account transfers while we werecalculating the amount of money in all accounts.

While this is a contrived example, real-world database users INSERT, UPDATE, and DELETE data all atthe same time, while others SELECT data. All this activity is orchestrated by the database so each user canoperate in a secure manner, knowing other users will not affect their results in an unpredictable way.

10.4 Read Committed and Serializable Isolation Levels

The previous section illustrated that users only see committed transactions. This does not address whathappens if someone commits a transaction while you are in your own transaction. There are cases whereyou need to control if other transaction commits are seen by your transaction.

POSTGRESQL’s default isolation level, READ COMMITTED, allows you to see other transaction commitswhile your transaction is open. Figure 10.6 illustrates this effect. First, the transaction does a SELECT

test=> BEGIN WORK;BEGINtest=> SELECT COUNT(*) FROM trans_test;count-------

5(1 row)

test=> --test=> -- someone commits INSERT INTO trans_testtest=> --test=> SELECT COUNT(*) FROM trans_test;count-------

6(1 row)

test=> COMMIT WORK;COMMIT

Figure 10.6: Read-committed isolation level

COUNT(*). Then, while sitting at a psql prompt, someone INSERTs into the table. The next SELECT COUNT(*)

792179227923792479257926792779287929793079317932793379347935793679377938793979407941794279437944794579467947794879497950795179527953795479557956795779587959796079617962796379647965796679677968796979707971797279737974797579767977797879797980798179827983798479857986

10.5. LOCKING 99

shows the newly INSERTED row. When another user commits a transaction, it is seen by the currenttransaction, even if it is committed after the current transaction started.

You can prevent your transaction from seeing changes made to the database. SET TRANSACTION ISOLATION

LEVEL SERIALIZABLE changes the isolation level of the current transaction. SERIALIZABLE isolation preventsthe current transaction from seeing commits made by other transactions. Any commit made after the start ofthe first query of the transaction is not visible. Figure 10.7 shows an example of a SERIALIZABLE transaction.

test=> BEGIN WORK;BEGINtest=> SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;SET VARIABLEtest=> SELECT COUNT(*) FROM trans_test;count-------

5(1 row)

test=> --test=> -- someone commits INSERT INTO trans_testtest=> --test=> SELECT COUNT(*) FROM trans_test;count-------

5(1 row)

test=> COMMIT WORK;COMMIT

Figure 10.7: Serializable isolation level

SERIALIZABLE isolation provides a stable view of the database for SELECT transactions. For transactionscontaining UPDATE and DELETE queries, SERIALIZABLE mode is more complicated. SERIALIZABLE isolationforces the database to execute all transactions as though they were run serially, one after another, even ifthey are run concurrently. If two concurrent transactions attempt to update the same row, serializability isimpossible. When this happens, POSTGRESQL forces one transaction to roll back.

For SELECT-only transactions, SERIALIZABLE isolation level should be used when you don’t want to seeother transaction commits during your transaction. For UPDATE and DELETE transactions, SERIALIZABLE

isolation prevents concurrent modification of the same data row, and should be used with caution.

10.5 Locking

Exclusive locks, also called write locks, prevent other users from modifying a row or an entire table. Rowsmodified by UPDATE and DELETE are exclusively locked automatically for the duration of the transaction. Thisprevents other users from making changes to the row until the transaction is either committed or rolledback.

For example, table 10.3 shows two simultaneous UPDATE transactions affecting the same row. One trans-

798779887989799079917992799379947995799679977998799980008001800280038004800580068007800880098010801180128013801480158016801780188019802080218022802380248025802680278028802980308031803280338034803580368037803880398040804180428043804480458046804780488049805080518052


Transaction 1 Transaction 2 NotesBEGIN WORK BEGIN WORK Start both transactionsUPDATE row 64 Transaction 1 exclusively locks row 64

UPDATE row 64 Transaction 2 must wait to see if first transaction commitsCOMMIT WORK Transaction 1 commits. Transaction 2 returns from UPDATE.

COMMIT WORK Transaction 2 commits

Table 10.3: Waiting for a lock

action must wait to see if the other transaction commits or rolls back. If these had been using SERIALIZABLE

isolation level, transaction 2 would have been rolled back automatically if transaction 1 committed.

The only time users must wait for other users is when they are trying to modify the same row. If theymodify different rows, there is no waiting. SELECT queries never have to wait.

Locking is done automatically by the database. However, there are cases when locking must be controlledmanually. For example, figure 10.8 shows a query that first SELECTs a row, then performs an UPDATE. The

test=> BEGIN WORK;BEGINtest=> SELECT *test-> FROM lock_testtest-> WHERE name = ’James’;id | name-----+--------------------------------521 | James(1 row)

test=> --test=> -- the SELECTed row is not lockedtest=> --test=> UPDATE lock_testtest-> SET name = ’Jim’test-> WHERE name = ’James’;UPDATE 1test=> COMMIT WORK;COMMIT

Figure 10.8: SELECT with no locking

problem is another user can modify the James row between the SELECT and UPDATE. To prevent this, youcan use SERIALIZABLE isolation. In this mode, one of the UPDATEs would fail. A better solution is to useSELECT…FOR UPDATE to lock the selected rows. Figure 10.9 shows the same query using SELECT…FOR

UPDATE. Another user can not modify the James row between the SELECT…FOR UPDATE and UPDATE.

You can also manually control locking using the LOCK command. It allows specification of a transaction’slock type and scope. See the LOCK manual page for more information.

805380548055805680578058805980608061806280638064806580668067806880698070807180728073807480758076807780788079808080818082808380848085808680878088808980908091809280938094809580968097809880998100810181028103810481058106810781088109811081118112811381148115811681178118

10.6. DEADLOCKS 101

test=> BEGIN WORK;BEGINtest=> SELECT *test-> FROM lock_testtest-> WHERE name = ’James’test-> FOR UPDATE;id | name-----+--------------------------------521 | James(1 row)

test=> --test=> -- the SELECTed row is lockedtest=> --test=> UPDATE lock_testtest-> SET name = ’Jim’test-> WHERE name = ’James’;UPDATE 1test=> COMMIT WORK;COMMIT

Figure 10.9: SELECT…FOR UPDATE

10.6 Deadlocks

It is possible to create an unrecoverable lock condition, called a deadlock. Figure 10.4 illustrates how twotransactions become deadlocked. In this example, each transaction holds a lock and is waiting for the other

Transaction 1 Transaction2 NotesBEGIN WORK BEGIN WORK Start both transactionsUPDATE row 64 UPDATE row 83 Independent rows write lockedUPDATE row 83 Holds waiting for transaction 2 to release write lock

UPDATE row 64 Attempt to get write lock held by transaction 1auto-ROLLBACK WORK Deadlock detected — transaction 2 automatically rolled back

COMMIT WORK Transaction 1 returns from UPDATE and commits

Table 10.4: Deadlock

transaction’s lock to be released. One transaction must be rolled back by POSTGRESQL because the twotransactions will wait forever. Obviously, if they had acquired locks in the same order no deadlock wouldoccur.

10.7 Conclusion

Single-user database queries are concerned with getting the job done. Multi-user queries must be designedto gracefully handle multiple users accessing the data.

Multi-user interaction can be very confusing. The database is constantly changing. In a multi-user envi-ronment, improperly constructed queries can randomly fail when users perform simultaneously operations.

811981208121812281238124812581268127812881298130813181328133813481358136813781388139814081418142814381448145814681478148814981508151815281538154815581568157815881598160816181628163816481658166816781688169817081718172817381748175817681778178817981808181818281838184


Queries can not assume that rows from previous transactions still exist.By understanding POSTGRESQL’S multi-user behavior, you are now prepared to create robust queries.

Overlapping transactions and locking must always be considered. POSTGRESQL has a powerful set of featuresto allow the construction of reliable multi-user queries.

818581868187818881898190819181928193819481958196819781988199820082018202820382048205820682078208820982108211821282138214821582168217821882198220822182228223822482258226822782288229823082318232823382348235823682378238823982408241824282438244824582468247824882498250

Chapter 11

Performance

In an ideal world, users would never have to be concerned about performance. The system would tune itself.However, databases don’t live in an ideal world. An untuned database can be thousands of times slower thana tuned one, so it pays to take steps to get optimal performance. This chapter shows how to get optimalperformance from your database.

11.1 Indexes

Indexing allows fast retrieval of specific rows from a table. For example, consider the query SELECT * FROM

bigtable WHERE col = 43. Without an index, POSTGRESQL must scan through the entire table looking forrows where col equals 43. With an index on col, POSTGRESQL can go directly to the rows where col equals43, bypassing all other rows.

Tables can be accessed two ways. Without an index, POSTGRESQL starts at the beginning of the tablefile and reads to the end, looking for relevant rows. Wih an index, POSTGRESQL looks up specific values inthe index, and goes directly to the matching rows. For large tables, it can take minutes to check every row.Using an index, finding a specific row takes fractions of a second.

Internally, POSTGRESQL stores user data in operating system files. Each table is in its own file, and rowsare stored one after another in the file. An index is a separate file that is sorted by one or more columns. Itcontains pointers into the table file, allowing rapid access to specific values in the table.

Unfortunately, POSTGRESQL does not create indexes automatically. Users must create them for fre-quently accessed columns. Indexes are created using the CREATE INDEX command, as shown in figure 11.1.In this example, bigtable_custid_idx is the name of the index, bigtable is the table being indexed, and cus-

test=> CREATE INDEX bigtable_custid_idx ON bigtable(customer_id);CREATE

Figure 11.1: Example of CREATE INDEX

tomer_id is the column being indexed. You can use any name for the index, but it is good to use the tableand column names as part of the index name, i.e. bigtable_customer_id_idx or i_bigtable_custid. This indexis only useful for finding rows in bigtable for specific customer_ids. It can not help when accessing othercolumns because indexes are sorted by a specific column.

You can create as many indexes as you wish. Of course, an index on a seldom used column is a wasteof disk space, and every table modification requires an update to each index, so performance can suffer withtoo many indexes.

103

825182528253825482558256825782588259826082618262826382648265826682678268826982708271827282738274827582768277827882798280828182828283828482858286828782888289829082918292829382948295829682978298829983008301830283038304830583068307830883098310831183128313831483158316

104 CHAPTER 11. PERFORMANCE

It is possible to create an index spanning multiple columns. Multi-column indexes are sorted by thefirst indexed column, then in cases where the first column has several equal values, sorting continues usingthe second indexed column. The command CREATE INDEX bigtable_age_gender_idx ON bigtable (age, gender)creates an index on the column age, and where several age rows have the same value, then sorts on gender.Multi-column indexes are only useful with columns that have many duplicate values.

The query SELECT * FROM bigtable WHERE age = 36 AND gender = ’F’ can use the index, as can SELECT *FROM bigtable WHERE age = 36.

However, index bigtable_age_gender_idx is useless if you wish to look up rows only based on gender. Thegender component of the index is used only after the age value has been specified. The query SELECT * FROM

bigtable WHERE gender = ’F’ can not use the index because there is no restriction on the age column, whichis the first part of the index.

CREATE INDEX has several other options. See the manual page for more information.

11.2 Unique Indexes

Unique indexes are like ordinary indexes, except they prevent duplicate values from occurring in the table.For example, figure 11.2 shows the creation of a table and unique index, and then the attempted insertion ofduplicate values. In CREATE INDEX, the word UNIQUE prevents the index from accepting duplicate values.

test=> CREATE TABLE duptest (channel INTEGER);CREATEtest=> CREATE UNIQUE INDEX duptest_channel_idx ON duptest (channel);CREATEtest=> INSERT INTO duptest VALUES (1);INSERT 130220 1test=> INSERT INTO duptest VALUES (1);ERROR: Cannot insert a duplicate key into unique index duptest_channel_idx

Figure 11.2: Example of a unique index

Sometimes unique indexes are created only to prevent duplicate values from appearing in the table.Multi-column unique indexes ensure the combination of all indexed columns remain unique. Unique indexesspeed data access and prevent duplicates.

11.3 Cluster

The CLUSTER command reorders the table file to match the ordering of an index. This is a specializedcommand that is valuable only in cases where performance is critical, and the indexed column has manyequal values.

For example, suppose bigtable.age has many duplicate values, and the query SELECT * FROM bigtableWHERE age = 25 is executed. An index on age allows rapid retrieval of the row locations from the index, but ifthere are thousands of matching rows, they may be scattered in the table file, requiring many disk accesses toretrieve them. CLUSTER reorders the table, placing duplicate values next to each other. This speeds accessfor large queries accessing many duplicate values. See the CLUSTER manual page for more information.

831783188319832083218322832383248325832683278328832983308331833283338334833583368337833883398340834183428343834483458346834783488349835083518352835383548355835683578358835983608361836283638364836583668367836883698370837183728373837483758376837783788379838083818382

11.4. VACUUM 105

11.4 Vacuum

When POSTGRESQL updates a row, it keeps the old copy of the row in the table file and writes a new one.The old row is marked as expired, and is used by other transactions still viewing the database in its priorstate. Deletions are similarly marked as expired, but not removed from the file.

The VACUUM command removes expired rows from the table file. While it removes them, it moves rowsfrom the end of the table into the expired spots, thereby compacting the table. While VACUUM is doing this,no one is allowed to access the table.

There are two ways to run VACUUM. VACUUM alone vacuums all tables in the database. VACUUM tablenamevacuums a single table.

The VACUUM command should be run periodically to clean out expired rows. For tables that are heavilymodified, it may be useful to run VACUUM every night in an automated manner. For tables with fewmodifications, VACUUM should be run as needed.

11.5 Vacuum Analyze

VACUUM ANALYZE is an enhanced version of VACUUM. While VACUUM ANALYZE is vacuuming a table, it collectsstatistics about the each column’s proportion of duplicate values and the maximum and minium values. Thisinformation is used by POSTGRESQL when deciding how to efficiently execute complex queries. VACUUM

ANALYZE should be run when the data stored in a table dramatically changes.

11.6 EXPLAIN

EXPLAIN reports how POSTGRESQL executes queries. For example, figure 11.3 shows a SELECT querypreceeded with the word EXPLAIN. EXPLAIN causes POSTGRESQL to display how the query will be executed,

test=> EXPLAIN SELECT customer_id FROM bigtable;NOTICE: QUERY PLAN:

Seq Scan on bigtable (cost=0.00..15.00 rows=1000 width=4)

EXPLAIN

Figure 11.3: Using EXPLAIN

rather than executing it. In the figure, POSTGRESQL reports a sequential scan will be used on bigtable,meaning it will scan the entire table. Cost is an estimate of the effort required to perform the query. Thenumbers are only meaningful for comparison. Rows indicates the number of rows it expects to return. Widthis the number of bytes per row.

Figure 11.4 shows more interesting examples of EXPLAIN. The first EXPLAIN shows a SELECT with therestriction customer_id = 55. This is again a sequential scan, but the restriction makes POSTGRESQL thinkit is returning ten rows. A VACUUM ANALYZE is run, causing the next query to properly show a rows valueof one instead of ten. An index is created, and the query rerun. This time, an index scan is used, allowingPOSTGRESQL to go right to the rows where customer_id equals 55. The next query illustrates that if norestriction is supplied, POSTGRESQL properly decides that the index is of no use, and a sequential scan isperformed. The last query uses an ORDER BY that matches the index, so POSTGRESQL uses an index scan.

Much more complex queries can be viewed using EXPLAIN, as shown in figure 11.5. In this example, tab1

838383848385838683878388838983908391839283938394839583968397839883998400840184028403840484058406840784088409841084118412841384148415841684178418841984208421842284238424842584268427842884298430843184328433843484358436843784388439844084418442844384448445844684478448


test=> EXPLAIN SELECT customer_id FROM bigtable WHERE customer_id = 55;NOTICE: QUERY PLAN:


EXPLAINtest=> VACUUM ANALYZE bigtable;VACUUMtest=> EXPLAIN SELECT customer_id FROM bigtable WHERE customer_id = 55;NOTICE: QUERY PLAN:


EXPLAINtest=> CREATE UNIQUE INDEX bigtable_custid_idx ON bigtable (customer_id);CREATEtest=> EXPLAIN SELECT customer_id FROM bigtable WHERE customer_id = 55;NOTICE: QUERY PLAN:

Index Scan using bigtable_custid_idx on bigtable (cost=0.00..2.01 rows=1 width=4)

EXPLAINtest=> EXPLAIN SELECT customer_id FROM bigtable;NOTICE: QUERY PLAN:


EXPLAINtest=> EXPLAIN SELECT * FROM bigtable ORDER BY customer_id;NOTICE: QUERY PLAN:

Index Scan using bigtable_custid_idx on bigtable (cost=0.00..42.00 rows=1000 width=4)

EXPLAIN

Figure 11.4: More complex EXPLAIN examples

844984508451845284538454845584568457845884598460846184628463846484658466846784688469847084718472847384748475847684778478847984808481848284838484848584868487848884898490849184928493849484958496849784988499850085018502850385048505850685078508850985108511851285138514

11.7. SUMMARY 107

test=> EXPLAIN SELECT * FROM tab1, tab2 WHERE col1 = col2;NOTICE: QUERY PLAN:

Merge Join (cost=139.66..164.66 rows=10000 width=8)-> Sort (cost=69.83..69.83 rows=1000 width=4)

-> Seq Scan on tab2 (cost=0.00..20.00 rows=1000 width=4)-> Sort (cost=69.83..69.83 rows=1000 width=4)

-> Seq Scan on tab1 (cost=0.00..20.00 rows=1000 width=4)

EXPLAIN

Figure 11.5: EXPLAIN example using joins

and tab2 are joined on col1 and col2. In this case, each table is sequential scanned, and the result sorted. Thetwo results are then merge joined to produce output. POSTGRESQL also supports hash join and nested loopjoin methods. POSTGRESQL chooses the join method it believes to be the fastest.

11.7 Summary

There are a variety of tools available to speed up POSTGRESQL queries. While they are not required to getresults, performance tuning can produce huge improvements in query speed.

851585168517851885198520852185228523852485258526852785288529853085318532853385348535853685378538853985408541854285438544854585468547854885498550855185528553855485558556855785588559856085618562856385648565856685678568856985708571857285738574857585768577857885798580


858185828583858485858586858785888589859085918592859385948595859685978598859986008601860286038604860586068607860886098610861186128613861486158616861786188619862086218622862386248625862686278628862986308631863286338634863586368637863886398640864186428643864486458646

Chapter 12

Controlling Results

12.1 LIMIT

12.2 Cursors

DECLARE

FETCH

CLOSE

109

864786488649865086518652865386548655865686578658865986608661866286638664866586668667866886698670867186728673867486758676867786788679868086818682868386848685868686878688868986908691869286938694869586968697869886998700870187028703870487058706870787088709871087118712

110 CHAPTER 12. CONTROLLING RESULTS

871387148715871687178718871987208721872287238724872587268727872887298730873187328733873487358736873787388739874087418742874387448745874687478748874987508751875287538754875587568757875887598760876187628763876487658766876787688769877087718772877387748775877687778778

Chapter 13

Table Management

13.1 Temporary Tables

Purpose

Creation

13.2 ALTER TABLE

13.3 Inheritance

Purpose

Creation

Examples

13.4 Views

Creation

Limitations

13.5 Rules

Creation

Limitations

13.6 Triggers

13.7 Primary/Foreign Keys

13.8 NOTIFY and LISTEN

POSTGRESQL allows users to send signals to each other using LISTEN and NOTIFY. For example, suppose auser wants to receive notification when a table is updated. He can register the table name using the LISTEN

111

877987808781878287838784878587868787878887898790879187928793879487958796879787988799880088018802880388048805880688078808880988108811881288138814881588168817881888198820882188228823882488258826882788288829883088318832883388348835883688378838883988408841884288438844

112 CHAPTER 13. TABLE MANAGEMENT

command. If someone updates the table and issues a NOTIFY command, all registered listeners will receivenotification. For more information, see the NOTIFY and LISTEN manual pages.

884588468847884888498850885188528853885488558856885788588859886088618862886388648865886688678868886988708871887288738874887588768877887888798880888188828883888488858886888788888889889088918892889388948895889688978898889989008901890289038904890589068907890889098910

Chapter 14

Importing and Exporting Data

14.1 COPY

Import

Export

DELIMITERS

BINARY

Frontend COPY

113

891189128913891489158916891789188919892089218922892389248925892689278928892989308931893289338934893589368937893889398940894189428943894489458946894789488949895089518952895389548955895689578958895989608961896289638964896589668967896889698970897189728973897489758976

114 CHAPTER 14. IMPORTING AND EXPORTING DATA

897789788979898089818982898389848985898689878988898989908991899289938994899589968997899889999000900190029003900490059006900790089009901090119012901390149015901690179018901990209021902290239024902590269027902890299030903190329033903490359036903790389039904090419042

Chapter 15

Extending POSTGRESQL

15.1 User-Defined Functions

Purpose

Creation

Examples

15.2 User-Defined Operators

Arithmetic Processing

Creation

15.3 User-Defined Types

Purpose

Creation

Indexing

115

904390449045904690479048904990509051905290539054905590569057905890599060906190629063906490659066906790689069907090719072907390749075907690779078907990809081908290839084908590869087908890899090909190929093909490959096909790989099910091019102910391049105910691079108

116 CHAPTER 15. EXTENDING POSTGRESQL

910991109111911291139114911591169117911891199120912191229123912491259126912791289129913091319132913391349135913691379138913991409141914291439144914591469147914891499150915191529153915491559156915791589159916091619162916391649165916691679168916991709171917291739174

Chapter 16

Programming Interfaces

There has been a lot of interest in this section. I plan to cover the strengths of each interface, and show anexample of each. I do not plan to teach how to program in each interface nor provide a list of function callsfor each, though this is possible depending on the publisher.

Add diagram here.

Interface Language Processing Advantageslibpq C compiled speed, portabilitylibpq++ C++ compiled speed, object-orientedecpg C compiled SQL-embedded CODBC C compiled non-UNIX portabilityJDBC Java interpreted portabilitylibpgtcl Tcl/Tk interpreted simplicity, windowingperl perl interpreted text processingpython python interpreted object oriented

Table 16.1: Interface summary

117

917591769177917891799180918191829183918491859186918791889189919091919192919391949195919691979198919992009201920292039204920592069207920892099210921192129213921492159216921792189219922092219222922392249225922692279228922992309231923292339234923592369237923892399240

118 CHAPTER 16. PROGRAMMING INTERFACES

16.1 Need for a Programming Lanuage

16.2 C Language API (LIBPQ)

16.3 Embedded C (ECPG)

16.4 C++ (LIBPQ++)

16.5 JAVA (JDBC)

16.6 ODBC

16.7 PERL (PGSQL_PERL5)

16.8 TCL/TK (LIBPGTCL)

16.9 PYTHON (PYGRESQL)

16.10 Web Access

PHP

CGI

16.11 Server-side Programming

PLPGSQL

PL/TCL

SPI

924192429243924492459246924792489249925092519252925392549255925692579258925992609261926292639264926592669267926892699270927192729273927492759276927792789279928092819282928392849285928692879288928992909291929292939294929592969297929892999300930193029303930493059306

Chapter 17

User Applications

17.1 PSQL

The following sections describe advanced psql features. See chapter 2 for an introduction to psql.

Controlling Query Buffer

You can easily control the psql query buffer. Table 17.1 summarizes the commands. See the psql manualpage for a full description of each item. There is one item of particular interest, edit (\e). This allows youto use your default editor to edit the query buffer. Type \e, and the current contents of the query buffer areloaded into your editor. Exit the editor, and the editor contents are reloaded into your query buffer, ready forexecution.1

Additional PSQL Commands

You can display a variety of information about the current database from psql’s listing commands. Table 17.2summarizes them. See the psql manual page for a full description of each item. You can control the waypsql displays output. Table 17.3 summarizes the output format commands. See the psql manual page for afull description of each item. You can access special psql capabilities. Table 17.4 summarizes them. See thepsql manual page for a full description of each item.

PSQL command-line arguments

You can change the behavior of psql in other ways. You normally start psql from the command line as psqlfollowed by the database name. However, you can add extra arguments between psql and the database nameto modify psql behavior. For example, psql -f file test will read commands from file, rather than fromyour keyboard. There is a summary of the options in table 17.5. Consult the psql manual page for moredetailed information.

1Here is how to set your default editor. At the operating system prompt, try typing:

EDITOR=viexport EDITOR

This sets the default editor to vi. If that doesn’t work, try:

setenv EDITOR vi

Do this at the operating system command prompt, not inside psql.

119

930793089309931093119312931393149315931693179318931993209321932293239324932593269327932893299330933193329333933493359336933793389339934093419342934393449345934693479348934993509351935293539354935593569357935893599360936193629363936493659366936793689369937093719372

120 CHAPTER 17. USER APPLICATIONS

Function psql CommandPrint \pExecute \g or ;Quit \qEdit \eEdit and execute \EBackslash help \?SQL help topics \hSQL help \h topicInclude file \iOutput to file \o fileOutput to command \o |commandShow query history \sSave query history \s fileWrite buffer to file \w fileRun subshell \!

Table 17.1: psql query buffer commands

Listing Type psql CommandAll tables \dtA single table \d tablenameA single index \d indexnameIndexes \diTables and indexes \dFunctions \dfOperators \doAggregates \daSequences \dsSystem tables \dSComments \dd objectDatabases \lGrant permissions \z

Table 17.2: psql listing commands

Modifies psql CommandField alignment \aField separator \f separatorExpanded output \xNo table headings or row count \tCompatibility display \mEnable HTML \HHTML3 options \T htmlHTML caption \C caption

Table 17.3: psql output commands

937393749375937693779378937993809381938293839384938593869387938893899390939193929393939493959396939793989399940094019402940394049405940694079408940994109411941294139414941594169417941894199420942194229423942494259426942794289429943094319432943394349435943694379438

17.1. PSQL 121

Capability psql CommandConnect to another database \connect dbnameCopy datafile into database \copy tablename to|from filename

Table 17.4: psql special capabilities

Option Capability psql ArgumentField justification -AField separator -F separatorNo column titles and row counts -tExtended output format -xEcho query -e

Controlling Output Echo \d query -EQuiet mode -qHTML output -HHTML table options -T optionsList databases -lDisable readline -nExecute query -c queryGet queries from file -f file

Automation Output to file -o fileSingle-step mode -sSingle-step with newline termination -SHostname -h hostname

Connection Port -p portPassword -u

Table 17.5: psql command-line arguments

943994409441944294439444944594469447944894499450945194529453945494559456945794589459946094619462946394649465946694679468946994709471947294739474947594769477947894799480948194829483948494859486948794889489949094919492949394949495949694979498949995009501950295039504

122 CHAPTER 17. USER APPLICATIONS

17.2 PGACCESS

950595069507950895099510951195129513951495159516951795189519952095219522952395249525952695279528952995309531953295339534953595369537953895399540954195429543954495459546954795489549955095519552955395549555955695579558955995609561956295639564956595669567956895699570

Chapter 18

POSTGRESQL Administration

18.1 Creating Users and Databases

18.2 Backup and Restore

18.3 Performance

18.4 Troubleshooting

18.5 Customization

18.6 System Tables

18.7 Access Configuration

Server Access – flags, pg_hba.conf

Database Access create user

Table Access – GRANT

18.8 Internationalization

National Character Encodings

Date Formats

123

957195729573957495759576957795789579958095819582958395849585958695879588958995909591959295939594959595969597959895999600960196029603960496059606960796089609961096119612961396149615961696179618961996209621962296239624962596269627962896299630963196329633963496359636

124 CHAPTER 18. POSTGRESQL ADMINISTRATION

963796389639964096419642964396449645964696479648964996509651965296539654965596569657965896599660966196629663966496659666966796689669967096719672967396749675967696779678967996809681968296839684968596869687968896899690969196929693969496959696969796989699970097019702

Appendix A

Additional Resources

A.1 Frequently Asked Questions (FAQ’S)

A.2 Mailing List Support

A.3 Supplied Documentation

A.4 Commercial Support

A.5 Modifying the Source Code

125

970397049705970697079708970997109711971297139714971597169717971897199720972197229723972497259726972797289729973097319732973397349735973697379738973997409741974297439744974597469747974897499750975197529753975497559756975797589759976097619762976397649765976697679768

126 APPENDIX A. ADDITIONAL RESOURCES

976997709771977297739774977597769777977897799780978197829783978497859786978797889789979097919792979397949795979697979798979998009801980298039804980598069807980898099810981198129813981498159816981798189819982098219822982398249825982698279828982998309831983298339834

Appendix B

Installation

B.1 Getting POSTGRESQL

B.2 Compiling

B.3 Initialization

B.4 Starting the Server

B.5 Creating a Database

127

983598369837983898399840984198429843984498459846984798489849985098519852985398549855985698579858985998609861986298639864986598669867986898699870987198729873987498759876987798789879988098819882988398849885988698879888988998909891989298939894989598969897989898999900

128 APPENDIX B. INSTALLATION

990199029903990499059906990799089909991099119912991399149915991699179918991999209921992299239924992599269927992899299930993199329933993499359936993799389939994099419942994399449945994699479948994999509951995299539954995599569957995899599960996199629963996499659966

Appendix C

PostgreSQL Non-Standard Features byChapter

129

996799689969997099719972997399749975997699779978997999809981998299839984998599869987998899899990999199929993999499959996999799989999100001000110002100031000410005100061000710008100091001010011100121001310014100151001610017100181001910020100211002210023100241002510026100271002810029100301003110032

130 APPENDIX C. POSTGRESQL NON-STANDARD FEATURES BY CHAPTER

100331003410035100361003710038100391004010041100421004310044100451004610047100481004910050100511005210053100541005510056100571005810059100601006110062100631006410065100661006710068100691007010071100721007310074100751007610077100781007910080100811008210083100841008510086100871008810089100901009110092100931009410095100961009710098

Appendix D

Reference Manual

The following is a copy of the reference manual pages (man pages) as they appeared in a pre-release versionof POSTGRESQL 7.0.

The manual pages go here. They are in SGML/Docbook format. Approxi-mately 100 pages.

131

100991010010101101021010310104101051010610107101081010910110101111011210113101141011510116101171011810119101201012110122101231012410125101261012710128101291013010131101321013310134101351013610137101381013910140101411014210143101441014510146101471014810149101501015110152101531015410155101561015710158101591016010161101621016310164

132 APPENDIX D. REFERENCE MANUAL

101651016610167101681016910170101711017210173101741017510176101771017810179101801018110182101831018410185101861018710188101891019010191101921019310194101951019610197101981019910200102011020210203102041020510206102071020810209102101021110212102131021410215102161021710218102191022010221102221022310224102251022610227102281022910230

Bibliography

[Bowman] Bowman et al., The Practical SQL Handbook, Addison–Wesley

[Date, Standard] Date, C.J. A Guide to The SQL Standard, Addison–Wesley

[Date, Introduction] Date. C.J. An Introduction to Database Systems, Addison–Wesley

[Celko] Celko, Joe SQL For Smarties, Morgan, Kaufmann

[Hilton] Hilton, Craig and Jeff Willis, Building Database Applications on the Web Using PHP3,Addison–Wesley

[User’s Guide] POSTGRESQL User’s Guide, http://www.postgresql.org/docs/user

[Tutorial] POSTGRESQL Tutorial, http://www.postgresql.org/docs/tutorial

[Administrator’s Guide] POSTGRESQL Administrators Guide, http://www.postgresql.org/docs/admin

[Programmer’s Guide] POSTGRESQL Programmer’s Guide, http://www.postgresql.org/docs/programmer

[Appendices] POSTGRESQL Appendices, http://www.postgresql.org/docs/postgres/part-appendix.htm

133

http://www.postgresql.org/docs/user

http://www.postgresql.org/docs/tutorial

http://www.postgresql.org/docs/admin

http://www.postgresql.org/docs/programmer

http://www.postgresql.org/docs/postgres/part-appendix.htm

Documents

PostgreSQL: Introduction and Conceptsdb.ucsd.edu/static/cse132b-sp01/Postgress.pdf · PostgreSQL: Introduction and Concepts Bruce Momjian April 3, 2000. ... Appendix D contains a