Upload
andrey-karpov
View
173
Download
0
Embed Size (px)
Citation preview
3/3/2016 Vertica Joins: A Refresher HPE Developer Community
https://community.dev.hpe.com/t5/VerticaBlog/VerticaJoinsARefresher/bap/234901 1/7
Sign up Log In
HPE Dev Community > Big Data & Analytics > Big Data Blogs > Vertica Blog > Vertica Joins: A Refresher
Search our blogs
Options
0 LikesPost a Comment
Vertica Joins: A RefresherBy VickiL a week ago
As a Vertica user, you know that using joins can improve query performance by combining records from one or more
tables. But sometimes, you need to develop complex joins. Vertica supports many different kinds of joins that perform
different functions based on your needs.
In this blog, we’ll give you a refresher on join algorithms and predicates, as well as the following Vertica join types:
In Vertica, we refer to the tables participating in the join as left or right. The left table is specified first in the join
statement, and the right table is the table mentioned second.
Examples used in this blogFor this blog, we’ll use the following scenario:
Your organization wants employees to increase or refresh their computer programming skills. To accomplish this, you
put together a list of courses from a local university that your employees can take. To help keep track of employees and
completed classes, you reference two tables: employees and courses, described in more detail below:
employees: A table with detailed employee information for all of your organization’s 87 employees. For room, the
following outputs only show a portion of the tables:
=> SELECT * FROM employees ORDER BY personal_id ASC;
personal_id | employee_name | employee_age | employee_gender
+++
1 | John | 43 | M
2 | Dan | 52 | M
3 | Lori | 38 | F
4 | Tom | 55 | M
5 | Mary | 39 | F
6 | Virginia | 66 | F
7 | Gary | 63 | M
8 | Rebecca | 19 | F
9 | Steven | 32 | M
10 | Jessica | 25 | F
…
courses: A table the university uses to record each time someone takes a course. This includes people from outside
your organization.
=> SELECT * FROM courses ORDER BY record_id ASC;
record_id | personal_id | course_id | course_name | date_taken
++++
1 | 10 | 1 | Intro to Comp Sci | 20150105
2 | 7 | 4 | Java 303 | 20150405
3 | 53 | 3 | Database Architecture | 20150803
4 | 4 | 2 | SQL 101 | 20160113
5 | 6 | 4 | Java 303 | 20140910
6 | 6 | 1 | Intro to Comp Sci | 20150329
7 | 2 | 2 | SQL 101 | 20141030
My Community
About
Other HPE Communities
Latest Articles
Social Media
Labels
Top tags
HP Vertica Hadoop 7.2
Vertica Place ORC
What's new
Amazon Web Services
Looking Under the Hood atVertica Queries
On the Trail of a Red-TailedHawk!
Vertica Joins: A Refresher
Improving COUNT DISTINCTPerformance with LiveAggregate Projec...
Vertica Big Data Meetup
Jump Start your BI DashboardDevelopment
Cloudera 5.5 Certified for usewith HPE Vertica 7.2.x
Want to Know What Twitterthinks? Use Vertica Pulse forSentime...
Unlimited Access to Vertica forSQL on Hadoop for Free!
Cambridge Big Data AnalyticsMeet Up
7.2 (6)
access policies (1)
Amazon (1)
Amazon Web Services (3)
analytics (4)
Apache (2)
APi (1)
AWS (3)
Backup (1)
beta (1)
BI (1)
big data (22)
blog (1)
Boston (1)
business intelligence (1)
Solutions Services Products About Us SupportHPE Developer Community
Inner joinsLeft, right, and full outer joinsNatural joinsCross joins
3/3/2016 Vertica Joins: A Refresher HPE Developer Community
https://community.dev.hpe.com/t5/VerticaBlog/VerticaJoinsARefresher/bap/234901 2/7
8 | 27 | 2 | SQL 101 | 20130704
9 | 8 | 4 | Java 303 | 20160105
10 | 3 | 3 | Database Architecture | 20150713
…
You can gain valuable insight into the relationship between the employees and courses by performing a variety of joins.
All the examples in this blog will use the employees and courses tables.
How does Vertica perform a join?Vertica uses two basic algorithms to perform a join operation:
The good news is, you don’t necessarily have to pay a whole lot of attention to the join algorithms because the Vertica
optimizer automatically chooses the most appropriate algorithm given the query and projections in a system. You can
facilitate a merge join by adding a projection that is sorted on the join keys.
Join predicatesIn the JOIN SQL statement, the join predicate specifies how Vertica should join the tables. You specify the join predicates
(for example, relational operators like <, <=, >, >=, <>, =, <=>) in the ON clause along with the columns from the left and
right tables that should be combined.
Vertica supports any arbitrary join expression with both matching and non-matching column values. If the clause uses an
equality predicate, indicated with an equal sign (=), the join is considered an equi-join. Consequently, non equi-joins use
predicates other than the equal sign, for example, the greater than sign (>).
For the examples in this blog, we’ll use the equality predicate (=).
Now that you have a basic understanding of what a join is for, let’s look at the different types of joins you can perform
with Vertica.
Inner joinsAn inner join combines records from two tables based on a join predicate, and returns rows from columns specified in the
SELECT statement that satisfy that predicate. Since we’re using the equality predicate, performing an inner join will
return values from columns in the SELECT statement where the columns specified in the JOIN statement match. If a row
from the left table’s joined column matches three rows from the right table’s specified column, the join will return three
rows (provided the columns are specified in the SELECT statement). When a value in the joined column appears in one
table but not the other, that row is not returned.
In the diagram below, the green shaded area represents values that are returned by an inner join. Keep in mind, duplicate
values may be returned.
Amazon Web Services
Analytics AWS
About the Author
Archives
View All
BD_Partner_Eng
Ben_Vandiver
Beth_Favini
BethF
Casey_S HP Big Data
Information Developer
Gary_G
HPBigData_BP
J_Kelley
KathyLynn
Nathan_W
nunziato
Sarah_L
Shubhangi_V
soniyas
VickiL
March 2016
February 2016
January 2016
December 2015
View Complete Archives
A hash join is used to join large data sets. In a hash join, Vertica uses the smaller (or inner) table to build an in-memory hash table on the join column. The Vertica execution engine then scans the larger (or outer) table andexamines the hash table to look for matches. The table size is determined by the number of rows times the size ofeach row. The optimizer chooses a hash join when projections are not sorted on the join columns. Although there areno sort requirements, the cost for performing a hash join can rise if the entire hash table can’t fit in memory.
If both inputs are pre-sorted on the join column, the optimizer chooses a merge join.
3/3/2016 Vertica Joins: A Refresher HPE Developer Community
https://community.dev.hpe.com/t5/VerticaBlog/VerticaJoinsARefresher/bap/234901 3/7
The query used to create an inner join looks like this:
=> SELECT <column list>
FROM <left joined table>
[INNER] JOIN <right joined table>
ON <join condition>
An inner join is the most commonly used type of join so the INNER keyword is optional.
Example
If you want to list your employees who have taken classes, along with information about these classes, you can join the
two tables on their common column, personal_id. This inner join will include rows from both tables where the
personal_id from the employees table matches a value in the courses table’s personal_id column.
=> SELECT e.personal_id AS e_personal_id, employee_name, c.personal_id AS
c_personal_id, c.course_name, c.date_taken FROM employees e INNER JOIN courses c ON
e.personal_id = c.personal_id ORDER BY e.personal_id ASC;
Some employees, like Virginia, show up multiple times because they have taken multiple classes (i.e., her personal id
shows up twice in the courses table). Other employees, like Mary, don’t show up at all because they haven’t taken any
classes yet (i.e., isn’t in the courses table).
But what if you want all the rows from one or both tables to be included in your result set? For that, you can use an outer
join.
Outer joinsOuter joins extend the functionality of inner joins by letting you preserve rows in one or both tables that do not have
matching rows in the other table. Create outer joins using the following syntax:
=> SELECT <column list>FROM <leftjoined table>[ LEFT | RIGHT | FULL ] OUTER JOIN
<rightjoined table>ON <join condition> Vertica gives you the option of performing left
outer joins, right outer joins, or full outer joins.
Left outer joinsA left outer join preserves a complete set of records from the left (preserved) table, along with any matched records in
the right (non-preserved) table. Where Vertica finds no match, it inserts a null value for the right table.
In the diagram below, the green area represents values that are returned by a left outer join:
Example
Using our previous example, let’s say you want to see a list of all your employees along with classes they’ve taken, if
applicable. The result of a left outer join will be a record of every employee and a blank entry for rows from the courses
table if that employee hasn’t taken a course.
=> SELECT e.personal_id AS e_personal_id, employee_name, c.personal_id AS
c_personal_id, c.course_name, c.date_taken FROM employees e LEFT OUTER JOIN courses c
ON e.personal_id = c.personal_id ORDER BY e.personal_id ASC;
3/3/2016 Vertica Joins: A Refresher HPE Developer Community
https://community.dev.hpe.com/t5/VerticaBlog/VerticaJoinsARefresher/bap/234901 4/7
Right outer joinsWhen performing a right outer join, Vertica returns a complete set of records from the right-joined (preserved) table, as
well as matched values from the left-joined (non-preserved) table. Where Vertica finds no match, it returns null values
for the left table.
In the diagram below, the green area represents values that are returned by a right outer join:
Example
You can use this example to list all courses taken, whether or not one of your employees participated.
=> SELECT e.personal_id AS e_personal_id, employee_name, c.personal_id AS
c_personal_id, c.course_name, c.date_taken FROM employees e RIGHT OUTER JOIN courses c
ON e.personal_id = c.personal_id ORDER BY e.personal_id ASC;
Full outer joinsYou can combine left outer joins and right outer joins with a full outer join. A full outer join returns all records specified in
the SELECT list, including nulls (missing matches), from either table in the join.
In the diagram below, the green area represents values that are returned by a full outer join:
Example
A full outer join is useful if you want to see, for example, each employee who has taken a particular class and each class
that has been taken by one of your employees, but you also want to see all the employees who have not taken any
courses, as well as any course that has not been attended by one of your employees:
=> SELECT e.personal_id AS e_personal_id, employee_name, c.personal_id AS
c_personal_id, c.course_name, c.date_taken FROM employees e FULL OUTER JOIN courses c
ON e.personal_id = c.personal_id ORDER BY e.personal_id ASC;
3/3/2016 Vertica Joins: A Refresher HPE Developer Community
https://community.dev.hpe.com/t5/VerticaBlog/VerticaJoinsARefresher/bap/234901 5/7
Natural JoinsA natural join is just a join with an implicit join predicate. An implicit join predicate doesn’t include any explicit join
conditions, such as e.personal_id = c.personal_id. Instead, implicit connections are formed by matching all
pairs of columns in the tables that have the same name and compatible data types. (If the data types are incompatible,
Vertica returns an error.) The result set contains only one column representing the pair of equally-named columns.
Natural joins take the following form:
=> SELECT <column list> FROM <leftjoined table>NATURAL [ INNER | LEFT OUTER | RIGHT
OUTER | FULL OUTER ] JOIN <rightjoined table>
Vertica performs a natural join by creating an inner join (default) on the column common to both tables. This is useful
both syntactically, and if you don’t know the names or commonality of columns.
Example
For example, this query naturally joins the two personal_id columns.
=> SELECT * FROM employees e NATURAL JOIN courses c;
Cross JoinsUse a cross join when you want to all possible combinations of matching one table with another.
Cross joins take the following form:
=> SELECT <column list>FROM <leftjoined table>CROSS JOIN <rightjoined table>
Example
For example, you may want to compare every employee with every course record to see who they’ve taken classes with
(identified by the personal_id number). In the result set, Vertica retrieves a record from the employees table and
creates a row for every record in the courses table. It then does the same for the rest of the records in the employee
table until each row in the employee table is displayed with each row of the courses table. The number of rows equals
the number of rows in the first table multiplied by the number of rows in the second table. In our example, the output is
100 rows (10 * 10). The example output below only shows a sample:
=> SELECT * FROM employees e CROSS JOIN courses ORDER BY e.personal_id ASC;
More InfoFor more information about joins, including restrictions and optimization, see the Vertica documentation.
Also read our blog on range joins!
3/3/2016 Vertica Joins: A Refresher HPE Developer Community
https://community.dev.hpe.com/t5/VerticaBlog/VerticaJoinsARefresher/bap/234901 6/7
Everyone's Tags: Joins refresher Vertica View All (3)
Tags: Joins| refresher| Vertica
Labels: Joins
Post a Comment Permalink
0 Likes
http://
Get a new challenge Get a sound challenge Help with word verification
*Name
Website (optional)
*Word verification by reCAPTCHA
Post Your CommentCancel
Leave a Comment
We encourage you to share your comments on this article. Comments are moderated and will bereviewed and posted as promptly as possible during regular business hours.
To ensure your comment is published, be sure to follow the community guidelines.
Font Family Font Sizes
† The opinions expressed above are the personal opinions of the authors, not of HPE. By using this site, you accept the Terms of Use and Rules of Participation
United States
QuoteRich Text HTML Preview
3/3/2016 Vertica Joins: A Refresher HPE Developer Community
https://community.dev.hpe.com/t5/VerticaBlog/VerticaJoinsARefresher/bap/234901 7/7
Corporate
Accessibility
Careers
Contact Us
Corporate Responsibility
Events
Hewlett Packard Labs
Investor Relations
Leadership
Newsroom
Sitemap
Partners
Find a Partner
Partner Programs
Social
YouTube
Communities
Developer Forums
Enterprise Business
Customer Resources
Enterprise Store
Public Sector Store
Education and Training
Email Signup
Legal
Privacy
Terms of Use
Cookies
© Copyright 2016 Hewlett Packard Enterprise Development LP
HPE employees: Report website issues