How to Use the SQL Union Join

SQL Essentials For Dummies

SQL joins are extremely useful. Unlike the other kinds of SQL join, the union join makes no attempt to match a row from the left source table with any rows in the right source table. It creates a new virtual table that contains the SQL union of all the columns in both source tables. In the virtual result table, the columns that came from the left source table contain all the rows that were in the left source table. For those rows, the columns that came from the right source table all have the null value.

Similarly, the columns that came from the right source table contain all the rows that were in the right source table. For those rows, the columns that came from the left source table all have the null value. Thus, the table resulting from a union join contains all the columns of both source tables — and the number of rows it contains is the sum of the number of rows in the two source tables.

The result of a SQL union join by itself is not immediately useful in most cases; it produces a result table with many nulls in it. But you can get useful information from a union join when you use it in conjunction with the COALESCE expression. Look at an example.

Suppose that you work for a company that designs and builds experimental rockets. You have several projects in the works. You also have several design engineers who have skills in multiple areas. As a manager, you want to know which employees, having which skills, have worked on which projects. Currently, this data is scattered among the EMPLOYEE table, the PROJECTS table, and the SKILLS table.

The EMPLOYEE table carries data about employees, and EMPLOYEE.EmpID is its primary key. The PROJECTS table has a row for each project that an employee has worked on. PROJECTS.EmpID is a foreign key that references the EMPLOYEE table. The SKILLS table shows the expertise of each employee. SKILLS.EmpID is a foreign key that references the EMPLOYEE table.

The EMPLOYEE table has one row for each employee; the PROJECTS table and the SKILLS table have zero or more rows.

The following tables show example data.

EMPLOYEE Table
EmpID	Name
1	Ferguson
2	Frost
3	Toyon

PROJECTS Table
ProjectName	EmpID
X-63 Structure	1
X-64 Structure	1
X-63 Guidance	2
X-64 Guidance	2
X-63 Telemetry	3
X-64 Telemetry	3

SKILLS Table
Skill	EmpID
Mechanical Design	1
Aerodynamic Loading	1
Analog Design	2
Gyroscope Design	2
Digital Design	3
R/F Design	3

From the tables, you can see that Ferguson has worked on X-63 and X-64 structure design and has expertise in mechanical design and aerodynamic loading.

Now suppose that, as a manager, you want to see all the information about all the employees. You decide to apply an equi-join to the EMPLOYEE, PROJECTS, and SKILLS tables:

SELECT *
  FROM EMPLOYEE E, PROJECTS P, SKILLS S
  WHERE E.EmpID = P.EmpID
   AND E.EmpID = S.EmpID ;

You can express this same operation as an inner join by using the following syntax:

Both formulations give the same result.

Result of SQL Inner Join
E.EmpID	Name	P.EmpID	ProjectName	S.EmpID	Skill
1	Ferguson	1	X-63 Structure	1	Mechanical Design
1	Ferguson	1	X-63 Structure	1	Aerodynamic Loading
1	Ferguson	1	X-64 Structure	1	Mechanical Design
1	Ferguson	1	X-64 Structure	1	Aerodynamic Loading
2	Frost	2	X-63 Guidance	2	Analog Design
2	Frost	2	X-63 Guidance	2	Gyroscope Design
2	Frost	2	X-64 Guidance	2	Analog Design
2	Frost	2	X-64 Guidance	2	Gyroscope Design
3	Toyon	3	X-63 Telemetry	3	Digital Design
3	Toyon	3	X-63 Telemetry	3	R/F Design
3	Toyon	3	X-64 Telemetry	3	Digital Design
3	Toyon	3	X-64 Telemetry	3	R/F Design

This data arrangement is not particularly enlightening. The employee ID numbers appear three times, and the projects and skills are duplicated for each employee. Bottom line: The SQL inner joins are not well suited to answering this type of question. You can put the SQL union join to work here, along with some strategically chosen SELECT statements, to produce a more suitable result. You begin with the basic SQL union join:

Notice that the union join has no ON clause. It doesn't filter the data, so an ON clause isn’t needed. This statement produces the result shown in the following table.

Result of SQL Union Join
E.EmpID	Name	P.EmpID	ProjectName	S.EmpID	Skill
1	Ferguson	NULL	NULL	NULL	NULL
NULL	NULL	1	X-63 Structure	NULL	NULL
NULL	NULL	1	X-64 Structure	NULL	NULL
NULL	NULL	NULL	NULL	1	Mechanical Design
NULL	NULL	NULL	NULL	1	Aerodynamic Loading
2	Frost	NULL	NULL	NULL	NULL
NULL	NULL	2	X-63 Guidance	NULL	NULL
NULL	NULL	2	X-64 Guidance	NULL	NULL
NULL	NULL	NULL	NULL	2	Analog Design
NULL	NULL	NULL	NULL	2	Gyroscope Design
3	Toyon	NULL	NULL	NULL	NULL
NULL	NULL	3	X-63 Telemetry	NULL	NULL
NULL	NULL	3	X-64 Telemetry	NULL	NULL
NULL	NULL	NULL	NULL	3	Digital Design
NULL	NULL	NULL	NULL	3	R/F Design

Each table has been extended to the right or left with nulls, and those null-extended rows have been union joined. The order of the rows is arbitrary and depends on the implementation. Now you can massage the data to put it in a more useful form.

Notice that the table has three ID columns, two of which are null in any row. You can improve the display by coalescing the ID columns. The COALESCE expression takes on the value of the first non-null value in a list of values. In the present case, it takes on the value of the only non-null value in a column list:

SELECT COALESCE (E.EmpID, P.EmpID, S.EmpID) AS ID,
   E.Name, P.ProjectName, S.Skill
  FROM EMPLOYEE E UNION JOIN PROJECTS P
   UNION JOIN SKILLS S
  ORDER BY ID ;

The FROM clause is the same as in the previous example, but now the three EMP_ID columns are coalesced into a single column named ID. You're also ordering the result by ID. The following table shows the result.

Result of SQL Union Join with COALESCE Expression
ID	Name	ProjectName	Skill
1	Ferguson	X-63 Structure	NULL
1	Ferguson	X-64 Structure	NULL
1	Ferguson	NULL	Mechanical Design
1	Ferguson	NULL	Aerodynamic Loading
2	Frost	X-63 Guidance	NULL
2	Frost	X-64 Guidance	NULL
2	Frost	NULL	Analog Design
2	Frost	NULL	Gyroscope Design
3	Toyon	X-63 Telemetry	NULL
3	Toyon	X-64 Telemetry	NULL
3	Toyon	NULL	Digital Design
3	Toyon	NULL	R/F Design

Each row in this result has data about a project or a skill, but not both. When you read the result, you first must determine what type of information is in each row (project or skill). If the ProjectName column has a non-null value, the row names a project on which the employee has worked. If the Skill column is not null, the row names one of the employee's skills.

You can make the result a little clearer by adding another COALESCE to the SELECT statement, as follows:

SELECT COALESCE (E.EmpID, P.EmpID, S.EmpID) AS ID,
   E.Name, COALESCE (P.Type, S.Type) AS Type,
   P.ProjectName, S.Skill
  FROM EMPLOYEE E
   UNION JOIN (SELECT "Project" AS Type, P.*
           FROM PROJECTS) P
   UNION JOIN (SELECT "Skill" AS Type, S.*
           FROM SKILLS) S
  ORDER BY ID, Type ;

In this union join, the PROJECTS table in the previous example is replaced with a nested SELECT that appends a column named P.Type with a constant value "Project" to the columns coming from the PROJECTS table. Similarly, the SKILLS table is replaced with a nested SELECT that appends a column named S.Type with a constant value "Skill" to the columns coming from the SKILLS table. In each row, P.Type is either null or "Project", and S.Type is either null or "Skill".

The outer SELECT list specifies a COALESCE of those two Type columns into a single column named Type. You then specify Type in the ORDER BY clause, which sorts the rows that all have the same ID in an order that puts all projects first, followed by all skills. The result is shown in the following table.

Refined Result of SQL Union Join with COALESCE Expressions
ID	Name	Type	ProjectName	Skill
1	Ferguson	Project	X-63 Structure	NULL
1	Ferguson	Project	X-64 Structure	NULL
1	Ferguson	Skill	NULL	Mechanical Design
1	Ferguson	Skill	NULL	Aerodynamic Loading
2	Frost	Project	X-63 Guidance	NULL
2	Frost	Project	X-64 Guidance	NULL
2	Frost	Skill	NULL	Analog Design
2	Frost	Skill	NULL	Gyroscope Design
3	Toyon	Project	X-63 Telemetry	NULL
3	Toyon	Project	X-64 Telemetry	NULL
3	Toyon	Skill	NULL	Digital Design
3	Toyon	Skill	NULL	R/F Design

The result table now presents a very readable account of the project experience and skill sets of all employees in the EMPLOYEE table.

Considering the number of SQL JOIN operations available, relating data from different tables shouldn't be a problem, regardless of the tables’ structure. You can trust that if the raw data exists in your database, SQL has the means to get it out and display it in a meaningful form.

About This Article

About the book author:

Allen G. Taylor is a 30-year veteran of the computer industry and the author of over 40 books, including SQL For Dummies and Crystal Reports For Dummies. He lectures nationally on databases, innovation, and entrepreneurship. He also teaches database development internationally through a leading online education provider.