MORE INFORMATION
What Is a Join?
A join is a connection between two tables where the two tables
are merged according to a field that they have in common, creating a new
virtual table (which can be saved out as a real table). For example, with the
following two tables:
Color_Table:
Join_Field Color_Field
1 Red
2 Blue
3 Green
Pattern_Table:
Join_Field Pattern_Field
2 Striped
3 Checkered
4 Polka-Dot
a simple join resembles the following:
Join_Field Color_Field Pattern_Field
2 Blue Striped
3 Green Checkered
The result table contains only records 2 and 3 in the
Join_Field field because they are the only records that exist in both the
Color_Table and the Pattern_Table.
A practical example of a join is
a retailer's list of products and manufacturers; a Products table and a Vendors
table could be joined on a Product ID field.
The Inner Join
The join in the previous example, called an inner join, is the
simplest type of join. Usually, you want to use only a fraction of the fields
in the tables. For example, in a real inner join, such as the one above, you
may want to exclude Join_Field, so that the join looks like this:
Color_Field Pattern_Field
Blue Striped
Green Checkered
In Microsoft Query, inner joins are the default join
type (for more information, see page 105 in the "Microsoft Query User's Guide,"
version 1.0).
The SQL Statement
A Structured Query Language (SQL) SELECT statement is a type of
macro that you can use when you create a join. Note that SQL is very different
from Microsoft Excel's other macro languages (Visual Basic for Applications,
and the Excel 4.0 macro language).
It is NOT necessary to understand
SQL in order to easily create joins in Microsoft Query.
Every join
has a SELECT statement associated with it. You can view the SELECT Statement
for any join in Microsoft Query by clicking "SQL" on the toolbar. As with
Microsoft Excel's macro recorder, you can use Query to record the SELECT
statement. Unlike Microsoft Excel's macro recorder, the SELECT statement
recorder is always on and cannot be turned off. Here's how the SELECT statement
might look in Microsoft Query for the above inner join:
SELECT Color_Table.Color_Field, Pattern_Table.Pattern_Field
FROM c:\database.mdb Color_Table, c:\database.mdb
Pattern_Table
WHERE Color_Table.Join_Field = Pattern_Table.Join_Field
Note that .mdb is the file name extension for a
Microsoft Access database, which can have multiple tables in one file. In some
other databases, such as dBASE, Paradox, and FoxPro, each table must have its
own file. In these cases, the SQL syntax may appear redundant because the table
name is always the same as the file name without the extension.
SQL
syntax varies among querying engines; for example, in Microsoft Access the
query from the above example resembles the following:
SELECT Color_Table.[Color_Field],
Pattern_Table.Pattern_Field
FROM Pattern_Table INNER JOIN Color_Table ON
Pattern_Table.[Join_Field] = Color_Table.[Join_Field];
The path to the table is not used in Microsoft Access
because the table is in a Microsoft Access .mdb file. Even if an external table
is attached and used in the query, the Microsoft Access SQL statement does not
display the path to the external table.
The Outer Join
Another kind of join is called an outer join. With an outer join,
you get all the records from one table and only those records from the other
table that have matching values from the first table. This may leave some of
the field entries blank, or "Null." For any two tables to be joined, there are
two possible outer joins, the "left outer join" and the "right outer join," (so
named because you usually view the tables side-by-side). Using the previous two
tables in an example, the following is one of the two possible outer joins:
Join_Field Color_Field Pattern_Field
1 Red (NULL)
2 Blue Striped
3 Green Checkered
The other possible join is as follows:
Join_Field Color_Field Pattern_Field
2 Blue Striped
3 Green Checkered
4 (NULL) Polka-Dot
NOTE: In an join, you do not see the word "NULL" when
you view the data; use the keyword "NULL" when you are working with joins.
In Microsoft Query, both kinds of outer join can be created easily
using the mouse (for more information about this procedure, see page 112 of the
"Microsoft Query User's Guide," version 1.0).
The following is how
the SQL statement might look for the second example of an outer join:
SELECT Color_Table.Color_Field, Pattern_Table.Pattern_Field
FROM {oj c:\database.mdb Color_Table LEFT OUTER JOIN
c:\database.mdb Pattern_Table ON Color_Table.Join_Field =
Pattern_Table.Join_Field}
To create a practical example of an outer join, make a
list of a company's products with sales figures for the products that had been
sold, but not excluding products that had not been sold. To do this, use a
Product ID field to join a Products table and a Sales table.
The Subtract Join
A third kind of join is the subtract join. A subtract join is the
opposite of an outer join; it includes ONLY those records in one table that
don't match any record in the other table. Like outer joins, there are two
possible subtract joins for any two tables that you want to join; however, they
are not usually referred to as "Left subtract join" or "Right subtract join." A
subtract join normally return fields from only one of the tables, because by
definition the other table's fields return only NULL values. The following is
one possible subtract join:
join_Field Color_Field
1 Red
and here is the other:
Join_Field Pattern_Field
4 Polka-Dot
In Microsoft Query, a subtract join is created by first
creating an Outer join, and then using the criteria "IS NULL" on an appropriate
field (Pattern_Field in the first example above; Color_Field in the second
example) to exclude records that match between the tables.
The
following is how the SQL statement might look for the first subtract join
above:
SELECT Color_Table.Join_Field, Color_Table.Color_Field
FROM {oj c:\database.mdb Color_Table LEFT OUTER JOIN
c:\database.mdb Pattern_Table ON Color_Table.Join_Field =
Pattern_Table.Join_Field}
WHERE (Pattern_Table.Pattern_Field Is Null)
To create a practical example of a subtract join, a
list of customers that had not ordered recently. To do this, use an Order ID
field to join a Customers table and an Orders table.
The Full Outer Join
A fourth kind of join is the full outer join. A full outer join
is a combination of an outer join with its complimentary subtract join. A full
outer join includes all records from both tables and merges those records that
are common between the two tables. The following is a full outer join:
Join_Field Color_Field Pattern_Field
1 Red (NULL)
2 Blue Striped
3 Green Checkered
4 (NULL) Polka-Dot
In Microsoft Query, a full outer join is created
by inserting the UNION operator on its own line between the outer join's SELECT
statement and the subtract join's SELECT statement (see above).
To
create a full outer join in Microsoft Query, use the following steps:
- Create an outer join, then click New on the File menu and
create a subtract join.
- Copy the subtract join's SQL.
- Switch to the outer join, type the word UNION on its own
line below the outer join's SQL statement, paste the subtract join's SQL below
the word UNION, and close the subtract join's window.
Here is how the SQL might look for the above full outer join
(commands below the UNION operator is pasted):
SELECT Color_Table.Join_Field, Color_Table.Color_Field,
Pattern_Table.Pattern_Field
FROM {oj c:\database.mdb Pattern_Table LEFT OUTER JOIN
c:\database.mdb Color_Table ON Color_Table.Join_Field =
Pattern_Table.Join_Field}
UNION
SELECT Color_Table.Join_Field, Color_Table.Color_Field,
Pattern_Table.Pattern_Field
FROM {oj c:\database.mdb Pattern_Table LEFT OUTER JOIN
c:\database.mdb Color_Table ON Color_Table.Join_Field =
Pattern_Table.Join_Field}
WHERE (Color_Table.Color_Field Is Null)
To create a practical example of a full outer join,
merge overlapping customer lists used by different departments, including fax
numbers (which were only on the first list), and Internet e-mail names (which
are only in the second list). Each department could continue to use its partial
list while having the complete, joined list available. They could be joined on
a Customer ID field.
The Self-Join
A fifth kind of join is the self-join. A self-join is a
connections where a field in a table is matched with a different field in a
copy of the same table. Using this example table:
Table_Three
Employee_ID Employee_Name Reports_To
1 Bob 3
2 Sue (NULL)
3 Jim 2
4 Jane 3
and a copy of it, as follows:
Table_Three_01
Employee_ID Employee_Name Reports_To
1 Bob 3
2 Sue (NULL)
3 Jim 2
4 Jane 3
A self-join could be used to create a list of employee
names with their supervisor's names. The Employee_ID in Table_Three would be
joined with the Reports_To in Table_Three_01. The following is how it might
look at first:
Employee_Name Employee_Name
Bob Jim
Sue (NULL)
Jim Sue
Jane Jim
However, because it is confusing to have the same field
name for both fields, change one of the field names, as follows:
Employee_Name Supervisor
Bob Jim
Sue (NULL)
Jim Sue
Jane Jim
The following is how the SQL might look for the above
self-join:
SELECT table_three.Employee_Name,
table_three_01.Employee_Name 'Supervisor'
FROM c:\database.mdb table_three, c:\database.mdb
table_three_01
WHERE table_three.Employee_ID = table_three_01.Reports_To
When you return data to Microsoft Excel, it is of no
use to rename the field in Microsoft Query. This is true because Microsoft
Excel uses the original field name. For more information about this issue,
please see the following article in the Microsoft Knowledge Base:
121551 : XL5: Field Instead of
Column Name in MSQUERY Returned to Excel
A Microsoft Excel macro
must change the column name every time the returned data is refreshed (unless
you return the data in a PivotTable, in which case the Pivot itself can create
and keep a custom field name).
The Equi-Join and Natural Join
Almost all joins, including all examples given so far, are
equi-joins and natural joins. The meanings of these terms are of little
significance to the average user of Microsoft Query, but the next two
paragraphs attempt to explain the terms for those who may be curious.
The equi-join is a join in which records are retrieved based on
whether the join fields have matching values in both tables. That may seem like
just a definition of a join, but it isn't. An example of a non-equal join is a
join where records in the first table are joined to those records in the second
table where the joined field in the first table is greater than (instead of
equal to) the joined field in the second table (or less than, or anything
besides equal to). Naturally, this returns more records than an equi-join.
A natural join is one in which only one of the two tables' joined
fields is returned. Since these two fields are by definition identical in an
equi-join, it is redundant to include both. For a non-equal join, it is
important to include both of those fields. So, equi-joins and natural joins go
together. You would want an equi- joins (which describes most joins) to be
natural join by returning only one of the joined fields; but, if you ever use a
non-equal join, you might also want to make it a non-natural join by returning
both of the joined fields.
There are other kinds of joins. The full
spectrum of joins was most recently defined in 1992 and this standard is known
as SQL-92. Some joins are not important to Microsoft Excel users because these
joins do things that are easier to do in Microsoft Excel.
The Cartesian Product
Trying to return data from two or more tables without any joins
creates what is called a "Cartesian product." A Cartesian product is defined as
all possible combinations of rows in all tables. Be sure you have joins before
trying to return data, because a Cartesian product on tables with many records
and/or on many tables could take several hours to complete. The following is a
Cartesian product as used on the two example tables; note that this table is
only 3 records times 3 records, which yields a total of 9 records. However,
imagine if instead, the table was 100 records times 1,000 records times 10,000
records; then the table would contain 1,000,000,000 records!
Join_Field Join_Field Color_Field Pattern_Field
1 2 Red Striped
1 3 Red Checkered
1 4 Red Polka-Dot
2 2 Blue Striped
2 3 Blue Checkered
2 4 Blue Polka-Dot
3 2 Green Striped
3 3 Green Checkered
3 4 Green Polka-Dot
Occasionally, some users want to use a Cartesian
product; however, most users who get them do so accidentally, and are often
confused by them. Because most users exclude most of the fields in a join, a
real Cartesian product can easily look as perplexing as this:
Color_Field
Red
Red
Red
Blue
Blue
Blue
Green
Green
Green
If 100 records are added to Pattern_Table, this query
would have 309 records (103 records each of Red, Blue, and Green).
Cartesian products have relatively simple SELECT statements. The following is
how the SQL might look for the above Cartesian product:
SELECT Color_Table.Color_Field, Pattern_Table.Pattern_Field
FROM c:\database.mdb Color_Table, c:\database.mdb
Pattern_Table
A practical example of a Cartesian product would be to
create a list of all possible combinations of options on a merchandise product,
with price totals for each combination.