Joins in SQL

A short walk-through of the different types of joins in SQL

·

4 min read

Through my stint as a backend - primarily database developer, I have observed Joins by far to be one of the most used and one of the least understood features of SQL. While the use of joins is imperative in a traditional RDBMS where tables are normalized and the different details (i.e columns) pertaining to an entity are spread across multiple tables to prevent data redundancy, they have other hidden uses as well.

What are joins?

A Join is an operation used in an sql statement to fetch and combine the data present in multiple database tables using column values that are common to both the tables.

Emphasis on the word both above - though we can combine results from more than two tables in a single statement using joins, a single join operation is performed only between two tables at a time. Many such join operations are strung together to fetch the data from multiple tables. An SQL statement fetching data from n tables requires n-1 join clauses.

Join Types

Inner Join

Returns records for which the column value of the column being joined has matching values in both the tables. The following example will clarify the same:

Consider the example of students enrolling for different elective courses in a University degree. The first table - courses enlists the different elective courses along with the respective professor teaching the course. The second table - enrollment stores the details of the courses enrolled by a few students so far.

image.png image.png

The following is the inner join query followed by the result. Using the query, we map the professors for each course against the students they will be teaching. An additional order by clause has been added for better explaining the results.

SELECT a.taught_by, a.course_id, a.course_name, b.roll_no
FROM courses a
INNER JOIN enrollment b
ON a.course_id=b.course_id
ORDER BY a.taught_by;

RESULT:

image.png

As explained, the query returns only those courses which have seen at-least one enrollment i.e those course_ids for which rows exist in the enrolment table.

Flowchart-Joins.jpg

  1. Multiple rows are reported for some course_ids, like S1. This is because two students have enrolled for the course so far and both are returned.
  2. No rows have been reported for some courses, like S4 and S7. This is due to the inherent nature of an inner join operation. In case, these values need to be projected, we need to perform an Outer join operation as explained below.

Outer Join

An outer join returns all the rows from the one or both of the tables irrespective of whether the corresponding match values are present in the other table being joined. These are of 3 types

  1. Left Outer Join - All the rows from the left-hand table are returned and only those rows of the right hand table where the join condition is satisfied (or matched) are returned. The remaining records for the right-hand table are returned as NULL.
  2. Right Outer Join - All the rows from the right-hand table are returned and only those rows of the left hand table where the join condition is satisfied (or matched) are returned.
  3. Full Outer Join - All the rows from both the tables being joined are returned with gaps (null values) reported in mismatched rows.

Let us examine Left Outer Join in detail:

SELECT a.taught_by, a.course_id, a.course_name, b.roll_no 
FROM courses a 
LEFT OUTER JOIN enrollment b 
ON a.course_id=b.course_id
ORDER BY a.taught_by;

RESULT:

image.png

As explained, this time the rows for course_id's S4 and S7 are reported, but with blank or NULL in the roll_no column as no students have enrolled for these courses as yet.

In other words, outer joins can be used to find missing values in any of the tables being joined.

For our example, we can issue the following query to fetch course_id's for which there has been no enrollment so far.

SELECT a.taught_by, a.course_id, a.course_name, b.roll_no 
FROM courses a 
LEFT OUTER JOIN enrollment b 
ON a.course_id=b.course_id
WHERE b.roll_no is null;

This will yield only the rows corresponding to course_id's S4 and S7