In this unit, we will be looking at using Joins to query across tables. Understand that for this unit, we will be learning just enough JOIN
usage to do some basic queries. Setting up especially large and intricate databases to ensure JOIN
operations are reasonably efficient can be a major undertaking, and we will largely be skipping over that and just focus on some basic queries across tables.
We will be using the same schema as the Foreign Keys module (that is, tables for Students, Courses, and Enrollments with foreign-key enforced connections), however, I have added some data.
Students
StudentID | FirstName | LastName | ComputingID |
---|---|---|---|
1 | John | Smith | abc2def |
2 | Debra | Jones | ghi3jk |
3 | Jack | Black | lmn4opq |
4 | Jack | White | rst5uvw |
Courses
CRN | Subject | CourseNumber | Section | MeetingTime |
---|---|---|---|---|
12345 | CS | 3140 | 1 | TR 14:00 - 15:15 |
54321 | CS | 3140 | 2 | TR 15:30 - 16:45 |
98765 | APMA | 2120 | 2 | MWF 12:00 - 12:50 |
Enrollments
EnrollmentID | StudentID | CRN |
---|---|---|
1 | 1 | 12345 |
2 | 1 | 98765 |
3 | 2 | 54321 |
4 | 3 | 98765 |
5 | 3 | 54321 |
6 | 4 | 98765 |
Let’s consider the following problem: “I want to get the names and computingIDs of all students in CS 3140, as well as the section they are enrolled in” (that is, across all sections). The difficulty here is that the Course’s table where that info is stored does not contain any information about what students are enrolled in the class.
This is a use-case for a JOIN, because with JOIN, we can write this as one single query.
However, let’s show how to do this without join first, over multiple queries.
First, we need to get all the CRNs for CS 3140. To do this, we can use the query
SELECT Crn
FROM Courses
WHERE Subject = 'CS' AND CourseNumber = '3140';
And we get two CRNs, 12345 and 54321.
Now that we have those, we can get all the StudentIDs for enrollment in both sections. Let’s start with 12345.
SELECT StudentID
FROM Enrollments
WHERE Crn = 12345;
And I only get a single student ID, 1. Now, to get that students name and computing ID, I use
SELECT FirstName, LastName, ComputingID
FROM Students
WHERE StudentID = 1;
And I get:
FirstName LastName ComputingID
--------- -------- -----------
John Smith abc2def
Okay, now I have my roster for Section 1, now let’s do section 2 (which was CRN 54321)
SELECT StudentID
FROM Enrollments
WHERE Crn = 54321;
I get two different student IDs, 2 and 3. I can use “IN” command below to make this only require one query:
SELECT FirstName, LastName, ComputingID
FROM Students
Where StudentID IN (2, 3);
Giving us:
FirstName LastName ComputingID
--------- -------- -----------
Debra Jones ghi3jk
Jack Black lmn4opq
Okay, so that’s 5 separate queries, with multiple times me hardcoding the results of one query into another.
Can we do better? Yes, we can do it in one query!
First, let’s just write a query that gives us the StudentID and SectionNumber of every student in CS 3140 by JOIN
-ing our Courses and Enrollments table on the Crn attribute that acts as the foreign key.
SELECT Enrollments.StudentID, Courses.Section
FROM Enrollments
JOIN Courses ON Enrollments.Crn = Courses.Crn
WHERE Courses.Subject = 'CS' AND Courses.CourseNumber = 3140;
The results of the query are:
StudentID Section
--------- -------
1 1
2 2
3 2
Here, we are saying to combine Courses and Section into one “table” (we don’t make a permanent table, but we can then use WHERE and Select columns as though it were a single table)
Specifically, starting with our Enrollments table, we use JOIN Courses ON Enrollments.Crn = Courses.Crn
- that is, we combine a record from Enrollments and Courses if and only if they have a matching CRN value. This is because the CRN, as both a primary key and a foreign key, uniquely identifies and single Course.
You’ll notice now when we select which columns to have our query return, we use Enrollments.StudentID
and Courses.Section
. The format is TableName.ColumnName
, and we generally want to use it in our queries because it will resolve any conflicts when we have two columns in different tables with the same name.
For instance, there is a Crn
attribute in both Courses and Enrollments, so when explaining our join criteria, we use JOIN Courses ON Enrollments.Crn = Courses.Crn
Because of this, if we don’t specify the table and just try to use Crn in our query by itself, we will get the following error
Parse error: ambiguous column name: Crn
While we don’t need to use TableName.ColumnName
for columns that don’t have name conflicts, it’s generally good practice to do so. Most IDE-like tools for SQL will do this automatically whenever you use auto-complete on a column name, for example.
While we are able to get the StudentIDs and SectionIDs with the above query, we are still one step short of our goal, which is using the StudentIDs to get the names and computing IDs of the students.
To do this, we can build on our existing JOIN query with:
SELECT Students.FirstName, Students.LastName, Students.ComputingID, Courses.Section
FROM Enrollments
JOIN Courses ON Enrollments.Crn = Courses.Crn
JOIN Students ON Enrollments.StudentID = Students.StudentID
WHERE Courses.Subject = 'CS' AND Courses.CourseNumber = 3140;
Now, we add a second JOIN, this time using the StudentID to combine all records from the last query with the records in Students where the StudentID between Students and Enrollments match.
The result of this query is:
FirstName LastName ComputingID Section ——— ——– ———– ——- John Smith abc2def 1 Debra Jones ghi3jk 2 Jack Black lmn4opq 2
The principle here is the same. It’s worth noting that the more joins you have, the more computationally intense the query is to run. However, at the scale of databases we will work with in this class, I generally wouldn’t worry about this cost. This is just something to file away in the back of your mind if you ever find yourself writing a query that feels incredibly slow. At that point, separating the queries may be beneficial depending on what you are doing.
There are a couple of types of joins to keep in mind.
What we have just shown is called an INNER JOIN. An Inner Join is formatted as:
FROM TableA JOIN TableB ON TableA.ColumnName = TableB.ColumnName
An inner join is used to only combine matching data when joining tables. This is by far the most common join type you’ll use in queries.
A CROSS Join is used to combine every record of one table with every record of another table. This is not a particularly common join to do, but it’s worth covering.
For instance, consider the following query
SELECT * from Students
JOIN Enrollments;
At first glance, this may look similar to what we’ve already done. But if you look closely, you’ll see that there is no ON
condition to our JOIN
. This has massive consequences. That means, instead of only combining records when their StudentIDs match, we combine every record in Student with every record in Enrollment. Since we have 4 records in Students and 6 records in Enrollments, this gives us 24 records (6 times 4) returned from our query:
StudentId FirstName LastName ComputingID EnrollmentID StudentID CRN
--------- --------- -------- ----------- ------------ --------- -----
1 John Smith abc2def 1 1 12345
1 John Smith abc2def 2 1 98765
1 John Smith abc2def 3 2 54321
1 John Smith abc2def 4 3 98765
1 John Smith abc2def 5 3 54321
1 John Smith abc2def 6 4 98765
2 Debra Jones ghi3jk 1 1 12345
2 Debra Jones ghi3jk 2 1 98765
2 Debra Jones ghi3jk 3 2 54321
2 Debra Jones ghi3jk 4 3 98765
2 Debra Jones ghi3jk 5 3 54321
2 Debra Jones ghi3jk 6 4 98765
3 Jack Black lmn4opq 1 1 12345
3 Jack Black lmn4opq 2 1 98765
3 Jack Black lmn4opq 3 2 54321
3 Jack Black lmn4opq 4 3 98765
3 Jack Black lmn4opq 5 3 54321
3 Jack Black lmn4opq 6 4 98765
4 Jack White rst5uvw 1 1 12345
4 Jack White rst5uvw 2 1 98765
4 Jack White rst5uvw 3 2 54321
4 Jack White rst5uvw 4 3 98765
4 Jack White rst5uvw 5 3 54321
4 Jack White rst5uvw 6 4 98765
This is a pretty big table, but there’s really no use-case for it. It really only makes sense to join these tables on a matching StudentID, as we did before. However, it’s worth understanding what a cross join is, as if you make a mistake when writing your ON
condition, and write a condition that always returns true, it gives you the same results as a Cross Join.
While there are Outer Joins, Left Outer Joins, Right Outer Joins, etc. that can theoretically exist, SQLite only supports LEFT OUTER JOIN.
To illustrate this, lets first add a new section of CS 3140 to our table, section 3.
INSERT INTO Courses (Subject, CourseNumber, Section, MeetingTime)
VALUES('CS', 3140, 3, 'TR 12:30 - 13:45');
Now, we have added a section of CS 3140 that no student is enrolled it.
First, I’m going to show the LEFT OUTER JOIN query and results, and then I’ll explain what’s happening.
SELECT Courses.Section, Students.FirstName, Students.LastName, Students.ComputingID
FROM Courses
OUTER LEFT JOIN Enrollments ON Courses.Crn == Enrollments.Crn
OUTER LEFT JOIN Students ON Enrollments.StudentID == Students.StudentID
WHERE Courses.Subject = 'CS' and Courses.CourseNumber = 3140;
This gives us the following results:
Section FirstName LastName ComputingID
------- --------- -------- -----------
1 John Smith abc2def
2 Debra Jones ghi3jk
2 Jack Black lmn4opq
3
At first glance, it looks the same as what we had before, but notice now that we have a query result for Section 3, the section with no students enrolled in it. However, the names and ComputingID columns are blank (null
). That’s because there are no students enrolled in this section.
If we just did our inner join from before, this row for Section 3 would not appear at all. However, with an outer left join, we are guaranteed at least one record for each record in our first table. If nothing in other tables matches the JOIN
’s ON
expression, then SQLite simply leaves those fields blank.
Using JOINs allows us to build more complex queries across multiple tables. Having multiple tables is necessary for data normalization, but JOINs still let us get summary data.
As a note, JOINs can have more advanced usages, but in this class, we will only use JOINs on SELECT
queries to query across multiple tables. This is far and away their most common use.