Postgres 12 WITH Improvement
SQL queries can quickly become large and hard to read. In a typical programming language, we can define variables and functions to structure code. In SQL, we can move sub-selects into so-called common table expressions (the WITH clause). Consider the following query that retrieves exams of students in the first year who live in Kaiserslautern and lists their name, the lecture, and the grade of the exam.
-- Q1
SELECT name, lecture, grade
FROM students, exams
WHERE exams.matnr = students.matnr
AND semester <= 2 AND address = 'KL';
Assume that we need the first-year students more often because we want to reuse this group in other queries. Then we can rewrite the query as such:
-- Q2
SELECT name, lecture, grade
FROM exams,
(SELECT * FROM students WHERE semester <= 2) s,
WHERE exams.matnr = s.matnr
AND address = 'KL';
And then, a preprocessor can insert (SELECT * FROM students where semester <= 2 )
or modifications, e.g., with a different where clause.
A more beautiful, more natural, and more applicable way for interactive SQL sessions is, to move this sub-select into a common table expression:
-- Q3
WITH first_year_students AS (
SELECT * FROM students WHERE semester <= 2
)
SELECT name, lecture, grade
FROM first_year_students s, exams
WHERE exams.matnr = s.matnr AND address = 'KL';
Now the first-year students have their own place.
Postgres 11 vs 12
If we look at the explain output of Postgres 11, then Q1 and Q2 both have this plan:
We read exams and students, the latter with filters according to the where clause. A plan that lets us score points in an undergrad database course exam. Q3’s plan looks different:
This plan output from pgAdmin is a little bit misleading. Actually the upper student scan should be the input of the common table expression. During that scan, the semester <= 2
-part is filtered, and only when the CTE’s result is read, address = 'KL'
is checked. This means, the optimizer just took the CTE and the rest of the query and handled them independently.
With Postgres 12, this is no longer the case. Now CTEs are inlined automatically and the explained output looks like this:
Everything nicely optimized, just like the other two queries.
If you want to toy around, here are the queries and some data.