Subqueries

a query within a query.

When do you need to use a subquery?

You need to use a subquery when you have the need to manipulate an existing table to "pseudo-create" a table that is then used as a part of a larger query.

ComponentsSubqueriesJoins

Combine data from multiple tables into a single result

Create a flexible view of tables stitched together using a “key”

Build an output to use in a later part of the query

Subquery Plan: What happens under the hood

Subquery Basics

When a subquery, typically in the form of a nested or inline subquery, is correlated to its outer query, it cannot run independently. This is most certainly an edge case since correlated subqueries are rarely implemented compared to standalone, simple subqueries.

Subquery: Placement

  • WITH: This subquery is used when you’d like to “pseudo-create” a table from an existing table and visually scope the temporary table at the top of the larger query.

A Common Table Expression (CTE) in SQL allows you to define a temporary result, such as a table, to then be referenced in a later part of the query.

  • Nested: This subquery is used when you’d like the temporary table to act as a filter within the larger query, which implies that it often sits within the WHERE clause.

  • Inline: This subquery is used in the same fashion as the WITH use case above. However, instead of the temporary table sitting on top of the larger query, it’s embedded within the FROM clause.

  • Scalar: This subquery is used when you’d like to generate a scalar value to be used as a benchmark of some sort.

Nested & Scalar subqueries often do not require aliases the way WITH & Inline subqueries do.

  • When a user wants to create a version of an existing table to be used in a larger query

WITH average_price AS
( SELECT brand_id, AVG(product_price) AS brand_avg_price
  FROM product_records
),
SELECT a.brand_id, a.total_brand_sales, b.brand_avg_price
FROM brand_table a
JOIN average_price b
ON b.brand_id = a.brand_id
ORDER BY a.total_brand_sales DESC;

For example, when you’d like to calculate the average salary across an entire organization to compare to individual employee salaries. Because it’s often a single value that is generated and used as a benchmark, the scalar subquery often sits within the SELECT clause.

Advantages:

  • Readability: WITH and Nested subqueries are most advantageous for readability.

  • Performance: Scalar subqueries are advantageous for performance and are often used on smaller datasets.

Subquery: Dependencies

  • Simple Subquery: The inner subquery is completely independent of the larger query.

  • Correlated Subquery: The inner subquery is dependent on the larger query.

When to use Correlated Query ?

When the value of the inner query is dependent on a value outputted from the main query (e.g., the filter statement constantly changes).

SELECT first_name, last_name, GPA, university
FROM student_db outer_db
WHERE GPA >
    (SELECT AVG(GPA) 
    FROM student_db 
    WHERE university = outer_db.university);

In the example above, you’ll notice that the value of the inner query -- average GPA -- keeps adjusting depending on the university the student goes to. THAT is a great use case for the correlated subquery.

CoRelated SubqueryNon-CoRelated Subquery

The inner query or subquery references the outer query.

The inner query doesn't depend on the outer query and can run as a stand-alone query.

The outer query executes before the inner query.

The inner query executes before the outer query.

Correlated subqueries are slower than Non-Correlated subqueries and should be avoided as much as possible.

Last updated