Subqueries
a query within a query.
Last updated
a query within a query.
Last updated
When do you need to use a subquery?
You need to use a subquery when you have the need to manipulate an existing table to "pseudo-create" a table that is then used as a part of a larger query.
Components | Subqueries | Joins |
---|---|---|
Combine data from multiple tables into a single result | ||
Create a flexible view of tables stitched together using a “key” | ||
Build an output to use in a later part of the query | ||
Subquery Plan: What happens under the hood |
When a subquery, typically in the form of a nested or inline subquery, is correlated to its outer query, it cannot run independently. This is most certainly an edge case since correlated subqueries are rarely implemented compared to standalone, simple subqueries.
WITH
: This subquery is used when you’d like to “pseudo-create” a table from an existing table and visually scope the temporary table at the top of the larger query.
A Common Table Expression (CTE) in SQL allows you to define a temporary result, such as a table, to then be referenced in a later part of the query.
Nested: This subquery is used when you’d like the temporary table to act as a filter within the larger query, which implies that it often sits within the WHERE
clause.
Inline: This subquery is used in the same fashion as the WITH
use case above. However, instead of the temporary table sitting on top of the larger query, it’s embedded within the FROM
clause.
Scalar: This subquery is used when you’d like to generate a scalar value to be used as a benchmark of some sort.
Nested & Scalar subqueries often do not require aliases the way
WITH
& Inline subqueries do.
When a user wants to create a version of an existing table to be used in a larger query
For example, when you’d like to calculate the average salary across an entire organization to compare to individual employee salaries. Because it’s often a single value that is generated and used as a benchmark, the scalar subquery often sits within the SELECT
clause.
Readability: WITH
and Nested
subqueries are most advantageous for readability.
Performance: Scalar
subqueries are advantageous for performance and are often used on smaller datasets.
Simple Subquery: The inner subquery is completely independent of the larger query.
Correlated Subquery: The inner subquery is dependent on the larger query.
When the value of the inner query is dependent on a value outputted from the main query (e.g., the filter statement constantly changes).
In the example above, you’ll notice that the value of the inner query -- average GPA -- keeps adjusting depending on the university the student goes to. THAT is a great use case for the correlated subquery.
CoRelated Subquery | Non-CoRelated Subquery |
---|---|
The inner query or subquery references the outer query. | The inner query doesn't depend on the outer query and can run as a stand-alone query. |
The outer query executes before the inner query. | The inner query executes before the outer query. |
Correlated subqueries are slower than Non-Correlated subqueries and should be avoided as much as possible.