Window Functions
When do you use window functions over aggregate functions?
When you want to measure trends or changes over rows.
When you want to rank a column to use for outreach/prioritization.
Example:
Problem | Window Function |
---|---|
Has the average price of airline tickets gone up this year? | Calculate a running average of ticket prices partitioned by the month and ordered by time |
What’s the best way to keep the running total orders of customers? | Calculate the running count of orders and maintain it as a separate row within the table |
Use a combination of factors to rank companies most likely to need a loan. | Include a rank column in the output to be used by a BD rep to reach out to potential customers. |
What is window function?
Window Function: A window function is a calculation across a set of rows in a table that are somehow related to the current row. This means we’re typically:
Calculating running totals that incorporate the current row or,
Ranking records across rows, inclusive of the current one
A window function is similar to aggregate functions combined with GROUP BY
clauses but have one key difference: Window functions retain the total number of rows between the input table and the output table (or result). Behind the scenes, the window function is able to access more than just the current row of the query result.
a window function allows users to compare one row to another without doing any joins.
When window functions are used, you’ll notice new column names like the following:
Average running price
Running total orders
Running sum sales
Rank
Percentile
Core
PARTITION BY
: A sub-clause of the OVER clause. Similar to GROUP BY.OVER
: Typically precedes the partition by that signals what to “GROUP BY”.Aggregates: Aggregate functions that are used in window functions, too (e.g., sum, count, avg).
GROUP BY | PARTITION BY |
---|---|
The ouput has a lessened # of records based on the | The output maintains the # of records in the original table. |
The output is one row per | If the original table had 10 rows, the |
Ranking
ROW_NUMBER()
: Ranking is distinct amongst records even with ties in what the table is ranked against.RANK()
: Ranking function where a row could get the same rank if they have the same value and ranks skip for subsequent values.DENSE_RANK()
: Ranking function similar toRANK()
but ranks are not skipped with ties.
Advanced
Aliases: Shorthand that can be used if there are several window functions in one query.
Percentiles: Defines what percentile a value falls into over the entire table.
Lag/Lead: Calculating differences between rows’ values.
Last updated