SQL 中的 LAG() 和 LEAD() 函数
这两个函数都是 窗口函数,用于访问结果集中当前行之前或之后的行,而不需要自连接操作。
一、LAG() 函数 - 访问前一行的数据
功能:
返回当前行之前指定偏移量行的值。
语法:
LAG(column_name, offset, default_value)
OVER (
[PARTITION BY partition_expression]
ORDER BY sort_expression
)
参数:
- column_name:要获取的列
- offset(可选):向前偏移的行数,默认为 1
- default_value(可选):当没有前一行时返回的值,默认为 NULL
示例 1:基本用法 - 比较当前行与前一行的值
SELECT
date,
sales,
LAG(sales) OVER (ORDER BY date) AS previous_day_sales,
sales - LAG(sales) OVER (ORDER BY date) AS daily_change
FROM sales_data;
结果示例:
| date | sales | previous_day_sales | daily_change |
|------------|-------|-------------------|--------------|
| 2024-01-01 | 100 | NULL | NULL |
| 2024-01-02 | 150 | 100 | 50 |
| 2024-01-03 | 120 | 150 | -30 |
示例 2:指定偏移量和默认值
SELECT
employee_id,
month,
salary,
LAG(salary, 2, 0) OVER (ORDER BY month) AS salary_two_months_ago
FROM employee_salary;
示例 3:按分区计算
SELECT
department,
employee,
salary,
LAG(salary) OVER (
PARTITION BY department
ORDER BY salary DESC
) AS prev_salary_in_dept
FROM employees;
二、LEAD() 函数 - 访问后一行的数据
功能:
返回当前行之后指定偏移量行的值。
语法:
与 LAG() 相同,只是方向相反:
LEAD(column_name, offset, default_value)
OVER (
[PARTITION BY partition_expression]
ORDER BY sort_expression
)
示例 1:基本用法
SELECT
date,
sales,
LEAD(sales) OVER (ORDER BY date) AS next_day_sales,
LEAD(sales) OVER (ORDER BY date) - sales AS next_day_change
FROM sales_data;
结果示例:
| date | sales | next_day_sales | next_day_change |
|------------|-------|----------------|-----------------|
| 2024-01-01 | 100 | 150 | 50 |
| 2024-01-02 | 150 | 120 | -30 |
| 2024-01-03 | 120 | NULL | NULL |
示例 2:计算增长率
SELECT
year,
revenue,
LEAD(revenue) OVER (ORDER BY year) AS next_year_revenue,
ROUND(
(LEAD(revenue) OVER (ORDER BY year) - revenue) * 100.0 / revenue,
2
) AS growth_rate
FROM annual_revenue;
三、实际应用场景
1. 计算连续登录天数
SELECT
user_id,
login_date,
LAG(login_date) OVER (PARTITION BY user_id ORDER BY login_date) AS prev_login_date,
DATEDIFF(day,
LAG(login_date) OVER (PARTITION BY user_id ORDER BY login_date),
login_date
) AS days_since_last_login
FROM login_history;
2. 计算库存变化
SELECT
product_id,
transaction_date,
quantity_change,
LAG(quantity_change) OVER (
PARTITION BY product_id
ORDER BY transaction_date
) AS prev_quantity_change
FROM inventory_transactions;
3. 填充缺失值(向前/向后填充)
SELECT
date,
COALESCE(
sales,
LAG(sales) OVER (ORDER BY date),
LEAD(sales) OVER (ORDER BY date)
) AS imputed_sales
FROM sales_data;
四、LAG() 和 LEAD() 的区别
| 特性 |
LAG() |
LEAD() |
|---|
| 方向 |
向前(查看前一行) |
向后(查看后一行) |
| 偏移量 |
默认 1(前一行) |
默认 1(后一行) |
| 典型用途 |
同比分析、填充缺失值 |
预测分析、计算增长率 |
| 边界处理 |
第一行返回 default_value |
最后一行返回 default_value |
五、使用注意事项
必须使用 ORDER BY:这两个函数都需要 ORDER BY 子句来确定行的顺序
性能考虑:在大数据集上使用时要考虑性能影响
分区的重要性:正确使用 PARTITION BY 可以避免跨组的数据混淆
默认值处理:合理设置 default_value 可以避免 NULL 值问题
六、综合示例
-- 计算股票的每日涨跌和前一日、后一日价格
SELECT
stock_date,
stock_symbol,
closing_price,
LAG(closing_price, 1, closing_price) OVER (
PARTITION BY stock_symbol
ORDER BY stock_date
) AS prev_close,
LEAD(closing_price, 1, closing_price) OVER (
PARTITION BY stock_symbol
ORDER BY stock_date
) AS next_close,
ROUND(
(closing_price - LAG(closing_price) OVER (
PARTITION BY stock_symbol
ORDER BY stock_date
)) * 100.0 / LAG(closing_price) OVER (
PARTITION BY stock_symbol
ORDER BY stock_date
), 2
) AS daily_change_percent
FROM stock_prices;
这两个函数在数据分析和业务报表中非常有用,可以简化许多复杂的查询操作,特别是在需要时间序列分析、比较或趋势计算的场景中。