SQL面试题练习 —— 各用户最长的连续登录天数-可间断

1 题目


现有各用户的登录记录表t_login_events如下,表中每行数据表达的信息是一个用户何时登录了平台。现要求统计各用户最长的连续登录天数,间断一天也算作连续,例如:一个用户在1,3,5,6登录,则视为连续6天登录。

样例数据

+----------+----------------------+
| user_id  |    login_datetime    |
+----------+----------------------+
| 100      | 2021-12-01 19:00:00  |
| 100      | 2021-12-01 19:30:00  |
| 100      | 2021-12-02 21:01:00  |
| 100      | 2021-12-03 11:01:00  |
| 101      | 2021-12-01 19:05:00  |
| 101      | 2021-12-01 21:05:00  |
| 101      | 2021-12-03 21:05:00  |
| 101      | 2021-12-05 15:05:00  |
| 101      | 2021-12-06 19:05:00  |
| 102      | 2021-12-01 19:55:00  |
| 102      | 2021-12-01 21:05:00  |
| 102      | 2021-12-02 21:57:00  |
| 102      | 2021-12-03 19:10:00  |
| 104      | 2021-12-04 21:57:00  |
| 104      | 2021-12-02 22:57:00  |
| 105      | 2021-12-01 10:01:00  |
+----------+----------------------+

期望结果

+----------+---------------+
| user_id  | max_log_days  |
+----------+---------------+
| 100      | 3             |
| 101      | 6             |
| 102      | 3             |
| 104      | 3             |
| 105      | 1             |
+----------+---------------+

2 建表语句


--建表语句

create table if not exists t_login_events
(
    user_id        int comment '用户id',
    login_datetime string comment '登录时间'
)
    comment '直播间访问记录';
--数据插入
INSERT overwrite table t_login_events
VALUES (100, '2021-12-01 19:00:00'),
       (100, '2021-12-01 19:30:00'),
       (100, '2021-12-02 21:01:00'),
       (100, '2021-12-03 11:01:00'),
       (101, '2021-12-01 19:05:00'),
       (101, '2021-12-01 21:05:00'),
       (101, '2021-12-03 21:05:00'),
       (101, '2021-12-05 15:05:00'),
       (101, '2021-12-06 19:05:00'),
       (102, '2021-12-01 19:55:00'),
       (102, '2021-12-01 21:05:00'),
       (102, '2021-12-02 21:57:00'),
       (102, '2021-12-03 19:10:00'),
       (104, '2021-12-04 21:57:00'),
       (104, '2021-12-02 22:57:00'),
       (105, '2021-12-01 10:01:00');

3 题解


  1. 数据去重
select user_id,
       to_date(login_datetime) as login_date
from t_login_events
group by user_id, to_date(login_datetime)

执行结果

+----------+-------------+
| user_id  | login_date  |
+----------+-------------+
| 100      | 2021-12-01  |
| 100      | 2021-12-02  |
| 100      | 2021-12-03  |
| 101      | 2021-12-01  |
| 101      | 2021-12-03  |
| 101      | 2021-12-05  |
| 101      | 2021-12-06  |
| 102      | 2021-12-01  |
| 102      | 2021-12-02  |
| 102      | 2021-12-03  |
| 104      | 2021-12-02  |
| 104      | 2021-12-04  |
| 105      | 2021-12-01  |
+----------+-------------+
  1. 计算日期差

根据用户分组,使用lag函数获得当前行的上一行数据中的日期,使用datediff函数判断日期当期日期与上一行日期的时间差。

select user_id,
       login_date,
       lag(login_date, 1, null) over (partition by user_id order by login_date asc)                       as lag_log_date,
       datediff(login_date, lag(login_date, 1, null) over (partition by user_id order by login_date asc)) as date_diff
from (select user_id,
             to_date(login_datetime) as login_date
      from t_login_events
      group by user_id, to_date(login_datetime)) t1

执行结果

+----------+-------------+---------------+------------+
| user_id  | login_date  | lag_log_date  | date_diff  |
+----------+-------------+---------------+------------+
| 100      | 2021-12-01  | NULL          | NULL       |
| 100      | 2021-12-02  | 2021-12-01    | 1          |
| 100      | 2021-12-03  | 2021-12-02    | 1          |
| 101      | 2021-12-01  | NULL          | NULL       |
| 101      | 2021-12-03  | 2021-12-01    | 2          |
| 101      | 2021-12-05  | 2021-12-03    | 2          |
| 101      | 2021-12-06  | 2021-12-05    | 1          |
| 102      | 2021-12-01  | NULL          | NULL       |
| 102      | 2021-12-02  | 2021-12-01    | 1          |
| 102      | 2021-12-03  | 2021-12-02    | 1          |
| 104      | 2021-12-02  | NULL          | NULL       |
| 104      | 2021-12-04  | 2021-12-02    | 2          |
| 105      | 2021-12-01  | NULL          | NULL       |
+----------+-------------+---------------+------------+
  1. 判断是否连续,累积求和得到分组id

根据date_diff结果判断是否连续,如果date_diff <= 2则认为连续 我们给赋值为0,否则不连续,赋值为1。

select user_id,
       login_date,
       lag_log_date,
       date_diff,
       sum(if(date_diff <= 2, 0, 1)) over (partition by user_id order by login_date asc) as group_id
from (select user_id,
             login_date,
             lag(login_date, 1, null) over (partition by user_id order by login_date asc)  as lag_log_date,
             datediff(login_date, lag(login_date, 1, null)
                                      over (partition by user_id order by login_date asc)) as date_diff
      from (select user_id,
                   to_date(login_datetime) as login_date
            from t_login_events
            group by user_id, to_date(login_datetime)) t1) t2

执行结果

+----------+-------------+---------------+------------+-----------+
| user_id  | login_date  | lag_log_date  | date_diff  | group_id  |
+----------+-------------+---------------+------------+-----------+
| 100      | 2021-12-01  | NULL          | NULL       | 1         |
| 100      | 2021-12-02  | 2021-12-01    | 1          | 1         |
| 100      | 2021-12-03  | 2021-12-02    | 1          | 1         |
| 101      | 2021-12-01  | NULL          | NULL       | 1         |
| 101      | 2021-12-03  | 2021-12-01    | 2          | 1         |
| 101      | 2021-12-05  | 2021-12-03    | 2          | 1         |
| 101      | 2021-12-06  | 2021-12-05    | 1          | 1         |
| 102      | 2021-12-01  | NULL          | NULL       | 1         |
| 102      | 2021-12-02  | 2021-12-01    | 1          | 1         |
| 102      | 2021-12-03  | 2021-12-02    | 1          | 1         |
| 104      | 2021-12-02  | NULL          | NULL       | 1         |
| 104      | 2021-12-04  | 2021-12-02    | 2          | 1         |
| 105      | 2021-12-01  | NULL          | NULL       | 1         |
+----------+-------------+---------------+------------+-----------+
  1. 按照用户和group_id 分组,计算每次连续登陆的天数,再根据用户分组计算最大连续天数

首先根据user_id和group_id分组,用datediff计算出出最大登陆日期和最小登陆日期,两者做差+1 得到每次连续登陆的天数。然后按照用户分组,使用max()计算每个用户最大连续天数。

select user_id,
       max(log_days) as max_log_days
from (select user_id,
             group_id,
             datediff(max(login_date), min(login_date)) + 1 as log_days
      from (select user_id,
                   login_date,
                   lag_log_date,
                   date_diff,
                   sum(if(date_diff <= 2, 0, 1)) over (partition by user_id order by login_date asc) as group_id
            from (select user_id,
                         login_date,
                         lag(login_date, 1, null) over (partition by user_id order by login_date asc)  as lag_log_date,
                         datediff(login_date, lag(login_date, 1, null)
                                                  over (partition by user_id order by login_date asc)) as date_diff
                  from (select user_id,
                               to_date(login_datetime) as login_date
                        from t_login_events
                        group by user_id, to_date(login_datetime)) t1) t2) t3
      group by user_id,
               group_id) t4
group by user_id

执行结果

+----------+---------------+
| user_id  | max_log_days  |
+----------+---------------+
| 100      | 3             |
| 101      | 6             |
| 102      | 3             |
| 104      | 3             |
| 105      | 1             |
+----------+---------------+

最近更新

  1. docker php8.1+nginx base 镜像 dockerfile 配置

    2024-07-11 04:02:02       66 阅读
  2. Could not load dynamic library ‘cudart64_100.dll‘

    2024-07-11 04:02:02       70 阅读
  3. 在Django里面运行非项目文件

    2024-07-11 04:02:02       57 阅读
  4. Python语言-面向对象

    2024-07-11 04:02:02       68 阅读

热门阅读

  1. Appium+Python自动化脚本的APK打包探索

    2024-07-11 04:02:02       25 阅读
  2. C++学习笔记(一):入门概念的疑问与解答

    2024-07-11 04:02:02       22 阅读
  3. 容器之docker compose

    2024-07-11 04:02:02       22 阅读
  4. 55、定义浅层神经网络架构和算法(matlab)

    2024-07-11 04:02:02       19 阅读
  5. FastApi+WebSocket 解析

    2024-07-11 04:02:02       18 阅读