Suspicious login activity 🤔
Scenario
A company is investigating suspicious login activity on their platform.
Consecutive failed login attempts are considered suspicious once they reach a certain threshold, and the company wants to identify users who have reached this threshold.
Their platform logs login, logout, and login failed events for each user.
Question
For the events below, identify the users who have login failed events at least five times in a row.
Keep only the user ID and their greatest number of consecutive failed login attempts.
The output should have a row for each user who meets this criterion, with the columns:
user_idconsecutive_failuresas the greatest number of consecutive failed login attempts for the user
Order the output by user_id.
Expand for the DDL
create table events (
event_id integer primary key,
user_id integer not null,
event_datetime timestamp not null,
event_type varchar not null
);
insert into events
values
(1, 1, '2024-01-01 11:00:00', 'login'),
(2, 1, '2024-01-01 12:00:00', 'logout'),
(3, 1, '2024-01-03 03:00:00', 'login failed'),
(4, 1, '2024-01-03 03:01:00', 'login failed'),
(5, 1, '2024-01-03 03:02:00', 'login failed'),
(6, 1, '2024-01-03 03:05:00', 'login'),
(7, 2, '2024-01-03 10:00:00', 'login'),
(8, 2, '2024-01-03 15:00:00', 'logout'),
(9, 1, '2024-01-03 23:00:00', 'logout'),
(10, 2, '2024-01-04 22:00:00', 'login failed'),
(11, 2, '2024-01-04 22:05:00', 'login'),
(12, 3, '2024-01-05 20:00:00', 'login'),
(13, 3, '2024-01-06 04:00:00', 'logout'),
(14, 2, '2024-01-09 15:00:00', 'logout'),
(15, 3, '2024-01-11 21:00:00', 'login'),
(16, 1, '2024-01-12 12:00:00', 'login failed'),
(17, 1, '2024-01-12 13:00:00', 'login failed'),
(18, 1, '2024-01-12 23:00:00', 'login failed'),
(19, 2, '2024-01-13 10:00:00', 'login failed'),
(20, 2, '2024-01-13 10:05:00', 'login'),
(21, 2, '2024-01-13 15:00:00', 'logout'),
(22, 1, '2024-01-13 23:00:00', 'login failed'),
(23, 1, '2024-01-13 23:01:00', 'login failed'),
(24, 1, '2024-01-13 23:02:00', 'login failed'),
(25, 2, '2024-01-14 22:00:00', 'login'),
(26, 3, '2024-01-15 20:00:00', 'login'),
(27, 3, '2024-01-16 04:00:00', 'logout'),
(28, 2, '2024-01-19 15:00:00', 'logout'),
(29, 3, '2024-01-21 21:00:00', 'login'),
(30, 1, '2024-01-22 12:00:00', 'login failed'),
(31, 1, '2024-01-22 12:05:00', 'password reset'),
(32, 1, '2024-01-22 12:10:00', 'login'),
(33, 1, '2024-01-22 13:00:00', 'logout'),
(34, 1, '2024-01-23 03:00:00', 'login'),
(35, 2, '2024-01-23 10:00:00', 'login'),
(36, 2, '2024-01-23 15:00:00', 'logout'),
(37, 1, '2024-01-23 23:00:00', 'logout'),
(38, 2, '2024-01-24 22:00:00', 'login'),
(39, 3, '2024-01-25 20:00:00', 'login'),
(40, 3, '2024-01-26 04:00:00', 'logout'),
(41, 2, '2024-01-29 15:00:00', 'logout'),
(42, 3, '2024-01-30 21:00:00', 'login failed'),
(43, 3, '2024-01-30 21:01:00', 'login failed'),
(44, 3, '2024-01-30 21:02:00', 'login failed'),
(45, 3, '2024-01-30 21:03:00', 'login failed'),
(46, 3, '2024-01-30 21:04:00', 'login failed'),
(47, 3, '2024-01-30 21:05:00', 'password reset'),
(48, 3, '2024-01-30 21:06:00', 'password reset'),
(49, 3, '2024-01-30 21:07:00', 'password reset'),
(50, 3, '2024-01-30 21:08:00', 'password reset'),
(51, 3, '2024-01-30 21:09:00', 'password reset'),
(52, 3, '2024-01-30 21:10:00', 'password reset'),
(53, 3, '2024-01-31 23:55:00', 'login failed'),
(54, 3, '2024-01-31 23:56:00', 'login failed'),
(55, 3, '2024-01-31 23:57:00', 'login failed'),
(56, 3, '2024-01-31 23:58:00', 'login failed'),
(57, 3, '2024-01-31 23:59:00', 'login failed'),
(58, 3, '2024-02-01 00:00:00', 'login failed'),
(59, 3, '2024-02-01 00:01:00', 'login failed'),
(60, 3, '2024-02-01 00:02:00', 'login failed')
;
The solution can be found at:
Sample input
| event_id | user_id | event_datetime | event_type |
|---|---|---|---|
| 1 | 1 | 2024-01-01 03:00:00 | login failed |
| 2 | 1 | 2024-01-01 03:01:00 | login failed |
| 3 | 1 | 2024-01-01 03:02:00 | login failed |
| 4 | 1 | 2024-01-01 03:03:00 | login failed |
| 5 | 1 | 2024-01-01 03:04:00 | login failed |
| 6 | 1 | 2024-01-01 03:05:00 | login |
| 7 | 2 | 2024-01-01 10:00:00 | login |
| 8 | 2 | 2024-01-01 15:00:00 | logout |
| 9 | 2 | 2024-01-01 23:00:00 | login failed |
with events(event_id, user_id, event_datetime, event_type) as (
values
(1, 1, '2024-01-01 03:00:00'::timestamp, 'login failed'),
(2, 1, '2024-01-01 03:01:00'::timestamp, 'login failed'),
(3, 1, '2024-01-01 03:02:00'::timestamp, 'login failed'),
(4, 1, '2024-01-01 03:03:00'::timestamp, 'login failed'),
(5, 1, '2024-01-01 03:04:00'::timestamp, 'login failed'),
(6, 1, '2024-01-01 03:05:00'::timestamp, 'login'),
(7, 2, '2024-01-01 10:00:00'::timestamp, 'login'),
(8, 2, '2024-01-01 15:00:00'::timestamp, 'logout'),
(9, 2, '2024-01-01 23:00:00'::timestamp, 'login failed')
)
Sample output
| user_id | consecutive_failures |
|---|---|
| 1 | 5 |
solution(user_id, consecutive_failures) as (
values
(1, 5)
)
Hint 1
This is a typical "gaps and islands" problem.
Hint 2
Use the difference between two ROW_NUMBER() functions to create a group for each user and event type. Partition on user_id for one and partition on both user_id and event_type for the other, ordering both by event_id.