Metric correlation 🔀
Scenario
A company has banded their customers into segments and calculated several metrics for each customer.
The company wants to know which pairs of metrics are most correlated within each segment.
Question
For each customer segment, find the highest correlated pair of metrics.
The correlation should be rounded to four decimal places, and the output should keep ties in the (rounded) correlation. Don't compare a metric to itself 😄
The output should have a row per segment and metric pair, with the columns:
segmentmetric_pairas the metric pair in the formatmetric_1, metric_2. Put the (lexicographically) lower metric name on the left using your database's default collationcorrelationas the correlation between the two metrics, rounded to four decimal places
Order the output by segment and metric_pair.
Expand for the DDL
create table metrics (
customer_id integer primary key,
segment integer not null,
metric_1 integer not null,
metric_2 integer not null,
metric_3 integer not null,
metric_4 integer not null,
metric_5 integer not null
);
insert into metrics
values
( 1, 7, 17, 93, 23, 110, 41),
( 2, 4, 22, 67, 38, 89, 37),
( 3, 6, 62, 30, 90, 92, 178),
( 4, 1, 20, 82, 87, 102, 37),
( 5, 1, 7, 34, 50, 41, 77),
( 6, 8, 49, 6, 24, 55, 19),
( 7, 2, 19, 64, 63, 83, 9),
( 8, 4, 46, 59, 21, 105, 64),
( 9, 1, 28, 74, 10, 103, 59),
( 10, 6, 92, 77, 28, 170, 52),
( 11, 7, 27, 78, 33, 106, 77),
( 12, 4, 10, 17, 97, 28, 88),
( 13, 6, 47, 74, 37, 122, 76),
( 14, 1, 93, 7, 69, 100, 8),
( 15, 4, 93, 38, 71, 131, 71),
( 16, 7, 21, 58, 43, 80, 74),
( 17, 1, 37, 7, 54, 44, 15),
( 18, 2, 58, 94, 62, 152, 99),
( 19, 7, 89, 52, 88, 142, 71),
( 20, 5, 100, 7, 48, 107, 99),
( 21, 7, 50, 89, 31, 139, 69),
( 22, 5, 2, 28, 40, 29, 89),
( 23, 2, 78, 74, 52, 153, 39),
( 24, 3, 61, 65, 27, 126, 51),
( 25, 10, 48, 98, 93, 145, 93),
( 26, 9, 60, 21, 54, 81, 85),
( 27, 4, 19, 48, 19, 67, 25),
( 28, 3, 56, 26, 53, 83, 98),
( 29, 4, 32, 43, 86, 75, 67),
( 30, 6, 41, 1, 66, 43, 132),
( 31, 7, 97, 32, 91, 129, 4),
( 32, 0, 91, 85, 52, 176, 65),
( 33, 10, 59, 66, 18, 125, 88),
( 34, 5, 83, 35, 77, 118, 153),
( 35, 2, 38, 52, 32, 89, 24),
( 36, 0, 10, 0, 75, 10, 62),
( 37, 6, 19, 92, 31, 111, 53),
( 38, 7, 99, 3, 24, 102, 71),
( 39, 2, 54, 41, 61, 96, 17),
( 40, 6, 75, 61, 50, 136, 108),
( 41, 4, 88, 46, 68, 133, 38),
( 42, 0, 87, 60, 32, 146, 27),
( 43, 9, 41, 25, 91, 65, 34),
( 44, 7, 76, 28, 39, 103, 56),
( 45, 2, 18, 14, 13, 31, 29),
( 46, 8, 26, 38, 1, 64, 53),
( 47, 10, 90, 53, 30, 143, 56),
( 48, 4, 3, 20, 57, 23, 32),
( 49, 7, 28, 5, 71, 34, 98),
( 50, 0, 32, 37, 30, 69, 82),
( 51, 3, 69, 35, 63, 105, 125),
( 52, 4, 67, 70, 41, 136, 70),
( 53, 8, 53, 68, 67, 122, 58),
( 54, 8, 14, 71, 95, 85, 13),
( 55, 5, 55, 88, 67, 143, 133),
( 56, 7, 42, 80, 8, 121, 2),
( 57, 10, 69, 79, 30, 148, 99),
( 58, 3, 87, 57, 69, 144, 132),
( 59, 7, 59, 68, 80, 128, 58),
( 60, 5, 34, 36, 59, 70, 118),
( 61, 6, 54, 94, 25, 148, 44),
( 62, 10, 62, 45, 68, 107, 7),
( 63, 6, 97, 98, 99, 195, 201),
( 64, 0, 64, 25, 43, 89, 25),
( 65, 7, 15, 38, 51, 53, 21),
( 66, 3, 37, 15, 1, 51, 2),
( 67, 8, 34, 64, 12, 98, 68),
( 68, 3, 85, 58, 30, 143, 58),
( 69, 7, 21, 90, 79, 111, 55),
( 70, 5, 43, 64, 18, 107, 41),
( 71, 5, 53, 85, 22, 138, 45),
( 72, 10, 67, 49, 70, 117, 11),
( 73, 10, 97, 5, 6, 102, 11),
( 74, 7, 26, 59, 42, 85, 55),
( 75, 6, 5, 25, 92, 30, 176),
( 76, 4, 76, 26, 3, 102, 95),
( 77, 1, 18, 53, 95, 71, 8),
( 78, 0, 79, 51, 82, 130, 74),
( 79, 1, 72, 63, 3, 136, 48),
( 80, 9, 45, 24, 5, 70, 47),
( 81, 0, 46, 40, 19, 86, 73),
( 82, 8, 34, 72, 17, 107, 54),
( 83, 7, 36, 54, 47, 90, 3),
( 84, 3, 50, 57, 8, 107, 15),
( 85, 2, 66, 11, 7, 77, 53),
( 86, 0, 0, 82, 95, 83, 5),
( 87, 4, 13, 38, 16, 51, 14),
( 88, 6, 61, 10, 31, 72, 56),
( 89, 7, 21, 17, 34, 38, 34),
( 90, 3, 77, 31, 96, 108, 197),
( 91, 3, 90, 27, 44, 117, 79),
( 92, 0, 87, 24, 64, 110, 38),
( 93, 6, 90, 32, 98, 122, 193),
( 94, 10, 82, 65, 19, 147, 48),
( 95, 1, 58, 93, 68, 152, 9),
( 96, 7, 42, 43, 16, 84, 27),
( 97, 5, 29, 31, 1, 60, 9),
( 98, 2, 49, 70, 58, 119, 6),
( 99, 6, 51, 51, 86, 102, 162),
(100, 4, 18, 35, 85, 53, 30)
;
The solution can be found at:
Sample input
| customer_id | segment | metric_1 | metric_2 | metric_3 | metric_4 | metric_5 |
|---|---|---|---|---|---|---|
| 1 | 1 | 21 | 58 | 66 | 79 | 29 |
| 2 | 0 | 70 | 55 | 79 | 125 | 2 |
| 3 | 1 | 68 | 55 | 10 | 123 | 70 |
| 4 | 1 | 20 | 62 | 59 | 82 | 25 |
| 5 | 0 | 42 | 9 | 80 | 51 | 13 |
| 6 | 1 | 26 | 89 | 17 | 115 | 66 |
| 7 | 1 | 45 | 51 | 90 | 96 | 17 |
| 8 | 0 | 4 | 52 | 47 | 56 | 61 |
| 9 | 0 | 57 | 48 | 82 | 105 | 40 |
| 10 | 1 | 17 | 93 | 45 | 109 | 76 |
with metrics(customer_id, segment, metric_1, metric_2, metric_3, metric_4, metric_5) as (
values
( 1, 1, 21, 58, 66, 79, 29),
( 2, 0, 70, 55, 79, 125, 2),
( 3, 1, 68, 55, 10, 123, 70),
( 4, 1, 20, 62, 59, 82, 25),
( 5, 0, 42, 9, 80, 51, 13),
( 6, 1, 26, 89, 17, 115, 66),
( 7, 1, 45, 51, 90, 96, 17),
( 8, 0, 4, 52, 47, 56, 61),
( 9, 0, 57, 48, 82, 105, 40),
(10, 1, 17, 93, 45, 109, 76)
)
Sample output
| segment | metric_pair | correlation |
|---|---|---|
| 0 | metric_1, metric_3 | 0.9051 |
| 1 | metric_4, metric_5 | 0.8357 |
solution(segment, metric_pair, correlation) as (
values
(0, 'metric_1, metric_3', 0.9051),
(1, 'metric_4, metric_5', 0.8357)
)
Hint 1
Use a correlation function, usually called CORR, to calculate the correlation between two metrics.
If you're using a database that doesn't have a built-in correlation function, you can try to calculate it manually -- but I'd instead recommend skipping this question.
Hint 2
To get every pair of metrics "side by side" for the CORR function, unpivot the table so that each metric is in its own row and then join the resulting table to itself on the segment and customer_id columns.