PostgreSQL generate_series for generating time series
Ever needed to aggregate data per time period where you didn’t have data points in every part of the period? I wanted to generate a chart of requests per hour, where I didn’t have requests in every hour. Using a naive PostgreSQL query I ended up with results such as this:
=> SELECT date_trunc('hour', created_at), -> COUNT(*) FROM events GROUP BY 1 LIMIT 10; date_trunc | count ---------------------+------- 2021-03-22 10:00:00 | 32 2021-03-22 11:00:00 | 92 2021-03-22 13:00:00 | 213 2021-03-22 14:00:00 | 237 2021-03-22 16:00:00 | 331 2021-03-22 17:00:00 | 389 2021-03-22 18:00:00 | 452 2021-03-22 19:00:00 | 493 2021-03-22 20:00:00 | 273 2021-03-22 21:00:00 | 152 (10 rows)
Clearly there is no data for 12:00-13:59 and 15:00-15:59 in the database. Wouldn’t it be great if we can fix this at the database level? That’s where the PostgreSQL generate_series()
function comes into play. We can use it to generate a consecutive range of items, which we can then join with our events table.
=> SELECT * FROM generate_series('2021-04-22 10:00'::timestamp, -> '2021-04-22 21:00', '1 hours'); generate_series --------------------- 2021-04-22 10:00:00 2021-04-22 11:00:00 2021-04-22 12:00:00 2021-04-22 13:00:00 2021-04-22 14:00:00 2021-04-22 15:00:00 2021-04-22 16:00:00 2021-04-22 17:00:00 2021-04-22 18:00:00 2021-04-22 19:00:00 2021-04-22 20:00:00 2021-04-22 21:00:00 (12 rows)
Depending on the ORM you’re using you’ll need to find out how to join this with your events.