Skip to main content

Introducing PostgreSQL as a Data Warehouse Option

· 4 min read
Hien Vu Ngoc
Software Engineer

PostgreSQL is now supported as a data warehouse (DWH) option in Bucketeer 2.2.0, alongside BigQuery and MySQL. We added it because Postgres supports TimescaleDB, which fits OLAP-style event and analytics workloads well.

Why add PostgreSQL?

Bucketeer has supported BigQuery (GCP) and MySQL (for Bucketeer Lite and self-hosted) for some time. We added PostgreSQL because it supports TimescaleDB, an extension that targets OLAP (analytical) workloads: hypertables, time-based partitioning, and compression suit the event and time-series data that the DWH stores. You can run plain PostgreSQL or PostgreSQL with TimescaleDB. That gives you a self-hosted option with analytical capabilities in the same vein as BigQuery, without tying you to GCP.

What's Included

PostgreSQL is supported as a data warehouse — the store for goal and evaluation events and event counters that power A/B testing and analytics. The same flow as with MySQL applies: events are written by the subscriber and event counter services into Postgres.

What’s in scope:

  • PostgreSQL client and storage for goal/evaluation events and event counters (mirroring the existing MySQL DWH behavior).
  • Subscriber and event counter services updated to support PostgreSQL.
  • Docker Compose and Kubernetes/Helm (and dev container) support so you can run Bucketeer with Postgres as the DWH.

Implementation details and schema are in the Bucketeer repository; this post stays high level.

When to Use Which (Usage)

OptionBest for
BigQueryFull Bucketeer on Google Cloud — when you want managed analytics and are already on GCP (Pub/Sub, GKE, etc.).
MySQLBucketeer Lite or self-hosted — when you prefer MySQL for the DWH or already run MySQL elsewhere. Works with Docker Compose and Kubernetes.
PostgreSQLBucketeer Lite or self-hosted — when you want a self-hosted DWH with optional TimescaleDB for OLAP/analytics. Same deployment options as MySQL (Docker Compose, K8s).

How PostgreSQL compares

Hosting and deployment

  • BigQuery — Managed service on GCP. You need a GCP project, IAM, and a BigQuery dataset. Messaging is Google Pub/Sub. Suited when the rest of Bucketeer (and your stack) already runs on GCP.
  • MySQL — Self-hosted. You run MySQL yourself (Docker Compose or Kubernetes/Helm). With Bucketeer Lite, messaging is Redis Streams. No cloud lock-in; you operate the database.
  • PostgreSQL — Same idea as MySQL: self-hosted, Docker Compose or Kubernetes/Helm, Redis Streams for Lite. The difference is the engine and the extension ecosystem (notably TimescaleDB).

Analytics and OLAP

  • BigQuery — Built for large-scale analytics: columnar storage, distributed queries, no schema or capacity management on your side. You pay for storage and query usage.
  • MySQL — General-purpose relational store. Works well for the DWH’s event and counter tables; analytical queries are fine at moderate scale. No built-in time-series or OLAP extensions in the same way as TimescaleDB.
  • PostgreSQL — General-purpose too, with stronger analytical SQL (e.g. window functions, CTEs, JSON). TimescaleDB extends it for OLAP and time-series: hypertables (automatic time-based partitioning), compression, and query patterns that fit event and analytics workloads. You can run plain PostgreSQL or enable TimescaleDB depending on how analytical you need to get.

Scale and cost

  • BigQuery — Intended for larger scope: high volume (e.g. millions of messages), managed scaling, pay-per-use. Fits when you need that scale and are on GCP.
  • MySQL and PostgreSQLCost savers for smaller projects. You run the database yourself (Docker or K8s), so there’s no per-query or per-message cloud bill. Suited for lower to moderate event volume and teams that prefer to own infra and control cost.

Summary

AspectBigQueryMySQLPostgreSQL
HostingGCP only, managedSelf-hostedSelf-hosted
Typical scaleLarge (e.g. millions of messages)Smaller projects, cost-consciousSmaller projects, cost-conscious
OLAP / time-seriesNative (managed)General SQLGeneral SQL + optional TimescaleDB
OperationsMinimal (managed)You operate DBYou operate DB

BigQuery when you need large-scale, managed analytics on GCP; MySQL or PostgreSQL when you want a cost-effective, self-hosted DWH for smaller or moderate traffic. On Postgres, TimescaleDB is an optional extension for OLAP-oriented event and time-series storage.

How to Get Started

PostgreSQL as a data warehouse is available. To try it:

  • Check the Bucketeer repository for Docker Compose and Kubernetes/Helm configuration and the event table setup steps.

What's next

PostgreSQL support in this release is for the data warehouse (event and analytics storage) only. We’re also working on PostgreSQL as a data storage alternative for the main application — the same place MySQL is used today for the console and core data. When that lands, you’ll be able to run Bucketeer Lite or self-hosted with PostgreSQL for both storage and DWH, alongside the existing MySQL option.