AlloyDB was announced for public preview shortly before we began collaboration with a recent client. At the time, our original intention was to migrate our client’s Snowflake database to BigQuery. This is a smart choice because of BigQuery’s speed, cost efficiency and ease of use (There's no need to manage infrastructure or indexes).
With AlloyDB, we saw the opportunity to not only provide a solution that was more appropriate for the client’s analytical needs, but also an opportunity to champion the cutting edge and be a leader in operational analytics innovations. But what exactly is AlloyDB? And how does it compare to GCP’s best of breed analytical power house, BigQuery?
Migrating databases has specific success criteria and key performance indicators, these play a large part in the business logic that goes into choosing appropriate Database Migration destinations:
AlloyDB is, in a nutshell, a PostgreSQL database with native GCP support. From the front-end, you might never know you’re using anything other than your standard PostgreSQL database, such as with Google Cloud SQL. In the back-end, however, Google has engineered an entirely disaggregated, distributed cluster architecture and added a columnar execution engine to maximize analytical query performance in line with transactional and application performance. Google separates your database into two abstractions, the Primary Instance which you can use for transactions (inserts, updates, deletes), and Read Pools for your reads and analytics at blazing, distributed speeds.
AlloyDB promises up to 100x analytical speed (reading) and 4x transactional throughput from your traditional databases. (In their demo of AlloyDB, they actually demonstrate a 160x increase.) By using a combination of ML-Driven Columnar and traditional Buffer caches, along with horizontally scalable read-only nodes, you typically get the speed class of in-memory data. With AlloyDB, your Storage Layer is entirely separate from your Compute. Not only does this allow you to bill both storage and computation separately, it also allows your system to scale both horizontally and vertically without relation. Read performance, when scaling nodes horizontally in a Read Pool, offers linear and predictable scaling of read connections.
No schema or configuration changes are needed to your workload, and GCP even suggests possibly dropping your analytical indexes altogether when using the Columnar Engine to improve transactional performance.
BigQuery has been Google’s go-to for Data Warehousing. With minimal setup, no need to manage storage capacity, compute, indexes or infrastructure, BigQuery is the analytics and data warehouse choice to many. In minimal time, you can have a production-grade data warehouse running with serverless, autoscaling queries. GCP now provides two highly scalable, relational, analytical systems. For many, it won’t be entirely obvious from the outside just how different these two products are, and whether an operational database or data warehouse is appropriate for their use case.
BigQuery is typically the best choice for offline and large-scale data analytics. As setup is minimal, and there is no need to manage infrastructure, this is a great option for quickly deploying and managing petabyte-level datasets, and getting queries fast.
However, BigQuery is not always the right tool for operational analytics, where analytical queries are built into a user-facing application, and need to run on the most recent transactional data. We’ve also found that customers currently running analytical workloads on operational databases can struggle to migrate to BigQuery with minimal application changes. And finally, for customers looking to perform a small amount of lightweight analytics on top of transactional data, adding another service to their architecture and investing in setting up and maintaining ETL between those services adds too much complexity. In practice, many of them want to query the database directly in real-time instead of setting up a full large-scale data analytics solution.
AlloyDB can help in these scenarios, especially for teams migrating legacy analytical systems to the cloud. With native compatibility with existing PostgreSQL servers, AlloyDB also comes with full integration for the Database Migration Service to provide heterogeneous migration from other database systems. To add on to this, GCP has announced the Database Migration Program for “tooling, resources, and expertise from our network of specialized technology partners”. While head to head performance compared to BigQuery is yet to be seen, AlloyDB will prove more useful and effective to any team with prior infrastructure experience and protocols.
As Google Cloud Platform continues to provide newer, and powerful products, the smart Cloud Practitioner will need to know which use cases are appropriate for which systems. In future posts, we will discuss comparing the friction of migrating a database to both BigQuery and AlloyDB. Afterwards, we will follow-up with benchmarks for both services, including testing the linear scalability of AlloyDB’s Read Pools.