
Improving Global Latency with Multi-Region Deployment on GCP

· 5 min read
Alessandro Yuichi Okimoto

As our user base expanded worldwide, we at Bucketeer noticed that latency was becoming an issue for users operating in distant regions. In response, we decided to implement a multi-region architecture on Google Cloud Platform (GCP) to reduce latency and improve the overall performance of our platform.

Why Multi-Region?

Many of our users operate applications that serve customers globally. As the demand for low-latency responses increases, requests from distant regions often experience high response times, causing performance degradation.

Our main cluster is located in Tokyo, and before implementing multi-region support, requests from regions like Europe (Frankfurt) could take 400-500ms before the SDK returned a response. Ideally, we wanted to bring that latency under 100ms for users worldwide. To address this, we chose seven GCP regions to deploy our infrastructure, with additional regions planned for the future, including Mexico. These are:

  • Asia: Tokyo, Taiwan, Singapore
  • South America: São Paulo
  • US: Iowa, Virginia, Oregon

By deploying our application across multiple regions, we can bring the data closer to users, ensuring faster response times regardless of where they are located.

Technical Implementation

We implemented the multi-region deployment using Google Kubernetes Engine (GKE) and deployed our application via Helm. Our application architecture includes four microservices:

  • API: The SDK endpoint service.
  • Batch: Background tasks running periodically.
  • Subscriber: Background tasks processing user data.
  • Web: Admin console for managing the platform.

To make multi-region support possible, we used Kubernetes MultiClusterIngress and MultiClusterService capabilities within our Helm chart. This allowed us to manage traffic routing and service discovery across regions.
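For reference, a MultiClusterService exports a Service to every cluster in the fleet, and a MultiClusterIngress binds such services to a single global load balancer. A minimal sketch of the two resources (the names, namespace, and ports below are hypothetical, not our actual chart):

```yaml
apiVersion: networking.gke.io/v1
kind: MultiClusterService
metadata:
  name: api-mcs            # hypothetical name
  namespace: bucketeer     # hypothetical namespace
spec:
  template:
    spec:
      selector:
        app: api           # matches the API service pods
      ports:
        - name: http
          protocol: TCP
          port: 80
          targetPort: 9090 # hypothetical container port
---
apiVersion: networking.gke.io/v1
kind: MultiClusterIngress
metadata:
  name: api-ingress
  namespace: bucketeer
spec:
  template:
    spec:
      backend:             # default backend: the MultiClusterService above
        serviceName: api-mcs
        servicePort: 80
```

Both resources live only in the config cluster; GKE then programs the global load balancer to reach the matching pods in every member cluster.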

Cluster Setup

Because GKE's multi-cluster features require a single config cluster to host the MultiClusterIngress and MultiClusterService resources, we had to designate a main cluster that acts as the "primary" cluster for managing deployments. We selected Tokyo as our main region.

  • In the main cluster (Tokyo), we deploy all four microservices.
  • In the child clusters (Taiwan, Singapore, São Paulo, Iowa, Virginia, and Oregon), we deploy only the API and Web services, which are the ones handling the SDK and Public API requests.
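In a Helm chart, this split can be expressed with per-cluster values files that toggle each microservice on or off. The keys below are an illustrative sketch, not our actual chart:

```yaml
# values-main.yaml (Tokyo): all four microservices enabled
api:        {enabled: true}
web:        {enabled: true}
batch:      {enabled: true}
subscriber: {enabled: true}
---
# values-child.yaml (Taiwan, Singapore, São Paulo, Iowa, Virginia, Oregon):
# only the services that handle SDK and Public API requests
api:        {enabled: true}
web:        {enabled: true}
batch:      {enabled: false}
subscriber: {enabled: false}
```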

We decided to use GCP Load Balancer to manage traffic across regions, as it allows us to expose the same endpoint for users, regardless of their location. This simplifies the setup and makes it easier to route traffic to the nearest region without additional complexity.

Data Consistency & Synchronization

One challenge with multi-region deployments is ensuring data consistency across regions. For this, we opted for a single MySQL database located in the main region (Tokyo) for centralizing data. Since having a single database in one region can potentially increase latency for users in other regions, we implemented a caching mechanism.

We deployed Redis instances in each child region (Taiwan, Singapore, São Paulo, Iowa, Virginia, and Oregon), which are updated by the Batch service every minute. This ensures that the cache remains in sync with the database while providing faster access to data for requests from those regions.
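In spirit, the Batch service's refresh job is a periodic copy from the central database to every regional cache. The sketch below uses in-memory dicts as stand-ins for MySQL and the regional Redis instances; the region list and one-minute interval come from the post, everything else (keys, data) is illustrative:

```python
import time

# Stand-ins for the central MySQL database (Tokyo) and the per-region
# Redis caches -- in production these would be real database/Redis clients.
database = {"feature_flags": {"new-checkout": True}}
regional_caches = {
    region: {}
    for region in ["taiwan", "singapore", "sao-paulo", "iowa", "virginia", "oregon"]
}

REFRESH_INTERVAL_SECONDS = 60  # the Batch service runs every minute


def refresh_caches():
    """Copy the latest data from the central database into every regional cache."""
    snapshot = database["feature_flags"]
    for cache in regional_caches.values():
        cache["feature_flags"] = dict(snapshot)  # each region gets its own copy


def batch_loop(iterations):
    """Periodic refresh loop, as the Batch service would run it."""
    for _ in range(iterations):
        refresh_caches()
        time.sleep(REFRESH_INTERVAL_SECONDS)
```

A push-based refresh like this trades up to a minute of staleness for fast, local reads in every region.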

Global Request Flow

When the client makes a request to Bucketeer, the traffic is routed to the nearest region using GCP Load Balancer. The typical flow of a request looks like this:

  1. A user in Europe (Frankfurt) makes a request to the Bucketeer API.
  2. The GCP Load Balancer identifies the nearest available region and routes the request to one of the regions in Asia (Tokyo, Taiwan, or Singapore), depending on the load and proximity.
  3. The API service in the selected region checks the local Redis cache for the relevant data.
  4. If the data is in the cache, the API responds quickly with it.
  5. If the data is not in the cache, the API queries the central MySQL database (located in Tokyo), updates the cache, and then responds.
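The read path in steps 3-5 is the classic cache-aside pattern. A minimal sketch, again with dicts standing in for the regional Redis cache and the central MySQL database (the key and value are illustrative):

```python
# Stand-ins for a regional Redis cache and the central MySQL database in Tokyo.
local_cache = {}
central_database = {"user:42": {"plan": "pro"}}


def handle_request(key):
    """Serve from the regional cache, falling back to the central database."""
    value = local_cache.get(key)        # step 3: check the local Redis cache
    if value is not None:
        return value                    # step 4: cache hit, fast response
    value = central_database.get(key)   # step 5: query MySQL in Tokyo...
    if value is not None:
        local_cache[key] = value        # ...update the cache...
    return value                        # ...then respond
```

Only the first miss for a key pays the cross-region round trip to Tokyo; subsequent reads are served locally.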

Visualizing the Global Request Flow

Here’s a simple Mermaid diagram that illustrates how the traffic is routed from the user to the nearest region:
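Sketched from the request flow described above (node labels are illustrative):

```mermaid
graph LR
    User[User in Frankfurt] -->|request| LB[GCP Load Balancer]
    LB -->|routes to nearest region| API[API service]
    API -->|check cache| Redis[(Regional Redis)]
    API -->|on cache miss| MySQL[(Central MySQL in Tokyo)]
```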

Lessons Learned

In the beginning, I tried to use the cross-region replication feature of GCP Memorystore for Redis so we wouldn't need to implement our own cache-update mechanism. Unfortunately, it is limited to two child instances, and we already run more than seven instances across our regions.

Our CD flow also became more complex: PipeCD doesn't support multi-cluster deployments, so I had to create a separate PipeCD application for each cluster, which increased the number of configuration files to maintain and monitor.
The PipeCD team plans to support multi-cluster by March, though.

Conclusion

By implementing a multi-region deployment on GCP, we’ve significantly improved Bucketeer’s performance for users worldwide. Our approach of using Kubernetes, Helm, GCP Load Balancer, and Redis for caching has allowed us to provide low-latency responses to users regardless of their location, which has been a game-changer for our platform.

While the setup comes with its own set of challenges, particularly around managing infrastructure and maintaining data consistency, the benefits of reduced latency and improved user experience have been well worth the effort.

We’re excited to continue expanding Bucketeer’s multi-region presence and look forward to making further improvements to enhance performance for our global user base.