Pablo Alejandro Costesich's Blog

Private GKE Clusters With Terraform, Part 1 - The Design

Posted by @pcostesi on 2021-10-25 · Time to read: 2 minutes

As part of some research I'm doing for work, we needed to provision and manage a GKE-based Kubernetes cluster running on Google Cloud Platform, under the following constraints:

  1. Requests from the cluster must come from the same (static) IP address.
  2. It should be able to serve traffic using an ingress bound to a specific (static) IP as efficiently as possible.
  3. It must be able to connect to other Google Cloud services.
  4. We should be able to run ephemeral tasks cheaply.
  5. This should be a proof of concept, so set-up and tear-down should be easy.


With the requirements and constraints in hand, let's analyze our options:

  1. The first requirement dictates that we should probably use a private cluster in its own VPC, with a Cloud NAT routing all outbound traffic through a single static IP.
  2. The second requirement tells us that we should reserve an IP; nothing out of the ordinary.
  3. The third, that the VPC should have access to Google services.
  4. The fourth means that the cluster should have at least two node pools (maybe even one of them ephemeral).
  5. Finally, the fifth means we're going to use Terraform or a nice set of scripts to accomplish it.

So now we have a pretty good idea about how our prototype may look: a set of Terraform files that deploy a private GKE cluster. We will leave provisioning for another time.

Of course, you should be asking "Why are we using K8s in the first place?" But we (at my company) need to deploy something on that cluster that absolutely demands it.


You should really follow the advice from experts on the subject. This is just a proof of concept, so we're going to do the bare minimum; for anything production-grade, read the good-practices guides from Google and others.

We will create a VPC and a NAT router with a fixed regional IP. Then, we will create a global IP that we will use for the ingress.
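A minimal Terraform sketch of that networking layer might look like this. All names, regions, and CIDR ranges are illustrative assumptions, not part of the original design:

```hcl
# Custom-mode VPC so we control subnetting ourselves.
resource "google_compute_network" "vpc" {
  name                    = "poc-vpc" # hypothetical name
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "nodes" {
  name          = "poc-nodes"
  network       = google_compute_network.vpc.id
  region        = "us-central1"  # assumed region
  ip_cidr_range = "10.0.0.0/20"  # assumed range
}

# Static regional IP so all egress comes from one address (constraint #1).
resource "google_compute_address" "nat" {
  name   = "poc-nat-ip"
  region = "us-central1"
}

resource "google_compute_router" "router" {
  name    = "poc-router"
  network = google_compute_network.vpc.id
  region  = "us-central1"
}

resource "google_compute_router_nat" "nat" {
  name                               = "poc-nat"
  router                             = google_compute_router.router.name
  region                             = "us-central1"
  nat_ip_allocate_option             = "MANUAL_ONLY"
  nat_ips                            = [google_compute_address.nat.self_link]
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
}

# Static global IP to bind the ingress to (constraint #2).
resource "google_compute_global_address" "ingress" {
  name = "poc-ingress-ip"
}
```

Note that the NAT IP is regional while the ingress IP is global: Cloud NAT is a regional service, but the GKE ingress fronts a global HTTP(S) load balancer.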


Our node pools should be able to handle different workloads:

  • A "Key" pool will host every essential service.
  • A "Standard" pool will handle every non-essential, replicated service.
  • A "Task" pool will be used to run CronJobs and Jobs.
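The pools above can be expressed as separate `google_container_node_pool` resources. Here is a sketch of the "Task" pool only; it assumes a `google_container_cluster.primary` resource defined elsewhere, and the machine type, labels, and taint values are illustrative:

```hcl
# "Task" pool: scales to zero and uses Spot VMs to run Jobs cheaply
# (constraint #4). The "Key" and "Standard" pools would look similar,
# minus the spot setting and the taint.
resource "google_container_node_pool" "task" {
  name    = "task-pool"
  cluster = google_container_cluster.primary.name # assumed to exist

  autoscaling {
    min_node_count = 0
    max_node_count = 3
  }

  node_config {
    machine_type = "e2-standard-2" # assumed size
    spot         = true            # ephemeral, discounted capacity

    labels = { role = "task" }

    # Keep regular workloads off this pool; Jobs must tolerate the taint.
    taint {
      key    = "role"
      value  = "task"
      effect = "NO_SCHEDULE"
    }
  }
}
```

Jobs and CronJobs would then carry a matching toleration and a `nodeSelector` on `role: task` to land on this pool.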


We want to leverage the GKE ingress and container-native load balancing. This relates to constraint #2: we need to serve content as efficiently as possible. NEGs (network endpoint groups) let the load balancer target Pods directly instead of hopping through node ports, and the GKE ingress accepts extra configuration to enable Cloud CDN support.
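To keep everything in Terraform, the Kubernetes side can be sketched with the `kubernetes` provider. The service and config names here are made up, and this assumes a provider already wired to the cluster:

```hcl
# Service annotated for container-native load balancing (NEGs) and
# pointing the ingress at a BackendConfig that enables Cloud CDN.
resource "kubernetes_service_v1" "app" {
  metadata {
    name = "sample-app" # hypothetical
    annotations = {
      "cloud.google.com/neg"            = jsonencode({ ingress = true })
      "cloud.google.com/backend-config" = jsonencode({ default = "sample-cdn" })
    }
  }
  spec {
    selector = { app = "sample-app" }
    port {
      port        = 80
      target_port = 8080
    }
  }
}

# BackendConfig is a GKE CRD, so we apply it as a raw manifest.
resource "kubernetes_manifest" "cdn" {
  manifest = {
    apiVersion = "cloud.google.com/v1"
    kind       = "BackendConfig"
    metadata   = { name = "sample-cdn", namespace = "default" }
    spec       = { cdn = { enabled = true } }
  }
}
```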


Since this is a proof of concept, we will split the Terraform files across three different sub-projects:

  • Infrastructure and Cluster creation
  • Cluster services and Developer Platform
  • A sample application
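One way to lay that split out on disk (directory names are illustrative):

```text
terraform/
├── 01-infrastructure/   # VPC, NAT, static IPs, cluster and node pools
├── 02-platform/         # cluster services and developer platform
└── 03-sample-app/       # the sample application
```

Each sub-project keeps its own state, so we can tear down the sample app or the platform without touching the underlying infrastructure (constraint #5).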

In the next post we will see how to set up the first of the Terraform modules!