✏️

Autoscaling Group of Google Compute Instances

A Google Compute Engine Instance Group is a collection of VMs that you can manage as a single entity. You can create an instance template and deploy 50 instances by changing a single number in the configuration. But the most interesting part is that you can set up automatic scaling for this group, zero-downtime rolling updates, and health checks, so unhealthy instances get shut down and recreated automatically!

Neat! Let's do it!

πŸ§‘β€πŸŽ“

This is a bit more of an advanced tutorial, so you need to be familiar with Compute Engine and Docker basics to follow along. Consider checking out my Compute Engine Basics article; it requires no prior knowledge.

TL;DR: source code

‣ server.js
‣ Dockerfile
‣ start.sh
‣ create.sh
‣ update.sh
‣ autoscale.sh

The App, the Dockerfile and the startup-script

I've created a simple Node.js app (server.js) that accepts requests and responds with the current time.

const http = require('http')
const moment = require('moment')

const server = http.createServer((req, res) => {
  const headers = { 'Access-Control-Allow-Origin': '*' }
  res.writeHead(200, headers)
  res.end('Time is ' + moment().format('YYYY-MM-DD hh:mm a'))
})

server.listen(8080)

It's wrapped into a Docker container, and a startup script tells Compute Engine how to start it. Here's the Dockerfile:

FROM node:12-alpine
WORKDIR /usr/app

COPY server.js package.json yarn.lock ./
# COPY <file> <file> <file> <destination>

RUN yarn --frozen-lockfile 
# RUN <command>

RUN apk add --no-cache tini
ENTRYPOINT ["tini", "--"]

CMD ["node", "server.js"]
#!/bin/bash

export HOME=/home/app
mkdir $HOME || echo
cd $HOME
docker-credential-gcr configure-docker

PROJECT_ID=$(curl -X GET http://metadata.google.internal/computeMetadata/v1/project/project-id -H 'Metadata-Flavor: Google')
IMAGE_ID=$(curl -X GET http://metadata.google.internal/computeMetadata/v1/instance/attributes/image-id -H 'Metadata-Flavor: Google')

docker run --pull always -p 8080:8080 gcr.io/$PROJECT_ID/$IMAGE_ID:latest

Refer to the Compute Engine article if you need more details on these.
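
By the way, if you'd like to smoke-test the container locally before touching any cloud resources, something like this should do (the local image name is arbitrary):

docker build -t demo-server-local .
docker run --rm -d -p 8080:8080 --name demo-server-local demo-server-local

# The app should answer on port 8080 with the current time
curl http://localhost:8080

docker stop demo-server-local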

Create.sh

‣ create.sh full file

The create.sh script is going to be a bit more complex here, but if you look closely, you'll see that it's just more of the same.

set -e
create.sh

set -e makes the script exit immediately if any command fails.

PROJECT_ID=cloud-architect-demo
APP_ID=demo-server
GCR_ADDRESS="gcr.io/$PROJECT_ID/$APP_ID:latest"
ZONE=us-central1-a
create.sh

Set variables for convenience and reusability.

gcloud auth activate-service-account \
	--key-file ./dev-key.json
gcloud config set project $PROJECT_ID
gcloud config set compute/zone $ZONE
create.sh

Activate the service account, then set the project ID and default zone to make sure the following commands target the right project.

gcloud services enable containerregistry.googleapis.com
create.sh

Make sure the Google Container Registry API is enabled.

docker build . -t $GCR_ADDRESS
create.sh

Build an image with a tag, so we can refer to it.

gcloud auth configure-docker
create.sh

Allow Docker to push images to the Google Container Registry.

docker push $GCR_ADDRESS
create.sh

And push the image to the registry.

yes | gcloud compute instance-groups managed delete $APP_ID-group || echo
yes | gcloud compute instance-templates delete $APP_ID-template || echo
create.sh

Try to delete the previously created instance group and template. If they don't exist, these commands fail; the || echo fallback swallows the non-zero exit code (echo always succeeds), so set -e doesn't abort the script.
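
If the || echo trick looks odd, here is a tiny standalone illustration of how it plays with set -e (this isn't part of the tutorial's scripts, and missing-template is a made-up name):

#!/bin/bash
set -e

# A failing delete would normally abort the script because of set -e...
# yes | gcloud compute instance-templates delete missing-template

# ...but with the fallback, echo runs and succeeds, so the overall exit code
# is zero and the script keeps going (gcloud still prints its error to stderr):
yes | gcloud compute instance-templates delete missing-template || echo
echo "still running"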

gcloud compute instance-templates create $APP_ID-template \
	--image-project cos-cloud \
	--image-family cos-77-lts \
	--machine-type e2-micro \
	--metadata google-logging-enabled=true,image-id=$APP_ID \
	--metadata-from-file startup-script=./start.sh \
	--tags $APP_ID-tag
create.sh

Create a Compute Engine instance template. We'll name it "demo-server-template".

image-project and image-family define the operating system these VMs will run. In this case, I use a stable LTS release of Container-Optimized OS.

machine-type defines the size of the VM. This is one of the smallest instances. It'll cost around $4 per month.

I set two pieces of metadata here: google-logging-enabled makes sure the logs of my startup script show up in Cloud Logging, and image-id is custom metadata that I will need in the startup script.

Speaking of which, metadata-from-file points at the startup script itself, so Compute Engine knows how to start each instance.

And the last argument is tags. I need to tag this template to associate firewall rules with all instances created from this template.

Before we create the instance-group itself, let us prepare the firewall rules and a health-check for it.

gcloud compute firewall-rules create $APP_ID-firewall \
	--allow tcp:8080 \
	--target-tags $APP_ID-tag || echo

gcloud compute firewall-rules create allow-health-check \
	--allow tcp:8080 \
	--source-ranges 130.211.0.0/22,35.191.0.0/16 \
	--network default || echo
create.sh

Create the firewall rules. The first one allows TCP traffic on port 8080 for all Compute Engine instances carrying this tag. The second one allows TCP traffic on port 8080 only from Google's health-check IP ranges. In this case the second rule is not strictly necessary, since the first one already allows all traffic to port 8080 on the tagged instances. I've included it anyway so you know how to allow health-check-only traffic.
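
Once create.sh has run, you can double-check what these rules ended up looking like, for example:

gcloud compute firewall-rules describe demo-server-firewall
gcloud compute firewall-rules describe allow-health-check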

gcloud compute health-checks create http $APP_ID-healthcheck --port 8080 \
	--check-interval 150s \
	--timeout 10s \
	--healthy-threshold 1 \
	--unhealthy-threshold 2 || echo

This command creates the health check itself. I specify the target port, how often the check should run, how long it should wait for the app to respond, how many consecutive successes mark an instance as healthy, and how many consecutive failures mark it as unhealthy so it gets recreated.

gcloud compute instance-groups managed create $APP_ID-group \
	--size=2 \
	--health-check=$APP_ID-healthcheck \
	--initial-delay=300 \
	--template=$APP_ID-template

And finally, I create the instance group. I specify its name, the initial number of instances in the group, the health check we created above with a generous initial delay (300 seconds) so each instance has time to boot up, and the Compute Engine template we created earlier.

Let's deploy it and make sure our setup actually works. Run this command:

sh create.sh
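
Once the script finishes (give the startup script a minute or two to pull the image), you can verify the group actually came up. Something along these lines works; the filter and format expressions are just one way to grab an instance's external IP:

# List the instances managed by the group
gcloud compute instance-groups managed list-instances demo-server-group --zone us-central1-a

# Grab the external IP of one of them and hit the app on port 8080
IP=$(gcloud compute instances list \
  --filter="name~demo-server-group" \
  --format="value(networkInterfaces[0].accessConfigs[0].natIP)" \
  --limit=1)
curl "http://$IP:8080"
# Expected: something like "Time is 2024-01-01 10:30 am"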

Update.sh

Now, let's write the update script. The update.sh script is actually pretty much the same as it was in the Compute Engine Basics article.

set -e

PROJECT_ID=cloud-architect-demo
APP_ID=demo-server
GCR_ADDRESS="gcr.io/$PROJECT_ID/$APP_ID:latest"
ZONE=us-central1-a

gcloud auth activate-service-account \
  --key-file ./dev-key.json
gcloud config set project $PROJECT_ID
gcloud config set compute/zone $ZONE

docker build . -t $GCR_ADDRESS
gcloud auth configure-docker
docker push $GCR_ADDRESS

gcloud compute instance-groups managed rolling-action replace $APP_ID-group \
  --max-unavailable 50%

Overall, what we do here is this: we build and push a new image tagged latest. Since only one image in the registry can carry the latest tag at a time, each newly pushed image becomes the only latest one, which makes sure the startup script will pull the right image from GCR.

The only difference is the last command.

It runs a rolling update by creating new instances and deleting the old ones. The only argument here is max-unavailable: it guarantees that at least 50% of the instances stay available at any moment, to avoid downtime.
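
If you want to watch the replacement happen, something like this works; wait-until with --version-target-reached simply blocks until every instance runs the new template:

gcloud compute instance-groups managed wait-until demo-server-group \
  --version-target-reached \
  --zone us-central1-a

# Or poll the per-instance status while the update rolls
gcloud compute instance-groups managed list-instances demo-server-group --zone us-central1-a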

Let's run it and see how it works!

sh update.sh

Autoscaling

Okay, now what about autoscaling? It's built in and very customizable. I'll put it into a separate file (autoscale.sh) here for simplicity, but feel free to put it into the create.sh script.

gcloud compute instance-groups managed set-autoscaling $APP_ID-group \
  --max-num-replicas=10

The simplest version looks like this. It sets the maximum number of instances the autoscaler is allowed to create, and the rest is handled with default settings (out of the box, scaling is based on CPU utilization).

But if you need more, there are plenty of ways to customize it. You can set a minimum number of replicas, restrict the autoscaler to only scaling out, turn it off entirely, and so on. And the most interesting part: you can make it scale based on the number of messages in a Pub/Sub queue. A sketch of a more customized setup is below. Thanks for reading!
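
For reference, a more customized autoscale.sh might look something like this. The flag values are purely illustrative, and I'm only hinting at the Pub/Sub variant rather than spelling it out:

# autoscale.sh: a more customized version (illustrative values)
gcloud compute instance-groups managed set-autoscaling demo-server-group \
  --zone us-central1-a \
  --min-num-replicas 2 \
  --max-num-replicas 10 \
  --target-cpu-utilization 0.6 \
  --cool-down-period 120 \
  --mode only-scale-out
# --mode can also be "on" (scale in both directions) or "off".
# For Pub/Sub-driven scaling, set-autoscaling accepts custom metric flags
# (--update-stackdriver-metric and friends) pointed at the
# pubsub.googleapis.com/subscription/num_undelivered_messages metric.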