Touching Production: Review and Change (Part 2)

Two weeks ago I wrote about touching production. I described how I prepared scripts and queries for a migration of image names. The images are stored in Cloud Storage and their object names are referred to in a relational database. I came up with three steps for the migration, all capable of being applied while the site continues to serve the images.

  • Copy old storage objects to new storage objects.
  • Update the table in the relational database to refer to the new name.
  • Remove old storage objects from Cloud Storage.

For the first and the third step I came up with shell scripts: basically hundreds of thousands of lines calling gsutil, the command line utility for administering Cloud Storage. The second step was a file containing about 150k SQL UPDATE statements.

The Review

It is not just good practice but required practice in my team that we review each other's work. The systems we manage are incredibly complex, and every one of us has a different mental model of how our systems work. And then there is how the systems actually work. 🙃 Reviews are therefore essential to avoid the biggest disasters and keep things running smoothly.

Pushing a change of roughly a million lines through review needs good communication. It is not enough to just drop the files in a pull request and wait for someone to pick it up. So I explained to the reviewer how I came up with the files, what I believe the system looks like today, and how I would like the system to look and behave tomorrow. This may be the most underappreciated part of conducting reviews: having a chance to synchronize mental models inside SRE and across teams. The commit message is often just an executive summary of what has been done and what the overall goal of the change is. Pairing up and walking someone through my thought process, however, has not only been an extremely valuable feedback loop for myself but has also led to better code in the end.

Back to the migration change: The reviewer came up with some additional test cases and together we developed a plan for applying the migration scripts. We also had an interesting discussion about whether or not the shell scripts are I/O bound.

The Shell Scripts: Trade-offs

The shell scripts each had roughly 450k lines calling gsutil. As far as I knew, gsutil has no batch mode, which left me with only two options:

  • Call gsutil, a thoroughly tested and trusted tool again and again. This puts a lot of overhead on the kernel for spawning new processes and context switching between them.
  • Write a tool that repeatedly makes calls to the API, thus implementing the missing batch behavior. This tool would need to get tested thoroughly before being ready for showtime on production.
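
For illustration, a minimal sketch of such a batch tool, assuming the cloud.google.com/go/storage client library (bucket name, worker count, and object paths are placeholders, not our production values), could look roughly like this:

// Hypothetical sketch only: a batch remover that reads object names from
// standard input and deletes them concurrently via the Cloud Storage API.
// Assumes the cloud.google.com/go/storage client library; the bucket name
// and the number of workers are placeholders.
package main

import (
  "bufio"
  "context"
  "log"
  "os"
  "sync"

  "cloud.google.com/go/storage"
)

func main() {
  ctx := context.Background()
  client, err := storage.NewClient(ctx)
  if err != nil {
    log.Fatal("client: ", err)
  }
  bucket := client.Bucket("my-bucket")

  names := make(chan string)
  var wg sync.WaitGroup
  for i := 0; i < 64; i++ {
    wg.Add(1)
    go func() {
      defer wg.Done()
      for name := range names {
        // one API call per object, no extra process spawned per deletion
        if err := bucket.Object(name).Delete(ctx); err != nil {
          log.Println("delete ", name, ": ", err)
        }
      }
    }()
  }

  scanner := bufio.NewScanner(os.Stdin)
  for scanner.Scan() {
    names <- scanner.Text() // e.g. "dir1/dir2/hytinj"
  }
  close(names)
  wg.Wait()

  if err := scanner.Err(); err != nil {
    log.Fatal("scan: ", err)
  }
}

Even a sketch like this would still need credential handling, retries, and proper testing before I would point it at production data, and that is exactly the engineering time in question.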

Our SRE team is small, which implies that engineering time is almost always more precious than computing time. That's why I decided to spend some compute resources rather than invest another two or three hours into a custom tool that we would use only once. But how much compute are we talking about here? And what is the bottleneck when we run the scripts? My reviewer suggested the scripts might be I/O bound because gsutil operations often take up to a second, most of which is spent waiting for Cloud Storage to report how the operation went. I was under the impression that whenever we would wait for a call to return, we could schedule another process to do its magic (for example, starting up).

To find out I created an instance with 64 CPU cores and more than enough memory to fit the processes and data.

We’ll have a look at the step2-remove.sh script, but more or less the same applies to the other shell script, too.

The file’s content looked like this:

gsutil rm 'gs://my-bucket/dir1/dir2/hytinj'
gsutil rm 'gs://my-bucket/dir1/dir2/hytinj_b'
gsutil rm 'gs://my-bucket/dir1/dir2/hytinj_m'


In total, the file had 466,401 lines like that.
To distribute the workload across all 64 cores I split the file into chunks of 7,288 lines each, that is 466,401 divided by 64 and rounded up to the next integer.

$ split -l 7288 step2-remove.sh step2-remove-sharded.

That gave me 64 files of roughly the same length:

$ ls -l step2-remove-sharded.*
-rw-r--r--  1 danrl  staff   351K Aug  3 10:35 step2-remove-sharded.aa
-rw-r--r--  1 danrl  staff   351K Aug  3 10:35 step2-remove-sharded.ab
-rw-r--r--  1 danrl  staff   351K Aug  3 10:35 step2-remove-sharded.ac
✂️
-rw-r--r--  1 danrl  staff   351K Aug  3 10:35 step2-remove-sharded.cl

To run them in parallel I looped over them, sending each process to the background:

$ for FNAME in step2-remove-sharded.*; do sh $FNAME & done

Looking at htop and iftop, I got the feeling that the bottleneck really was the CPU here. The poor thing was context switching all the time between processes that were waiting for I/O.

[htop screenshot]

As expected, memory and bandwidth usage were rather low. The instance had tens of gigabytes of memory left unused and could easily have handled 10 Gbit/s of network I/O.

[iftop screenshot]

In total, the shell scripts ran for three hours costing us a little less than USD 5. That is orders of magnitude cheaper than any investment in engineering time. Sometimes, a trade-off means that we wouldn’t build the fancy solution but rather throw compute or memory at a one-time problem.

The SQL Script: Managing Risk

The more interesting, because more delicate, part of the migration was running the SQL statements on the live production database. Relational databases are a piece of work… Not necessarily a distributed system designer’s dream but that’s another story.

When the reviewer and I deployed the SQL change, we gradually took on more risk as we proceeded. First, we started with a single statement that we knew affected only an image belonging to a well-known account.

After executing this single statement we ran some tests to see if everything worked as expected, including the caches. Since all tests were green, we went for ten statements. Then we tested again. We increased to 100 statements, then 1k statements, and finally settled on a chunk size of 10k statements for the rest of the migration.
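
For illustration, here is how one could carve a statement file into such ramp-up chunks with another small Golang helper. The chunk sizes match the ramp-up described above; the output file names are made up:

// Hypothetical sketch only: split statements from standard input into
// ramp-up chunks of 1, 10, 100, 1k, and then repeated 10k lines.
// Chunk sizes and output file names are illustrative.
package main

import (
  "bufio"
  "fmt"
  "log"
  "os"
)

func main() {
  sizes := []int{1, 10, 100, 1000, 10000}
  chunk, lines := 0, 0
  out, err := os.Create(fmt.Sprintf("chunk-%03d.sql", chunk))
  if err != nil {
    log.Fatal("create: ", err)
  }
  scanner := bufio.NewScanner(os.Stdin)
  for scanner.Scan() {
    fmt.Fprintln(out, scanner.Text())
    lines++
    // the last ramp-up size is reused for all remaining chunks
    limit := sizes[len(sizes)-1]
    if chunk < len(sizes) {
      limit = sizes[chunk]
    }
    if lines == limit {
      out.Close()
      chunk++
      lines = 0
      if out, err = os.Create(fmt.Sprintf("chunk-%03d.sql", chunk)); err != nil {
        log.Fatal("create: ", err)
      }
    }
  }
  out.Close()
  if err := scanner.Err(); err != nil {
    log.Fatal("scan: ", err)
  }
}

Running go run main.go < migration.sql would produce chunk-000.sql with a single statement, chunk-001.sql with ten, and so on, and the chunks could then be applied one after another with tests in between.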

This ramp-up of risk (every change carries some risk) is pretty common when we make changes to production. We like to be able to roll back small changes as early as possible so that only a few customers are affected. On the other hand, we like to get the job done eventually; we know that engineering time is precious and that we hate boring, repetitive work. We use this pattern of increasing by orders of magnitude all the time, from traffic management (e.g. 0.1% of users hitting a new release) to migrating storage objects or table rows.

Conclusion

With a hands-on approach and by making reasonable trade-offs, we were able to migrate the legacy image names unnoticed by our users. Once again we touched production without causing a disaster. As we say in my team whenever someone asks us what we do: We touch production, every day, all day long, and sometimes during the night.

Touching Production: What does that mean? (Part 1)

Sometimes people ask me what I do all day as an SRE. I usually reply that I touch production every day. But what does touching production even mean? Let me share a typical SRE task: the preparation for touching production at eGym (the company I work for).

The Problem Statement

We have a product that allows users to create and upload images of a certain kind. Those images are stored in a bucket on Cloud Storage. Each image name is a long randomized string (similar to a UUID) and is referenced in a relational database table. At some point in the past, however, we used very short image names of up to six characters. When we began making the image names part of the external API, we had to rename those legacy images: longer, better-randomized names are harder to predict and increase security in case an attacker starts guessing image names. Some images were still using the old names. My task was to migrate those legacy images to longer, hard-to-predict names while requests were coming in.

Assessing The Scale And Impact

The second thing I usually do is assess the scale and the expected impact of a task. The first thing is always making sure I understood the problem correctly by talking to the people who issued the request or who developed the systems I am about to touch. The scale and the expected impact determine which tools I use and which approaches are feasible. Here I had to understand whether we were talking about a month-long migration of data while all systems continue to serve traffic, or whether we could apply some database changes in a single transaction and be done in a minute.

I queried the read replica of the production database to get the number of rows that host old-style image names (those with a length of six characters or less):

SELECT COUNT(id) FROM Image where imageType='MY_TYPE' and length(imageId) <= 6;

The result was something around 150k rows. That's not much; a number I could easily handle in memory on my local machine. From the problem statement, I knew that all new images had been getting much longer, randomized names for a long time. So the dataset we were talking about was stable and not going to change between migration planning and the actual migration. A dynamic approach was therefore not needed.

Preparing Metadata Migration

To start development I wanted to have a handy copy of the dataset. I ran the select query again, but this time fetching every row and exporting into a CSV file:

SELECT id, imageId FROM Image where imageType='MY_TYPE' and length(imageId) <= 6;

I peeked into the resulting file to make sure I got the right thing:

$ head dataset.csv 
id,imageId
844365,hytinj
344614,hyt459
460974,hyt8is
834613,hytlf4
832009,hytmps
334627,hytug5
408177,hyt4c4
692956,hyt8u1
874342,hytb7g

I also wanted to make sure I got all the rows. So another sanity check was to count the lines of the CSV file:

$ wc -l < dataset.csv 
  155468

That looked good! Now I wanted to have a new image name, ideally a UUID, for every image. An easy way to do that is to pipe the file through a custom program that appends one to every line. My favorite language is currently Golang, so guess which language I wrote the tool in?

package main

import (
  "bufio"
  "fmt"
  "log"
  "os"

  "github.com/google/uuid" // assumed UUID package providing uuid.New()
)

func main() {
  scanner := bufio.NewScanner(os.Stdin)
  for scanner.Scan() {
    fmt.Printf("%v,%v\n", scanner.Text(), uuid.New().String())
  }
  if err := scanner.Err(); err != nil {
    log.Fatal("scan: ", err)
  }
}

This program reads from standard input and appends a generated UUID to each input line. A line reading foo,bar on standard input becomes foo,bar,1234-567-890 on standard output. This allowed me to create a new CSV file based on the dataset.csv file.

tail -n+2 dataset.csv | go run main.go > dataset-new.csv

Hint: tail -n+2 skips the CSV header line.

Peeking into the output gave me this:

$ head -n 3 dataset-new.csv 
844365,hytinj,cd616cba-52dd-4b81-b358-ed5e5672ae4c
344614,hyt459,88d1debe-4e9e-4482-9c06-b656efadfd62
460974,hyt8is,981d9276-2e93-47b7-962a-4ad35edf995a

The file dataset-new.csv was now the source of truth for how the rows should look in the future. The only thing missing for the database part of this migration was a set of queries that we could apply. Sticking to my preference for small Golang tools, I modified the previously used program to look like this:

package main

import (
  "bufio"
  "fmt"
  "log"
  "os"
  "strings"
)

func main() {
  scanner := bufio.NewScanner(os.Stdin)
  for scanner.Scan() {
    csv := strings.Split(scanner.Text(), ",")
    fmt.Printf("UPDATE `Image` SET `imageId`='%v' WHERE `id`='%v';\n",
      csv[2], csv[0])
  }
  if err := scanner.Err(); err != nil {
    log.Fatal("scan: ", err)
  }
}

This would create SQL queries based on the data in the CSV. I saved the queries in a file for later use:

$ go run main.go < dataset-new.csv > migration.sql

And then I ran the usual sanity checks:

$ wc -l < migration.sql
  155467
$ head -n 3 migration.sql
UPDATE `Image` SET `imageId`='cd616cba-52dd-4b81-b358-ed5e5672ae4c' WHERE `id`='844365';
UPDATE `Image` SET `imageId`='88d1debe-4e9e-4482-9c06-b656efadfd62' WHERE `id`='344614';
UPDATE `Image` SET `imageId`='981d9276-2e93-47b7-962a-4ad35edf995a' WHERE `id`='460974';

That was looking good! The queries for updating the image metadata table in the relational database were done. But the actual storage objects still needed to be renamed for the references to be valid.

Preparing Storage Object Migration

Preparing the storage object migration turned out to be a bit more complicated. We not only store the image binary data on Cloud Storage, we also store variations of the file. Those variations have an object name that follows a particular pattern. So for an image named foo we store at least three objects in the bucket:

  • foo: The original
  • foo_b: A variation of the original
  • foo_m: Another type of variation

These variations are present for all objects that I potentially had to touch. From the documentation, I could also see that there might be yet another variation, foo_l. However, it was not clear whether those were still in the bucket or already deprecated. I had to find that out before I could continue.

I got myself the list of all items in the bucket using the gsutil command:

$ gsutil ls gs://my-bucket/dir1/dir2/ > objects.txt

That yielded a very long list of object paths:

$ head -n 3 objects.txt 
gs://my-bucket/dir1/dir2/<random string>
gs://my-bucket/dir1/dir2/<random string>_b
gs://my-bucket/dir1/dir2/<random string>_m

To skip the non-variations I used grep, matching on the underscore (which we use in variations only). I piped the result to sed to extract the variation type from each object path:

$ grep '_' < objects.txt | sed -E 's/^(.*)_(.+)$/\2/'
b
m
b
...

I got a long list of variations, way too many for a human to check by hand. Since I was only interested in the types of variations, not their number, I used the popular dream team sort and uniq to minimize the dataset:

$ grep '_' < objects.txt |  sed -E 's/^(.*)_(.+)$/\2/' | sort | uniq
b
m

This is for sure not a very efficient approach, but on a dataset as small as the one I was dealing with, the whole operation took only a couple of seconds. Luckily, the result showed that I only had to care about the b and m variations; those were the only ones in production at the time. Cool!

One thing I had to keep in mind: if I changed the image names in the relational database, I also had to change them at the same time in Cloud Storage. But there is no such thing as “at the same time” in computing, so I needed a migration strategy that ensured consistency at all times. The strategy was rather simple, though:

  • Copy all affected objects to their new names.
  • Run the database transaction.
  • Remove the old objects after a cool-down period (image names may be cached, we may want to roll back the transaction, you name it…).

I had the SQL queries already. The other two missing pieces were the bucket modifications: the copy script and the removal script. Since I wasn’t in a hurry, I chose to simply generate a shell script that calls gsutil over and over again. Again, this is not a very efficient solution. In SRE, we choose efficiency over simplicity very selectively. As a rule of thumb, you could say: if it fits into memory, consider manipulating it there instead of introducing additional complexity.

Generating the migration scripts was as easy as changing a couple of lines in my little Golang helper program.

package main

import (
  "bufio"
  "fmt"
  "log"
  "os"
  "strings"
)

func main() {
  scanner := bufio.NewScanner(os.Stdin)
  for scanner.Scan() {
    csv := strings.Split(scanner.Text(), ",")
    fmt.Printf("gsutil cp 'gs://my-bucket/dir1/dir2/%v'   "+
      "'gs://my-bucket/dir1/dir2/%v'\n",
      csv[1], csv[2])
    fmt.Printf("gsutil cp 'gs://my-bucket/dir1/dir2/%v_b' "+
      "'gs://my-bucket/dir1/dir2/%v_b'\n",
      csv[1], csv[2])
    fmt.Printf("gsutil cp 'gs://my-bucket/dir1/dir2/%v_m' "+
      "'gs://my-bucket/dir1/dir2/%v_m'\n",
      csv[1], csv[2])
  }
  if err := scanner.Err(); err != nil {
    log.Fatal("scan: ", err)
  }
}

I ran the program to generate a shell script.

$ go run main.go < dataset-new.csv > step1-copy.sh
$ head -n 3 step1-copy.sh
gsutil cp 'gs://my-bucket/dir1/dir2/hytinj'   'gs://my-bucket/dir1/dir2/cd616cba-52dd-4b81-b358-ed5e5672ae4c'
gsutil cp 'gs://my-bucket/dir1/dir2/hytinj_b' 'gs://my-bucket/dir1/dir2/cd616cba-52dd-4b81-b358-ed5e5672ae4c_b'
gsutil cp 'gs://my-bucket/dir1/dir2/hytinj_m' 'gs://my-bucket/dir1/dir2/cd616cba-52dd-4b81-b358-ed5e5672ae4c_m'

This script can be run from the shell of a maintenance host with access to the production data. I needed the same for the deletion step. At this point you can probably predict what the code will look like:

package main

import (
  "bufio"
  "fmt"
  "log"
  "os"
  "strings"
)

func main() {
  scanner := bufio.NewScanner(os.Stdin)
  for scanner.Scan() {
    csv := strings.Split(scanner.Text(), ",")
    fmt.Printf("gsutil rm 'gs://my-bucket/dir1/dir2/%v'\n", csv[1])
    fmt.Printf("gsutil rm 'gs://my-bucket/dir1/dir2/%v_b'\n", csv[1])
    fmt.Printf("gsutil rm 'gs://my-bucket/dir1/dir2/%v_m'\n", csv[1])
  }
  if err := scanner.Err(); err != nil {
    log.Fatal("scan: ", err)
  }
}

I’ll spare you the output, but it is a list of shell commands of the form gsutil rm <object>.

Due Diligence

Humans sometimes make mistakes. Humans who work on automation or script migrations sometimes create disasters. To avoid disasters (or at least the obvious ones), every piece of code that changes production has to go through a review process on my team. I submitted the files step1-copy.sh, migration.sql, and step2-remove.sh for review and can’t wait to see what mistakes my fellow engineers will find. They are the best at spotting those. 🧐 Only after scripts and transactions have been reviewed do we actually touch production.

I hope you enjoyed this little peek into how one of the many forms of touching production is prepared.

How To Write A Tiny Shell In C

I was wondering how complex some shells are. That got me thinking about what a very minimal, but usable, shell would look like. Could I write one in less than 100 lines of code? Let’s see!

A shell needs to execute commands. This can be done by overlaying the current program with a new program and executing it (a lot more happens here, actually, but that is for another time), using the exec() family of functions. That seemed like a nice starting point. In a file named myshell.c I wrote:

#include <unistd.h>

int main(void) {
    execvp("date", (char *[]){"date", NULL});
}

Then I compiled the code:

$ gcc -Wall -pedantic -static myshell.c -o mysh

And executed it:

$ ./mysh
Mon Jun 18 18:51:38 UTC 2018

Cool! I hardcoded date here. But actually, a shell should prompt for the command to execute. So I went on and made the shell ask for a command to run.

#include <unistd.h>
#include <stdio.h>
#include <string.h>

#define PRMTSIZ 255

int main(void) {
    char input[PRMTSIZ + 1] = { 0x0 };
    fgets(input, PRMTSIZ, stdin);
    input[strlen(input) - 1] = '\0'; // remove trailing \n

    execvp(input, (char *[]){input, NULL});
}

And ran it:

./mysh
date
Mon Jun 18 19:27:21 UTC 2018

OK, nice. But that fails if I want to run a command with parameters:

./mysh
ls /

Nothing happens. I need to split the input into an array of char pointers. Each pointer shall point to a string containing a parameter:

#include <unistd.h>
#include <stdio.h>
#include <string.h>

#define PRMTSIZ 255
#define MAXARGS 63

int main(void) {
    char input[PRMTSIZ + 1] = { 0x0 };
    char *ptr = input;
    char *args[MAXARGS + 1] = { NULL };

    // prompt
    fgets(input, PRMTSIZ, stdin);

    // convert input line to list of arguments
    for (int i = 0; i < MAXARGS && *ptr; ptr++) {
        if (*ptr == ' ') continue;
        if (*ptr == '\n') break;
        for (args[i++] = ptr; *ptr && *ptr != ' ' && *ptr != '\n'; ptr++);
        *ptr = '\0';
    }

    execvp(args[0], args);
}

And running it yields:

./mysh
ls /
bin  boot  dev	etc  home  lib	lib64  ✂️

It worked! I am getting closer. Wouldn’t it be great if the shell would not exit after one command, but ask for further commands every time the current command terminated? I think it is time for my old friend fork() to enter the scene!

#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/wait.h>

#define PRMTSIZ 255
#define MAXARGS 63

int main(void) {
    for (;;) {
        char input[PRMTSIZ + 1] = { 0x0 };
        char *ptr = input;
        char *args[MAXARGS + 1] = { NULL };

        // prompt
        fgets(input, PRMTSIZ, stdin);

        // convert input line to list of arguments
        for (int i = 0; i < MAXARGS && *ptr; ptr++) {
            if (*ptr == ' ') continue;
            if (*ptr == '\n') break;
            for (args[i++] = ptr; *ptr && *ptr != ' ' && *ptr != '\n'; ptr++);
            *ptr = '\0';
        }

        if (fork() == 0) exit(execvp(args[0], args));
        wait(NULL);
    }
}

I now fork a child every time I want to execute a command. The child exits, using the return value of execvp() as its exit code. Since execvp() only returns on failure, this helps the parent process detect that the program could not be loaded. The parent process waits for the child to finish. Everything happens in an infinite loop for(;;) to allow more than just one command.

./mysh
date
Mon Jun 18 19:44:27 UTC 2018
ls /
bin  boot  dev	etc  home  lib	lib64  ✂️

Despite being very limited in functionality, I think this now counts as a shell.

I couldn’t stop myself from adding a few more things:

  • Disable signal SIGINT in the parent: This means I can interrupt (ctrl-c) a child process without killing my shell. Very useful 😅
  • Add a visual prompt: $ for users and # for superusers.
  • Print the exit code of the child, e.g. <1> or <0>
  • Check for empty input: Because segfaulting is not nice. 🙈

Here is the final code:

#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/wait.h>

#define PRMTSIZ 255
#define MAXARGS 63
#define EXITCMD "exit"

int main(void) {
    for (;;) {
        char input[PRMTSIZ + 1] = { 0x0 };
        char *ptr = input;
        char *args[MAXARGS + 1] = { NULL };
        int wstatus;

        // prompt
        printf("%s ", getuid() == 0 ? "#" : "$");
        fgets(input, PRMTSIZ, stdin);

        // ignore empty input
        if (*ptr == '\n') continue;

        // convert input line to list of arguments
        for (int i = 0; i < MAXARGS && *ptr; ptr++) {
            if (*ptr == ' ') continue;
            if (*ptr == '\n') break;
            for (args[i++] = ptr; *ptr && *ptr != ' ' && *ptr != '\n'; ptr++);
            *ptr = '\0';
        }

        // built-in: exit
        if (strcmp(EXITCMD, args[0]) == 0) return 0;

        // fork child and execute program
        signal(SIGINT, SIG_DFL);
        if (fork() == 0) exit(execvp(args[0], args));
        signal(SIGINT, SIG_IGN);

        // wait for program to finish and print exit status
        wait(&wstatus);
        if (WIFEXITED(wstatus)) printf("<%d>", WEXITSTATUS(wstatus));
    }
}

Running as root in a container:

./mysh
# ls /
bin  boot  dev	etc  home  lib	lib64  ✂️
<0># date
Mon Jun 18 19:50:09 UTC 2018
<0># nonexistent-command
<255># false
<1># true
<0># exit

In the end, I was able to write a tiny shell with limited capabilities in less than 50 lines of C code. That is less than half of what I aimed for.