Flash Cards for Site Reliability Engineers

When Site Reliability Engineers architect systems, we often use a method called Non-Abstract Large System Design (NALSD).

NALSD describes an iterative process for designing, assessing, and evaluating distributed systems. During the process it comes in handy to know some numbers about typical computing and data transferring tasks, such as locking a mutex or reading data from an SSD.

Jeff Dean even went so far as to declare that everyone should know at least the most important numbers:

I, however, tend to forget these and all the other important numbers regularly. To train my forgetful brain I made myself some good old flash cards. I like to carry these with me in my backpack and use them whenever I have to wait in line or public transport is late.

If you are interested in your own set of flash cards: Wait no longer! The Flash Cards for Site Reliability Engineers are just a slide deck because I found it easier to maintain.

Follow these easy steps to turn the slide deck into flash cards:

  • Print the document, preferably on thick paper
  • Fold each page once vertically, glue the back sides together
  • Cut out the cards along the black lines

Tada!

IP over Web-avian Carriers

The Internet is notoriously hungry for bandwidth and much effort goes into increasing the throughput of Internet Protocol carrying media. One particularly successful approach in throughput optimization has been IP over Avian Carriers which, thanks to pleasant developments in storage media density, managed to stay ahead of the game by a factor of three. However, speeding up existing protocols is not enough to meet increased bandwidth demands. New protocols and unconventional link media have to be considered to keep the lights on and the bytes flowing.

Today I would like to propose an adaptation of the IP over Avian Carriers protocol to make use of so-called web-avian carriers. Similar to homing pigeons, which have been the preferred medium used by traditional IP over Avian Carriers implementations, web-avian carriers have wings and a beak. The most common web-avian carrier is the blue-winged jack-bird. It is a shy, yet extroverted creature. It is easy to spot, yet only a few people have had the pleasure of seeing it from any perspective other than side-face.

A rare shot of a web-avian carrier delivering a message to a human:

This article introduces the Internet Protocol over Web-avian Carriers (IPoWAC). Furthermore, it provides a proof of concept implementation written in Go.

IPoWAC Network Stack

IPoWAC is a link-layer protocol utilizing a publicly available microblogging platform as transport medium.

IPoWAC is specified for IPv4 datagrams but it is imaginable to extend the functionality to include other popular network-layer protocols, such as IPv6.

The Medium

The web-avian carrier network is a planet-scale data transmission system piggybacking on other technologies, such as mobile data networks, Digital Subscriber Lines, and possibly traditional IP over Avian Carriers installations.

The medium uses globally unique node identifiers consisting of at least one and up to 15 characters, the so-called handles. A handle is identified in the protocol by its preceding @ character. Handles are given out on a first-come, first-served basis by a central handle handling authority. Some nodes got luckier than others in getting their favorite handle.

Nodes modulate data onto the medium in data units of 280 characters called messages or sometimes tweets. In previous iterations, the web-avian carrier network facilitated an MTU of 140 characters. The highly debated limitation was removed in 2017. While this bold move doubled the MTU, legacy clients using the Short Message Service-based implementation of web-avian carriers are now forced to fragment submissions. The outrage was loud and vocal but without real-life impact, a pattern we often observe in non-IP-carrying web-avian communication streams.

The web-avian carrier network features common link-layer aspects; however, it has subtle differences:

  • Relaying of data units: While it is technically not necessary, data units are often replicated or annotated by other nodes on the medium by re-tweeting. Since the decision to relay a data unit depends on content and context, the number of retweets serves as an indicator of message importance.
  • Tweets may be labeled with arbitrary strings as long as they start with a single # character. The labels are referred to as hashtags and allow message grouping around topics, political views, or favorite pet species.
  • The medium offers a certain degree of anonymity but only limited privacy. Funnily, nodes often strive to give up anonymity and are eager to prove their identity. The network rewards this behavior by providing blue checkmarks to nodes of known identity.
  • The web-avian carrier network is a broadcast medium by design. A message may be directed to a particular node by embedding the node’s handle in the message. The message will still be available for other nodes sharing the medium. Due to the variable length of handles and the lack of a fixed header, embedding handles counts against the MTU. This unfairly rewards nodes which were lucky enough to get short handles from the central handle handling authority.
  • Unlike other media, the web-avian carrier network has poor support for anycast. Replies to anycast messages either fall short on being helpful or are simply absent. Anycast messages are indicated by the use of the reserved #followerpower label. The web-avian carrier network has yet to prove it is a benefit to humanity. See also: Automatic Error Correction.
  • Nodes may end up clustering in groups with little or no interference from other clusters or the overall network. This is often referred to as an echo chamber.
  • Automatic Error Correction: Some nodes of the web-avian carrier network engage heavily in error correcting other nodes’ messages. Mostly unsolicited. Some execute this task to the extent where they emit erroneous messages themselves, thus decreasing the overall quality of service in the network. Error correction messages are often directly addressed to the source node by mentioning the handle to increase the chance of causing a bad feeling. A well-formatted error correction message must include at least one of the following phrases: “well, actually…”, “yes, but…”, or “how about f*&k you?”.
  • Godwin’s law applies. Usually rapidly.
  • Compared to the web-avian carrier network medium, SLIP does look like a good idea, ATM isn’t that scary anymore, and good old Ethernet is heaven on earth.

Destination Labels (Addressing)

An IPoWAC edge router attaches one or more destination labels (hashtags) to each message it modulates onto the medium. The destination label is derived from the site’s globally unique IP space allocation. The destination label begins with the # (hash) character to indicate the start of a label. It is followed by the four octets of the smallest (most specific) IP space allocation that contains the destination IP address. Octets are written in decimal notation and separated by the _ (underscore) character. Finally, the prefix length of the allocation is appended, once again using the _ (underscore) character for separation.

Label Example

The IP address 193.160.39.100 is part of the RIPE-managed IP space allocation 193.160.39.0/24. The IPoWAC message encapsulating an IP packet addressed to 193.160.39.100 is therefore labeled #193_160_39_0_24.

Wire Format

The IPoWAC message format was designed with simplicity in mind. Each message starts with the base64-encoded IP datagram. Separated by spaces, one or more (when multicasting) destination labels follow.
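
As a rough sketch of the wire format, the following Go snippet assembles a message from a datagram and its labels. The helper name encodeMessage and the constant maxMessageLen are assumptions made for this example, and the four-byte "datagram" is just a truncated header used as sample input.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// maxMessageLen is the MTU of the medium in characters (an assumed constant
// name; the value comes from the description of the medium).
const maxMessageLen = 280

// encodeMessage builds an IPoWAC message: the base64-encoded datagram
// followed by one or more space-separated destination labels.
// The function is a hypothetical helper, not taken from WACky.
func encodeMessage(datagram []byte, labels ...string) (string, error) {
	if len(labels) == 0 {
		return "", fmt.Errorf("at least one destination label is required")
	}
	msg := base64.StdEncoding.EncodeToString(datagram) + " " + strings.Join(labels, " ")
	// Base64 and labels are plain ASCII, so bytes equal characters here.
	if len(msg) > maxMessageLen {
		return "", fmt.Errorf("message has %d characters, limit is %d", len(msg), maxMessageLen)
	}
	return msg, nil
}

func main() {
	msg, err := encodeMessage([]byte{0x45, 0x00, 0x00, 0x14}, "#2_4_4_0_24")
	if err != nil {
		panic(err)
	}
	fmt.Println(msg) // prints "RQAAFA== #2_4_4_0_24"
}
```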

IPoWAC edge router operators configure their instances so that they track all labels for all IP space allocations that fall under their administrative domain. For filtering purposes, operators may decide to additionally limit message tracking by only accepting messages from nodes with high trust values. A high-trust node is identified by a blue checkmark sign next to the node name.
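
On the receiving side, the tracking and filtering described above might look roughly like this. Everything here is illustrative: decodeMessage, wanted, and the two boolean trust parameters are invented names, and how a router learns whether a sender carries a blue checkmark is left out entirely.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// decodeMessage splits an IPoWAC message into the datagram and its
// destination labels. The first whitespace-separated field is the
// base64-encoded datagram, the remaining fields are labels.
func decodeMessage(msg string) ([]byte, []string, error) {
	fields := strings.Fields(msg)
	if len(fields) < 2 {
		return nil, nil, fmt.Errorf("need a payload and at least one label")
	}
	datagram, err := base64.StdEncoding.DecodeString(fields[0])
	if err != nil {
		return nil, nil, fmt.Errorf("decode payload: %w", err)
	}
	return datagram, fields[1:], nil
}

// wanted reports whether a message carries at least one tracked label and
// whether its sender passes the optional trust filter.
// senderVerified and requireVerified are hypothetical parameters.
func wanted(labels []string, tracked map[string]bool, senderVerified, requireVerified bool) bool {
	if requireVerified && !senderVerified {
		return false
	}
	for _, l := range labels {
		if tracked[l] {
			return true
		}
	}
	return false
}

func main() {
	tracked := map[string]bool{"#2_4_4_0_24": true}
	datagram, labels, err := decodeMessage("RQAAFA== #2_4_4_0_24")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(datagram), "bytes, accept without filter:", wanted(labels, tracked, false, false))
	fmt.Println(len(datagram), "bytes, accept with trust filter:", wanted(labels, tracked, false, true))
}
```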

Additional Medium Features

The web-avian carrier network offers a feature-rich medium and additional value via meta information:

  • Messages may be grouped (threaded)
  • Messages automatically receive a globally co-ordinated timestamp
  • Messages can be liked to express satisfaction
  • Statistics of a message’s reach are available upon request
  • Messages can be relayed
  • Nodes can express themselves textually and visually

All mentioned features are available to the operator via web interface.

Introducing WACky: A Proof Of Concept Implementation

WACky is an IPoWAC proof-of-concept implementation written in Go. WACky acts as an edge router accepting Ethernet frames. It extracts the IP payload from ethertype 0x0800 frames and converts it to the IPoWAC wire format.

WACky consists of two parts that run in parallel:

  • From Ethernet To Web-avian Carrier
  • From Web-avian Carrier To Ethernet

Let’s discuss each of these separately.

From Ethernet To Web-avian Carrier

On startup, WACky tries to get hold of the Ethernet interface it is supposed to listen on for IP datagrams.

ifi, err := net.InterfaceByName(*ifname)
if err != nil {
    fmt.Fprintf(os.Stderr, "interface %q: %v\n", *ifname, err)
    os.Exit(1)
}

Once the interface is under WACky’s control it listens for Ethernet frames containing an IP datagram using a raw PacketConn.

cfg, err := raw.ListenPacket(ifi, 0x0800, &raw.Config{
    LinuxSockDGRAM: false,
})
if err != nil {
    fmt.Fprintf(os.Stderr, "listen: %v\n", err)
    os.Exit(1)
}

Then variables are allocated for use in a worker loop later. The buffer for incoming Ethernet frames is computed by guesstimating how much space is left in the IPoWAC message once base64 encoding and destination label length have been accounted for.

var frame ethernet.Frame
buf := make([]byte, ((280-len(*destinationLabel)-1)/4)*3) // ¯\_(ツ)_/¯

The worker loop reads from the Ethernet device and encodes the message (if it is half-way decent in format).

for {
    // read from device
    n, addr, err := cfg.ReadFrom(buf)
    if err != nil {
        fmt.Fprintf(os.Stderr, "read: %v\n", err)
        time.Sleep(100 * time.Millisecond)
        continue
    }

    // parse ethernet frame to extract payload
    if err := (&frame).UnmarshalBinary(buf[:n]); err != nil {
        fmt.Fprintf(os.Stderr, "unmarshal: %v\n", err)
        time.Sleep(100 * time.Millisecond)
        continue
    }

    // base64 encode the packet and discard malformed frames (using a
    // stupid but surprisingly effective method: decode base64)
    encoded := base64.StdEncoding.EncodeToString(frame.Payload)
    _, err = base64.StdEncoding.DecodeString(encoded)
    if err != nil || strings.Contains(encoded, "/") {
        fmt.Fprint(os.Stderr, "received malformed frame\n")
        continue
    }
    tweet := encoded + " " + *destinationLabel
    fmt.Printf("from %v: %v\n", addr.String(), tweet)

When the message is ready to be modulated onto the wire, it is handed over to the wak0 virtual interface. What is that, the interested reader may wonder? Just a function call using Aditya Mukerjee’s awesome Twitter API library anaconda. 🤪

    // send to web-avian carrier network
    _, err = api.PostTweet(tweet, url.Values{})
    if err != nil {
        fmt.Fprintf(os.Stderr, "post tweet: %v\n", err)
        time.Sleep(10 * time.Second)
        continue
    }
    // twitter loves us to rate limit API usage
    time.Sleep(750 * time.Millisecond)
}

The Ethernet-reading worker loop runs inside a goroutine so that it does not block the receiving side of the IPoWAC implementation.
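
The general pattern of running both directions side by side can be sketched as follows. This is a generic illustration rather than WACky's actual structure: runParallel is an invented helper, and the two closures merely stand in for the real worker loops.

```go
package main

import (
	"fmt"
	"sync"
)

// runParallel starts every worker in its own goroutine and blocks until all
// of them have returned. The helper name is hypothetical.
func runParallel(workers ...func()) {
	var wg sync.WaitGroup
	for _, w := range workers {
		wg.Add(1)
		go func(work func()) {
			defer wg.Done()
			work()
		}(w)
	}
	wg.Wait()
}

func main() {
	// Stand-ins for the two loops; the real ones would run until the
	// process is stopped.
	runParallel(
		func() { fmt.Println("ethernet to web-avian carrier (stand-in)") },
		func() { fmt.Println("web-avian carrier to ethernet (stand-in)") },
	)
}
```

Starting only one loop with the go keyword and running the other one on the main goroutine would work just as well; the wait group merely keeps the process alive until both are done.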

By the way: A big thank you to Matt Layher for providing the ethernet and raw golang packages for easy link-layer programming!

From Web-avian Carrier To Ethernet

The purpose of the IPoWAC-to-Ethernet part of the WACky router is to track a label on the global medium and emit received messages via a local interface.

For that, WACky opens a raw IP socket by using the syscall package.

fd, err := syscall.Socket(syscall.AF_INET, syscall.SOCK_RAW, syscall.IPPROTO_RAW)
if err != nil {
    fmt.Fprintf(os.Stderr, "socket: %v\n", err)
    os.Exit(1)
}

Then a stream of messages is created by subscribing to the tracking label via Twitter API.

stream := api.PublicStreamFilter(url.Values{"track": []string{*trackLabel}})
defer stream.Stop()

While the stream lasts, messages are received and decoded. If a message does not decode well, it is discarded.

for v := range stream.C {
    t, ok := v.(anaconda.Tweet)
    if !ok {
        fmt.Fprint(os.Stderr, "malformed tweet\n")
        continue
    }

    encoded := strings.Split(t.FullText, " ")[0]
    data, err := base64.StdEncoding.DecodeString(encoded)
    if err != nil || len(data) < 60 { // 60 = reasonable packet size ;)
        fmt.Fprint(os.Stderr, "received malformed tweet\n")
        continue
    }
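    hdr := ipv4.Header{}
    // skip messages whose payload does not parse as an IPv4 header
    if err := hdr.Parse(data); err != nil {
        fmt.Fprintf(os.Stderr, "parse header: %v\n", err)
        continue
    }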

From the decoded message the destination IP address is extracted and used in the sockaddr that the sendto() syscall requires. Also, hello sockaddr, my old friend…

    // create socket address for sendto() syscall
    ip := hdr.Dst.To4()
    addr := syscall.SockaddrInet4{
        Port: 0,
        Addr: [4]byte{ip[0], ip[1], ip[2], ip[3]},
    }

After that there is not much left to do but to actually send the datagram.

    // make it happen, send the packet!
    fmt.Printf("to %v: %v\n", ip.String(), encoded)
    err = syscall.Sendto(fd, data, 0, &addr)
    if err != nil {
        fmt.Fprintf(os.Stderr, "sendto: %v\n", err)
        continue
    }
}

And that’s all there is to it. Simple, but effective. The full source code is available on GitHub in the WACky repository.

Limitations

In good tradition of PoC implementations, WACky lacks features that one would expect from a production-grade IPoWAC edge router:

  • Proper label (routing) table: In WACky, destination labels are not dynamically looked up in an IPoWAC label table but set at program start.
  • In-transit IP header and datagram modifications: WACky neither fragments datagrams nor decrements the TTL when an IP message passes a hop. While it is debatable if a WACky installation is the right place for in-transit modifications anyway, the reason for the feature absence is simply author laziness.

World’s First IPoWAC Transmission

Join me in the world’s first IPoWAC data transmission! For that I prepared a setup consisting of two clients and two WACky routers. Each client is connected to one edge router. Both routers use the handle @ipowac on their virtual wky0 interfaces but track different labels each.

First I started the WACky router process on each of the router machines.

root@wacky-1-router:~# go run main.go \
    -track-label="#1_3_3_0_24" \
    -destination-label="#2_4_4_0_24"

And…

root@wacky-2-router:~# go run main.go \
    -track-label="#2_4_4_0_24" \
    -destination-label="#1_3_3_0_24"

Then I ran ping on wacky-1 with a generous wait time of 120 seconds. IPoWAC turns out to not be the fastest protocol. Surprise!

root@wacky-1:~# ping -c 1 -W 120 2.4.4.8

The WACky program on wacky-1-router immediately caught the frame and sent the corresponding IPoWAC message with the destination label set to #2_4_4_0_24.

from 08:00:27:a5:2b:1e: RQAAVAxZQABAASQ7AQMDBwIEBAgIAEoCAdwAAUJT2lsAAAAAx54JAAAAAAAQERITFBUWFxgZGhscHR4fICEiIyQlJicoKSorLC0uLzAxMjM0NTY3 #2_4_4_0_24

On wacky-2-router the message was received after a short delay.

to 2.4.4.8: RQAAVAxZQABAASQ7AQMDBwIEBAgIAEoCAdwAAUJT2lsAAAAAx54JAAAAAAAQERITFBUWFxgZGhscHR4fICEiIyQlJicoKSorLC0uLzAxMjM0NTY3

The echo request was answered by wacky-2 with an echo reply, which was also seen by both routers and finally delivered to the ping process on wacky-1.

root@wacky-1:~# ping -c 1 -W 120 2.4.4.8
PING 2.4.4.8 (2.4.4.8): 56(84) bytes of data.
64 bytes from 2.4.4.8: icmp_seq=1 ttl=64 time=11933 ms

--- 2.4.4.8 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 11933.267/11933.267/11933.267/0.000 ms

Woohoo! It works!

This is the whole transmission as archived by the web-avian carrier network:

The world’s first successful IPoWAC transmission is also available in video!

Outlook

The PoC implementation of the WACky router shows that transmitting IP over Web-avian Carriers is generally possible. Further improvements could dramatically increase the throughput:

  • Compression could be applied to the data before it hits the web-avian carrier network.
  • A more efficient encoding could be used instead of base64.
  • Additional payload data may be encapsulated in image tweets. Rumor has it that up to 15 MB of image data can be attached in certain situations. An MTU that puts even Ethernet Jumbo Frames to shame.

Conclusion

IPoWAC is a powerful link-layer protocol running on top of the existing Internet and completely dependent on the mercy of a single company. Stacking layers of complexity on top of each other has a long tradition in software and network engineering alike. By combining the worst of both professions, the urge to make something work no matter what and the total disrespect for network layers, a beautiful piece of technology abuse is created. It should never have happened, yet IPoWAC had to be made. Someone out there, at some point in time, will start to use the protocol to access Twitter over Twitter… Because, why not?

The Machine That Hires Me

Have you ever read Ibrahim Diallo’s famous, scary, and funny blog post The Machine Fired Me? Ibrahim, working as a software developer, accidentally got fired. Thanks to a fully automated business process his key card, used for physically accessing the facilities, stopped working. Various accounts for all kinds of work-related systems got disabled and he did not receive pay for three weeks. The automation was so powerful, he had to be re-hired to get back into the system. There was no stopping the machinery.

I had (or am I still having?) a similar experience lately. Similar in that I am also in some kind of machinery and the process seems unstoppable. Different in that I am not being fired but I am being hired by a machine.

Earlier this year I was contacted by a recruiter from Facebook on LinkedIn. We started chatting and eventually, I agreed to apply for a Production Engineering role. I had a couple of phone interviews. Then I was invited to London for a day of on-site interviews. I was extended an offer, which I would eventually turn down. All that was a very pleasant experience and I admire Facebook for their professional recruiting process. I genuinely had a lot of fun solving the challenges and interacting with recruiting and engineering. It seems, however, that somewhere in this process the machines took over. While the recruiter and I agreed to end our journey at some point and to keep in touch, the machinery had different plans.

After turning down the offer I still had access to the digital contract signing interface for some days. Furthermore, the onboarding portal suggested I decide on my preferred hardware, including a laptop computer and phone. I received a parcel containing a printed guidebook for new Londoners and a Facebook-branded blanket. That blanket! It is so fluffy!

Then another message arrived reading “Congratulations on your new role with Facebook!” and informing me about my upcoming business travels. For the latter, I was asked to apply for a U.S. visa or the ESTA visa waiver program. Most of this happened within a couple of days. Out of curiosity, I peeked into the emails and websites I got sent, but I did not interact further with them. I informed my recruiter so that they would know, just in case any harm was done. But one does not simply ignore the machinery! A couple of days later the automation poked me again: “We are very excited for you to join the team. It looks like we’re still missing some of your information. Please navigate to the People Portal to complete your outstanding tasks right away.” Let me translate this: “Human! It’s me, the machinery. You are supposed to obey. Do so now.” Even before I could let my recruiter know about the latest developments, they proactively sent me a message apologizing for the repeated interaction. While the machinery at Facebook seems unstoppable, the humans are great and caring there!

At this point, I thought this was over. Essentially some mail triggers went off when they shouldn’t, not a big deal, right? I was wrong. A month later I received a mail from Altair Global, a relocation services provider. There was no reference to Facebook in the mail. So I mistakenly related it to a different opportunity and clicked the link in the mail. A few seconds later I had an account with Altair Global asking me to complete a bunch of tasks for my upcoming move to London. Wait, what? I am moving to London? Oh! This must be the machinery that won’t stop hiring me. And yes, looking at the dashboard page of my unwanted relocation I was able to spot the Facebook logo. 🧐

It’s the machinery again. I contacted my assigned Altair relocation consultant and asked them to maybe check with their customer Facebook if this relocation is still something they want to pay for. Time is running out on some of the tasks. I am afraid the machinery will notice and poke me again for being a bad human. Forgive me, oh great automation overlord, for I am just flesh and blood! 🤖😰

Unlike Ibrahim, who got an unsolicited lesson in job security, no harm was done in my case. Even better, I received gifts and got interesting insights into business process automation.

To be continued…?