How to score The Daily Show tickets using Twilio and Golang

I’m a big fan of Comedy Central’s The Daily Show, previously hosted by Jon Stewart, who handed over to Trevor Noah in 2015. The show is recorded in New York City in front of a live studio audience. Tickets to the show are free but limited, and therefore hard to get. There are two types of tickets:

  • GENERAL GUARANTEED tickets, which guarantee a seat in the audience if you arrive in time, and
  • GENERAL - ENTRY NOT GUARANTEED tickets, which might get you in if you wait in line next to the studio. People holding this ticket type may be invited to fill remaining seats.

The ticket website has a little calendar in the top right corner. Dates on which the show will be airing become available in chunks from time to time. Over the short period I monitored this, I was not able to find a recurring pattern from which I could predict future ticket rounds. If I wanted to get lucky with a ticket for a specific date, I would have to check the ticket website regularly. I believe humans should delegate repetitive tasks like this one to a computer.

A small program would do the job of

  • checking the website regularly for ticket availability at specific dates and
  • notifying me once tickets are available.

Checking the Dates

To get an idea of how the website was structured and how I could detect changes in ticket availability, I took a look at the source code. 👀 The site is built mostly with plain HTML.

The calendar, however, is rendered via JavaScript.

var dates_avail = {
  "2018-02-20":"style1",
  "2018-02-21":"style1",
  "2018-02-22":"style1",
  ✂️
  "2018-03-28":"style1",
  "2018-03-29":"style1"
};

Luckily, the available dates are stored in an object right in the source. To find a particular date, I simply needed to search for it in the right notation, for example "2018-02-20":"style1". That string matched nothing but the ticket calendar, and it appeared if, and only if, tickets were available on that date. That was so much easier than I had thought!

So the checking part of my program was down to simply fetching the website’s source and then running a substring search over the received data.

My weapon of choice, as usual: Golang. I used the net/http package to fetch the website’s source:

ticketWebsiteURL := "https://www.showclix.com/event/TheDailyShowwithTrevorNoah"
response, err := http.Get(ticketWebsiteURL)
if err != nil {
  ✂️
}
contents, err := ioutil.ReadAll(response.Body)
if err != nil {
  ✂️
}

For the substring search, I started by defining the dates on which I would be in New York and have enough spare time to make it to the studio. After all, there is no use blocking a seat that others would be happy to have.

triggerDates := []string{"2018-02-20", "2018-02-21"}

Ranging over the triggerDates, I built the search pattern for each date individually and then used the Contains() function from the strings package.

for _, td := range triggerDates {
  search := "\"" + td + "\":\"style1\""
  if strings.Contains(string(contents), search) {
    // Found it! Let's send a notification.
    ✂️
  }
}
response.Body.Close()

That was the easier part.

Notification via SMS

I spend most of my life in the Central European Time (CET) timezone, but the show is recorded in the Eastern Standard Time (EST) timezone. I assumed the tickets would therefore become available sometime during EST office hours, during which I might be asleep.

I needed a notification method that would reliably wake me up without interfering with the quiet hours settings of my phone. I could have used the PagerDuty API, but that felt like overkill: my alert escalation configuration at PagerDuty truly wakes me up. And my girlfriend, much to her excitement. #not

A communication channel that has been almost forgotten is the good old Short Message Service (SMS). I get so few SMS messages that receiving one hardly disturbs my usual flow. With SMS to the rescue, I configured my phone to notify me about SMS even during quiet hours. Now I only needed to make my program send SMS. That may sound tricky, but in the age of the cloud, this is just a web service away. I headed over to Twilio and registered a mobile phone number to use as the message source.

Here is how I crafted the message in Go:

msgData := url.Values{}
msgData.Set("To", "+00-MY-PHONE-NUMBER")
msgData.Set("From", "+00-MY-TWILIO-NUMBER")
msgData.Set("Body",
  "Tickets available now ("+info+")! Visit "+ticketWebsiteURL)

Sending the text was then as easy as creating a REST call:

req, _ := http.NewRequest("POST", apiURL, &msgDataReader)
req.SetBasicAuth(accountSid, authToken)
req.Header.Add("Accept", "application/json")
req.Header.Add("Content-Type", "application/x-www-form-urlencoded")

It takes a few more lines to make a successful call to the Twilio API, of course. Scroll down for the full source.

Wait for it… Win!

With fetching and notifications set up, I just had to run the code. For this I copied it over to my jump host, a compute instance that is running 24/7 anyway and could use some additional load. 😉
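Getting the program onto the jump host and starting it can look roughly like this (the host alias is purely illustrative; running it inside tmux or screen keeps it alive after the SSH session ends):

$ scp main.go jumphost:
$ ssh jumphost
$ go run main.go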

04:35:33 waiting...
04:35:33 fetching...
04:50:34 waiting...
05:05:34 fetching...
05:05:35 sms sent!
exit status 1

One day, right after my early workout, the long-awaited text arrived. I got lucky and scored two tickets for one of the dates I wanted. Hooray!

Greetings from The Daily Show with Trevor Noah everyone!

Source Code

Here’s the full source code FYI. Enjoy the show! 🎥

package main

import (
  "encoding/json"
  "io/ioutil"
  "log"
  "net/http"
  "net/url"
  "strings"
  "time"
)

var (
  triggerDates     = []string{"2018-02-20", "2018-02-21"}
  ticketWebsiteURL = "https://www.showclix.com/event/TheDailyShowwithTrevorNoah"
)

func sendSMS(info string) {
  accountSid := "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
  authToken := "yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy"
  apiURL := "https://api.twilio.com/2010-04-01/Accounts/" +
    accountSid + "/Messages.json"

  msgData := url.Values{}
  msgData.Set("To", "+00-YOUR-NUMBER")
  msgData.Set("From", "+00-YOUR-TWILIO-NUMBER")
  msgData.Set("Body",
    "Tickets available now ("+info+")! Visit "+ticketWebsiteURL)
  msgDataReader := *strings.NewReader(msgData.Encode())

  client := &http.Client{}
  req, _ := http.NewRequest("POST", apiURL, &msgDataReader)
  req.SetBasicAuth(accountSid, authToken)
  req.Header.Add("Accept", "application/json")
  req.Header.Add("Content-Type", "application/x-www-form-urlencoded")

  resp, err := client.Do(req)
  if err != nil {
    log.Printf("twilio: send request: %v", err)
    return
  }
  defer resp.Body.Close()
  // Let's be very generous with the status code. We want those tickets!
  if resp.StatusCode >= 200 && resp.StatusCode < 300 {
    var data map[string]interface{}
    decoder := json.NewDecoder(resp.Body)
    if err := decoder.Decode(&data); err != nil {
      log.Printf("twilio: decode json: %v", err)
    }
  } else {
    log.Printf("twilio: status code: %v", resp.Status)
  }
}

func main() {
  for {
    // Behave! Wait 15 minutes to not overload the site or spam their logs.
    log.Printf("waiting...\n")
    time.Sleep(15 * time.Minute)

    // Fetch the website.
    log.Printf("fetching...\n")
    response, err := http.Get(ticketWebsiteURL)
    if err != nil {
      log.Printf("get: %v", err)
      continue
    }
    contents, err := ioutil.ReadAll(response.Body)
    if err != nil {
      log.Printf("read body: %v", err)
      continue
    }
    // Look for the trigger dates in the source code.
    for _, td := range triggerDates {
      search := "\"" + td + "\":\"style1\""
      if strings.Contains(string(contents), search) {
        // Found it! Let's send a notification.
        sendSMS(td)
        log.Fatalf("sms sent!")
      }
    }
    response.Body.Close()
  }
}

My Little Helper: Slack Bot

As a Site Reliability Engineer (SRE) I spend a significant amount of my time on the Linux console. Furthermore, I spend some time writing software and tooling. But I also spend a lot of time on Slack (substitute your organization’s preferred chat platform) communicating with humans.1

Bridging these domains often requires copying and pasting information, sometimes including reformatting. At one point, I was so annoyed by moving console output from a terminal window to Slack that I decided to find a better way.

My idea was to have a single, statically linked binary that I could scp to a system and that would run without further setup. The job of that binary helper would be to post to Slack on my behalf.
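Building and shipping such a binary is quick; here is a sketch, with a placeholder host name:

$ CGO_ENABLED=0 go build -o hedybot .
$ scp hedybot somehost:/usr/local/bin/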

To honor the great inventor (and actress) Hedy Lamarr, my little helper was named after her.

Great, we have a problem, a solution, and the name hedybot. But what are actual use cases in a world striving for full automation? As it turns out, there are still a lot of manual tasks left, including:

  • Tasks that require human oversight to avoid disaster
  • One-time but often long-running tasks

Manual Deployment Notifications

One example of a task that requires human oversight at my workplace is the deployment of Domain Name System (DNS) changes. Since a mistake here can easily cost thousands of dollars and an unmeasurable amount of customer trust, we tend to have an experienced engineer deploy the changes. For additional assurance, we always post the deployed changes to Slack for everyone to read. People double-check and sometimes ask questions about the changes. That is a wonderful use case for hedybot! Here it is in action, using dns-tools:

$ rrpush --quiet --dry-run=false --delay=0 --no-color 2>&1 \
  | hedybot --channel-id=FOO2342 --title="Deployment on Production DNS"

In Slack it looks like this.

[Screenshot: the deployment report as it appears in Slack]

By the way, the color follows some loose internal convention and is hardcoded. A potential improvement would be to make the color configurable via a command-line flag, as sketched below.
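A minimal sketch of what that could look like, reusing the flag handling hedybot already has (the flag name and default value are my own choice, not part of the current tool):

color := flag.String("color", "#FFA500",
  "Hex color code for the Slack attachment")

// ... and in the PostMessageParameters, use the flag instead of the
// hardcoded value:
Attachments: []slack.Attachment{
  {
    Color: *color,
    Fields: []slack.AttachmentField{
      {Title: *title, Value: report},
    },
  },
},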

Long-running Jobs

Another great use case for hedybot is a long-running job. Let’s assume there is a server that we need to wipe to comply with regulations. One could easily lose track of such a task once it is started. Daily business and occasional firefighting push less urgent matters aside, and soon they are forgotten. This is where a little helper comes in handy, posting a quick message once the job is done:

$ dd if=/dev/urandom of=/dev/sdx bs=4096; \
  echo "disk erase finished" | hedybot --title="Example Server"

The resulting message is clear and simple:

[Screenshot: the "disk erase finished" message in Slack]

Thanks to the timely reminder, we can decommission the server right away and save some money here.

Hedybot Source Code

Here is the Golang code that I used. Grab it to craft your own little helper.

package main

import (
  "flag"
  "io/ioutil"
  "log"
  "os"

  "github.com/nlopes/slack"
)

const (
  // fetch API key from your slack workspace
  apiKey = "xxxx-xxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxx"
)

func main() {
  channelID := flag.String("channel-id", "C85PT1ULR",
    "ID of the channel to post too")
  title := flag.String("title", "Message",
    "Title for the message to post")
  flag.Parse()

  bytes, err := ioutil.ReadAll(os.Stdin)
  if err != nil {
    log.Fatalf("read stdin: %v", err)
  }
  if len(bytes) < 1 {
    log.Fatalf("stdin is empty")
  }
  report := string(bytes)

  params := slack.PostMessageParameters{
    AsUser: true,
    Attachments: []slack.Attachment{
      {
        Color: "#FFA500",
        Fields: []slack.AttachmentField{
          {
            Title: *title,
            Value: report,
          },
        },
      },
    },
  }
  api := slack.New(apiKey)
  _, _, err = api.PostMessage(*channelID, "", params)
  if err != nil {
    log.Fatalf("post report: %v", err)
  }
}
  1. The tricky part of my job is to figure out which activity is worth automating, which activity requires time boxing, and when going deep into the details is advised.

Reducing Stackdriver Logging Resource Usage

Yesterday I received an alarming mail from Google informing me about the new pricing model for Stackdriver Logging and that I was exceeding the free tier limit. The Stackdriver pricing model had a rough start, including some adjustments and postponements. As of today, charging is expected to start on March 31, 2018. This means that if I want to stay within the free tier, I should not exceed 50GB of log intake per month. That is quite a lot for my small cluster, so why would it use more than that?

First Look

I decided to take a look at how bad the situation really was.

Whoa! 😱 The morning of day 2 of the month, and I am already 37GB in? Good thing charging has not yet started. Facing reality, I moved on to drill down into where the logs come from. Since I had a good portion of log data, chances were high I would find something in the logs, right? 😉 The resource table clearly showed me where to find the low-hanging fruit. The Month To Date (MTD) and projected End Of Month (EOM) numbers for the resource GKE Container top everything else by orders of magnitude.

Reason 1: Google Kubernetes Engine Bug

Looking through the logs, I found that there is a bug in the Kubernetes Dashboard’s synchronizer. It had been firing multiple times per second for days:

09:18:54 Restarting synchronizer: kubernetes-dashboard-key-holder-kube-system.
09:18:54 Synchronizer kubernetes-dashboard-key-holder-kube-system exited with error: kubernetes-dashboard-key-holder-kube-system watch ended with timeout
09:18:54 Restarting synchronizer: kubernetes-dashboard-key-holder-kube-system.
09:18:54 Synchronizer kubernetes-dashboard-key-holder-kube-system exited with error: kubernetes-dashboard-key-holder-kube-system watch ended with timeout
09:18:54 Restarting synchronizer: kubernetes-dashboard-key-holder-kube-system.
09:18:54 Synchronizer kubernetes-dashboard-key-holder-kube-system exited with error: kubernetes-dashboard-key-holder-kube-system watch ended with timeout

This produces quite some log volume for Stackdriver to ingest, and that piles up, adding to the overall bill. It’s one of those moments where I catch myself mumbling “exponential backoff”…

To stop the torrent of log lines from the broken dashboard, I restarted the Kubernetes Dashboard pod. The hard way, of course:

$ kubectl -n kube-system delete pod kubernetes-dashboard-768854d6dc-j26qx

Reason 2: Verbose Services

Note: This subsection’s data is sourced from a different cluster which did not experience the aforementioned bug but had a huge log intake for a different reason.

In another cluster I also experienced a huge log intake. However, there was no log spamming; this cluster was simply full of regular log lines. To find out whether some services produce significantly more log lines than others, I created a log-based metric.

This metric is basically just a counter of log lines, grouped by the resource label namespace_id. With this metric in place, I headed over to Stackdriver Monitoring and created a graph that plots the log lines per second grouped by namespace.
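The metric itself does not need to be fancy. A counter over all container log entries with a filter roughly like the one below does the job; the grouping by the namespace_id resource label then happens on the Monitoring side when building the chart:

resource.type="container"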

Obviously, this is most valuable when every service is confined to exactly one namespace. With the chart in place, I was able to spot the most verbose services and dig a bit deeper into them to reduce their verbosity.

Mitigation 1: Exclusion

The first solution to the high log intake problem is to take fewer logs in. How unexpected! Luckily, there is a feature for that called exclusions. On the resources page we can create exclusion rules (filters, if you will) to reduce the log intake in a reasonable way. Reasonable here means allowing important log entries into the system while dropping the less useful ones.

[Screenshot: creating a log exclusion rule on the resources page]

The following rule, for example, discards all container log entries of log level INFO. It is a pretty simple example; however, we are free to use all the nice operators we know from regular log filtering. Exclusions are a powerful tool!

Here is a copy’n’paste friendly version of the same rule.

resource.type="container"
severity="INFO"

Note that you can even sample logs by creating an exclusion filter and setting the drop rate to a value below 100%. For my use case, an exclusion rate of 95% provides just enough samples to assess a past problem while keeping the log intake reasonable. During issue triage, I recommend temporarily disabling exclusions or at least adjusting them to let all related logs through.

Fun fact: Stackdriver logs the actions (create, delete, etc.) performed on exclusion rules, thus creating just another log source, the Log Exclusion log source. #inception

I wonder if one can create an exclusion rule for log exclusion. 🤔

Mitigation 2: Monitoring

The next log overdose mitigation technique I would like to share uses a log-based metric to alert before things turn ugly. Stackdriver comes with some handy system metrics, that is, metadata about the logging system itself. One of those data points is bytes_count. I use this metric in Stackdriver Monitoring to get an early warning if the log intake exceeds the expected level.

Here is my policy using a Metric Threshold condition:

[Screenshot: the alerting policy with its metric threshold condition]

Let’s have a closer look at the metric threshold.

I am monitoring the resource type Log Metrics and, within it, the metric “Log bytes”.

An acceptable intake rate for me is 10kB/s. If hit constantly, that results in about 24.2GB of total log intake in a 28-day month and about 26.8GB in one of those longer 31-day months. Both values leave good room for unforeseen issues and reaction time.
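For the record, the arithmetic behind those numbers: 10 kB/s × 86,400 s/day × 28 days ≈ 24.2 GB, and 10 kB/s × 86,400 s/day × 31 days ≈ 26.8 GB.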

As you can see in the graph, my cluster was way beyond that threshold for quite a while. That was caused by the bug I described earlier, which took me some time to find. With that alert in place, the same or similar bugs will trigger a notification after a 1-minute grace period that allows for short log bursts.

Before I wrap this up, one word of caution: thresholds set too low may harm your inbox! 😅 Been there, done that.

Conclusion

Stackdriver’s warning email may sound scary, but there are ways to gain control over the log intake and also be prepared for unforeseen issues by having metrics-based alerts in place.