Simple outbound request rate limiting with App Engine

I've been playing with Google App Engine (GAE) a lot lately, since it is a cheap/free way for me to quickly throw ideas at the wall and see what sticks. One of the tinker projects I've been working on is a sub-Reddit stat tracker (warning: hastily put together and unfinished) that records and ranks technical sub-Reddit activity over time.

Retrieving the data I needed required polling the Reddit API. There is an existing high-quality Python API client (PRAW), but I ran into a GAE + requests + HTTPS issue that prevented me from using it (PRAW uses requests). That issue will be fixed in requests 2.10.0, but there's been no indication that requests 2.10.0 is arriving anytime soon. This mattered because PRAW handles rate limiting and OAuth authentication for you. Rather than wait for requests 2.10.0 or fork PRAW to use GAE's urlfetch service (which supports HTTPS), I decided to hit Reddit's public API directly without authenticating.

Warning: The days of unauthenticated Reddit API access may be coming to an end. I don't recommend unauthenticated API access for anything more involved than a tinker project like mine.

Early goings

My initial draft used App Engine cron and task queues to schedule and parallelize the work. Once an hour, cron triggered a background task that would create sub-tasks for each sub-Reddit I'm tracking. These sub-tasks would hit the Reddit API once or twice, muddle through the return value, and toss some values into a Custom Metric on Google Stackdriver.
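The fan-out step can be sketched in a few lines of Python. This is a minimal, hypothetical model, not the tracker's actual code: the worker URL `/tasks/poll_subreddit`, the sub-Reddit names, and the `fan_out` helper are made up for illustration. The `enqueue` callable is injected so the sketch runs outside App Engine; on App Engine's Python runtime it would presumably wrap something like `taskqueue.add(url=..., params=...)` from `google.appengine.api.taskqueue`.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical worker endpoint; not taken from the original project.
WORKER_URL = "/tasks/poll_subreddit"

Enqueue = Callable[[str, Dict[str, str]], None]


def fan_out(subreddits: Iterable[str], enqueue: Enqueue) -> int:
    """Enqueue one worker task per tracked sub-Reddit and return the count.

    `enqueue` is any callable taking (url, params). On App Engine it might
    wrap the task queue API (an assumption); here it can be a plain list append.
    """
    count = 0
    for name in subreddits:
        # Each sub-task receives the sub-Reddit name as a request parameter.
        enqueue(WORKER_URL, {"subreddit": name})
        count += 1
    return count


if __name__ == "__main__":
    queued: List[Tuple[str, Dict[str, str]]] = []
    # Hypothetical tracked sub-Reddits, for illustration only.
    total = fan_out(["python", "golang", "rust"],
                    lambda url, params: queued.append((url, params)))
    print(total)      # 3
    print(queued[0])  # ('/tasks/poll_subreddit', {'subreddit': 'python'})
```

Keeping the enqueue call behind a parameter means the cron handler only decides *what* work exists; how fast that work is drained is left entirely to the queue.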

Since the first proof-of-concept wasn't rate limited, I started hitting HTTP 429 (Too Many Requests) responses as I tracked more and more sub-Reddits.

App Engine Queue definitions to the rescue

I only needed to do a full scan of all of my tracked sub-Reddits once per hour. I had to make sure that I wasn't making more than 30 API calls per minute. I wanted to spread the requests out evenly, rather than exhaust my quota at the beginning of each minute. I also wanted to do this with minimal complexity.
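Those constraints boil down to some simple arithmetic. The sketch below assumes the 30 requests/minute limit and hourly scan described here, plus the "once or twice" API calls per sub-Reddit mentioned earlier. It ignores retries and task overhead, so treat the results as upper bounds; the function names are just for illustration.

```python
# Back-of-the-envelope budget for an hourly scan under a per-minute rate limit.

RATE_LIMIT_PER_MINUTE = 30  # unauthenticated Reddit API limit cited in this post
SCAN_WINDOW_MINUTES = 60    # one full scan per hour


def request_spacing_seconds(limit_per_minute: int) -> float:
    """Seconds between requests if they are spread evenly across each minute."""
    return 60.0 / limit_per_minute


def max_requests_per_scan(limit_per_minute: int, window_minutes: int) -> int:
    """Upper bound on API calls that fit into one scan window."""
    return limit_per_minute * window_minutes


def max_subreddits(limit_per_minute: int, window_minutes: int,
                   calls_per_subreddit: int) -> int:
    """How many sub-Reddits fit into one scan if each needs N API calls."""
    budget = max_requests_per_scan(limit_per_minute, window_minutes)
    return budget // calls_per_subreddit


if __name__ == "__main__":
    print(request_spacing_seconds(RATE_LIMIT_PER_MINUTE))                     # 2.0
    print(max_requests_per_scan(RATE_LIMIT_PER_MINUTE, SCAN_WINDOW_MINUTES))  # 1800
    print(max_subreddits(RATE_LIMIT_PER_MINUTE, SCAN_WINDOW_MINUTES, 2))      # 900
```

So even spreading means roughly one request every two seconds, and at two calls per sub-Reddit an hourly scan tops out around 900 sub-Reddits.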

Fortunately, App Engine task queues can be configured with a queue.yaml file in your project root. There are two directives in here that are particularly interesting:

  • rate - How often jobs are popped from the queue and distributed to your workers. The number of jobs that are popped at a time is determined by max_concurrent_requests. For example, a value of 30/m means the queue is popped at most 30 times per minute.
  • max_concurrent_requests - The max number of concurrently executing jobs.
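To get an intuition for how these two settings interact, here is a simplified, hypothetical simulation. It assumes dispatches are spaced evenly at 60 / rate seconds and that a new task only starts once a concurrency slot frees up (i.e. a running task has been ACK'd). App Engine's real scheduler also allows some burstiness (via the bucket_size directive), so this approximates the behavior rather than describing its internals.

```python
import heapq
from typing import List


def simulate_dispatch(num_tasks: int, rate_per_minute: int,
                      max_concurrent: int, task_seconds: float) -> List[float]:
    """Return the start time (in seconds) of each task in a toy queue model.

    Assumptions: pops are spaced evenly at 60 / rate seconds (no bursting),
    and a task only starts when fewer than `max_concurrent` tasks are running.
    """
    spacing = 60.0 / rate_per_minute
    running: List[float] = []  # min-heap of finish times for in-flight tasks
    starts: List[float] = []
    next_allowed = 0.0         # earliest time the rate permits another pop
    for _ in range(num_tasks):
        t = next_allowed
        # Retire tasks that have already finished by time t.
        while running and running[0] <= t:
            heapq.heappop(running)
        # If every slot is busy, wait for the earliest task to finish.
        if len(running) >= max_concurrent:
            t = max(t, heapq.heappop(running))
        starts.append(t)
        heapq.heappush(running, t + task_seconds)
        next_allowed = t + spacing
    return starts


if __name__ == "__main__":
    # Fast tasks: starts are paced by the rate alone, 2 seconds apart.
    print(simulate_dispatch(5, rate_per_minute=30, max_concurrent=1, task_seconds=0.5))
    # -> [0.0, 2.0, 4.0, 6.0, 8.0]
    # Slow tasks: each start also waits for the previous task to finish.
    print(simulate_dispatch(4, rate_per_minute=30, max_concurrent=1, task_seconds=3.0))
    # -> [0.0, 3.0, 6.0, 9.0]
```

In this model, with `max_concurrent=1`, a task that runs longer than the 2-second spacing pushes every later start back, so slow workers only make the queue more conservative, never less.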

Since the unauthenticated Reddit API rate limit is 30 requests per minute, I was able to enforce this at the queue level by using a rate of 30/m and a max_concurrent_requests of 1. Here is my full queue.yaml file. The end result:

  • Tasks are popped from the queue up to 30 times a minute (rate = 30/m).
  • We only pop one task at a time (max_concurrent_requests = 1).
  • We won't pop a new task until the currently running one is ACK'd.
  • As long as everything works as described in the docs, we stay under the rate limit at all times.
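One way to check the "we stay under the rate limit" claim from the outside is to log when each outbound request is sent and count requests in a rolling window. This sketch assumes a rolling 60-second window; Reddit's actual accounting may work differently, and the helper names are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable


def max_in_any_window(timestamps: Iterable[float], window_seconds: float = 60.0) -> int:
    """Return the most requests that fall in any window (t - window_seconds, t]."""
    recent: Deque[float] = deque()
    peak = 0
    for t in sorted(timestamps):
        recent.append(t)
        # Drop requests that are at least `window_seconds` older than t.
        while recent[0] <= t - window_seconds:
            recent.popleft()
        peak = max(peak, len(recent))
    return peak


def within_limit(timestamps: Iterable[float], limit: int = 30,
                 window_seconds: float = 60.0) -> bool:
    """True if no rolling window ever contains more than `limit` requests."""
    return max_in_any_window(timestamps, window_seconds) <= limit


if __name__ == "__main__":
    evenly_spaced = [2.0 * i for i in range(120)]  # one request every 2 seconds
    print(max_in_any_window(evenly_spaced))         # 30
    bursty = [0.5 * i for i in range(40)]            # 40 requests in under 20 s
    print(within_limit(bursty))                      # False
```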

As a result, we went from rate limiting errors all over the place to none at all.

So you can rate limit. What's the big deal?

Rate limiting is not an especially difficult thing to implement, but I thought it was interesting to see how easy App Engine made this. My code doesn't know or care that it's being rate limited, which is nice. The most beautiful lines of code are the ones you don't have to write at all!

In the future, I'll want to either move over to PRAW when requests 2.10.0 lands or implement the bare minimum for OAuth authentication with App Engine's urlfetch service. At that point, I'll be able to twiddle my rate and concurrency values to get some more throughput.

Nothing earth-shattering here, but I thought I'd share!

Meditation: Beginning the Journey

House Taylor has just finished navigating a particularly difficult stretch, with no shortage of things to worry about. I've always been reasonably calm and persistent under fire, but the last few months have tested my limits.

As I got back to the apartment after work a few weeks ago, I tossed my pack on the floor next to the coat rack and exhaled for a moment. The respite was brief, as I immediately started thinking of what to do next. Time to start dinner. No, wait. Laundry needs doing. Or I could read about that shiny new piece of software I heard about on Hacker News. Although, I do have some GitHub issues to slay...

Moments later, I decided to get cooking like a responsible husband. As I was pulling a pot out of the cabinet, my mind was already flitting around the other things on the list for tonight and the rest of the week. I felt a sourness in my gut as I mentally wrung my hands over the arbitrary list. By the time Erin got home, I was visibly "off".

As I lay awake that night with thoughts quickly flying in one ear and out the other, I reached peak frustration: It's time to do something about this.

Meditation Time

I've been curious about how I might incorporate meditation for some time, but always found reasons to put it off. We happened to have a copy of Full Catastrophe Living collecting dust on our shelf, purchased with the greatest of intentions. I resolved to sit down and start reading a bit on most nights.

It didn't take long to arrive at some conclusions that should have been obvious, were it not for my cluttered noggin:

  • Modern life is full of distractions. San Francisco is a city that intensifies this significantly.
  • As we went from landline to dumb phone to smart phone, I had grown accustomed to increasing levels of stimulation at all times.
  • In the absence of stimulation, I felt stress. I wasn't content to ever just sit and "be".
  • Because of all of the above, my attention span had deteriorated substantially over the last five years.

None of these are earth-shattering realizations, but my state at the time brought the problems into a clarity that wasn't there before.

As I struggled through my first few meditation sessions, it was obvious that I had a long way to go in getting things smoothed out up top again. But that doesn't scare me. This should be an interesting, introspective journey, and I'm excited to begin it.

Onward!

I've shared what I feel is a bit of a personal glimpse into a tough time and a big challenge for me. I have no specific goals in writing this, but I'd like to share my story as I continue. Hopefully this is in some way helpful to someone!

Cassandra 2.1 Docker images and lockups

We recently noticed locked-up Cassandra Docker containers piling up when running our integration tests.

After digging around some more, we saw others complaining of similar symptoms. The common thread in all cases was the use of Cassandra 2.1.x images. After seeing someone mention that Cassandra 2.2 and 3.x images weren't showing this behavior, we gave those a shot.

As luck would have it, this did the trick! So for anyone else running into Cassandra Docker containers with the following symptoms, consider switching to 2.2+:

  • Containers never fully start, hanging after creation.
  • Containers can't be rm'd or killed.
  • In addition to the previous two, docker logs shows no output.