<< Previous | Home

Fun doesn't mean compromising scalability

Today I read an interesting piece on InfoWorld about Meteor, Meteor aims to make JavaScript programming fun again. It is an interview with Matt DeBergalis, a co-author of Meteor, about Meteor and why a developer would choose it. The title in particular resonated well with me, "making programming fun again" is a catch phrase I have often used in presentations I've given about Play Framework.

As the demands on the applications we write shifts, the technologies we use start to make it harder to meet them, and pretty soon we feel like we are always working against the technologies that are supposed to be helping us. By taking a step back, rethinking the technologies, and creating new ones that are better suited to todays demands, we can continue being productive writing modern applications, and its then that development becomes fun again. Though obviously not always the case, how much fun you have working with a particular technology is often well correlated to how well suited it is for solving the problems you are trying to solve, and so there is some merit to switching to technologies that are more fun.

In this light, Meteor is not a bad framework, it is particularly very interesting in its approach to solving the problems of making web applications responsive to data updates. Writing apps in it will definitely, at least initially, be very fun. But my reason for writing this post is that I had one main gripe with the article. The problem was that DeBergalis continually likened what Meteor achieves with Facebook, implying that Facebook could be implemented using Meteor. This couldn't be further from the truth.

While the end result of an application written in Meteor and Facebook are very similar - they are both applications that update instantly as people interact with them - the approach that Facebook takes to writing their apps is the complete opposite from Meteor. Meteor places a massive emphasis on "don't worry about how data is communicated, let the framework deal with that for you". Although I have not worked on Facebook myself, I am sure that their approach is all about how the data is communicated - they don't just let the framework deal with that for them.

The problem with Meteor's approach to web development is that it makes the same mistakes that some very old technologies that many people now loath made. I am going to highlight two such technologies.

The first is relational databases. The promise of relational databases was that you don't have to worry about how your data was accessed - just make sure you store it in a normalised form, and let the database handle whatever load you throw at it. Performance can be achieved by tuning with indexes. But the problem that we found on the web is that that approach did not scale. Denormalisation and caching became necessary in any app with even a modest load. And that's when NoSQL databases started popping up. NoSQL databases intentionally limited what you could do in them - forcing you to take a different perspective on your data, namely how is it going to be read/written? They forced you to make decisions that would allow you to scale early in the design process, and we found that making these decisions early were key to successfully scaling a web application.

The second technology is n-tier application servers. The promise of application servers was that you didn't have to worry about deployment, you just wrote your applications, and let the application server worry about scalability and resilience. This led to people writing massive monolithic apps, where almost every function in the app depended on every single other function, killing any chance of ever having either resilience or scalability. When performance became an issue, clustering was "turned on", and often performance went down. And that's when containerless micro service solutions started becoming popular - small services that could be individually scaled. These new architectures forced you to think about scalability up front, making those decisions early.

Are you seeing a pattern here? Letting the technology handle resilience and scaling for you is bad, forcing you to address it up front is good. But Meteor seems to be making the exact same mistakes the relational databases and n-tier application servers made. It's trying to hide those concerns from you, in the name of "making programming fun again". While fun at first, this is certainly not going to be fun when your site gets popular and starts falling over because of the load it gets.

But maybe the Meteor developers have come up with a smart way to scale it. There are apparently two ways you can run multiple Meteor nodes, and the apparently better one is described here. The approach? Have each Meteor node tail the MongoDB Oplog. Or in simple English, make every write operation in the system go to every node in the cluster. I'll let you decide whether you think making that approach scale is fun.

As I said at the start I resonated well with the title of the article - but it seems that I have a very different idea of what's fun to what the authors of Meteor have. In my opinion, hiding the details of hard problems to scale is not fun. Rather, putting them in your face, giving you the tools to solve them at the right time, now that's fun. This is exactly what Play Framework and Akka do - particularly Akka, in which the assumption when you program is that every other part of the app is likely down or not responding, and you are forced to deal with what happens when that's the case. Using these technologies to solve these hard problems is not only fun, it's very satisfying - seeing an app with 50000 concurrent users broadcasting updates every second scale with only 10 nodes, it's exciting too!

The fun approach to hard problems is not to run away from them to something that pretends they don't exist. It's to embrace them head on, using technologies that are designed to help you do so.

Tags : , ,

A practical solution to the BREACH vulnerability

Two weeks ago CERT released an advisory for a new vulnerability called BREACH. In the advisory they say there is no practical solution to this vulnerability. I believe that I've come up with a practical solution that we'll probably implement in Play Frameworks CSRF protection.

Some background

First of all, what is the BREACH vulnerability? I recommend you read the advisory, there's no point in me repeating it here, but for those that are lazy, here are is a summary. The prerequisites for exploiting this vulnerability are:

  1. The target page must be using HTTPS, preferably with a stream cipher (eg RC4) though it is possible to exploit when block ciphers with padding are used (eg AES)
  2. The target page must be using HTTP level compression, eg gzip or deflate
  3. The target page must produce responses with a static secret in them. A typical example would be a CSRF token in a form.
  4. The target page must also reflect a request parameter in the response. It may also be possible to exploit if it reflected POSTED form body values in the response.
  5. Responses must be otherwise reasonably static. Dynamic responses, particularly ones that vary the length of the response, are much more expensive to exploit.
  6. The attacker must be able to eavesdrop on the connection, and specifically, measure the length of the encrypted responses.
  7. The attacker must be able to coerce the victims browser to request the target web page many times.

To exploit, the attacker gets the victims browser to submit specially crafted requests. These requests will contain repeat patterns that the compression algorithm will compress. If the pattern matches the first part of the secret, then the response will be shorter than if it doesn't, since that part of the secret will also be compressed along with the repeat patterns. Then character by character, the attacker can determine the secret.

Some work arounds

The advisory mentions some work arounds. Whether these work arounds are effective depend greatly on the specific application, none of them can be effectively done by a framework, without potentially breaking the application.

Probably the most effective of the work arounds is randomising the secret on each request. In the case of CSRF protection tokens, which is often provided by many frameworks, this would prevent a user from using the application from multiple tabs at the same time. It would also cause issues when a user uses the back button.

I would like to propose a variant of using randomised tokens, that should work for most framework provided CSRF protection mechanisms, and that, pending feedback from the internet on whether my approach will be effective, we will probably implement in Play Framework.

Signed nonces

The idea is to use a static secret, but combine it with a nonce, sign the secret and the nonce, and do this for every response that the secret is sent in. The signature will effectively create a token that is random in each response, thus violating the third prerequisite above, that the secret be static.

The nonce does not need to be generated in a cryptographically secure way, it may be a predictable value such as a timestamp. The important thing is that the nonce should change sufficiently frequently, and should repeat old values sufficiently infrequently, that it should not be possible to get many responses back that use the same nonce. The signature is the unpredictable part of the token.

Application servers will need to have a mechanism for signing the nonce and the secret using a shared secret. For applications served from many nodes, the secret will need to be shared between all nodes.

The application will represent secrets using two types of tokens, one being "raw tokens", which is just the raw secret, the other being "signed tokens". Signed tokens are tokens for which a nonce has been generated on each use. This nonce is concatenated with the raw token, and then signed. An algorithm to do this in Scala might look like this:

def createSignedToken(rawToken: String) = {
  val nonce = System.currentTimeMillis
  val joined = rawToken + "-" + nonce
  joined + "-" + hmacSign(joined)
}

where hmacSign is a function that signs the input String using the applications shared secret using the HMAC algorithm. HMAC is not the only signing algorithm that could be used, but it is a very common choice for these types of use cases.

Each time a token is sent in a response, it must be a newly generated signed token. While it is ok to publish the raw token in HTTP response headers, to avoid confusion on which incoming tokens must be signed and which can be raw, I recommend to always publish and only accept signed tokens. When comparing tokens, the signature should be verified on each token, and if that passes then only the raw part of the tokens need to be compared. An algorithm to extract the raw token from the signed token created using the above algorithm might look like this:

def extractRawToken(signedToken: String): Option[String] = {
  val splitted = signedToken.split("-", 3)
  val (rawToken, nonce, signature) = (splitted(0), splitted(1), splitted(2))
  if (thetaNTimeEquals(signature, hmacSign(rawToken + "-" + nonce))) {
    Some(rawToken)
  } else {
    None
  }
}

where thetaNTimeEquals does a String comparison with Θ(n) time when the lengths of the Strings are equal, to prevent timing attacks. Verifying that two tokens match might look like this:

def compareSignedTokens(tokenA: String, tokenB: String) = {
  val maybeEqual = for {
    rawTokenA <- extractRawToken(tokenA)
    rawTokenB <- extractRawToken(tokenB)
  } yield thetaNTimeEquals(rawTokenA, rawTokenB)
  maybeTrue.getOrElse(false)
}

Why this works

When using a signed token, the attacker can still work out what the raw token is using the BREACH vulnerability, however since the application doesn't accept raw tokens, this is not useful to the attacker. Because the attacker doesn't have the secret used to sign the signed token, they cannot generate a signed token themselves from the raw token. Hence, they need to determine not just the raw token, but an entire signed token. But since signed tokens are random for each response, this breaks the 3rd prerequisite above, that secrets in the response must be static, hence they cannot do a character by character evaluation using the BREACH vulnerability.

Encrypted tokens

Another option is to encrypt the concatenated nonce and raw token. This may result in shorter tokens, and I am not aware of any major performance differences between HMAC and AES for this purpose. APIs for HMAC signing do tend to be a little easier to use safely than APIs for AES encryption, this is why I've used HMAC signing as my primary example.

Framework considerations

The main issue that might prevent a framework from implementing this is that they might not readily have a secret available to them to use to do the signing or encrypting. When an application runs on a single node, it may be acceptable to generate a new secret at startup, though this would mean the secret changes on every restart.

Some frameworks, like Play Framework, do have an application wide secret available to them, and so this solution is practical to implement in application provided token based protection mechanisms such as CSRF protection.

100 Continue support in Play

The 100 Continue status code in the HTTP spec is one that most people know very little about. You kind of read it, don't really understand what it's talking about, and then just skip over it. I didn't know what it was about until I became a developer of a web framework. It turns out to be very useful in certain situations.

Let's say a client needs to make a very large upload, for example 1GB. What happens if the server can't satisfy the clients request? For example, what if the client submitted invalid authentication credentials? Or the request content was too long? Or the wrong media type? HTTP is a half duplex protocol, the client and server take it in turns to speak. This means that even though the server may know immediately after receiving the request header that it can't process the request, it still has to read the entire request body before it can tell the client that, even if that request body is a 1GB long and takes an hour to upload. And if you've ever done any large HTTP uploads before, you'll know there's nothing more frustrating than getting to the end of a large upload, only get an error back from the server.

HTTP has a solution to this, in the form of the Expect request header. The Expect header is used to tell the server that the client expects a certain behaviour of it. There is one defined value for it in the HTTP spec, and that is 100-continue. This tells the server that after sending the request headers, the client will not send the body of the request until it has received a 100 continue response. Otherwise, the server can immediately return with any other response code. After receiving a 100 continue response, the client will continue to send the body, and once the server has consumed that, the server will send a second response.

This can be used whenever the server wants to do validation of just the request headers. Here are some examples:

  • Authentication - if the client is not authenticated, the server can respond with 401 Unauthorized.
  • Authorisation - if the client is not authorised to make the request, the server can respond with 403 Forbidden.
  • Resource existence - if the client has attempted to put a resource at a location that doesn't exist, the server can respond with 404 Not Found
  • Content length limits - if the client hasn't sent a content length, the server can respond with 411 Length Required, or if the content length is larger than the server is willing to accept, the server can respond with 413 Request Entity Too Large
  • Content type validation - if the client is sending a content type that the server doesn't support, the server can respond with 415 Unsupported Media Type

100 continue support in Play Framework

So with all this in mind, how can this be implemented in Play framework? As you may be aware, at the lowest level, a Play action looks like this:

trait EssentialAction extends (RequestHeader => Iteratee[Array[Byte], Result])

The iteratee that the essential action function returns is what consumes the body. An iteratee can be in one of three states, done, cont (ready to receive more input), or error. When Play invokes an action to get the iteratee for the body, and a client has specified the Expect: 100-continue header, Play is able to check if that iteratee is ready to receive input, or if it's in a done or error state. If it's in a done or error state, Play will send the result immediately without consuming the body. If it's in the cont state, then Play will send a 100 continue response, and then feeds the body into the iteratee.

So for an action to take advantage of this, it just needs to ensure that it returns a done iteratee if the validation fails. Plays built in authentication action does just this:

def Authenticated[A](
  userinfo: RequestHeader => Option[A],
  onUnauthorized: RequestHeader => Result)(action: A => EssentialAction): EssentialAction = {

  EssentialAction { request =>
    userinfo(request).map { user =>
      action(user)(request)
    }.getOrElse {
      Done(onUnauthorized(request), Input.Empty)
    }
  }
}

In addition, all of Plays body parsers, when they check the content type, will return a done iteratee if the content type is wrong. So if I have an action that looks like this:

def upload = Authenticated(
    rh => rh.headers.get("Authentication-Token").filter(_ == "secret-token"), 
    rh => Forbidden("Authentication required")
) { token => Action(parse.text) { request =>
  Ok("Got body that was " + request.body.length + " characters long")
}}

And then I submit the following request header:

POST /upload HTTP/1.1
Host: localhost
Authentication-Token: secret-token
Content-Type: text/plain
Content-Length: 12
Expect: 100-continue

Play will immediately respond with:

100 Continue HTTP/1.1

At which point, I can then send my body, and Play will send the response. The whole transaction will look like this:

C: POST /upload HTTP/1.1
C: Host: localhost
C: Authentication-Token: secret-token
C: Content-Type: text/plain
C: Content-Length: 12
C: Expect: 100-continue
C: 
S: HTTP/1.1 100 Continue
S:
C: Hello world!
S: HTTP/1.1 200 OK
S: Content-Type: text/plain;charset=utf-8
S: Content-Length: 37
S:
S: Got body that was 12 characters long

However, if I don't send an authentication token, or if my content type is wrong, this is what will happen:

C: POST /upload HTTP/1.1
C: Host: localhost
C: Content-Type: text/plain
C: Content-Length: 12
C: Expect: 100-continue
C: 
S: HTTP/1.1 403 Forbidden
S: Content-Type: text/plain;charset=utf-8
S: Content-Length: 23
S:
S: Authentication required

And so even though in the request header I said that the content length was 12, I didn't have to upload it, because I sent the expect header, and Play didn't send a 100 continue response back, instead it was able to immediately tell me that the request would fail. Obviously with such a small body, this doesn't make a lot of sense, but with a body gigabytes in length, it means I don't have to spend however many hours uploading it before I finally find out that I wasn't allowed to upload it.

Tags : , ,

How to write a REST API in Play Framework

A very common question that we get on the Play mailing list is how do you write a REST API using Play Framework? There's no explicit documentation on it, you won't find a page in the Play documentation titled "Writing REST APIs". The question is often met with confusion, to those that try to answer it, the question for them is "how can you not write a REST API with Play? Play is all about REST."

So let me explain why we don't have a page on writing REST APIs. Play is fundamentally a framework for writing REST APIs, just like a fridge is a tool that is fundamentally for keeping food cold. When you buy a fridge, and you get the manual for a fridge, do you find a page titled "How to keep food cold using the fridge"? Probably not. You'll find instructions for installing the frige, turning it on, setting the temperature, adjusting the shelves, but you won't find instructions that explicitly say how to keep the food cold. Why not? Because it's assumed that you understand, when you buy the fridge, that the way to keep food cool in it is by putting food in and closing the door. The whole manual is about how to keep food cold, since that's the fridges fundamental function.

It's the same with Play. We assume first of all that you know what a REST API is. There's plenty of documentation out there on the web on what a REST API is, there's no reason for us to repeat this in our documentation, a good place to start might be this StackOverflow question. As the first answer to that question says, "Really, what it's about is using the true potential of HTTP", Play also provides everything you need to use the true potential of HTTP.

So we have documentation on writing routes in Scala and Java, we have documentation on sending results in Scala and Java, we have documentation on handling JSON in Scala and Java, and so on and so on. All this documentation is giving you the tools you need to implement what Play fundamentally about, that is, HTTP, which when realised to its true potential, will be REST. There's nothing special about a REST API in Play, writing a REST API in Play means writing a web application in the way that Play is designed to be used. We could probably rename the Play documentation home page to be "Writing a REST API in Play", that would accurately describe what most of the Play documentation is about.

Let me repeat again, Play is all about realising the full potential of HTTP, which means Play is all about REST. You want to read about how to write a REST API in Play? Read the Play documentation, it's all about writing a REST API in Play.

Tags : , , ,