all that jazz

james' blog about scala and all that jazz

A practical solution to the BREACH vulnerability

Two weeks ago CERT released an advisory for a new vulnerability called BREACH. In the advisory they say there is no practical solution to this vulnerability. I believe that I've come up with a practical solution that we'll probably implement in Play Frameworks CSRF protection.

Some background

First of all, what is the BREACH vulnerability? I recommend you read the advisory, there's no point in me repeating it here, but for those that are lazy, here are is a summary. The prerequisites for exploiting this vulnerability are:

  1. The target page must be using HTTPS, preferably with a stream cipher (eg RC4) though it is possible to exploit when block ciphers with padding are used (eg AES)
  2. The target page must be using HTTP level compression, eg gzip or deflate
  3. The target page must produce responses with a static secret in them. A typical example would be a CSRF token in a form.
  4. The target page must also reflect a request parameter in the response. It may also be possible to exploit if it reflected POSTED form body values in the response.
  5. Responses must be otherwise reasonably static. Dynamic responses, particularly ones that vary the length of the response, are much more expensive to exploit.
  6. The attacker must be able to eavesdrop on the connection, and specifically, measure the length of the encrypted responses.
  7. The attacker must be able to coerce the victims browser to request the target web page many times.

To exploit, the attacker gets the victims browser to submit specially crafted requests. These requests will contain repeat patterns that the compression algorithm will compress. If the pattern matches the first part of the secret, then the response will be shorter than if it doesn't, since that part of the secret will also be compressed along with the repeat patterns. Then character by character, the attacker can determine the secret.

Some work arounds

The advisory mentions some work arounds. Whether these work arounds are effective depend greatly on the specific application, none of them can be effectively done by a framework, without potentially breaking the application.

Probably the most effective of the work arounds is randomising the secret on each request. In the case of CSRF protection tokens, which is often provided by many frameworks, this would prevent a user from using the application from multiple tabs at the same time. It would also cause issues when a user uses the back button.

I would like to propose a variant of using randomised tokens, that should work for most framework provided CSRF protection mechanisms, and that, pending feedback from the internet on whether my approach will be effective, we will probably implement in Play Framework.

Signed nonces

The idea is to use a static secret, but combine it with a nonce, sign the secret and the nonce, and do this for every response that the secret is sent in. The signature will effectively create a token that is random in each response, thus violating the third prerequisite above, that the secret be static.

The nonce does not need to be generated in a cryptographically secure way, it may be a predictable value such as a timestamp. The important thing is that the nonce should change sufficiently frequently, and should repeat old values sufficiently infrequently, that it should not be possible to get many responses back that use the same nonce. The signature is the unpredictable part of the token.

Application servers will need to have a mechanism for signing the nonce and the secret using a shared secret. For applications served from many nodes, the secret will need to be shared between all nodes.

The application will represent secrets using two types of tokens, one being "raw tokens", which is just the raw secret, the other being "signed tokens". Signed tokens are tokens for which a nonce has been generated on each use. This nonce is concatenated with the raw token, and then signed. An algorithm to do this in Scala might look like this:

def createSignedToken(rawToken: String) = {
  val nonce = System.currentTimeMillis
  val joined = rawToken + "-" + nonce
  joined + "-" + hmacSign(joined)
}

where hmacSign is a function that signs the input String using the applications shared secret using the HMAC algorithm. HMAC is not the only signing algorithm that could be used, but it is a very common choice for these types of use cases.

Each time a token is sent in a response, it must be a newly generated signed token. While it is ok to publish the raw token in HTTP response headers, to avoid confusion on which incoming tokens must be signed and which can be raw, I recommend to always publish and only accept signed tokens. When comparing tokens, the signature should be verified on each token, and if that passes then only the raw part of the tokens need to be compared. An algorithm to extract the raw token from the signed token created using the above algorithm might look like this:

def extractRawToken(signedToken: String): Option[String] = {
  val splitted = signedToken.split("-", 3)
  val (rawToken, nonce, signature) = (splitted(0), splitted(1), splitted(2))
  if (thetaNTimeEquals(signature, hmacSign(rawToken + "-" + nonce))) {
    Some(rawToken)
  } else {
    None
  }
}

where thetaNTimeEquals does a String comparison with Θ(n) time when the lengths of the Strings are equal, to prevent timing attacks. Verifying that two tokens match might look like this:

def compareSignedTokens(tokenA: String, tokenB: String) = {
  val maybeEqual = for {
    rawTokenA <- extractRawToken(tokenA)
    rawTokenB <- extractRawToken(tokenB)
  } yield thetaNTimeEquals(rawTokenA, rawTokenB)
  maybeTrue.getOrElse(false)
}

Why this works

When using a signed token, the attacker can still work out what the raw token is using the BREACH vulnerability, however since the application doesn't accept raw tokens, this is not useful to the attacker. Because the attacker doesn't have the secret used to sign the signed token, they cannot generate a signed token themselves from the raw token. Hence, they need to determine not just the raw token, but an entire signed token. But since signed tokens are random for each response, this breaks the 3rd prerequisite above, that secrets in the response must be static, hence they cannot do a character by character evaluation using the BREACH vulnerability.

Encrypted tokens

Another option is to encrypt the concatenated nonce and raw token. This may result in shorter tokens, and I am not aware of any major performance differences between HMAC and AES for this purpose. APIs for HMAC signing do tend to be a little easier to use safely than APIs for AES encryption, this is why I've used HMAC signing as my primary example.

Framework considerations

The main issue that might prevent a framework from implementing this is that they might not readily have a secret available to them to use to do the signing or encrypting. When an application runs on a single node, it may be acceptable to generate a new secret at startup, though this would mean the secret changes on every restart.

Some frameworks, like Play Framework, do have an application wide secret available to them, and so this solution is practical to implement in application provided token based protection mechanisms such as CSRF protection.

100 Continue support in Play

The 100 Continue status code in the HTTP spec is one that most people know very little about. You kind of read it, don't really understand what it's talking about, and then just skip over it. I didn't know what it was about until I became a developer of a web framework. It turns out to be very useful in certain situations.

Let's say a client needs to make a very large upload, for example 1GB. What happens if the server can't satisfy the clients request? For example, what if the client submitted invalid authentication credentials? Or the request content was too long? Or the wrong media type? HTTP is a half duplex protocol, the client and server take it in turns to speak. This means that even though the server may know immediately after receiving the request header that it can't process the request, it still has to read the entire request body before it can tell the client that, even if that request body is a 1GB long and takes an hour to upload. And if you've ever done any large HTTP uploads before, you'll know there's nothing more frustrating than getting to the end of a large upload, only get an error back from the server.

HTTP has a solution to this, in the form of the Expect request header. The Expect header is used to tell the server that the client expects a certain behaviour of it. There is one defined value for it in the HTTP spec, and that is 100-continue. This tells the server that after sending the request headers, the client will not send the body of the request until it has received a 100 continue response. Otherwise, the server can immediately return with any other response code. After receiving a 100 continue response, the client will continue to send the body, and once the server has consumed that, the server will send a second response.

This can be used whenever the server wants to do validation of just the request headers. Here are some examples:

  • Authentication - if the client is not authenticated, the server can respond with 401 Unauthorized.
  • Authorisation - if the client is not authorised to make the request, the server can respond with 403 Forbidden.
  • Resource existence - if the client has attempted to put a resource at a location that doesn't exist, the server can respond with 404 Not Found
  • Content length limits - if the client hasn't sent a content length, the server can respond with 411 Length Required, or if the content length is larger than the server is willing to accept, the server can respond with 413 Request Entity Too Large
  • Content type validation - if the client is sending a content type that the server doesn't support, the server can respond with 415 Unsupported Media Type

100 continue support in Play Framework

So with all this in mind, how can this be implemented in Play framework? As you may be aware, at the lowest level, a Play action looks like this:

trait EssentialAction extends (RequestHeader => Iteratee[Array[Byte], Result])

The iteratee that the essential action function returns is what consumes the body. An iteratee can be in one of three states, done, cont (ready to receive more input), or error. When Play invokes an action to get the iteratee for the body, and a client has specified the Expect: 100-continue header, Play is able to check if that iteratee is ready to receive input, or if it's in a done or error state. If it's in a done or error state, Play will send the result immediately without consuming the body. If it's in the cont state, then Play will send a 100 continue response, and then feeds the body into the iteratee.

So for an action to take advantage of this, it just needs to ensure that it returns a done iteratee if the validation fails. Plays built in authentication action does just this:

def Authenticated[A](
  userinfo: RequestHeader => Option[A],
  onUnauthorized: RequestHeader => Result)(action: A => EssentialAction): EssentialAction = {

  EssentialAction { request =>
    userinfo(request).map { user =>
      action(user)(request)
    }.getOrElse {
      Done(onUnauthorized(request), Input.Empty)
    }
  }
}

In addition, all of Plays body parsers, when they check the content type, will return a done iteratee if the content type is wrong. So if I have an action that looks like this:

def upload = Authenticated(
    rh => rh.headers.get("Authentication-Token").filter(_ == "secret-token"), 
    rh => Forbidden("Authentication required")
) { token => Action(parse.text) { request =>
  Ok("Got body that was " + request.body.length + " characters long")
}}

And then I submit the following request header:

POST /upload HTTP/1.1
Host: localhost
Authentication-Token: secret-token
Content-Type: text/plain
Content-Length: 12
Expect: 100-continue

Play will immediately respond with:

100 Continue HTTP/1.1

At which point, I can then send my body, and Play will send the response. The whole transaction will look like this:

C: POST /upload HTTP/1.1
C: Host: localhost
C: Authentication-Token: secret-token
C: Content-Type: text/plain
C: Content-Length: 12
C: Expect: 100-continue
C: 
S: HTTP/1.1 100 Continue
S:
C: Hello world!
S: HTTP/1.1 200 OK
S: Content-Type: text/plain;charset=utf-8
S: Content-Length: 37
S:
S: Got body that was 12 characters long

However, if I don't send an authentication token, or if my content type is wrong, this is what will happen:

C: POST /upload HTTP/1.1
C: Host: localhost
C: Content-Type: text/plain
C: Content-Length: 12
C: Expect: 100-continue
C: 
S: HTTP/1.1 403 Forbidden
S: Content-Type: text/plain;charset=utf-8
S: Content-Length: 23
S:
S: Authentication required

And so even though in the request header I said that the content length was 12, I didn't have to upload it, because I sent the expect header, and Play didn't send a 100 continue response back, instead it was able to immediately tell me that the request would fail. Obviously with such a small body, this doesn't make a lot of sense, but with a body gigabytes in length, it means I don't have to spend however many hours uploading it before I finally find out that I wasn't allowed to upload it.

How to write a REST API in Play Framework

A very common question that we get on the Play mailing list is how do you write a REST API using Play Framework? There's no explicit documentation on it, you won't find a page in the Play documentation titled "Writing REST APIs". The question is often met with confusion, to those that try to answer it, the question for them is "how can you not write a REST API with Play? Play is all about REST."

So let me explain why we don't have a page on writing REST APIs. Play is fundamentally a framework for writing REST APIs, just like a fridge is a tool that is fundamentally for keeping food cold. When you buy a fridge, and you get the manual for a fridge, do you find a page titled "How to keep food cold using the fridge"? Probably not. You'll find instructions for installing the frige, turning it on, setting the temperature, adjusting the shelves, but you won't find instructions that explicitly say how to keep the food cold. Why not? Because it's assumed that you understand, when you buy the fridge, that the way to keep food cool in it is by putting food in and closing the door. The whole manual is about how to keep food cold, since that's the fridges fundamental function.

It's the same with Play. We assume first of all that you know what a REST API is. There's plenty of documentation out there on the web on what a REST API is, there's no reason for us to repeat this in our documentation, a good place to start might be this StackOverflow question. As the first answer to that question says, "Really, what it's about is using the true potential of HTTP", Play also provides everything you need to use the true potential of HTTP.

So we have documentation on writing routes in Scala and Java, we have documentation on sending results in Scala and Java, we have documentation on handling JSON in Scala and Java, and so on and so on. All this documentation is giving you the tools you need to implement what Play fundamentally about, that is, HTTP, which when realised to its true potential, will be REST. There's nothing special about a REST API in Play, writing a REST API in Play means writing a web application in the way that Play is designed to be used. We could probably rename the Play documentation home page to be "Writing a REST API in Play", that would accurately describe what most of the Play documentation is about.

Let me repeat again, Play is all about realising the full potential of HTTP, which means Play is all about REST. You want to read about how to write a REST API in Play? Read the Play documentation, it's all about writing a REST API in Play.

Call response WebSockets in Play Framework

I got a question from a Play user about implementing call/response WebSockets in Play Framework. This is not something that comes up that often, since it means using WebSockets to do basically what AJAX does for you, so what's the point? But here are some use cases that I've thought of:

  • You have some transformation of a stream that can only be done on the server side. For example, perhaps the transformation requires some heavy database work, or is too computationally expensive for a mobile client, or perhaps you want to encrypt the stream with a key that is private to the server.
  • You are already processing a stream of events from the server using WebSockets, and the responses to the calls are just more events in this stream, so you'd like to share the same transport mechanism for these events.
  • Your application is particularly chatty, and you don't want the overhead of the HTTP protocol on each call/response.

There are possibly more use cases - WebSockets is quite a new technology and as an industry we haven't really settled on what it's best use cases are.

A simple echo implementation

A Play WebSocket is implemented by providing an iteratee that consumes messages from the client, and an enumerator that produces messages for the client. If we simply wanted to echo every message that the client sent us, then we would want to return an iteratee whose input becomes the output of the enumerator that we return. Play doesn't come with anything out of the box to do this, but we will probably add something out of the box that does this in a future release. For now, I'm going to write a method called joined, that returns a joined iteratee/enumerator pair:

/**
 * Create a joined iteratee enumerator pair.
 *
 * When the enumerator is applied to an iteratee, the iteratee subsequently consumes whatever the iteratee in the pair
 * is applied to.  Consequently the enumerator is "one shot", applying it to subsequent iteratees will throw an
 * exception.
 */
def joined[A]: (Iteratee[A, Unit], Enumerator[A]) = {
  val promisedIteratee = Promise[Iteratee[A, Unit]]()
  val enumerator = new Enumerator[A] {
    def apply[B](i: Iteratee[A, B]) = {
      val doneIteratee = Promise[Iteratee[A, B]]()

      // Equivalent to map, but allows us to handle failures
      def wrap(delegate: Iteratee[A, B]): Iteratee[A, B] = new Iteratee[A, B] {
        def fold[C](folder: (Step[A, B]) => Future[C]) = {
          val toReturn = delegate.fold {
            case done @ Step.Done(a, in) => {
              doneIteratee.success(done.it)
              folder(done)
            }
            case Step.Cont(k) => {
              folder(Step.Cont(k.andThen(wrap)))
            }
            case err => folder(err)
          }
          toReturn.onFailure {
            case e => doneIteratee.failure(e)
          }
          toReturn
        }
      }

      if (promisedIteratee.trySuccess(wrap(i).map(_ => ()))) {
        doneIteratee.future
      } else {
        throw new IllegalStateException("Joined enumerator may only be applied once")
      }
    }
  }
  (Iteratee.flatten(promisedIteratee.future), enumerator)
}

This code might be a little scary if you don't understand iteratees, but as I said we will probably add this to Play itself in future. The rest of the code in this blog post will be simple.

Now that we have our joined iteratee/enumerator, let's implement an echo WebSocket. For the rest of this post we'll be assuming that all our WebSockets are sending/receiving JSON messages.

def echo = WebSocket.using[JsValue] { req =>
  joined[JsValue]
}

So now we have an echo call/response WebSocket. But this is not very useful, we want to do something with the incoming messages, and producing new outgoing messages as responses.

Processing messages

So now that we've expressed our call/response in terms of a joined iteratee/enumerator, how can we transform the call messages to be different response messages? The answer is enumeratees. Enumeratees can be used to transform iteratees and enumerators. We return both an enumerator and an iteratee, so which one do we transform? The answer is it doesn't matter, I'm going to use it to transform the iteratee. The enumeratee that we're going to use is the map enumeratee:

def process = WebSocket.using[JsValue] { req =>
  val (iter, enum) = joined[JsValue]

  (Enumeratee.map[JsValue] { json =>
    Json.obj(
      "status" -> "received",
      "msg" -> json
    )
  } &> iter, enum)
}

Enumeratees are one of the most powerful features of iteratees for end users. You could use any enumeratee here, but let's look at some examples of other common use cases.

What if we don't want to return a response to every message? There are numerous ways to do this, but the simplest is to use the collect enumeratee, which takes a partial function:

def process = WebSocket.using[JsValue] { req =>
  val (iter, enum) = joined[JsValue]

  (Enumeratee.collect[JsValue] { 
    case json if (json \ "foo").asOpt[JsValue].isDefined =>
      Json.obj(
        "status" -> "received",
        "msg" -> json
      )
  } &> iter, enum)
}

Perhaps we want to produce many responses for a single input. The mapConcat enumeratee can be used in this case, with our map function returning a sequence of JsValue messages to return:

def process = WebSocket.using[JsValue] { req =>
  val (iter, enum) = joined[JsValue]

  (Enumeratee.mapConcat[JsValue] { json =>
    Seq(
      Json.obj(
        "status" -> "received",
        "msg" -> json
      ),
      Json.obj("foo" -> "bar")
    )
  } &> iter, enum)
}

What if we want to do some blocking operations? In Play 2.2, this will be able to be done simply by providing an execution context suitable for blocking calls to whichever enumeratee you decide to use, but Play 2.1 does not yet support this, so we have to dispatch the callback to another execution context ourselves. This can be done using the mapM enumeratee:

val ec: ExecutionContext = ...

def process = WebSocket.using[JsValue] { req =>
  val (iter, enum) = joined[JsValue]

  (Enumeratee.mapM[JsValue] { json =>
    Future {
      // Some expensive computation, eg a database call, that returns JsValue
    }(ec)
  } &> iter, enum)
}

Pushing from an external enumerator

You may want to combine your call/response messages with messages from some other enumerator that spontaneously pushes messages to the client, for example a broadcasting enumerator for all clients. This can be done by interleaving your joined enumerator with the external enumerator:

val globalEvents: Enumerator[JsValue] = ...

def process = WebSocket.using[JsValue] { req =>
  val (iter, enum) = joined[JsValue]

  (Enumeratee.map[JsValue] { json =>
    ...
  } &> iter, Enumerator.interleave(enum, globalEvents))
}

Conclusion

Using WebSockets in a call response style may be something that your application needs. If so, using enumeratees to map the stream of messages coming in to messages going out is the most natural and idiomatic way of doing this in Play. It allows you to call on the large number of composable enumeratees that Play provides out of the box, and makes your code simple and easy to reason about.

What is wrong with Monads?

Nothing is wrong with Monads. What is wrong is how people communicate them.

There are people out there that learn abstract concepts first, and then work out how to apply them to solving their every day problems. Then there are people that learn how to solve their every day problems, and then they can understand how these solutions can be generalised into abstract concepts for solving problems. I am firmly in the second camp, explain an abstract concept to me that I've never come across before, and it'll go in one ear and out the other. Explain a solution to a problem that I have at hand, and I'll understand. Tell me that the solution can be generalised into an abstract concept, and you won't need to tell me anything more about that abstract concept, I'll understand it immediately.

I envy the people that can understand the abstract concepts first before applying them to real problems. I really wish I had that ability. But I don't. And I don't think I'm alone. I suspect many, maybe even most developers learn the same way that I do.

So this is the problem with monads, the problem is too abstract. I tried to read up about monads quite a number of times, and I always got stuck. I even read about "burritos" and "semicolons", these simplifications just made me more confused. None of the stuff I was reading made any sense to me.

Then one day I started using Play 2. Play 2 is an asynchronous web framework. It heavily uses futures as a means to implement asynchronous code. I had been writing some asynchronous code for a while before I had started using Play 2, and I knew how painful it could be. When I saw futures in Play, it solved a problem that I had perfectly. The ability to return a result that could then be mapped/flatMapped was exactly what I needed.

I was so excited about this new discovery that I prepared and gave a presentation on Play's futures at the Berlin Play User Group. I put forward a real world problem, and explained how Future, map and flatMap could solve it. I got great feedback from everyone at the user group, it really helped a lot of people to understand asynchronous programming and futures. At that point, I still had no idea what a monad was, yet somehow I was teaching others about monads.

A while later I was at the Berlin Play User Group again, and a conversation about monads came up. I admitted to everyone there that I didn't know what a monad was. Someone looked at me funny, and said "it's basically just flatMap". Suddenly it all made sense to me. I went back and read the same papers and blog posts that I had tried to read before, and they all made perfect sense, I now had no problems understanding monads.

So why didn't I understand monads before that? What was missing was an explanation that was given within the context of a concrete problem that I could relate to. Why was it missing? Because monads are simply too abstract. Each specific different monad solves a different concrete problem. The option monad solves a problem involving values that might not exist. The future monad solves a problem involving values that don't exist yet. The sequence monad solves a problem involving a sequence of values. And so on. To explain the general concept of monads, you need to abstract it away from any of the specific concrete problems that each monad solves, into a general abstract problem. And so if, to explain monads, you start with monads, you are at an abstraction level that is twice removed from real world problems. That means if you try to explain them starting with the abstract concepts to someone such as myself, at best you'll just make my head explode.

This is why I say, if you want to explain monads to a developer who is anything like me, don't start with monads. Don't even use the word monad, the moment you do that you are throwing down a concrete wall in front of them 30 foot high that they will not be able to climb. Start with a real world problem - asynchronous programming with futures is an excellent one, and talk about how to solve that. Then maybe later talk about monads. But by that stage, does it matter if they know what a monad is? They've learnt the concepts already, probably well enough to give a presentation at a user group about them.

About

Hi! My name is James Roper, and I am a software developer with a particular interest in open source development and trying new things. I program in Scala, Java, Go, PHP, Python and Javascript, and I work for Lightbend as the architect of Kalix. I also have a full life outside the world of IT, enjoy playing a variety of musical instruments and sports, and currently I live in Canberra.