all that jazz

james' blog about scala and all that jazz

Iteratees for imperative programmers

When I first heard the word iteratee, I thought it was a joke. Turns out, it wasn't a joke, in fact there are also enumerators (that's ok) and enumeratees (you're killing me). If you're an imperative programmer, or rather a programmer who feels more comfortable writing imperative code than functional code, then you may be a little overwhelmed by all the introductions to iteratees out there, because they all assume that you think from a functional perspective. Well I just learnt iteratees, and although I'm feeling more and more comfortable with functional programming every day, I still think like an imperative programmer at heart. This made learning iteratees very difficult for me. So while I'm still in the imperative mindset, I thought this a very good opportunity to explain iteratees from an imperative programmers perspective, taking no functional knowledge for granted. If you're an imperative programmer who wants to learn iteratees, this is the blog post for you. I'm going to specifically be looking at Play's Iteratee API, but the concepts learnt here will apply to all Iteratees in general.

So let's start off with explaining what iteratees, and their counterparts, are trying to achieve. An iteratee is a method of reactively handling streams of data that is very easily composable. By reactive, I mean non blocking, ie you react to data being available to read, and react to the opportunity to write data. By composable, I mean you write simple iteratees that do one small thing well, then you use those as the building blocks to write iteratees that do bigger things, and you use those as the building blocks to write iteratees to do even bigger things, and so on. At each stage, everything is simple and easy to reason about.

Reactive stream handling

If you're looking for information about iteratees, then I'm guessing you already know a bit about what reactive stream handling is. Let's contrast it to synchronous IO code:

trait InputStream {
  def read(): Byte
}

So this should be very familiar, if you want to read a byte, you call read. If no byte is currently available to be read, that call will block, and your thread will wait until a byte is available. With reactive streams, obviously it's the other way around, you pass a callback to the stream you want to receive data from, and it will call that when it's ready to give data to you. So typically you might implement a trait that looks like this:

trait InputStreamHandler {
  def onByte(byte: Byte)
}

So before we go on, let's look at how the same thing would be achieved in a pure functional world. At this point I don't want you to ask why we want to do things this way, you will see that later on, but if you know anything about functional programming, you know that everything tends to be immutable, and functions have no side effects. The trait above has to have side effects, because unless you are ignoring the bytes passed to onByte, you must be changing your state (or something elses state) somehow in that function. So, how do we handle data without changing our state? The answer is the same way other immutable data structures work, we return a copy of ourselves, updated with the new state. So if the InputStreamHandler were to be functional, it might look like this:

trait InputStreamHandler {
  def onByte(byte: Byte): InputStreamHandler
}

And an example implementation of one, that reads input into a seq, might look like this:

class Consume(data: Seq[Byte]) extends InputStreamHandler {
  def onByte(byte: Byte) = new Consume(data :+ byte)
}

So we now have imperative and functional traits that react to our input stream, and you might be thinking this is all there is to reactive streams. If that's the case, you're wrong. What if we're not ready to handle data when the onByte method is called? If we're building structures in memory this will never be the case, but if for example we're storing them to a file or to a database as we receive the data, then this very likely will be the case. So reactive streams are two way, it's not just you, the stream consumer that is reacting to input, the stream producer must react to you being ready for input.

Now this is possible to implement in an imperative world, though things do start looking much more functional. We simply start using futures:

trait InputStreamHandler {
  def onByte(byte: Byte): Future[Unit]
}

So, when the stream we are consuming has a byte for us, it calls onByte, and then attaches a callback to the future we return, to pass the next byte, when it's ready. If you have a look at Netty's asynchronous channel APIs, you'll see it uses exactly this pattern. We can also implement something similar for an immutable functional API:

trait InputStreamHandler {
  def onByte(byte: Byte): Future[InputStreamHandler]
}

And so here we have a functional solution for reactive stream handling. But it's not a very good one, for a start, there's no way for the handlers to communicate to the code that uses them that they don't want to receive any more input, or if they've encountered an error (exceptions are frowned upon in functional programming). We could add things to handle this, but very soon our interface would become quite complex, hard to break up into small pieces that can be composed, etc. I'm not going to justify this now, I think you'll see it later when I show you just how easy iteratees are to compose.

So, by this stage I hope you have understood two important points. Firstly, reactive stream handling means twofold reacting, both your code has to react to the stream being ready, and the stream has to react to you being ready. Secondly, when I say that we want a functional solution, I mean a solution where everything is immutable, and that is achieved by our stream handlers producing copies of themselves each time they receive/send data. If you've understood those two important points, then now we can move on to introducing iteratees.

Iteratees

There are a few things that our interface hasn't yet addressed. The first is, how does the stream communicate to us that it is finished, that is, that it has no more data for us? To do this, instead of passing in a byte directly, we're going to abstract our byte to be something of type Input[Byte], and that type can have three possible implementations, EOF, an element, or empty. Let's not worry about why we need empty just yet, but assume for some reason we might want to pass empty. So this is what Input looks like:

sealed trait Input[+E]

object Input {
  case object EOF extends Input[Nothing]
  case object Empty extends Input[Nothing]
  case class El[+E](e: E) extends Input[E]
}

Updating our InputStreamHandler, we now get something that looks like this:

trait InputStreamHandler[E] {
  def onInput(in: Input[E]): Future[InputStreamHandler[E]]
}

Now updating our Consumer from before to handle this, it might look like this:

class Consume(data: IndexedSeq[Byte]) extends InputStreamHandler[Byte] {
  def onInput(in: Input[Byte]) = in match {
    case El(byte) => Future.successful(new Consume(data :+ byte))
    case _ => Future.successful(this)
  }
}

You can see that when we get EOF or Empty, there's nothing for us to do to change our state, so we just return ourselves again. If we were writing to another stream, we might, when we receive EOF, close that stream (or rather, send it an EOF).

The next thing we're going to do is make it easier for our handler to consume input immediately without having to create a future. To do this, rather than passing the byte directly, we'll pass a function, that takes a function as a parameter, and that function will take the byte as a parameter. So, our handler, when it's ready, will create a function to handle the byte, and then invoke the function that was passed to it, with that function. We'll call the first function the cont function, which is short for continue, and means when you're ready to continue receiving input invoke me. Too many functions? Let's look at the code:

trait InputStreamHandler[E] {
  def onByte[B](cont: (Input[E] => InputStreamHandler[E]) => Future[B]): Future[B]
}

Now where did this Future[B] come from? B is just the mechanism that the stream uses to pass state back to itself. As the handler, we don't have to worry about what it is, we just have to make sure that we eventually invoke the cont function, and eventually make sure that the B it returns makes it back to our caller. And what does this look like in our Consume iteratee? Let's have a look:

class Consume(data: IndexedSeq[Byte]) extends InputStreamHandler {
  def onByte(cont: (Input[Byte] => InputStreamHandler) => Future[B]) = cont {
    case Input.El(byte) => new Consume(data :+ byte)
    case _ => this
  }
}

You can see in our simple case of being ready to handle input immediately, we just immediately invoke cont, we no longer need to worry about creating futures. If we want to handle the input asynchronously, it is a little more complex, but we'll take a look at that later.

Now we have one final step in producing our iteratee API. How does the handler communicate back to the stream that it is finished receiving data? There could be two reasons for this, one is that it's finished receiving data. For example, if our handler is a JSON parser, it might have reached the end of the object it was parsing, and so doesn't want to receive anymore. The other reason is that it's encountered an error, for a JSON parser, this might be a syntax error, or if it's sending data through to another stream, it might be an IO error on that stream.

To allow our iteratee to communicate with the stream, we're going to create a trait that represents its state. We'll call this trait Step, and the three states that the iteratee can be in will be Cont, Done and Error. Our Cont state is going to contain our Input[Byte] => InputStreamHandler function, so that the stream can invoke it. Our Done state will contain our result (in the case of Consume, a Seq[Byte]) and our Error state will contain an error message.

In addition to this, both our Done and Error states need to contain the left over input that they didn't consume. This will be important for when we are composing iteratees together, so that once one iteratee has finished consuming input from a stream, the next can pick up where the first left off. This is one reason why we need Input.Empty, because if we did consume all the input, then we need some way to indicate that.

So, here's our Step trait:

sealed trait Step[E, +A]

object Step {
  case class Done[+A, E](a: A, remaining: Input[E]) extends Step[E, A]
  case class Cont[E, +A](k: Input[E] => InputStreamHandler[E, A]) extends Step[E, A]
  case class Error[E](msg: String, input: Input[E]) extends Step[E, Nothing]
}

The type parameter E is the type of input our iteratee wants to accept, and A is what it's producing. So our handler trait now looks like this:

trait InputStreamHandler[E, A] {
  def onInput[B](step: Step[E, A] => Future[B]): Future[B]
}

And our consumer is implemented like this:

class Consume(data: Seq[Byte]) extends InputStreamHandler[Byte, Seq[Byte]] {
  def onInput(step: Step[Byte, Seq[Byte]] => Future[B]) = step(Step.Cont({
    case Input.El(byte) => new Consume(data :+ byte)
    case Input.EOF => new InputStreamHandler[Byte, Seq[Byte]] {
      def onInput(cont: Step[Byte, Seq[Byte]] => Future[B]) = step(Step.Done(data, Input.Empty))
    }       
    case Input.Empty => this
  }))
}

One big difference here that you now notice is when we receive EOF, we actually pass Done into the step function, to say we are done consuming the input.

And so now we've built our iteratee interface. Our naming isn't quite right though, so we'll rename the trait obviously to Iteratee, and we'll rename onInput to fold, since we are folding our state into one result. And so now we get our interface:

trait Iteratee[E, +A] {
  def fold[B](folder: Step[E, A] => Future[B]): Future[B]
}

Iteratees in practice

So far we've started with the requirements of a traditional imperative input stream, and described what an iteratee is in constrast to that. But looking at the above code, you might think that using them is really difficult. They seem like they are far more complex than they need to be, at least conceptually, to implement reactive streams. Well, it turns out that although so far we've shown the basics of the iteratee interface, there is a lot more that a full iteratee API has to offer, and once we start understanding this, and using it, you will start to see how powerful, simple and useful iteratees are.

So remember how iteratees are immutable? And remember how iteratees can be in one of three states, cont, done and error, and depending on which state it's in, it will pass its corresponding step class to the folder function? Well, if an iteratee is immutable and it can be in one of three states, then it can only ever be in that state that it's in, and therefore it will only ever pass that corresponding step to the folder function. If an iteratee is done, it's done, it doesn't matter how many times you call its fold function, it will never become cont or error, and its done value will never change, it will only ever pass the Done step to the folder function with the same A value and the same left over input. Because of this, there is only one implementation of a done iteratee that we'll ever need, it looks like this:

case class Done[E, A](a: A, e: Input[E] = Input.Empty) extends Iteratee[E, A] {
  def fold[B](folder: Step[E, A] => Future[B]): Future[B] = folder(Step.Done(a, e))
}

This is the only done iteratee you'll ever need to indicate that you're done. In the Consume iteratee above, when we reached EOF, we created a done iteratee using an anonymous inner class, we didn't need to do this, we could have just used the Done iteratee above. The exact same thing holds for error iteratees:

case class Error[E](msg: String, e: Input[E]) extends Iteratee[E, Nothing] {
  def fold[B](folder: Step[E, Nothing] => Future[B]): Future[B] = folder(Step.Error(msg, e))
}

You may be surprised to find out the exact same thing applies to cont iteratees too - a cont iteratee just passes a function the folder, and that function, because the iteratee is immutable, is never going to change. So consequently, the following iteratee will usually be good enough for your requirements:

case class Cont[E, A](k: Input[E] => Iteratee[E, A]) extends Iteratee[E, A] {
  def fold[B](folder: Step[E, A] => Future[B]): Future[B] = folder(Step.Cont(k))
}

So let's rewrite our consume iteratee to use these helper classes:

def consume(data: Array[Byte]): Iteratee[Byte, Array[Byte]] = Cont {
  case Input.El(byte) => consume(data :+ byte)
  case Input.EOF => Done(data)
  case Input.Empty => consume(data)
}

A CSV parser

Now we're looking a lot simpler, our code is focussed on just handling the different types of input we could receive, and returning the correct result. So let's start writing some different iteratees. In fact, let's write an iteratee that parses a CSV file from a stream of characters. Our CSV parser will support optionally quoting fields, and escaping quotes with a double quote.

Our first step will be to write the building blocks of our parser. First up, we want to write something that skips some kinds of white space. So let's write a general purpose drop while iteratee:

def dropWhile(p: Char => Boolean): Iteratee[Char, Unit] = Cont {
  case in @ Input.El(char) if !p(char) => Done(Unit, in)
  case in @ Input.EOF => Done(Unit, in)
  case _ => dropWhile(p)
}

Since we're just dropping input, our result is actually Unit. We return Done if the predicate doesn't match the current char, or if we reach EOF, and otherwise, we return ourselves again. Note that when we are done, we include the input that was passed into us as the remaining data, because this is going to be needed to be consumed by the next iteratee. Using this iteratee we can now write an iteratee that drops white space:

def dropSpaces = dropWhile(c => c == ' ' || c == '\t' || c == '\r')

Next up, we're going to write a take while iteratee, it's going to be a mixture between our earlier consume iteratee, carrying state between each invocation, and the drop while iteratee:

def takeWhile(p: Char => Boolean, data: Seq[Char] = IndexedSeq[Char]()): Iteratee[Char, Seq[Char]] = Cont {
  case in @ Input.El(char) => if (p(char)) {
    takeWhile(p, data :+ char)
  } else {
    Done(data, in)
  }
  case in @ Input.EOF => Done(data, in)
  case _ => takeWhile(p, data)
}

We also want to write a peek iteratee, that looks at what the next input is, without actually consuming it:

def peek: Iteratee[Char, Option[Char]] = Cont {
  case in @ Input.El(char) => Done(Some(char), in)
  case in @ Input.EOF => Done(None, in)
  case Input.Empty => peek
}

Note that our peek iteratee must return an option, since if it encounters EOF, it can't return anything.

And finally, we want a take one iteratee:

def takeOne: Iteratee[Char, Option[Char]] = Cont {
  case in @ Input.El(char) => Done(Some(char))
  case in @ Input.EOF => Done(None, in)
  case Input.Empty => takeOne
}

Using the take one iteratee, we'll build an expect iteratee, that mandates that a certain character must appear next otherwise it throws an error:

def expect(char: Char): Iteratee[Char, Unit] = takeOne.flatMap {
  case Some(c) if c == char => Done(Unit)
  case Some(c) => Error("Expected " + char + " but got " + c, Input.El(c))
  case None => Error("Premature end of input, expected: " + char, Input.EOF)
}

Notice the use of flatMap here. If you haven't come across it before, in the asynchronous world, flatMap basically means "and then". It applies a function to the result of the iteratee, and returns a new iteratee. In our case we're using it to convert the result to either a done iteratee, or an error iteratee, depending on whether the result is what we expected. flatMap is one of the fundamental mechanisms that we'll be using to compose our iteratees together.

Now with our building blocks, we are ready to start building our CSV parser. The first part of it that we'll write is an unquoted value parser. This is very simple, we just want to take all characters that aren't a comma or new line, with one catch. We want the result to be a String, not a Seq[Char] like takeWhile produces. Let's see how we do that:

def unquoted = takeWhile(c => c != ',' && c != '\n').map(v => v.mkString.trim)

As you can see, we've used the map function to transform the end result from a sequence of characters into a String. This is another key method on iteratees that you will find useful.

Our next task is to parse a quoted value. Let's start with an implementation that doesn't take into account escaped quotes. To parse a quoted value, we need to expect a quote, and then we need to take any value that is not a quote, and then we need to expect a quote. Notice that during that sentence I said "and then" 2 times? What method can we use to do an "and then"? That's right, the flatMap method that I talked about before. Let's see what our quoted value parser looks like:

def quoted = expect('"')
  .flatMap(_ => takeWhile(_ != '"'))
  .flatMap(value => expect('"')
    .map(_ => value.mkString))

So now you can probably start to see the usefulness of flatMap. In fact it is so useful, not just for iteratees, but many other things, that Scala has a special syntax for it, called for comprehensions. Let's rewrite the above iteratee using that:

def quoted = for {
  _     <- expect('"')
  value <- takeWhile(_ != '"')
  _     <- expect('"')
} yield value.mkString

Now at this point I hope you are getting excited. What does the above code look like? It looks like ordinary imperative synchronous code. Read this value, then read this value, then read this value. Except it's not synchronous, and it's not imperative. It's functional and asynchronous. We've taken our building blocks, and composed them into a piece of very readable code that makes it completely clear exactly what we are doing.

Now in case you're not 100% sure about the above syntax, the values to the left of the <- signs are the results of the iteratees to the right. These are able to be used anywhere in any subsequent lines, including in the end yield statement. Underscores are used to say we're not interested in the value, we're using this for the expect iteratee since that just returns Unit anyway. The statement after the yield is a map function, which gives us the opportunity to take all the intermediate values and turn them into a single result.

So now that we understand that, let's rewrite our quoted iteratee to support escaped quotes. After reading our quote, we want to peek at the next character. If it's a quote, then we want to append the value we just read, plus a quote to our cumulated value, and recursively invoke the quoted iteratee again. Otherwise, we've reached the end of the value.

def quoted(value: Seq[Char] = IndexedSeq[Char]()): Iteratee[Char, String] = for {
  _          <- expect('"')
  maybeValue <- takeWhile(_ != '"')
  _          <- expect('"')
  nextChar   <- peek
  value      <- nextChar match {
    case Some('"') => quoted(value ++ maybeValue :+ '"')
    case _ => Done[Char, String]((value ++ maybeValue).mkString)
  }
} yield value

Now we need to write an iteratee that can parse either a quoted or unquoted value. We choose which one by peeking at the first character, and then accordingly returning the right iteratee.

def value = for {
  char  <- peek
  value <- char match {
    case Some('"') => quoted()
    case None => Error[Char]("Premature end of input, expected a value", Input.EOF)
    case _ => unquoted
  }
} yield value

Let's now parse an entire line, reading until the end of line character.

def values(state: Seq[String] = IndexedSeq[String]()): Iteratee[Char, Seq[String]] = for {
  _        <- dropSpaces
  value    <- value
  _        <- dropSpaces
  nextChar <- takeOne
  values   <- nextChar match {
    case Some('\n') | None => Done[Char, Seq[String]](state :+ value)
    case Some(',') => values(state :+ value)
    case Some(other) => Error("Expected comma, newline or EOF, but found " + other, Input.El(other))
  }
} yield values

Enumeratees

Now, in a similar way to how we parse the values, we could also parse each line of a CSV file until we reach EOF. But this time we're going to do something a little different. We've seen how we can sequence iteratees using flatMap, but there are further possibilities for composing iteratees. Another concept in iteratees is enumeratees. Enumeratees adapt a stream to be consumed by an iteratee. The simplest enumeratees simply map the input values of the stream to be something else. So, for example, here's an enumeratee that converts a stream of strings to a stream of ints:

def toInt: Enumeratee[String,Int] = Enumeratee.map[String](_.toInt)

One of the methods on Enumeratee is transform. We can use this method to apply an enumeratee to an iteratee:

val someIteratee: Iteratee[Int, X] = ...
val adaptedIteratee: Iteratee[String, X] = toInt.transform(someIteratee)

This method is also aliased to an operator, &>>, and so this code below is equivalent to the code above:

val adaptedIteratee: Iteratee[String, X] = toInt &>> someIteratee

We can also make an enumeratee out of another iteratee, and this is exactly what we're going to do with our values iteratee. The Enumeratee.grouped method takes an iteratee and applies it to the stream over and over, the result of each application being an input to feed into the the iteratee that will be transformed. Let's have a look:

def csv = Enumeratee.grouped(values())

Now let's get a little bit more creative with enumeratees. Let's say that our CSV file is very big, so we don't want to load it into memory. Each line is a series of 3 integer columns, and we want to sum each column. So, let's define an enumeratee that converts each set of values to integers:

def toInts = Enumeratee.map[Seq[String]](_.map(_.toInt))

And another enumeratee to convert the sequence to a 3-tuple:

def toThreeTuple = Enumeratee.map[Seq[Int]](s => (s(0), s(1), s(2)))

And finally an iteratee to sum the them:

def sumThreeTuple(a: Int = 0, b: Int = 0, c: Int = 0): Iteratee[(Int, Int, Int), (Int, Int, Int)] = Cont {
  case Input.El((x, y, z)) => sumThreeTuple(a + x, b + y, c + z)
  case Input.Empty => sumThreeTuple(a, b, c)
  case in @ Input.EOF => Done((a, b, c), in)
}

Now to put them all together. There is another method on enumeratee called compose, which, you guessed it, let's you compose enumeratees. This has an alias operator, ><>. Let's use it:

val processCsvFile = csv ><> toInts ><> toThreeTuple &>> sumThreeTuple()

Enumerators

Finally, if an iteratee consumes a stream, what produces a stream? The answer is an enumerator. An enumerator can be applied to an iteratee using its apply method, which is also aliased to >>>. This will leave the iteratee in a cont state, ready to receive more input. If however the enumerator contains the entirety of the stream, then the run method can be used instead which will send the iteratee an EOF once it's finished. This is aliased to |>>>.

The Play enumerator API makes it easy to create an enumerator by passing a sequence of inputs to the Enumerator companion objects apply method. So, we can create an enumerator of characters using the following code:

val csvFile = Enumerator(
  """1,2,3
    |4,5,6""".stripMargin.toCharArray:_*)

And we can feed this into our iteratee like so:

val result = csvFile |>>> processCsvFile

And our result in this case will be a future that is eventually redeemed with (5, 7, 9).

Conclusion

Well, it's been a long journey, but hopefully if you're an imperative programmer, you not only understand iteratees, you understand the reasoning behind their design, and how easily they compose. I also hope you have a better understanding of both functional and asynchronous programming in general. The functional mindset is quite different to the imperative mindset, and I'm still getting my head around it, but particularly after seeing how nice and simple iteratees can be to work with (once you understand them), I'm becoming convinced that functional programming is the way to go.

If you are interested in downloading the code from this blog post, or if you want to see a more complex JSON parsing iteratee/enumeratee, checkout this GitHub project, which has a few examples, including parsing byte/character streams in array chunks, rather than one at a time.

Scaling Scala vs Java

In my previous post I showed how it makes no sense to benchmark Scala against Java, and concluded by saying that when it comes to performance, the question you should be asking is "How will Scala help me when my servers are falling over from unanticipated load?" In this post I will seek to answer that, and show that indeed Scala is a far better language for building scalable systems than Java.

However, don't expect our journey to get there to be easy. For a start, while it's very easy to do micro benchmarks, trying to show how real world apps do or don't handle the loads that are put on them is very hard, because it's very hard to create an app that's small enough to demo and explain in a single blog post that is at the same time big enough to actually show how real world apps behave under load, and it's also very hard to simulate real world loads. So I am going to take one small aspect of something that might go wrong in the real world, and show just one way in which Scala will help you, where Java won't. Then I will explain that this is just the tip of the iceberg, there are far more situations, and far more features of Scala that will help you in the real world.

An online store

For this exercise I have implemented an online store. The architecture of this store is in the diagram below:

As you can see, there is a payment service and a search service that the store talks to, and the store handles three types of requests, one for the index page that doesn't require going to any other services, one for making payments that uses the payments service, and another for searching the stores product list which uses the search service. The online store is the part of the system that I am going to be benchmarking, I will implement one version in Java, and another in Scala, and compare them. The search and payment services won't change. Their actual implementations will be simple JSON APIs that return hard coded values, but they will each simulate a processing time of 20ms.

For the Java implementation of the store, I am going to keep it as simple as possible, using straight servlets to handle requests, Apache Commons HTTP client for making requests, and Jackson for JSON parsing and formatting. I will deploy the application to Tomcat, and configure Tomcat with the NIO connector, using the default connection limit of 10000 and thread pool size of 200.

For the Scala implementation I will use Play Framework 2.1, using the Play WS API which is backed by the Ning HTTP client to make requests, and the Play JSON API which is backed by Jackson to handle JSON parsing and formatting. Play Framework is built using Netty which has no connection limit, and uses Akka for thread pooling, and I have it configured to use the default thread pool size, which is one thread per CPU, and my machine has 4.

The benchmark I will be performing will be using JMeter. For each request type (index, payments and search) I will have 300 threads spinning in a loop making requests with a random 500-1500ms pause in between each request. This gives an average maximum throughput of 300 requests per second per request type, or 900 requests per second all up.

So, let's have a look at the result of the Java benchmark:

On this graph I have plotted 3 metrics per request type. The median is the median request time. For the index page, this is next to nothing, for the search and payments requests, this is about 77ms. I have also plotted the 90% line, which is a common metric in web applications, it shows what 90% of the requests were under, and so gives a good idea of what the slow requests are like. This shows again almost nothing for the index page, and 116ms for the search and payments requests. The final metric is the throughput, which shows number of requests per second that were handled. We are not too far off the theoretical maximum, with the index showing 290 requests per second, and the search and payments requests coming through at about 270 requests per second. These results are good, our Java service handles the load we are throwing at it without a sweat.

Now let's take a look at the Scala benchmark:

As you can see, it's identical to the Java results. This is not surprising, since both the Java and the Scala implementations of the online store are doing absolutely minimal work code wise, most of the processing time is going in to making requests on the remote services.

Something goes wrong

So, we've seen two happy implementations of the same thing in Scala and Java, shrugging off the load I give them. But what happens when things aren't so fine and dandy? What happens if one of the services that they are talking to goes down? Let's say the search service starts taking 30 seconds to respond, after which point it returns an error. This is not an unusual failure situation, particularly if you're load balancing through a proxy, the proxy tries to connect to the service, and fails after 30 seconds, giving you a gateway error. Let's see how our applications handle the load I throw at them now. We would expect the search request to take at least 30 seconds to respond, but what about the others? Here's the Java results:

Well, we no longer have a happy app at all. The search requests are naturally taking a long time, but the payments service is now taking an average of 9 seconds to respond, the 90% line is at 20 seconds. Not only that, but the index page is similarly impacted - users are not going to be waiting that long if they've browsed into your site for the home page to show up. And the throughput of each has gone down to 30 requests per second. This is not good, because your search service went down, your whole site is now practically unusable, and you will soon start losing customers and money.

So how does our Scala app fair? Let's find out:

Now before I say anything else, let me point out that I've bounded the response time to 160ms - the search requests are actually taking about 30 seconds to respond, but on the graph, with 30 seconds next to the other values, they hardly register a line a pixel high. So what we can see here is that while search is unusable, our payments and index request response times and throughput are unchanged. Obviously, customers aren't going to be happy with not being able to do searches, but at least they can still use other parts of your site, see your home page with specials, and even still make payments for items. And hey, Google isn't down, they can always use Google to search your site. So you might lose some business, but the impact is limited.

So, in this benchmark, we can see that Scala wins hands down. When things start to go wrong, a Scala application will take it in it's stride, giving you the best it can, while a Java application will likely just fall over.

But I can do that in Java

Now starts the bit where I counter the many anticipated criticisms that people will make of this benchmark. And the first, and most obvious one, is that in my Scala solution I used asynchronous IO, whereas in my Java solution I didn't, so they can't be compared. It is true, I could have implemented an asynchronous solution in Java, and in that case the Java results would have been identical to the Scala results. However, while I could have done that, Java developers don't do that. It's not that they can't, it's that they don't. I have written a lot of webapps in Java that make calls to other systems, and very rarely, and only in very special circumstances, have I ever used asynchronous IO. And let me show you why.

Let's say you have to do a series of calls on a series of remote services, each one depending on data returned from the previous. Here's a good old fashioned synchronous solution in Java:

User user = getUserById(id);
List<Order> orders = getOrdersForUser(user.email);
List<Product> products = getProductsForOrders(orders);
List<Stock> stock = getStockForProducts(products);

The above code is simple, easy to read, and feels completely natural for a Java developer to write. For completeness, let's have a look at the same thing in Scala:

val user = getUserById(id)
val orders = getOrdersForUser(user.email)
val products = getProductsForOrders(orders)
val stock = getStockForProducts(products)

Now, let's have a look at the same code, but this time assuming we are making asynchronous calls and returning the results in promises. What does it look like in Java?

Promise<User> user = getUserById(id);
Promise<List<Order>> orders = user.flatMap(new Function<User, List<Order>>() {
  public Promise<List<Order>> apply(User user) {
    return getOrdersForUser(user.email);
  }
}
Promise<List<Product>> products = orders.flatMap(new Function<List<Order>, List<Product>>() {
  public Promise<List<Product>> apply(List<Order> orders) {
    return getProductsForOrders(orders);
  }
}
Promise<List<Stock>> stock = products.flatMap(new Function<List<Product>, List<Stock>>() {
  public Promise<List<Stock>> apply(List<Product> products) {
    return getStockForProducts(products);
  }
}

So firstly, the above code is not readable, in fact it's much harder to follow, there is a massively high noise level to actual code that does stuff, and hence it's very easy to make mistakes and miss things. Secondly, it's tedious to write, no developer wants to write code that looks like that, I hate doing it. Any developer that wants to write their whole app like that is insane. And finally, it just doesn't feel natural, it's not the way you do things in Java, it's not idiomatic, it doesn't play well with the rest of the Java ecosystem, third party libraries don't integrate well with this style. As I said before, Java developers can write code that does this, but they don't, and as you can see, they don't for good reason.

So let's take a look at the asynchronous solution in Scala:

for {
  user <- getUserById(id)
  orders <- getOrdersForUser(user.email)
  products <- getProductsForOrders(orders)
  stock <- getStockForProducts(products)
} yield stock

In contrast to the Java asynchronous solution, this solution is completely readable, just as readable as the Scala and Java synchronous solutions. And this isn't just some weird Scala feature that most Scala developers never touch, this is how a typical Scala developer writes code every day. Scala libraries are designed to work using these idioms, it feels natural, the language is working with you. It's fun to write code like this in Scala!

This post is not about how with one language you can write a highly tuned app for performance that's faster than the same app written in another language highly tuned for performance. This post is about how Scala helps you write applications that are scalable by default, using natural, readable and idiomatic code. Just like a ball in lawn bowls has a bias, Scala has a bias to helping you write scalable applications, where Java makes you swim upstream.

But scaling means so much more than that

The example I've provided of Scala scaling well where Java doesn't is a very specific example, but then what situation where your app is failing under high load isn't? Let me give a few other examples of where Scala's much nicer asynchronous IO support helps you to write scalable code:

  • Using Akka, you can easily define actors for different types of requests, and allocate them different resource limits. So if certain parts of your single application start struggling or receiving unanticipated load, those parts may stop responding, but the rest of your app can stay healthy.
  • Scala, Play and Akka make handling single requests using multiple threads running in parallel doing different operations incredibly simple, allowing you to have requests that do a lot in very little time. Klout wrote an excellent article about how they did just that in their API.
  • Because asynchronous IO is so simple, offloading processing onto other machines can be safely done without tying up threads on the first machine.

Java 8 will make asynchronous IO simple in Java

Java 8 is probably going to include support for closures of some sort, which is great news for the Java world, especially if you want to do asynchronous IO. However, the syntax still won't be anywhere near is readable as the Scala code I showed above. And when will Java 8 be released? Java 7 was released last year, and it took 5 years to release that. Java 8 is scheduled for summer 2013, but even if it arrives on schedule, how long will it take for the ecosystem to catch up? And how long will it take for Java developers to switch from a synchronous to an asynchronous mindset? In my opinion, Java 8 is too little too late.

So this is all about asynchronous IO?

So far all I've talked about and shown is how easy Scala makes asynchronous IO, and how that helps you scale. But it doesn't stop there. Let me pick another feature of Scala, immutability.

When you start using multiple threads to process single requests, you start sharing state between those threads. And this is where things get very messy, because the world of shared state in a computer system is a crazy world where impossible things happen. It's a world of deadlocks, a world of updating memory in one thread, but another thread not seeing that change, a world of race conditions, and a world of performance bottle necks because you over eagerly marked some methods as synchronized.

However, it's not that bad, because there is a very simple solution, make all your state immutable. If all your state is immutable, then none of the above problems can happen. And this is again where Scala helps you big time, because in Scala, things are immutable by default. The collection APIs are immutable, you have to explicitly ask for a mutable collection in order to get mutable collections.

Now in Java, you can make things immutable. There are some libraries that help you (albeit clumsily) to work with immutable collections. But it's so easy to accidentally forget to make something mutable. The Java API and language itself don't make working with immutable structures easy, and if you're using a third party library, it's highly likely that it's not using immutable structures, and often requires you to use mutable structures, for example, JPA requires this.

Let's have a look at some code. Here is an immutable class in Scala:

case class User(id: Long, name: String, email: String)

That structure is immutable. Moreover, it automatically generates accessors for the properties. Let's look at the corresponding Java:

public class User {
  private final long id;
  private final String name;
  private final String email;

  public User(long id, String name, String email) {
    this.id = id;
    this.name = name;
    this.email = email;
  }

  public long getId() {
    return id;
  }

  public String getName() {
    return name;
  }

  public String getEmail() {
    return email
  }
}

That's an enormous amount of code! And what if I add a new property? I have to add a new parameter to my constructor which will break existing code, or I have to define a second constructor. In Scala I can just do this:

case class User(id: Long, name: String, email: String, company: Option[Company] = None)

All my existing code that calls that constructor will still work. And what about when this object grows to have 10 items in the constructor, constructing it becomes a nightmare! A solution to this in Java is to use the builder pattern, which more than doubles the amount of code you have to write for the object. In Scala, you can name the parameters, so it's easy to see which parameter is which, and they don't have to be in the right order. But maybe I might want to just modify one property. This can be done in Scala like this:

case class User(id: Long, name: String, email: String, company: Option[Company] = None) {
  def copy(id: Long = id, name: String = name, email: String = email, company: Option[Company] = company) = User(id, name, email, company)
}

val james = User(1, "James", "james@jazzy.id.au")
val jamesWithCompany = james.copy(company = Some(Company("Typesafe")))

The above code is natural, it's simple, it's readable, it's how Scala developers write code every day, and it's immutable. It is aptly suited to concurrent code, and allows you to safely write systems that scale. The same can be done in Java, but it's tedious, and not at all a joy to write. I am a big advocate of immutable code in Java, and I have written many immutable classes in Java, and it hurts, but it's the lesser of two hurts. In Scala, it takes more code to use mutable objects than to use immutable. Again, Scala is biased towards helping you scale.

Conclusion

I cannot possibly go into all the ways in which Scala helps you scale where Java doesn't. But I hope I have given you a taste of why Scala is on your side when it comes to writing Scalable systems. I've shown some concrete metrics, I've compared Java and Scala solutions for writing scalable code, and I've shown, not that Scala systems will always scale better than Java systems, but rather that Scala is the language that is on your side when writing scalable systems. It is biased towards scaling, it encourages practices that help you scale. Java, in contrast, makes it difficult for you to implement these practices, it works against you.

If you're interested in my code for the online store, you can find it in this GitHub repository. The numbers from my performance test can be found in this spreadsheet.

Passing common state to templates in Play Framework

This question comes up very frequently on the Play mailing list, so I thought I'd document a quick example of how to pass common state to shared templates in Play Framework when using Scala. The use case is typically that you have a common header, but certain parts of it are dynamic, requiring data to be loaded from the database.

First of all, since the state required for rendering a header can logically be grouped into one object, a header object, let's do that. This will mean we can easily add new types of data to the header without changing any of our existing code. So I'm going to define a header that contains a list of menu items, and the current user:

case class Header(menu: Seq[MenuItem], user: Option[String])
case class MenuItem(url: String, name: String)

Now the template that uses this is my main template, so I'm going to add my Header class to it as an implicit parameter. Implicit parameters are parameters that don't need to be explicitly passed when you invoke a method, instead, if an implicit value exists in the scope of the invocation, that will be used:

@(title: String)(content: Html)(implicit header: Header)

<html>
    <head>
        ...
    </head>
    <body>
        <div id="header">
            @header.user.map { user =>
                <div>User: @user</div>
            }
            <ul>
            @for(item <- header.menu) {
                <li><a href="@item.url">@item.name</a></li>
            }
            </ul>
        </div>
        @content
    </body>
</html>

Now to pass this implicit parameter, all I need to do is declare that each of my templates that use the main template also accept an implicit header parameter. So for example, in my index template:

@(message: String)(implicit header: Header)

@main("Welcome to Play 2.0") {
    @play20.welcome(message)
}

Now comes the magic, supplying this parameter. I will write an implicit method that generates it. When a parameterless method (implicit parameters don't count as parameters in this case) is declared as implicit, this allows it to be used to supply a value for an implicit parameter. Here's the my method:

trait ProvidesHeader {

  implicit def header[A](implicit request: Request[A]) : Header = {
    val menu = Seq(MenuItem("/home", "Home"),
      MenuItem("/about", "About"),
      MenuItem("/contact", "Contact"))
    val user = request.session.get("user")
    Header(menu, user)
  }
}

For now I've just hard coded the menu items, but you get the point. Since this method is implicit, and it returns something of type Header, then it can be used to supply the implicit Header parameter that our index template needs. You can also see that this method itself accepts an implicit parameter, the request. If your method doesn't need the request, then you can remove that, however make sure that you remove the parenthesis from the method, it will only work with parameterless methods, not zero argument methods.

So what now do I need to do to my actions? Almost nothing, I just need to make sure that my implicit header method is in scope, and that they declare the request they accept as an implicit parameter, so for example:

object Application extends Controller with ProvidesHeader {

  def index = Action { implicit request =>
    Ok(views.html.index("Your new application is ready."))
  }
}

As you can see I've simply declared that my controller extends the ProvidesHeader trait. My action code itself is left completely uncluttered, it doesn't need to know whether the templates it renders require a header, that's automatically provided, and in fact more implicit parameters could be added to my templates, and my action code still doesn't have to be aware of it.

A note for Java Play apps

Unfortunately this doesn't work so nicely for Java Play apps, since although you can still use implicit parameter passing in the templates, this needs to be explicitly handled by your actions. As an alternative to implicit parameter passing, Play offers the args map for storing arbitrary per request data on the Http.Context class. This can be populated through action composition, or however you want, and then accessed in your templates through the Http.Context.current thread local.

Benchmarking Scala against Java

A question recently came up at work about benchmarks between Java and Scala. Maybe you came across my blog post because you too are wanting to know which is faster, Java or Scala. Well I'm sorry to say this, but if that is you, you are asking the wrong question. In this post, I will show you that Scala is faster than Java. After that, I will show you why the question was the wrong question and why my results should be ignored. Then I will explain what question you should have asked.

The benchmark

Today we are going to choose a very simple algorithm to benchmark, the quick sort algorithm. I will provide implementations both in Scala and Java. Then with each I will sort a list of 100000 elements 100 times, and see how long each implementations takes to sort it. So let's start off with Java:

    public static void quickSort(int[] array, int left, int right) {
        if (right <= left) {
            return;
        }
        int pivot = array[right];
        int p = left;
        int i = left;
        while (i < right) {
            if (array[i] < pivot) {
                if (p != i) {
                    int tmp = array[p];
                    array[p] = array[i];
                    array[i] = tmp;
                }
                p += 1;
            }
            i += 1;
        }
        array[right] = array[p];
        array[p] = pivot;
        quickSort(array, left, p - 1);
        quickSort(array, p + 1, right);
    }

Timing this, sorting a list of 100000 elements 100 times on my 2012 MacBook Pro with Retina Display, it takes 852ms. Now the Scala implementation:

  def sortArray(array: Array[Int], left: Int, right: Int) {
    if (right <= left) {
      return
    }
    val pivot = array(right)
    var p = left
    var i = left
    while (i < right) {
      if (array(i) < pivot) {
        if (p != i) {
          val tmp = array(p)
          array(p) = array(i)
          array(i) = tmp
        }
        p += 1
      }
      i += 1
    }
    array(right) = array(p)
    array(p) = pivot
    sortArray(array, left, p - 1)
    sortArray(array, p + 1, right)
  }

It looks very similar to the Java implementation, slightly different syntax, but in general, the same. And the time for the same benchmark? 695ms. No benchmark is complete without a graph, so let's see what that looks like visually:

So there you have it. Scala is about 20% faster than Java. QED and all that.

The wrong question

However this is not the full story. No micro benchmark ever is. So let's start off with answering the question of why Scala is faster than Java in this case. Now Scala and Java both run on the JVM. Their source code both compiles to bytecode, and from the JVMs perspective, it doesn't know if one is Scala or one is Java, it's just all bytecode to the JVM. If we look at the bytecode of the compiled Scala and Java code above, we'll notice one key thing, in the Java code, there are two recursive invocations of the quickSort routine, while in Scala, there is only one. Why is this? The Scala compiler supports an optimisation called tail call recursion, where if the last statement in a method is a recursive call, it can get rid of that call and replace it with an iterative solution. So that's why the Scala code is so much quicker than the Java code, it's this tail call recursion optimisation. You can turn this optimisation off when compiling Scala code, when I do that it now takes 827ms, still a little bit faster but not much. I don't know why Scala is still faster without tail call recursion.

This brings me to my next point, apart from a couple of extra niche optimisations like this, Scala and Java both compile to bytecode, and hence have near identical performance characteristics for comparable code. In fact, when writing Scala code, you tend to use a lot of exactly the same libraries between Java and Scala, because to the JVM it's all just bytecode. This is why benchmarking Scala against Java is the wrong question.

But this still isn't the full picture. My implementation of quick sort in Scala was not what we'd call idiomatic Scala code. It's implemented in an imperative fashion, very performance focussed - which it should be, being code that is used for a performance benchmark. But it's not written in a style that a Scala developer would write day to day. Here is an implementation of quick sort that is in that idiomatic Scala style:

  def sortList(list: List[Int]): List[Int] = list match {
    case Nil => Nil
    case head :: tail => sortList(tail.filter(_ < head)) ::: head :: sortList(tail.filter(_ >= head))
  }

If you're not familiar with Scala, this code may seem overwhelming at first, but trust me, after a few weeks of learning the language, you would be completely comfortable reading this, and would find it far clearer and easier to maintain than the previous solution. So how does this code perform? Well the answer is terribly, it takes 13951ms, 20 times longer than the other Scala code. Obligatory chart:

So am I saying that when you write Scala in the "normal" way, your codes performance will always be terrible? Well, that's not quite how Scala developers write code all the time, they aren't dumb, they know the performance consequences of their code.

The key thing to remember is that most problems that developers solve are not quick sort, they are not computation heavy problems. A typical web application for example is concerned with moving data around, not doing complex algorithms. The amount of computation that a piece of Java code that a web developer might write to process a web request might take 1 microsecond out of the entire request to run - that is, one millionth of a second. If the equivalent Scala code takes 20 microseconds, that's still only one fifty thousandth of a second. The whole request might take 20 milliseconds to process, including going to the database a few times. Using idiomatic Scala code would therefore increase the response time by 0.1%, which is practically nothing.

So, Scala developers, when they write code, will write it in the idiomatic way. As you can see above, the idiomatic way is clear and concise. It's easy to maintain, much easier than Java. However, when they come across a problem that they know is computationally expensive, they will revert to writing in a style that is more like Java. This way, they have the best of both worlds, with the easy to maintain idiomatic Scala code for the most of their code base, and the well performaning Java like code where the performance matters.

The right question

So what question should you be asking, when comparing Scala to Java in the area of performance? The answer is in Scala's name. Scala was built to be a "Scalable language". As we've already seen, this scalability does not come in micro benchmarks. So where does it come? This is going to be the topic of a future blog post I write, where I will show some closer to real world benchmarks of a Scala web application versus a Java web application, but to give you an idea, the answer comes in how the Scala syntax and libraries provided by the Scala ecosystem is aptly suited for the paradigms of programming that are required to write scalable fault tolerant systems. The exact equivalent bytecode could be implemented in Java, but it would be a monstrous nightmare of impossible to follow anonymous inner classes, with a constant fear of accidentally mutating the wrong shared state, and a good dose of race conditions and memory visibility issues.

To put it more concisely, the question you should be asking is "How will Scala help me when my servers are falling over from unanticipated load?" This is a real world question that I'm sure any IT professional with any sort of real world experience would love an answer to. Stay tuned for my next blog post.

Why does Chrome consider some ports unsafe?

Today in my Twitter feed I noticed a frustrated tweet from Dan North complaining about why Chrome seems to arbitrarily block connections to some ports, giving the confusing error code "net::ERR_UNSAFE_PORT". I've encountered this before, and have also been similarly frustrated, so I decided to do a little research, and work out what Google means by unsafe. There are a few discussions in stack overflow and forums about exactly which ports are blocked, but I didn't find much about why they are considered unsafe.

My first assumption was that somehow it was unsafe for Chrome itself to connect to services on these ports, because maybe that services protocol might confuse Chrome into doing something wrong. However, this didn't make sense, and I then worked out that it's not about protecting Chrome, but protecting the services themselves running on these ports. Many internet protocols are designed to be very permissive as to what they accept, and this presents an interesting attack opportunity for attackers.

As web developers with a mind for security are well aware, the browser is incredibly obliging to attackers when it comes to making requests on servers on your behalf. XSRF is a perfect example of this, but the good news is, we know how to protect against XSRF. However, what if an attacker tricked a web browser into making a connection to something that doesn't speak HTTP? Let's take, for example, an SMTP server. If you've never seen an SMTP session before, here is an example session that I've ripped from wikipedia:

S: 220 smtp.example.com ESMTP Postfix
C: HELO relay.example.org
S: 250 Hello relay.example.org, I am glad to meet you
C: MAIL FROM:
S: 250 Ok
C: RCPT TO:
S: 250 Ok
C: RCPT TO:
S: 250 Ok
C: DATA
S: 354 End data with .
C: From: "Bob Example" 
C: To: "Alice Example" 
C: Cc: theboss@example.com
C: Date: Tue, 15 January 2008 16:02:43 -0500
C: Subject: Test message
C:
C: Hello Alice.
C: This is a test message with 5 header fields and 4 lines in the message body.
C: Your friend,
C: Bob
C: .
S: 250 Ok: queued as 12345
C: QUIT
S: 221 Bye

On first glance, this may look nothing like HTTP. Except for one important feature, like HTTP, SMTP separates elements of its protocol with newlines. Furthermore, what happens when we send data to an SMTP server that it doesn't understand? Well, here's a simple HTTP GET request with an example SMTP server:

C: GET / HTTP/1.1
S: 500 5.5.1 Command unrecognized: "GET / HTTP/1.1"
C: Host: www.example.com
S: 500 5.5.1 Command unrecognized: "Host: www.example.com"
C: 
S: 500 5.5.1 Command unrecognized: ""

So, obviously our mail server is confused, but an important thing to notice here is that it's not outright rejecting our connection, it's telling the client that there is an error, but continuing to accept commands. So can we exploit this? The answer is a resounding yes. What if I, an attacker, were to craft a page that had a form, that was automatically submitted (using a javascript on load event that invoked the submit method - similar to a form based XSRF attack) to an SMTP server. In order to speak SMTP with this server, I would need to be able to include new lines in my messages. This is not possible using application/x-www-form-urlencoded, since newlines are URL encoded, however using multipart/form-data, I can create a field that has the client side messages in the above SMTP conversation, and so using my victims web browser, I can submit an email to the target SMTP server. The SMTP server would ignore all the HTTP protocol as in the above example, including the method and headers, as well as the multipart/form-data header content, but once it got my field, which would have my HELO, MAIL FROM, RCPT TO etc commands in it, it would start processing them.

But of course, an attacker could just connect directly to the SMTP server, right? For some SMTP servers yes. However, many SMTP servers are protected with nothing more than a firewall, and especially if the SMTP server is configured to use this as a security mechanism for verifying the authenticity of the source of the email, it may be desirable for an attacker to be able to send email through such an SMTP server. So what this means is an attacker could use a browser running behind a firewall that was protecting an SMTP server as a proxy to connecting to that SMTP server, and so send messages using it.

As it turns out, some SMTP servers actually detect HTTP requests made on them and immediately close the connection (my servers SMTP server did this when I tried with it). However, not all SMTP servers do this, because it wasn't envisioned that a piece of client server, such as a web browser, could be used as an open proxy to a protected network. And it's certainly not a good idea to assume that other types of services/servers, such as FTP servers and even IRC servers, will do this too.

So, why does Chrome refuse to connect to some ports? Because the Google engineers has gone through the list of well known ports, and worked out how tolerant the protocols that use these ports are to being sent HTTP requests, and if they are tolerant, they've marked it as unsafe and so blocked it, to prevent Google Chrome from being an open proxy to a secured network. Should all web browsers do this? Probably.

About

Hi! My name is James Roper, and I am a software developer with a particular interest in open source development and trying new things. I program in Scala, Java, Go, PHP, Python and Javascript, and I work for Lightbend as the architect of Kalix. I also have a full life outside the world of IT, enjoy playing a variety of musical instruments and sports, and currently I live in Canberra.