all that jazz

james' blog about scala and all that jazz

Passing common state to templates in Play Framework

This question comes up very frequently on the Play mailing list, so I thought I'd document a quick example of how to pass common state to shared templates in Play Framework when using Scala. The use case is typically that you have a common header, but certain parts of it are dynamic, requiring data to be loaded from the database.

First of all, since the state required for rendering a header can logically be grouped into one object, a header object, let's do that. This will mean we can easily add new types of data to the header without changing any of our existing code. So I'm going to define a header that contains a list of menu items, and the current user:

case class Header(menu: Seq[MenuItem], user: Option[String])
case class MenuItem(url: String, name: String)

Now, the template that uses this is my main template, so I'm going to add my Header class to it as an implicit parameter. Implicit parameters are parameters that don't need to be explicitly passed when you invoke a method; instead, if an implicit value exists in the scope of the invocation, that will be used:

@(title: String)(content: Html)(implicit header: Header)

<html>
    <head>
        ...
    </head>
    <body>
        <div id="header">
            @header.user.map { user =>
                <div>User: @user</div>
            }
            <ul>
            @for(item <- header.menu) {
                <li><a href="@item.url">@item.name</a></li>
            }
            </ul>
        </div>
        @content
    </body>
</html>

Now to pass this implicit parameter, all I need to do is declare that each of my templates that use the main template also accept an implicit header parameter. So for example, in my index template:

@(message: String)(implicit header: Header)

@main("Welcome to Play 2.0") {
    @play20.welcome(message)
}

Now comes the magic: supplying this parameter. I will write an implicit method that generates it. When a parameterless method (implicit parameter lists don't count as parameters here) is declared as implicit, it can be used to supply a value for an implicit parameter. Here's my method:

import play.api.mvc.Request

trait ProvidesHeader {

  implicit def header[A](implicit request: Request[A]): Header = {
    val menu = Seq(MenuItem("/home", "Home"),
      MenuItem("/about", "About"),
      MenuItem("/contact", "Contact"))
    val user = request.session.get("user")
    Header(menu, user)
  }
}

For now I've just hard coded the menu items, but you get the point. Since this method is implicit and returns something of type Header, it can be used to supply the implicit Header parameter that our index template needs. You can also see that this method itself accepts an implicit parameter, the request. If your method doesn't need the request, you can remove that, but make sure you also remove the parentheses from the method: this only works with parameterless methods, not zero argument methods.
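
If implicit resolution is new to you, here's a minimal, self contained sketch of my own (nothing to do with Play, and not from the original post) showing the same mechanism: a parameterless implicit def in scope supplies an implicit parameter without it being passed explicitly.

case class Config(appName: String)

object Implicits {
  // parameterless implicit method, so it's eligible to satisfy an implicit Config
  implicit def config: Config = Config("demo")
}

object Example {
  import Implicits._

  def describe(path: String)(implicit config: Config): String =
    config.appName + ": " + path

  // the compiler fills in the Config for us, yielding "demo: /home"
  val result = describe("/home")
}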

So what do I now need to do to my actions? Almost nothing. I just need to make sure that my implicit header method is in scope, and that my actions declare the request they accept as an implicit parameter. For example:

object Application extends Controller with ProvidesHeader {

  def index = Action { implicit request =>
    Ok(views.html.index("Your new application is ready."))
  }
}

As you can see, I've simply declared that my controller extends the ProvidesHeader trait. My action code itself is left completely uncluttered: it doesn't need to know whether the templates it renders require a header, because that's provided automatically. In fact, more implicit parameters could be added to my templates, and my action code still wouldn't have to be aware of it.
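
To illustrate that last point, here's a hypothetical extension of my own (not from the post): suppose the templates also needed, say, a Footer, and the index template's signature gained an implicit Footer parameter alongside the Header. Only the providing trait and the template signatures change; the action is identical to the one above.

case class Footer(copyright: String)

trait ProvidesFooter {
  // hypothetical second implicit, hard coded just like the menu above
  implicit def footer: Footer = Footer("© example.com")
}

object Application extends Controller with ProvidesHeader with ProvidesFooter {
  // unchanged action: the Header and Footer are both supplied implicitly
  def index = Action { implicit request =>
    Ok(views.html.index("Your new application is ready."))
  }
}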

A note for Java Play apps

Unfortunately this doesn't work so nicely for Java Play apps, since although you can still use implicit parameter passing in the templates, this needs to be explicitly handled by your actions. As an alternative to implicit parameter passing, Play offers the args map for storing arbitrary per request data on the Http.Context class. This can be populated through action composition, or however you want, and then accessed in your templates through the Http.Context.current thread local.

Benchmarking Scala against Java

A question recently came up at work about benchmarks between Java and Scala. Maybe you came across my blog post because you too want to know which is faster, Java or Scala. Well, I'm sorry to say this, but if that is you, you are asking the wrong question. In this post, I will show you that Scala is faster than Java. After that, I will show you why the question was the wrong question and why my results should be ignored. Then I will explain what question you should have asked.

The benchmark

Today we are going to choose a very simple algorithm to benchmark, the quick sort algorithm. I will provide implementations in both Scala and Java. Then with each I will sort a list of 100000 elements 100 times, and see how long each implementation takes to sort it. So let's start off with Java:

    public static void quickSort(int[] array, int left, int right) {
        if (right <= left) {
            return;
        }
        int pivot = array[right];
        int p = left;
        int i = left;
        while (i < right) {
            if (array[i] < pivot) {
                if (p != i) {
                    int tmp = array[p];
                    array[p] = array[i];
                    array[i] = tmp;
                }
                p += 1;
            }
            i += 1;
        }
        array[right] = array[p];
        array[p] = pivot;
        quickSort(array, left, p - 1);
        quickSort(array, p + 1, right);
    }

Timing this, sorting a list of 100000 elements 100 times on my 2012 MacBook Pro with Retina Display, it takes 852ms. Now the Scala implementation:

  def sortArray(array: Array[Int], left: Int, right: Int) {
    if (right <= left) {
      return
    }
    val pivot = array(right)
    var p = left
    var i = left
    while (i < right) {
      if (array(i) < pivot) {
        if (p != i) {
          val tmp = array(p)
          array(p) = array(i)
          array(i) = tmp
        }
        p += 1
      }
      i += 1
    }
    array(right) = array(p)
    array(p) = pivot
    sortArray(array, left, p - 1)
    sortArray(array, p + 1, right)
  }

It looks very similar to the Java implementation, slightly different syntax, but in general, the same. And the time for the same benchmark? 695ms. No benchmark is complete without a graph, so let's see what that looks like visually:

So there you have it. Scala is about 20% faster than Java. QED and all that.
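
For the record, the post doesn't show the timing harness. Roughly speaking, the numbers come from a loop along these lines; this is a sketch of my own, where the random input generation and the timing calls are my assumptions, not the original harness, and sortArray from above is assumed to be in scope.

import scala.util.Random

object Benchmark extends App {
  // 100000 random ints, copied fresh for each of the 100 runs so that every
  // run sorts the same unsorted input
  val data = Array.fill(100000)(Random.nextInt())

  val start = System.currentTimeMillis()
  for (_ <- 1 to 100) {
    val copy = data.clone()
    sortArray(copy, 0, copy.length - 1)
  }
  println("Took " + (System.currentTimeMillis() - start) + "ms")
}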

The wrong question

However, this is not the full story. No micro benchmark ever is. So let's start by answering the question of why Scala is faster than Java in this case. Scala and Java both run on the JVM. Both compile their source code to bytecode, and from the JVM's perspective it doesn't know whether that bytecode came from Scala or Java, it's all just bytecode. If we look at the bytecode of the compiled Scala and Java code above, we'll notice one key difference: in the Java code there are two recursive invocations of the quickSort routine, while in the Scala code there is only one. Why is this? The Scala compiler performs tail call optimisation on recursive methods: if the last statement in a method is a recursive call to itself, it can get rid of that call and replace it with an iterative loop. So that's why the Scala code is so much quicker than the Java code, it's this tail call optimisation. You can turn this optimisation off when compiling Scala code; when I do that, the Scala version takes 827ms, still a little faster, but not by much. I don't know why Scala is still faster without the tail call optimisation.
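
If you haven't come across this before, here's a minimal sketch of my own (not from the benchmark) showing the kind of self recursive call the compiler can rewrite. The @tailrec annotation isn't required for the optimisation; it just makes the compiler fail if the call isn't actually in tail position.

import scala.annotation.tailrec

object TailRecDemo {
  // scalac rewrites this self recursive call into a loop, so no stack frames
  // pile up no matter how long the list is
  @tailrec
  def sum(list: List[Int], acc: Int = 0): Int = list match {
    case Nil          => acc
    case head :: tail => sum(tail, acc + head)
  }
}

In the quicksort above, only the second recursive call is the last statement of the method, so that's the one the compiler can rewrite, which is why just one recursive invocation is left in the bytecode.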

This brings me to my next point: apart from a few niche optimisations like this, Scala and Java both compile to bytecode, and hence have near identical performance characteristics for comparable code. In fact, when writing Scala you tend to use many of exactly the same libraries you would use in Java, because to the JVM it's all just bytecode. This is why benchmarking Scala against Java is the wrong question.

But this still isn't the full picture. My implementation of quick sort in Scala was not what we'd call idiomatic Scala code. It's implemented in an imperative fashion, very performance focussed - which it should be, being code that is used for a performance benchmark. But it's not written in a style that a Scala developer would write day to day. Here is an implementation of quick sort that is in that idiomatic Scala style:

  def sortList(list: List[Int]): List[Int] = list match {
    case Nil => Nil
    case head :: tail => sortList(tail.filter(_ < head)) ::: head :: sortList(tail.filter(_ >= head))
  }

If you're not familiar with Scala, this code may seem overwhelming at first, but trust me, after a few weeks of learning the language, you would be completely comfortable reading this, and would find it far clearer and easier to maintain than the previous solution. So how does this code perform? Well the answer is terribly, it takes 13951ms, 20 times longer than the other Scala code. Obligatory chart:

So am I saying that when you write Scala in the "normal" way, your code's performance will always be terrible? Not quite, because that's not how Scala developers write code all the time. They aren't dumb; they know the performance consequences of their code.

The key thing to remember is that most problems that developers solve are not quick sort; they are not computation heavy problems. A typical web application, for example, is concerned with moving data around, not doing complex algorithms. The computation done by the Java code a web developer might write to process a web request might take 1 microsecond of the entire request, that is, one millionth of a second. If the equivalent idiomatic Scala code takes 20 microseconds, that's still only one fifty-thousandth of a second. The whole request might take 20 milliseconds to process, including going to the database a few times. Using idiomatic Scala code would therefore increase the response time by about 0.1%, which is practically nothing.
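
To put rough numbers on that (my own back of the envelope arithmetic, using the figures above): the extra cost is 20µs - 1µs = 19µs per request, and 19µs out of a 20ms response is 19 / 20,000 ≈ 0.001, or about 0.1%.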

So, Scala developers, when they write code, will write it in the idiomatic way. As you can see above, the idiomatic way is clear and concise. It's easy to maintain, much easier than Java. However, when they come across a problem that they know is computationally expensive, they will revert to writing in a style that is more like Java. This way, they have the best of both worlds: easy to maintain idiomatic Scala code for most of their code base, and well performing Java-like code where the performance matters.

The right question

So what question should you be asking, when comparing Scala to Java in the area of performance? The answer is in Scala's name. Scala was built to be a "scalable language". As we've already seen, this scalability does not show up in micro benchmarks. So where does it show up? This will be the topic of a future blog post, where I will show some closer to real world benchmarks of a Scala web application versus a Java web application, but to give you an idea, the answer comes in how the Scala syntax and the libraries provided by the Scala ecosystem are aptly suited to the paradigms of programming that are required to write scalable, fault tolerant systems. The exact equivalent bytecode could be implemented in Java, but it would be a monstrous nightmare of impossible-to-follow anonymous inner classes, with a constant fear of accidentally mutating the wrong shared state, and a good dose of race conditions and memory visibility issues.

To put it more concisely, the question you should be asking is "How will Scala help me when my servers are falling over from unanticipated load?" This is a real world question that I'm sure any IT professional with any sort of real world experience would love an answer to. Stay tuned for my next blog post.

Why does Chrome consider some ports unsafe?

Today in my Twitter feed I noticed a frustrated tweet from Dan North complaining about why Chrome seems to arbitrarily block connections to some ports, giving the confusing error code "net::ERR_UNSAFE_PORT". I've encountered this before, and have been similarly frustrated, so I decided to do a little research and work out what Google means by unsafe. There are a few discussions on Stack Overflow and in forums about exactly which ports are blocked, but I didn't find much about why they are considered unsafe.

My first assumption was that it was somehow unsafe for Chrome itself to connect to services on these ports, because that service's protocol might confuse Chrome into doing something wrong. However, this didn't make sense, and I then worked out that it's not about protecting Chrome, but about protecting the services themselves running on these ports. Many internet protocols are designed to be very permissive as to what they accept, and this presents an interesting opportunity for attackers.

As web developers with a mind for security are well aware, the browser is incredibly obliging to attackers when it comes to making requests on servers on your behalf. XSRF is a perfect example of this, but the good news is we know how to protect against XSRF. However, what if an attacker tricked a web browser into making a connection to something that doesn't speak HTTP? Let's take, for example, an SMTP server. If you've never seen an SMTP session before, here is an example session that I've ripped from Wikipedia:

S: 220 smtp.example.com ESMTP Postfix
C: HELO relay.example.org
S: 250 Hello relay.example.org, I am glad to meet you
C: MAIL FROM:<bob@example.org>
S: 250 Ok
C: RCPT TO:<alice@example.com>
S: 250 Ok
C: RCPT TO:<theboss@example.com>
S: 250 Ok
C: DATA
S: 354 End data with <CR><LF>.<CR><LF>
C: From: "Bob Example" <bob@example.org>
C: To: "Alice Example" <alice@example.com>
C: Cc: theboss@example.com
C: Date: Tue, 15 January 2008 16:02:43 -0500
C: Subject: Test message
C:
C: Hello Alice.
C: This is a test message with 5 header fields and 4 lines in the message body.
C: Your friend,
C: Bob
C: .
S: 250 Ok: queued as 12345
C: QUIT
S: 221 Bye

At first glance, this may look nothing like HTTP, except for one important feature: like HTTP, SMTP separates the elements of its protocol with newlines. Furthermore, what happens when we send data to an SMTP server that it doesn't understand? Well, here's a simple HTTP GET request sent to an example SMTP server:

C: GET / HTTP/1.1
S: 500 5.5.1 Command unrecognized: "GET / HTTP/1.1"
C: Host: www.example.com
S: 500 5.5.1 Command unrecognized: "Host: www.example.com"
C: 
S: 500 5.5.1 Command unrecognized: ""

So, obviously our mail server is confused, but an important thing to notice here is that it's not outright rejecting our connection; it's telling the client that there is an error, but continuing to accept commands. So can we exploit this? The answer is a resounding yes. What if I, an attacker, were to craft a page with a form that is automatically submitted (using a Javascript onload event that invokes the submit method, similar to a form based XSRF attack) to an SMTP server? In order to speak SMTP with this server, I would need to be able to include newlines in my messages. This is not possible using application/x-www-form-urlencoded, since newlines are URL encoded, but using multipart/form-data I can create a field that contains the client side messages in the above SMTP conversation, and so, using my victim's web browser, I can submit an email to the target SMTP server. The SMTP server would ignore all the HTTP protocol, as in the above example, including the method and headers, as well as the multipart/form-data header content, but once it got to my field, which would have my HELO, MAIL FROM, RCPT TO etc commands in it, it would start processing them.

But of course, an attacker could just connect directly to the SMTP server, right? For some SMTP servers, yes. However, many SMTP servers are protected by nothing more than a firewall, and especially if the SMTP server uses this as a security mechanism for verifying the authenticity of the source of the email, it may be desirable for an attacker to be able to send email through it. What this means is that an attacker could use a browser running behind the firewall that protects an SMTP server as a proxy to connect to that SMTP server, and so send messages through it.

As it turns out, some SMTP servers actually detect HTTP requests made on them and immediately close the connection (my server's SMTP server did this when I tried it). However, not all SMTP servers do this, because it wasn't envisioned that a piece of client software, such as a web browser, could be used as an open proxy into a protected network. And it's certainly not a good idea to assume that other types of services, such as FTP servers and even IRC servers, will do this too.

So, why does Chrome refuse to connect to some ports? Because the Google engineers have gone through the list of well known ports, worked out how tolerant the protocols that use those ports are to being sent HTTP requests, and, where a protocol is tolerant, marked its port as unsafe and blocked it, to prevent Google Chrome from being used as an open proxy into a secured network. Should all web browsers do this? Probably.

My complaint against Haskell

I have nothing against Haskell. I've been getting my teeth stuck into some Scala over recent months, and I am loving functional programming, immutable types, and am even starting to see what's so good about monads. So Haskell, with its reputation of being a good pure functional language, is interesting to me. I did Haskell when I was at university. I didn't learn it; if I had learnt it, I would have understood it. I just did it. We learnt really useful things like doubling every integer in a list with a simple map function. Like curried and uncurried functions, like zipping, folding, reducing... all within the context of trivial contrived examples that did absolutely nothing to help us in the real world. In my 15 years of programming, I have only once ever needed to double every integer in a list, and that was in my Haskell lab at uni. Very useful.

My problem is not with Haskell. My problem is with the people that teach and promote Haskell. They have never presented me with an example of how Haskell can help me in the real world. And today I'd like to highlight this, by looking at the first piece of code you'll encounter if you read the documentation on Haskell on the Haskell website. If you visit that website, and you want to know what Haskell is, you'll very quickly notice the link "What is Haskell". Clicking on it, you'll be taken to a page, and as you scroll down, you'll come across your first piece of Haskell code. It's quicksort:

quicksort :: Ord a => [a] -> [a]
quicksort []     = []
quicksort (p:xs) = (quicksort lesser) ++ [p] ++ (quicksort greater)
    where
        lesser  = filter (< p) xs
        greater = filter (>= p) xs

If you've ever implemented quicksort in a procedural language before, you'll notice that that code is very simple in comparison. It has some syntax you might not recognise, but it's not a soup of crazy characters like Perl, it looks ordered, and easy to read. It's impressive. But is it practical?

What is quicksort? The name implies that it is quick. So what's its running time? O(n²). Wait, hang on, that's not quick; there are faster algorithms out there, like heap sort, which runs in O(n log n) time. Well, it turns out the average running time of quicksort is O(n log n), but why would you use it over another algorithm that guarantees O(n log n) running time? The reason is that the actual implementation of quicksort is really simple from a computation perspective; it does very little per element of the array, making its actual running time (on average) about 3 times faster than heap sort or merge sort. This makes quicksort very useful: if you want an algorithm that sorts an array as fast as possible, at the cost of, in some circumstances, not being very fast, then quicksort is perfect for you. This is why quicksort is called quicksort: it is quick. It may not mathematically have the best worst-case running time, but practically, it is quick.

This brings me to quicksort in Haskell. One of the reasons why quicksort, implemented in a procedural language, is so quick is that it does very close to the minimum necessary swap operations needed to sort the array. It sorts in place; it doesn't require building supplementary data structures, like the two separate arrays in merge sort or the maximum heap in heap sort. It plays very nicely with hardware optimisations like CPU caches, maximising cache hits regardless of the size of the array and cache. However, the Haskell implementation above does use additional structures. It partitions the list by creating two new lists on each recursion of the algorithm, requiring many more memory writes than the in-place implementation. It uses linked lists, which play very poorly with CPU caches. It still runs in O(n log n) on average, but it's now doing a lot more operations with very little help from the hardware. This added overhead means that quicksort is no longer 3 times faster than merge or heap sort, which means it now, on average, has similar run times to the other O(n log n) algorithms, and still with the possibility of running in O(n²) for some inputs. It is no longer quick, and hence is not practical to use in the real world; you may as well use merge sort or heap sort.

To make matters worse, this implementation will always perform poorly on lists that are already k-sorted, which is not an uncommon real world input to sort algorithms. A procedural implementation may also perform poorly on k-sorted lists, but it's a simple change to make it perform well: just choose a random pivot, or a pivot from somewhere in the middle of the list. In Haskell, you can't do that without adding yet another overhead, because your lists are linked lists, and so accessing a random pivot takes O(n) time.
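
To see why (my own reasoning, not spelled out above): the implementation shown always takes the head of the list as the pivot, so on an already sorted list one of the two partitions is always empty. The recursion then goes n levels deep, and each level filters the whole remaining tail, giving roughly n + (n-1) + ... + 1 comparisons, which is O(n²).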

Now, none of this means that Haskell is a bad language. Merge sort in Haskell is also very simple, and is also a very practical sorting algorithm in many cases, and so could have been used as an example instead. But why did the authors of this documentation choose a piece of code that is completely useless in the real world as the first taste of Haskell that many people will ever see? This is my complaint about Haskell: it has never been promoted to me in a way that bears any practical usefulness. I'm sure it's a great language. Pity about the people that promote it.

Hashbangs on the new Google Groups

Hash-bang URLs have been a contentious issue at my work over the past week, so much so that we completely got rid of them in the project that I'm working on. Coming out of the experience, I'm not convinced that hash-bangs are always a bad idea; however, there are some types of sites where they are very bad. Google Groups, I believe, falls firmly into this category. And it's too late to do anything about it.

The major issue with hash-bangs is not that they're a hack - many people would have called dynamically generating documents on each request a hack when websites first started doing that almost 20 years ago. It's not that they're ugly. It's not that some users don't have Javascript enabled (HTML5 mandates various Javascript APIs; if a browser doesn't do Javascript, it doesn't do HTML; Javascript is a core building block of the web - get over it). The big issue is the maintainability of URLs.

For some sites, this is irrelevant. Take a site that is your own personal dashboard for managing your own private things - a webmail site, for example. Nothing is going to link into it, so there is no need to maintain URLs. For other sites, it's an issue, but not such an important one. I know the folks at Twitter consider it a big issue at the moment, but really, if a link to someone's 140 character word of wisdom breaks... does it really matter?

But then there is a category of sites for which it is really important. These are the sites that maintain permanent records for future use. Mailing lists are one such example. I have an open source project called the mongo jackson mapper. It has a small user base, and I provide support to people when they ask questions. About a month ago, I noticed that I was getting a number of emails from people asking for help. I was more than happy to answer these, but I was concerned that if one person has a question, then other people are likely to have the same question, and I would be doing a lot of repeat work answering people's questions. So I set up a Google group for questions.

Now, when someone asks the same question twice, I can simply link to the old answer, with no need to rewrite it. Furthermore, people on other sites, such as Stack Overflow, can link to my answers. The URL to that answer will stand as a link in the chain to a record in time that will help many people help themselves in the future. However, that usefulness depends on the link still working in the future. Let's have a look at an example link:

https://groups.google.com/forum/?fromgroups#!topic/mongo-jackson-mapper/AeFF5Cqqolw

As you can see, the ID of the message only appears after the hash-bang. This means it doesn't get sent to the server when you request the page; the only way Google can know what message is being requested is to serve up some Javascript that parses it out of the URL. And that's exactly what they do. But what happens in future, when they decide to rewrite Google Groups again? And again? And again? If the resource https://groups.google.com/forum/?fromgroups doesn't continue to serve up that Javascript, then all the links to posts that people have copied from their browser address bar and pasted elsewhere will break. The permanent record of networked help pages that people use to solve their everyday problems will be broken.
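
To make that concrete (my own illustration): the request the browser actually sends for the URL above is just GET /forum/?fromgroups HTTP/1.1 to groups.google.com. Everything from the # onwards is the fragment, which never leaves the browser, so without that Javascript the server has no idea which topic was asked for.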

In contrast, if they didn't use hash-bang URLs, they could make whatever changes to their URL structure they liked, and all they would have to do to maintain the old URLs is implement a simple redirect on the server side. No need to serve Javascript from a particular resource.
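
A server side redirect is a one liner in most web frameworks. Here's a hypothetical sketch of my own in a Play controller, with made up URL structures purely for illustration:

import play.api.mvc._

object LegacyRedirects extends Controller {
  // permanently redirect an old style topic URL to wherever the new design puts it
  def topic(group: String, id: String) = Action {
    MovedPermanently("/groups/" + group + "/topics/" + id)
  }
}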

Google Groups does have a permalink feature, but really, who uses that? It feels unnatural to me, even though I know and understand the difference between permalinks and what is in my address bar. When I want to copy a URL, without thinking I copy it straight from the address bar, because that's what I always do for every other site.

So, Google Groups, really? Hash-bangs? Are you not thinking of the future? Are you seriously committing to maintaining Javascript at arbitrary URLs for the lifetime of the internet? Unfortunately, the horse has bolted; these hash-bang links to Google Groups are already starting to find their way around the web, so you have no choice. And I'm guessing that someone in future will forget this, and so you'll just break all those links.

About

Hi! My name is James Roper, and I am a software developer with a particular interest in open source development and trying new things. I program in Scala, Java, Go, PHP, Python and Javascript, and I work for Lightbend as the architect of Kalix. I also have a full life outside the world of IT, enjoy playing a variety of musical instruments and sports, and currently I live in Canberra.