all that jazz

james' blog about scala and all that jazz

Scaling Scala vs Java

In my previous post I showed how it makes no sense to benchmark Scala against Java, and concluded by saying that when it comes to performance, the question you should be asking is "How will Scala help me when my servers are falling over from unanticipated load?" In this post I will seek to answer that, and show that indeed Scala is a far better language for building scalable systems than Java.

However, don't expect our journey to get there to be easy. For a start, while it's very easy to do micro benchmarks, it's very hard to show how real world apps do or don't handle the loads that are put on them. It's hard to create an app that's small enough to demo and explain in a single blog post, yet big enough to actually show how real world apps behave under load, and it's also very hard to simulate real world loads. So I am going to take one small aspect of something that might go wrong in the real world, and show just one way in which Scala will help you where Java won't. Then I will explain that this is just the tip of the iceberg: there are far more situations, and far more features of Scala, that will help you in the real world.

An online store

For this exercise I have implemented an online store. The architecture of this store is in the diagram below:

As you can see, there is a payment service and a search service that the store talks to, and the store handles three types of requests: one for the index page that doesn't require going to any other services, one for making payments that uses the payments service, and one for searching the store's product list that uses the search service. The online store is the part of the system that I am going to be benchmarking; I will implement one version in Java and another in Scala, and compare them. The search and payment services won't change. Their actual implementations are simple JSON APIs that return hard coded values, but they each simulate a processing time of 20ms.

For the Java implementation of the store, I am going to keep it as simple as possible, using straight servlets to handle requests, Apache Commons HTTP client for making requests, and Jackson for JSON parsing and formatting. I will deploy the application to Tomcat, and configure Tomcat with the NIO connector, using the default connection limit of 10000 and thread pool size of 200.

For the Scala implementation I will use Play Framework 2.1, using the Play WS API which is backed by the Ning HTTP client to make requests, and the Play JSON API which is backed by Jackson to handle JSON parsing and formatting. Play Framework is built using Netty which has no connection limit, and uses Akka for thread pooling, and I have it configured to use the default thread pool size, which is one thread per CPU, and my machine has 4.
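To give a feel for what that looks like, here's a rough sketch of the store's search action in Play 2.1, using the WS API to make a non-blocking call. The service URL and JSON handling are illustrative, not the exact code from my repository:

import play.api.mvc.{Action, Controller}
import play.api.libs.ws.WS
import play.api.libs.concurrent.Execution.Implicits._

object Store extends Controller {
  // The service URL is hard coded here for illustration only
  def search(query: String) = Action {
    Async {
      // get() returns a future, so no thread is blocked while we wait
      WS.url("http://localhost:9001/search?q=" + query).get().map { response =>
        Ok(response.json)
      }
    }
  }
}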

I will run the benchmark using JMeter. For each request type (index, payments and search) I will have 300 threads spinning in a loop, making requests with a random 500-1500ms pause between each request. This gives an average maximum throughput of 300 requests per second per request type, or 900 requests per second all up.

So, let's have a look at the result of the Java benchmark:

On this graph I have plotted 3 metrics per request type. The median is the median request time. For the index page, this is next to nothing; for the search and payments requests, it is about 77ms. I have also plotted the 90% line, which is a common metric in web applications: it shows the time that 90% of requests came in under, and so gives a good idea of what the slow requests are like. This again shows almost nothing for the index page, and 116ms for the search and payments requests. The final metric is the throughput, the number of requests per second that were handled. We are not too far off the theoretical maximum, with the index showing 290 requests per second, and the search and payments requests coming through at about 270 requests per second. These results are good, our Java service handles the load we are throwing at it without breaking a sweat.

Now let's take a look at the Scala benchmark:

As you can see, it's identical to the Java results. This is not surprising, since both the Java and the Scala implementations of the online store are doing absolutely minimal work code wise; most of the processing time goes into making requests on the remote services.

Something goes wrong

So, we've seen two happy implementations of the same thing in Scala and Java, shrugging off the load I give them. But what happens when things aren't so fine and dandy? What happens if one of the services that they are talking to goes down? Let's say the search service starts taking 30 seconds to respond, after which point it returns an error. This is not an unusual failure situation, particularly if you're load balancing through a proxy: the proxy tries to connect to the service, fails after 30 seconds, and gives you a gateway error. Let's see how our applications handle the load I throw at them now. We would expect the search requests to take at least 30 seconds to respond, but what about the others? Here's the Java results:

Well, we no longer have a happy app at all. The search requests are naturally taking a long time, but the payments requests are now taking an average of 9 seconds to respond, and the 90% line is at 20 seconds. Not only that, but the index page is similarly impacted - users are not going to wait that long for your home page to show up after browsing to your site. And the throughput of each has gone down to 30 requests per second. This is not good: because your search service went down, your whole site is now practically unusable, and you will soon start losing customers and money.

So how does our Scala app fare? Let's find out:

Now before I say anything else, let me point out that I've bounded the response time to 160ms - the search requests are actually taking about 30 seconds to respond, but on the graph, with 30 seconds next to the other values, they hardly register a line a pixel high. So what we can see here is that while search is unusable, our payments and index request response times and throughput are unchanged. Obviously, customers aren't going to be happy with not being able to do searches, but at least they can still use other parts of your site, see your home page with specials, and even still make payments for items. And hey, Google isn't down, they can always use Google to search your site. So you might lose some business, but the impact is limited.

So, in this benchmark, we can see that Scala wins hands down. When things start to go wrong, a Scala application will take it in its stride, giving you the best it can, while a Java application will likely just fall over.

But I can do that in Java

Now starts the bit where I counter the many anticipated criticisms that people will make of this benchmark. And the first, and most obvious one, is that in my Scala solution I used asynchronous IO, whereas in my Java solution I didn't, so they can't be compared. It is true, I could have implemented an asynchronous solution in Java, and in that case the Java results would have been identical to the Scala results. However, while I could have done that, Java developers don't do that. It's not that they can't, it's that they don't. I have written a lot of webapps in Java that make calls to other systems, and very rarely, and only in very special circumstances, have I ever used asynchronous IO. And let me show you why.

Let's say you have to do a series of calls on a series of remote services, each one depending on data returned from the previous. Here's a good old fashioned synchronous solution in Java:

User user = getUserById(id);
List<Order> orders = getOrdersForUser(user.email);
List<Product> products = getProductsForOrders(orders);
List<Stock> stock = getStockForProducts(products);

The above code is simple, easy to read, and feels completely natural for a Java developer to write. For completeness, let's have a look at the same thing in Scala:

val user = getUserById(id)
val orders = getOrdersForUser(user.email)
val products = getProductsForOrders(orders)
val stock = getStockForProducts(products)

Now, let's have a look at the same code, but this time assuming we are making asynchronous calls and returning the results in promises. What does it look like in Java?

Promise<User> user = getUserById(id);
Promise<List<Order>> orders = user.flatMap(new Function<User, Promise<List<Order>>>() {
  public Promise<List<Order>> apply(User user) {
    return getOrdersForUser(user.email);
  }
});
Promise<List<Product>> products = orders.flatMap(new Function<List<Order>, Promise<List<Product>>>() {
  public Promise<List<Product>> apply(List<Order> orders) {
    return getProductsForOrders(orders);
  }
});
Promise<List<Stock>> stock = products.flatMap(new Function<List<Product>, Promise<List<Stock>>>() {
  public Promise<List<Stock>> apply(List<Product> products) {
    return getStockForProducts(products);
  }
});

So firstly, the above code is not readable; in fact it's much harder to follow, there is a massive amount of noise around the code that actually does something, and hence it's very easy to make mistakes and miss things. Secondly, it's tedious to write: no developer wants to write code that looks like that, I hate doing it, and any developer that wants to write their whole app like that is insane. And finally, it just doesn't feel natural. It's not the way you do things in Java, it's not idiomatic, it doesn't play well with the rest of the Java ecosystem, and third party libraries don't integrate well with this style. As I said before, Java developers can write code that does this, but they don't, and as you can see, they don't for good reason.

So let's take a look at the asynchronous solution in Scala:

for {
  user <- getUserById(id)
  orders <- getOrdersForUser(user.email)
  products <- getProductsForOrders(orders)
  stock <- getStockForProducts(products)
} yield stock

In contrast to the Java asynchronous solution, this solution is completely readable, just as readable as the Scala and Java synchronous solutions. And this isn't just some weird Scala feature that most Scala developers never touch, this is how a typical Scala developer writes code every day. Scala libraries are designed to work using these idioms, it feels natural, the language is working with you. It's fun to write code like this in Scala!
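And in case you're wondering where the magic is, there isn't any: the for comprehension is just syntactic sugar. The compiler rewrites it into nested flatMap and map calls on the futures, roughly like this:

getUserById(id).flatMap { user =>
  getOrdersForUser(user.email).flatMap { orders =>
    getProductsForOrders(orders).flatMap { products =>
      getStockForProducts(products).map { stock =>
        stock
      }
    }
  }
}

Even desugared, the closures keep the noise level far below the Java anonymous inner classes above.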

This post is not about how with one language you can write a highly tuned app that's faster than the same app written in another language and tuned just as hard. This post is about how Scala helps you write applications that are scalable by default, using natural, readable and idiomatic code. Just like a ball in lawn bowls has a bias, Scala has a bias towards helping you write scalable applications, where Java makes you swim upstream.

But scaling means so much more than that

The example I've provided of Scala scaling well where Java doesn't is a very specific example, but then what situation where your app is failing under high load isn't? Let me give a few other examples of where Scala's much nicer asynchronous IO support helps you to write scalable code:

  • Using Akka, you can easily define actors for different types of requests, and allocate them different resource limits. So if certain parts of your single application start struggling or receiving unanticipated load, those parts may stop responding, but the rest of your app can stay healthy.
  • Scala, Play and Akka make it incredibly simple to handle a single request using multiple threads running in parallel, doing different operations, allowing you to have requests that do a lot in very little time (see the sketch after this list). Klout wrote an excellent article about how they did just that in their API.
  • Because asynchronous IO is so simple, offloading processing onto other machines can be safely done without tying up threads on the first machine.
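Here's the sketch promised in the second point: a minimal example using plain Scala futures (the service calls are made up) of a single request fanning out to several calls in parallel. The important trick is that the futures are created before the for comprehension, so all three calls are in flight at once:

import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical service calls, each returning a future
def getProfile(id: Long): Future[String] = Future { "James" }
def getScore(id: Long): Future[Int] = Future { 42 }
def getFriends(id: Long): Future[Seq[Long]] = Future { Seq(2L, 3L) }

// Kick off all three calls first so they run in parallel
val profile = getProfile(1)
val score = getScore(1)
val friends = getFriends(1)

// Then combine the results once they have all completed
val page = for {
  p <- profile
  s <- score
  f <- friends
} yield p + " has score " + s + " and " + f.size + " friends"

println(Await.result(page, 5.seconds))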

Java 8 will make asynchronous IO simple in Java

Java 8 is probably going to include support for closures of some sort, which is great news for the Java world, especially if you want to do asynchronous IO. However, the syntax still won't be anywhere near as readable as the Scala code I showed above. And when will Java 8 be released? Java 7 was released last year, and it took 5 years to release that. Java 8 is scheduled for summer 2013, but even if it arrives on schedule, how long will it take for the ecosystem to catch up? And how long will it take for Java developers to switch from a synchronous to an asynchronous mindset? In my opinion, Java 8 is too little too late.

So this is all about asynchronous IO?

So far all I've talked about and shown is how easy Scala makes asynchronous IO, and how that helps you scale. But it doesn't stop there. Let me pick another feature of Scala, immutability.

When you start using multiple threads to process single requests, you start sharing state between those threads. And this is where things get very messy, because the world of shared state in a computer system is a crazy world where impossible things happen. It's a world of deadlocks, a world of updating memory in one thread but another thread not seeing that change, a world of race conditions, and a world of performance bottlenecks because you too eagerly marked some methods as synchronized.

However, it's not that bad, because there is a very simple solution: make all your state immutable. If all your state is immutable, then none of the above problems can happen. And this is again where Scala helps you big time, because in Scala, things are immutable by default. The collection APIs are immutable, and you have to explicitly ask for a mutable collection in order to get one.
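A quick illustration of that default, as a standalone sketch:

// Immutable by default: "adding" an element returns a new set
val tags = Set("scala", "akka")
val moreTags = tags + "play"   // tags itself is unchanged

// Mutable collections have to be asked for explicitly
import scala.collection.mutable
val buffer = mutable.Set("scala", "akka")
buffer += "play"               // modifies buffer in place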

Now in Java, you can make things immutable. There are some libraries that help you (albeit clumsily) to work with immutable collections. But it's so easy to accidentally forget to make something immutable. The Java API and language itself don't make working with immutable structures easy, and if you're using a third party library, it's highly likely that it's not using immutable structures, and it often requires you to use mutable structures - JPA, for example, requires this.

Let's have a look at some code. Here is an immutable class in Scala:

case class User(id: Long, name: String, email: String)

That structure is immutable. Moreover, it automatically generates accessors for the properties. Let's look at the corresponding Java:

public class User {
  private final long id;
  private final String name;
  private final String email;

  public User(long id, String name, String email) {
    this.id = id;
    this.name = name;
    this.email = email;
  }

  public long getId() {
    return id;
  }

  public String getName() {
    return name;
  }

  public String getEmail() {
    return email;
  }
}

That's an enormous amount of code! And what if I add a new property? I have to add a new parameter to my constructor which will break existing code, or I have to define a second constructor. In Scala I can just do this:

case class User(id: Long, name: String, email: String, company: Option[Company] = None)

All my existing code that calls that constructor will still work. And what about when this object grows to have 10 constructor parameters? Constructing it becomes a nightmare! A solution to this in Java is to use the builder pattern, which more than doubles the amount of code you have to write for the object. In Scala, you can name the parameters, so it's easy to see which parameter is which, and they don't have to be in the right order. But maybe I just want to modify one property. This can be done in Scala like this:

case class User(id: Long, name: String, email: String, company: Option[Company] = None) {
  def copy(id: Long = id, name: String = name, email: String = email, company: Option[Company] = company) = User(id, name, email, company)
}

val james = User(1, "James", "james@jazzy.id.au")
val jamesWithCompany = james.copy(company = Some(Company("Typesafe")))

The above code is natural, it's simple, it's readable, it's how Scala developers write code every day, and it's immutable. It is well suited to concurrent code, and allows you to safely write systems that scale. The same can be done in Java, but it's tedious and not at all a joy to write. I am a big advocate of immutable code in Java, and I have written many immutable classes in Java, and it hurts, but it's the lesser of two hurts. In Scala, it takes more code to use mutable objects than immutable ones. Again, Scala is biased towards helping you scale.

Conclusion

I cannot possibly go into all the ways in which Scala helps you scale where Java doesn't. But I hope I have given you a taste of why Scala is on your side when it comes to writing scalable systems. I've shown some concrete metrics, I've compared Java and Scala solutions for writing scalable code, and I've shown, not that Scala systems will always scale better than Java systems, but rather that Scala is the language that is on your side when writing scalable systems. It is biased towards scaling, and it encourages practices that help you scale. Java, in contrast, makes it difficult for you to implement these practices; it works against you.

If you're interested in my code for the online store, you can find it in this GitHub repository. The numbers from my performance test can be found in this spreadsheet.

Passing common state to templates in Play Framework

This question comes up very frequently on the Play mailing list, so I thought I'd document a quick example of how to pass common state to shared templates in Play Framework when using Scala. The use case is typically that you have a common header, but certain parts of it are dynamic, requiring data to be loaded from the database.

First of all, since the state required for rendering a header can logically be grouped into one object, a header object, let's do that. This will mean we can easily add new types of data to the header without changing any of our existing code. So I'm going to define a header that contains a list of menu items, and the current user:

case class Header(menu: Seq[MenuItem], user: Option[String])
case class MenuItem(url: String, name: String)

Now the template that uses this is my main template, so I'm going to add my Header class to it as an implicit parameter. Implicit parameters are parameters that don't need to be explicitly passed when you invoke a method; instead, if an implicit value exists in the scope of the invocation, that will be used:

@(title: String)(content: Html)(implicit header: Header)

<html>
    <head>
        ...
    </head>
    <body>
        <div id="header">
            @header.user.map { user =>
                <div>User: @user</div>
            }
            <ul>
            @for(item <- header.menu) {
                <li><a href="@item.url">@item.name</a></li>
            }
            </ul>
        </div>
        @content
    </body>
</html>

Now to pass this implicit parameter, all I need to do is declare that each of my templates that use the main template also accept an implicit header parameter. So for example, in my index template:

@(message: String)(implicit header: Header)

@main("Welcome to Play 2.0") {
    @play20.welcome(message)
}

Now comes the magic: supplying this parameter. I will write an implicit method that generates it. When a parameterless method (implicit parameters don't count as parameters in this case) is declared as implicit, it can be used to supply a value for an implicit parameter. Here's my method:

trait ProvidesHeader {

  implicit def header[A](implicit request: Request[A]): Header = {
    val menu = Seq(MenuItem("/home", "Home"),
      MenuItem("/about", "About"),
      MenuItem("/contact", "Contact"))
    val user = request.session.get("user")
    Header(menu, user)
  }
}

For now I've just hard coded the menu items, but you get the point. Since this method is implicit, and it returns something of type Header, it can be used to supply the implicit Header parameter that our index template needs. You can also see that this method itself accepts an implicit parameter, the request. If your method doesn't need the request, then you can remove that, however make sure that you also remove the parentheses from the method; this only works with parameterless methods, not zero argument methods.
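For example, if the header didn't depend on the request at all, a hypothetical variant of the trait might look like this - note that header has no parentheses:

trait ProvidesStaticHeader {
  private val menu = Seq(MenuItem("/home", "Home"), MenuItem("/about", "About"))

  // Parameterless, so it can supply the implicit Header parameter
  implicit def header: Header = Header(menu, None)
}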

So what now do I need to do to my actions? Almost nothing, I just need to make sure that my implicit header method is in scope, and that they declare the request they accept as an implicit parameter, so for example:

object Application extends Controller with ProvidesHeader {

  def index = Action { implicit request =>
    Ok(views.html.index("Your new application is ready."))
  }
}

As you can see I've simply declared that my controller extends the ProvidesHeader trait. My action code itself is left completely uncluttered, it doesn't need to know whether the templates it renders require a header, that's automatically provided, and in fact more implicit parameters could be added to my templates, and my action code still doesn't have to be aware of it.

A note for Java Play apps

Unfortunately this doesn't work so nicely for Java Play apps, since although you can still use implicit parameter passing in the templates, this needs to be explicitly handled by your actions. As an alternative to implicit parameter passing, Play offers the args map for storing arbitrary per request data on the Http.Context class. This can be populated through action composition, or however you want, and then accessed in your templates through the Http.Context.current thread local.

Benchmarking Scala against Java

A question recently came up at work about benchmarks between Java and Scala. Maybe you came across my blog post because you too want to know which is faster, Java or Scala. Well I'm sorry to say this, but if that is you, you are asking the wrong question. In this post, I will show you that Scala is faster than Java. After that, I will show you why the question was the wrong question and why my results should be ignored. Then I will explain what question you should have asked.

The benchmark

Today we are going to choose a very simple algorithm to benchmark, the quick sort algorithm. I will provide implementations in both Scala and Java. Then with each I will sort a list of 100000 elements 100 times, and see how long each implementation takes to sort it. So let's start off with Java:

    public static void quickSort(int[] array, int left, int right) {
        if (right <= left) {
            return;
        }
        int pivot = array[right];
        int p = left;
        int i = left;
        while (i < right) {
            if (array[i] < pivot) {
                if (p != i) {
                    int tmp = array[p];
                    array[p] = array[i];
                    array[i] = tmp;
                }
                p += 1;
            }
            i += 1;
        }
        array[right] = array[p];
        array[p] = pivot;
        quickSort(array, left, p - 1);
        quickSort(array, p + 1, right);
    }

Timing this, sorting a list of 100000 elements 100 times on my 2012 MacBook Pro with Retina Display, it takes 852ms. Now the Scala implementation:

  def sortArray(array: Array[Int], left: Int, right: Int) {
    if (right <= left) {
      return
    }
    val pivot = array(right)
    var p = left
    var i = left
    while (i < right) {
      if (array(i) < pivot) {
        if (p != i) {
          val tmp = array(p)
          array(p) = array(i)
          array(i) = tmp
        }
        p += 1
      }
      i += 1
    }
    array(right) = array(p)
    array(p) = pivot
    sortArray(array, left, p - 1)
    sortArray(array, p + 1, right)
  }

It looks very similar to the Java implementation, slightly different syntax, but in general, the same. And the time for the same benchmark? 695ms. No benchmark is complete without a graph, so let's see what that looks like visually:

So there you have it. Scala is about 20% faster than Java. QED and all that.
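For reference, a timing loop along these lines is enough to reproduce this kind of measurement for the Scala version (this is a sketch, not the exact harness behind the numbers above): sort a random array of 100000 ints, 100 times over, and measure the elapsed time.

val random = new scala.util.Random(1)
val data = Array.fill(100000)(random.nextInt())

val start = System.currentTimeMillis()
for (_ <- 1 to 100) {
  // Sort a fresh copy each time so every iteration does the same work
  val copy = data.clone()
  sortArray(copy, 0, copy.length - 1)
}
println("Took " + (System.currentTimeMillis() - start) + "ms")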

The wrong question

However this is not the full story. No micro benchmark ever is. So let's start off with answering the question of why Scala is faster than Java in this case. Now Scala and Java both run on the JVM. Their source code both compiles to bytecode, and from the JVM's perspective, it doesn't know if one is Scala or one is Java, it's just all bytecode to the JVM. If we look at the bytecode of the compiled Scala and Java code above, we'll notice one key thing: in the Java code, there are two recursive invocations of the quickSort routine, while in the Scala code, there is only one. Why is this? The Scala compiler supports an optimisation called tail recursion elimination: if the last statement in a method is a recursive call to itself, it can get rid of that call and replace it with an iterative solution. So that's why the Scala code is so much quicker than the Java code, it's this tail recursion optimisation. You can turn this optimisation off when compiling Scala code; when I do that, it takes 827ms, still a little bit faster than Java but not by much. I don't know why Scala is still faster without the tail recursion optimisation.
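If you haven't come across tail recursion before, here's a minimal, unrelated example. Because the recursive call is the very last thing the method does, the compiler can replace it with a loop; the @tailrec annotation simply asks the compiler to fail if it can't. In the quick sort above, only the second recursive call is in tail position, which is why only one of the two calls disappears from the bytecode:

import scala.annotation.tailrec

@tailrec
def sum(list: List[Int], acc: Int = 0): Int = list match {
  case Nil => acc
  case head :: tail => sum(tail, acc + head)  // tail call, becomes a loop
}

println(sum(List(1, 2, 3, 4)))  // prints 10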

This brings me to my next point: apart from a couple of niche optimisations like this, Scala and Java both compile to bytecode, and hence have near identical performance characteristics for comparable code. In fact, when writing Scala code, you tend to use a lot of exactly the same libraries as you would in Java, because to the JVM it's all just bytecode. This is why benchmarking Scala against Java is the wrong question.

But this still isn't the full picture. My implementation of quick sort in Scala was not what we'd call idiomatic Scala code. It's implemented in an imperative fashion, very performance focussed - which it should be, being code that is used for a performance benchmark. But it's not written in a style that a Scala developer would write day to day. Here is an implementation of quick sort that is in that idiomatic Scala style:

  def sortList(list: List[Int]): List[Int] = list match {
    case Nil => Nil
    case head :: tail => sortList(tail.filter(_ < head)) ::: head :: sortList(tail.filter(_ >= head))
  }

If you're not familiar with Scala, this code may seem overwhelming at first, but trust me, after a few weeks of learning the language you would be completely comfortable reading this, and would find it far clearer and easier to maintain than the previous solution. So how does this code perform? Well, the answer is terribly: it takes 13951ms, 20 times longer than the other Scala code. Obligatory chart:

So am I saying that when you write Scala in the "normal" way, your code's performance will always be terrible? Well, no, because Scala developers don't write code like that all the time. They aren't dumb, they know the performance consequences of their code.

The key thing to remember is that most problems that developers solve are not quick sort, they are not computation heavy problems. A typical web application, for example, is concerned with moving data around, not doing complex algorithms. The Java code that a web developer might write to process a web request might take 1 microsecond of computation out of the entire request - that is, one millionth of a second. If the equivalent Scala code takes 20 microseconds, that's still only one fifty thousandth of a second. The whole request might take 20 milliseconds to process, including going to the database a few times. Using idiomatic Scala code would therefore increase the response time by 0.1%, which is practically nothing.

So, Scala developers, when they write code, will write it in the idiomatic way. As you can see above, the idiomatic way is clear and concise. It's easy to maintain, much easier than Java. However, when they come across a problem that they know is computationally expensive, they will revert to writing in a style that is more like Java. This way, they have the best of both worlds: easy to maintain idiomatic Scala code for most of their code base, and well performing Java-like code where the performance matters.

The right question

So what question should you be asking, when comparing Scala to Java in the area of performance? The answer is in Scala's name. Scala was built to be a "Scalable language". As we've already seen, this scalability does not come in micro benchmarks. So where does it come from? This will be the topic of a future blog post, where I will show some closer to real world benchmarks of a Scala web application versus a Java web application, but to give you an idea, the answer is in how the Scala syntax, and the libraries provided by the Scala ecosystem, are suited to the paradigms of programming that are required to write scalable fault tolerant systems. The exact equivalent bytecode could be implemented in Java, but it would be a monstrous nightmare of impossible to follow anonymous inner classes, with a constant fear of accidentally mutating the wrong shared state, and a good dose of race conditions and memory visibility issues.

To put it more concisely, the question you should be asking is "How will Scala help me when my servers are falling over from unanticipated load?" This is a real world question that I'm sure any IT professional with any sort of real world experience would love an answer to. Stay tuned for my next blog post.

ScalaFlect: Statically typed query DSLs

I've been hard at work over the past few weeks redesigning my Mongo object mapper, the Mongo Jackson Mapper, to support Jackson 2.0, be simpler to use, and be far more powerful with fewer surprises for users. Something that has been at the front of my mind is how to make it nicer to use in Scala, and the biggest thing that I'd like to do here is implement the query and update APIs using a DSL that makes the queries look like ordinary code. But there's a problem. If you want your classes to be ordinary case classes, then when querying you're going to have to refer to the properties using strings. Strings that are very easy to make typos in, strings that don't get updated automatically when you refactor, strings that don't throw compile errors if they are wrong, and in the case of MongoDB queries, strings that don't even throw runtime errors if they are wrong. Scala has no support for referring to a field, rather than its value, so unfortunately, strings are the only option. That is, until now.

Today I started a new open source project called ScalaFlect. It provides exactly what every query DSL implementor has been wanting: statically typed reflection. It allows you to reference a field or a method, with the result being that you get the java.lang.reflect.Field/Method associated with it. Before I go into the details of how I did this, let's first look at an example DSL that I've written to demonstrate the capabilities so far.

Example DSL usage

So to start off, I have some case classes, a BlogPost, and a Comment, with a one to many relationship between blog posts and comments:

case class BlogPost(author: String, title: String, published: Boolean,
  tags: Set[String], comments: List[Comment])
case class Comment(author: String, text: String)

Now I have implemented my DSL by implementing an abstract class that provides the basic elements of my DSL. This class is called ObjectListDao, and as the name suggests, it stores objects in a list. But it could just as easily be storing objects in a database like MongoDB. Let's look at a DAO that uses this DSL:

class BlogPostDao extends ObjectListDao(classOf[BlogPost]) {
  import au.id.jazzy.scalaflect.ScalaFlect.toTraverser

  def findByAuthor(author: String) = {
    query(
      $(_.author) %= author
    )
  }

  def findPublishedByTag(tag: String) = {
    query(
      $(_.tags.$) %= tag,
      $(_.published) %= true
    )
  }

  def findPostsCommentedOnByAuthor(author: String) = {
    query(
      $(_.comments.$.author) %= author
    )
  }
}

So the first thing you notice is the static access to properties. The $ method accepts a function that takes a parameter of type BlogPost and returns the property that you want to reference. You can also see that by using the $ operator on a collection, you can refer to properties of that collection's element type.
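To make the usage concrete, here's a quick (made up) example of saving and querying with the DAO above:

val dao = new BlogPostDao

dao.save(BlogPost("James", "Statically typed DSLs", true,
  Set("scala", "dsl"), List(Comment("Bob", "Nice post"))))

val jamesPosts = dao.findByAuthor("James")
val commentedByBob = dao.findPostsCommentedOnByAuthor("Bob")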

Example DSL definition

So how have I implemented my DSL? Quite simply. Here is the code:

class ObjectListDao[T](val clazz: Class[T]) {
  private val scalaFlect = new ScalaFlect(clazz)
  private val list = new ListBuffer[T]

  def $[R](f : T => R): QueryField[R] = new QueryField(scalaFlect.reflect(f))
  
  def query(exps: QueryExpression*) = {
    var results = list.toList
    for (exp <- exps) {
      results = results.filter(exp.matches(_))
    }
    results.toList
  }

  def save(obj: T) {
    list += obj
  }
}

class QueryField[R](val field: Member[R]) {
  def %=(value: R): QueryExpression = new QueryExpression {
    def matches(obj: Any) = !(field.get(obj) filter {v: Any => v == value} isEmpty)
  }
}

trait QueryExpression {
  def matches(obj: Any): Boolean
}

Here we have an instance of ScalaFlect for our class (in the example usage, this was BlogPost). The $ method simply invokes reflect() on this, which is what does the reflection. reflect() returns a Member. A Member is a reference to a series of methods and fields that were invoked by the reflection function. Using it, you can get access to the Java reflection API classes that represent these methods and fields.

The other thing to note is the implementation of matches. This is where the Member class is actually used. If this was a DSL for something like MongoDB, then at this point the Member and its parents would be inspected to build a query object, which would then be sent to the database. In our case though we're just operating on a list of objects that we already have in memory, and Member provides a convenient get function that gets all the values of that property for a given object. Note that I said values, not value: ScalaFlect allows stepping into collections using the $ operator, so if the reference goes through a collection, then multiple values might be returned.

Implementation

At this point, especially if you've thought about how to do this before, you're probably wondering how on earth I've implemented this. The passed in reflection function is never executed. An important thing to remember in Scala is that functions are actually classes themselves. ScalaFlect simply takes the function class, decompiles it, finds the apply method, and then does some fairly simple analysis on the code to find out what properties were referenced. By fairly simple, I mean I wrote it in 2 hours. The result is cached so each function is only ever analysed once.

Ok, so this sounds like a hack, and it is. And it certainly has its limitations. The behaviour if you put complex code into your reflection function is completely undefined (it will most likely throw an exception; in future I'll add some decent error handling). However, it's a hack that will work well - there is nothing magic about how a Scala function that invokes some methods or accesses some fields on the passed in argument and returns the result is compiled down to bytecode. I don't expect that anything will be added to Scala in future that might break this.

One limitation that I do anticipate though is working with hot swapped code. I imagine that if you update a query and hot swap it into a running JVM, there will be no way for ScalaFlect to detect that, or even analyse it if it could. This is because ScalaFlect decompiles the bytecode by loading the class from the reflection function's classloader. I actually have no idea how hot swapping works, but I suspect that it doesn't update the bytecode that the classloader sees. The same goes for any runtime generated and enhanced code, but maybe that issue is not as likely to ever cause a problem.

The future

I read somewhere that Scala 2.10 is getting improved reflection, but I don't know what that means. The need for this library could be completely removed if Scala added language level support for referring to a field, and not its value, in code. My ideal outcome from implementing ScalaFlect is that the value of such a feature will be realised, and hence added to Scala. In the meantime, ScalaFlect will work nicely at giving DSL writers statically typed reflection.

So check out the library on GitHub, and feel free to comment on my approach. I haven't cut a release yet, but I'll get to that and deploy a beta to Maven central in the coming days (depending on how fast Sonatype will process my request to deploy to my domains groupId).

Update

It's just been pointed out to me that Scala 2.10 will (might?) have macro support, which would achieve this (and potentially much more) in a much cleaner fashion. This is great news for writing query DSLs. I don't know what the state of the macro support is though; it's marked as experimental and doesn't seem to be mentioned in any Scala 2.10 feature lists. So maybe ScalaFlect will have some time to serve a useful purpose yet.

Three things I had to learn about Scala before it made sense

Last year I sat down in a meeting room with a few other colleagues to learn about Scala from one of Atlassian's biggest Scala advocates. I kept hearing good stuff about Scala, and I like trying new things, so I was eager to have my eyes opened so that I could learn how awesome this thing was. Unfortunately, I was disappointed. Most of the lesson went straight over my head. But to my dismay, the other people in the room seemed to get it. It was on this day that I realised an interesting thing about my learning style. I remembered back at uni, I was exactly the same in lectures... I never got anything I was taught in lectures, which is probably why by the end of uni I had pretty much stopped attending all CS lectures. It wasn't until I was given a real problem in a lab that I had to solve that I got it. This is why I never really got functional languages at uni: the lecturers that taught functional languages at my uni weren't the most practical of people. Actually, I don't think they ever even considered whether their work had any practical implications at all. So they taught things like map, reduce, filter, fold and curried functions like this: "Look, here's something you've never heard of! Behold its awesomeness! Now here's a trivial mathematical use case with no apparent real world application! Look at how practical it is!"

Some people have no problems learning this way, and I envy them. For me, I need to have a problem that I understand and want to solve today, that the thing that I'm learning can solve, in order to learn it. So if you're reading this and thinking "That is so me!", and you want to learn Scala, then hopefully this is the blog post for you. I'm going to start with a practical problem that hopefully you will understand and want to solve, and then show you how three particular Scala features allow solving this really elegantly and simply. Hopefully after that, you'll have a good understanding of those features in Scala, and be able to understand a lot more of Scala code when you encounter it.

The problem

The problem that I want to solve is dependency management. If you've ever used maven, you'll know how verbose this can be. For those that haven't encountered maven, here's what specifying a single dependency looks like:

<dependency>
    <groupId>net.vz.mongodb.jackson</groupId>
    <artifactId>mongo-jackson-mapper</artifactId>
    <version>1.4.1</version>
</dependency>

In English, this means a dependency on the artifact mongo-jackson-mapper version 1.4.1 from the organisation net.vz.mongodb.jackson. The English explanation is shorter than the code! Maybe the Maven authors could have made things nicer by placing it on one line, which would be clearer too, for example something like this:

<dependency>net.vz.mongodb.jackson:mongo-jackson-mapper:1.4.1</dependency>

There are however some advantages to the more verbose model: the language (XML) that the DSL (the Maven pom.xml format) uses is able to fully express every element of the dependency, rather than the elements being defined in some other format that XML is unaware of. There are also other attributes (such as scope and classifier) that the more verbose solution can easily and unambiguously specify; these become very hard if it's just an arbitrary string of characters.

So here is the problem. I want a DSL to specify dependencies with as little noise as possible from the DSL, so that my dependency specification is short, unambiguous, and its format should be enforced by the language it's specified in. The language I'm going to use is Scala.

And just to ground this problem even more solidly in reality, this example is not contrived. SBT, the Scala Build Tool, uses Scala to specify its build configuration, including dependencies. The solution I'm going to show you is the solution that SBT uses. So by the end of this you will also understand a little about how to use SBT, which you will no doubt need to know as you delve into the world of Scala.

How would we do it with no weird Scala tricks?

First of all I will give a solution that uses language constructs that you should already be familiar with, using classes, fields, methods etc. If the following syntax makes no sense to you, you should probably do some reading on basic Scala syntax.

def groupId(groupId: String) = new GroupId(groupId)

class GroupId(val groupId: String) {
  def artifact(artifactId: String) = new Artifact(groupId, artifactId)
}

class Artifact(val groupId: String, val artifactId: String) {
  def version(version: String) = new VersionedArtifact(groupId, artifactId, version)
}

class VersionedArtifact(val groupId: String, val artifactId: String, val version: String) {
}

Here we have three classes, a GroupId, which has a method for creating an Artifact from that group id, which in turn has a method for creating a VersionedArtifact, which captures our dependency. There's also a factory method for creating the GroupId. To use it, we can do this:

groupId("net.vz.mongodb.jackson")
    .artifact("mongo-jackson-mapper")
    .version("1.4.1")

Already it's looking better than the XML version, but this could be done in Java too. Scala has a lot more to offer yet. As we go through the following lessons, it's important to remember that each change we make, we are still basically achieving the same as the above.

Lesson One: Scala methods can be made of operator characters

To Java developers such as myself, this is a bit weird, though in other languages it is quite common. There are some strict rules in Scala over exactly what a method name can be, but one of them is that it can be made entirely of operator characters (i.e. +, =, *, /, %, -). So, the first change that we're going to make to our code above is to get rid of the method names. It makes things more concise, but other than that it may seem like a weird first step. Bear with me.

def groupId(groupId: String) = new GroupId(groupId)

class GroupId(val groupId: String) {
  def %(artifactId: String) = new Artifact(groupId, artifactId)
}

class Artifact(val groupId: String, val artifactId: String) {
  def %(version: String) = new VersionedArtifact(groupId, artifactId, version)
}

class VersionedArtifact(val groupId: String, val artifactId: String, val version: String) {
}

And so now our code looks like this:

groupId("net.vz.mongodb.jackson").%("mongo-jackson-mapper").%("1.4.1")

Lesson Two: Scala lets you call methods without dots or braces

There are some strict rules governing this, including what happens when obvious ambiguities arise, and you can read about those in your own research. Simply put, if a method has only one parameter, you can call it without using the dot or the braces, replacing them with whitespace:

groupId "net.vz.mongodb.jackson" % "mongo-jackson-mapper" % "1.4.1"

This is called "infix" notation, which you may have also come across in other languages. We're looking much more concise now, yet the language is still enforcing the rules of what a dependency needs, and we still end up with a strongly typed VersionedArtifact. But we're not finished yet.

Lesson Three: Scala lets you implicitly convert any type to any other type

For me, this was the weirdest feature of Scala when I first learnt it. It sounded so cool and dangerous at the same time. In most languages, when you call a method on an object and that method doesn't exist, you get an error. For statically typed languages this will be a compile error; for dynamic languages it will be a runtime error. However, Scala is a bit different. When it can't find a method on an object (at compile time, because Scala is statically typed), it has an extra step before it gives up with an error.

Scala has a concept of implicit methods (and fields, but right now let's only worry about methods). When a method that you call doesn't exist on the object that you're trying to invoke it on, the compiler will check the current scope for any implicit methods that take the type of your object as an argument. If an implicit method's result has a method that matches the method you are trying to invoke, then Scala wraps your object in that implicit method call. Have I lost you? Let's look at some code:

implicit def groupId(groupId: String) = new GroupId(groupId)

So the only change I've made to the groupId method is that it is now implicit. When I define my dependency, I do this:

"net.vz.mongodb.jackson" % "mongo-jackson-mapper" % "1.4.1"

Notice that the call to groupId has completely disappeared. If at this point our newly learned Scala syntax is a bit overwhelming for you, maybe it would help to see what would happen if we applied the implicit rule first, before we got rid of our dots and braces, and before we changed our method names to percent signs:

"net.vz.mongodb.jackson".artifact("mongo-jackson-mapper").version("1.4.1")

What it looks like is that the String class has a method called artifact. But we know it doesn't, and the Scala compiler knows it doesn't. When it sees that, it says "are there any implicit methods available that accept a String as an argument, and return a type that has an artifact method?" And so it finds the groupId method, which matches that criteria, and converts the code to this:

groupId("net.vz.mongodb.jackson").artifact("mongo-jackson-mapper").version("1.4.1")

When I first learnt about this feature, I thought it was going to cause massive amounts of confusion. And, if used irresponsibly, it certainly can. But when used carefully, it provides a massive amount of power, for example in creating DSLs like the one above, or in adding methods to third party or legacy libraries.

On a side note, Scala actually uses this to add functional methods to the Java collections API, by providing implicit methods that convert Java collections to Scala collections. The confusion that could arise can be avoided by being careful to only import the implicit methods you need, where you need them. Scala allows you to put an import statement anywhere in code, so if you have one method in a class that needs to work with Java collections, you can import the implicit collection conversions into just that block of code. In the case of DSLs, their use is usually separated from the rest of your code by normal encapsulation, so a DSL for accessing a database would only appear in a DAO, where you're expecting it, so there would be no confusion.
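As a small sketch of what that scoped import looks like (using the JavaConversions object that ships with Scala):

import java.util.Arrays

def longWords(words: java.util.List[String]): List[String] = {
  // Import the conversions only in the block that needs them
  import scala.collection.JavaConversions._
  // The Java list picks up Scala's filter via an implicit conversion
  words.filter(_.length > 5).toList
}

println(longWords(Arrays.asList("scala", "implicit", "jazz")))  // List(implicit)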

Putting it into context

There is one last step, and that is to show what we actually do with the dependency we've declared. In Maven, our XML ends up being part of the rest of the overly verbose POM file. In SBT, there is a file called build.sbt that contains snippets of Scala which build up the configuration. Certain variables are available to those snippets, for example libraryDependencies, which holds the dependencies of the project. So adding my dependency to my SBT project means adding a line to the build.sbt file, like this:

libraryDependencies += "net.vz.mongodb.jackson" % "mongo-jackson-mapper" % "1.4.1"

You may notice here another use of an infix operator-named method: += is a method for adding elements, just as it is on mutable collections. Multiple dependencies can be added more tersely like this:

libraryDependencies ++= Seq(
    "net.vz.mongodb.jackson" % "mongo-jackson-mapper" % "1.4.1",
    "net.vz.mongodb.jackson" % "play-mongo-jackson-mapper" % "1.0.0"
)

So there we have it: a very terse way of specifying dependencies that is strongly typed, and so validated at compile time. I hope that the problem I have presented and solved is one that you have understood and can relate to, that you've understood how Scala can be used to help solve it, and that in learning how Scala helps solve it, you've learnt and understood some important features of the Scala language that you're now keen to apply to other problems that you have.

About

Hi! My name is James Roper, and I am a software developer with a particular interest in open source development and trying new things. I program in Scala, Java, Go, PHP, Python and Javascript, and I work for Lightbend as the architect of Kalix. I also have a full life outside the world of IT, enjoy playing a variety of musical instruments and sports, and currently I live in Canberra.