all that jazz

james' blog about scala and all that jazz


How to write a REST API in Play Framework

Posted 14 June 2013

A very common question that we get on the Play mailing list is: how do you write a REST API using Play Framework? There's no explicit documentation on it; you won't find a page in the Play documentation titled "Writing REST APIs". The question is often met with confusion, because to those who try to answer it, the real question is "how can you not write a REST API with Play? Play is all about REST."

So let me explain why we don't have a page on writing REST APIs. Play is fundamentally a framework for writing REST APIs, just like a fridge is a tool that is fundamentally for keeping food cold. When you buy a fridge, and you get the manual for a fridge, do you find a page titled "How to keep food cold using the fridge"? Probably not. You'll find instructions for installing the fridge, turning it on, setting the temperature, adjusting the shelves, but you won't find instructions that explicitly say how to keep the food cold. Why not? Because it's assumed that you understand, when you buy the fridge, that the way to keep food cool in it is by putting food in and closing the door. The whole manual is about how to keep food cold, since that's the fridge's fundamental function.

It's the same with Play. We assume first of all that you know what a REST API is. There's plenty of documentation out there on the web on what a REST API is, so there's no reason for us to repeat it in our documentation; a good place to start might be this StackOverflow question. As the first answer to that question says, "Really, what it's about is using the true potential of HTTP". Play provides everything you need to use the true potential of HTTP.

So we have documentation on writing routes in Scala and Java, we have documentation on sending results in Scala and Java, we have documentation on handling JSON in Scala and Java, and so on and so on. All this documentation is giving you the tools you need to implement what Play is fundamentally about, that is, HTTP, which when realised to its true potential, will be REST. There's nothing special about a REST API in Play; writing a REST API in Play means writing a web application in the way that Play is designed to be used. We could probably rename the Play documentation home page to "Writing a REST API in Play", and that would accurately describe what most of the Play documentation is about.

Let me repeat again, Play is all about realising the full potential of HTTP, which means Play is all about REST. You want to read about how to write a REST API in Play? Read the Play documentation, it's all about writing a REST API in Play.

Scaling Scala vs Java

Posted 2 November 2012

In my previous post I showed how it makes no sense to benchmark Scala against Java, and concluded by saying that when it comes to performance, the question you should be asking is "How will Scala help me when my servers are falling over from unanticipated load?" In this post I will seek to answer that, and show that indeed Scala is a far better language for building scalable systems than Java.

However, don't expect our journey to get there to be easy. For a start, while it's very easy to do micro benchmarks, showing how real world apps do or don't handle the loads that are put on them is very hard. It's hard to create an app that's small enough to demo and explain in a single blog post, yet big enough to actually show how real world apps behave under load, and it's also very hard to simulate real world loads. So I am going to take one small aspect of something that might go wrong in the real world, and show just one way in which Scala will help you where Java won't. Then I will explain that this is just the tip of the iceberg: there are far more situations, and far more features of Scala, that will help you in the real world.

An online store

For this exercise I have implemented an online store. The architecture of this store is in the diagram below:

As you can see, there is a payment service and a search service that the store talks to, and the store handles three types of requests: one for the index page that doesn't require going to any other services, one for making payments that uses the payments service, and another for searching the store's product list which uses the search service. The online store is the part of the system that I am going to be benchmarking; I will implement one version in Java and another in Scala, and compare them. The search and payment services won't change. Their actual implementations will be simple JSON APIs that return hard coded values, but they will each simulate a processing time of 20ms.

For the Java implementation of the store, I am going to keep it as simple as possible, using straight servlets to handle requests, Apache Commons HTTP client for making requests, and Jackson for JSON parsing and formatting. I will deploy the application to Tomcat, and configure Tomcat with the NIO connector, using the default connection limit of 10000 and thread pool size of 200.

For the Scala implementation I will use Play Framework 2.1, using the Play WS API which is backed by the Ning HTTP client to make requests, and the Play JSON API which is backed by Jackson to handle JSON parsing and formatting. Play Framework is built using Netty which has no connection limit, and uses Akka for thread pooling, and I have it configured to use the default thread pool size, which is one thread per CPU, and my machine has 4.

The benchmark I will be performing will be using JMeter. For each request type (index, payments and search) I will have 300 threads spinning in a loop making requests with a random 500-1500ms pause in between each request. This gives an average maximum throughput of 300 requests per second per request type, or 900 requests per second all up.
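The throughput ceiling follows from simple arithmetic: each looping thread completes one request per cycle, where a cycle is the pause plus the response time. Here is a small sketch of that calculation, assuming the pause is uniformly distributed so that its mean is 1000ms (the class and method names are illustrative and not part of the benchmark code):

```java
final class ThroughputEstimate {
    // Requests per second produced by `threads` looping clients, each pausing
    // meanPauseMs between requests and waiting responseMs for every response.
    static double requestsPerSecond(int threads, double meanPauseMs, double responseMs) {
        return threads * 1000.0 / (meanPauseMs + responseMs);
    }

    public static void main(String[] args) {
        double meanPauseMs = (500 + 1500) / 2.0; // random pause of 500-1500ms
        // Ceiling per request type when responses are instantaneous
        System.out.println(requestsPerSecond(300, meanPauseMs, 0));
        // Three request types all up
        System.out.println(3 * requestsPerSecond(300, meanPauseMs, 0));
    }
}
```

Any real response time lowers the figure: a hypothetical 100ms response would give 300000 / 1100, or roughly 273 requests per second per request type.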

So, let's have a look at the result of the Java benchmark:

On this graph I have plotted 3 metrics per request type. The median is the median request time. For the index page, this is next to nothing, for the search and payments requests, this is about 77ms. I have also plotted the 90% line, which is a common metric in web applications, it shows what 90% of the requests were under, and so gives a good idea of what the slow requests are like. This shows again almost nothing for the index page, and 116ms for the search and payments requests. The final metric is the throughput, which shows number of requests per second that were handled. We are not too far off the theoretical maximum, with the index showing 290 requests per second, and the search and payments requests coming through at about 270 requests per second. These results are good, our Java service handles the load we are throwing at it without a sweat.

Now let's take a look at the Scala benchmark:

As you can see, it's identical to the Java results. This is not surprising, since both the Java and the Scala implementations of the online store are doing absolutely minimal work code wise, most of the processing time is going in to making requests on the remote services.

Something goes wrong

So, we've seen two happy implementations of the same thing in Scala and Java, shrugging off the load I give them. But what happens when things aren't so fine and dandy? What happens if one of the services that they are talking to goes down? Let's say the search service starts taking 30 seconds to respond, after which point it returns an error. This is not an unusual failure situation, particularly if you're load balancing through a proxy, the proxy tries to connect to the service, and fails after 30 seconds, giving you a gateway error. Let's see how our applications handle the load I throw at them now. We would expect the search request to take at least 30 seconds to respond, but what about the others? Here's the Java results:

Well, we no longer have a happy app at all. The search requests are naturally taking a long time, but the payments service is now taking an average of 9 seconds to respond, and the 90% line is at 20 seconds. Not only that, but the index page is similarly impacted - users are not going to wait that long for the home page to show up when they've just browsed to your site. And the throughput of each has gone down to 30 requests per second. This is not good: because your search service went down, your whole site is now practically unusable, and you will soon start losing customers and money.

So how does our Scala app fare? Let's find out:

Now before I say anything else, let me point out that I've bounded the response time to 160ms - the search requests are actually taking about 30 seconds to respond, but on the graph, with 30 seconds next to the other values, they hardly register a line a pixel high. So what we can see here is that while search is unusable, our payments and index request response times and throughput are unchanged. Obviously, customers aren't going to be happy with not being able to do searches, but at least they can still use other parts of your site, see your home page with specials, and even still make payments for items. And hey, Google isn't down, they can always use Google to search your site. So you might lose some business, but the impact is limited.

So, in this benchmark, we can see that Scala wins hands down. When things start to go wrong, a Scala application will take it in its stride, giving you the best it can, while a Java application will likely just fall over.
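One plausible explanation for the Java numbers, not measured directly in this benchmark, is thread starvation: with blocking IO, every in-flight search request holds one of Tomcat's 200 threads for the full 30 seconds, so index and payment requests have to queue behind them. Here is a scaled-down sketch of that effect using a plain ExecutorService (the pool sizes, delays and class name are made up for illustration):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class BlockingPoolDemo {
    // Submits `slowCalls` blocking tasks (stand-ins for requests stuck on a slow
    // remote service), then measures how long a trivial task takes to complete.
    static long fastTaskLatencyMs(int poolSize, int slowCalls, long slowMs) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            for (int i = 0; i < slowCalls; i++) {
                pool.submit(() -> {
                    try {
                        Thread.sleep(slowMs); // the thread is held while "waiting on IO"
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            long start = System.nanoTime();
            pool.submit(() -> { }).get(); // the "index page" request
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Every thread is busy: the fast task waits roughly slowMs
        System.out.println("saturated pool: " + fastTaskLatencyMs(2, 2, 300) + "ms");
        // One thread is free: the fast task runs almost immediately
        System.out.println("spare thread:   " + fastTaskLatencyMs(3, 2, 300) + "ms");
    }
}
```

With asynchronous IO the slow calls would not hold a thread while waiting, which is why a pool of only 4 threads can keep serving the other request types.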

But I can do that in Java

Now starts the bit where I counter the many anticipated criticisms that people will make of this benchmark. And the first, and most obvious one, is that in my Scala solution I used asynchronous IO, whereas in my Java solution I didn't, so they can't be compared. It is true, I could have implemented an asynchronous solution in Java, and in that case the Java results would have been identical to the Scala results. However, while I could have done that, Java developers don't do that. It's not that they can't, it's that they don't. I have written a lot of webapps in Java that make calls to other systems, and very rarely, and only in very special circumstances, have I ever used asynchronous IO. And let me show you why.

Let's say you have to do a series of calls on a series of remote services, each one depending on data returned from the previous. Here's a good old fashioned synchronous solution in Java:

User user = getUserById(id);
List<Order> orders = getOrdersForUser(user.email);
List<Product> products = getProductsForOrders(orders);
List<Stock> stock = getStockForProducts(products);

The above code is simple, easy to read, and feels completely natural for a Java developer to write. For completeness, let's have a look at the same thing in Scala:

val user = getUserById(id)
val orders = getOrdersForUser(user.email)
val products = getProductsForOrders(orders)
val stock = getStockForProducts(products)

Now, let's have a look at the same code, but this time assuming we are making asynchronous calls and returning the results in promises. What does it look like in Java?

Promise<User> user = getUserById(id);
Promise<List<Order>> orders = user.flatMap(new Function<User, Promise<List<Order>>>() {
  public Promise<List<Order>> apply(User user) {
    return getOrdersForUser(user.email);
  }
});
Promise<List<Product>> products = orders.flatMap(new Function<List<Order>, Promise<List<Product>>>() {
  public Promise<List<Product>> apply(List<Order> orders) {
    return getProductsForOrders(orders);
  }
});
Promise<List<Stock>> stock = products.flatMap(new Function<List<Product>, Promise<List<Stock>>>() {
  public Promise<List<Stock>> apply(List<Product> products) {
    return getStockForProducts(products);
  }
});

So firstly, the above code is not readable; in fact it's much harder to follow. The ratio of noise to code that actually does something is massive, and hence it's very easy to make mistakes and miss things. Secondly, it's tedious to write; no developer wants to write code that looks like that, and I hate doing it. Any developer that wants to write their whole app like that is insane. And finally, it just doesn't feel natural: it's not the way you do things in Java, it's not idiomatic, and it doesn't play well with the rest of the Java ecosystem, since third party libraries don't integrate well with this style. As I said before, Java developers can write code that does this, but they don't, and as you can see, they don't for good reason.

So let's take a look at the asynchronous solution in Scala:

for {
  user <- getUserById(id)
  orders <- getOrdersForUser(user.email)
  products <- getProductsForOrders(orders)
  stock <- getStockForProducts(products)
} yield stock

In contrast to the Java asynchronous solution, this solution is completely readable, just as readable as the Scala and Java synchronous solutions. And this isn't just some weird Scala feature that most Scala developers never touch, this is how a typical Scala developer writes code every day. Scala libraries are designed to work using these idioms, it feels natural, the language is working with you. It's fun to write code like this in Scala!

This post is not about how with one language you can write a highly tuned app for performance that's faster than the same app written in another language highly tuned for performance. This post is about how Scala helps you write applications that are scalable by default, using natural, readable and idiomatic code. Just like a ball in lawn bowls has a bias, Scala has a bias to helping you write scalable applications, where Java makes you swim upstream.

But scaling means so much more than that

The example I've provided of Scala scaling well where Java doesn't is a very specific example, but then what situation where your app is failing under high load isn't? Let me give a few other examples of where Scala's much nicer asynchronous IO support helps you to write scalable code:

Using Akka, you can easily define actors for different types of requests, and allocate them different resource limits. So if certain parts of your single application start struggling or receiving unanticipated load, those parts may stop responding, but the rest of your app can stay healthy.
Scala, Play and Akka make handling single requests using multiple threads running in parallel doing different operations incredibly simple, allowing you to have requests that do a lot in very little time. Klout wrote an excellent article about how they did just that in their API.
Because asynchronous IO is so simple, offloading processing onto other machines can be safely done without tying up threads on the first machine.

Java 8 will make asynchronous IO simple in Java

Java 8 is probably going to include support for closures of some sort, which is great news for the Java world, especially if you want to do asynchronous IO. However, the syntax still won't be anywhere near as readable as the Scala code I showed above. And when will Java 8 be released? Java 7 was released last year, and it took 5 years to release that. Java 8 is scheduled for summer 2013, but even if it arrives on schedule, how long will it take for the ecosystem to catch up? And how long will it take for Java developers to switch from a synchronous to an asynchronous mindset? In my opinion, Java 8 is too little too late.
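For comparison, here is a sketch of the same chain of calls using lambdas and java.util.concurrent.CompletableFuture, both of which did ship in Java 8. The four service methods are hypothetical stubs that return already-completed futures, and the domain types are reduced to strings and integers to keep the example self-contained:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class AsyncChainSketch {
    static final class User {
        final String email;
        User(String email) { this.email = email; }
    }

    // Hypothetical stubs: a real implementation would complete these futures
    // from non-blocking IO callbacks rather than returning canned values.
    static CompletableFuture<User> getUserById(long id) {
        return CompletableFuture.completedFuture(new User("user" + id + "@example.com"));
    }
    static CompletableFuture<List<String>> getOrdersForUser(String email) {
        return CompletableFuture.completedFuture(Arrays.asList("order-1", "order-2"));
    }
    static CompletableFuture<List<String>> getProductsForOrders(List<String> orders) {
        return CompletableFuture.completedFuture(Arrays.asList("widget", "gadget"));
    }
    static CompletableFuture<List<Integer>> getStockForProducts(List<String> products) {
        return CompletableFuture.completedFuture(Arrays.asList(3, 0));
    }

    // thenCompose plays the role of flatMap: each step starts once the
    // previous future has completed, without blocking a thread in between.
    static CompletableFuture<List<Integer>> stockForUser(long id) {
        return getUserById(id)
            .thenCompose(user -> getOrdersForUser(user.email))
            .thenCompose(orders -> getProductsForOrders(orders))
            .thenCompose(products -> getStockForProducts(products));
    }

    public static void main(String[] args) {
        System.out.println(stockForUser(1L).join()); // prints [3, 0]
    }
}
```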

So this is all about asynchronous IO?

So far all I've talked about and shown is how easy Scala makes asynchronous IO, and how that helps you scale. But it doesn't stop there. Let me pick another feature of Scala, immutability.

When you start using multiple threads to process single requests, you start sharing state between those threads. And this is where things get very messy, because the world of shared state in a computer system is a crazy world where impossible things happen. It's a world of deadlocks, a world of updating memory in one thread, but another thread not seeing that change, a world of race conditions, and a world of performance bottlenecks because you overeagerly marked some methods as synchronized.

However, it's not that bad, because there is a very simple solution, make all your state immutable. If all your state is immutable, then none of the above problems can happen. And this is again where Scala helps you big time, because in Scala, things are immutable by default. The collection APIs are immutable, you have to explicitly ask for a mutable collection in order to get mutable collections.

Now in Java, you can make things immutable. There are some libraries that help you (albeit clumsily) to work with immutable collections. But it's so easy to accidentally forget to make something immutable. The Java API and language itself don't make working with immutable structures easy, and if you're using a third party library, it's highly likely that it's not using immutable structures, and often requires you to use mutable structures; JPA, for example, requires this.
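As a concrete illustration using only the JDK (no third party library): Collections.unmodifiableList gives you a read-only view rather than an immutable list, so unless you also take a defensive copy, whoever holds the original list can still change what you see. The class and method names below are illustrative:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class UnmodifiableViewDemo {
    // Read-only view: callers cannot write to it, but it still tracks `source`.
    static List<String> readOnlyView(List<String> source) {
        return Collections.unmodifiableList(source);
    }

    // Defensive copy behind a read-only view: later changes to `source` are not visible.
    static List<String> snapshot(List<String> source) {
        return Collections.unmodifiableList(new ArrayList<>(source));
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add("james");

        List<String> view = readOnlyView(names);
        List<String> copy = snapshot(names);

        names.add("someone else"); // mutation through the original reference

        System.out.println(view.size()); // 2: the view changed underneath us
        System.out.println(copy.size()); // 1: the snapshot did not

        try {
            view.add("nope");
        } catch (UnsupportedOperationException e) {
            // The compiler allowed the call; the failure only shows up at runtime
            System.out.println("view rejects writes");
        }
    }
}
```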

Let's have a look at some code. Here is an immutable class in Scala:

case class User(id: Long, name: String, email: String)

That structure is immutable. Moreover, it automatically generates accessors for the properties. Let's look at the corresponding Java:

public class User {
  private final long id;
  private final String name;
  private final String email;

  public User(long id, String name, String email) {
    this.id = id;
    this.name = name;
    this.email = email;
  }

  public long getId() {
    return id;
  }

  public String getName() {
    return name;
  }

  public String getEmail() {
    return email;
  }
}

That's an enormous amount of code! And what if I add a new property? I have to add a new parameter to my constructor which will break existing code, or I have to define a second constructor. In Scala I can just do this:

case class User(id: Long, name: String, email: String, company: Option[Company] = None)

All my existing code that calls that constructor will still work. And what about when this object grows to have 10 items in the constructor, constructing it becomes a nightmare! A solution to this in Java is to use the builder pattern, which more than doubles the amount of code you have to write for the object. In Scala, you can name the parameters, so it's easy to see which parameter is which, and they don't have to be in the right order. But maybe I might want to just modify one property. This can be done in Scala like this:

case class User(id: Long, name: String, email: String, company: Option[Company] = None) {
  def copy(id: Long = id, name: String = name, email: String = email, company: Option[Company] = company) = User(id, name, email, company)
}

val james = User(1, "James", "james@jazzy.id.au")
val jamesWithCompany = james.copy(company = Some(Company("Typesafe")))

The above code is natural, it's simple, it's readable, it's how Scala developers write code every day, and it's immutable. It is aptly suited to concurrent code, and allows you to safely write systems that scale. The same can be done in Java, but it's tedious, and not at all a joy to write. I am a big advocate of immutable code in Java, and I have written many immutable classes in Java, and it hurts, but it's the lesser of two hurts. In Scala, it takes more code to use mutable objects than to use immutable. Again, Scala is biased towards helping you scale.
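To make the comparison concrete, a Java equivalent of copy has to be written by hand, one method per property you want to be able to change. Here is a sketch, with Company reduced to a String and null standing in for None (both simplifications made for this example):

```java
final class ImmutableUser {
    private final long id;
    private final String name;
    private final String email;
    private final String company; // null means "no company"

    ImmutableUser(long id, String name, String email, String company) {
        this.id = id;
        this.name = name;
        this.email = email;
        this.company = company;
    }

    // Second constructor so that existing three-argument callers keep compiling
    ImmutableUser(long id, String name, String email) {
        this(id, name, email, null);
    }

    // Hand-written stand-in for Scala's copy(company = ...)
    ImmutableUser withCompany(String newCompany) {
        return new ImmutableUser(id, name, email, newCompany);
    }

    String getName() { return name; }
    String getCompany() { return company; }

    public static void main(String[] args) {
        ImmutableUser james = new ImmutableUser(1, "James", "james@jazzy.id.au");
        ImmutableUser jamesWithCompany = james.withCompany("Typesafe");
        System.out.println(james.getCompany());            // null: the original is untouched
        System.out.println(jamesWithCompany.getCompany()); // Typesafe
    }
}
```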

Conclusion

I cannot possibly go into all the ways in which Scala helps you scale where Java doesn't. But I hope I have given you a taste of why Scala is on your side when it comes to writing scalable systems. I've shown some concrete metrics, I've compared Java and Scala solutions for writing scalable code, and I've shown, not that Scala systems will always scale better than Java systems, but rather that Scala is the language that is on your side when writing scalable systems. It is biased towards scaling, and it encourages practices that help you scale. Java, in contrast, makes it difficult for you to implement these practices; it works against you.

If you're interested in my code for the online store, you can find it in this GitHub repository. The numbers from my performance test can be found in this spreadsheet.

Benchmarking Scala against Java

Posted 16 October 2012

A question recently came up at work about benchmarks between Java and Scala. Maybe you came across my blog post because you too are wanting to know which is faster, Java or Scala. Well I'm sorry to say this, but if that is you, you are asking the wrong question. In this post, I will show you that Scala is faster than Java. After that, I will show you why the question was the wrong question and why my results should be ignored. Then I will explain what question you should have asked.

The benchmark

Today we are going to choose a very simple algorithm to benchmark, the quick sort algorithm. I will provide implementations both in Scala and Java. Then with each I will sort a list of 100000 elements 100 times, and see how long each implementation takes to sort it. So let's start off with Java:

    public static void quickSort(int[] array, int left, int right) {
        if (right <= left) {
            return;
        }
        int pivot = array[right];
        int p = left;
        int i = left;
        while (i < right) {
            if (array[i] < pivot) {
                if (p != i) {
                    int tmp = array[p];
                    array[p] = array[i];
                    array[i] = tmp;
                }
                p += 1;
            }
            i += 1;
        }
        array[right] = array[p];
        array[p] = pivot;
        quickSort(array, left, p - 1);
        quickSort(array, p + 1, right);
    }

Timing this, sorting a list of 100000 elements 100 times on my 2012 MacBook Pro with Retina Display, it takes 852ms. Now the Scala implementation:

  def sortArray(array: Array[Int], left: Int, right: Int): Unit = {
    if (right <= left) {
      return
    }
    val pivot = array(right)
    var p = left
    var i = left
    while (i < right) {
      if (array(i) < pivot) {
        if (p != i) {
          val tmp = array(p)
          array(p) = array(i)
          array(i) = tmp
        }
        p += 1
      }
      i += 1
    }
    array(right) = array(p)
    array(p) = pivot
    sortArray(array, left, p - 1)
    sortArray(array, p + 1, right)
  }

It looks very similar to the Java implementation, slightly different syntax, but in general, the same. And the time for the same benchmark? 695ms. No benchmark is complete without a graph, so let's see what that looks like visually:

So there you have it. Scala is about 20% faster than Java. QED and all that.

The wrong question

However this is not the full story. No micro benchmark ever is. So let's start off with answering the question of why Scala is faster than Java in this case. Now Scala and Java both run on the JVM. Their source code both compiles to bytecode, and from the JVM's perspective, it doesn't know if one is Scala or one is Java; it's all just bytecode to the JVM. If we look at the bytecode of the compiled Scala and Java code above, we'll notice one key thing: in the Java code, there are two recursive invocations of the quickSort routine, while in Scala, there is only one. Why is this? The Scala compiler supports an optimisation called tail call recursion, where if the last statement in a method is a recursive call, it can get rid of that call and replace it with an iterative solution. So that's why the Scala code is so much quicker than the Java code: it's this tail call recursion optimisation. You can turn this optimisation off when compiling Scala code; when I do that, it takes 827ms, still a little bit faster but not much. I don't know why Scala is still faster without tail call recursion.
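To see what that optimisation amounts to, here is the Java quickSort with its final recursive call replaced by a loop, done by hand. This is a sketch of the idea only, not the bytecode that the Scala compiler actually emits, and the class name is made up for the example:

```java
import java.util.Arrays;

final class TailCallSketch {
    static void quickSort(int[] array, int left, int right) {
        // The loop stands in for the second recursive call: instead of calling
        // quickSort(array, p + 1, right), we update `left` and go around again.
        while (right > left) {
            int pivot = array[right];
            int p = left;
            for (int i = left; i < right; i++) {
                if (array[i] < pivot) {
                    int tmp = array[p];
                    array[p] = array[i];
                    array[i] = tmp;
                    p += 1;
                }
            }
            array[right] = array[p];
            array[p] = pivot;
            quickSort(array, left, p - 1); // the one remaining recursive call
            left = p + 1;                  // replaces the tail call
        }
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 2};
        quickSort(data, 0, data.length - 1);
        System.out.println(Arrays.toString(data)); // [1, 2, 3]
    }
}
```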

This brings me to my next point, apart from a couple of extra niche optimisations like this, Scala and Java both compile to bytecode, and hence have near identical performance characteristics for comparable code. In fact, when writing Scala code, you tend to use a lot of exactly the same libraries between Java and Scala, because to the JVM it's all just bytecode. This is why benchmarking Scala against Java is the wrong question.

But this still isn't the full picture. My implementation of quick sort in Scala was not what we'd call idiomatic Scala code. It's implemented in an imperative fashion, very performance focussed - which it should be, being code that is used for a performance benchmark. But it's not written in a style that a Scala developer would write day to day. Here is an implementation of quick sort that is in that idiomatic Scala style:

  def sortList(list: List[Int]): List[Int] = list match {
    case Nil => Nil
    case head :: tail => sortList(tail.filter(_ < head)) ::: head :: sortList(tail.filter(_ >= head))
  }

If you're not familiar with Scala, this code may seem overwhelming at first, but trust me, after a few weeks of learning the language, you would be completely comfortable reading this, and would find it far clearer and easier to maintain than the previous solution. So how does this code perform? Well the answer is terribly, it takes 13951ms, 20 times longer than the other Scala code. Obligatory chart:

So am I saying that when you write Scala in the "normal" way, your code's performance will always be terrible? Well, that's not quite how Scala developers write code all the time. They aren't dumb; they know the performance consequences of their code.

The key thing to remember is that most problems that developers solve are not quick sort; they are not computation heavy problems. A typical web application, for example, is concerned with moving data around, not doing complex algorithms. The computation in a piece of Java code that a web developer writes to process a web request might take 1 microsecond out of the entire request to run - that is, one millionth of a second. If the equivalent Scala code takes 20 microseconds, that's still only one fifty thousandth of a second. The whole request might take 20 milliseconds to process, including going to the database a few times. Using idiomatic Scala code would therefore increase the response time by 0.1%, which is practically nothing.
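The 0.1% figure can be checked with a few lines of arithmetic: the extra 19 microseconds are divided by the 20000 microseconds of the whole request, which comes to roughly 0.095%. As a sketch (the class and method names are illustrative):

```java
final class OverheadEstimate {
    // Percentage increase in total request time when one step gets slower.
    static double percentIncrease(double requestMicros, double oldStepMicros, double newStepMicros) {
        return (newStepMicros - oldStepMicros) / requestMicros * 100.0;
    }

    public static void main(String[] args) {
        // A 20ms request whose computation goes from 1 microsecond to 20 microseconds
        System.out.printf("%.3f%%%n", percentIncrease(20_000, 1, 20));
    }
}
```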

So, Scala developers, when they write code, will write it in the idiomatic way. As you can see above, the idiomatic way is clear and concise. It's easy to maintain, much easier than Java. However, when they come across a problem that they know is computationally expensive, they will revert to writing in a style that is more like Java. This way, they have the best of both worlds, with the easy to maintain idiomatic Scala code for most of their code base, and the well performing Java like code where the performance matters.

The right question

So what question should you be asking, when comparing Scala to Java in the area of performance? The answer is in Scala's name. Scala was built to be a "Scalable language". As we've already seen, this scalability does not come in micro benchmarks. So where does it come? This is going to be the topic of a future blog post I write, where I will show some closer to real world benchmarks of a Scala web application versus a Java web application, but to give you an idea, the answer comes in how the Scala syntax and the libraries provided by the Scala ecosystem are aptly suited for the paradigms of programming that are required to write scalable fault tolerant systems. The exact equivalent bytecode could be implemented in Java, but it would be a monstrous nightmare of impossible to follow anonymous inner classes, with a constant fear of accidentally mutating the wrong shared state, and a good dose of race conditions and memory visibility issues.

To put it more concisely, the question you should be asking is "How will Scala help me when my servers are falling over from unanticipated load?" This is a real world question that I'm sure any IT professional with any sort of real world experience would love an answer to. Stay tuned for my next blog post.

Configuring Tomcat to use Apache SSL certificates

Posted 21 January 2010

In a typical SSL configuration for a Tomcat web server, Apache sits in front of Tomcat as a reverse proxy, and does the SSL. This was the configuration of some systems I work with. There are a number of reasons why this configuration is used, the primary one being that Apache's SSL implementation is much faster than Tomcat's. So it's not often that you would go from using this configuration to switching to a Tomcat only configuration, but that's exactly what I just did.

The reason for doing this is that we wanted to use Tomcat's NIO connector, in order to use Tomcat's comet capabilities. Setting up SSL with Tomcat is something that I had never done before, I had heard though that it was not easy. After trying to do it without really understanding what I was doing, I found that it really wasn't easy. The problem was that everything I looked at on the web talked about using the Java keytool to generate a key, so you could send a certificate signing request to your trusted authority to sign. The thing is, I already had a key, and a certificate, and the Java keytool utility that does all this key manipulation has no way of importing an existing key.

Eventually I found this utility, and was able to get things working. But, as often happens when solving these problems, I then read back over the Tomcat SSL HowTo, and now with more of an understanding of what I was doing I found a much simpler and easier way of getting Tomcat to use my existing certificate.

The trick is, rather than using a JKS repository, which is the native Java SSL certificate store and what most of the documentation on the web talks about, to use a PKCS12 repository, which is an internet standard and can be manipulated using standard tools such as openssl. The openssl command requires three files, which are easy to find from your Apache SSL configuration: one is the private key file, another is the certificate, and finally the certificate signer chain. The command to run is:

openssl pkcs12 -export -in mycert.crt -inkey mykey.key \
                        -out mycert.p12 -name tomcat -CAfile myCA.crt \
                        -caname root -chain

The name and caname arguments can be anything, they're just convenient aliases to allow later manipulation of the file. The command will prompt you for a password, this password gets set as the keystorePass in the Tomcat connector configuration. The keystoreType must be set to PKCS12. Here is my Tomcat configuration:

    <Connector port="8443" maxHttpHeaderSize="8192"
               maxThreads="150" enableLookups="false" acceptCount="100"
               connectionTimeout="20000" disableUploadTimeout="true"
               protocol="org.apache.coyote.http11.Http11NioProtocol"
               SSLEnabled="true" scheme="https" secure="true" clientAuth="false" sslProtocol="TLS"
               keystoreFile="/path/to/mycert.p12"
               keystoreType="PKCS12" keystorePass="tomcat"/>
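
Before restarting Tomcat, it can be worth checking that the JVM is able to open the resulting file. Here is a minimal sketch using the standard java.security.KeyStore API; the path and password defaults are the placeholders from the configuration above, and the class name is made up for this example:

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.KeyStore;
import java.util.Collections;
import java.util.List;

final class Pkcs12Check {
    // Loads a PKCS12 keystore from the stream and returns the aliases it contains.
    static List<String> aliases(InputStream in, char[] password) {
        try {
            KeyStore store = KeyStore.getInstance("PKCS12");
            store.load(in, password);
            return Collections.list(store.aliases());
        } catch (Exception e) {
            throw new IllegalStateException("Could not read PKCS12 keystore", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder defaults; pass the real path and password as arguments
        String path = args.length > 0 ? args[0] : "/path/to/mycert.p12";
        char[] password = (args.length > 1 ? args[1] : "tomcat").toCharArray();
        try (InputStream in = new FileInputStream(path)) {
            // The list should include the alias given to openssl via -name
            System.out.println(aliases(in, password));
        }
    }
}
```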

Java Concurrency and Volatile

Posted 24 April 2009

The volatile keyword is a keyword that very few Java developers know the meaning of, let alone when they should use it. The reason for this, I believe, is that the reason why it's needed is such a complex topic that unless you've studied in detail the way CPUs use registers, cache, and the way the JVM uses stack frames, it's impossible to understand why it's needed. The other reason I think, is that it is difficult to demonstrate the consequences of not using it. That is why I came up with this little puzzle, to highlight how important the volatile keyword is.

For this demonstration, you will need a multi processor Linux 2.6 or OpenSolaris system, with Java 5 or above. It will not work on Mac or Windows. If you know why it doesn't work on Mac or Windows, please leave a comment explaining, I'd really like to know. What this does highlight though is just how complex Java concurrency issues are.

So on to the puzzle. Without executing it, try and work out what will happen when you run the following code:

public class ConcurrencyFun implements Runnable
{
    private String str;
    void setStr(String str)
    {
        this.str = str;
    }
    public void run()
    {
        while (str == null);
        System.out.println(str);
    }
    public static void main(String[] args) throws Exception
    {
        ConcurrencyFun fun = new ConcurrencyFun();
        new Thread(fun).start();
        Thread.sleep(1000);
        fun.setStr("Hello world!!");
    }
}

Most people would guess that the above code would wait for about one second, print the text "Hello world!!", and then exit. The spawned thread busy waits for str to not be null, and then prints it. The main thread, after starting the spawned thread, waits for one second, and then sets str to be "Hello world!!". Simple, right?

Now try running it (remember, only on a multi processor Linux 2.6 or Solaris system). What actually happens? On my machine, the program never exits. Why is this?

The reason is that the JVM is free to make its own copy of the str pointer available to each thread that uses it. This could come in many forms. It could be that the pointer is loaded into a register and is continually read from that register. This is what is most likely happening in our case. It could be that the pointer is loaded into the CPU cache, and never expired, even after update. Or, it is also possible that the JVM will make a copy of the pointer in the thread's stack frame, to allow for more efficient memory access. Whether you understand anything I've just said or not, the point is that changes to the str field may not necessarily be seen by all threads accessing it; in our case, the change will never be seen by the spawned thread.

This is where the volatile keyword comes in. The volatile keyword tells the JVM that any writes to that field must be viewable by all threads. This means that the compiled machine code may not read the variable into a register and use that multiple times; it must read it from memory every time. It also must not read it from the CPU cache; it must make sure that every read comes straight from memory. And finally, it stops the JVM from creating a local copy of the field in the thread's stack frame.

So, adding the volatile keyword, like so:

public class ConcurrencyFun implements Runnable
{
    private volatile String str;
    void setStr(String str)
    {
        this.str = str;
    }
    public void run()
    {
        while (str == null);
        System.out.println(str);
    }
    public static void main(String[] args) throws Exception
    {
        ConcurrencyFun fun = new ConcurrencyFun();
        new Thread(fun).start();
        Thread.sleep(1000);
        fun.setStr("Hello world!!");
    }
}

results in the expected behaviour happening: the program waits one second, prints out "Hello world!!" and then exits.
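
If you would rather have a checkable outcome than a hung terminal, one variation is to join the spawned thread with a timeout and report whether it finished. Here is a sketch along those lines (the class name, the daemon flag and the timeouts are additions for this example):

```java
final class VolatileCheck implements Runnable {
    private volatile String str; // remove volatile to try to reproduce the hang

    void setStr(String str) {
        this.str = str;
    }

    public void run() {
        while (str == null); // busy wait, as in the puzzle
    }

    // Returns true if the spawned thread saw the write and exited in time.
    static boolean finishesWithin(long timeoutMs) {
        VolatileCheck check = new VolatileCheck();
        Thread thread = new Thread(check);
        thread.setDaemon(true); // so a stuck thread cannot keep the JVM alive
        thread.start();
        try {
            Thread.sleep(100);
            check.setStr("Hello world!!");
            thread.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !thread.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(finishesWithin(2000));
    }
}
```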

The complexity of concurrency

There are other ways to make the above code work. For example, if in the while loop, you add some code that prints something out, you will find that it works. My guess at the reason for this is that the register storing str ends up getting used for something else, and so on each iteration, str gets read from memory. Note that this is not a real fix; it is still possible for problems to occur, and indeed on some architectures the program still will not exit. Another thing that will work is to invoke Java with the -Xint argument. This disables machine code compilation, and hence makes concurrency issues arising from registers and CPU caches much less likely. But again, it's not a solution. Of the options shown here, using the volatile keyword is the only one that guarantees that it will work, every time, on every platform.