all that jazz

james' blog about scala and all that jazz

Eventual Consistency is not safe and that's ok

A number of years ago, Peter Bailis wrote an excellent post titled Safety and Liveness: Eventual Consistency Is Not Safe. This post is short and to the point, and is a very salient reality check for anyone who is using eventual consistency. With the prevalence of eventually consistent architectures and systems today, I recommend that all developers read it. In spite of the post’s age, it is just as important today as it was when it was written.

A member of our marketing department recently came across the post, and sent an email to our engineers because he was somewhat confused about it. At Lightbend, we are advocates of eventual consistency, and much of our tech embraces it. However, he was alarmed by the statement that eventual consistency is not safe. This statement led him to the conclusion that the post was saying that eventual consistency must be bad.

I don’t think this is an unreasonable conclusion for someone not familiar with eventual consistency to reach, whether that’s a marketing person or a developer who is completely new to eventual consistency. And so I think the statement needs some further explanation.

So to start with, an analogy. The sun is unsafe. If you were to somehow “stand on” the sun, before you could begin to comprehend your environment, you would cease to exist in any recognisable form. This property of the sun, that standing on it and existing are incompatible, is not a bad thing, in fact it’s not a good thing either, it’s neither bad nor good, it’s just a property. It doesn’t become a good or bad thing until you try to rely on the property or its absence. It simply means, don’t stand on the sun.

Likewise, the unsafeness of eventual consistency is not a bad thing, nor is it a good thing, it’s just a property.

Now the bulk of Peter’s post was about how eventual consistency alone is not useful, it must be combined with other guarantees, such as what versions of the data can be returned when it’s not consistent, and what version of the data will eventually be returned. These guarantees help to make an eventually consistent system more safe, but they don’t stop it from being unsafe.

So what does this unsafety look like in practice, and why isn’t it necessarily a bad thing?

Let’s consider a very common eventually consistent architectural practice - Command Query Responsibility Segregation (CQRS). A CQRS implementation is usually eventually consistent, and offers further guarantees such as that the read side will always return some version of the data that matches some version in the write side’s history, and that the convergent version that will eventually be returned will always be the most recent version of the write side.

So, what is not safe about it? Let’s say you have a blog that supports full text search. The search is implemented using a read side view of the data that puts all the posts into a natural text index. Since this is using CQRS, the index is updated asynchronously, and it may take some time for the new post to reach the read side. This means, immediately after publishing a new post, let’s say it’s a post about eventual consistency, if you search for “eventual consistency”, the post won’t come back in the search results. This is in spite of the fact that you can browse to the post, load it, and see the words eventual consistency in the text of the post. The system is in an inconsistent state, when we load the post it has the words “eventual consistency”, when we ask the search index for all the posts with the words “eventual consistency” in them, the index doesn’t return that post. This is the unsafety, it is not safe to rely on every part of the system returning the same answer to whether our post contains the words “eventual consistency”.
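
A minimal sketch of that scenario in plain Scala, with no particular CQRS framework assumed. The names Blog, publish, load, search and syncIndex are hypothetical, and the asynchronous update is modelled as an explicit queue that is drained later rather than a real background process.

```scala
import scala.collection.mutable

// Toy blog: the write side stores posts, the read side is a search index
// that is only updated when the queued changes are processed.
final class Blog {
  private val posts = mutable.Map.empty[String, String]        // write side
  private val pending = mutable.Queue.empty[(String, String)]  // published but not yet indexed
  private val index = mutable.Map.empty[String, Set[String]]   // read side: word -> post ids

  def publish(id: String, text: String): Unit = {
    posts(id) = text
    pending += (id -> text) // the index will catch up later
  }

  def load(id: String): Option[String] = posts.get(id)

  def search(word: String): Set[String] =
    index.getOrElse(word.toLowerCase, Set.empty[String])

  // Stand-in for the asynchronous read side processor.
  def syncIndex(): Unit =
    while (pending.nonEmpty) {
      val (id, text) = pending.dequeue()
      for (word <- text.toLowerCase.split("\\W+") if word.nonEmpty)
        index(word) = index.getOrElse(word, Set.empty[String]) + id
    }
}
```

Immediately after publish, load returns the post while search("eventual") is still empty. Only once syncIndex() has run do the two parts of the system agree.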

So, now we’ve seen an example of the unsafety of eventual consistency, why is this not bad? The answer is simply, there is no requirement for the text search to be updated immediately after the blog post is published. There is a requirement that it be eventually updated, and in a somewhat timely manner, and CQRS allows us to meet that requirement. But no one cares if, for a few seconds or even minutes after publishing a blog post, the text search on their blog doesn’t return the new post. So the unsafety of eventual consistency is not bad because we’re not relying on it to be safe.

How does this play out in a broader context? It’s easy to see how safety is not a requirement for a search index, but that’s a very specific scenario. Is this one of the few places where eventual consistency can be used? The answer to that is necessarily no: if you’re writing a sufficiently large system, with even modest scaling and availability requirements by today’s standards, you will probably find yourself having to use eventual consistency in order to scale your system and maintain availability. The upshot is that you must embrace the unsafeness property, and not put strong consistency requirements on the parts of your system that are eventually consistent. You must design your system to handle the unsafeness.

There are many ways to do this - for example, by using asynchronous message passing, messages can be queued to be handled when the system is consistent and ready to handle them, in contrast to synchronous RPC calls, which require the system to be consistent before the calls can be made. Another technique is to just store events, or facts, things that happened, and don’t calculate the state until later, when the system has converged, for example, when the state is queried. And yet another is to assume consistency, but then have a process in place to detect when that assumption was not true, and trigger some recovery operation.
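
The second technique, storing facts and only deriving state when it is queried, can be sketched as follows. This is a hypothetical account example (AccountEvent, Deposited, Withdrawn and balance are names invented for illustration), not code from any particular event sourcing library.

```scala
// Events are immutable facts; no running total is stored anywhere.
sealed trait AccountEvent
final case class Deposited(amount: BigDecimal) extends AccountEvent
final case class Withdrawn(amount: BigDecimal) extends AccountEvent

object Account {
  // State is computed on demand by folding over whatever events have arrived.
  def balance(events: Seq[AccountEvent]): BigDecimal =
    events.foldLeft(BigDecimal(0)) {
      case (total, Deposited(amount)) => total + amount
      case (total, Withdrawn(amount)) => total - amount
    }
}
```

Because the fold in this sketch only adds and subtracts, two nodes that received the same events in a different order compute the same balance once both have seen all of them.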

In all these cases, the unsafeness of eventual consistency isn’t a bad thing, it’s just a property that has been carefully considered and addressed in the design of the system. It is not safe to rely on the consistency of the system, and that’s ok.

The Noop Monad - doing nothing safely

If you’re a fan of functional programming, as I am, you’ll know that one of the great things about it is how useful it is. But that isn’t the only great thing about functional programming, functional programming is also great for when you want to do nothing at all. Some might even say that doing nothing at all is where functional programming really shines.

So today I’m going to introduce a monad that surprisingly isn’t talked about a lot - the noop monad. The noop monad does nothing at all, but unlike noops in other programming paradigms, the noop monad does nothing safely.

§A demo

For this demonstration, I’m going to use Scala, with Scalaz to implement the monad. Let’s start off with the Noop type:

/**
 * A noop of type T
 */
sealed trait Noop[T] {

  /**
   * Run the noop
   */
  def run: Unit
}

As you can see, the Noop type has a type parameter, so we can do nothing of various types. We can also see the run function, and it returns Unit. Now typically in functional programming, returning Unit is considered a bad thing, because Unit is not a value, so any pure function that returns Unit must have done nothing. But since Noop actually does do nothing, this is the one exception to that rule. So the run function can be evaluated to do the nothing of the type that this particular Noop does.

Now, let’s say I have a method that calculates all the primes up to a given number. Here’s its signature:

def calculatePrimes(upTo: Int): List[Int]

And let’s say I want to get a list of all the Int primes, I can use the above method like so:

val allIntPrimes = calculatePrimes(Int.MaxValue)

But wait, you say! That code is going to be very expensive to run, it’s likely to take a very, very long time, and you have better things to do. So, you want to ensure that the code doesn’t run. This is where the noop monad comes on the scene, using the point method, you can ensure that it safely doesn’t run:

val noopAllIntPrimes = calculatePrimes(Int.MaxValue).point[Noop]

And then, when you actually don’t want to run it, you can do that by evaluating the run function:

noopAllIntPrimes.run

For those unfamiliar with scalaz and functional programming, a monad is an applicative, and an applicative is something that lets you create an instance of the applicative from a value. The method on Applicative for doing this is called point, in other languages it’s also called pure.

So, we can see that Noop is an applicative, but can we flatMap it? What if you don’t want to sum all those prime numbers, and then you certainly don’t want to convert that result to a String? The noop monad lets you do that:

val summedPrimesString = for {
  primes <- noopAllIntPrimes
  summed <- primes.reduce(_ + _).point[Noop]
  asString <- summed.toString.point[Noop]
} yield asString

And so then to ensure that we don’t actually do all this expensive computation, we can run it as before:

summedPrimesString.run

We can see how the noop monad can be used to do nothing, but what are the advantages of using the noop monad compared to some other methods of doing nothing? I’m going to highlight three advantages that I think really demonstrate the value of doing nothing in a monadic way.

§Runtime optimisation

This is often an advantage of functional programming in general, but the noop monad is the exemplar of optimization in functional programming. Let’s have a look at the implementation of the noop monads point method:

def point[A](a: => A): Noop[A] = Noop[A]

Here we can see that not only is the passed in value not evaluated, it’s not even referenced in the returned Noop. But how can the noop monad do this? Since the noop monad knows that you don’t want to do anything at all, it is able to infer that therefore it will not need to evaluate the value, and therefore it doesn’t need to hold a reference to the passed in value. But this advanced optimisation doesn’t stop there, let’s have a look at the implementation of bind:

def bind[A, B](fa: Noop[A])(f: A => Noop[B]): Noop[B] = Noop[B]

Here we can see a double optimisation. First, the passed in Noop is not referenced. The noop monad can do this because it infers that since you don’t want to do anything, you don’t need the nothing that you passed in. Secondly, the passed in bind function is never evaluated. As with the other parameter, the noop monad can infer that since the passed in Noop does nothing, there will be nothing to pass to the passed in function, and therefore, the function will never be evaluated.

As you can see, particularly for performance minded developers, the noop monad is incredibly powerful in its ability to optimise your code at runtime to do as little of nothing as possible.
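
For readers who want to do nothing at home, here is a minimal self-contained sketch that avoids the Scalaz dependency. It is an assumption about how the pieces could fit together, not the implementation used above: Noop.apply, map and flatMap are defined directly on the type rather than through a Scalaz Monad instance, and NoopDemo with its expensive method is invented purely for illustration.

```scala
// A noop of type T. There are no values of T inside it, only the type.
sealed trait Noop[T] {
  def run: Unit = ()
  // Neither function is ever called: there is nothing to pass to it.
  def map[B](f: T => B): Noop[B] = Noop[B]
  def flatMap[B](f: T => Noop[B]): Noop[B] = Noop[B]
}

object Noop {
  def apply[T]: Noop[T] = new Noop[T] {}
  // The by-name argument is never evaluated or referenced.
  def point[T](a: => T): Noop[T] = Noop[T]
}

object NoopDemo {
  var evaluated = false
  def expensive(): Int = { evaluated = true; 42 }

  // Desugars to flatMap and map, neither of which invokes its function.
  val result: Noop[String] = for {
    n <- Noop.point(expensive())
    s <- Noop.point(n.toString)
  } yield s
}
```

Running NoopDemo.result.run leaves NoopDemo.evaluated as false, which is about as much evidence of nothing as one can hope for.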

§Code optimisation

But performance isn’t the only place that the noop monad can help with optimisation, the noop monad can also help at optimising your code to ensure it is as simple and concise as possible.

Let’s take our previous example of summing primes:

(for {
  primes <- calculatePrimes(Int.MaxValue).point[Noop]
  summed <- primes.reduce(_ + _).point[Noop]
  asString <- summed.toString.point[Noop]
} yield asString).run

Now, this isn’t bad looking code, but it does feel a little too complex when all we wanted to do in the first place was nothing. So how can we simplify it? Well firstly, you’ll notice that we don’t want to convert the summed result to a string, you can tell this by the .point[Noop] after it. Based on the rules of the noop monad, we can optimise our code to remove this:

(for {
  primes <- calculatePrimes(Int.MaxValue).point[Noop]
  summed <- primes.reduce(_ + _).point[Noop]
} yield summed).run

Is this safe to do? In fact it is, because we have actually replaced our intention of doing nothing, with nothing. We can do the same for summing all the primes:

(for {
  primes <- calculatePrimes(Int.MaxValue).point[Noop]
} yield primes).run

Now the final step in code optimisation, and this is the hardest to follow so bear with me, we can actually remove the not calculating the primes itself, and simultaneously remove the run function on that Noop. But how is this so? You may remember that I explained earlier if a pure function returns Unit, then it must do nothing. Our run function is a pure function, and it does nothing. So since evaluating run does nothing, we can safely replace it with nothing. Finding it hard to follow? This is what it looks like in code:

As you can see, we’ve gone from five reasonably complex lines of code, to absolutely no code at all! This is the embodiment of what Dijkstra meant when he said:

If we wish to count lines of code, we should not regard them as “lines produced” but as “lines spent”.

The noop monad has allowed us to spend zero lines of code in doing nothing.

§Teaching monads

Teaching monads has proven to be the unicorn of evangelising functional programming, no matter how hard anyone tries, no one seems to be able to teach them to a newcomer. The noop monad solves this by grounding monads in a context that all students can relate to - doing nothing.

In particular, the noop monad does a great job for picking up the pieces of a failed attempt to teach a student monads. For example, consider the following situations:

  • A student has been told that monads are just monoids in the category of endofunctors. What does that even mean? But if I say the noop monoid in the category of endofunctors is just something that does nothing, simple!
  • A student has been told that monads are burritos. What does that even mean? But if I say the noop burrito is just something that does nothing, simple!


So today I’ve introduced you to the noop monad. As you can see, it’s in the noop monad that functional programming is made complete, fulfilling everything that every functional programmer has ever wanted to do, that is, nothing at all.

sbt - A declarative DSL

This is my second post in a series of posts about sbt. The first post, sbt - A task engine, looked at sbt’s task engine, how it is self documenting, making tasks discoverable, and how scopes allow hierarchical fallbacks. In this post we’ll take a look at how the sbt task engine is declared, again taking a top down approach, rooted in practical examples.

§Settings are not settings

In the previous post, we were introduced to settings, being a specialisation of tasks that are only executed once at the start of an sbt session. File that bit of knowledge away, and now reset your definition of setting to nothing. I’m guessing this is a relic of past versions of sbt, but the word setting in sbt can mean two distinct things, one is a task that’s executed at the start of the session, the other is a task (or setting, as in executed at start of session) declaration. No clearer?

Let’s introduce another bit of terminology, a task key. A task key is a string and a type, the string is the name of the task, it’s how you reference it on the command line. The type is the type that the task produces. So the sources task has a name of "sources", and a type of Seq[File]. It is defined in sbt.Keys:

val sources = TaskKey[Seq[File]]("sources", "All sources, both managed and unmanaged.", BTask)

You can see it also has a description and a rank, those are not really important to us now. The thing that uniquely defines this task key is the sources string. You could define another sources key elsewhere, as long as they have the same name, they will be considered the key for the same task. Of course, if you define two tasks using the same key name, but different key types, that’s not allowed, and sbt will give you an error.

In addition to TaskKeys there are also SettingKeys; this is setting as in only executed once per session. Now these keys by themselves do nothing. They only do something when you declare some behaviour for them. So, a setting as in a task declaration is a task or setting key, potentially scoped, with some associated behaviour. For the remainder of this post, when I say setting, that’s what I’m referring to.
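
As a small sketch of the difference between a key and a setting, here is how a key of your own might be defined and then given behaviour in a build.sbt, using sbt 0.13 syntax. The key name sourceCount and its description are made up for illustration, and the := syntax used on the last line is explained later in this post.

```scala
// build.sbt (sbt 0.13)

// A hypothetical task key: just a name, a type and a description.
// On its own this does nothing.
val sourceCount = TaskKey[Int]("sourceCount", "Number of source files that will be compiled.")

// A setting: the key above, with some associated behaviour declared for it.
sourceCount := (sources in Compile).value.size
```

With this in place, running sourceCount on the sbt console should execute the sources task first and then report how many files it returned.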

§Settings are executed like a program

Defining sbt’s task engine is done by giving sbt a series of settings, each setting declaring a task implementation. sbt then executes those settings in order. Tasks can be declared multiple times by multiple settings, the last one to execute wins.

So where do these settings come from? They come from many places. Most obviously, they come from the build file. But most settings don’t come from there - as you’re aware, sbt defines many, many fine grained tasks, but you don’t have to declare settings for these yourself. Most of the settings come from sbt plugins. One in particular that is enabled by default is the JvmPlugin. This contains all the settings necessary for building a Java or Scala program, including settings that declare the sources task that we saw yesterday. Plugin settings are executed before the settings in your build file, this means that any settings you declare in your build file will override the settings declared by the plugins.

This ordering of settings is important to note, it means settings have to be read from top to bottom. I have handled a number of support cases and mailing list questions where people haven’t realised this, they have declared a setting, and then after that included a sequence of settings from elsewhere in their build that redeclares the setting. They expected their setting to take precedence, but since their setting came before the setting from the sequence, the setting from the sequence overwrites it.
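
A sketch of that mistake, assuming a hypothetical sharedSettings sequence defined in the same build. The option values are arbitrary; the point is only the order in which the two declarations are executed.

```scala
// build.sbt (sbt 0.13)

// `sharedSettings` is a made-up name for a sequence of settings reused across projects.
lazy val sharedSettings: Seq[Setting[_]] = Seq(
  scalacOptions := Seq("-deprecation")
)

// Executed first...
scalacOptions := Seq("-feature", "-deprecation")

// ...and then overwritten, because the sequence is included after it.
sharedSettings
```

Swapping the last two expressions, so that sharedSettings comes first, lets the build's own scalacOptions declaration win.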

§sbt build file syntax

We’re about to get into some concrete examples of declaring settings, so before we do that we’d better cover the basics of the sbt build file. sbt builds can either be specified in plain *.scala files, or in sbt’s own *.sbt file format. As of sbt 0.13.7, the sbt format has become powerful enough that there is really not much that you can’t do with it, so we’re only going to look at that.

An sbt file may have any name, convention is that a project’s main build file be called build.sbt, but that is only a convention. The file may contain a series of Scala statements and expressions, and it’s important here to distinguish between statements and expressions. What’s the difference? A statement doesn’t return a value. For example:

val foo = "bar"

This is a statement, it has assigned the val foo to "bar", but this assignment doesn’t return a value. In contrast:

5 + 4

This is an expression, it returns a value of type Int.

Expressions in an sbt file must have a type of either Setting[_] or Seq[Setting[_]]. sbt will evaluate all these expressions, and add them to the settings for your build. Any expression in your sbt file that isn’t of one of those types will generate an error.

Statements can be anything. They can be utility methods, vals, lazy vals, whatever. In most cases, sbt ignores them, but that doesn’t make them useless, you can use them in other expressions or statements, to help you define your build. There is one type of statement though that sbt doesn’t ignore, and that is statements that assign a val to a project, this is how projects are defined:

lazy val sbtFunProject = project in file(".")

The final thing to know about sbt build files is that sbt automatically brings in a number of imports. For example, sbt._ is imported, as is sbt.Keys._, so you have access to the full range of built in task keys that sbt defines without having to import them yourself. sbt also brings in imports declared by plugins, making it straightforward to use those plugins.

§Declaring a setting

The process of declaring a setting is done by taking a task key, optionally applying a scope to it, and then declaring an implementation for that task. Here’s a very basic example:

name := "sbt-fun"

In this case we’re declaring the implementation of the name task to simply be a static value of "sbt-fun". Note that the above is an expression, not a statement. := is not a Scala language feature, it is actually a method that sbt has provided. sbt’s syntax for declaring settings is a pure Scala DSL. If this syntax confuses you, then I strongly recommend that you read a post I wrote a few years ago called Three things I had to learn about Scala before it made sense. This post explains how DSLs are implemented in Scala, and is essential reading before you read on in this post if you don’t understand that already.

What if we want to declare our own implementation of the sources task? Remembering that we want it scoped to compile, we can do this:

sources in Compile := Seq(file("src/main/scala/MySource.scala"))

Again we’re only setting a static value for this task to return, but this time you can see how we’ve scoped the sources task in the compile scope. Note that configurations such as compile and test are available through capitalised vals, in scope in your build.

§Back to first principles

What if we want to declare a dependency on another task? Let’s say we want to declare sources to be, as it’s described, all managed and unmanaged sources. If you’ve used sbt before, you probably know that you can use this syntax:

sources := managedSources.value ++ unmanagedSources.value

This was introduced in sbt 0.13, and it’s actually implemented by a macro that does some magic for you. It’s great, I use that syntax all the time, and so should you. However, as with anything that does magic for you, if you don’t understand what it’s doing for you and how it does it, you can run into troubles.

As I described in the last post, sbt is a task engine, and tasks declare dependencies that are executed before them, and whose outputs are provided to them as input. In the above example, it doesn’t look like this is happening at all. What it looks like is that when the sources task is executed, it executes the managedSources task by calling value, and the unmanagedSources task by calling value, and then concatenates their results together. In fact, there is a macro that is transforming this code to something that does declare dependencies, and takes the outputs of those dependencies and passes them to the implementation.

So in order to understand what the macro is doing for us, let’s implement this ourselves manually - let’s declare this setting from first principles.

Firstly, we’re going to use the <<= operator instead, this is how to say that I am declaring this task to be dependent on other tasks. Now, we could do a very straightforward declaration that depends on another task:

sources <<= unmanagedSources

This will say that the sources task has a dependency on unmanagedSources, and will take the output of unmanagedSources as is, and return it as the output of sources. What if we wanted to change that value before returning it? We can do that using the map method:

sources <<= unmanagedSources.map(files => files.filterNot(_.getName.startsWith("_")))

So now we’ve filtered out all the files that start with _ (note that sbt already provides an excludeFilter task that can be used to configure this, this is just an example).

At this point let’s take a step back and think about what the code above has done. For one, nothing has yet been executed, at least not the task implementation. That <<= method returns an object of type Setting. This setting has the following attributes:

  • The key (potentially scoped) that it is the task declaration for, in this case, sources.
  • The keys (potentially scoped) of tasks that it depends on, in this case, unmanagedSources.
  • A function that takes the output of the tasks that it depends on as input parameters, executes the task, and returns the output of the task being declared (that is, the function we passed to the map method, that filters out all files that start with _).

You can see here that we haven’t actually executed anything in the task, we have only declared how the task is implemented. So when sbt goes to execute the sources task, it will find this declaration, execute the dependencies, and then execute the callback. This is why I’ve called this blog post “sbt - A declarative DSL”. All our settings just declare how tasks are implemented, they don’t actually execute anything.
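
To make the split between declaring and executing concrete, here is a toy model in plain Scala. It is not sbt’s actual Setting type: ToyEngine, ToySetting and the string keys are all invented for illustration, and every task produces a Seq[String] to keep the types simple.

```scala
object ToyEngine {
  // A declaration: which key it implements, what it depends on, and how to compute it.
  final case class ToySetting(
    key: String,
    dependencies: Seq[String],
    implementation: Seq[Seq[String]] => Seq[String]
  )

  // Loading only records declarations; later ones for the same key replace earlier ones.
  def load(settings: Seq[ToySetting]): Map[String, ToySetting] =
    settings.foldLeft(Map.empty[String, ToySetting])((m, s) => m + (s.key -> s))

  // Only now is anything executed: dependencies first, then the task itself.
  def run(key: String, declared: Map[String, ToySetting]): Seq[String] = {
    val setting = declared(key)
    val inputs = setting.dependencies.map(dep => run(dep, declared))
    setting.implementation(inputs)
  }

  val build: Map[String, ToySetting] = load(Seq(
    ToySetting("unmanagedSources", Nil, _ => Seq("A.scala", "_B.scala")),
    ToySetting("sources", Seq("unmanagedSources"),
      inputs => inputs.head.filterNot(_.startsWith("_")))
  ))
}
```

Building the map of declarations runs none of the functions. Calling ToyEngine.run("sources", ToyEngine.build) is what finally executes unmanagedSources and then the filter.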

So, what if we want to depend on two different tasks? Through the magic of the sbt DSL, we can put them in a tuple, and then map the tuple:

sources <<= (unmanagedSources, managedSources).map { (unmanaged, managed) => unmanaged ++ managed }

And now we actually have our first principles implementation of the sources task. Sort of, we haven’t scoped it to compile, but that’s not hard to do:

sources in Compile <<= (unmanagedSources in Compile, managedSources in Compile).map(_ ++ _)

For brevity I’ve used a shorter syntax for concatenating the two sources sequences.

§sbt uses macros heavily

So now that we’ve seen how to declare tasks from first principles, let’s see how the macros work. We have our declaration from before:

sources := { managedSources.value ++ unmanagedSources.value }

I’ve inserted the curly braces to make it clear what is being passed to the := macro. The := macro will go through the block of code passed to it, and find all the instances of where value is called, and gather all the keys that it is invoked on. It will then generate Scala code (or rather AST) that builds those keys as a tuple, and then invokes map. To the map call, it will pass the original code block, but replacing all the keys that had value on them with parameters that are taken as the input arguments to the function passed to map. Essentially, it builds exactly the same code that we implemented in our first principles implementation.

Now, it’s important to understand how these macros work, because when you try to use the value call outside of the context of a macro, you will obviously run into problems. An important thing to realise is that the code generated by the macro never actually invokes value; value is just a placeholder method used to tell the macro to extract these keys out to be dependencies that get mapped. The value method itself is in fact a macro, one that, if you invoke it outside of the context of another macro, will result in a compile time error, the exact error message being “value can only be used within a task or setting macro, such as :=, +=, ++=, Def.task, or Def.setting.” And you can see why: since sbt settings are entirely declarative, you can’t access the value of a task from the key, it doesn’t make sense to do that.
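
A sketch of what that restriction looks like in a build file, using sbt 0.13 syntax. The name allSources is hypothetical. The commented out line is the kind of code that triggers the error; wrapping the same body in Def.task, one of the macros named in the message, gives a declaration that can then be depended on in the usual way.

```scala
// build.sbt (sbt 0.13)

// Does not compile: `value` is used outside of a task or setting macro.
// lazy val broken = managedSources.value ++ unmanagedSources.value

// Compiles: Def.task is itself a macro, so `value` is allowed inside it.
// Nothing is executed here, this only declares a task and its dependencies.
lazy val allSources = Def.task {
  managedSources.value ++ unmanagedSources.value
}

// The declared task can be used as a dependency like any key.
sources := allSources.value
```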

From now on in this post we’ll switch to using the macros, but remember what these macros compile to.

§Redeclaring tasks

So far we’ve seen how to completely overwrite a task. What if you don’t want to overwrite the task, you just want to modify its output? sbt allows you to make a task depend on itself, if you do that, the task will depend on the old implementation of itself, giving the output of that implementation to you as your input. In the previous blog post, I brought up the possibility of only compiling source files with a certain annotation inside them, let’s say we’re only going to compile source files that contain the text "COMPILE ME". Here’s how you might implement that, depending on the existing sources implementation:

sources := {
  sources.value.filter { sourceFile => IO.read(sourceFile).contains("COMPILE ME") }
}

sbt also provides a short hand for doing this, the ~= operator, which takes a function that takes the old value and returns the new value:

sources ~= (_.filter { sourceFile => IO.read(sourceFile).contains("COMPILE ME") })

Another convenient shorthand for modifying the old value of a task that sbt provides, and that you have likely come across before, is the += and ++= operators. These take the old value, and append the item or sequence of items produced by your new implementation to it. So, to add a file to the sources:

sources += file("src/other/scala/Other.scala")

Or to add multiple files:

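sources ++= Seq(
  file("src/other/scala/Other.scala"),
  file("src/other/scala/Another.scala") // second path is illustrative only
)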

These of course can depend on other tasks through the value macro, just like when you use :=:

sources ++= (sourceDirectory.value / "other" / "scala").***.get

The *** method loads every file from a directory recursively.

§Scope me up

We’ve talked a little bit about scopes, but most of our examples so far have excluded them for brevity. So let’s take a look at how to scope settings and their dependencies.

To apply a scope to a setting, you can use the in method:

sources in Compile += file("src/other/scala/Other.scala")

Applying multiple scopes can be done by using multiple in calls, for example:

excludeFilter in sbtFunProject in unmanagedSources in Compile := "_*"

Or, they can also be done by passing multiple scopes to the in method, in the order project, configuration then task:

excludeFilter in (sbtFunProject, Compile, unmanagedSources) := "_*"

The same syntax can be used when depending on settings, though make sure you put parentheses around the whole scoped setting in order to invoke the value method on it:

(sources in Compile) := 
  (managedSources in Compile).value ++ 
  (unmanagedSources in Compile).value


In the first post in this series, we were introduced to the concepts behind sbt and its task engine, and how to explore and discover the task graphs that sbt provides. In this post we saw the practical side of how task dependencies and implementations are declared, using both the map method to map dependency keys, as well as macros. We saw how to modify existing task declarations, as well as how to use scopes.

One thing I’ve avoided here is showing cookbooks of how to do specific tasks, for example, how to add a source generator. The sbt documentation is really not bad for this, especially for cookbook type documentation, but I also hope that after reading these posts, you aren’t as dependent on copying and pasting sbt configuration, but rather can use the tools built in to sbt to discover the structure of your build, and modify it accordingly.

sbt - A task engine

sbt is the best build tool that I’ve used. But it’s also the build tool with the steepest learning curve that I’ve ever used, and I think most people would agree that it’s very difficult to learn. When you first start using it, configuring it is like casting spells, spells that have to be learned from a spell book, that have to be said in the exact right way, otherwise they don’t work. There are lots of guides out there that are essentially spell books, they teach you all the things you need to know to achieve various tasks. But I haven’t seen a lot out there that actually explains what sbt is, what it does, why it is the way it is. This blog post is my attempt to do that.

§A task engine

Simply put, sbt is a task engine. You have tasks. A task may be dependent on other tasks. Any task from any point in the build may be redefined, and new tasks can be easily added. In some ways it is a bit like make or ant, but it differs in a fundamental way, sbt tasks produce an output value, and are able to consume the output values of the tasks they depend on - whereas make and ant just modify the file system. This property of sbt allows you to break build steps up into very fine grained tasks.

So let’s take an example. A common step that build tools support is compilation. In many traditional build tools, a compilation task is responsible for finding a set of files to compile based on some input parameters, such as a list of source directories, and compiling them. In sbt, the compile task is not responsible for finding a set of files to compile, this is the responsibility of the sources task. The output value of the sources task is a list of files to compile. The compile task depends on the sources task, taking its list of files to compile as an input.

So what’s so good about this? What it means is that I can completely customise the way sources are located, by redefining the sources task. So if I have a crazy build requirement such as wanting to put a special annotation in my source files to say whether they get compiled or not, I can very easily implement my own sources task to do that. This is something that would be very difficult to do in another build tool, but in sbt it’s straightforward.

In other build tools, if I want to generate some sources, I have to make sure that the task to generate the sources runs before the compilation task, and puts them in a place the compilation task will find. In sbt, I can just redefine the sources task to make it generate the sources. In practice though, I don’t need to do that, because generating sources is a very common requirement. Remember that I said that sbt tasks can be very fine grained. The sources task itself depends on many other tasks, one of them is the managedSources task, which collects all the files that are managed (or generated) by the build (in contrast to unmanaged sources, which are your regular source files that you manage yourself). That task in turn depends on the sourceGenerators task, which I can redefine to add new source generators.

§A self documenting task engine

At this point you might be starting to see that there are many, many tasks involved in even the simplest build in sbt. I’ve talked about just one small part, how generated sources end up being compiled, but there are many more than that. How is someone that is new to sbt supposed to know what tasks exist, so that they can customise their build? Well, it turns out sbt comes with a few built-in tools for inspecting the available tasks. These are often seen as advanced features of sbt, but I think really this is what new users to sbt should be introduced to first. So if you’re new, it’s time to fire up sbt.

First we need a simple project. In an empty directory, create a file called build.sbt, and set your project’s name:

name := "sbt-fun"

Now, if you already have sbt 0.13 or later installed, you can use that. If you already have activator installed, which is basically just a script that launches sbt, then you can use that. If you have neither, then download and install activator or sbt, it doesn’t matter which, and then start it in your project’s directory:

$ sbt
[info] Loading project definition from /Users/jroper/sbt-fun/project
[info] Updating {file:/Users/jroper/sbt-fun/project/}sbt-fun-build...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] Set current project to sbt-fun (in build file:/Users/jroper/sbt-fun/)

So, now we’re on the sbt console. Earlier we were talking about the sources task. Let’s have a look at it. sbt has a command called inspect, which lets you inspect a task:

> inspect sources
[info] Task: scala.collection.Seq[]
[info] Description:
[info]  All sources, both managed and unmanaged.
[info] Provided by:
[info]  {file:/Users/jroper/sbt-fun/}sbt-fun/compile:sources
[info] Defined at:
[info]  (sbt.Defaults) Defaults.scala:188
[info] Dependencies:
[info]  compile:unmanagedSources
[info]  compile:managedSources
[info] Delegates:
[info]  compile:sources
[info]  *:sources
[info]  {.}/compile:sources
[info]  {.}/*:sources
[info]  */compile:sources
[info]  */*:sources
[info] Related:
[info]  test:sources

What are we looking at? First, we can see that sources is a task that produces a sequence of files, as I said before. We can also see a description of the task: All sources, both managed and unmanaged. The Defined at section is interesting, it shows us where the sources task is defined, in this case on line 188 of the sbt Defaults class. We can see that it has two tasks that it depends on, unmanagedSources and managedSources. The rest of the information we won’t worry about for now.

Now before we start playing with our build, we can actually get even more information here. Not only is it possible to inspect a single task in sbt, you can also inspect a whole tree of tasks, using the inspect tree command:

> inspect tree sources
[info] compile:sources = Task[scala.collection.Seq[]]
[info]   +-compile:unmanagedSources = Task[scala.collection.Seq[]]
[info]   | +-*/*:sourcesInBase = true
[info]   | +-*/*:excludeFilter = sbt.HiddenFileFilter$@5a63fa71
[info]   | +-*:baseDirectory = /Users/jroper/sbt-fun
[info]   | +-*/*:unmanagedSources::includeFilter = sbt.SimpleFilter@44a44a04
[info]   | +-compile:unmanagedSourceDirectories = List(/Users/jroper/sbt-fun/src/main/scala, /Users/jroper/sbt-fun/sr..
[info]   |   +-compile:javaSource = src/main/java
[info]   |   | +-compile:sourceDirectory = src/main
[info]   |   |   +-*:sourceDirectory = src
[info]   |   |   | +-*:baseDirectory = /Users/jroper/sbt-fun
[info]   |   |   |   +-*:thisProject = Project(id sbt-fun, base: /Users/jroper/sbt-fun, configurations: List(compile,..
[info]   |   |   |   
[info]   |   |   +-compile:configuration = compile
[info]   |   |   
[info]   |   +-compile:scalaSource = src/main/scala
[info]   |     +-compile:sourceDirectory = src/main
[info]   |       +-*:sourceDirectory = src
[info]   |       | +-*:baseDirectory = /Users/jroper/sbt-fun
[info]   |       |   +-*:thisProject = Project(id sbt-fun, base: /Users/jroper/sbt-fun, configurations: List(compile,..
[info]   |       |   
[info]   |       +-compile:configuration = compile
[info]   |       
[info]   +-compile:managedSources = Task[scala.collection.Seq[]]
[info]     +-compile:sourceGenerators = List()

So in here you can see the sources to managedSources to sourceGenerators chain that I mentioned before, and you can also see the unmanagedSources chain, which is a lot more complex: we can see directory hierarchies, filters for deciding which files to include and exclude, etc.

§Settings vs Tasks

At this point you may notice that there are two types of tasks in the tree, there are things like managedSources, which just describe the type of the task:

compile:managedSources = Task[scala.collection.Seq[]]

And then there are things like scalaSource, which actually display a value:

compile:scalaSource = src/main/scala

This is actually an sbt optimisation, sbt has a special type of task called a Setting. Settings get executed once per session, so when you start sbt up, you start a new session, and all the settings get executed then. This is why when I inspect the tree, sbt can show me the value, because it already knows it. In contrast, an ordinary Task gets executed once per execution. So if I now run the sources task, that managedSources task will be executed then. If I run sources again, it will be executed again. But my settings only get executed once for the whole session.
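If you want to see this difference for yourself, here is a small sketch you could add to build.sbt, assuming sbt 0.13 syntax. Both keys, `sessionStamp` and `executionStamp`, are hypothetical names invented for this example:

```scala
// A setting: its body is evaluated once, when the session loads the build.
lazy val sessionStamp = settingKey[Long]("Hypothetical: time at which the settings were evaluated")

// A task: its body is evaluated each time an execution needs it.
lazy val executionStamp = taskKey[Long]("Hypothetical: time at which the task was executed")

sessionStamp := System.currentTimeMillis()

executionStamp := System.currentTimeMillis()
```

Running show sessionStamp repeatedly in the same session should keep reporting the same number, while show executionStamp should report a new one each time.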

It should be noted that an execution is a request by the user to execute a task. If two tasks in my tree both depend on the sources task, sbt will ensure that the sources task only gets executed once. So if I run the publish task, which transitively depends on the compile task, as well as the doc task (that generates java/scala docs), and the packageSrc task (that generates source jars), these all depend on the same sources task, which will only be executed once during my publish execution, and the value will be reused as the input for all three tasks.
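The same behaviour can be observed with a toy example. This is only a sketch, assuming sbt 0.13 syntax in build.sbt, and all four task keys are hypothetical names made up for the illustration:

```scala
lazy val sharedValue = taskKey[Int]("Hypothetical task that two other tasks consume")

lazy val consumerA = taskKey[Int]("Hypothetical task that depends on sharedValue")

lazy val consumerB = taskKey[Int]("Hypothetical task that depends on sharedValue")

lazy val combined = taskKey[Int]("Hypothetical task that depends on both consumers")

// The println makes it visible how often this task body actually runs.
sharedValue := {
  println("running sharedValue")
  42
}

consumerA := sharedValue.value + 1

consumerB := sharedValue.value + 2

combined := consumerA.value + consumerB.value
```

Running show combined is a single execution, so even though sharedValue is reached through two different paths, its message should only be printed once, and both consumers receive the same value.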

Now naturally, since settings are executed at the start of the session, and not as part of an execution, they can’t depend on tasks, they can only depend on other settings. Meanwhile, tasks can depend on both other tasks and settings.

The difference between settings and tasks becomes important when you’re writing your own sbt plugins that define their own settings and tasks. But in general, you can consider them to be the same thing, settings are just a small optimisation so that they don’t have to be executed every time. When defining your own tasks or settings, a good rule of thumb is if in doubt, just define a task.


§Scopes

Scopes are another important feature of the sbt task engine. A task can be scoped. When a task depends on another task, it can depend on that task in a particular scope. Now one obvious type of scope that sbt supports is the configuration scope. sbt has a few built-in configurations, the two main ones that you’ll interact with are compile and test. So above, when the sources task depends on managedSources, you can see that it actually depends on compile:managedSources, which means it depends on managedSources in the compile scope.

In actual fact, you can see at the top that we are looking at the tree for compile:sources. When you don’t specify a scope, sbt will choose a default scope, in this case it has chosen the compile scope. The logic in how it makes that decision we won’t cover here. We could also inspect the test:sources tree:

> inspect tree test:sources
[info] test:sources = Task[scala.collection.Seq[]]
[info]   +-test:unmanagedSources = Task[scala.collection.Seq[]]
[info]   | +-test:unmanagedSourceDirectories = List(/Users/jroper/sbt-fun/src/test/scala, /Users/jroper/sbt-fun/src/t..
[info]   | | +-test:javaSource = src/test/java
[info]   | | | +-test:sourceDirectory = src/test
[info]   | | |   +-*:sourceDirectory = src
[info]   | | |   | +-*:baseDirectory = /Users/jroper/sbt-fun
[info]   | | |   |   +-*:thisProject = Project(id sbt-fun, base: /Users/jroper/sbt-fun, configurations: List(compile,..
[info]   | | |   |   
[info]   | | |   +-test:configuration = test
[info]   | | |   
[info]   | | +-test:scalaSource = src/test/scala
[info]   | |   +-test:sourceDirectory = src/test
[info]   | |     +-*:sourceDirectory = src
[info]   | |     | +-*:baseDirectory = /Users/jroper/sbt-fun
[info]   | |     |   +-*:thisProject = Project(id sbt-fun, base: /Users/jroper/sbt-fun, configurations: List(compile,..
[info]   | |     |   
[info]   | |     +-test:configuration = test
[info]   | |     
[info]   | +-*/*:unmanagedSources::includeFilter = sbt.SimpleFilter@44a44a04
[info]   | +-*/*:excludeFilter = sbt.HiddenFileFilter$@5a63fa71
[info]   | 
[info]   +-test:managedSources = Task[scala.collection.Seq[]]
[info]     +-test:sourceGenerators = List()

It looks pretty similar to the compile:sources tree, except that it depends on test scoped settings. In some cases, you can see that the scope is *, this means that it’s depending on an unscoped task/setting.

Configuration is not the only axis that you can scope tasks on in sbt, sbt supports two other axes, project and task.

The project axis is scoped by an sbt project. An sbt build can have multiple projects, and each project can have its own set of settings. When you define tasks on a project, sbt will automatically scope those tasks, and the dependencies of those tasks, to that project, unless you have already explicitly scoped them to a project yourself. Tasks scoped to one project can also depend on tasks in another project, so you could for example make the packageSrc task in one project depend on the sources for all the other projects, thus bringing all your sources together into one source jar.

The syntax for scoping something by project on the sbt command line is to prefix the task with the project name followed by a slash, then the task. For example sbt-fun/compile:sources is the sources task in the compile scope from the sbt-fun project. You can actually see from the output of the plain inspect command, in the Provided by section, that the full task is {file:/Users/jroper/sbt-fun/}sbt-fun/compile:sources. This is the path of the build, followed by the project name, configuration and task. Sometimes tasks and settings are scoped to be global or for the entire build. You can see some such settings above, they are prefixed with */, so */*:excludeFilter is the excludeFilter task, with no configuration scope, and no project scope.

The final axis is to be scoped by another task. Scoping by another task is incredibly useful, which we’ll see when we get to scope fallbacks, but what it means is that the same task key can be used and explicitly configured for many tasks. In the above tree we can see that unmanagedSources depends on includeFilter scoped to the unmanagedSources task, the syntax for this is unmanagedSources::includeFilter. includeFilter may also be used elsewhere, for example, in discovering resources, in that case it will be scoped to the unmanagedResources task.

§Scope fallbacks

Scopes work in a hierarchical fashion, allowing fallbacks through the hierarchy when tasks at a specific scope can’t be found. I mentioned above that unmanagedSources depends on unmanagedSources::includeFilter. Let’s have a closer look, by inspecting it:

> inspect unmanagedSources
[info] Task: scala.collection.Seq[]
[info] Description:
[info]  Unmanaged sources, which are manually created.
[info] Provided by:
[info]  {file:/Users/jroper/sbt-fun/}sbt-fun/compile:unmanagedSources
[info] Defined at:
[info]  (sbt.Defaults) Defaults.scala:182
[info]  (sbt.Defaults) Defaults.scala:209
[info] Dependencies:
[info]  compile:baseDirectory
[info]  compile:unmanagedSourceDirectories
[info]  compile:unmanagedSources::includeFilter
[info]  compile:unmanagedSources::excludeFilter

So we can see that compile:unmanagedSources depends on compile:unmanagedSources::includeFilter and compile:unmanagedSources::excludeFilter. But if we have a look at the inspect tree command, we’ll notice a discrepancy:

> inspect tree unmanagedSources
[info] compile:unmanagedSources = Task[scala.collection.Seq[]]
[info]   +-*/*:sourcesInBase = true
[info]   +-*/*:excludeFilter = sbt.HiddenFileFilter$@5a63fa71
[info]   +-*:baseDirectory = /Users/jroper/sbt-fun
[info]   +-*/*:unmanagedSources::includeFilter = sbt.SimpleFilter@44a44a04

So, while it depended on sbt-fun/compile:unmanagedSources::includeFilter, it actually got */*:unmanagedSources::includeFilter, that is, it requested a task at a specific project and configuration, but got a task that was defined for no project or configuration. Furthermore, the excludeFilter, which was similarly requested, was satisfied by */*:excludeFilter, that is, it isn’t even scoped to the unmanagedSources task. This is a demonstration of how sbt uses fallbacks. When a task declares a dependency, sbt will try to satisfy that dependency with the most specific task it has for it, but if no task is defined at that specific scope, it will fall back to a less specific scope.

What this means, for example for excludeFilter, is that if you have a text editor that generates temporary files of a particular format, you can exclude those by adding it to the global excludeFilter, you don’t need to define an excludeFilter for every single scope. But, I might also decide that I want to exclude certain files in the test scope, so I can configure a different excludeFilter for tests by scoping it to test. Or, I might decide that I want a different filter again just for unmanagedSources, as opposed to unmanagedResources, so I can define the excludeFilter specifically for those tasks. The general approach that sbt takes in its predefined task dependency trees is to depend on tasks at a very specific scope, but define them at the most general scope that makes sense, allowing tasks to be overridden in a blanket fashion, but at a very fine grained level when required.
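Those three levels of override could be sketched in build.sbt roughly as follows, assuming sbt 0.13 syntax. The file name patterns are invented for the example, and in a real build you would most likely only define the one level you actually need:

```scala
// Most general: a build-wide default that also ignores editor backup files.
excludeFilter in Global := HiddenFileFilter || "*~"

// More specific: anything in the test configuration additionally ignores *.tmp.
excludeFilter in Test := HiddenFileFilter || "*~" || "*.tmp"

// Most specific: only unmanagedSources in the compile configuration ignores
// files matching the (hypothetical) *Draft.scala pattern.
excludeFilter in (Compile, unmanagedSources) := HiddenFileFilter || "*~" || "*Draft.scala"
```

After reloading, inspect tree unmanagedSources is a handy way to check which of these definitions actually ends up satisfying the dependency.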

§Parallel by default

There is one last feature of the sbt task engine that I think is worth mentioning in this post. It’s not one that really needs to be understood well in order to use sbt, but it is a very powerful one that sbt’s architecture makes very simple. In sbt, all tasks are executed in parallel by default. Now of course, if a task declares a dependency on another task, those two tasks can’t run in parallel. But two tasks that have no dependency on each other, such as unmanagedSources and managedSources, can, and will, be executed in parallel. Given sbt’s fine grained tasks, this makes for some considerable (and much needed, given the speed of Scala compilation) performance improvements out of the box compared to other build tools.

sbt’s concurrent execution is also configurable. Tasks can be tagged, and then you can define, for example, the maximum number of tasks with a given tag that can run in parallel. You can read more about these capabilities here.
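As a small sketch of what such a restriction looks like, assuming sbt 0.13 syntax in build.sbt, the following limits tasks carrying sbt’s built-in Tags.Test tag so that only one of them runs at a time. The limit of 1 is just an example value:

```scala
// Add a rule to the global concurrency restrictions: at most one task
// tagged with Tags.Test may be running at any given moment.
concurrentRestrictions in Global += Tags.limit(Tags.Test, 1)
```

Tasks that don’t carry that tag are unaffected by this rule and can still run in parallel with each other.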


§Conclusion

In this blog post we have seen that sbt is actually a task engine, and that the fact that it breaks tasks up into many smaller interdependent tasks gives you a lot of power and flexibility. We have seen that the sbt console can be used to inspect tasks, their dependencies, and entire dependency graphs of tasks, and this allows us to learn about sbt, the tasks that are available, and see how our build fits together. We have learned how tasks can be scoped to different configurations, projects, and other tasks, and how sbt uses a fallback system to resolve dependencies at specific scopes. Hopefully sbt is now more transparent to you. You no longer need a spellbook to know how to configure it, rather, you can use the inspect commands to discover what you can configure yourself.

We have not seen anything about how to define or redefine tasks, or the syntax of the sbt build file. This is the topic of my next blog post, sbt - A declarative DSL.

OSS diversity - A thought experiment

On the topic of diversity in open source software, I have a thought experiment - what if an open source project was structured more like a company? This is only a thought experiment, I’m definitely not advocating that we structure open source projects like companies, but I am looking for ways that we can improve diversity in OSS, and this is a big problem that many are struggling to find answers to.

The big difference between a company structure and an open source project structure is that in a company, there are many different roles, and each role has a number of responsibilities specific to that role. People are hired based on their competency to fulfill the responsibilities required by those roles. In contrast, an open source project tends to have only one role - the committer, and then all the responsibilities of the project are organically fulfilled by the various people who are in that role according to need and interest. When people are selected for the committer role, they are selected on broadly the same criteria, which is based on their technical contributions to the project.

Does this difference have any impact on diversity? Of course there is some impact, since each person in the committer role must have a passion for technical contribution to the project, whereas in a company structure, there will be some people with a passion for technical work in the technical roles, and other people with passion for communication in the marketing roles. But this is a lack of vocational diversity. The issue that we’re concerned about in the OSS world is not a lack of vocational diversity, it’s a lack of diversity of gender, age, race, sexual orientation, among other things. For lack of a better word, I’ll call this type of diversity “identity” diversity, as opposed to “vocational” diversity. At face value, if the pool of people that are passionate about technical contribution are diverse in identity, then there likewise should be identity diversity among committers in open source projects.

In practice we see things are not like this in open source projects. In contrast, and in my experience, big companies with a broad range of very specialised vocational roles tend to be very identity diverse - at the very least outside of executive management. Also in my experience this identity diversity exists within vocational roles, it’s not just that each vocational role is filled with people of one type, which in aggregate makes the whole company identity diverse.

So could it be that a lack of diversity of roles - which are vocational based - fuels a lack in identity diversity in people that fill these roles? At this stage I haven’t done enough research to be able to answer this question, but perhaps it’s something that we could start experimenting with in open source projects.

§Putting it into practice

So if we were to try to introduce diverse vocational roles into open source projects, what roles would they be, and what would they look like? To make things a little more concrete, let’s start by thinking about the introduction of a marketing role. Every large open source project does a lot of marketing, from publishing blog posts to tweeting to organising conference engagements to engaging with the community on mailing lists. This is something that it would be quite straightforward to create a role for. This wouldn’t mean that everyone else in the project stops blogging/tweeting etc, that doesn’t happen in a company. It does mean that someone is given the responsibility of coordinating and ensuring that marketing happens.

In order for such a role to work, it needs to be well defined - open source projects are not used to having such roles, so it needs to be made clear to everyone on the project, not just the person in this role, what the responsibilities of the role are. This is necessary to empower that person to be and feel effective in the project. The responsibilities are going to vary from project to project, but in the context of my project, Play Framework, and without having put too much thought into it yet, this is what I’d envision:

  • Be the public face of the project, the first point of contact for people that want to engage the Play community.
  • Manage the Twitter account, including keeping an eye out for tweets to retweet and responding to mentions where necessary.
  • Keep an eye out for community related questions on the mailing list, and respond or coordinate responses to those emails.
  • Keep a look out for conferences that Play contributors, both in the core team and the broader ecosystem, could speak at, and link the appropriate people up with appropriate conferences.
  • Write or coordinate project announcements, such as new releases, security vulnerabilities, etc.
  • Seek ways to make Play a more attractive project to contribute to, through talking to people about their contribution experiences, and improving or coordinating improvements to documentation and the processes in place for receiving contributions.
  • Seek ways to make Play an easier framework to get started with, through talking to people about their usage experiences, and improving or coordinating improvements to documentation and other resources.

Having defined the role, the next step is probably the hardest - finding someone to fill it. Finding people to fill a committer role of an open source project is generally easy, many developers that love using the software would also love to be a committer on it. Finding someone with skills and a passion for marketing who is willing to volunteer their time towards an open source project I would imagine would be a lot harder. This is possibly where my whole thought experiment falls down.

For Play though it’s not too hard to solve - Play is a project that is driven by a company, and that company already has a marketing team that does a lot of behind the scenes work in engaging the Play community. For us, we could make our marketing staff more public facing in the Play community, getting them to take on the responsibilities listed above (some of them they already do), and publicly acknowledging them as a member of the Play community, not just a member of the Typesafe marketing team, on the Play website. This may be a good start.

This is just one role that we could look at introducing to open source projects. There are many other possible roles, including leadership roles (in most companies these are not filled by technical people), product management, and a variety of HR roles. I’m not sure how well (if at all) these will work in an open source project, but in order to improve diversity, OSS projects are in need of big changes, and whatever changes are found to work will probably appear unworkable at first. So I think these are worth trying.


Hi! My name is James Roper, and I am a software developer with a particular interest in open source development and trying new things. I program in Scala, Java, Go, PHP, Python and Javascript, and I work for Lightbend as the architect of Kalix. I also have a full life outside the world of IT, enjoy playing a variety of musical instruments and sports, and currently I live in Canberra.