all that jazz

james' blog about scala and all that jazz

Testing sbt 1.0 cross builds

sbt 1.0 is now released, and everyone in the sbt community is hard at work upgrading their plugins. Because many sbt plugins depend on each other, there will be a short period (which we’re in now) during which people won’t be able to upgrade their builds to sbt 1.0, because the plugins their builds use haven’t yet been upgraded and released. However, that doesn’t mean you can’t cross build your plugin for sbt 1.0 now: simply upgrade to sbt 0.13.16 and use its sbt plugin cross building support.
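As a minimal sketch, assuming an sbt 0.13.16 plugin build, the cross building support is driven by the crossSbtVersions setting. The version numbers below are illustrative.

```scala
// build.sbt (sbt 0.13.16 or later): a sketch of sbt plugin cross building.
sbtPlugin := true

// The sbt versions to build the plugin against; the values are illustrative.
crossSbtVersions := Vector("0.13.16", "1.0.0")
```

From the sbt prompt, prefixing a command with ^ (for example ^ compile or ^ publishLocal) should run it once for each listed sbt version, while ^^ 1.0.0 switches the current session to a single sbt version.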

I had a small problem yesterday, though, when working on the sbt-web upgrade. Part of my plugin needed to be substantially rewritten for sbt 1.0 (sbt 1.0’s caching API now uses sjson-new rather than sbinary, so all the formats needed to be rewritten). I didn’t want to rewrite this without an IDE: I knew nothing about sjson-new, so I needed to be able to easily browse and navigate its source code to discover how to use it, and I wanted the immediate feedback that IDEs give you on whether something will compile or not. The problem was that my build was still using sbt 0.13.16, and I couldn’t upgrade it because not all the plugins I depended on supported sbt 1.0. So I came up with a small workaround, which I’m posting here for anyone who might find it useful: before reimporting the project into IntelliJ, I added the following configuration to my build.sbt:

sbtVersion in pluginCrossBuild := "1.0.0"
scalaVersion := "2.12.2"

Unfortunately, it seems that you can’t leave this in the build file to make sbt 1.0 the default permanently, because the sbt cross building support doesn’t appear to override that setting (this is possibly a bug). But if you add it to your build.sbt right before you import into IntelliJ, and then remove it when you’re done developing for sbt 1.0, it’s a nice workaround.

The Noop Monad - doing nothing safely

If you’re a fan of functional programming, as I am, you’ll know that one of the great things about it is how useful it is. But that isn’t the only great thing about functional programming; functional programming is also great for when you want to do nothing at all. Some might even say that doing nothing at all is where functional programming really shines.

So today I’m going to introduce a monad that surprisingly isn’t talked about a lot - the noop monad. The noop monad does nothing at all, but unlike noops in other programming paradigms, the noop monad does nothing safely.

§A demo

For this demonstration, I’m going to use Scala, with Scalaz to implement the monad. Let’s start off with the Noop type:

/**
 * A noop of type T
 */
sealed trait Noop[T] {

  /**
   * Run the noop
   */
  def run: Unit
}

As you can see, the Noop type has a type parameter, so we can do nothing of various types. We can also see the run function, and it returns Unit. Now typically in functional programming, returning Unit is considered a bad thing, because Unit is not a value, so any pure function that returns Unit must have done nothing. But since Noop actually does do nothing, this is the one exception to that rule. So the run function can be evaluated to do the nothing of the type that this particular Noop does.

Now, let’s say I have a method that calculates all the primes up to a given number. Here’s its signature:

def calculatePrimes(upTo: Int): List[Int]

And let’s say I want a list of all the Int primes. I can use the above method like so:

calculatePrimes(Int.MaxValue)

But wait, you say! That code is going to be very expensive to run: it’s likely to take a very, very long time, and you have better things to do. So, you want to ensure that the code doesn’t run. This is where the noop monad comes on the scene. Using the point method, you can ensure that it safely doesn’t run:

val noopAllIntPrimes = calculatePrimes(Int.MaxValue).point[Noop]

And then, when you actually don’t want to run it, you can do that by evaluating the run function:

noopAllIntPrimes.run

For those unfamiliar with Scalaz and functional programming: a monad is an applicative, and an applicative is something that lets you create an instance of the applicative from a value. The method on Applicative for doing this is called point; in other languages and libraries it’s also called pure.
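To see what point means in isolation, here is a sketch using a hypothetical, stripped-down Applicative type class (not Scalaz’s actual definition) with two instances: Option, which evaluates and wraps the value, and Noop, which takes the value by name and never evaluates it.

```scala
// A hypothetical, minimal Applicative type class; not Scalaz's real one.
trait Applicative[F[_]] {
  // Lift a value into F. The argument is by-name, so an instance may
  // choose never to evaluate it.
  def point[A](a: => A): F[A]
}

// A Noop of type T that does nothing when run.
sealed trait Noop[T] {
  def run: Unit
}

object Applicative {
  // Option's point evaluates the value and wraps it in Some.
  implicit val optionApplicative: Applicative[Option] = new Applicative[Option] {
    def point[A](a: => A): Option[A] = Some(a)
  }

  // Noop's point never evaluates, or even references, the value.
  implicit val noopApplicative: Applicative[Noop] = new Applicative[Noop] {
    def point[A](a: => A): Noop[A] = new Noop[A] { def run: Unit = () }
  }
}

// Find the Applicative instance for F and lift `a` into it.
def point[F[_], A](a: => A)(implicit applicative: Applicative[F]): F[A] =
  applicative.point(a)
```

With this sketch, point[Option, Int](expensive()) calls expensive once and returns Some of its result, while point[Noop, Int](expensive()) returns a Noop without ever calling it.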

So, we can see that Noop is an applicative, but can we flatMap it? What if you don’t want to sum all those prime numbers, and then you certainly don’t want to convert that result to a String? The noop monad lets you do that:

val summedPrimesString = for {
  primes <- noopAllIntPrimes
  summed <- primes.reduce(_ + _).point[Noop]
  asString <- summed.toString.point[Noop]
} yield asString

And so then to ensure that we don’t actually do all this expensive computation, we can run it as before:

summedPrimesString.run

§Advantages

We can see how the noop monad can be used to do nothing, but what are the advantages of using the noop monad compared to some other methods of doing nothing? I’m going to highlight three advantages that I think really demonstrate the value of doing nothing in a monadic way.

§Runtime optimisation

This is often an advantage of functional programming in general, but the noop monad is the exemplar of optimisation in functional programming. Let’s have a look at the implementation of the noop monad’s point method:

def point[A](a: => A): Noop[A] = new Noop[A] { def run: Unit = () }

Here we can see that not only is the passed in value not evaluated, it’s not even referenced in the returned Noop. But how can the noop monad do this? Since the noop monad knows that you don’t want to do anything at all, it is able to infer that it will not need to evaluate the value, and therefore it doesn’t need to hold a reference to the passed in value. But this advanced optimisation doesn’t stop there. Let’s have a look at the implementation of bind:

def bind[A, B](fa: Noop[A])(f: A => Noop[B]): Noop[B] = new Noop[B] { def run: Unit = () }

Here we can see a double optimisation. First, the passed in Noop is not referenced. The noop monad can do this because it infers that since you don’t want to do anything, you don’t need the nothing that you passed in. Secondly, the passed in bind function is never evaluated. As with the other parameter, the noop monad can infer that since the passed in Noop does nothing, there will be nothing to pass to the passed in function, and therefore, the function will never be evaluated.
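For readers without Scalaz to hand, here is a self-contained sketch of the same idea: map and flatMap are defined directly on Noop (an assumption of this sketch, rather than a Scalaz Monad instance) so that a for comprehension works. The calculatePrimes below is a hypothetical stand-in for the method from earlier that counts how often it is called.

```scala
// A self-contained noop monad: map and flatMap live on Noop itself.
sealed trait Noop[A] {
  // Run the noop: does nothing.
  def run: Unit = ()

  // bind: neither this Noop nor f is ever used.
  def flatMap[B](f: A => Noop[B]): Noop[B] = Noop.empty[B]

  // map: f is never called either.
  def map[B](f: A => B): Noop[B] = Noop.empty[B]
}

object Noop {
  private final class Impl[A] extends Noop[A]

  def empty[A]: Noop[A] = new Impl[A]

  // point: the by-name argument is never evaluated or referenced.
  def point[A](a: => A): Noop[A] = empty[A]
}

// Hypothetical stand-in for calculatePrimes that records each call.
var primeCalculations = 0
def calculatePrimes(upTo: Int): List[Int] = {
  primeCalculations += 1
  (2 to upTo).filter(n => (2 until n).forall(n % _ != 0)).toList
}

val summedPrimesString: Noop[String] = for {
  primes   <- Noop.point(calculatePrimes(Int.MaxValue))
  summed   <- Noop.point(primes.sum)
  asString <- Noop.point(summed.toString)
} yield asString

summedPrimesString.run // does nothing, safely
```

After running summedPrimesString, primeCalculations is still 0: the expensive computation was never started.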

As you can see, particularly for performance minded developers, the noop monad is incredibly powerful in its ability to optimise your code at runtime to do as little of nothing as possible.

§Code optimisation

But performance isn’t the only place where the noop monad can help with optimisation; the noop monad can also help optimise your code to ensure it is as simple and concise as possible.

Let’s take our previous example of summing primes:

(for {
  primes <- calculatePrimes(Int.MaxValue).point[Noop]
  summed <- primes.reduce(_ + _).point[Noop]
  asString <- summed.toString.point[Noop]
} yield asString).run

Now, this isn’t bad looking code, but it does feel a little too complex when all we wanted to do in the first place was nothing. So how can we simplify it? Well, firstly, you’ll notice that we don’t want to convert the summed result to a string; you can tell this by the .point[Noop] after it. Based on the rules of the noop monad, we can optimise our code to remove this:

(for {
  primes <- calculatePrimes(Int.MaxValue).point[Noop]
  summed <- primes.reduce(_ + _).point[Noop]
} yield summed).run

Is this safe to do? In fact it is, because we have actually replaced our intention of doing nothing, with nothing. We can do the same for summing all the primes:

(for {
  primes <- calculatePrimes(Int.MaxValue).point[Noop]
} yield primes).run

Now the final step in code optimisation, and this is the hardest to follow so bear with me: we can actually remove the not calculating the primes itself, and simultaneously remove the run function on that Noop. But how is this so? You may remember that I explained earlier that if a pure function returns Unit, then it must do nothing. Our Noop.run is a pure function, and it does nothing. So since evaluating run does nothing, we can safely replace it with nothing. Finding it hard to follow? This is what it looks like in code:



As you can see, we’ve gone from five reasonably complex lines of code, to absolutely no code at all! This is the embodiment of what Dijkstra meant when he said:

If we wish to count lines of code, we should not regard them as “lines produced” but as “lines spent”.

The noop monad has allowed us to spend zero lines of code in doing nothing.

§Teaching monads

Teaching monads has proven to be the unicorn of evangelising functional programming: no matter how hard anyone tries, no one seems to be able to teach them to a newcomer. The noop monad solves this by grounding monads in a context that all students can relate to - doing nothing.

In particular, the noop monad does a great job of picking up the pieces of a failed attempt to teach a student monads. For example, consider the following situations:

  • A student has been told that monads are just monoids in the category of endofunctors. What does that even mean? But if I say the noop monoid in the category of endofunctors is just something that does nothing, simple!
  • A student has been told that monads are burritos. What does that even mean? But if I say the noop burrito is just something that does nothing, simple!

§Conclusion

So today I’ve introduced you to the noop monad. As you can see, it’s in the noop monad that functional programming is made complete, fulfilling everything that every functional programmer has ever wanted to do, that is, nothing at all.

sbt - A declarative DSL

This is my second post in a series of posts about sbt. The first post, sbt - A task engine, looked at sbt’s task engine, how it is self documenting, making tasks discoverable, and how scopes allow hierarchical fallbacks. In this post we’ll take a look at how the sbt task engine is declared, again taking a top down approach, rooted in practical examples.

§Settings are not settings

In the previous post, we were introduced to settings, a specialisation of tasks that are only executed once at the start of an sbt session. File that bit of knowledge away, and now reset your definition of setting to nothing. I’m guessing this is a relic of past versions of sbt, but the word setting in sbt can mean two distinct things: one is a task that’s executed at the start of the session, the other is a declaration of a task (or of a setting, in the executed-at-start-of-session sense). No clearer?

Let’s introduce another bit of terminology, a task key. A task key is a string and a type: the string is the name of the task, and it’s how you reference it on the command line; the type is the type of the value that the task produces. So the sources task has a name of "sources", and a type of Seq[File]. It is defined in sbt.Keys:

val sources = TaskKey[Seq[File]]("sources", "All sources, both managed and unmanaged.", BTask)

You can see it also has a description and a rank; those are not really important to us now. The thing that uniquely identifies this task key is the sources string. You could define another sources key elsewhere, and as long as the two have the same name, they will be considered keys for the same task. Of course, if you define two keys with the same name but different types, that’s not allowed, and sbt will give you an error.

In addition to TaskKeys, there are also SettingKeys, where setting means only executed once per session. Now these keys by themselves do nothing. They only do something when you declare some behaviour for them. So a setting, in the sense of a task declaration, is a task or setting key, potentially scoped, with some associated behaviour. For the remainder of this post, when I say setting, that’s what I’m referring to.
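As a sketch of declaring keys of your own, assuming sbt 0.13 or later (the key names greeting and greet are hypothetical): the taskKey and settingKey helpers take the key’s name from the val they are assigned to, and the keys only gain behaviour once settings are declared for them.

```scala
// build.sbt: declaring custom keys (the names are hypothetical).
lazy val greeting = settingKey[String]("A greeting, computed once per session.")
lazy val greet = taskKey[Unit]("Prints the greeting every time it is run.")

// The keys do nothing until settings declare behaviour for them.
greeting := "Hello from " + name.value
greet := println(greeting.value)
```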

§Settings are executed like a program

Defining sbt’s task engine is done by giving sbt a series of settings, each setting declaring a task implementation. sbt then executes those settings in order. Tasks can be declared multiple times by multiple settings, the last one to execute wins.

So where do these settings come from? They come from many places. Most obviously, they come from the build file. But most settings don’t come from there: as you’re aware, sbt defines many, many fine grained tasks, but you don’t have to declare settings for these yourself. Most of the settings come from sbt plugins. One in particular that is enabled by default is the JvmPlugin. This contains all the settings necessary for building a Java or Scala program, including settings that declare the sources task that we saw in the previous post. Plugin settings are executed before the settings in your build file, which means that any settings you declare in your build file will override the settings declared by the plugins.
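The last-one-wins rule can be pictured with a hypothetical toy model in plain Scala. It is not sbt’s implementation: it ignores dependencies and treats each setting as a static value assigned to a key, applying the settings in order.

```scala
// A hypothetical toy model, not sbt's implementation. Each setting gives a
// key a static value; settings are applied in order, so a later setting for
// the same key replaces any earlier one.
final case class StaticSetting(key: String, value: String)

def applySettings(settings: Seq[StaticSetting]): Map[String, String] =
  settings.foldLeft(Map.empty[String, String]) { (declared, setting) =>
    declared + (setting.key -> setting.value) // the last declaration wins
  }

// Plugin settings are applied first, then the build file's settings.
val pluginSettings = Seq(
  StaticSetting("name", "default-name"),
  StaticSetting("version", "0.1-SNAPSHOT")
)
val buildFileSettings = Seq(StaticSetting("name", "sbt-fun"))

val resolved = applySettings(pluginSettings ++ buildFileSettings)
```

Here resolved maps name to "sbt-fun", because the build file’s declaration comes after the plugin’s, while version keeps the plugin’s value since nothing redeclares it.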

This ordering of settings is important to note: it means settings have to be read from top to bottom. I have handled a number of support cases and mailing list questions where people haven’t realised this. They have declared a setting, and then after it included a sequence of settings from elsewhere in their build that redeclares the same key. They expected their setting to take precedence, but since their setting came before the one from the sequence, the setting from the sequence overwrote it.
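Here is a sketch of that pitfall in a build.sbt, assuming sbt 0.13.7 or later; commonSettings is a hypothetical name for the shared sequence.

```scala
// build.sbt: the ordering pitfall (commonSettings is hypothetical).
lazy val commonSettings: Seq[Def.Setting[_]] = Seq(
  scalacOptions := Seq("-feature")
)

// This declaration is executed first...
scalacOptions := Seq("-deprecation")

// ...and this sequence redeclares scalacOptions afterwards, so it wins:
// scalacOptions ends up as Seq("-feature").
commonSettings
```

Moving the scalacOptions declaration below commonSettings would make it take precedence instead.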

§sbt build file syntax

We’re about to get into some concrete examples of declaring settings, so before we do that we had better cover the basics of the sbt build file. sbt builds can either be specified in plain *.scala files, or in sbt’s own *.sbt file format. As of sbt 0.13.7, the sbt format has become powerful enough that there is really not much that you can’t do with it, so we’re only going to look at that.

An sbt file may have any name; the convention is that a project’s main build file be called build.sbt, but that is only a convention. The file may contain a series of Scala statements and expressions, and it’s important here to distinguish between statements and expressions. What’s the difference? A statement doesn’t return a value. For example:

val foo = "bar"

This is a statement: it assigns "bar" to the val foo, but this assignment doesn’t return a value. In contrast:

5 + 4

This is an expression, it returns a value of type Int.

Expressions in an sbt file must have a type of either Setting[_] or Seq[Setting[_]]. sbt will evaluate all these expressions, and add them to the settings for your build. Any expression in your sbt file that isn’t of one of those types will generate an error.

Statements can be anything: utility methods, vals, lazy vals, whatever. In most cases sbt ignores them, but that doesn’t make them useless; you can use them in other expressions or statements to help you define your build. There is one type of statement, though, that sbt doesn’t ignore: a statement that assigns a project to a val. This is how projects are defined:

lazy val sbtFunProject = project in file(".")

The final thing to know about sbt build files is that sbt automatically brings in a number of imports. For example, sbt._ is imported, as is sbt.Keys._, so you have access to the full range of built in task keys that sbt defines without having to import them yourself. sbt also brings in imports declared by plugins, making it straightforward to use those plugins.
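Putting those rules together, here is a sketch of a small build.sbt that mixes statements and expressions, assuming sbt 0.13.7 or later. The helper names projectName and withSuffix are hypothetical, and no imports are needed because of the automatic imports just described.

```scala
// build.sbt: statements and expressions side by side.

// Statement: a helper val. sbt ignores it unless something uses it.
val projectName = "sbt-fun"

// Statement: a utility method, also ignored by sbt on its own.
def withSuffix(s: String): String = s + "-SNAPSHOT"

// Expressions of type Setting[_]: added to the build's settings.
name := projectName
version := withSuffix("0.1")

// The one statement sbt does not ignore: a val assigned to a project.
lazy val sbtFunProject = project in file(".")
```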

§Declaring a setting

Declaring a setting is done by taking a task key, optionally applying a scope to it, and then declaring an implementation for that task. Here’s a very basic example:

name := "sbt-fun"

In this case we’re declaring the implementation of the name task to simply be a static value of "sbt-fun". Note that the above is an expression, not a statement. := is not a Scala language feature; it is actually a method that sbt provides. sbt’s syntax for declaring settings is a pure Scala DSL. If this syntax confuses you, then I strongly recommend that you read a post I wrote a few years ago called Three things I had to learn about Scala before it made sense. That post explains how DSLs are implemented in Scala, and is essential reading before you continue with this post if you don’t already understand that.

What if we want to declare our own implementation of the sources task? Remembering that we want it scoped to compile, we can do this:

sources in Compile := Seq(file("src/main/scala/MySource.scala"))

Again we’re only setting a static value for this task to return, but this time you can see how we’ve scoped the sources task in the compile scope. Note that configurations such as compile and test are available through capitalised vals, in scope in your build.

§Back to first principles

What if we want to declare a dependency on another task? Let’s say we want to declare sources to be, as it’s described, all managed and unmanaged sources. If you’ve used sbt before, you probably know that you can use this syntax:

sources := managedSources.value ++ unmanagedSources.value

This was introduced in sbt 0.13, and it’s actually implemented by a macro that does some magic for you. It’s great, I use that syntax all the time, and so should you. However, as with anything that does magic for you, if you don’t understand what it’s doing for you and how it does it, you can run into trouble.

As I described in the last post, sbt is a task engine, and tasks declare dependencies that are executed before them and whose outputs are provided to them as input. In the above example, it doesn’t look like this is happening at all: what it looks like is that when the sources task is executed, it executes the managedSources task by calling value, and the unmanagedSources task by calling value, and then concatenates their results together. In fact, a macro transforms this code into something that does declare dependencies, and takes the outputs of those dependencies and passes them to the implementation.

So in order to understand what the macro is doing for us, let’s implement this ourselves manually: let’s declare this setting from first principles.

Firstly, we’re going to use the <<= operator instead; this is how to say that I am declaring this task to be dependent on other tasks. Now, we could do a very straightforward declaration that depends on another task:

sources <<= unmanagedSources

This says that the sources task has a dependency on unmanagedSources, and will take the output of unmanagedSources as is, and return it as the output of sources. What if we wanted to change that value before returning it? We can do that using the map method:

sources <<= unmanagedSources.map(files => files.filterNot(_.getName.startsWith("_")))

So now we’ve filtered out all the files that start with _ (note that sbt already provides an excludeFilter setting that can be used to configure this; this is just an example).

At this point let’s take a step back and think about what the code above has done. For one, nothing has yet been executed, at least not the task implementation. That <<= method returns an object of type Setting. This setting has the following attributes:

  • The key (potentially scoped) that it is the task declaration for, in this case, sources.
  • The keys (potentially scoped) of tasks that it depends on, in this case, unmanagedSources.
  • A function that takes the outputs of the tasks that it depends on as input parameters, executes the task, and returns the output of the task being declared (that is, the function we passed to the map method, which filters out all files that start with _).
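Those three attributes can be modelled as plain data. The following is a hypothetical toy model, not sbt’s real Setting class: declaring the settings runs nothing, and only execute, which runs the dependencies first and feeds their outputs to the implementation, does any work.

```scala
// A hypothetical toy model of a setting, not sbt's real Setting class: the
// key it declares, the keys it depends on, and a function from the
// dependencies' outputs to this task's output.
final case class ToySetting(
    key: String,
    dependencies: Seq[String],
    implementation: Seq[Seq[String]] => Seq[String]
)

// Executing a key: run its dependencies first, then pass their outputs to
// the implementation. Nothing runs until this is called.
def execute(settings: Map[String, ToySetting], key: String): Seq[String] = {
  val setting = settings(key)
  val inputs = setting.dependencies.map(dep => execute(settings, dep))
  setting.implementation(inputs)
}

// Counts how many leaf task implementations have actually run.
var executions = 0

val declared: Map[String, ToySetting] = Map(
  "unmanagedSources" -> ToySetting("unmanagedSources", Nil, _ => {
    executions += 1
    Seq("A.scala", "_Draft.scala")
  }),
  "managedSources" -> ToySetting("managedSources", Nil, _ => {
    executions += 1
    Seq("Generated.scala")
  }),
  // Like the tuple-and-map declaration: depends on both, filters, concatenates.
  "sources" -> ToySetting("sources", Seq("unmanagedSources", "managedSources"), {
    case Seq(unmanaged, managed) => unmanaged.filterNot(_.startsWith("_")) ++ managed
  })
)
```

Building declared leaves executions at 0; calling execute(declared, "sources") runs both dependencies once and returns Seq("A.scala", "Generated.scala").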

You can see here that we haven’t actually executed anything in the task, we have only declared how the task is implemented. So when sbt goes to execute the sources task, it will find this declaration, execute the dependencies, and then execute the callback. This is why I’ve called this blog post “sbt - A declarative DSL”. All our settings just declare how tasks are implemented, they don’t actually execute anything.

So, what if we want to depend on two different tasks? Through the magic of the sbt DSL, we can put them in a tuple, and then map the tuple:

sources <<= (unmanagedSources, managedSources).map { (unmanaged, managed) => unmanaged ++ managed }

And now we actually have our first principles implementation of the sources task. Sort of: we haven’t scoped it to compile, but that’s not hard to do:

sources in Compile <<= (unmanagedSources in Compile, managedSources in Compile).map(_ ++ _)

For brevity I’ve used a shorter syntax for concatenating the two sources sequences.

§sbt uses macros heavily

So now that we’ve seen how to declare tasks from first principles, let’s see how the macros work. We have our declaration from before:

sources := { managedSources.value ++ unmanagedSources.value }

I’ve inserted the curly braces to make it clear what is being passed to the := macro. The := macro will go through the block of code passed to it, find every place where value is called, and gather all the keys that it is invoked on. It will then generate Scala code (or rather, an AST) that builds those keys as a tuple, and then invokes map. To the map call, it will pass the original code block, but with all the keys that had value called on them replaced by parameters that are taken as the input arguments to the function passed to map. Essentially, it builds exactly the same code that we implemented in our first principles implementation.
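To make the expansion concrete, here are the two forms side by side, assuming an sbt 0.13 build (<<= was removed in sbt 1.0). They are alternatives rather than something to put in one build, and the second is only roughly what the macro produces: the real expansion is generated as an AST and differs in detail.

```scala
// build.sbt (sbt 0.13). Written with the macro:
sources := { managedSources.value ++ unmanagedSources.value }

// Roughly what that expands to: the keys on which value was called become a
// tuple of dependencies, and the block becomes the function passed to map.
sources <<= (managedSources, unmanagedSources).map { (managed, unmanaged) =>
  managed ++ unmanaged
}
```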

Now, it’s important to understand how these macros work, because if you try to use the value call outside the context of a macro, you will run into problems. An important thing to realise is that the code generated by the macro never actually invokes value; value is just a placeholder method used to tell the macro to extract these keys out as dependencies that get mapped. The value method itself is in fact a macro, one that results in a compile time error if you invoke it outside the context of another macro, the exact error message being "value can only be used within a task or setting macro, such as :=, +=, ++=, Def.task, or Def.setting." And you can see why: since sbt settings are entirely declarative, you can’t access the value of a task from its key; it doesn’t make sense to do that.
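The error message itself points at the legitimate alternative: Def.task is also a task macro, so value can be used inside it to build a reusable task definition that isn’t yet bound to any key. A minimal sketch, assuming sbt 0.13.7 or later and a hypothetical name allSources:

```scala
// build.sbt: using value inside Def.task (allSources is a hypothetical name).
// This only declares a task; nothing is executed here.
lazy val allSources = Def.task {
  managedSources.value ++ unmanagedSources.value
}

// The definition can then be used as a dependency of a real key.
sources := allSources.value
```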

From now on in this post we’ll switch to using the macros, but remember what these macros compile to.

§Redeclaring tasks

So far we’ve seen how to completely overwrite a task. What if you don’t want to overwrite the task, but just want to modify its output? sbt allows you to make a task depend on itself; if you do that, the task will depend on the old implementation of itself, giving the output of that implementation to you as your input. In the previous blog post, I brought up the possibility of only compiling source files with a certain annotation inside them. Let’s say we’re only going to compile source files that contain the text "COMPILE ME". Here’s how you might implement that, depending on the existing sources implementation:

sources := {
  sources.value.filter { sourceFile =>
    IO.read(sourceFile).contains("COMPILE ME")
  }
}

sbt also provides a shorthand for doing this, the ~= operator, which takes a function that takes the old value and returns the new value:

sources ~= _.filter { sourceFile =>
  IO.read(sourceFile).contains("COMPILE ME")
}

Other convenient shorthands that sbt provides for modifying the old value of a task, and that you have likely come across before, are the += and ++= operators. These take the old value, and append the item or sequence of items produced by your new implementation to it. So, to add a file to the sources:

sources += file("src/other/scala/Other.scala")

Or to add multiple files:

sources ++= Seq(
  file("src/other/scala/Other.scala"),
  file("src/other/scala/OtherOther.scala")
)

These can of course depend on other tasks through the value macro, just as := can:

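sources ++= (sourceDirectory.value / "other" / "scala" ** "*.scala").get

The ** method selects every file beneath a directory, recursively, whose name matches the given filter, and get turns the resulting PathFinder into the Seq[File] that ++= appends.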

§Scope me up

We’ve talked a little bit about scopes, but most of our examples so far have excluded them for brevity. So let’s take a look at how to scope settings and their dependencies.

To apply a scope to a setting, you can use the in method:

sources in Compile += file("src/other/scala/Other.scala")

Applying multiple scopes can be done by using multiple in calls, for example:

excludeFilter in sbtFunProject in unmanagedSources in Compile := "_*"

Or, they can also be done by passing multiple scopes to the in method, in the order project, configuration then task:

excludeFilter in (sbtFunProject, Compile, unmanagedSources) := "_*"

The same syntax can be used when depending on settings, though make sure you put parentheses around the whole scoped setting in order to invoke the value method on it:

(sources in Compile) := 
  (managedSources in Compile).value ++ 
  (unmanagedSources in Compile).value

§Conclusion

In the first post in this series, we were introduced to the concepts behind sbt and its task engine, and how to explore and discover the task graphs that sbt provides. In this post we saw the practical side of how task dependencies and implementations are declared, using both the map method to map dependency keys, as well as macros. We saw how to modify existing task declarations, as well as how to use scopes.

One thing I’ve avoided here is showing cookbooks of how to do specific tasks, for example, how to add a source generator. The sbt documentation is really not bad for this, especially for cookbook type documentation, but I also hope that after reading these posts, you aren’t as dependent on copying and pasting sbt configuration, but rather can use the tools built into sbt to discover the structure of your build, and modify it accordingly.

sbt - A task engine

sbt is the best build tool that I’ve used. But it’s also the build tool with the steepest learning curve that I’ve ever used, and I think most people would agree that it’s very difficult to learn. When you first start using it, configuring it is like casting spells, spells that have to be learned from a spell book, that have to be said in the exact right way, otherwise they don’t work. There are lots of guides out there that are essentially spell books, they teach you all the things you need to know to achieve various tasks. But I haven’t seen a lot out there that actually explains what sbt is, what it does, why it is the way it is. This blog post is my attempt to do that.

§A task engine

Simply put, sbt is a task engine. You have tasks. A task may be dependent on other tasks. Any task from any point in the build may be redefined, and new tasks can be easily added. In some ways it is a bit like make or ant, but it differs in a fundamental way: sbt tasks produce an output value, and are able to consume the output values of the tasks they depend on, whereas make and ant just modify the file system. This property of sbt allows you to break build steps up into very fine grained tasks.

So let’s take an example. A common step that build tools support is compilation. In many traditional build tools, a compilation task is responsible for finding a set of files to compile based on some input parameters, such as a list of source directories, and compiling them. In sbt, the compile task is not responsible for finding a set of files to compile, this is the responsibility of the sources task. The output value of the sources task is a list of files to compile. The compile task depends on the sources task, taking its list of files to compile as an input.

So what’s so good about this? What it means is that I can completely customise the way sources are located, by redefining the sources task. So if I have a crazy build requirement such as wanting to put a special annotation in my source files to say whether they get compiled or not, I can very easily implement my own sources task to do that. This is something that would be very difficult to do in another build tool, but in sbt it’s straightforward.
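The idea of tasks passing values to each other can be pictured with a hypothetical toy model in plain Scala (not sbt’s API): a task is something that produces a value, compile consumes the output of whatever sources task it is given, and swapping in a different sources task changes what gets compiled without touching compile.

```scala
// A hypothetical toy model, not sbt's API: a task produces a value, and a
// dependent task is built by mapping over the output of its dependency.
final case class ToyTask[A](run: () => A) {
  def map[B](f: A => B): ToyTask[B] = ToyTask(() => f(run()))
}

// compile does not find files itself; it consumes the output of sources.
def compileTask(sources: ToyTask[Seq[String]]): ToyTask[Seq[String]] =
  sources.map(files => files.map(_.stripSuffix(".scala") + ".class"))

// The default sources task.
val defaultSources = ToyTask(() => Seq("Main.scala", "Util.scala"))

// A redefined sources task. The name check stands in for the annotation
// check described above; compileTask is left untouched.
val customSources = defaultSources.map(_.filter(_.startsWith("Main")))
```

Running compileTask(defaultSources) yields Seq("Main.class", "Util.class"), while compileTask(customSources) yields only Seq("Main.class").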

In other build tools, if I want to generate some sources, I have to make sure that the task to generate the sources runs before the compilation task, and puts them in a place the compilation task will find. In sbt, I can just redefine the sources task to make it generate the sources. In practice though, I don’t need to do that, because generating sources is a very common requirement. Remember that I said that sbt tasks can be very fine grained. The sources task itself depends on many other tasks; one of them is the managedSources task, which collects all the files that are managed (or generated) by the build (in contrast to unmanaged sources, which are your regular source files that you manage yourself). That task in turn depends on the sourceGenerators task, which I can redefine to add new source generators.
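As a sketch of what adding a source generator looks like, assuming sbt 0.13 or later: the generator is a task that writes files under the managed sources directory and returns them. The file path and contents here are illustrative.

```scala
// build.sbt: adding a source generator (path and contents are illustrative).
sourceGenerators in Compile += Def.task {
  val file = (sourceManaged in Compile).value / "demo" / "Generated.scala"
  IO.write(file, """object Generated { val greeting = "hi" }""")
  Seq(file) // the generated files become part of managedSources
}.taskValue
```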

§A self documenting task engine

At this point you might be starting to see that there are many, many tasks involved in even the simplest build in sbt. I’ve talked about just one small part, how generated sources end up being compiled, but there are many more than that. How is someone that is new to sbt supposed to know what tasks exist, so that they can customise their build? Well, it turns out sbt comes with a few built in tools for inspecting the available tasks. These are often seen as advanced features of sbt, but I think really this is what new users to sbt should be introduced to first. So if you’re new, it’s time to fire up sbt.

First we need a simple project. In an empty directory, create a file called build.sbt, and set your project’s name:

name := "sbt-fun"

Now, if you already have sbt 0.13 or later installed, you can use that. If you already have activator installed (which is basically just a script that launches sbt), you can use that instead. If you have neither, download activator or sbt (it doesn’t matter which) and install it. Then start it in your project’s directory:

$ sbt
[info] Loading project definition from /Users/jroper/sbt-fun/project
[info] Updating {file:/Users/jroper/sbt-fun/project/}sbt-fun-build...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] Set current project to sbt-fun (in build file:/Users/jroper/sbt-fun/)
> 

So, now we’re on the sbt console. Earlier we were talking about the sources task. Let’s have a look at it. sbt has a command called inspect, which lets you inspect a task:

> inspect sources
[info] Task: scala.collection.Seq[java.io.File]
[info] Description:
[info]  All sources, both managed and unmanaged.
[info] Provided by:
[info]  {file:/Users/jroper/sbt-fun/}sbt-fun/compile:sources
[info] Defined at:
[info]  (sbt.Defaults) Defaults.scala:188
[info] Dependencies:
[info]  compile:unmanagedSources
[info]  compile:managedSources
[info] Delegates:
[info]  compile:sources
[info]  *:sources
[info]  {.}/compile:sources
[info]  {.}/*:sources
[info]  */compile:sources
[info]  */*:sources
[info] Related:
[info]  test:sources

What are we looking at? First, we can see that sources is a task that produces a sequence of files, as I said before. We can also see a description of the task, All sources, both managed and unmanaged. The Defined at section is interesting: it shows us where the sources task is defined, in this case on line 188 of the sbt Defaults class. We can see that it has two tasks that it depends on, unmanagedSources and managedSources. The rest of the information we won’t worry about for now.

Now before we start playing with our build, we can actually get even more information here. Not only is it possible to inspect a single task in sbt, you can also inspect a whole tree of tasks, using the inspect tree command:

> inspect tree sources
[info] compile:sources = Task[scala.collection.Seq[java.io.File]]
[info]   +-compile:unmanagedSources = Task[scala.collection.Seq[java.io.File]]
[info]   | +-*/*:sourcesInBase = true
[info]   | +-*/*:excludeFilter = sbt.HiddenFileFilter$@5a63fa71
[info]   | +-*:baseDirectory = /Users/jroper/sbt-fun
[info]   | +-*/*:unmanagedSources::includeFilter = sbt.SimpleFilter@44a44a04
[info]   | +-compile:unmanagedSourceDirectories = List(/Users/jroper/sbt-fun/src/main/scala, /Users/jroper/sbt-fun/sr..
[info]   |   +-compile:javaSource = src/main/java
[info]   |   | +-compile:sourceDirectory = src/main
[info]   |   |   +-*:sourceDirectory = src
[info]   |   |   | +-*:baseDirectory = /Users/jroper/sbt-fun
[info]   |   |   |   +-*:thisProject = Project(id sbt-fun, base: /Users/jroper/sbt-fun, configurations: List(compile,..
[info]   |   |   |   
[info]   |   |   +-compile:configuration = compile
[info]   |   |   
[info]   |   +-compile:scalaSource = src/main/scala
[info]   |     +-compile:sourceDirectory = src/main
[info]   |       +-*:sourceDirectory = src
[info]   |       | +-*:baseDirectory = /Users/jroper/sbt-fun
[info]   |       |   +-*:thisProject = Project(id sbt-fun, base: /Users/jroper/sbt-fun, configurations: List(compile,..
[info]   |       |   
[info]   |       +-compile:configuration = compile
[info]   |       
[info]   +-compile:managedSources = Task[scala.collection.Seq[java.io.File]]
[info]     +-compile:sourceGenerators = List()
[info]     

So here you can see the chain from sources to managedSources to sourceGenerators that I mentioned before. You can also see the unmanagedSources chain, which is a lot more complex: it includes directory hierarchies, filters that decide which files to include and exclude, and so on.
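To make the "sbt is a task engine" idea concrete, here is a toy model in plain Scala. It is not sbt’s code: the `Node` class and `render` function are hypothetical, the graph only contains a subset of the nodes from the tree above, and the printer skips the vertical `|` connectors that sbt draws.

```scala
// A toy dependency graph, loosely mirroring part of `inspect tree sources`.
// Hypothetical names; this is not how sbt represents tasks internally.
object TaskTree {
  // A node is just a name plus the nodes it depends on.
  final case class Node(name: String, deps: List[Node] = Nil)

  // Renders the root at column 0 and each dependency indented beneath it
  // with a "+-" marker, roughly like sbt's output (without the "|" bars).
  def render(node: Node, depth: Int = 0): List[String] = {
    val prefix = if (depth == 0) "" else "  " * depth + "+-"
    (prefix + node.name) :: node.deps.flatMap(render(_, depth + 1))
  }

  val sourceGenerators = Node("compile:sourceGenerators")
  val managedSources   = Node("compile:managedSources", List(sourceGenerators))
  val unmanagedSources = Node(
    "compile:unmanagedSources",
    List(Node("*/*:excludeFilter"), Node("compile:unmanagedSourceDirectories"))
  )
  val sources = Node("compile:sources", List(unmanagedSources, managedSources))

  def main(args: Array[String]): Unit = render(sources).foreach(println)
}
```

Each task only knows about its direct dependencies; the tree you see from inspect tree falls out of following those edges recursively.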

§Settings vs Tasks

At this point you may notice that there are two types of tasks in the tree. Some, like managedSources, just describe the type of the task:

compile:managedSources = Task[scala.collection.Seq[java.io.File]]

And then there are things like scalaSource, which actually display a value:

compile:scalaSource = src/main/scala

This is actually an sbt optimisation: sbt has a special type of task called a Setting. Settings get executed once per session. When you start sbt up, you start a new session, and all the settings get executed then. This is why, when I inspect the tree, sbt can show me the value of a setting: it already knows it. In contrast, an ordinary Task gets executed once per execution. So if I now run the sources task, the managedSources task will be executed as part of that. If I run sources again, managedSources will be executed again. But my settings only get executed once for the whole session.

It should be noted that an execution is a single request by the user to execute a task. If two tasks in the tree both depend on the sources task, sbt will ensure that the sources task only gets executed once. For example, if I run the publish task, it transitively depends on the compile task, the doc task (which generates Java/Scala docs) and the packageSrc task (which generates source jars). All three depend on the same sources task, which will only be executed once during my publish execution, and its value will be reused as the input for all three tasks.
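The difference between once-per-session settings and once-per-execution tasks can be modelled in a few lines of plain Scala. This is a simplified sketch, not sbt’s implementation; the `Session` class, its `execute` method and the string results are all hypothetical. Settings are plain `val`s computed when the session is created, while task results are memoised in a map that only lives for a single execution.

```scala
import scala.collection.mutable

// Toy model of settings vs tasks. Hypothetical names, not sbt's API.
final class Session {
  // How many times each task body has actually run, across the whole session.
  val taskRuns: mutable.Map[String, Int] =
    mutable.Map.empty[String, Int].withDefaultValue(0)

  // "Settings": evaluated once, when the session is created.
  val baseDirectory: String = "sbt-fun"
  val scalaSource: String = baseDirectory + "/src/main/scala"

  // One user request = one execution. The cache is local to this call,
  // so a task shared by several dependents runs once per execution.
  def execute(root: String): String = {
    val cache = mutable.Map.empty[String, String]

    def task(name: String): String = cache.get(name) match {
      case Some(result) => result // already ran in this execution: reuse it
      case None =>
        taskRuns(name) += 1
        val result = name match {
          case "sources"    => scalaSource + "/Main.scala"
          case "compile"    => "classes(" + task("sources") + ")"
          case "doc"        => "docs(" + task("sources") + ")"
          case "packageSrc" => "srcJar(" + task("sources") + ")"
          case "publish"    => List("compile", "doc", "packageSrc").map(task).mkString(", ")
          case other        => sys.error("unknown task: " + other)
        }
        cache(name) = result
        result
    }

    task(root)
  }
}
```

Running `execute("publish")` runs sources once even though three tasks depend on it; running publish a second time runs sources again, because it is a new execution.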

Now naturally, since settings are executed at the start of the session, and not as part of an execution, they can’t depend on tasks, they can only depend on other settings. Meanwhile, tasks can depend on both other tasks and settings.

The difference between settings and tasks matters most when you’re writing your own sbt plugins that define their own settings and tasks. In general, though, you can consider them to be the same thing; settings are just a small optimisation so that they don’t have to be executed every time. When defining your own tasks or settings, a good rule of thumb is: if in doubt, define a task.
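As a sketch of how this looks in a build file, here is a build.sbt fragment in sbt 0.13 syntax. The keys `greeting` and `sourceCount` are hypothetical examples; `name`, `version`, `streams` and `sources` are standard sbt keys. It shows a setting that depends on settings, a task that depends on both, and (commented out) a setting that tries to depend on a task, which sbt should reject when the build loads.

```scala
// build.sbt (sbt 0.13 syntax). `greeting` and `sourceCount` are hypothetical keys.
val greeting = settingKey[String]("Computed once when the build is loaded")
val sourceCount = taskKey[Int]("Recomputed every time it is requested")

// A setting may depend on other settings...
greeting := s"Hello from ${name.value} ${version.value}"

// ...and a task may depend on settings as well as on other tasks.
sourceCount := {
  streams.value.log.info(greeting.value)
  (sources in Compile).value.size
}

// Not allowed: a setting cannot depend on a task.
// greeting := (sources in Compile).value.size.toString
```

Running `show sourceCount` twice in the sbt console recomputes the count each time, while `greeting` keeps the value it was given when the build was loaded.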

§Scopes

Scopes are another important feature of the sbt task engine. A task can be scoped, and when a task depends on another task, it can depend on that task in a particular scope. One obvious type of scope that sbt supports is the configuration scope. sbt has a few built-in configurations; the two main ones that you’ll interact with are compile and test. So above, when the sources task depends on managedSources, you can see that it actually depends on compile:managedSources, which means it depends on managedSources in the compile scope.

In fact, you can see at the top that we are looking at the tree for compile:sources. When you don’t specify a scope, sbt chooses a default one, in this case the compile scope. We won’t cover the logic behind that decision here. We could also inspect the test:sources tree:

> inspect tree test:sources
[info] test:sources = Task[scala.collection.Seq[java.io.File]]
[info]   +-test:unmanagedSources = Task[scala.collection.Seq[java.io.File]]
[info]   | +-test:unmanagedSourceDirectories = List(/Users/jroper/sbt-fun/src/test/scala, /Users/jroper/sbt-fun/src/t..
[info]   | | +-test:javaSource = src/test/java
[info]   | | | +-test:sourceDirectory = src/test
[info]   | | |   +-*:sourceDirectory = src
[info]   | | |   | +-*:baseDirectory = /Users/jroper/sbt-fun
[info]   | | |   |   +-*:thisProject = Project(id sbt-fun, base: /Users/jroper/sbt-fun, configurations: List(compile,..
[info]   | | |   |   
[info]   | | |   +-test:configuration = test
[info]   | | |   
[info]   | | +-test:scalaSource = src/test/scala
[info]   | |   +-test:sourceDirectory = src/test
[info]   | |     +-*:sourceDirectory = src
[info]   | |     | +-*:baseDirectory = /Users/jroper/sbt-fun
[info]   | |     |   +-*:thisProject = Project(id sbt-fun, base: /Users/jroper/sbt-fun, configurations: List(compile,..
[info]   | |     |   
[info]   | |     +-test:configuration = test
[info]   | |     
[info]   | +-*/*:unmanagedSources::includeFilter = sbt.SimpleFilter@44a44a04
[info]   | +-*/*:excludeFilter = sbt.HiddenFileFilter$@5a63fa71
[info]   | 
[info]   +-test:managedSources = Task[scala.collection.Seq[java.io.File]]
[info]     +-test:sourceGenerators = List()
[info]     

It looks pretty similar to the compile:sources tree, except that it depends on test-scoped settings. In some cases you can see that the scope is *, which means that it depends on a task or setting that isn’t scoped on that axis.
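In a build.sbt file (sbt 0.13 syntax), the configuration axis is written with `in`, so `sources in Test` is what the console shows as test:sources. A minimal sketch, where `printSources` is a hypothetical key and the alternative test directory is just an example value:

```scala
// build.sbt (sbt 0.13 syntax). `printSources` is a hypothetical key.
val printSources = taskKey[Unit]("Logs the compile and test sources")

printSources := {
  val log = streams.value.log
  // The same `sources` key read in two configuration scopes:
  // compile:sources and test:sources on the console.
  (sources in Compile).value.foreach(f => log.info("compile: " + f))
  (sources in Test).value.foreach(f => log.info("test: " + f))
}

// Overrides test:scalaSource only; compile:scalaSource keeps its default.
scalaSource in Test := baseDirectory.value / "test"
```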

Configuration is not the only axis that you can scope tasks on in sbt. sbt supports two other axes: project and task.

The project axis scopes a task to an sbt project. An sbt build can have multiple projects, and each project can have its own set of settings. When you define tasks on a project, sbt will automatically scope those tasks, and the dependencies of those tasks, to that project, unless you have already explicitly scoped them to a project yourself. Tasks scoped to one project can also depend on tasks in another project. For example, you could make the packageSrc task in one project depend on the sources of all the other projects, bringing all your sources together into one source jar.

The syntax for scoping something by project on the sbt command line is the project name, followed by a slash, then the task. For example, sbt-fun/compile:sources is the sources task in the compile scope from the sbt-fun project. You can see from the output of the plain inspect command, in the Provided by section, that the full task is {file:/Users/jroper/sbt-fun/}sbt-fun/compile:sources: the path of the build, followed by the project name, configuration and task. Sometimes tasks and settings are scoped to be global or to the entire build. You can see some global settings above; they are prefixed with */, so */*:excludeFilter is the excludeFilter setting with no project scope and no configuration scope.
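Here is a sketch of a cross-project dependency in build.sbt (sbt 0.13 syntax). It assumes a hypothetical build with two subprojects, `core` and `util`, and a hypothetical `allSources` key on the root project. It is a simpler cousin of the packageSrc idea above: it only collects the files rather than packaging them into a jar.

```scala
// build.sbt (sbt 0.13 syntax). `core`, `util` and `allSources` are hypothetical.
lazy val core = project
lazy val util = project

val allSources = taskKey[Seq[File]]("Compile sources of every subproject")

lazy val root = (project in file("."))
  .aggregate(core, util)
  .settings(
    // Depends on tasks scoped to other projects and to the compile configuration;
    // on the console these are core/compile:sources and util/compile:sources.
    allSources := (sources in (core, Compile)).value ++ (sources in (util, Compile)).value
  )
```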

The final axis is scoping by another task. This is incredibly useful, as we’ll see when we get to scope fallbacks: it means that the same key can be used by many tasks and configured separately for each of them. In the above tree we can see that unmanagedSources depends on includeFilter scoped to the unmanagedSources task; the syntax for this is unmanagedSources::includeFilter. includeFilter may also be used elsewhere, for example in discovering resources, in which case it will be scoped to the unmanagedResources task.
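In build.sbt (sbt 0.13 syntax), the task axis is also written with `in`, this time naming a task key. A minimal sketch; the patterns are example values only, and narrowing the sources filter like this would, for instance, stop .java files from being picked up as unmanaged sources.

```scala
// build.sbt (sbt 0.13 syntax). The glob patterns are example values.
// The same includeFilter key, configured separately for two tasks:
// unmanagedSources::includeFilter ...
includeFilter in unmanagedSources := "*.scala"
// ... and unmanagedResources::includeFilter.
includeFilter in unmanagedResources := "*.conf"
```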

§Scope fallbacks

Scopes work in a hierarchical fashion, allowing fallbacks through the hierarchy when a task can’t be found at a specific scope. I mentioned above that unmanagedSources depends on unmanagedSources::includeFilter. Let’s take a closer look by inspecting unmanagedSources:

> inspect unmanagedSources
[info] Task: scala.collection.Seq[java.io.File]
[info] Description:
[info]  Unmanaged sources, which are manually created.
[info] Provided by:
[info]  {file:/Users/jroper/sbt-fun/}sbt-fun/compile:unmanagedSources
[info] Defined at:
[info]  (sbt.Defaults) Defaults.scala:182
[info]  (sbt.Defaults) Defaults.scala:209
[info] Dependencies:
[info]  compile:baseDirectory
[info]  compile:unmanagedSourceDirectories
[info]  compile:unmanagedSources::includeFilter
[info]  compile:unmanagedSources::excludeFilter
...

So we can see that compile:unmanagedSources depends on compile:unmanagedSources::includeFilter and compile:unmanagedSources::excludeFilter. But if we look at the output of the inspect tree command, we’ll notice a discrepancy:

> inspect tree unmanagedSources
[info] compile:unmanagedSources = Task[scala.collection.Seq[java.io.File]]
[info]   +-*/*:sourcesInBase = true
[info]   +-*/*:excludeFilter = sbt.HiddenFileFilter$@5a63fa71
[info]   +-*:baseDirectory = /Users/jroper/sbt-fun
[info]   +-*/*:unmanagedSources::includeFilter = sbt.SimpleFilter@44a44a04
...

So, while it depended on sbt-fun/compile:unmanagedSources::includeFilter, it actually got */*:unmanagedSources::includeFilter. That is, it requested a task at a specific project and configuration, but got a task that was defined for no project or configuration. Furthermore, the excludeFilter, which was requested in the same way, was satisfied by */*:excludeFilter, which isn’t even scoped to the unmanagedSources task. This demonstrates how sbt uses fallbacks. When a task declares a dependency, sbt will try to satisfy that dependency with the most specific task it has for it, and if no task is defined at that specific scope, it will fall back to a less specific scope.
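The fallback can be modelled as searching an ordered list of candidate scopes. This is a simplified sketch in plain Scala, not sbt’s implementation: the `Scope` and `Delegation` names are hypothetical, "*" stands for "not scoped on this axis" and "{.}" for the whole build, and it ignores configuration inheritance (such as test extending compile). The order follows the Delegates section of the inspect output earlier: the project axis varies slowest, then the configuration axis, then the task axis.

```scala
// Toy model of scope fallback. Hypothetical names, not sbt's API.
final case class Scope(project: String, config: String, task: String)

object Delegation {
  val Unscoped = "*"

  // Candidate scopes, most specific first: this project, then the whole
  // build "{.}", then global; within each, the requested configuration
  // then "*"; within each of those, the requested task then "*".
  def delegates(s: Scope): List[Scope] =
    for {
      p <- List(s.project, "{.}", Unscoped).distinct
      c <- List(s.config, Unscoped).distinct
      t <- List(s.task, Unscoped).distinct
    } yield Scope(p, c, t)

  // Returns the first candidate scope at which `key` has been defined.
  def resolve(key: String, requested: Scope, defined: Set[(Scope, String)]): Option[Scope] =
    delegates(requested).find(scope => defined((scope, key)))
}
```

With only the two global definitions from the tree above, this reproduces the behaviour described in the text:

```scala
val defined = Set(
  (Scope("*", "*", "unmanagedSources"), "includeFilter"),
  (Scope("*", "*", "*"), "excludeFilter")
)
val requested = Scope("sbt-fun", "compile", "unmanagedSources")
Delegation.resolve("includeFilter", requested, defined) // Some(Scope(*,*,unmanagedSources))
Delegation.resolve("excludeFilter", requested, defined) // Some(Scope(*,*,*))
```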

What this means for excludeFilter, for example, is that if you have a text editor that generates temporary files with a particular name pattern, you can exclude those files by adding the pattern to the global excludeFilter; you don’t need to define an excludeFilter for every single scope. But I might also decide that I want to exclude certain files in the test scope, so I can configure a different excludeFilter for tests by scoping it to test. Or I might decide that I want a different filter again just for unmanagedSources, as opposed to unmanagedResources, so I can define the excludeFilter specifically for that task. The general approach that sbt takes in its predefined task dependency trees is to depend on tasks at a very specific scope, but define them at the most general scope that makes sense. This allows tasks to be overridden in a blanket fashion, or at a very fine-grained level when required.
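Those three levels might look like this in build.sbt (sbt 0.13 syntax). The file patterns are example values; `HiddenFileFilter` is the default you can see in the tree above. Since each definition replaces, rather than extends, the one it falls back to, the more specific filters repeat the general patterns.

```scala
// build.sbt (sbt 0.13 syntax). File patterns are example values.

// Most general: used by every project, configuration and task
// unless a more specific scope defines its own excludeFilter.
excludeFilter in Global := HiddenFileFilter || "*~"

// Test configuration only (of this project): also skip *.tmp files.
excludeFilter in Test := HiddenFileFilter || "*~" || "*.tmp"

// Only compile:unmanagedSources; unmanagedResources still falls back.
excludeFilter in (Compile, unmanagedSources) := HiddenFileFilter || "*~" || "*.scratch"
```

Running inspect tree on test:unmanagedSources afterwards should show which of these definitions each dependency was resolved to.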

§Parallel by default

There is one last feature of the sbt task engine that I think is worth mentioning in this post. You don’t need to understand it well in order to use sbt, but it is a very powerful one that sbt’s architecture makes very simple: in sbt, all tasks are executed in parallel by default. Of course, if a task declares a dependency on another task, those two tasks can’t run in parallel. But two tasks that have no dependency on each other, such as unmanagedSources and managedSources, can and will be executed in parallel. Given sbt’s fine-grained tasks, this makes for some considerable performance improvements out of the box compared to other build tools (and they are much needed, given the speed of Scala compilation).
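The principle can be sketched with plain Scala `Future`s. This is a toy model, not sbt’s scheduler, and the file names are hypothetical. The two independent tasks are started together, and the task that depends on both only combines their results once both have completed.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Toy model of parallel task execution; not sbt's scheduler.
object ParallelModel {
  // Two tasks with no dependency on each other (hypothetical file names).
  def unmanagedSources: Future[Seq[String]] = Future(Seq("src/main/scala/Main.scala"))
  def managedSources: Future[Seq[String]] = Future(Seq("target/src_managed/BuildInfo.scala"))

  // `sources` depends on both. Both futures are created (and so started)
  // before we wait on either, which lets them run concurrently.
  def sources: Future[Seq[String]] = {
    val u = unmanagedSources
    val m = managedSources
    for (us <- u; ms <- m) yield us ++ ms
  }

  def main(args: Array[String]): Unit =
    println(Await.result(sources, 5.seconds))
}
```

Creating both futures before the for-comprehension matters: writing `unmanagedSources.flatMap(us => managedSources.map(...))` would only start the second task after the first had finished.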

sbt’s concurrent execution is also configurable: tasks can be tagged, and you can then define, for example, the maximum number of tasks with a given tag that can run in parallel. You can read more about these capabilities here.
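For example, restrictions are added to the `concurrentRestrictions` setting in build.sbt (sbt 0.13 syntax). The limits below are example values.

```scala
// build.sbt (sbt 0.13 syntax). The limits are example values.
concurrentRestrictions in Global ++= Seq(
  // At most one task tagged as a test task at a time,
  // e.g. when tests share an external database.
  Tags.limit(Tags.Test, 1),
  // At most four tasks of any kind in parallel.
  Tags.limitAll(4)
)
```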

§Conclusion

In this blog post we have seen that sbt is actually a task engine, and that breaking tasks up into many smaller interdependent tasks gives you a lot of power and flexibility. We have seen that the sbt console can be used to inspect tasks, their dependencies, and entire dependency graphs of tasks, which allows us to learn about sbt, discover the tasks that are available, and see how our build fits together. We have learned how tasks can be scoped to different configurations, projects, and other tasks, and how sbt uses a fallback system to resolve dependencies at specific scopes. Hopefully sbt is now more transparent to you. You no longer need a spellbook to know how to configure it; instead, you can use the inspect commands to discover what you can configure yourself.

We have not seen anything about how to define or redefine tasks, or the syntax of the sbt build file. This is the topic of my next blog post, sbt - A declarative DSL.

Introducing ERQX

Today I migrated my blog to a new blogging engine that I’ve written called ERQX. To start off with, why did I write my own blog engine? A case of not-invented-here syndrome? Or do I just really like writing blog engines (I was, and technically still am, the lead developer of Pebble, the blog engine that I used to use)?

I was very close to migrating to a Jekyll blog hosted on GitHub, but there are a few reasons why I didn’t do this:

  • As a full-time maintainer of Play, I don’t get a lot of opportunities to use Play as an end user. This is bad: how can I be expected to guide Play forward if I don’t feel the pain points as an end user? Hence, I jump at every opportunity I can to write new apps in it, and what better use case is there than my own blog?
  • I really like the setup we have for the documentation on the Play website. We have implemented some custom markdown extensions that allow extracting code snippets from compiled and tested source files, and all documentation is served directly out of git, which turns out to be a great way to deploy and distribute content.
  • I wanted to see how easy it would be to make a fully reusable and skinnable application within Play.
  • Because I love Play!

§Features

So what are the features of ERQX? Here are a few:

§Embeddable

The blog engine is completely embeddable. All you need to do is add a single line to your routes file to include the blog router, and some configuration in application.conf pointing to a git repository, and you’re good to go.

Not convinced? Here is everything you need to do to include a blog in your existing Play application.

  1. Add a dependency to your build.sbt file:

    resolvers += "ERQX Releases" at "https://jroper.github.io/releases"
    
    libraryDependencies += "au.id.jazzy.erqx" %% "erqx-engine" % "1.0.0"
    
  2. Add the blog router to your routes file:

    ->  /blog       au.id.jazzy.erqx.engine.controllers.BlogsRouter
    
  3. Add some configuration pointing to the git repo for your blog:

    blogs {
      default {
        gitConfig {
          gitRepo = "/path/to/some/repo"
          remote = "origin"
          fetchKey = "somesecret"
        }
      }
    }
    

And there you have it!

§Git backend

In future I hope to add other backends (I think a prismic.io backend would be really cool), but for now it only supports a git backend. The layout of the git repo is somewhat inspired by Jekyll: blog posts go in a folder named _posts, with the date and title in each file name, and each blog post has front matter in YAML format. Blog posts can be in either markdown or HTML format. There is also a _config.yml file which contains configuration for the blog, such as the title, description and a few other things.

Changes are deployed to the blog either by polling, or by registering a commit hook on GitHub. In the example above, the URL for the webhook would be http://example.com/blog/fetch/somesecret. Using commit hooks, blog posts are published within seconds of pushing to GitHub. ERQX also takes advantage of the git hash, serving it as the ETag for all content, which allows caching of the blog and its associated resources.
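Why a commit hash makes a good ETag can be shown with a generic sketch of HTTP conditional requests in plain Scala. It is not ERQX’s code; `ETagSketch`, `Response` and `respond` are hypothetical, and it only handles a single, exact If-None-Match value. Because the content can only change when a new commit is fetched, the hash identifies the content, and a client that presents the current hash can be answered with 304 Not Modified without rendering anything.

```scala
// Generic ETag revalidation keyed on a commit hash. Hypothetical names;
// not ERQX's implementation. Ignores weak and multi-value If-None-Match.
object ETagSketch {
  final case class Response(status: Int, etag: String, body: Option[String])

  // `render` is by-name, so the page is only rendered when it is needed.
  def respond(commitHash: String, ifNoneMatch: Option[String], render: => String): Response = {
    val etag = "\"" + commitHash + "\"" // ETag values are quoted strings
    if (ifNoneMatch.contains(etag))
      Response(304, etag, None)         // client's cached copy is still current
    else
      Response(200, etag, Some(render)) // new content, or nothing cached yet
  }
}
```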

§Markdown

Blog posts can be written in markdown, which is rendered using the Play documentation renderer, so code samples can be pulled out of compiled and tested source files. This is invaluable if you write technical blog posts full of code and you want to ensure that the code in the blog post works.

§Themeable

The blog is completely themeable, allowing you to simply override the header and footer to plug in different stylesheets, or to use your own templates entirely to render blog posts.

The default theme uses a pure CSS responsive layout, switching to rendering the description of the blog in a slideout tab on mobile devices, and provides support for comments via Disqus.

§Multi blog support

ERQX allows serving multiple blogs from the one server. Each may have its own theme.

§Source code and examples

ERQX and its associated documentation can be found on GitHub.

The website for this blog, showing how the blog can be embedded in a real application, plus the content of the blog itself, can also be found on GitHub. The website is in the master branch, while the blog content is in the allthatjazz branch.

About

Hi! My name is James Roper, and I am a software developer with a particular interest in open source development and trying new things. I program in Scala, Java, Go, PHP, Python and JavaScript, and I work for Lightbend as the architect of Kalix. I also have a full life outside the world of IT, enjoy playing a variety of musical instruments and sports, and currently I live in Canberra.