all that jazz

james' blog about scala and all that jazz

sbt - A task engine

sbt is the best build tool that I’ve used. But it’s also the build tool with the steepest learning curve that I’ve ever used, and I think most people would agree that it’s very difficult to learn. When you first start using it, configuring it is like casting spells: spells that have to be learned from a spell book and said in exactly the right way, otherwise they don’t work. There are lots of guides out there that are essentially spell books; they teach you all the things you need to know to achieve various tasks. But I haven’t seen a lot out there that actually explains what sbt is, what it does, and why it is the way it is. This blog post is my attempt to do that.

§A task engine

Simply put, sbt is a task engine. You have tasks. A task may be dependent on other tasks. Any task from any point in the build may be redefined, and new tasks can be easily added. In some ways it is a bit like make or ant, but it differs in a fundamental way: sbt tasks produce an output value, and are able to consume the output values of the tasks they depend on, whereas make and ant just modify the file system. This property of sbt allows you to break build steps up into very fine grained tasks.

So let’s take an example. A common step that build tools support is compilation. In many traditional build tools, a compilation task is responsible for finding a set of files to compile based on some input parameters, such as a list of source directories, and compiling them. In sbt, the compile task is not responsible for finding a set of files to compile; this is the responsibility of the sources task. The output value of the sources task is a list of files to compile. The compile task depends on the sources task, taking its list of files to compile as an input.
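To make the idea concrete, here is a minimal sketch in plain Scala. It is not sbt's API or implementation: the Task class, the file names, and the pretend "compilation" (which only renames files) are all assumptions made up for illustration. What it does show is the shape of the idea: compile never looks for files itself, it only consumes whatever value the sources task produces, so swapping in a different sources task changes what gets compiled.

```scala
object TaskSketch {
  // Hypothetical model: a task is a named computation that produces a value.
  final case class Task[A](name: String, run: () => A)

  // Stand-in for the sources task; its output value is a list of file names.
  val defaultSources: Task[Seq[String]] =
    Task("sources", () => Seq("Main.scala", "Util.scala", "Draft.scala"))

  // Stand-in for compile: it takes the sources task as an input and
  // "compiles" each file by renaming it. It does no file discovery of its own.
  def compile(sources: Task[Seq[String]]): Task[Seq[String]] =
    Task("compile", () => sources.run().map(_.stripSuffix(".scala") + ".class"))

  // A redefined sources task that builds on the previous definition and
  // leaves out files we have decided should not be compiled.
  val filteredSources: Task[Seq[String]] =
    Task("sources", () => defaultSources.run().filterNot(_.startsWith("Draft")))

  def main(args: Array[String]): Unit = {
    println(compile(defaultSources).run())
    println(compile(filteredSources).run())
  }
}
```

Note that compile did not have to change at all to pick up the new behaviour; only its input did.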

So what’s so good about this? What it means is that I can completely customise the way sources are located, by redefining the sources task. So if I have a crazy build requirement such as wanting to put a special annotation in my source files to say whether they get compiled or not, I can very easily implement my own sources task to do that. This is something that would be very difficult to do in another build tool, but in sbt it’s straightforward.

In other build tools, if I want to generate some sources, I have to make sure that the task to generate the sources runs before the compilation task, and puts them in a place the compilation task will find. In sbt, I can just redefine the sources task to make it generate the sources. In practice though, I don’t need to do that, because generating sources is a very common requirement. Remember that I said that sbt tasks can be very fine grained. The sources task itself depends on other tasks. One of them is the managedSources task, which collects all the files that are managed (or generated) by the build (in contrast to unmanaged sources, which are your regular source files that you manage yourself). That task in turn depends on the sourceGenerators task, which I can redefine to add new source generators.

§A self documenting task engine

At this point you might be starting to see that there are many, many tasks involved in even the simplest build in sbt. I’ve talked about just one small part, how generated sources end up being compiled, but there are many more than that. How is someone that is new to sbt supposed to know what tasks exist, so that they can customise their build? Well, it turns out sbt comes with a few built in tools for inspecting the available tasks. These are often seen as advanced features of sbt, but I think really this is what new users to sbt should be introduced to first. So if you’re new, it’s time to fire up sbt.

First we need a simple project. In an empty directory, create a file called build.sbt, and set your project’s name:

name := "sbt-fun"

Now, if you already have sbt 0.13 or later installed, you can use that. If you already have activator installed, which is basically just a script that launches sbt, then you can use that. If you have neither, then download and install activator or sbt (it doesn’t matter which), and then start it in your project’s directory:

$ sbt
[info] Loading project definition from /Users/jroper/sbt-fun/project
[info] Updating {file:/Users/jroper/sbt-fun/project/}sbt-fun-build...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] Set current project to sbt-fun (in build file:/Users/jroper/sbt-fun/)
> 

So, now we’re on the sbt console. Earlier we were talking about the sources task. Let’s have a look at it. sbt has a command called inspect, which lets you inspect a task:

> inspect sources
[info] Task: scala.collection.Seq[java.io.File]
[info] Description:
[info]  All sources, both managed and unmanaged.
[info] Provided by:
[info]  {file:/Users/jroper/sbt-fun/}sbt-fun/compile:sources
[info] Defined at:
[info]  (sbt.Defaults) Defaults.scala:188
[info] Dependencies:
[info]  compile:unmanagedSources
[info]  compile:managedSources
[info] Delegates:
[info]  compile:sources
[info]  *:sources
[info]  {.}/compile:sources
[info]  {.}/*:sources
[info]  */compile:sources
[info]  */*:sources
[info] Related:
[info]  test:sources

What are we looking at? First, we can see that sources is a task that produces a sequence of files, as I said before. We can also see a description of the task: “All sources, both managed and unmanaged.” The Defined at section is interesting: it shows us where the sources task is defined, in this case on line 188 of the sbt Defaults class. We can see that it has two tasks that it depends on, unmanagedSources and managedSources. The rest of the information we won’t worry about for now.

Now before we start playing with our build, we can actually get even more information here. Not only is it possible to inspect a single task in sbt, you can also inspect a whole tree of tasks, using the inspect tree command:

> inspect tree sources
[info] compile:sources = Task[scala.collection.Seq[java.io.File]]
[info]   +-compile:unmanagedSources = Task[scala.collection.Seq[java.io.File]]
[info]   | +-*/*:sourcesInBase = true
[info]   | +-*/*:excludeFilter = sbt.HiddenFileFilter$@5a63fa71
[info]   | +-*:baseDirectory = /Users/jroper/sbt-fun
[info]   | +-*/*:unmanagedSources::includeFilter = sbt.SimpleFilter@44a44a04
[info]   | +-compile:unmanagedSourceDirectories = List(/Users/jroper/sbt-fun/src/main/scala, /Users/jroper/sbt-fun/sr..
[info]   |   +-compile:javaSource = src/main/java
[info]   |   | +-compile:sourceDirectory = src/main
[info]   |   |   +-*:sourceDirectory = src
[info]   |   |   | +-*:baseDirectory = /Users/jroper/sbt-fun
[info]   |   |   |   +-*:thisProject = Project(id sbt-fun, base: /Users/jroper/sbt-fun, configurations: List(compile,..
[info]   |   |   |   
[info]   |   |   +-compile:configuration = compile
[info]   |   |   
[info]   |   +-compile:scalaSource = src/main/scala
[info]   |     +-compile:sourceDirectory = src/main
[info]   |       +-*:sourceDirectory = src
[info]   |       | +-*:baseDirectory = /Users/jroper/sbt-fun
[info]   |       |   +-*:thisProject = Project(id sbt-fun, base: /Users/jroper/sbt-fun, configurations: List(compile,..
[info]   |       |   
[info]   |       +-compile:configuration = compile
[info]   |       
[info]   +-compile:managedSources = Task[scala.collection.Seq[java.io.File]]
[info]     +-compile:sourceGenerators = List()
[info]     

So in here you can see the sources to managedSources to sourceGenerators chain that I mentioned before. You can also see the unmanagedSources chain, which is a lot more complex: we can see directory hierarchies, filters for deciding which files to include and exclude, etc.

§Settings vs Tasks

At this point you may notice that there are two types of tasks in the tree. There are things like managedSources, which just describe the type of the task:

compile:managedSources = Task[scala.collection.Seq[java.io.File]]

And then there are things like scalaSource, which actually display a value:

compile:scalaSource = src/main/scala

This is actually an sbt optimisation: sbt has a special type of task called a Setting. Settings get executed once per session, so when you start sbt up, you start a new session, and all the settings get executed then. This is why when I inspect the tree, sbt can show me the value, because it already knows it. In contrast, an ordinary Task gets executed once per execution. So if I now run the sources task, that managedSources task will be executed then. If I run sources again, it will be executed again. But my settings only get executed once for the whole session.

It should be noted that an execution is a request by the user to execute a task. If two tasks in my tree both depend on the sources task, sbt will ensure that the sources task only gets executed once. So if I run the publish task, which transitively depends on the compile task, as well as the doc task (that generates java/scala docs), and the packageSrc task (that generates source jars), these all depend on the same sources task, which will only be executed once during my publish execution, and the value will be reused as the input for all three tasks.
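The difference between "once per session" and "once per execution" can be sketched in plain Scala. Again, this is only a model under my own assumptions, not how sbt is implemented: the Execution class and its once method are hypothetical, and the task bodies just build strings. A setting is modelled as a value computed when the object is initialised, while task results are cached only for the lifetime of one Execution.

```scala
import scala.collection.mutable

object ExecutionSketch {
  // Counters so we can observe how often each body actually runs.
  var settingRuns = 0
  var sourcesRuns = 0

  // A "setting": evaluated once, when the session (this object) is initialised.
  val scalaSource: String = { settingRuns += 1; "src/main/scala" }

  // One Execution models one user request. Task results are cached per Execution.
  final class Execution {
    private val cache = mutable.Map.empty[String, Any]

    def once[A](name: String)(body: => A): A =
      cache.get(name) match {
        case Some(value) => value.asInstanceOf[A]
        case None =>
          val value = body
          cache.update(name, value)
          value
      }
  }

  def sources(e: Execution): Seq[String] = e.once("sources") {
    sourcesRuns += 1
    Seq(s"$scalaSource/Main.scala")
  }

  // Three tasks that all depend on the same sources task.
  def compile(e: Execution): String =
    e.once("compile")(s"compiled ${sources(e).size} file(s)")
  def doc(e: Execution): String =
    e.once("doc")(s"documented ${sources(e).size} file(s)")
  def packageSrc(e: Execution): String =
    e.once("packageSrc")(s"packaged ${sources(e).size} file(s)")

  def publish(e: Execution): Seq[String] = Seq(compile(e), doc(e), packageSrc(e))

  def main(args: Array[String]): Unit = {
    publish(new Execution) // first user request
    publish(new Execution) // second user request
    println(s"sources ran $sourcesRuns time(s), scalaSource was evaluated $settingRuns time(s)")
  }
}
```

Within a single publish, sources runs once even though three tasks ask for it; across two publish requests it runs twice, while the setting is still only evaluated once.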

Now naturally, since settings are executed at the start of the session, and not as part of an execution, they can’t depend on tasks, they can only depend on other settings. Meanwhile, tasks can depend on both other tasks and settings.

The difference between settings and tasks becomes important when you’re writing your own sbt plugins that define their own settings and tasks. But in general, you can consider them to be the same thing; settings are just a small optimisation so that they don’t have to be executed every time. When defining your own tasks or settings, a good rule of thumb is: if in doubt, just define a task.

§Scopes

Scopes are another important feature of the sbt task engine. A task can be scoped. When a task depends on another task, it can depend on that task in a particular scope. Now one obvious type of scope that sbt supports is the configuration scope. sbt has a few built in configurations; the two main ones that you’ll interact with are compile and test. So above, when the sources task depends on managedSources, you can see that it actually depends on compile:managedSources, which means it depends on managedSources in the compile scope.

In actual fact, you can see at the top that we are looking at the tree for compile:sources. When you don’t specify a scope, sbt will choose a default scope, in this case it has chosen the compile scope. The logic in how it makes that decision we won’t cover here. We could also inspect the test:sources tree:

> inspect tree test:sources
[info] test:sources = Task[scala.collection.Seq[java.io.File]]
[info]   +-test:unmanagedSources = Task[scala.collection.Seq[java.io.File]]
[info]   | +-test:unmanagedSourceDirectories = List(/Users/jroper/sbt-fun/src/test/scala, /Users/jroper/sbt-fun/src/t..
[info]   | | +-test:javaSource = src/test/java
[info]   | | | +-test:sourceDirectory = src/test
[info]   | | |   +-*:sourceDirectory = src
[info]   | | |   | +-*:baseDirectory = /Users/jroper/sbt-fun
[info]   | | |   |   +-*:thisProject = Project(id sbt-fun, base: /Users/jroper/sbt-fun, configurations: List(compile,..
[info]   | | |   |   
[info]   | | |   +-test:configuration = test
[info]   | | |   
[info]   | | +-test:scalaSource = src/test/scala
[info]   | |   +-test:sourceDirectory = src/test
[info]   | |     +-*:sourceDirectory = src
[info]   | |     | +-*:baseDirectory = /Users/jroper/sbt-fun
[info]   | |     |   +-*:thisProject = Project(id sbt-fun, base: /Users/jroper/sbt-fun, configurations: List(compile,..
[info]   | |     |   
[info]   | |     +-test:configuration = test
[info]   | |     
[info]   | +-*/*:unmanagedSources::includeFilter = sbt.SimpleFilter@44a44a04
[info]   | +-*/*:excludeFilter = sbt.HiddenFileFilter$@5a63fa71
[info]   | 
[info]   +-test:managedSources = Task[scala.collection.Seq[java.io.File]]
[info]     +-test:sourceGenerators = List()
[info]     

It looks pretty similar to the compile:sources tree, except that it depends on test scoped settings. In some cases, you can see that the scope is *, this means that it’s depending on an unscoped task/setting.

Configuration is not the only axis that you can scope tasks on in sbt. sbt supports two other axes: project and task.

The project axis is scoped by an sbt project. An sbt build can have multiple projects, and each project can have its own set of settings. When you define tasks on a project, sbt will automatically scope those tasks, and the dependencies of those tasks, to that project, unless you have already explicitly scoped them to a project yourself. Tasks scoped to one project can also depend on tasks in another project, so you could for example make the packageSrc task in one project depend on the sources for all the other projects, thus bringing all your sources together into one source jar.

The syntax for scoping something by project on the sbt command line is to prefix the task with the project name followed by a slash, then the task. For example sbt-fun/compile:sources is the sources task in the compile scope from the sbt-fun project. You can actually see from the output of the plain inspect command, in the Provided by section, that the full task is {file:/Users/jroper/sbt-fun/}sbt-fun/compile:sources. This is the path of the build, followed by the project name, configuration and task. Sometimes tasks and settings are scoped to be global or for the entire build. You can see some such settings above, prefixed with */, so */*:excludeFilter is the excludeFilter task, with no configuration scope, and no project scope.

The final axis is to be scoped by another task. Scoping by another task is incredibly useful, which we’ll see when we get to scope fallbacks, but what it means is that the same task key can be used and explicitly configured for many tasks. In the above tree we can see that unmanagedSources depends on includeFilter scoped to the unmanagedSources task, the syntax for this is unmanagedSources::includeFilter. includeFilter may also be used elsewhere, for example, in discovering resources, in that case it will be scoped to the unmanagedResources task.

§Scope fallbacks

Scopes work in a hierarchical fashion, allowing fallbacks through the hierarchy when tasks at a specific scope can’t be found. I mentioned above that unmanagedSources depends on unmanagedSources::includeFilter. Let’s have a closer look, by inspecting it:

> inspect unmanagedSources
[info] Task: scala.collection.Seq[java.io.File]
[info] Description:
[info]  Unmanaged sources, which are manually created.
[info] Provided by:
[info]  {file:/Users/jroper/sbt-fun/}sbt-fun/compile:unmanagedSources
[info] Defined at:
[info]  (sbt.Defaults) Defaults.scala:182
[info]  (sbt.Defaults) Defaults.scala:209
[info] Dependencies:
[info]  compile:baseDirectory
[info]  compile:unmanagedSourceDirectories
[info]  compile:unmanagedSources::includeFilter
[info]  compile:unmanagedSources::excludeFilter
...

So we can see that compile:unmanagedSources depends on compile:unmanagedSources::includeFilter and compile:unmanagedSources::excludeFilter. But if we have a look at the inspect tree command, we’ll notice a discrepancy:

> inspect tree unmanagedSources
[info] compile:unmanagedSources = Task[scala.collection.Seq[java.io.File]]
[info]   +-*/*:sourcesInBase = true
[info]   +-*/*:excludeFilter = sbt.HiddenFileFilter$@5a63fa71
[info]   +-*:baseDirectory = /Users/jroper/sbt-fun
[info]   +-*/*:unmanagedSources::includeFilter = sbt.SimpleFilter@44a44a04
...

So, while it depended on sbt-fun/compile:unmanagedSources::includeFilter, it actually got */*:unmanagedSources::includeFilter, that is, it requested a task at a specific project and configuration, but got a task that was defined for no project or configuration. Furthermore, the excludeFilter which was similarly requested, was satisfied by */*:excludeFilter, that is, it isn’t even scoped to the unmanagedSources task. This is a demonstration of how sbt uses fallbacks. When a task declares a dependency, sbt will try and satisfy that dependency with the most specific task it has for it, but if no task is defined at that specific scope, it will fall back to a less specific scope.

What this means, for example for excludeFilter, is that if you have a text editor that generates temporary files of a particular format, you can exclude those by adding it to the global excludeFilter, you don’t need to define an excludeFilter for every single scope. But, I might also decide that I want to exclude certain files in the test scope, so I can configure a different excludeFilter for tests by scoping it to test. Or, I might decide that I want a different filter again just for unmanagedSources, as opposed to unmanagedResources, so I can define the excludeFilter specifically for those tasks. The general approach that sbt takes in its predefined task dependency trees is to depend on tasks at a very specific scope, but define them at the most general scope that makes sense, allowing tasks to be overridden in a blanket fashion, but at a very fine grained level when required.
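The fallback lookup can be sketched as a search through progressively less specific scopes. This is a simplified model under my own assumptions, not sbt's actual delegation algorithm: the Scope class, the delegates ordering (project outermost, then configuration, then task), and the string values standing in for the filters are all hypothetical. None plays the role of the * you see in the inspect output.

```scala
object ScopeSketch {
  // Hypothetical model of the three axes; None means "unscoped" on that axis.
  final case class Scope(project: Option[String], config: Option[String], task: Option[String])

  // Candidate scopes, from most specific to least specific (simplified ordering).
  def delegates(s: Scope): Seq[Scope] =
    for {
      p <- Seq(s.project, None).distinct
      c <- Seq(s.config, None).distinct
      t <- Seq(s.task, None).distinct
    } yield Scope(p, c, t)

  // Return the first scope at which the key is defined, together with its value.
  def resolve[A](defs: Map[(Scope, String), A], scope: Scope, key: String): Option[(Scope, A)] =
    delegates(scope).collectFirst {
      case s if defs.contains((s, key)) => (s, defs((s, key)))
    }

  val global: Scope = Scope(None, None, None)
  val requested: Scope = Scope(Some("sbt-fun"), Some("compile"), Some("unmanagedSources"))

  // Definitions mirroring the tree above: includeFilter is defined only for the
  // unmanagedSources task, excludeFilter only globally. Values are placeholders.
  val defs: Map[(Scope, String), String] = Map(
    (Scope(None, None, Some("unmanagedSources")), "includeFilter") -> "source files",
    (global, "excludeFilter") -> "hidden files"
  )

  def main(args: Array[String]): Unit = {
    println(resolve(defs, requested, "includeFilter"))
    println(resolve(defs, requested, "excludeFilter"))
  }
}
```

Adding a definition at a more specific scope, say an excludeFilter with the configuration set to test, would win for requests made in the test scope, while every other scope keeps falling back to the global one.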

§Parallel by default

There is one last feature of the sbt task engine that I think is worth mentioning in this post. It’s not one that really needs to be understood well in order to use sbt, but it is a very powerful one that sbt’s architecture makes very simple. In sbt, all tasks are executed in parallel by default. Now of course, if a task declares a dependency on another task, those two tasks can’t run in parallel. But two tasks that have no dependency on each other, such as unmanagedSources and managedSources, can, and will be executed in parallel. Given sbt’s fine grained tasks, this makes for some considerable (and much needed, given the speed of scala compilation) performance improvements out of the box compared to other build tools.
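As a rough analogy only, the same shape can be expressed with Scala futures. sbt has its own task scheduler and does not work like this internally; the object and file names below are made up for illustration. The point is simply that two tasks with no dependency on each other can be started together, while a task that needs both outputs has to wait for them.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelSketch {
  // Two independent "tasks": neither needs the other's output.
  def unmanagedSources: Future[Seq[String]] = Future(Seq("Main.scala"))
  def managedSources: Future[Seq[String]] = Future(Seq("Generated.scala"))

  // sources depends on both, so it can only complete once both have finished.
  def sources: Future[Seq[String]] = {
    val unmanaged = unmanagedSources // both futures are started here,
    val managed = managedSources     // before either result is awaited
    for {
      u <- unmanaged
      m <- managed
    } yield u ++ m
  }

  def main(args: Array[String]): Unit =
    println(Await.result(sources, 5.seconds))
}
```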

sbt’s concurrent execution is also configurable. Tasks can be tagged, and then you can define, for example, the maximum number of tasks with a given tag that may run in parallel. The sbt documentation covers these capabilities in more detail.

§Conclusion

In this blog post we have seen that sbt is actually a task engine, and that the fact that it breaks tasks up into many smaller interdependent tasks gives you a lot of power and flexibility. We have seen that the sbt console can be used to inspect tasks, their dependencies, and entire dependency graphs of tasks, and this allows us to learn about sbt, the tasks that are available, and see how our build fits together. We have learned how tasks can be scoped to different configurations, projects, and other tasks, and how sbt uses a fallback system to resolve dependencies at specific scopes. Hopefully sbt is now more transparent to you. You no longer need a spell book to know how to configure it; rather, you can use the inspect commands to discover what you can configure yourself.

We have not seen anything about how to define or redefine tasks, or the syntax of the sbt build file. This is the topic of my next blog post, sbt - A declarative DSL.

OSS diversity - A thought experiment

On the topic of diversity in open source software, I have a thought experiment - what if an open source project was structured more like a company? This is only a thought experiment, I’m definitely not advocating that we structure open source projects like companies, but I am looking for ways that we can improve diversity in OSS, and this is a big problem that many are struggling to find answers to.

The big difference between a company structure and an open source project structure is that in a company, there are many different roles, and each role has a number of responsibilities specific to that role. People are hired based on their competency to fulfill the responsibilities required by those roles. In contrast, an open source project tends to have only one role - the committer, and then all the responsibilities of the project are organically fulfilled by the various people who are in that role according to need and interest. When people are selected for the committer role, they are selected on broadly the same criteria, which is based on their technical contributions to the project.

Does this difference have any impact on diversity? Of course there is some impact, since each person in the committer role must have a passion for technical contribution to the project, whereas in a company structure, there will be some people with a passion for technical work in the technical roles, and other people with passion for communication in the marketing roles. But this is a lack of vocational diversity. The issue that we’re concerned about in the OSS world is not a lack of vocational diversity, it’s a lack of diversity of gender, age, race, sexual orientation, among other things. For lack of a better word, I’ll call this type of diversity “identity” diversity, as opposed to “vocational” diversity. At face value, if the pool of people that are passionate about technical contribution are diverse in identity, then there likewise should be identity diversity among committers in open source projects.

In practice we see things are not like this in open source projects. In contrast, and in my experience, big companies with a broad range of very specialised vocational roles tend to be very identity diverse - at the very least outside of executive management. Also in my experience this identity diversity exists within vocational roles; it’s not just that each vocational role is filled with people of one type, with the roles in aggregate making the whole company identity diverse.

So could it be that a lack of diversity of roles - which are vocational based - fuels a lack in identity diversity in people that fill these roles? At this stage I haven’t done enough research to be able to answer this question, but perhaps it’s something that we could start experimenting with in open source projects.

§Putting it into practice

So if we were to try to introduce diverse vocational roles into open source projects, what roles would they be, and what would they look like? To make things a little more concrete, let’s start by thinking about the introduction of a marketing role. Every large open source project does a lot of marketing, from publishing blog posts to tweeting to organising conference engagements to engaging with the community on mailing lists. This is something that it would be quite straightforward to create a role for. This wouldn’t mean that everyone else in the project stops blogging/tweeting etc, that doesn’t happen in a company. It does mean that someone is given the responsibility of coordinating and ensuring that marketing happens.

In order for such a role to work, it needs to be well defined - open source projects are not used to having such roles, so it needs to be made clear to everyone on the project, not just the person in this role, what the responsibilities of the role are. This is necessary to empower that person to be and feel effective in the project. The responsibilities are going to vary from project to project, but in the context of my project, Play Framework, and without having put too much thought into it yet, this is what I’d envision:

  • Be the public face of the project, the first point of contact for people that want to engage the Play community.
  • Manage the Twitter account, including keeping an eye out for tweets to retweet and responding to mentions where necessary.
  • Keep an eye out for community related questions on the mailing list, and respond or coordinate responses to those emails.
  • Keep a look out for conferences that Play contributors, both in the core team and the broader ecosystem, could speak at, and link the appropriate people up with appropriate conferences.
  • Write or coordinate project announcements, such as new releases, security vulnerabilities, etc.
  • Seek ways to make Play a more attractive project to contribute to, through talking to people about their contribution experiences, and improving or coordinating improvements to documentation and the processes in place for receiving contributions.
  • Seek ways to make Play an easier framework to get started with, through talking to people about their usage experiences, and improving or coordinating improvements to documentation and other resources.

Having defined the role, the next step is probably the hardest - finding someone to fill it. Finding people to fill a committer role of an open source project is generally easy, many developers that love using the software would also love to be a committer on it. Finding someone with skills and a passion for marketing who is willing to volunteer their time towards an open source project I would imagine would be a lot harder. This is possibly where my whole thought experiment falls down.

For Play though it’s not too hard to solve - Play is a project that is driven by a company, and that company already has a marketing team that does a lot of behind the scenes work in engaging the Play community. For us, we could make our marketing staff more public facing in the Play community, getting them to take on the responsibilities listed above (some of them they already do), and publicly acknowledging them as a member of the Play community, not just a member of the Typesafe marketing team, on the Play website. This may be a good start.

This is just one role that we could look at introducing to open source projects. There are many other possible roles, including leadership roles (in most companies these are not filled by technical people), product management, and a variety of HR roles. I’m not sure how well (if at all) these will work in an open source project, but in order to improve diversity, OSS projects are in need of big changes, and whatever changes are found to work will probably appear unworkable at first. So I think these are worth trying.

Women in tech - It's a man's problem

A few days ago a panel of four men showed the world that they had no clue about the issues that women in tech faced, and how they should be solved. The overwhelming theme that I picked up from the criticism that I read about it was “you’re not listening to us”. Is that true? Are these male industry leaders really not listening to women? I mean surely they have read the accounts of the issues that many women in tech have faced, isn’t that enough?

Now I am about to commit one of the cardinal sins of talking about women in tech - relating it to my wife - but please bear with me, it’s not what you think. My wife Beth and I are currently getting marriage counselling [1]. In it we have rediscovered two things that we always knew but need to keep being reminded of. The first is that during conflict, Beth speaks using emotional language. The second is that during conflict, I speak using logical language. In order for us to resolve conflict, we need to speak each other’s language: I need to talk more about how things make me feel, and Beth needs to step back from her feelings and reason logically about her actions and mine.

It is from this difference in languages that the problems of women in tech flow. No! That’s not it at all. But, if you’re a man, you may have been nodding your head as you read that statement. If you’re a woman, if it weren’t for the no in bold text immediately after that statement, you probably would have closed your browser in anger. And this highlights a deep problem.

We men have a tendency to approach the things that women are saying - the accounts of harassment and abuse, the accounts of everyday prejudice, and the calls to action - as if those women speak a different language, the way our wives/mothers/girlfriends do. But read some of the accounts again. Are these spoken with a language of emotion? They certainly talk about emotion, but those accounts are very well reasoned and logical texts that speak plainly about actual events. Even if, and that is a very big if, the women who told these accounts do have a tendency to use emotional language over logic and sound reasoning, they have clearly mastered the skill of communicating to men - after all, in this industry, they have to.

So what is the impact of this approach that we men take, and what do I mean by it? When I first read Julie Ann Horvath’s account of her experience at GitHub, my subconscious immediately told me that I had to be careful. Women have a tendency to overreact, to speak with a language of emotion that does not follow sound reasoning or logic, and I should take the things that I am reading with a grain of salt. Any reaction that I have to this I should carefully measure, I should refrain from saying anything too strongly about it, in case it turns out not to be true. While I believed that it probably was true, I let my subconscious prejudice stop me from taking it too seriously.

This reaction doesn’t make sense. But the deep impact that it has is that it causes me to distance myself somewhat from the problem. And while in certain circumstances distancing yourself from problems does little harm, in this instance, it is the worst thing I could possibly do. Why? Because what if the problem is me? If I distance myself from the problem, I will never see that it’s me.

No wonder women are complaining that men are not listening. As long as we approach women as a group that speaks a different language, we will never listen to them. We will never understand what they have to say. We will distance ourselves from their arguments, and from the implications, and this means, if there is any problem in us, any implication that should change us, we will not hear it.

It has taken me a long time to learn this. I used to think that the issue of women in tech was just some gripe over numbers, that the problem was that the number of women in tech didn’t equal the number of men in tech, and that some vocal women believed that that needed to be fixed. As I’ve read more and more and more accounts of women facing sexual harassment and discrimination I’ve slowly come to understand that it is something very different. I should have listened earlier, and come to this conclusion a long time ago. But better late than never. I’ve come to the conclusion that the issue of women in tech is a man’s problem.

§It’s a man’s problem

The only person that can change your attitudes is you. Other people can’t change them - they can point you in the right direction, they can present you with well reasoned arguments on why you should change and how to change, but at the end of the day, the only person that can change them is you. The women in tech issue comes down to the attitudes of us men, and therefore it is a problem that only we men can fix. This is what I mean by it’s a man’s problem - the changing will be done by men.

However, the initiative to fix it must be led by women. Why? Because only women can explain how they are prejudiced against, how the actions of men, particularly the small seemingly inconsequential ones that happen every day, impact women. They are the ones that see and experience the problem, and so they are the only ones that can describe and instruct on how to remedy the problem.

But the main force of change must come from men that are listening to these women. Men who are not just reading the accounts and remedies, but are actually listening to them without prejudice. These men have two tasks:

  1. Change themselves. When women identify something that men are doing that is harmful to women’s acceptance in the IT industry, men need to examine themselves to see if they are exhibiting that action, and if so, change it.
  2. Convince other men to listen to women without prejudice. This is a job that must be done by men, because if the men that need convincing aren’t listening to women, then nothing a woman says will resound with them.

If you’re a man reading this, and you think “after reading this I now understand the issues that women in tech face”, then you’ve missed the point. I don’t understand the issues that women in tech face, so there’s no way after reading something that I’ve written that you could understand them. I’ve merely pointed out the first step - to start listening to women without prejudice. The next step is to actually listen to them! Read the blog posts and news articles of the accounts of women in tech with unprejudiced eyes. Talk to your female coworkers and friends about the issues they face, and listen to them. Attend conferences and meetups aimed at promoting women in tech, and listen! This is a man’s problem that requires action by men, and the first step is listening to women.

§Footnotes

  1. No, our marriage is not on the rocks. Beth and I believe that a marriage is like a car, and marriage counselling is like a mechanic. If you wait until a car breaks down before you take it to the mechanic, it will have a much bigger impact and cost a lot more to fix - it may even get written off. Rather, you take the car to the mechanic for regular checkups while it's healthy. Likewise, waiting till a marriage breaks down to see a marriage counsellor is likely to cause a lot of pain and take a very long time to fix. Rather, seeing a marriage counsellor while your marriage is healthy ensures the long term health of the marriage, and also ensures that you both get the most out of the marriage too. We see the marriage counselling we're getting now as our 5 year checkup.

Introducing ERQX

Today I migrated my blog to a new blogging engine that I’ve written called ERQX. To start off with, why did I write my own blog engine? A case of not invented here syndrome? Or do I just really like writing blog engines (I was, and technically still am, the lead developer of Pebble, the blog engine that I used to use)?

I was very close to migrating to a Jekyll blog hosted on GitHub, but there are a few reasons why I didn’t do this:

  • As a full time maintainer of Play, I don’t get a lot of opportunities to use Play as an end user. This is bad: how can I be expected to guide Play forward if I don’t feel the pain points as an end user? Hence, I jump at every opportunity I can to write new apps in it, and what better use case is there than my own blog?
  • I really like the setup we have with the documentation on the Play website - we have implemented some custom markdown extensions that allow extracting code snippets from compiled and tested source files, and all documentation is served directly out of git, which turns out to be a great way to deploy and distribute content.
  • I wanted to see how easy it would be to make a fully reusable and skinnable application within Play.
  • Because I love Play!

§Features

So what are the features of ERQX? Here are a few:

§Embeddable

The blog engine is completely embeddable. All you need to do is add a single line to your routes file to include the blog router, and some configuration in application.conf pointing to a git repository, and you’re good to go.

Not convinced? Here is everything you need to do to include a blog in your existing Play application.

  1. Add a dependency to your build.sbt file:

    resolvers += "ERQX Releases" at "https://jroper.github.io/releases"
    
    libraryDependencies += "au.id.jazzy.erqx" %% "erqx-engine" % "1.0.0"
    
  2. Add the blog router to your routes file:

    ->  /blog       au.id.jazzy.erqx.engine.controllers.BlogsRouter
    
  3. Add some configuration pointing to the git repo for your blog:

    blogs {
      default {
        gitConfig {
          gitRepo = "/path/to/some/repo"
          remote = "origin"
          fetchKey = "somesecret"
        }
      }
    }
    

And there you have it!

§Git backend

In future I hope to add other backends (I think a prismic.io backend would be really cool), but for now it just supports a git backend. The layout of the git repo is somewhat inspired by Jekyll: blog posts go in a folder named _posts, each file is named with the date and title of the post, and each blog post has front matter in YAML format. Blog posts can be in either markdown or HTML format. There is also a _config.yml file which contains configuration for the blog, such as the title, description and a few other things.
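To make the naming convention concrete, here is a minimal sketch of how a file name in _posts could be split into a date and a title. It is not ERQX's actual code: the Jekyll-style `YYYY-MM-DD-title.ext` pattern, the accepted extensions, and the names `PostFileNameSketch`, `PostFile` and `parse` are all assumptions made for illustration.

```scala
import java.time.LocalDate
import scala.util.Try

object PostFileNameSketch {
  // Hypothetical model of what a file name tells us about a post.
  final case class PostFile(date: LocalDate, slug: String, format: String)

  // Assumed Jekyll-style pattern: YYYY-MM-DD-some-title.md (or .markdown / .html).
  private val Pattern = """(\d{4})-(\d{2})-(\d{2})-(.+)\.(md|markdown|html)""".r

  def parse(fileName: String): Option[PostFile] = fileName match {
    case Pattern(y, m, d, slug, ext) =>
      // LocalDate.of throws on impossible dates such as month 13, so wrap it in Try.
      Try(LocalDate.of(y.toInt, m.toInt, d.toInt)).toOption.map { date =>
        PostFile(date, slug, if (ext == "html") "html" else "markdown")
      }
    case _ =>
      // Anything that does not follow the assumed pattern is not treated as a post.
      None
  }
}
```

With this sketch, a file called `2013-10-01-introducing-erqx.md` (a made up name) would yield a markdown post dated 1 October 2013 with the slug `introducing-erqx`, while a file that doesn't follow the pattern is simply ignored.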

Changes are deployed to the blog either by polling, or by registering a commit hook on GitHub. In the example above, the URL for the webhook would be http://example.com/blog/fetch/somesecret. Using commit hooks, blog posts are published within seconds of pushing to GitHub. ERQX also takes advantage of the git hash, serving that as the ETag for all content, allowing caching of the blog and its associated resources.
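The idea of using the git hash as the ETag can be sketched in a few lines of plain Scala. This is only an illustration under stated assumptions, not how ERQX implements it: `Response` is a stand-in for a real Play `Result`, the names `EtagSketch`, `webhookPath` and `serve` are hypothetical, and the `If-None-Match` handling is simplified to a single exact match (no lists, no weak validators).

```scala
object EtagSketch {
  // Minimal stand-in for an HTTP response; a real Play app would return a Result.
  final case class Response(status: Int, headers: Map[String, String], body: Option[String])

  // Hypothetical helper: the webhook path is the blog's route prefix plus /fetch/<fetchKey>.
  def webhookPath(routePrefix: String, fetchKey: String): String = {
    val prefix = routePrefix.stripSuffix("/")
    prefix + "/fetch/" + fetchKey
  }

  // Tag every response with the current commit hash. If the client already
  // holds that version, answer 304 and skip rendering altogether.
  def serve(commitHash: String, ifNoneMatch: Option[String], render: () => String): Response = {
    val etag = "\"" + commitHash + "\"" // ETag values are quoted strings
    if (ifNoneMatch.contains(etag)) Response(304, Map("ETag" -> etag), None)
    else Response(200, Map("ETag" -> etag), Some(render()))
  }
}
```

Because the hash only changes when a new commit is fetched, every page and resource of the blog stays cacheable until the next push, at which point all the ETags change at once.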

§Markdown

Blog posts can be in markdown format, in which case ERQX uses the Play documentation renderer to support pulling code samples out of compiled and tested source files. This is invaluable if you write technical blog posts full of code and you want to ensure that the code in the blog post works.

§Themeable

The blog is completely themeable, allowing you to simply override the header and footer to plug in different stylesheets, or to use your own templates entirely to render blog posts.

The default theme uses a pure CSS responsive layout, switching to rendering the description of the blog in a slideout tab on mobile devices, and provides support for comments via Disqus.

§Multi blog support

ERQX allows serving multiple blogs from the one server. Each may have its own theme.

§Source code and examples

ERQX and its associated documentation can be found on GitHub.

The website for this blog, showing how the blog can be embedded in a real application, plus the content of the blog itself, can also be found on GitHub. The website is in the master branch, while the blog content is in the allthatjazz branch.

Fun doesn't mean compromising scalability

Today I read an interesting piece on InfoWorld about Meteor, "Meteor aims to make JavaScript programming fun again". It is an interview with Matt DeBergalis, a co-author of Meteor, about Meteor and why a developer would choose it. The title in particular resonated with me: "making programming fun again" is a catchphrase I have often used in presentations I've given about Play Framework.

As the demands on the applications we write shift, the technologies we use start to make it harder to meet them, and pretty soon we feel like we are always working against the technologies that are supposed to be helping us. By taking a step back, rethinking the technologies, and creating new ones that are better suited to today's demands, we can continue being productive writing modern applications, and it's then that development becomes fun again. Though obviously not always the case, how much fun you have working with a particular technology is often well correlated to how well suited it is for solving the problems you are trying to solve, and so there is some merit to switching to technologies that are more fun.

In this light, Meteor is not a bad framework. It is particularly interesting in its approach to solving the problems of making web applications responsive to data updates. Writing apps in it will definitely, at least initially, be very fun. But my reason for writing this post is that I had one main gripe with the article. The problem was that DeBergalis continually likened what Meteor achieves to Facebook, implying that Facebook could be implemented using Meteor. This couldn't be further from the truth.

While the end results of an application written in Meteor and of Facebook are very similar - they are both applications that update instantly as people interact with them - the approach that Facebook takes to writing their apps is the complete opposite of Meteor's. Meteor places a massive emphasis on "don't worry about how data is communicated, let the framework deal with that for you". Although I have not worked on Facebook myself, I am sure that their approach is all about how the data is communicated - they don't just let the framework deal with that for them.

The problem with Meteor's approach to web development is that it makes the same mistakes made by some very old technologies that many people now loathe. I am going to highlight two such technologies.

The first is relational databases. The promise of relational databases was that you didn't have to worry about how your data was accessed - you just made sure you stored it in a normalised form, and let the database handle whatever load you threw at it. Performance could be achieved by tuning with indexes. But the problem that we found on the web is that that approach did not scale. Denormalisation and caching became necessary in any app with even a modest load. And that's when NoSQL databases started popping up. NoSQL databases intentionally limited what you could do in them - forcing you to take a different perspective on your data, namely how is it going to be read and written? They forced you to make decisions that would allow you to scale early in the design process, and we found that making these decisions early was key to successfully scaling a web application.

The second technology is n-tier application servers. The promise of application servers was that you didn't have to worry about deployment, you just wrote your applications, and let the application server worry about scalability and resilience. This led to people writing massive monolithic apps, where almost every function in the app depended on every single other function, killing any chance of ever having either resilience or scalability. When performance became an issue, clustering was "turned on", and often performance went down. And that's when containerless micro service solutions started becoming popular - small services that could be individually scaled. These new architectures forced you to think about scalability up front, making those decisions early.

Are you seeing a pattern here? Letting the technology handle resilience and scaling for you is bad; forcing you to address it up front is good. But Meteor seems to be making the exact same mistakes that relational databases and n-tier application servers made. It's trying to hide those concerns from you, in the name of "making programming fun again". While fun at first, this is certainly not going to be fun when your site gets popular and starts falling over because of the load it gets.

But maybe the Meteor developers have come up with a smart way to scale it. There are apparently two ways you can run multiple Meteor nodes, and the apparently better one is described here. The approach? Have each Meteor node tail the MongoDB Oplog. Or in simple English, make every write operation in the system go to every node in the cluster. I'll let you decide whether you think making that approach scale is fun.

As I said at the start, the title of the article resonated with me - but it seems that I have a very different idea of what's fun from the one the authors of Meteor have. In my opinion, hiding the details of problems that are hard to scale is not fun. Rather, putting them in your face and giving you the tools to solve them at the right time, now that's fun. This is exactly what Play Framework and Akka do - particularly Akka, in which the assumption when you program is that every other part of the app is likely down or not responding, and you are forced to deal with what happens when that's the case. Using these technologies to solve these hard problems is not only fun, it's very satisfying - seeing an app with 50000 concurrent users broadcasting updates every second scale with only 10 nodes, it's exciting too!

The fun approach to hard problems is not to run away from them to something that pretends they don't exist. It's to embrace them head on, using technologies that are designed to help you do so.

About

Hi! My name is James Roper, and I am a software developer with a particular interest in open source development and trying new things. I program in Scala, Java, Go, PHP, Python and JavaScript, and I work for Lightbend as the architect of Kalix. I also have a full life outside the world of IT, enjoy playing a variety of musical instruments and sports, and currently I live in Canberra.