I have recently finished an implementation of socket.io server-side support for Play Framework. Naturally, I’ve become intimately familiar with the protocol, and I’ve formed a few opinions of it during the course of my efforts, which I’m going to share.
All reviews of technology are relative, relative to the things that the reviewer values in technology. Not everyone values the same things, and so this means that a review by someone who values different things to you is probably irrelevant to you. My reason for publishing this review is that there are a good number of people out there who share similar technology values to me, and so they will find this review useful as they evaluate whether socket.io is right for them. For people that don’t share the same values as I do, there’s not really much point in reading this blog post, you’ll probably disagree with me, but the disagreement will most likely be a disagreement in values, not on socket.io’s ability to meet those values. We can debate all day what set of values is right for software development, but this is not the blog post for that.
So what are my values? Here is a selection of values that are relevant to this review:
- I am a strong proponent of reactive systems, that is, systems that are responsive, resilient, scalable and message driven. In this context, streaming with backpressure is something that I see as important - a resilient system needs to propagate backpressure to ensure parts of the system don’t get overloaded.
- I’m strongly for tools, libraries and frameworks that enable high productivity software development.
- I think it’s important to have well defined standards and interfaces to maximise compatibility between decoupled implementations.
§The good
Why should you use socket.io? I was initially sceptical that socket.io offered much value at all, but over the course of implementing it, I’ve changed my view on that.
A lot of people are saying that with all major browsers supporting WebSockets, there’s no need for socket.io anymore. This is based on the assumption that all socket.io offers is a fallback mechanism to long polling when WebSockets are not available. This is completely false, most pertinently because socket.io doesn’t even provide that: the fallback to long polling is provided by a protocol that socket.io sits on top of, called engine.io. Here’s basically what engine.io provides:
- Multiple underlying transports (WebSockets and long polling), able to deal with disparate browser capabilities and also able to detect and deal with disparate proxy capabilities, with seamless switching between transports.
- Liveness detection and maintenance.
- Reconnection in the case of errors.
The split between engine.io and socket.io is actually a great thing, engine.io implements layer 5 of the OSI model (the session layer), while socket.io implements layer 6, the presentation layer. I don’t know if the authors of engine.io/socket.io were aware of how closely their split mapped to the OSI model, but they did a great job here.
Now, even if you say that wide support of WebSockets makes engine.io redundant, it only makes half of the first point above redundant - it doesn’t address proxies that don’t support WebSockets (although this can be somewhat worked around by using TLS). Beyond that, WebSockets on their own don’t have any mechanism for liveness detection (yes, WebSockets support ping/pong but there’s no way to force a browser to send pings or monitor for liveness, so the feature is useless), and browser WebSocket APIs have no built-in reconnection features.
So that’s engine.io, back to my earlier point, transparent fallback of transports is not a socket.io feature. So what exactly does socket.io provide then? Socket.io provides three main features:
- Multiplexing of multiple namespaced streams down a single engine.io session.
- Encoding of messages as named JSON and/or binary events.
- Acknowledgement callbacks per event.
The first two of these I think are important features, the third will feature in what I think is bad about socket.io.
§Namespaces
Multiple namespaces I think is a great feature of socket.io. If you have a rich application that does a lot of push communication back and forth with the server, then you will likely have more than one concern that you’re communicating about. For example, I’ve been working on a monitoring console, this console dynamically subscribes to many different streams based on what is currently in view, sometimes it needs to view logs, sometimes it needs to view state changes, sometimes it needs to view events. These different concerns are rendered in different components, and should be kept separate, their lifecycles are separate, the backend implementations of the streams are separate, etc. What we don’t want is one WebSocket connection to the server for each of these streams, as that is going to balloon out the number of WebSocket connections. Instead, we want to multiplex them down the one connection, but have both the client and the server treat them as if they were separate connections, able to start and stop independently. This is what socket.io allows. socket.io namespaces can be thought of (and look like) RESTful URL paths. A chat application may encode a connection to a chat room as /chat/rooms/<room-name>, for example. And then a client can connect to multiple rooms simultaneously, disconnect them independently, and handle their events independently.
Without this feature of socket.io, if you did want to multiplex multiple streams down one connection, you would have to design and encode your own multiplexing protocol, implementing join room and leave room messages for example, and then on the server side you would have to carefully manage these messages, ensuring that subscriptions are cleaned up properly. There is often a lot more to making this work cleanly than you might think. socket.io pushes all this hard work into the client/server libraries, so that you as the application developer can just focus on your business problem.
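To give a feel for that bookkeeping, here is a rough sketch of hand-rolled multiplexing over a single connection. None of this is socket.io's API or wire format: the `Frame` envelope, the `Multiplexer` class and its `join`/`leave`/`receive` methods are all hypothetical names made up for illustration, and the `send` function simply stands in for writing a text frame to the one shared connection.

```typescript
// Hypothetical envelope for a hand-rolled multiplexing protocol (not socket.io's format).
type Frame =
  | { kind: "join"; stream: string }
  | { kind: "leave"; stream: string }
  | { kind: "data"; stream: string; payload: unknown };

type Handler = (payload: unknown) => void;

class Multiplexer {
  private handlers = new Map<string, Handler>();
  private send: (text: string) => void;

  // `send` stands in for writing a text frame to the single shared connection.
  constructor(send: (text: string) => void) {
    this.send = send;
  }

  join(stream: string, handler: Handler): void {
    if (this.handlers.has(stream)) throw new Error(`already joined ${stream}`);
    this.handlers.set(stream, handler);
    this.write({ kind: "join", stream });
  }

  leave(stream: string): void {
    // Only notify the remote side if we were actually subscribed.
    if (this.handlers.delete(stream)) this.write({ kind: "leave", stream });
  }

  // Call this with every text frame received on the shared connection.
  receive(text: string): void {
    const frame = JSON.parse(text) as Frame;
    if (frame.kind !== "data") return;
    // Data for a stream we never joined, or have already left, is dropped.
    this.handlers.get(frame.stream)?.(frame.payload);
  }

  private write(frame: Frame): void {
    this.send(JSON.stringify(frame));
  }
}
```

Even this toy version has to decide what happens to data that arrives after a leave, and it says nothing about reconnects, server-side cleanup or errors, which is where most of the real work ends up.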
§Event encoding
There are two important advantages to event encoding, one I think is less important, and the other is more important.
The less important one is that when the protocol itself encodes the naming of events, libraries that implement the protocol can then understand more about your events, and provide features on top of this. The JavaScript socket.io implementation does just that, you register handlers per event, passing the library a function to handle each event of a particular name. Without this, you’d have to implement that subscription mechanism yourself, you’d probably encode the name inside a JSON object that each message sent down the wire would have to conform to, and then you’d have to provide a mechanism for registering callbacks for that event.
The reason why I think this is the less important advantage is because I think callbacks are a bad way to deal with streams of events communicated over the network. Callbacks are great where the list of possible events is far greater than the number of events that you’re actually interested in, such as in UIs, because they allow you to subscribe to events on an ad hoc basis. But when events are expensive, such as when they’re communicated over a network, then usually you are interested in all events. In those cases, you also often need higher level mechanisms like backpressure, delivery tracking and guarantees, and lifecycle management, callbacks don’t allow this, but many streaming approaches like reactive streams do.
So, what’s the more important advantage? It allows the event name to be independent of the actual encoding mechanism, and this is of particular importance for binary messages. It’s easy to put the event name inside a JSON object, but what happens when you want to send a binary message? You could encode it inside the JSON object, but that requires using base 64 and is inefficient. You could send it as a binary message, but then how do you attach metadata to it like what the name of the event is? You’d have to come up with your own encoding to encode the name into the binary, along with the binary payload. socket.io does this for you (it actually splits binary messages into multiple messages, a header text message that contains the name, and then a linked binary message). Without this feature of socket.io, I think it’s impractical to use binary messages over WebSockets unless you’re streaming nothing but binary messages.
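The header-plus-linked-binary idea can be sketched roughly as follows. This is a simplification, not the actual wire format: the `_placeholder`/`num` shape matches the example shown later in this post, but the function names (`splitEvent`, `joinEvent`) are hypothetical, the packet type and namespace prefixes are left out, and only top-level binary arguments are handled.

```typescript
type Placeholder = { _placeholder: true; num: number };

interface SplitEvent {
  header: string;            // JSON text message: event name plus arguments
  attachments: Uint8Array[]; // binary messages sent after the header
}

// Replace each top-level binary argument with a numbered placeholder.
function splitEvent(name: string, args: unknown[]): SplitEvent {
  const attachments: Uint8Array[] = [];
  const jsonArgs = args.map((arg) => {
    if (arg instanceof Uint8Array) {
      const placeholder: Placeholder = { _placeholder: true, num: attachments.length };
      attachments.push(arg);
      return placeholder;
    }
    return arg;
  });
  return { header: JSON.stringify([name, ...jsonArgs]), attachments };
}

function isPlaceholder(value: unknown): value is Placeholder {
  return typeof value === "object" && value !== null &&
    (value as { _placeholder?: unknown })._placeholder === true;
}

// Reverse of splitEvent: put each attachment back where its placeholder was.
function joinEvent(split: SplitEvent): { name: string; args: unknown[] } {
  const [name, ...rest] = JSON.parse(split.header) as [string, ...unknown[]];
  const args = rest.map((arg) => (isPlaceholder(arg) ? split.attachments[arg.num] : arg));
  return { name, args };
}
```

The receiving side has to hold on to the header until every attachment it refers to has arrived, which is extra state that a plain JSON-only protocol never needs.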
§The bad
So, we’ve covered the good, next is the bad.
§Callback centric
I’ve already touched on this, but in my opinion the callback centric nature of socket.io is a real downside. As I said earlier, one of my values is reactive streaming, as this allows resilience properties such as backpressure and message guarantees to be implemented. Callbacks offer no such guarantees, if a callback needs to do a whole bunch of asynchronous tasks, there’s no way for it to go back to the caller and say “hey, can you wait a minute before you invoke me again, I just have to go and store this to a database”. And so, an overzealous client can overwhelm a server as it sends events at a higher rate than it can process, causing the server to run out of memory or CPU. Likewise, when sending events, there’s no way for the emitter to say to the caller “hey, this consumer has a slow network, can you hold off on emitting more events?”, and so if the server is producing events too quickly for the consumer, it can run out of memory as it buffers them.
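For contrast, here is a minimal sketch of demand-based flow control in the style of reactive streams: the consumer says how many events it is ready for, and the producer is told when to stop. The names (`DemandSource`, `request`, `offer`) are hypothetical and have nothing to do with socket.io's API, and a real implementation would also need completion and error signals.

```typescript
class DemandSource<T> {
  private demand = 0;
  private pending: T[] = [];
  private onNext: (item: T) => void;
  private capacity: number;

  constructor(onNext: (item: T) => void, capacity: number) {
    this.onNext = onNext;
    this.capacity = capacity;
  }

  // Called by the consumer when it is ready for `n` more events.
  request(n: number): void {
    this.demand += n;
    this.drain();
  }

  // Called by the producer. A `false` result means the buffer is full and the
  // producer should stop (for example, stop reading from the socket) until
  // the consumer signals more demand.
  offer(item: T): boolean {
    if (this.pending.length >= this.capacity) return false;
    this.pending.push(item);
    this.drain();
    return true;
  }

  // Deliver buffered events, but never more than the consumer asked for.
  private drain(): void {
    while (this.demand > 0 && this.pending.length > 0) {
      this.demand--;
      this.onNext(this.pending.shift() as T);
    }
  }
}
```

A consumer that has to write each event to a database would call `request(1)` only once the previous write has completed, which is exactly the “wait a minute before you invoke me again” signal that a plain callback cannot express.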
Of course, whether callbacks or streams are used to implement socket.io servers and clients is completely an implementation concern, and has nothing to do with socket.io. The implementation of socket.io that I wrote is completely streams based, using Akka streams to send events, and so backpressure is supported. On the client side, in the systems I’ve worked on, we use ngrx to handle events, which once again is stream based. And the authors of socket.io cannot be entirely faulted for implementing a callback based library, the browser WebSocket APIs only support a callback based mechanism with no support for backpressure.
Nevertheless, the callback centric design of socket.io manifests itself in the socket.io protocol - socket.io events are not single messages, but lists of messages, akin to argument lists that get passed to a callback, which is an awkward format to work with when streaming. As a socket.io library implementer that wants to provide full support for the socket.io protocol, this makes defining encoders/decoders for events awkward, because you can’t just supply an API for encoding/decoding a simple JSON structure, you have to allow decoding/encoding a list of JSON structures. For end users though, this doesn’t have to be a major concern - simply design your protocols to only use single argument socket.io events. So I wouldn’t treat this as a reason not to use socket.io, it’s just a feature (ie, supplying multiple messages rather than single messages in a single event) that I think you shouldn’t use.
§Acknowledgements
socket.io allows each event to carry an acknowledgement, which is essentially a callback attached to the event. It can be invoked by the remote side, which will result in the callback on the local side being invoked with the arguments passed by the remote side. They’re convenient because they allow you to essentially attach local context to an event that doesn’t get sent over the wire (typically, this context is closed over by the callback function), and then when it’s invoked you have that context to handle the invocation.
Acknowledgements are wrong for all the same reasons that socket.io’s callback centric approach is wrong: they subvert backpressure and provide no guarantees (with a stream you can have causal ordering and use that for tracking to implement at least once delivery guarantees, but acknowledgements have no such ordering and hence no such guarantees can be implemented).
Once again, this isn’t a reason not to use socket.io, you can simply not use this feature of socket.io, it doesn’t get in the way if you don’t use it.
§No message guarantees
Again, I’ve touched on this already, but I think it needs to be stated, socket.io offers no mechanism for message guarantees. The acknowledgements feature can be used to acknowledge that a message was received, but this requires the server to track what messages have been received. A better approach would be a more Kafka-like approach, where the client is responsible for its own tracking, and it does this by tracking message offsets, and then when a stream is resumed, after a disconnect for whatever reason, the open message sent to the namespace would include the offset of the last message it received, allowing the server to resume sending messages from that point. This feature incidentally is built into Server-Sent Events, so it would be nice to see it in socket.io too.
Of course, this can be built on top of socket.io, but I think it’s something that the transport should provide.
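As a rough idea of what building it on top could look like, here is a sketch of offset-based resumption. Everything here is hypothetical (`EventLog`, `ResumingClient`, `readAfter`), nothing is sent over a real connection, and it assumes the server keeps an ordered log it can replay from; the `connect` call merely stands in for opening the namespace with the last seen offset.

```typescript
interface Offsetted<T> {
  offset: number;
  event: T;
}

// Server side: an append-only log that can be read from any offset.
class EventLog<T> {
  private events: T[] = [];

  append(event: T): void {
    this.events.push(event);
  }

  // Everything after `lastSeen`; pass -1 to read from the beginning.
  readAfter(lastSeen: number): Offsetted<T>[] {
    return this.events
      .slice(lastSeen + 1)
      .map((event, i) => ({ offset: lastSeen + 1 + i, event }));
  }
}

// Client side: the client, not the server, remembers how far it got.
class ResumingClient<T> {
  lastSeen = -1;
  received: T[] = [];

  // Stands in for (re)opening the stream with the last seen offset.
  connect(log: EventLog<T>): void {
    for (const { offset, event } of log.readAfter(this.lastSeen)) {
      this.received.push(event);
      this.lastSeen = offset;
    }
  }
}
```

The server holds no per-client delivery state here. If the client wants at least once delivery, it records the offset only after it has finished processing the event.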
§The ugly
At a high level, I don’t think anything about socket.io is really that ugly. My ugly concerns about socket.io mostly exist at a low level, they’re things that only an implementer of the protocol will see, but some of them can impact end users.
§No specification
There is no specification for the socket.io wire protocol. This page seems to think it’s a specification, but all it describes is some of the abstract concepts, it says nothing about how socket.io packets get mapped down onto the wire.
Specifications are important for interoperability. The only way I could implement socket.io support was by reverse engineering the protocol. Sometimes I had to do this with Wireshark, since browser debugging tools don’t show the contents of binary WebSocket frames or binary request payloads.
Now, I’ve reverse engineered it and implemented tests, so that’s a one time problem, right? Wrong. Unless the socket.io developers never release another version of socket.io again, there will be incompatibilities in future. New features may be added. The developers might do it in a way that is backwards compatible with their own implementation, but because there’s no specification, other implementations may have implemented their parsing differently, which will cause them to break with the new feature. There may be edge cases that I didn’t come across. And so on.
Although the lack of specification primarily impacts me now, it will negatively impact users of socket.io in future, and this needs to be considered when deciding whether socket.io is right for your project or not.
§Weird encodings
The way socket.io and engine.io are encoded, especially to binary, is very weird in places. For example, there are about 5 different ways that integers get encoded, including one very odd one where each decimal digit is encoded as an octet of value 0 to 9, with a value of 255 used as a terminator for the number. Which might sort of make sense (but not really) if you had a use case for arbitrary precision integers, but this is for encoding the length of a payload, where a fixed length word, like 32 bit unsigned network byte order, would have done just fine. I mean, it’s not the end of the world to have all these different ways to encode integers, but it really doesn’t inspire a lot of confidence in the designers’ ability to design network protocols that will be tolerant to future evolution.
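To make the comparison concrete, here is a small sketch of the digit-per-octet scheme as described above, next to a fixed-width 32 bit encoding. The function names are my own, and this shows only the integer encoding itself, not where it sits in a packet or any of the surrounding framing.

```typescript
// Digit-per-octet: each decimal digit becomes one byte (0 to 9),
// followed by a single 255 terminator byte.
function encodeDigits(n: number): Uint8Array {
  if (!Number.isSafeInteger(n) || n < 0) {
    throw new RangeError("expected a non-negative safe integer");
  }
  const digits = Array.from(String(n), (ch) => Number(ch));
  return Uint8Array.from([...digits, 255]);
}

function decodeDigits(bytes: Uint8Array): number {
  let value = 0;
  for (const b of bytes) {
    if (b === 255) return value; // terminator reached
    if (b > 9) throw new Error(`invalid digit octet ${b}`);
    value = value * 10 + b;
  }
  throw new Error("missing 255 terminator");
}

// Fixed-width alternative: 32 bit unsigned, network byte order (big-endian).
function encodeUint32(n: number): Uint8Array {
  if (!Number.isInteger(n) || n < 0 || n > 0xffffffff) {
    throw new RangeError("expected an unsigned 32 bit integer");
  }
  const out = new Uint8Array(4);
  new DataView(out.buffer).setUint32(0, n, false);
  return out;
}
```

With the fixed-width form a parser always reads exactly four bytes. With the digit form it has to scan for the terminator and validate every octet along the way.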
People much smarter than us have, over many years, come up with standard ways to encode things, which overcome many gotchas of communicating over a network, including performance concerns, efficiency concerns and compatibility concerns. There are then many libraries that implement these standard ways of encoding things to facilitate writing compatible implementations. Why forgo all that knowledge, experience and available technology, and instead come up with new ways to encode integers? It just seems very odd to me.
§Unnecessary binary overhead
There aren’t a lot of use cases for sending binary messages, but in many of the use cases I can think of I would want to have as little overhead as possible, such as streaming voice/video. Binary messages in socket.io require sending two messages, one text message that looks like a regular text message, including the name and namespace of the event, and a JSON encoded placeholder for the binary message (it literally looks something like {"_placeholder":true,"num":1}), and then the binary message gets sent immediately after. This seems to me to be a lot of overhead, it would have been better to encode the entire event into one message, using a separate binary encoding for encoding the namespace/name and then placing the binary message in that.
I can understand a little why it is the way it is - because events contain multiple messages, you can mix binary and text messages. Having one reference the other with placeholders is a sensible way to encode that. But, this all comes back to the callback centric nature of socket.io, the reason events contain multiple messages is that callbacks can have multiple arguments, if each event only contained one message then this wouldn’t be an issue.
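For a single-message event, the kind of one-frame encoding I have in mind could look something like the sketch below. The layout (two length-prefixed UTF-8 strings followed by the raw payload) and the function names are purely hypothetical; socket.io does nothing like this today.

```typescript
const utf8Encoder = new TextEncoder();
const utf8Decoder = new TextDecoder();

// Hypothetical layout, all lengths 16 bit unsigned big-endian:
// [namespace length][namespace][name length][name][payload...]
function encodeFrame(namespace: string, name: string, payload: Uint8Array): Uint8Array {
  const ns = utf8Encoder.encode(namespace);
  const nm = utf8Encoder.encode(name);
  if (ns.length > 0xffff || nm.length > 0xffff) {
    throw new RangeError("namespace or name too long");
  }
  const out = new Uint8Array(4 + ns.length + nm.length + payload.length);
  const view = new DataView(out.buffer);
  view.setUint16(0, ns.length, false);
  out.set(ns, 2);
  view.setUint16(2 + ns.length, nm.length, false);
  out.set(nm, 4 + ns.length);
  out.set(payload, 4 + ns.length + nm.length);
  return out;
}

function decodeFrame(frame: Uint8Array): { namespace: string; name: string; payload: Uint8Array } {
  const view = new DataView(frame.buffer, frame.byteOffset, frame.byteLength);
  const nsLen = view.getUint16(0, false);
  const nmLen = view.getUint16(2 + nsLen, false);
  const nmStart = 4 + nsLen;
  return {
    namespace: utf8Decoder.decode(frame.subarray(2, 2 + nsLen)),
    name: utf8Decoder.decode(frame.subarray(nmStart, nmStart + nmLen)),
    // A view onto the same bytes, so the payload is not copied on decode.
    payload: frame.subarray(nmStart + nmLen),
  };
}
```

For a voice chunk sent to a short namespace and event name, the overhead is a handful of bytes in the same frame, with no second message to correlate.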
§Conclusion
I think socket.io is a very useful piece of technology, and is incredibly relevant today in spite of the popular view that widespread support for WebSockets makes it redundant. I would recommend that it be used for highly interactive applications, its namespacing in particular is its strongest point.
When using it, I recommend not taking advantage of the multi-argument and acknowledgement features, rather, simply use plain single argument events. This allows it to integrate well with any reactive streaming technology that you want to use, and allows you to use causal ordering based tracking of events to implement messaging guarantees.
The underlying protocol is a bit of a mess, and this is compounded by the lack of a specification. That particular point can be fixed, and I hope will be, but it will require more than just someone like myself writing a spec for it, it requires the maintainers of the socket.io reference implementations to be committed to working within the spec, and ensuring compatibility of new features with all implementations going forward. There’s no point in having a spec if it’s not followed. This will introduce friction into how quickly new features can be added, but this is a natural consequence of more collaboration, and more collaboration is a good thing.