A well known security issue in HTTP is that if you return 404 Not Found for a resource that doesn’t exist, but return 401 Unauthorized or 403 Forbidden for a resource that you’re not allowed to access, then you might be giving away information that an attacker could use, that is, that the resource exists. While in some cases that’s no big deal, in other cases it’s a security problem.
The usual solution to this is when you’re not permitted to access something, the server should respond as if it doesn’t exist, by returning 404 Not Found. Thus, you receive the same response whether the resource exists or not, and are not able to determine whether it does exist. I’d like to argue that this is the wrong approach, it is not semantic.
Just before we go on, I just want to clarify, while returning 401 Unauthorized may be a semantic way to tell a client that you need to authenticate before you can continue, on the web, it’s not a practical way to tell the client that, since the HTTP spec requires that it be paired with some user agent aware authentication method, such as HTTP BASIC authentication. But typcially, sites get people to log in using forms and cookies, and so the practical response to tell a web user that they need to authenticate before they can continue is to send a redirect to the login page. For the remainder of this blog post, when I refer to the 401 status code, I am meaning informing the user that they need to authenticate, which could be done by actually sending a redirect.
So, let’s think about what we’re trying to achieve here when we say we want to protect discoverability of resources. What we’re saying is that we want to prevent users who don’t have discoverability permission from finding out if if something does or doesn’t exist. The 404 Not Found status code says that a resource doesn’t exist. So if I’m not allowed to know if a resource doesn’t exist, but the server sends me a 404 Not Found, that’s a contradiction, isn’t it? It’s certainly not semantic. Of course, it’s not a security issue because the server will also send me a 404 Not Found if the resource does exist, but that’s not semantic either, the server is in fact lying to me then.
Sending a 404 Not Found in every case is not the only solution, there’s another solution where the server can say what it really means, ie be semantic, while still protecting discoverability. If a resource doesn’t exist, but I’m not allowed to find out whether a resource does or doesn’t exist, then the semantic response is not to tell me it doesn’t exist, it’s to tell me that I’m not allowed to find out if it exists or not. This would mean, if I’m not authenticated, sending a 401 Unauthorized, or if I am authenticated but am still not allowed to find out, to send a 403 Forbidden. The server has told me the truth, you’re not allowed to know anything about this resource that may or may not exist. And, if it does exist, and I’m not allowed to do it, the server will do the same, sending a 401 or 403 response. In either case, whether the resource exists or not, the response code will be the same, and so can’t be exploited to discover resources.
In this way, we have implemented discoverability protection, but we have done so semantically. We haven’t lied to the user, we haven’t told them that something doesn’t exist that actually does, we have simply told them that they’re not allowed to find out if it does or doesn’t exist. And we haven’t seemingly violated our own contract by telling a user something is not found when they’re not allowed to know if it’s not found or not.
In practice, using this approach also provides a much better user experience. If I am not logged in, but I click a link somewhere to the resource, it would be much better for me to be redirected to the user login screen so that I can login and be redirected back to the resource, than for me to be just told that the resource doesn’t exist. If the resource doesn’t actually exist, then after logging in, I can be redirected back to the resource where I’ll get a 404 Not Found, this is semantic and makes sense, it’s not until I log in that I can actually get a Not Found out of the server, that’s what discoverability means. This is exactly my argument in a Jenkins issue, where Jenkins is currently returning a 404 Not Found screen (with no option to click log in) for builds that I don’t have permission to access until I log in.