What about caching? If the HTML and the JS are both updated, but the browser receives the new version of one and the old version of the other, this will break your page. (Since you'd now have to update the integrity attribute for every JS change, you run this risk every time you update your JS.)
To be fair, running a mismatched version of the JS could already break things if the changes are big enough, but for minor updates, the user often won't notice the difference. Now, these cases are hard failures. That's not necessarily a bad thing, but I wonder if there's a path here to tell the browser "you have an old version of the content; go get the new version."
CDNs and invalidations can be tricky, and it sounds like this could lead to things being broken more often if you're caught in the window where one piece updates before the other.
This isn't a concern with our implementation because a hash of the asset bundle is also included in the URL. This is a pretty common cache-busting technique for static assets and lets you send more aggressive cache directives to the browser.
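For example, a build step might emit something along these lines (the filename, digest, and cache header here are made-up placeholders, not GitHub's actual setup):

    <!-- the bundle hash is part of the URL, so every new build gets a brand-new URL -->
    <script src="/assets/application-3b2f61a0.js"
            integrity="sha256-BASE64_DIGEST_OF_BUNDLE"></script>

Because a changed bundle always gets a new URL, the asset can be served with something aggressive like Cache-Control: max-age=31536000, and stale HTML simply keeps loading (and verifying) the old bundle it references.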
No, it was a very good point. Not everyone adds hashes to filenames, and it seems you're right that weird caching can break pages that way.
If indeed this is the case, subresource integrity needs a big warning sign about that. For me, your comment was that warning sign, so please keep posting while you're not awake yet.
Why would it need a warning? If the HTML provides a new integrity="" hash, then any cached version obviously wouldn't pass. Subresource integrity makes it easier to determine if a cached file has expired. The file can be permanently cached for any HTML that requests the same hash value(s).
SRI allows one to specify multiple hashes. In other words, to prevent this particular mismatch, one could include the hash of the new resource as well as the previous valid hash.
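For example (placeholder values), a tag like this would accept either the old or the new bundle during the rollout window, since a resource passes the check if it matches any one of the listed hashes:

    <script src="https://cdn.example.com/assets/app.js"
            integrity="sha384-BASE64_DIGEST_OF_OLD_BUNDLE sha384-BASE64_DIGEST_OF_NEW_BUNDLE"
            crossorigin="anonymous"></script>

(The crossorigin attribute is needed for integrity checks on cross-origin loads.)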
> What about caching? If the HTML and the JS are both updated, but the browser receives the new version of one and the old version of another, this will break your page.
Only if your page requires JavaScript to function and doesn't gracefully degrade. None of us would ever write that sort of page, would we?
It would break anyway because pages are usually designed to degrade when JavaScript is disabled, not when the JavaScript fails to load or behaves in an unexpected way.
You must give each new JavaScript version a different filename (by including either the hash or a version number) and keep old JavaScript versions available forever, or at least for a long enough timespan.
Would love for the next generation of SRI to include signatures as an option (e.g. integrity="ed25519-<public_key>").
Hashes mean you have to specify an exact version, so there's no easy way to add integrity to things like Google's CDN for jQuery, which offers links that always serve the latest minor update of each major jQuery API version.
Of course, that means also adding a signature to the payload response (maybe an "Integrity: <hash>-<sig>" header?). So it's understandable why signatures weren't in scope for the first release.
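Purely as a sketch of what that could look like (none of this syntax exists in the current spec, and every value below is a placeholder):

    <!-- hypothetical: pin the author's public key instead of a content hash -->
    <script src="https://cdn.example.com/widget.js"
            integrity="ed25519-AUTHORS_PUBLIC_KEY_BASE64"
            crossorigin="anonymous"></script>

The response for widget.js would then need to carry the detached signature somewhere the browser can find it, e.g. a header along the lines of Integrity: sha256-CONTENT_DIGEST-SIGNATURE, as suggested above.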
Ah, I see. I misunderstood. Though signatures seem to just add another step to the deployment process, where you update the files themselves as well as the pages they're loaded from.
If you include content produced by a third party (e.g. jQuery) off a CDN, right now you can use the hash-based SRI mechanism to make sure that only the exact file you specified can be included; otherwise the CDN could suddenly start serving compromised code. The file can't be changed, because then the hash wouldn't match.
With a signature, you could specify "include cdn.com/jquery-X if signed by the jQuery project", so jQuery could publish security updates and those could be rolled out to the CDNs and included in all pages automatically, without the site owners having to make changes (if the security fix doesn't break compatibility).
For your own content, you'd mostly gain the convenience of not having to update the hashes on all the pages including the resource.
This is more convenient but less secure than a straight-up hash. If an attacker compromises the jQuery signing key, they could still serve malicious files. With a hash, the authenticity is ONLY dependent on the TLS connection to the main website, e.g. GitHub.
TL;DR:
* hash: need to compromise the main website, which supplies (and authenticates) the hash
* signature by CDN: attacker can either compromise the main website OR the author/signer of the third-party resource
It would still help against compromise of the CDN, but not against compromise of the original source.
Of course it's a trade-off. For stuff like Google Fonts, the Facebook like button, etc., I'd expect that hashes won't become common, because the effort of publishing changed hashes and embedding them into sites is too big.
Yes, it's nominally less secure, but I trust the authors of major scripts (e.g. the Facebook like button) to be able to keep their keys secure. If losing private keys is a major security concern, then TLS is also useless.
It's not "nominal"; losing private keys is a major security concern. No, "TLS is useless" does not follow from that; we have things like forward secrecy, which are specifically designed to give some protection in the case of private key compromise.
Especially in the case of a library developer, they hold the keys to many websites, so there is extra incentive for an attacker to break that rather than "some random guy's website". The more third-party signers you trust, the more holes you (and your users) have.
Furthermore, you are forcing your users (who actually run this code) to place their trust in these parties too, which is not a great thing (transitive trust) to force upon someone. (This is not the case for e.g. depending on system libraries explicitly installed by the user.)
The point is that with a signature you wouldn't have to change all the pages including the resource, but just sign the updated resource with the same key.
It's just a semantic question: does a URL point to a specific version of a resource, or does it point to whatever the server considers to be the resource at a given time?
It would seem more desirable to be able to point to a specific version, instead of allowing a third party to be able to insert implicitly trusted code without acknowledgement.
You're separating which third party can do that, which is useful. I don't think it's at all unusual or wrong for me to decide that, if I'm using minified jQuery, I trust the jQuery project signing key, but I don't really trust whatever CDN I'm using (or that I want the freedom to choose a CDN solely for technical performance and not for security infrastructure).
If that's not convincing, consider the case where it's my own JS. I don't trust myself to run a CDN; I don't trust a CDN with the ability to modify my code. This allows me to build a single-page app that has ridiculously long cache lifetimes (so my own server load is low), and hand the actual, changing code off to a CDN, but verify my own signature on the data.
If that's not convincing, consider that data signing keys can generally be kept on non-internet-facing machines (and you can airgap, use an HSM, whatever), but performant SSL implementations by definition have to keep their private key in memory on an internet-facing server.
That is all fine and it would cover many cases. However, someone other than you is still able to push code to your users without your acknowledgement (assuming you have trusted a third party key).
If you are already using a CDN, put your updated manifest (index.html) there as well.
> Yes, I want that. I just want to control which third parties I trust. That's why it's called a "trusted third party", not just a "third party."
I guess I just don't see why I would trust a library developer, but not a CDN. If you don't control the keys, you don't know who has them. (Although, I'd also argue that you don't even really know if you do control the keys)
> I'm not sure how this helps. Wouldn't this leave the index.html in the hands of the CDN, such that they are free to modify it?
I think you are right, as the system currently works index.html would not be safe. Currently you need a more dynamic system where the manifest is protected as well. A sidechannel (WebSockets, WebRTC) could be established to securely deliver updated manifests (which a lightweight client would translate into DOM operations).
> I guess I just don't see why I would trust a library developer, but not a CDN. If you don't control the keys, you don't know who has them. (Although, I'd also argue that you don't even really know if you do control the keys)
I'm not capable of running a CDN myself. So I have to trust someone. I might as well minimize the number of potential someones I trust; I claim that gives me a concrete benefit.
Since I'm not writing jQuery myself, I'm not minifying it myself, and I'm certainly not minifying it by hand, I do already have some trust in the jQuery project and their infrastructure. I don't currently have any trust in a CDN. If I'm going to move to using a CDN, I'd like a route that lets me put slightly more trust in the jQuery project (who I already trust to some extent) than in some completely new party.
Alternatively, I don't have to trust jQuery. I can trust someone else who's good at running secure build infrastructure, auditing libraries like jQuery or anything else, and minifying and signing the result. (This is, loosely, analogous to the role that a Linux distribution plays.) Then I can choose to trust these people or not based solely on how good they are at security, choose my library authors based solely on how good they are at writing libraries, and choose my CDN based solely on how good they are at distributing content. I don't have to conflate the security trust with anyone other than the people I intentionally choose to put security trust in.
Reducing the number of someones you trust is good. I think the only difference in our approach is what others are trusted to do. In your model, some are trusted to send new code to users. In my model, new code must be acknowledged first, before users can consume it.
I don't think one approach is necessarily worse, but one can provide all of the essential functions of the other, without allowing unseen code to be pushed to users.
But if you need to sign updates, a third party CAN'T insert new code without it going through the site owner or a trusted party (e.g. new jQuery versions signed by the jQuery project). And it would seem desirable to be able to roll out a security fix without having to touch every single page that includes it (and without the potential cache issues discussed above).
You can create similar semantics. The only difference is in who controls what code goes to your users.
I'd consider jQuery a third party. If only signatures are checked (and not content), then trusted third parties can push whatever code they'd like to your users.
This is nice and all, but as a security paranoid I really wish Github would spend some effort improving their access control model. Today, Github access control is extremely coarse-grained, such that if I want to give someone permission to merely set labels on issues, I also have to give them permission to push arbitrary changes to the master branch. Additionally, the access control model is weird: I can define "teams" with some set of members and some set of repositories they can access, but the entire "team" must have the same access level to all repositories they can access, making it hard to define some repositories as being more sensitive than others. (Or, possibly, I've misunderstood the model, but if so that's its own problem.)
This matters: If someone wants to hack my company, they're not going to do it by hacking Github's CDN. They're going to do it by targeting particular employees -- probably focusing on those who have the least security experience. To reduce risk, I need to give each team member the least authority they need to do their job. Github is making it really hard for me to do that; I tend to have to give "admin" rights to everyone. :(
I really wish browsers could leverage this for caching across origins. If my copy of jQuery has the same SHA256 as another file the user has already downloaded, there's no need to load it again.
There's subtle, dangerous ways this can be exploited. (Short version: It'd make SRI usable as an oracle to confirm or deny guesses for the content of a cross-domain resource.)
Couldn't this be mitigated by user-agents introducing random, Poisson-distributed delays in all cached responses? The peak of the distribution could be made user-configurable, to make a user-agent's behaviour even harder to predict.
It leaks private user info -- a malicious server could include a JS file confirmed to be highly sensitive/top secret, and measure whether the client already has that cached. If so then the user is confirmed a sensitive target.
No, there's a worse attack possible: you can attempt to include a resource with sensitive contents with SRI, and use the SRI to make a "guess" at the hash of the contents. If your guess is incorrect, the resource will fail to load, and you can detect this error and make another guess.
Obviously, this technique will only work if the contents of that resource are constrained enough that it's possible to guess them with brute force. Depending on how SRI interacts with the browser cache, though, it may be possible to make guesses very quickly -- it is likely that the browser will only fire one HTTP request for the initial attempt, and will load the resource from cache for all subsequent attempts.
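A rough sketch of what such a guessing loop could look like (purely hypothetical; the target URL and the way candidate plaintexts are produced are made up, and the target response has to be CORS-readable for the integrity check to apply at all):

    // Hypothetical sketch: use SRI pass/fail as an oracle for a low-entropy resource.
    async function sriHashOf(text) {
      // "sha256-<base64 digest>" for a candidate plaintext
      const buf = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(text));
      return 'sha256-' + btoa(String.fromCharCode(...new Uint8Array(buf)));
    }

    function guessMatches(url, hash) {
      // Resolves true if the resource's contents match the guessed hash.
      return new Promise(resolve => {
        const s = document.createElement('script');
        s.src = url;                       // e.g. a resource whose contents are guessable
        s.integrity = hash;
        s.crossOrigin = 'anonymous';
        s.onload = () => resolve(true);    // integrity check passed: the guess was right
        s.onerror = () => resolve(false);  // mismatch (or load failure): wrong guess
        document.head.appendChild(s);
      });
    }

Whether repeated guesses hit the network or the cache is exactly the interaction described above.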
You could just add a new public=true option to counter this. I think you can even already check that with an iframe (or a JS head inject and timing) anyway; no need for CSP for that.
So to protect against a single malicious server who might discover that we had previously loaded a cached resource, we shouldn't implement a cross-origin cache and have to make repeated requests, guaranteeing 3rd parties (the CDN) keep getting GET requests?
You're just trading one problem (someone learning I previously requested a file) for another (leaking referrers to a CDN).
Also, if you're loading "highly sensitive/top secret" data with a <link integrity="" href=""> or <script integrity="" src=""> tag, you have bigger problems.
I think another subtle exploit is that you can potentially track whether a user has visited a website. E.g., site1 uses SRI on their unique resource, and site2 serves the same resource with SRI on theirs. So now site2 knows if a user has been to site1.
This is one of the best additions to the Web Platform as of late IMHO. Great if you run an operation with a lot of third party code coming in from sources that you don't control - even beyond the security concerns for just "keeping them honest" about the scripts they run on your page. I hope it gets adopted by all browser vendors soon.
This looks like a fantastic technology to protect against maliciously injected javascript. Great to see GitHub leading the charge here and taking their security seriously.
As mentioned in the article, they were victims of such an attack.
Frankly I'm relieved to see that browser vendors and leading tech firms are maintaining control of the situation and protecting users, even if driven by self-interest.
> Widespread adoption of Subresource Integrity could have largely prevented the Great Cannon attack earlier this year.
Sorry, it wouldn't have. From the CitizenLab report [1] on the Great Cannon attacks:
> In the attack on GitHub and GreatFire.org, the GC intercepted traffic sent to Baidu infrastructure servers that host commonly used analytics, social, or advertising scripts. If the GC saw a request for certain Javascript files on one of these servers, it appeared to probabilistically take one of two actions: it either passed the request onto Baidu’s servers unmolested (roughly 98.25% of the time), or it dropped the request before it reached Baidu and instead sent a malicious script back to the requesting user (roughly 1.75% of the time). In this case, the requesting user is an individual outside China browsing a website making use of a Baidu infrastructure server (e.g., a website with ads served by Baidu’s ad network). The malicious script enlisted the requesting user as an unwitting participant in the DDoS attack against GreatFire.org and GitHub.
So the idea is someone runs a site with:
<script src="http://baidu.com/ads.js">
When visitors request these scripts the request passes through the "Great Cannon" which 1.75% of the time serves a different script instead. That malicious script makes lots of requests to the victim sites, and they're overloaded.
To prevent this sort of attack with SRI you would need to change your page to look like:
<script src="http://baidu.com/ads.js"
integrity="hash of the real ads.js">
The problem is, Baidu isn't going to be willing to commit to always serving the same ads js: they need to be able to make upgrades.
SRI is useful in the case where the entity producing the html is referencing js that they've uploaded to a third party CDN or js where they choose what version to run, but not in the normal "include a snippet and we'll do stuff to your page" model.
(To block the Great Cannon there, what would have worked is moving the JS serving to HTTPS.)
Couldn't the Great Chinese Firewall just intercept Github.com's HTML page as well and change the subresource integrity hashes? I thought that the Great Chinese Firewall already has the ability to penetrate SSL connections via some means.
The "Great Cannon" attack that they talk about in the blog post wasn't caused by replacing JS in GitHub pages. It replaced a Baidu Analytics script, used across the Chinese internet on thousands of websites, with a malicious one intended to DDoS GitHub from people's home browsers when these websites were accessed outside of China.
The way this fixes the issue is by ensuring that the file being loaded on those thousands of websites is the correct one, and not the malicious attack script injected by the Chinese government or other such actors; if the hash doesn't match, the script isn't run at all.
Could the Chinese government rewrite the HTML of all these thousands of websites to also change the hash? Theoretically yes, but practically it makes it much more difficult.
The Great Firewall would probably have copies of private keys issued by CNNIC, and there are a bunch of attacks to get private keys via Heartbleed, and a bunch of easily guessable Debian private keys, but there's no general-purpose 'penetrate SSL' attack that we know of right now.
Given control of a certificate authority can the Chinese government issue a new certificate for github.com? I assume they can enforce that computers sold in China have their authority in the default trust list, at which point I think all bets are off when it comes to SSL.
Yes, however if they can change the contents of the HTML they can probably modify CSP headers, which means they can just deliver whatever payload they want directly and wouldn't need to modify the integrity hashes.
They could (assuming that they can infiltrate SSL as you said). I think this is more oriented towards a different attack vector whereby the controller of a resource (JS, CSS, etc.) can alter that resource while the parent page remains unaffected.
Yes, though it involves actively intercepting every request for every page and processing it to replace (or just remove) integrity attributes from the HTML; that's a lot harder than just wholesale replacing the contents of specific JavaScript files on their way across the firewall.
<script src="..." is very dangerous. At best, you can vet the src and check whether it's benign or not. Oftentimes, that vendor and their "1 line of javascript to get our whiz-bang service" in turn load other JavaScript files. I don't see how cryptographically signing the bootloader solves anything in this case. Compromised analytics or vendor JavaScript will still lead to total site pwnage if I'm reading this right.
This protects you from providers that go rogue or are compromised after you enable their JS.
It also lets you use CloudFront as a CDN for your own JS without having to trust them to serve the content as you described it, if you calculate your hashes based on the scripts you sent them.
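For example, you could generate the integrity value from the local copy of the script before handing it to the CDN, something like this small Node sketch (the file path is just an example):

    // Compute an SRI value from the file you're about to upload to the CDN.
    const crypto = require('crypto');
    const fs = require('fs');

    const body = fs.readFileSync('dist/app.min.js');             // example path
    const digest = crypto.createHash('sha384').update(body).digest('base64');
    console.log('sha384-' + digest);                              // goes into integrity=""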
The parent poster's point is about providers that tell you to include script A which then loads X and Y. Knowing A can't change isn't very helpful in this situation as X and Y could change.
Careful! I've seen proxies (TracFone I think) subtly modify JSON files by removing whitespace, probably in the name of download speed. That will break the hashing.
If you start seeing unexplained errors on pay-as-you-go phones, you'll know why; although if this facility gains popularity then I'm sure they'll be pressured to stop modifying content.
This is not possible if you are loading resources over HTTPS (unless the carrier has installed a root certificate on your device, in which case you're not in a great place security-wise anyway).
The next step: a distributed, content-addressed caching system that allows the web browser to fetch the data from the fastest/nearest caching server by hash.
For a static site I expect you would be far less concerned about session hijacking or XSS if someone took over that domain. Even a complete single-page app should serve the initial html request from a trusted domain/server.
This is an excellent idea. So long as you trust the server you're talking to, and it's using TLS, you can eliminate attack vectors by a compromised CDN this way.
It's nice that Github, Inc. likes subresource integrity.
Did they put it on their web pages? As of right now, it doesn't seem to be on their home page. The next big step is for Wordpress to support it.
Subresource integrity is in some ways more important than "HTTPS Everywhere", because the MITM-as-a-service sites such as Cloudflare subvert HTTPS Everywhere. For security reasons, you might choose to serve your home page and a few security-critical pages from your own server, without using a CDN. But run everything else through the CDN, using subresource integrity to keep the CDN honest.
With subresource integrity, many items no longer need to be encrypted. This is good for security. Encryption interferes with caching, and HTTPS in front of caches means that the attack surface is larger, and includes the CDN.
(Yes, there's an argument that HTTPS conceals what the user was browsing. Not really. Checking document length will provide a good hint on what static asset was read. The pattern of document lengths requested tends to fingerprint the page being read.)