The problem with a proxy server implementation is that modern browsers (AFAIK) are unwilling to submit HTTPS requests in the clear to a proxy server rather than use CONNECT. There’s nothing about the protocol that would make that impossible or even inconvenient; browsers are just unwilling to do it. And I can see the reasoning, but if you actually want this to happen, like here, you’re stuck (or have to MITM yourself, which is its own can of worms).
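To make the distinction concrete, here is a rough sketch of the two request shapes a proxy can receive. The function names and the example host are made up for illustration; only the request-line formats come from HTTP itself. The first form is what browsers send for plain http:// URLs (and what they refuse to send for https:// ones); the second is the opaque tunnel they insist on for HTTPS.

```python
from urllib.parse import urlsplit


def absolute_form_request(url: str) -> bytes:
    """Build a proxy request in absolute-form: the proxy sees the full URL.

    Hypothetical helper. Nothing in the syntax forbids an https:// URL here,
    but browsers will not send one this way.
    """
    host = urlsplit(url).netloc
    return f"GET {url} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")


def connect_request(host: str, port: int = 443) -> bytes:
    """Build a CONNECT request: the proxy only learns host and port.

    After a 2xx response the proxy just relays bytes, so the TLS session
    inside the tunnel is opaque to it unless it MITMs the connection.
    """
    authority = f"{host}:{port}"
    return f"CONNECT {authority} HTTP/1.1\r\nHost: {authority}\r\n\r\n".encode("ascii")


if __name__ == "__main__":
    print(absolute_form_request("http://example.com/page").decode("ascii"))
    print(connect_request("example.com").decode("ascii"))
```

An archiving proxy can record everything in the first case, but in the second it sees only ciphertext, which is why the MITM route comes up at all.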
That's the whole purpose behind ZAP, and I use it for archiving pages all the time (they use hsqldb as the file format). It works fantastically for that purpose, but does, as you correctly pointed out, require MITM-ing the browser by getting it to trust their locally generated CA: https://github.com/zaproxy/zaproxy#readme
Thanks for the reference! I never investigated ZAP closely; for some reason it never occurred to me that it could be used like that (if anything I’d have turned to mitmproxy, but that would require building a substantial amount of stuff to handle the actual archiving).
The problem with MITMing your own browser (apart from the fact that it is an ugly hack in a security-critical portion of your setup) is that I don’t think any tool for doing that (including the one you referenced, from what I can find quickly) applies the complex set of important checks browsers perform on top of just verifying chains against a root store.
The bare minimum for me would be HSTS and the HSTS preload list, but I’d also like to see CT and Must-Staple enforcement, OneCRL support, TLD and validity term restrictions for some roots, and so on. (This is more or less what Chrome does from what I know, though I think they have their own equivalent to Mozilla’s OneCRL.)
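As a rough idea of what even the simplest item on that list involves, here is a minimal sketch of parsing a Strict-Transport-Security header, which a MITM tool would have to do itself once it terminates TLS in place of the browser. The names `HstsPolicy` and `parse_hsts` are hypothetical, and the parsing is deliberately simplified compared to RFC 6797 (for example, it does not reject duplicate directives). The preload list, CT, Must-Staple and the rest are not covered at all.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HstsPolicy:
    """Hypothetical container for the two standard HSTS directives."""
    max_age: int
    include_subdomains: bool


def parse_hsts(header_value: str) -> Optional[HstsPolicy]:
    """Parse a Strict-Transport-Security value; return None if unusable.

    Simplified sketch: directive names are case-insensitive, max-age is
    required, and unknown directives are ignored.
    """
    max_age = None
    include_subdomains = False
    for directive in header_value.split(";"):
        name, _, value = directive.strip().partition("=")
        name = name.strip().lower()
        value = value.strip().strip('"')  # max-age may be a quoted string
        if name == "max-age":
            if not (value.isascii() and value.isdigit()):
                return None
            max_age = int(value)
        elif name == "includesubdomains":
            include_subdomains = True
    if max_age is None:
        return None
    return HstsPolicy(max_age, include_subdomains)


if __name__ == "__main__":
    print(parse_hsts("max-age=31536000; includeSubDomains"))
```

The parsing is the easy part; the tool would also have to persist the policy per host, expire it, and refuse plain-HTTP or invalid-certificate connections to that host afterwards.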
Given what ZAP is designed to do, I'd bet $1 it will actively strip off any such headers before returning the response to the browser, since I think injecting a MITM cert is by definition the very case such stapling is designed to protect against :-)
But corporate proxies must face similar problems, since they, too, MITM things. I'm deeply thankful that I don't work in such an environment, though, so I don't know what the behavior is in that circumstance.
I hope this doesn't come across as glib, but ZAP is Apache licensed, so if you are able to come up with the security behavior you want, I'd bet they'd welcome any patches to help implement it.