The quick summary is that PDFs are automatically downloaded and hosted locally, and links to them are rewritten to point at the local PDF. Other URLs, after a delay, are passed to the CLI version of https://github.com/gildas-lormeau/SingleFile, which runs headless Chrome to dump a snapshot; I manually review each snapshot and improve it as necessary, and then the links are rewritten to the snapshot HTML. The local copies get some no-crawl HTTP headers and robots.txt exclusions to try to reduce copyright trouble.
There are lots of jobs and scripts involved, plus some use of archive.org as well. It's an interesting read.
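
To make the link-rewriting step described above concrete, here is a minimal sketch in Python. It is not the author's actual code: the `/doc/www` prefix, the hash-based file naming, and the function names are all assumptions chosen for illustration, and the snapshot step itself (running SingleFile and reviewing the output) is left out.

```python
import hashlib
from urllib.parse import urlparse

# Hypothetical local prefix under which mirrored copies are served.
ARCHIVE_ROOT = "/doc/www"

# A robots.txt fragment excluding the mirror from crawlers (assumed layout).
ROBOTS_TXT = f"User-agent: *\nDisallow: {ARCHIVE_ROOT}/\n"


def local_path(url):
    """Map a remote URL to a local mirror path.

    PDFs keep a .pdf extension; everything else is assumed to be
    stored as a single-file HTML snapshot.
    """
    parsed = urlparse(url)
    is_pdf = parsed.path.lower().endswith(".pdf")
    ext = "pdf" if is_pdf else "html"
    # Hash the full URL so distinct URLs never collide on one filename.
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return f"{ARCHIVE_ROOT}/{parsed.netloc}/{digest}.{ext}"


def rewrite_link(url, archived):
    """Return the local copy's path if the URL has been archived,
    otherwise leave the original link untouched."""
    return local_path(url) if url in archived else url
```

Here `archived` stands in for whatever record tracks which URLs already have a reviewed local copy; links not yet in that set keep pointing at the live page, which matches the "after a delay" behavior in the summary.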