I used a technique like this a few years back in a production product. We had an integration partner (one we had permission to integrate with) that offered integration partners a different API from the one their website used, and it was horribly broken and regularly returned the wrong data. The data displayed on their web pages was fine, though, so someone on the team wrote a browser automation (using Ruby and Selenium!) to drive the browser through the series of pages needed to retrieve all the required information. Needless to say, this broke all the time as the pages and CSS changed.
At some point I got pulled in and ran screaming away from Selenium to Puppeteer, and quickly discovered the joy that is scripting the browser via natively supported APIs and the Chrome DevTools Protocol.
The partner's web page happened to be implemented with the Apollo GraphQL client, and I came across the Puppeteer API for scanning the JavaScript heap. I realized that if I could find the Apollo client instance in memory (buried as a local variable inside some function closure referenced within the web app), I could just use it myself to get the data I needed. Coded it up in an hour or so and it just worked ... super fun and effective way to write a "scraper"!
OnDocumentReady -> scan the heap for the needed object -> use it directly to get the data you need
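
For anyone curious what that flow can look like, here is a rough sketch, not the original code. It assumes the heap scan goes through Puppeteer's `page.queryObjects` (which wraps the DevTools `Runtime.queryObjects` call), asks for everything inheriting from `Object.prototype`, and then duck-types the result inside the page. The names `PageLike`, `HandleLike`, `CLIENT_METHODS`, `extractFromClient`, and `readApolloCache` are made up for illustration, and the "has `query`, `mutate`, and `extract` methods" rule is only a guess at how one might recognize an Apollo client instance.

```typescript
// Structural stand-ins for the parts of Puppeteer's Page / JSHandle used below,
// so the sketch type-checks without importing puppeteer (an assumption; a real
// Page may need a cast to fit these simplified signatures).
interface HandleLike {
  dispose(): Promise<void>;
}
interface PageLike {
  evaluateHandle(fn: () => unknown): Promise<HandleLike>;
  queryObjects(prototypeHandle: HandleLike): Promise<HandleLike>;
  evaluate(fn: (...args: any[]) => unknown, ...args: unknown[]): Promise<unknown>;
}

// Hypothetical duck-typing rule: anything exposing these methods is treated
// as the Apollo client.
const CLIENT_METHODS = ["query", "mutate", "extract"];

// Runs inside the page. Puppeteer serializes this function, so it must not
// reference anything from the Node side except its arguments.
function extractFromClient(objects: any[], methods: string[]): unknown {
  const client = objects.find((o) => {
    try {
      return methods.every((m) => typeof o[m] === "function");
    } catch {
      return false; // some heap objects throw on property access
    }
  });
  // ApolloClient#extract() returns the normalized cache as plain data.
  return client ? client.extract() : null;
}

// Scan the heap for the client and read its cache; returns null if no
// matching object is found.
async function readApolloCache(page: PageLike): Promise<unknown> {
  const proto = await page.evaluateHandle(() => Object.prototype);
  const all = await page.queryObjects(proto); // handle to an in-page array
  try {
    return await page.evaluate(extractFromClient, all, CLIENT_METHODS);
  } finally {
    await all.dispose();
    await proto.dispose();
  }
}
```

With a real Puppeteer `page`, you would call `readApolloCache(page)` once the document is ready. Reading the cache is just the simplest thing to show; calling `client.query(...)` from inside the page works the same way, but needs a parsed GraphQL document, which is left out here.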