
https://www.shopify.com/robots.txt lists a lot of sitemap files, which tend to be a good starting point.


Did this suddenly get changed? Nothing but "# ,: # ,' | # / : # --' / # \/ />/ # /" is shown now.


It's just your browser's HTML parser. Line 6:

  #                         / <//_\
This is interpreted as a malformed closing tag, which (per the WHATWG HTML parsing algorithm) becomes a "bogus comment" that runs until the next >. The file doesn't contain any > past this point, so everything after it is swallowed. That leaves the visible contents of lines 1–6:

  #                               ,:
  #                             ,' |
  #                            /   :
  #                         --'   /
  #                         \/ />/
  #                         /
Or, with whitespace collapsed:

  # ,: # ,' | # / : # --' / # \/ />/ # /
Which should be exactly what you observe.

Refs:
https://html.spec.whatwg.org/multipage/parsing.html
https://developer.mozilla.org/en-US/docs/Web/CSS/white-space...
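A quick way to see the bogus-comment rule in action: Python's html.parser follows the same tokenizer rule, so feeding it a `</` followed by a non-letter shows the swallowed span arriving as a comment rather than as text (the input string here is made up for illustration):

```python
from html.parser import HTMLParser

class BogusCommentCollector(HTMLParser):
    """Records comments and visible text separately."""
    def __init__(self):
        super().__init__()
        self.comments = []
        self.text = []

    def handle_comment(self, data):
        self.comments.append(data)

    def handle_data(self, data):
        self.text.append(data)

p = BogusCommentCollector()
# '</' followed by a non-letter cannot start a real end tag, so the
# tokenizer switches to the "bogus comment" state and consumes
# everything up to the next '>' as a comment.
p.feed("visible </_ swallowed by the parser > visible again")
p.close()
print(p.comments)
print("".join(p.text))
```

In the Shopify file there is no later >, so the bogus comment runs all the way to the end of the file, hiding everything after line 6.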


Weird. I think it did change. Google cache shows a 2229-line file: https://webcache.googleusercontent.com/search?q=cache%3Ahttp...


Seems it might be looking at the Referer header. Loading https://www.shopify.com/robots.txt by clicking the link shows the garbled version, while opening it in a private browsing window shows the right one.


For some reason, "view source" gets the right list. Maybe a referer issue like someone else said.


Looks like it's just Shopify's own pages and not anything related to actual stores.


It seems sort of questionable to use the list of things to not scrape as a starting point for scraping.... I mean, I get it's not actually enforced.


Not really sure why all the answers here are flagged, but you may be mistaken.

The robots.txt does not exclusively list what not to scrape.

It states which parts are allowed and which are not (disallowed).

It also points crawlers at sitemaps as a starting point with more information (e.g. which pages exist and how often they are updated).
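For illustration, a minimal robots.txt combining all three kinds of directives (hypothetical domain, not Shopify's actual file):

```
User-agent: *
Disallow: /admin/
Allow: /admin/help/
Sitemap: https://www.example.com/sitemap.xml
```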


Since ~2009 many crawlers recognize "Sitemap:" directives in robots.txt to link to sitemaps: https://en.wikipedia.org/wiki/Robots.txt#Sitemap
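Python's standard library exposes those Sitemap directives directly: urllib.robotparser's RobotFileParser grew a site_maps() method in Python 3.8. A small sketch, using a made-up robots.txt rather than Shopify's real one:

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
# parse() accepts the file's lines directly, so no network fetch is
# needed for this demonstration.
rp.parse("""\
User-agent: *
Disallow: /admin
Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/sitemap_products.xml
""".splitlines())

# site_maps() returns the listed Sitemap URLs (Python 3.8+).
print(rp.site_maps())
# can_fetch() applies the Allow/Disallow rules for a given user agent.
print(rp.can_fetch("*", "https://www.example.com/admin"))  # False
```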



