We take the typical blog URL design (/2024/08/14/slug) for granted, but back in the very early 2000s pretty much every blog tool had its own URL design. Matthew Thomas took an inventory back then:
He calls file extensions cruft, but I've come to value them. They are a simple way to indicate file type - desired or offered - which is easily understood by machines and people.
I currently work with an API which does a bit of content negotiation using the Accept header, so clients can request data in various formats - application/json for a snapshot, text/event-stream for an updating feed, or text/html for an interactive dashboard. I wish it didn't. I wish we'd just used file extensions. Trivial to use in a browser or via curl, trivial to implement on either side.
That's fine (and already common) for images, JSON, etc.
But nobody wants webpage URLs that randomly end in .php, .htm, .html, .aspx, and so forth. That's just noise that is both gibberish and entirely irrelevant to the user.
For _APIs_ I prefer to use both - the only downside is that resource names need to be restricted to _not_ include trailing `.{EXT}`s (either at all or limiting EXT to things that aren't valid content types).
E.g. `/books` looks at the `Accept` header, `/books.json` sets the `Accept` header to `application/json`, `/books.xml` sets it to `application/xml`, and so on.
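A minimal sketch of that "both" approach, in case it helps. The function name `resolve_media_type` and the `EXTENSION_TYPES` table are made up for illustration, and it doesn't parse q-values or multiple types in `Accept`; it only shows a known extension overriding whatever the header says.

```python
import posixpath

# Hypothetical extension-to-media-type table; only .json and .xml are
# mentioned above, anything else falls through to the Accept header.
EXTENSION_TYPES = {
    ".json": "application/json",
    ".xml": "application/xml",
}


def resolve_media_type(path, accept_header="*/*"):
    """Return (resource_path, media_type) for a request.

    A recognised trailing extension wins over the Accept header and is
    stripped from the resource path; otherwise the path is left alone
    and the Accept header value is passed through unchanged.
    """
    stem, ext = posixpath.splitext(path)
    if ext in EXTENSION_TYPES:
        return stem, EXTENSION_TYPES[ext]
    return path, accept_header
```

This is also where the naming restriction comes from: a resource literally called `books.json` can't be told apart from `books` requested as JSON, so either such names are banned or the table has to stay small.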
I guess this reflects a view of blogging that maybe is more what people today would use Twitter or Mastodon for, with lots of blog posts sharing the same title, like "open thread" or "links for sunday". Today people mostly use blogs to publish essays, and then a slug based on the title should be sufficient, since you're not going to publish two essays with the same title. That's what Substack uses.
I think the date is still extremely valuable. Knowing whether something is from last month or a decade ago makes a huge difference. It's also useful so that URLs can be sorted by date.
Also, "you're not going to publish two essays with the same title" feels false. If you write 1,000 pieces and use short titles and tend to write about the same subjects, it feels extremely likely that you'll wind up repeating titles.
And it's sad how often one needs to use the URL to find the date, since many authors just don't put it on the page (corporate sites are particularly scared of dating their stuff).
Others seem to think just day and month is fine, as if the year isn't the most significant part. And if both numbers are <=12 then you have to go and find out what locale the author formats their dates in...
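To make the ambiguity concrete, here's a rough sketch. `possible_dates` is a hypothetical helper, and note that the year has to be supplied by the caller, which is exactly the piece of information such pages leave out.

```python
from datetime import date


def possible_dates(first, second, year):
    """Return every calendar date that 'first/second' could mean.

    Tries both month/day and day/month readings and keeps the ones
    that are real dates in the given (assumed) year.
    """
    candidates = set()
    for month, day in ((first, second), (second, first)):
        try:
            candidates.add(date(year, month, day))
        except ValueError:
            # Not a valid month/day combination under this reading.
            pass
    return sorted(candidates)
```

Something like "14/08" only has one valid reading, but "03/04" comes back with two candidates, and nothing on the page tells you which one the author meant.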
I agree, but I think it's important to note that the date in the URL can also be misleading. For example, it's often assigned at time of creation. If that page or post gets updated years later, even if almost entirely rewritten, it still has the original date in the URL.
> even if almost entirely rewritten, it still has the original date in the URL
If we're talking about blogs/news, they don't ever get almost entirely rewritten. The original publication is the only date that matters, and it matters a lot.
If we're talking about evergreen content like documentation, then of course you don't put dates in the URL. A small "last updated" on the page itself is appropriate there.
> If we're talking about blogs/news, they don't ever get almost entirely rewritten.
Unfortunately, this isn't the case. It should be the case IMHO, but it (currently) isn't. The SEO/marketing people nowadays (ab)use popular pages for the search rankings and update them regularly to keep the content fresh and highly ranked (since search engines give much preference to new content).
Also, even for strict blogs/news, it's not unusual for a particular post to be a draft for many months before publishing. Most serious blogs will fix the date to match the publish date, but that isn't what happens by default, especially in Wordpress (which is the most important platform for blogs).
You're onto something: very early blogs around the millennium were often built around very short paragraphs, not big articles. Take a look at these; if one squints, they look far more like later Twitter streams:
Then in the early 2000s blogging moved to a style with longer articles instead of paragraphs. In the mid-2000s a "retro" style began, with far shorter and more differentiated entries, the so-called tumblelog:
How do search engines figure out the date of webpages that don't contain it in the metadata?
Poorly.
I have a blog so old I titled it an "online diary." It pre-dates search engines, so they tend to date the diary entries (blog posts) based on first crawl. Which means a lot of the dates presented by the search engines are off by several years.
Well, arguably both Movable Type's and Radio Userland's URLs were already pretty cruft-free. The success of Wordpress was mostly due to other factors (free, PHP, great feeds, great markup in default templates, great support for plugins).
https://web.archive.org/web/20030810201315/http://mpt.phrase...
He was searching for his ultimate blogging system, one that would use this "cruft-free" URL structure:
https://web.archive.org/web/20051107103030/http://mpt.phrase...
I could have sworn there was a changeset in which Matt Mullenweg was implementing those cruft-free URLs in his new fork called Wordpress, but trying to google for something with "Wordpress" from the early 2000s is basically impossible in 2024.
Update: I found this: https://ma.tt/2004/08/mike-on-uris/