We take the typical blog url design (/2024/08/14/slug) for granted but back in t...

twic · on Aug 14, 2024

He calls file extensions cruft, but i've come to value them. They are a simple way to indicate file type - desired or offered - which is easily understood by machines and people.

I currently work with an API which does a bit of content negotiation using the Accept header, so clients can request data in various formats - application/json for a snapshot, text/event-stream for an updating feed, or text/html for an interactive dashboard. I wish it didn't. I wish we'd just used file extensions. Trivial to use in a browser or via curl, trivial to implement on either side.

crazygringo · on Aug 14, 2024

That's fine (and already common) for images, JSON, etc.

But nobody wants webpage URLs that randomly end in .php, .htm, .html, .aspx, and so forth. That's just noise that is both gibberish and entirely irrelevant to the user.

codetrotter · on Aug 14, 2024

.htm and .html are relevant just like .pdf and .zip etc etc

But I agree about .php, .aspx and other extensions that are telling something about the server side. That’s irrelevant for the user.

shawabawa3 · on Aug 14, 2024

> .htm and .html are relevant just like .pdf and .zip etc etc

it's _kind of_ relevant, if it weren't for the fact that the absence of any extension implies .html >99% of the time

arethuza · on Aug 14, 2024

Wouldn't that also include JSON for the other 47% of the time?

mikae1 · on Aug 14, 2024

Forever and always .html, because: ＡＥＳＴＨＥＴＩＣＳ

svieira · on Aug 14, 2024

For _APIs_ I prefer to use both - the only downside is that resource names need to be restricted to _not_ include trailing `.{EXT}`s (either at all or limiting EXT to things that aren't valid content types).

E.g. `/books` - looks at the `Accept` header. `/books.json` - sets the `Accept` header to `application/json`. `/books.xml` - `application/xml`, and so on.
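
A minimal sketch of that rule, assuming a small hand-written extension table: a known trailing extension overrides whatever `Accept` header the client sent, and anything else falls through to normal negotiation. The names `EXTENSION_TYPES` and `resolve_request` are hypothetical and not taken from any framework.

```python
# Hypothetical mapping from URL extension to media type (illustration only).
EXTENSION_TYPES = {
    "json": "application/json",
    "xml": "application/xml",
    "html": "text/html",
}


def resolve_request(path: str, accept: str = "*/*") -> tuple[str, str]:
    """Return (resource_path, effective_accept) for a request path.

    A known trailing extension is stripped and replaces the Accept header;
    otherwise the path and header are passed through unchanged.
    """
    last_segment = path.rsplit("/", 1)[-1]
    if "." in last_segment:
        stem, ext = path.rsplit(".", 1)
        media_type = EXTENSION_TYPES.get(ext.lower())
        if media_type is not None:
            return stem, media_type
    return path, accept


print(resolve_request("/books", "application/xml"))
print(resolve_request("/books.json", "text/html"))
```

Unknown extensions are left on the resource name here, which is one way to handle the restriction mentioned above: only extensions listed in the table are reserved.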

johannes1234321 · on Aug 14, 2024

> He calls file extensions cruft

I think that refers mostly to the .php and .asp of the time. Those don't tell a thing to the user.

notRobot · on Aug 14, 2024

I want users to know I use PHP! :D

notRobot · on Aug 14, 2024

And I judge people who use ASP, lmao.

amadeuspagel · on Aug 14, 2024

I guess this reflects a view of blogging that maybe is more what people today would use Twitter or Mastodon for, with lots of blogposts with the same title like "open thread" or "links for sunday". Today people mostly use blogs to publish essays, and then a slug based on the title should be sufficient, since you're not going to publish two essays with the same title. That's what Substack uses.

crazygringo · on Aug 14, 2024

I think the date is still extremely valuable. Knowing whether something is from last month or a decade ago makes a huge difference. It's also useful so that URLs can be sorted by date.

Also, "you're not going to publish two essays with the same title" feels false. If you write 1,000 pieces and use short titles and tend to write about the same subjects, it feels extremely likely that you'll wind up repeating titles.

didntcheck · on Aug 14, 2024

And it's sad how often one needs to use the URL to find the date, since many authors just don't put it on the page (corporate sites are particularly scared of dating their stuff)

Others seem to think just day and month is fine, as if the year isn't the most significant part. And if both numbers are <=12 then you have to go and find out what locale the author formats their dates in...
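
To make the ambiguity concrete, here is a small sketch that tries both the day-first and the month-first reading of a two-number date. The helper name `possible_dates` is hypothetical, and the year is supplied by the caller as an assumption, since it is exactly the part the page left out.

```python
from datetime import date, datetime


def possible_dates(text: str, year: int) -> set[date]:
    """Return every calendar date an 'NN/NN' string could mean in `year`."""
    candidates = set()
    for fmt in ("%d/%m/%Y", "%m/%d/%Y"):  # day-first, then month-first
        try:
            candidates.add(datetime.strptime(f"{text}/{year}", fmt).date())
        except ValueError:
            pass  # this reading is not a valid calendar date
    return candidates


print(possible_dates("14/08", 2024))  # only one reading is valid
print(possible_dates("03/04", 2024))  # two readings: 3 April or 4 March
```

When either number is above 12 only one reading survives; when both are 12 or below (and differ), the string alone cannot tell you which date was meant.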

freedomben · on Aug 14, 2024

I agree, but I think it's important to note that the date in the URL can also be misleading. For example, it's often assigned at time of creation. If that page or post gets updated years later, even if almost entirely rewritten, it still has the original date in the URL

crazygringo · on Aug 14, 2024

> even if almost entirely rewritten, it still has the original date in the URL

If we're talking about blogs/news, they don't ever get almost entirely rewritten. The original publication is the only date that matters, and it matters a lot.

If we're talking about evergreen content like documentation, then of course you don't put dates in the URL. A small "last updated" on the page itself is appropriate there.

freedomben · on Aug 14, 2024

> If we're talking about blogs/news, they don't ever get almost entirely rewritten.

Unfortunately, this isn't the case. It should be the case IMHO, but it (currently) isn't. The SEO/marketing people nowadays (ab)use popular pages for the search rankings and update them regularly to keep the content fresh and highly ranked (since search engines give much preference to new content).

Also, even for strict blogs/news, it's not unusual for a particular post to be a draft for many months before publishing. Most serious blogs will fix the date to match the publish date, but that isn't what happens by default, especially in Wordpress (which is the most important platform for blogs).

dhosek · on Aug 14, 2024

In fact, on my own blog, I have some recurring posts, e.g.,

https://www.dahosek.com/the-big-countdown/

https://www.dahosek.com/the-big-countdown-2/

⋮

https://www.dahosek.com/the-big-countdown-11/

Alas, the default URL scheme in Wordpress doesn’t include the date.
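
The `-2` ... `-11` suffixes above are what title-only slugs tend to turn into once titles repeat. A rough sketch of that kind of disambiguation follows; it is an illustration of the general technique, not Wordpress's actual implementation, and the function names `slugify` and `unique_slug` are hypothetical.

```python
import re


def slugify(title: str) -> str:
    """Lowercase the title and join runs of ASCII letters/digits with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def unique_slug(title: str, taken: set[str]) -> str:
    """Return a slug not already in `taken`, appending -2, -3, ... on collision."""
    base = slugify(title)
    slug, n = base, 2
    while slug in taken:
        slug = f"{base}-{n}"
        n += 1
    taken.add(slug)  # remember the slug so the next duplicate gets a new suffix
    return slug


taken: set[str] = set()
for _ in range(3):
    print(unique_slug("The Big Countdown", taken))
```

Prefixing the slug with a date (or just the year) would give each post its own namespace and make the counter unnecessary in most cases.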

ttepasse · on Aug 14, 2024

You're onto something: Very early blogs around the millennium were often built around very short paragraphs, not big articles. Take a look at these; if one squints, those looked far more like later Twitter streams:

https://web.archive.org/web/20020603092331/http://www.kottke...

http://scripting.com/2001/09/11.html (Every paragraph is in effect an "entry")

Then in the early 2000s blogging resulted in a style with longer articles instead of paragraphs. In the middle 2000s a "retro" style began with far shorter and differentiated entries, the so-called tumblelog:

https://kottke.org/05/10/tumblelogs

The original Tumblr may have been inspired by this, if not just the name.

And then Twitter and other social media arrived on the scene and ate everything. :/

simonw · on Aug 14, 2024

A useful midpoint is to use just the year. That way you get a fresh namespace on January 1st.

I use that for static files on my blog and it’s worked great for 20+ years:

https://static.simonwillison.net/static/2024/mlx-whisper-gpu...

https://static.simonwillison.net/static/2003/getElementsBySe...

JohnFen · on Aug 14, 2024

> then a slug based on the title should be sufficient, since you're not going to publish two essays with the same title.

Disambiguation is one thing, but as a reader, I really like having the date indicated in the URL for informational purposes. It's very helpful.

8organicbits · on Aug 14, 2024

Search engines usually have a date filter. Here's DDG searching "wordpress" between 2000 and 2005.

https://duckduckgo.com/?q=wordpress&t=h_&df=2000-01-01..2005...

I'm guessing you were looking for:

https://wordpress.org/news/2004/01/cruft-free-uris-in-wp-10/

I assume Google supports something similar, but I've stopped using it.

notRobot · on Aug 14, 2024

How do search engines figure out the date of webpages that don't contain it in the metadata?

reaperducer · on Aug 14, 2024

> How do search engines figure out the date of webpages that don't contain it in the metadata?

Poorly.

I have a blog so old I titled it an "online diary." It pre-dates search engines, so they tend to date the diary entries (blog posts) based on first crawl. Which means a lot of the dates presented by the search engines are off by several years.

hansvm · on Aug 14, 2024

The simplest version is recording the date they noticed a change in the page.
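
As a toy illustration of that idea (not a description of how any real search engine works), a crawler could hash each page body and remember the crawl date on which the hash last changed. The class name `ChangeTracker` and its `observe` method are hypothetical.

```python
import hashlib
from datetime import date


class ChangeTracker:
    """Remember, per URL, the crawl date on which the content last changed."""

    def __init__(self) -> None:
        self._seen: dict[str, tuple[str, date]] = {}  # url -> (hash, date)

    def observe(self, url: str, body: bytes, crawl_date: date) -> date:
        """Record one crawl and return the inferred date for the page."""
        digest = hashlib.sha256(body).hexdigest()
        previous = self._seen.get(url)
        if previous is None or previous[0] != digest:
            # First sighting, or the content differs from the last crawl.
            self._seen[url] = (digest, crawl_date)
        return self._seen[url][1]


tracker = ChangeTracker()
print(tracker.observe("/diary/1", b"first draft", date(2001, 5, 1)))
print(tracker.observe("/diary/1", b"first draft", date(2003, 5, 1)))
print(tracker.observe("/diary/1", b"rewritten", date(2010, 5, 1)))
```

The inferred date can never be earlier than the first crawl, which matches the complaint above about old pages being dated years too late.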

toyg · on Aug 14, 2024

Well, arguably both Movable Type and Radio Userland's URLs were already pretty cruft-free. The success of Wordpress was mostly due to other factors (free, php, great feeds, great markup in default templates, great support for plugins).