
You know, I was actually really curious about this, so I went back to the WHATWG HTML and URL standards, and surprisingly they don't define any format for the query string other than it being percent-encoded. One might conflate query strings with "form-urlencoded"[0] query strings, which is one potential interoperability format, but in general a query string is just any percent-encoded string following a "?" in a URL[1], and just another property of the "URL" object that can be used in generating a response. There is also a URLSearchParams object, which is the result of parsing the query string with the form-urlencoded parser, but that is simply an interoperability layer for JavaScript.

I'm going to be honest: I was pretty geared up to have a contrarian opinion until I looked at the standards, but they're actually pretty clear. A 404 could be a proper response to an unexpected query string; the query string is as much part of the URL API as the path is, and I think pretty much everyone can acknowledge that just tacking random stuff onto the path would be ill-advised and undefined behavior.

[0]: https://url.spec.whatwg.org/#application/x-www-form-urlencod...

[1]: https://url.spec.whatwg.org/#url-class



Back in the day it was reasonably common for CMSs and forums to have only an index.php and to route entirely by query string (in form-urlencoded form, people were not savages). So you would have index.php?p=home and index.php?p=shop, or index.php?action=showthread&forum=42&thread=17976. It should be immediately obvious that in that scheme 404 is indeed the correct answer to unknown query parameters.

In fact, lots of sites still work like that; they just hide it behind a couple of rewrite rules in Apache/nginx for SEO reasons.
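A minimal Go sketch of that index.php-style scheme. The `pages` table and its entries are hypothetical stand-ins; the point is that the entire route lives in the `p` query parameter, so an unknown value is as much a 404 as an unknown path would be.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// pages is a hypothetical routing table keyed by the "p" query parameter,
// standing in for index.php?p=home, index.php?p=shop, and so on.
var pages = map[string]string{
	"home": "Welcome!",
	"shop": "Buy things.",
}

// indexHandler routes purely on the query string, like an old index.php.
func indexHandler(w http.ResponseWriter, r *http.Request) {
	p := r.URL.Query().Get("p")
	if p == "" {
		p = "home" // default page when no route is given
	}
	body, ok := pages[p]
	if !ok {
		// Unknown route in the query string: this "page" does not exist.
		http.NotFound(w, r)
		return
	}
	fmt.Fprintln(w, body)
}

// statusFor runs the handler against a request target and returns the status.
func statusFor(target string) int {
	rec := httptest.NewRecorder()
	indexHandler(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor("/index.php?p=shop"))    // 200
	fmt.Println(statusFor("/index.php?p=missing")) // 404
}
```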


At the risk of naming an Eldritch horror, IIRC it was ColdFusion that first adopted something like an MVC-in-querystring routing system in the late 90s or early 00s, and that eventually spread when FastCGI caught on and users of other languages got used to long-running middleware processes. It seemed hella elegant at the time.


Tangent: Much like PHP, "modern" CF isn't actually that bad to work with these days. In particular the superset-of-html syntax has been superseded for pure logic by "CFScript" which is just an ECMAScript dialect.

There's even a package manager, test harness, etc. And of course it's JVM-hosted, so it's fairly easy to use Java stuff (stdlib or otherwise) if what you need doesn't exist in CF.


It's funny to read this like it's archaic knowledge, this is my base mental map of how nicer looking URLs work :)


Only when you're using something more or less file-system-mapped, like Apache.

When the "server" is part of the application, you have a richer routing layer that you can do whatever you want with.


If you're routing like it's 1999, sure, 404.

On the other hand, if it's a CRUD app and you're filtering a list of entities by various field values? Returning that no items matched your selection (or an empty list, if it's an API) makes more sense than a 404, which would be more appropriate for an attempt to pull up a nonexistent entity URI.


There is no reason you can't return that "no items matched your selection" with a 404 HTTP response code instead of a 200.


You can return whatever HTTP response code you want, but if you care about knowing whether your site is working, being able to look at the logs and tell "that user requested a page that doesn't exist" apart from "that user requested a page that exists but had no results" is quite useful. In coding terms, it's the difference between a null and an empty array.


You can do that with filtering, which should be a feature of every logging tool.

Anyway, I agree that when you filter via queries, an empty list is a more valid response than a 404. That HTTP status should be returned, IMHO, when the requested item (for example, by ID) is not found (and of course for wrong paths, etc.).


In this case I don't think the status should depend on the number of results. "Here are your results": [] is a valid response body when there are no results. Returning 404 when there are no results (GET /books?title=a, for instance) is misleading; the caller may think that /books is a nonexistent route and conclude that books are reachable via another URI. To me, the query string has no influence on the response status.

/books/1 could return 200 or 404 depending on the existence of book #1; here it makes sense, because if /books/1 does not exist the API must say so explicitly. However, 404 belongs to the 4XX family, which means "client error". Is it an error to ask for a nonexistent book? If you walk into a bookshop and ask for a book they don't have, you did not "make a mistake". It's not as if you asked for a chainsaw. But in an API, especially one with hypermedia, you are not supposed to request a resource that does not exist (unless the API provided a link to a resource that was deleted before the caller tried to reach it).
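A sketch of that split in Go's `net/http`, assuming a hypothetical in-memory `books` slice with made-up titles: the filtered collection always answers 200 with a (possibly empty) JSON array, while a lookup of one specific book answers 404 when it is absent.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
)

// Book is a hypothetical entity used for illustration.
type Book struct {
	ID    int    `json:"id"`
	Title string `json:"title"`
}

var books = []Book{{1, "Dune"}, {2, "Emma"}}

// listBooks filters the collection; no match is still a successful 200 with [].
func listBooks(w http.ResponseWriter, r *http.Request) {
	title := r.URL.Query().Get("title")
	matches := []Book{} // non-nil, so it encodes as [] rather than null
	for _, b := range books {
		if title == "" || b.Title == title {
			matches = append(matches, b)
		}
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(matches)
}

// getBook looks up one entity by ID; a missing entity is a 404.
func getBook(w http.ResponseWriter, r *http.Request) {
	id, err := strconv.Atoi(strings.TrimPrefix(r.URL.Path, "/books/"))
	if err != nil {
		http.Error(w, "invalid id", http.StatusBadRequest)
		return
	}
	for _, b := range books {
		if b.ID == id {
			w.Header().Set("Content-Type", "application/json")
			json.NewEncoder(w).Encode(b)
			return
		}
	}
	http.NotFound(w, r)
}

func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/books", listBooks) // the collection
	mux.HandleFunc("/books/", getBook)  // individual items
	return mux
}

// do sends a GET through the mux and returns the status and trimmed body.
func do(target string) (int, string) {
	rec := httptest.NewRecorder()
	newMux().ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	fmt.Println(do("/books?title=a")) // 200 []
	fmt.Println(do("/books/1"))       // 200 {"id":1,"title":"Dune"}
	fmt.Println(do("/books/99"))      // 404 404 page not found
}
```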


If you enter a bookshop and you ask for a book that does not exist then it's definitely your mistake.

If you ask for a book they don't have it's a different matter.

In any case, when you ask for a book in a library you are using their "search" endpoint. The equivalent of opening a books/1 URL would be asking for a specific copy of a book by serial number or similar. Then it's clear that you made a mistake if you do that with a nonexistent serial number...


A response code of 204 seems more appropriate but the problem is you're not allowed to send further information, which would make that descriptive response... not descriptive enough.


Code 204 is just code 200 with the "yes the body really is zero bytes this is not an error it's supposed to be like this" bit set.


I think of it like this:

/users/ returning a 404 in an API means that this resource does not exist. As in, this is not part of the API.

/users/123 returning a 404 means this user record does not exist.

Yes this means that a 404 is context dependent but in a way that makes it easier for a human to think of and reason about.


Yes, and this is obvious if /users/ exists and returns a 400 when the ID is required. That way you can tell the difference between /users/ being there and expecting an ID, and it not being there at all.
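A sketch of that convention with a hypothetical `users` map: the mux answers 404 for paths that are not part of the API, `/users/` without an ID is a 400, and `/users/{id}` for a missing record is a 404. Note that the two 404s look identical on the wire, which is exactly the ambiguity being discussed.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// users is a hypothetical store keyed by user ID.
var users = map[string]string{"123": "alice"}

func userHandler(w http.ResponseWriter, r *http.Request) {
	id := strings.TrimPrefix(r.URL.Path, "/users/")
	if id == "" {
		// The route exists but the required ID is missing: a bad request.
		http.Error(w, "user id required", http.StatusBadRequest)
		return
	}
	name, ok := users[id]
	if !ok {
		// The route exists but this particular record does not.
		http.NotFound(w, r)
		return
	}
	fmt.Fprintln(w, name)
}

// status routes a GET through a mux that only knows /users/.
func status(target string) int {
	mux := http.NewServeMux()
	mux.HandleFunc("/users/", userHandler)
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code
}

func main() {
	fmt.Println(status("/users/"))    // 400: route exists, ID missing
	fmt.Println(status("/users/123")) // 200
	fmt.Println(status("/users/999")) // 404: no such user
	fmt.Println(status("/accounts/")) // 404: not part of the API at all
}
```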


Of course it is technically possible, but doing so would violate the spec.

> The 404 (Not Found) status code indicates that the origin server did not find a current representation for the target resource or is not willing to disclose that one exists.

In the above case, the server _is_ returning a representation.

https://datatracker.ietf.org/doc/html/rfc9110#name-404-not-f...


Another reason not to return a 404 in that case is that chances are there will be monitoring tooling in place that treats a 404 as an "error" that shows up in your alerting, which would not be ideal; it would just be noise.


The point was that returning a 404 for unexpected query strings doesn’t just happen to be okay per the specs; there is significant historical precedent for doing so, based on application designs that were common in the past.


    204 No Content
for nothing found is both not an error (because 2xx code) but also indicates there was nothing found to match the request.

If it's an API, a 200 with an empty JSON object or array in the body is legitimate as well, but a 204 is explicit.
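A sketch of the 204 variant in Go, where `search` is a hypothetical stand-in for the real lookup. When nothing matches, the handler sends only a status line and headers. Go's server enforces the no-body rule: on a real connection, `Write` after a 204 returns `http.ErrBodyNotAllowed`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// search is a hypothetical lookup; it only ever matches "go".
func search(q string) []string {
	if q == "go" {
		return []string{"The Go Programming Language"}
	}
	return nil
}

func searchHandler(w http.ResponseWriter, r *http.Request) {
	results := search(r.URL.Query().Get("q"))
	if len(results) == 0 {
		// 2xx: the request succeeded, there is just nothing to return.
		w.WriteHeader(http.StatusNoContent)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(results)
}

// run calls the handler directly and returns the status and trimmed body.
func run(target string) (int, string) {
	rec := httptest.NewRecorder()
	searchHandler(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	fmt.Println(run("/search?q=go"))    // 200 with a JSON array
	fmt.Println(run("/search?q=cobol")) // 204 with an empty body
}
```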


My rule of thumb is that if you want to keep your code clean, always returning an empty collection is preferable to returning an empty response on that branch. You don't need a guard clause to null/undef-check before consuming the result. The rule applies whether we're consuming the response from a repository or an http request.
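In Go this rule of thumb has a concrete edge: `encoding/json` encodes a nil slice as `null` but an empty, non-nil slice as `[]`, so a handler that forgets to initialize its result slice hands clients exactly the null check the comment warns about. A minimal demonstration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encode marshals v and returns the JSON text, panicking on error for brevity.
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	var nilSlice []string    // declared but never allocated
	emptySlice := []string{} // allocated, zero length
	fmt.Println(encode(nilSlice))   // null
	fmt.Println(encode(emptySlice)) // []
}
```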


This too is not spec compliant. 204 means the request was successful but no body is being returned in the response.


Which is the equivalent of nothing found matching the request in a collection.

The alternate is basically 200 OK

followed by a JSON body of:

[]


Yea, empty response at a valid path. Isn’t 204 the code for it?

Lots of REST libraries that I’ve used treat any 4xx response as an error, so generating a 404 for an empty list would just create more headaches.


Libraries that automatically throw errors for status codes in the 400 and 500 ranges are pretty obnoxious (looking at you, axios). It adds unnecessary overhead, complexity, and bad ergonomics by hijacking control flow from the app.

Responses with status codes in the 400 range are client errors, so the client shouldn't retry the same request. So a 404 is appropriate despite how annoying a library might be at handling it. Depending on which language/ecosystem you are using, there are likely more sane alternatives.


Completely agree on the axios part. One implication of that is that you can't statically type the error response shapes (since exceptions can't be typed), whereas with fetch you can have a discriminated union based on the status code (e.g. https://github.com/mnahkies/openapi-code-generator/blob/main...)
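Go's `http.Client` sits on the `fetch` side of this: `Get` and `Do` return an error only for transport or policy failures, so a 404 arrives as an ordinary response and the caller branches on `StatusCode`. A sketch against a throwaway `httptest` server; the `/items/...` routes and the `outcome` type are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// outcome is a hypothetical result type: the status is data, not an exception.
type outcome struct {
	Found bool
	Body  string
}

// fetchItem maps status codes to outcomes; only transport failures are errors.
func fetchItem(client *http.Client, url string) (outcome, error) {
	resp, err := client.Get(url)
	if err != nil {
		return outcome{}, err // network or protocol failure
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return outcome{}, err
	}
	switch resp.StatusCode {
	case http.StatusOK:
		return outcome{Found: true, Body: string(body)}, nil
	case http.StatusNotFound:
		return outcome{Found: false}, nil // expected, not exceptional
	default:
		return outcome{}, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
}

// newServer starts a throwaway server: /items/1 exists, everything else is 404.
func newServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/items/1" {
			fmt.Fprint(w, "item one")
			return
		}
		http.NotFound(w, r)
	}))
}

func main() {
	srv := newServer()
	defer srv.Close()
	hit, err := fetchItem(srv.Client(), srv.URL+"/items/1")
	fmt.Println(hit.Found, hit.Body, err) // true item one <nil>
	miss, err := fetchItem(srv.Client(), srv.URL+"/items/2")
	fmt.Println(miss.Found, err) // false <nil>
}
```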

Although I do feel like I've seen too many instances of a 404 being used for an empty collection where it would make more sense to return `[]` and treat it as an expected (successful) state.


Generally true although 429 is often used for rate limiting so a back off and retry is appropriate. 409, 412, 428 may also be retriable depending on the specific semantics of the given situation. 421 apparently shows up commonly in HTTP/2 connection reuse and is retriable. 423 and 425 too potentially.

It would have been nice if there were an actual grouping of retriable and non-retriable codes, but in reality it’s a complete mess.

But at a minimum beware of 429. That’s not a permanent outage and is a frequent one you might get that needs a careful retry.


204 might be acceptable if you aren’t returning an entity body to describe what is missing, but do wish to indicate the request was successful.


I think the author is comfortable creating headaches for people tacking query strings onto URLs


> It should be immediately obvious that in that scheme 404 is indeed the correct answer to unknown query parameters

That's not obvious at all. If I receive json data that contains a property I'm not aware of, i don't reject the entire document for that reason. In the case of query strings, extra query parameters might be used by other parts of the stack besides yours, so rejecting the entire document because someone somewhere else is trying to pass information to itself is the wrong approach.
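Go's `encoding/json` happens to default to exactly the tolerant behavior described: unknown object keys are silently ignored unless the decoder is told otherwise with `DisallowUnknownFields`. A small sketch with a hypothetical `Order` type:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Order is a hypothetical payload type that only knows about "id".
type Order struct {
	ID int `json:"id"`
}

// decode parses doc into an Order, optionally rejecting unknown keys.
func decode(doc string, strict bool) (Order, error) {
	dec := json.NewDecoder(strings.NewReader(doc))
	if strict {
		dec.DisallowUnknownFields()
	}
	var o Order
	err := dec.Decode(&o)
	return o, err
}

func main() {
	doc := `{"id": 7, "trace": "added by some other layer"}`
	o, err := decode(doc, false)
	fmt.Println(o.ID, err) // 7 <nil>
	_, err = decode(doc, true)
	fmt.Println(err) // non-nil error naming the unknown field
}
```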


> other parts of the stack

As a web developer, you’re like the guy standing with a clipboard outside a fancy club checking whether people requesting entry are allowed or not. Basically, level 1 security.

If someone is not on the list, your job is to default to declining them access, not granting them access assuming level 2 security will handle them at a deeper layer.

It’s possible that the teams you work with expect fuzzy behaviour from the website but that’s a choice, not a practice.


The first layer of any web security should never be checking someone against a list, unless this can be done in less than a few milliseconds. It should only be sanity checking for basic compliance. In the analogy, this first layer should be denying entry to obviously drunk people, zebras, and a stampede of protesters.


>It’s possible that the teams you work with expect fuzzy behaviour from the website but that’s a choice, not a practice.

This is how the vast majority of websites work. The practical reason is obvious: when we model the behaviour our code depends on, we want to create the simplest possible model that allows our code to work as expected. Placing requirements on it that our code doesn't actually depend on is useless, unneeded, complexity.

> As a web developer, you’re the like the guy standing with a clipboard outside a fancy club checking if people requesting entry are allowed or not. Basically, level 1 security.

there is no security benefit to filtering out unneeded url parameters.


> there is no security benefit to filtering out unneeded url parameters.

there is - security in depth.

If a url parameter would've been a vulnerability because something lower down the stack misinterprets it (and the param wasn't necessary for your app in the first place), then you've just left a window open for the exploit.

If the set of url params are known ahead of time (which i claim should be true), then you could make adding unknown params an error.
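A sketch of that allowlist approach, assuming a hypothetical endpoint whose only documented parameters are `q` and `page`: anything else is rejected with a 400 before the request reaches lower layers. Whether that is worth breaking tracking parameters and cache busters is the disagreement in this subthread; the code only shows the mechanism.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
	"strings"
)

// allowed is the hypothetical set of documented query parameters.
var allowed = map[string]bool{"q": true, "page": true}

// strictQuery rejects requests carrying any undocumented query parameter.
func strictQuery(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var unknown []string
		for key := range r.URL.Query() {
			if !allowed[key] {
				unknown = append(unknown, key)
			}
		}
		if len(unknown) > 0 {
			sort.Strings(unknown) // deterministic error message
			http.Error(w, "unknown query parameters: "+strings.Join(unknown, ", "),
				http.StatusBadRequest)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// check wraps a trivial handler with strictQuery and returns the status.
func check(target string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "results")
	})
	rec := httptest.NewRecorder()
	strictQuery(ok).ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code
}

func main() {
	fmt.Println(check("/search?q=go&page=2"))       // 200
	fmt.Println(check("/search?q=go&utm_source=x")) // 400
}
```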


>If a url parameter would've been a vulnerability because something lower down the stack misinterprets it

By assumption, you are using this url parameter. So you have a bug where you've forgotten to allow this parameter, which will quickly be discovered in your logs and fixed. Then the vulnerability, which you are thus far unaware of, will quickly be exposed. Those url parameters you are not using cannot hurt you.


> there is no security benefit to filtering out unneeded url parameters.

What about passing extra data to fill the server memory with either extra known junk or a script / executable to use with a zero day in an internal component or something.

To misuse the nightclub analogy: it’s like checking for bags not being larger than A4 and disallow knives and other weapons.


No, 400 is correct for a bad request, as unknown query parameters are clearly a client error.


All 4xx errors are client errors.

400 is the general “bad request” client error, indicating something is wrong with the request but not being specific about what.

404 is simply a more specific client error: it means the client asked for a resource that couldn’t be found.


That's because Apache is basically what today's JS crowd would call a "file-based router", and then the app implements the actual routing in that index.php file. Just like early SPA stored the route in a hash. It's funny how history repeats itself.

I've gone back and forth on file-based vs programmatic routing. But each has pros and cons, so in the end I implemented both in Mastro: https://mastrojs.github.io/docs/routing/


I believe Wikipedia, and all other MediaWiki sites, still do that.


watch?v=oHg5SJYRHA0


item?id=48076173


Ooo.. burn.


Oh no, looks like my old forum software urls.


> in form-urlencoded form, people were not savages

Oh yeah? I remember a lot of semicolons from Perl and other CGI stuff where we would now use ampersands, back in the day, both in the path and in the query. (Sometimes the ? itself would be written ;.)


Correct. In fact, the semicolon is part of the URI scheme standard, and the ampersand is just some ad-hoc thing that got adopted naturally without any standardization effort.


Yeah, URLs really don’t have much in the way of semantics. Path is clearly intended for hierarchical data and query for non-hierarchical data, and there are strong customs, some commonly supported or even enforced by libraries, but no actual rules. Ultimately, it’s just a string that the server can decide what to do with.

The really funny thing about this is that, when I was worrying about possible side effects if I responded 404, I somehow completely forgot how much of the web’s history the path has been useless for. Paths have won. No one really starts new things with URLs like /item?id=… any more. Yay!


Wikipedia's web server treats anything after /wiki/ literally as the name of the article.

So en.wikipedia.org/wiki/// is the article about C++ style comments


Oh, magnificent. Lovely high-profile example to add about empty path segments being meaningful.


it looks like it goes to a disambiguation page for what "//" could refer to now (C++ style comments being the top entry), but that's delightful!


i wonder if it ought to be `/wiki/%2F%2F` instead...


Standards are just commonly accepted behaviour that somebody chose to write down somewhere. There are a great number of commonly accepted behaviours that nobody's ever bothered to encode into a formal standard, but where failure to follow the accepted practice will result in widespread breakage. There are also a great many "standards" that you would be a fool to follow to the letter. In the OP's case, the only thing that will break is people trying to visit their site, who will presumably just press the back button on their browser and go about their day. They can decide for themselves whether that is an acceptable casualty. But it isn't definitionally acceptable just because no standard says it isn't (nor would it suddenly become unacceptable because a standard said it was...)


Interestingly, quite a few places that should treat query strings transparently make a lot of assumptions about their structure. We ran into that when picking a new CDN, some providers didn't handle repeat parameters (?a=1&a=2) correctly.


For anyone curious like I was: form-urlencoded and the URLSearchParams API say that params should not be deduplicated or reordered. get() returns the first value with the given name, and getAll() returns a list of all values.

https://url.spec.whatwg.org/#dom-urlsearchparams-get


What do you mean by correctly?


Incorrectly would be processing the query string and deduping keys. Correctly would be passing it through as-is, or at least only lightly processing it, like normalizing escaping or such.


Indeed I would expect pass through with no changes.

Though there are “smart” CDNs that will resize images etc.; all bets are off for those.


> I was pretty geared up to have a contrarian opinion until I looked at the standards but they're actually pretty clear, a 404 could be a proper response to unexpected query string; query string is as much part of the URL API as the path is and I think pretty much everyone can acknowledge that just tacking random stuff onto the path would be ill advised and undefined behavior.

This feels like a "technically correct is the best kind of correct" situation. Technically, yes, web servers may respond 404 if they don't understand a query parameter, but in practice that is not how URLs are normally conceptualized.


Wouldn't a generic 400 be better? It's not that the page wasn't found, but that you've sent something that was not an accepted request. "Fix your request and try again" is how I've read it, and that's how I use it in the APIs I provide. I prefer it over 406, since it's not my end that can't process it. If your query string is tacking on extra stuff to try to break things, or just because your request wasn't crafted per the docs, then it's on you.


406 would be wrong in my view, as it is meant for when the client sends an Accept header that the server cannot fulfil. HTTP status codes get quite specific when you read the actual description and not just the name.


The No-Vary-Search (proposal?)

https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/...

effectively lets you specify what parts of a query are relevant. So for example

url?a=b&c=d matches url?c=d&a=b in terms of caching
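No-Vary-Search itself is a response header interpreted by the browser's cache, but the idea underneath is a normalized cache key. A sketch of that idea only (not of the header's actual semantics), using a hypothetical `cacheKey` helper: Go's `url.Values.Encode` sorts by key while keeping each key's values in order, so reordered parameters map to the same key, and parameters declared irrelevant can be dropped first.

```go
package main

import (
	"fmt"
	"net/url"
)

// cacheKey builds an order-insensitive cache key for a URL, dropping any
// query parameters listed in ignore (hypothetical, e.g. tracking params).
func cacheKey(raw string, ignore ...string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	q := u.Query()
	for _, name := range ignore {
		q.Del(name)
	}
	u.RawQuery = q.Encode() // Encode sorts by key
	return u.String(), nil
}

func main() {
	k1, _ := cacheKey("https://example.com/url?a=b&c=d")
	k2, _ := cacheKey("https://example.com/url?c=d&a=b&utm_source=x", "utm_source")
	fmt.Println(k1 == k2, k1) // true https://example.com/url?a=b&c=d
}
```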


Something I discovered looking back at some old sites: "pages" defined by URL params don't always make it into the Wayback Machine.


Wait until you realize that the difference between path and query string is entirely arbitrary and decided by the server. Query strings should never have existed. They are an implementation detail of CGI webservers that leaked all over everything and now smells really bad.


I dunno, it seems like the fact that we arrived at a fairly standard structure for URL paths that works pretty well is not a bad outcome.

Seems a lot better than the other potential world we could have lived in, where paths were a black box and every web server/framework invented its own structure for them.


My next website is going to have the path portion of the URL be a base64 encoded ASN.1 blob.


So long as it starts with a slash, go ahead! See how long it takes for someone to figure it out.

It’s your website. Have fun with it! Do dumb things! :-)


Make sure you use URL-safe base64, or the portions that look like a path can get mangled:

MII//epi

Is converted to MII/epi


That would be broken software.

https://en.wikipedia.org/wiki///


In my current project I use URIs to refer to absolutely any entity in a git(-ish) repo. Files, branches, revisions, diffs, anything. URI turns out to be a really good addressing scheme for everything. Surprise. But the most used and abused element is always the path. Query takes a lot of that mess away. Might have been unmanageable otherwise.

https://github.com/gritzko/beagle


In fact, GitHub URIs are a good example of overusing paths: https://github.com/gritzko/beagle/blob/a7e17290a39250092055f...

  - user gritzko,
  - project beagle, 
  - view blob, 
  - commit a7e17290a39250092055fcda5ae7015868dabdb4, 
  - file path VERBS.md
... all concatenated indiscriminately.


That’s not an indiscriminate hierarchy.

Grouping data by user is common and normal in computing: /home laid precedent decades ago.

Project directories are an extremely common grouping within a user’s work sets. Yeah, some of us just dump random files in $HOME, but this is still a sensible tier two path component.

The choice to make ‘view metadata-wrapped content in browser HTML output’ the default rather than ‘view raw file contents’ the default is legitimate for their usage. One could argue that using custom http headers would be preferable to a path element (to the exclusion of JavaScript being able to access them, iirc?) or that the path element blob should be moved into the domain component or should prefix rather than suffix the operands; all valid choices, but none implicitly better or worse here.

Object hash is obviously mandatory for git permalinks, and is perhaps the only mandatory component here. (But notably, that’s not the same as a commit hash.) However, such paths could arguably be interpreted as maximally user-hostile.

File path, interestingly enough, is completely disposable if one refers to a specific result object hash within a commit, but if the prior object hash was required to be a commit, then this is a valid unique identifier for the filesystem-tree contents of that commit. You could use the object hash instead of the full path within the commit hash, but that’s a pretty user-hostile way to go about this.

So, then, which part of the ordering and path selections do you consider indiscriminate, and why?


actually, instead of the object hash, you could also use the commit-hash. then the filename would be mandatory, but the url would be more readable and usable: give me the file VERBS.md as it is at commit <hash>


That's actually what it is here, a7e17290a39250092055fcda5ae7015868dabdb4 is a commit's oid: https://github.com/gritzko/beagle/commit/a7e17290a3925009205...


yes, you are right. and it makes a lot more sense that way. see my other comment on the difference between commit blob and raw.


But the path misses param names (or types?). E.g who said the hex-encoded part is a commit hash? Maybe it's a tree hash, or just weird ref.

Query strings are more verbose, as they force you to give each param a name.


Which target audience of github needs extra verbosity in the commit hash, though? Once you know it you know it; if you don’t know git you aren’t the target audience; etc. Saying /user=foo is no better than ?user=foo if your audience can work it out without confusion from your unadorned paths. We have a great deal of history with filesystems showing that people are capable of keeping up with paths that lack key names if exposed to and familiar with them, and if the filesystem isn’t being constantly randomized.


> Saying /user=foo is no better than ?user=foo

I mean /foo vs ?user=foo

I know git well enough: there's more than one type of hash -- object hashes, tree hashes.


Back in the day there was an attempt to introduce "matrix URIs" as a more structured alternative to query strings: https://www.w3.org/DesignIssues/MatrixURIs.html

Of course there's nothing to stop you using URIs like this (I think Angular does, or did at one point?) but I don't think the rules for relative matrix URIs were ever figured out and standardised, so browsers don't do anything useful with them.


what would be a better way of doing that? i am not disagreeing, but i just can't think of any way to improve on this. put everything into the query part? i prefer to use the query only for optional arguments. in this example the blob argument is the only thing that doesn't fit in my opinion.


Every object in git (commit, tree, revision of a single file) has a hash that is guaranteed unique within a repository (otherwise many more things than a web UI would break) and likely also globally. I can understand wanting to isolate repositories to prevent hash collisions from causing problems, but within a repo everything has a universally unique ID.

edit: for instance, that specific VERBS.md is represented by the blob 3b9a46854589abb305ea33360f6f6d8634649108.


that's not what i meant. i was trying to suggest that the string "blob" does not fit. why is it there? why is it needed?

    https://github.com/gritzko/beagle/a7e17290a39250092055fcda5ae7015868dabdb4/VERBS.md
this should be sufficient to represent the file.

"blob" is like a descriptor of the value that follows. it would be like doing this:

    https://github.com/user/gritzko/project/beagle/blob/a7e17290a39250092055fcda5ae7015868dabdb4/file/VERBS.md
this actually irks me every time i see it in a github url


> this should be sufficient to represent the file.

Except it's not, because the oid can be a short hash (https://github.com/gritzko/beagle/blob/a7e172/VERBS.md) and that means you're at risk of colliding with every other top-level entry in the repository, so you're restricting the naming of those toplevel entries, for no reason.

So namespacing git object lookups is perfectly sensible, and doing so with the type you're looking for (rather than e.g. `git` to indicate traversal of the git db) probably simplifies routing, and to the extent that it is any use makes the destination clearer for people reading the link.


how does adding the word blob in the url help with that?

i don't think it makes a difference here.

in fact compare these urls:

https://github.com/gritzko/beagle/blob/a7e172/VERBS.md

https://github.com/gritzko/beagle/raw/a7e172/VERBS.md

https://github.com/gritzko/beagle/commit/a7e172/VERBS.md

turns out that "blob", "raw" and "commit" have nothing to do with the hash itself, but are functions to describe how the object in question is to be presented. so what i said above about blob being redundant is false, the problem is rather that it is in a weird place. it should be at the end, like a kind of extension because it signifies the format of the output. except i think putting it at the end makes handling relative paths more difficult as it would have to be appended to every link to other files.

the roxen webserver has an interesting solution for that. they call it prestates and it's placed at the beginning of a url: https://github.com/(commit)/gritzko/beagle/a7e172/VERBS.md . it sets the format value visually apart, and you could have multiple prestate values separated by a comma. i have used that feature extensively on my own sites. i even expanded on the concept in custom modules.


> how does adding the word blob in the url help with that? i don't think it makes a difference here.

How does adding a disambiguating segment help disambiguate?

"in fact, consider these urls":

https://github.com/gritzko/beagle/issues

https://github.com/gritzko/beagle/pulse

> are functions to describe how the object in question is to be presented

So they are functions, which take parameters, which makes prefix notation reasonably natural?

> the problem is rather that it is in a weird place. it should be at the end

That's, like, your opinion man.

> except i think putting it at the end makes handling relative paths more difficult as it would have to be appended to every link to other files.

It also doesn't make sense when file paths may not be relevant at all e.g. compare

https://github.com/gritzko/beagle/commit/a7e172

and

https://github.com/gritzko/beagle/commit/a7e172/VERBS.md

As well as where https://github.com/gritzko/beagle/blob/a7e172/ ends up

> the roxen webserver has an interesting solution for that. they call it prestates and it's placed at the beginning of a url: https://github.com/(commit)/gritzko/beagle/a7e172/VERBS.md .

> When developing and debugging is a great help to be able to turn on and off specific parts of the code that generates the current page.

That doesn't have anything to do with what github does.


They are following the /key/value/key/value pattern, but the first two pairs in a GitHub URL are fixed to user and project, which lets them omit the key names. I could see them not being willing to hardcode the third pair to blob.

Back when GitHub URLs were kind of cool, github.com/user/gritzko/project/beagle would have been much less cool than just github.com/gritzko/beagle.


> They are following the /key/value/key/value pattern

They are not. There's just a routing layer below the repository.


Not entirely arbitrary - forms that use the GET method instead of POST will append form values as query params.

For sites without Javascript, it's great for things like search boxes, tables with sorting/filtering, etc. instead of POST, since it preserves your query in the URL.

https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...


It has always amazed me how much trouble the SPA folks are willing to go to in order to slowly rebuild just normal boring URLs with querystrings because users demand deep linking and back buttons and the like.

Or you could accept that you're probably going to need a round trip to the server and use a normal URL and it's fine.

For all but the absolute biggest websites in the world, anyhow. At Facebook or Google scale yeah it's needed.


Nothing you said here is correct. Paths, query strings, and fragments are all well defined entities. https://datatracker.ietf.org/doc/html/rfc3986#section-3.3


“It’s a string between ? and #” isn’t well defined. Or it is, and it says very little.


Query strings existed before CGI did, and the way they're defined to be filled in from web forms is quite useful; I wouldn't want to need Javascript to fit that into path format. There's nothing wrong about having things decided by the server; I don't get that part of your argument at all.


Maybe dumb question: how does the server “decide” anything other than what file to serve? Today we have many choices but back in the day CGI was the first standard way to do it.

So yes query parameters existed before CGI but to use them you had to hack your server to do something with them (iirc NCSA web servers had some magic hacks for queries). CGI drove standardization.


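    import (
        "fmt"
        "net/http"
        "time"
    )

    func specialHandler(w http.ResponseWriter, r *http.Request) {
        // Refuse to serve anything on Tuesdays; otherwise answer normally.
        if time.Now().Weekday() == time.Tuesday {
            http.NotFound(w, r)
            return
        }
        fmt.Fprintln(w, "server made a decision")
    }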
Your server can make decisions however you program it to, you know? It's just software.

Forgive the phone-posting.


and what server software is running this code in 1995?


CL-HTTP or AOLserver


sure looks like VB there, what’s the plugin? Didn’t see anything like that before.


That's Go.


Which runs on what computer in 1995?


I'm not sure what point you're trying to make. Here it is in C, so you can run it on your computer in 1995. Servers could make decisions in 1995.

    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    int main(void) {
        int s = socket(AF_INET, SOCK_STREAM, 0);
        setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &(int){1}, sizeof(int));

        struct sockaddr_in addr = {
            .sin_family = AF_INET,
            .sin_port = htons(8080),
            .sin_addr.s_addr = htonl(INADDR_ANY),
        };
        bind(s, (struct sockaddr *)&addr, sizeof(addr));
        listen(s, 10);
        printf("Listening on :8080\n");

        while (1) {
            int c = accept(s, NULL, NULL);
            if (c < 0)
                continue;

            /* Read (and ignore) the request; the decision doesn't depend on it. */
            char req[1024] = {0};
            read(c, req, sizeof(req) - 1);

            time_t now = time(NULL);
            int tuesday = localtime(&now)->tm_wday == 2;

            const char *status = tuesday ? "404 Not Found" : "200 OK";
            const char *body   = tuesday ? "Not Found (it's Tuesday)" : "Hello from 1995!";

            char resp[256];
            snprintf(resp, sizeof(resp),
                "HTTP/1.1 %s\r\n"
                "Content-Length: %zu\r\n"
                "Connection: close\r\n\r\n%s",
                status, strlen(body), body);

            write(c, resp, strlen(resp));
            close(c);
        }
    }


A post claimed CGI led to bad standards around query parameter formatting and parsing. I was merely pointing out that, prior to the advent of CGI, if you wanted to actually do anything with those parameters on the server, you had to extend whatever primitive HTTP server you were running, write some custom code and invent your own “standard”. There were no server side frameworks or standards.


TCP has been around a long time. Listen, read, send, you're good to go. It's just software so you can make it do anything.

But you're asking about the relationship between popular primarily file serving servers like Apache and their relationship to high level code to create custom responses? Yeah, CGI was the first big standard there that I remember, though it was a bit before my time. But that's only one possible architecture.

These days, most web apps have the web server built in, and so the custom code you're writing works with the full request directly. There may be a lightweight web server in front (or multiple), like nginx, to manage connections, but they will largely just proxy the whole thing through.


I was responding to:

> Query strings existed before CGI did… There's nothing wrong about having things decided by the server

Sure, but there is also no standard for how to format/parse the query string. And also no server plugin frameworks. So you are inventing your own standard and extending some HTTP server for which you have source. Until CGI forces a standard, bad as it might be; it’s a common ground.


It's arbitrary to a degree like the difference between using an attribute or child element in XML, but it's not entirely arbitrary. If you want to include data in the URL that's not part of the hierarchy of the path, query strings are good for that.


How do you figure?

Paths are hierarchical; query strings are name/value.

(Note I speak of common usage.)

You can create a different convention, but that one is pretty dang useful.


WHATWG is for HTML; try the IETF HTTP RFCs.


The IETF RFCs do define a spot for the query string, but don't really say what to do with it.

https://datatracker.ietf.org/doc/html/rfc3986#section-3.4


Try 1866 and 1867

Html is relevant historically as that syntax comes from forms. It was historically a sort of API between the browser client and the server, so yeah.

But it's been pretty well defined since 1995.



