I created a 1.1G file with dictionary words and timed...
% du -h a
1.1G a
% time look 'dog' a | wc -l
53856
real 0m0.021s
user 0m0.020s
sys 0m0.003s
% time grep '^dog' a | wc -l
53856
real 0m28.593s
user 0m0.977s
sys 0m2.223s
OK, grep performed worse than I expected (though sorting the file so look could use it took a long time).
That seems abysmally slow for grep. On my machine, with a 1.1G file made of 1200 copies of /usr/share/dict/words, GNU grep 2.16 takes about 1.5 seconds. Have you tried with LC_ALL=C?
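If you want to try the locale effect on a small scale, here's a sketch (the file name and size are made up for the example; the original test used a 1.1G file):

```shell
# Build a small test file of prefixed lines (a stand-in for the big dict file).
seq 1000 | sed 's/^/dog/' > /tmp/words.txt

# Same pattern, two locales. Under a UTF-8 locale grep may have to handle
# multibyte characters; LC_ALL=C makes the match purely byte-oriented,
# which is often dramatically faster on large files.
time grep -c '^dog' /tmp/words.txt
time LC_ALL=C grep -c '^dog' /tmp/words.txt
```

Both commands should report the same count; only the elapsed time differs, and the gap grows with file size.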
Yeah, there's a lot of that, but using grep here is like using duct tape to hold something together when you really should be using screws. grep can do everything, but it's not always the best tool for the job.
The "ack" you reference is apparently faster only in that it makes it easier to specify exactly which files to search or not to search -- which can obviously also be done with grep, e.g. combined with 'find'.
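A sketch of that find + grep combination (the directory and file names are invented for the example):

```shell
# Restrict the search to C sources only, the way ack filters by file type.
mkdir -p /tmp/findgrep && cd /tmp/findgrep
printf 'int main(void) { return 0; }\n' > a.c
printf 'main street\n' > notes.txt

# find decides which files are eligible; grep -l prints just the names
# of the files that actually match.
find . -name '*.c' -exec grep -l 'main' {} +
```

Only a.c is searched, so notes.txt never shows up even though it contains "main".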
As for grep itself, it has had many incarnations over the decades with various pluses and minuses.
The current GNU grep allows 3 kinds of regex, basic/extended/perl -- the latter being what ack supports.
Note that Perl regexes have extensions that go beyond regular languages, and those extensions are inherently slower to match than the finite automata that basic regexes compile to. Power versus speed.
E.g. grep(1): "Back-references are very slow, and may require exponential time."
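A quick illustration with a POSIX BRE back-reference (tiny made-up input):

```shell
# \(dog\)\1 matches a line where "dog" is immediately repeated. A pattern
# with a back-reference can't be compiled into a plain finite automaton,
# so grep must fall back to a slower backtracking search for it.
printf 'dogdog\ndogcat\n' | grep '\(dog\)\1'
```

On this two-line input only "dogdog" is printed; on a large file the backtracking cost is what the man page is warning about.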
For further info:
> why GNU grep is fast
> Mike Haertel mike at ducky.net
> Sat Aug 21 03:00:30 UTC 2010
> Here's a blog post from 2006 about a developer trying to "beat grep" and looking at the algorithms it uses; it goes into a little more detail about the "doesn't need to do the loop exit test at every step" optimization mentioned in this email.
The best writeup is surely by the inimitable Russ Cox, who explains very clearly why, as of 2007, grep was one of the only fast regex implementations:
Regular Expression Matching Can Be Simple And Fast [#1]
(but is slow in Java, Perl, PHP, Python, Ruby, ...)
(This is a 4-part series, but IIRC part 1 has the highlights)
I'm sure that various other tools have been strongly influenced by this famous essay, and so many more things may be as fast as grep by now, but still...
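Cox's headline example is easy to try yourself: the pattern a?…a?a…a (n of each) matched against a string of n a's takes exponential time in a backtracking engine, while grep's automaton-based matcher handles it instantly. A sketch with n = 25:

```shell
# Build the pattern a?a?...a?aa...a (25 of each) and the input aaaa...a
# (25 a's). printf reuses its format string once per argument, and %.0s
# consumes the argument while printing nothing.
n=25
pat=$(printf 'a?%.0s' $(seq $n); printf 'a%.0s' $(seq $n))
input=$(printf 'a%.0s' $(seq $n))

# grep -E compiles this to an automaton and matches in linear time; a
# backtracking engine can take on the order of 2^25 steps on this input.
echo "$input" | grep -cE "$pat"
```

grep reports 1 matching line essentially instantly; try the same pattern in a backtracking engine and the runtime blows up as n grows.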
P.S. one of the other high-profile "ack"-like search tools would be "ag", aka "The Silver Searcher".
"The Silver Searcher is a 3-5x faster drop-in replacement for ack (which itself is better than grep)."
Seems that most of grep's time was spent waiting on disk. This underlines the reason look may be significantly faster: grep has to read (and discard) every non-matching line in the 1.1G file, while look's binary search on the sorted file only touches a handful of blocks.
Minimalism is underrated imho. Just because something is simple doesn't mean it sucks. Most sites are so full of crap they take minutes to fully load. My point is that you can build a site that covers 90% of your needs in 5 minutes. Whether typed.pw looks good or bad is a question of taste. I like it; the most important aspect for me is readability anyway. What I absolutely dislike is sites that are too "modern": they mess with the scrollbar, use lightboxes/popups, use the same color for normal text and links, make text fields almost indistinguishable from the rest of the page, etc.
My main point is that we fail to use the tools we have properly, and instead build something more powerful than needed. And sometimes this is damaging too (e.g. sites taking minutes to load, etc.).
BTW not having features sometimes IS a feature. YMMV.
I'm running Debian with KDE. The GNU tools are installed but by weight they're nowhere near the majority of the code on the system, to say nothing of the components I actually use. So calling that machine "GNU/Linux" instead of "KDE/X.org/Linux", given the relative importance of the individual components, would be stupid.
Fortunately, we have the nearly universally understood "Linux" instead, and weird specializations like Android can be called something else.
I agree that "naming all the things" is a bit silly. But I think GNU/Linux indicates Linux kernel, GNU libc, compiled with gcc -- which is actually kind of useful information. It doesn't say anything about the Graphical UI (if any). Similarly I think Android/Linux is useful, because it indicates something about what kind of (binary) software you can expect will work - and what will not.
I also happen to think distinctions such as Debian/kFreeBSD (a Debian distribution based around the FreeBSD kernel, as opposed to (just) the regular FreeBSD user-land) are informative.
I'm not sure what one should/would call a Linux distribution with an alternative (non-GNU) libc that relied on a non-GNU compiler chain... Probably "brandname"/Linux or "function"/Linux (eg: Linux Router Project or something)...
This isn't a technical thing for any of the major proponents, though. It's a marketing thing, because after the spectacular failure of Hurd GNU was relegated to effectively a sideline (a useful one, for sure, but a sideline). I get a lot more direct and personal value out of code not provided by GNU than I do by code provided by GNU. I don't think they merit top billing, and I don't think the two decades of holding one's breath and turning purple deserves a reward.
Look, the Linux kernel itself uses the GNU GPL, so GNU probably is the most important factor in all of this. Had the Linux kernel been under a proprietary license, or even a BSD-style one, you probably would never have heard of it. At any rate, basically nobody thinks HURD is useful anymore; RMS himself considers time spent on HURD a waste now that we have the Linux kernel. GNU is a political movement about software freedom as much as, or more than, a particular set of software, and even KDE is licensed under a GNU license. Etc.
"Look," the creator of Linux could not give the faintest fart about GNU and has said he'd use a different license if he could. Why should a political movement that specializes in lousy marketing campaigns and haranguing be given a nod?
I do like that attempt, though. "Well, they used a license that GNU wrote, so GNU should get top billing!" Do you want to talk about how well Apache/Dropwizard and PostgreSQL/PostgreSQL work with Apache/Kotlin on GNU/OpenJDK8?
> has said he'd use a different license if he could
Citation?
A constantly repeated quote from Linus is precisely: "Making Linux GPL'd was definitely the best thing I ever did." from a 1997 interview according to https://en.wikiquote.org/wiki/Linus_Torvalds
I've never heard Torvalds say anything regretful about using the GPL. When I heard him in person just last year, he clarified his dislike of GPLv3 while emphasizing his preference for, and liking of, GPLv2.
And anyway, I didn't say anything about "top billing". The fact is, Linux itself is, admittedly, absurd billing for Linus given that he is the leader of the kernel project but is among thousands of people who make it happen. GNU is not a term that credits Richard Stallman. GNU is a community project with a particular political aim. And my point all along was just about some practical way to differentiate Android from the other primary Linux-based systems, and "GNU/Linux" is a way to do that.
It's the old chicken-and-egg problem: you build a community by attracting new users, but new users will join only if there's already a good community. It's hard to get a community started.
Strong moderation with clearly defined goals and rules could go a long way, provided you had the resources to back it up long enough for it to become self-sufficient. Stack Exchange has some categories with strong moderation and rules. The problem with SE, in my opinion, is that posting there feels like working without being paid. There's almost no fun in posting answers on SE, while there is still a bit of fun in posting on HN.
No, because you didn't create those other files, your enumerated bits won't have the right color ;)
"Bits don't have Colour; computer scientists, like computers, are Colour-blind. That is not a mistake or deficiency on our part: rather, we have worked hard to become so. Colour-blindness on the part of computer scientists helps us understand the fact that computers are also Colour-blind, and we need to be intimately familiar with that fact in order to do our jobs.
The trouble is, human beings are not in general Colour-blind. The law is not Colour-blind. It makes a difference not only what bits you have, but where they came from. [...] The law sees Colour.
Suppose you publish an article that happens to contain a sentence identical to one from this article, like "The law sees Colour." That's just four words, all of them common, and it might well occur by random chance. Maybe you were thinking about similar ideas to mine and happened to put the words together in a similar way. If so, fine. But maybe you wrote "your" article by cutting and pasting from "mine" - in that case, the words have the Colour that obligates you to follow quotation procedures and worry about "derivative work" status under copyright law and so on. Exactly the same words - represented on a computer by the same bits - can vary in Colour and have differing consequences. When you use those words without quotation marks, either you're an author or a plagiarist depending on where you got them, even though they are the same words. It matters where the bits came from." - from http://ansuz.sooke.bc.ca/entry/23
Basically, non-recorded metadata about how a sequence of bits was created matters too. Not just the bits themselves.
Now you've simultaneously copyright infringed all the worlds works by sharing them on the internet, AND generated all other works, making it impossible for anyone else to ever not infringe on your work.
I remember someone created a P2P file sharing system that used 'munges' of files to create blocks of data by themselves that have no meaning (a bit like http://monolith.sourceforge.net/). These blocks were then transferred around the network, and people could claim "I'm not transferring files, I'm transferring meaningless blocks of data".
Sure, but (and I'm not a lawyer), intent of law is just as important as the literal meaning of law. Cases play out this intent and add to the corpus of knowledge as case law. So if you did write a program to generate all the data in the world, I'd imagine people would look at your intent, rather than just what you literally did.
You'd run out of disk space before you even generated "Hello, World". Assuming ASCII only, that's 96 bits. In order to save every possible character combination you'd need about 9903 yottabytes [1] of storage.
If you went through and sorted those which were pleasant from those which weren't, you would own all those you sorted. Good luck getting even one picture that looks like anything more than static or a pure color.