Hacker News | sbi's comments

Heapsort has poor cache performance though. Its asymptotic complexity depends on the assumption that memory access is O(1), which is not borne out in practice.


This is true, but the worst case of quicksort is so rare that this does not matter for average performance.


What charming casual racism. "It's pretty ghetto" ... "The ghetto way is just to run this on a machine" ... "The ghetto reversing is to run strings."


Racism? Webster's definition[0] of ghetto (and that of any other respectable dictionary) makes no reference to any particular race. Would the phrase "the poor man's solution is to run strings" also be racist, by your logic?

[0] a part of a city in which members of a particular group or race live usually in poor conditions; the poorest part of a city


Webster's Dictionary does not discuss the pejorative use of the word "ghetto" and in any case is not an arbiter of racism.


Well I'm glad that you seem totally qualified to be our great arbiter of racism. What would we do without you?


That's only one of the definitions in Webster. It also includes:

> 1: a quarter of a city in which Jews were formerly required to live

http://www.merriam-webster.com/dictionary/ghetto

(Note: I'm not calling the article's author racist, just noting that the above claim about Webster is factually mistaken.)


In the US, for many decades now, the word "ghetto" has referred to qualities associated with poor urban American neighborhoods, specifically black neighborhoods, and has only a historical connection to Jewish ghettos.

http://www.npr.org/blogs/codeswitch/2014/04/27/306829915/seg...


Be very careful accusing people of racism. When you get it wrong, you undermine complaints about real racism.


Is that racist? I thought ghetto just meant poor/inelegant, across all colors and nationalities.


When sbi sees the word "ghetto", she thinks of a particular race. So this is a case of the pot calling the kettle...

Whoops! Never mind...


Tabasco is a state in Mexico, though; how were its trademark owners able to obtain the name?


Legally, it's due to context of use.

E.g., Microsoft doesn't own my ability to use the word "windows" as it pertains to the physical objects in my house that I look through. I can start a window company and use the word accordingly: "We have the clearest, best windows of any window manufacturer."


Thanks for explaining that to me. To add to the confusion, tabasco is also a pepper cultivar named after the Mexican state.


Probably not, but in some cases sed and AWK might be faster. I am a big fan of AWK. It is limited in some ways that make it impractical to use for really serious programs, but it is very expressive. Look at [1, 2].

[1]: http://c2.com/doc/expense/

[2]: http://www.pement.org/awk/awk1line.txt


I have tested converting sed/awk lines to perl in a few bash scripts that operated on a fairly large amount of data. Oddly enough, in every case, perl 5.18 performed at least 1.5x faster, sometimes as much as 3.5x faster. Obviously anecdotal evidence, but recent versions of Perl seem to have gained some good speed.


I've had a similar experience, with Perl up to about 7x faster. I had sed in a few data-mangling pipelines because I assumed simpler = faster, but replacing it with Perl was either a wash or a speedup in every case. This was with the versions of perl and sed in Debian (it looks like GNU sed), so YMMV with other seds.

The case where I saw a 7x speedup was doing many-times-per-line, fixed-string search/replace on a file consisting of very long lines (an SQL dump where some lines had >1m characters). Perl was IO-bound (so presumably would've been even faster if I'd had better disks), while sed was CPU-bound at a pretty low fraction of the possible IO performance.


Could have to do with charset handling.

For the really complex stuff, there's rejit[0]. I wonder if LuaJIT would work; these tools also need IO tricks to be fast.

[0] https://lwn.net/Articles/589009/


Rafe Colburn from Etsy wrote about this performance oddity: http://rc3.org/2014/08/28/surprisingly-perl-outperforms-sed-...

EDIT to manage expectations: the article doesn't explain why, it just provides benchmarks and one commenter made a suggestion about character handling. More insight still welcome :)


There are many different versions of awk: gawk, (BSD) nawk, mawk, etc. I think OS X uses nawk, but mawk is reputedly faster. Gawk is definitely slower than both. I'm not surprised that Perl is faster, though.


> I used the default OS X versions of these tools. The versions were Perl 5.16.2, Awk 20070501, and some version of BSD sed from 2005


The data from the study are available on the PLoS website. The authors coded incomplete documents as mistakes: some participants submitted empty LaTeX documents for the table exercise and were scored with hundreds of "mistakes." Since the LaTeX users' mean percentage completion was lower than that of the Word users, this may explain part of the authors' finding that the LaTeX users were more error prone as well.


Looking over their data, it seems that they coded incomplete text as an error. So, for example, participants 34 and 37 completed none of the table exercise and were coded as having produced 513 errors (both were LaTeX users). This accounts for the vast majority of the variation in the observed errors. I think they've convincingly shown that Word is more productive than LaTeX for some of the study tasks, but not that the resulting output is more correct.

That being said, I don't doubt that LaTeX is harder to use and more error prone than Word. Since I find myself frequently writing mathematics-heavy text, I personally prefer LaTeX ...


Doesn't std::list<T> generate different code for each type T, at least naively? The generated code could end up taking up more space even if the data structures on which it operates look the same in memory.


This article correctly observes that "many resources are not plain memory" and then goes on to advise users to close() a file in a destructor. However, one of the ways that files are not like memory is that while free() can never fail, fclose() can fail if writing is buffered and the final write fails. This is especially problematic in C++ since throwing from a destructor is dangerous. Whether you rely on RAII or GC to free files, you will not be able to catch this sort of error. It is precisely for this reason that GC is a mechanism for memory management, not resource management in general.


The Kelly criterion doesn't quite apply in this problem, since you are asked to optimize a more conservative strategy, namely to guarantee (with high probability) a certain amount of winnings rather than maximize log(winnings). In fact the Kelly value for f (in the problem's notation) is 0.25.
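For context, the standard Kelly fraction for a simple bet paying $b{:}1$ with win probability $p$ is the $f$ that maximizes the expected log of wealth:

```latex
f^* = \frac{bp - (1 - p)}{b} = p - \frac{1 - p}{b}
```

The problem's objective (guaranteeing a target with high probability) is more conservative than log-wealth maximization, which is why its optimal $f$ need not coincide with this value.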


Typically you wouldn't use lgamma to compute exact values of factorials at all. For a real world example of this function in action, look at John D. Cook's blog:

[1]: http://www.johndcook.com/blog/2012/07/14/log-gamma-differenc...

[2]: http://www.johndcook.com/blog/2010/08/16/how-to-compute-log-...

[3]: http://www.johndcook.com/blog/2008/04/24/how-to-calculate-bi...

