Hacker News: glifchits's comments

Looks like Rust is the "good" solution for all of these?


Those are examples of why it is so incredible. To have it in your notes application as seamlessly as Roam does it is a game changer for notes.


Did I miss something or does this project include popularity measures as features?

In the section on dataset features, they include "popularity" (calculated by Spotify) as well as Billboard chart stats like weeks, rank, and a custom-made "score". To me it's not clear whether these features were hidden from the train/test sets or whether the popularity features were only used in their "artist past performance" measures.

If they included these popularity features, it's like asking "can we predict whether a song is a hit just by looking at how popular it is?" If it is the case that they peeked into the future and observed ex-post song popularity, obtaining just 89% accuracy hints at how unpredictable song success truly is. Check out [1] for a famous study that experimentally demonstrates this unpredictability.

[1] Salganik, M. J., Dodds, P. S., & Watts, D. J. (2006). Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market. Science, 311(5762), 854–856. https://doi.org/10.1126/science.1121066


From the paper:

>To extend previous work, in addition to audio analysis features, we consider song duration and mine an additional artist past-performance feature. Artist past-performance for a given song represents how many prior Billboard hits the artist has released before that track’s release date

emphasis mine.

I wonder how accurate a model using this feature alone would be.
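Out of curiosity, a past-performance-only "model" can be sketched as a one-rule baseline in a few lines of Python. The data and the rule below are entirely made up for illustration; they are not from the paper:

```python
# Toy data (hypothetical): pairs of (prior Billboard hits by the artist,
# whether the song ended up being a hit).
songs = [
    (5, True), (0, False), (2, True), (0, False),
    (1, True), (0, True), (3, True), (0, False),
]

def predict(prior_hits):
    # One-rule baseline: predict a hit iff the artist already had a hit.
    return prior_hits > 0

correct = sum(predict(h) == y for h, y in songs)
accuracy = correct / len(songs)
print(f"baseline accuracy: {accuracy:.2f}")
```

On real data, comparing this baseline against the full feature set would show how much the content features actually add.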


Right, this sentence made it unclear to me whether they only used the popularity features to compute past-performance, or whether they included past-performance in addition to other popularity features.

To your question, other work on success prediction of tweets [1, 2] demonstrates that past-performance is indeed much more predictive than the typical content features. This way of looking at success of "cultural products" assumes it depends to varying extents on both inherent "quality" (measured by content features), and the social processes of sharing (which are much harder to understand ahead of time, as the paper I referenced in my parent post shows).

[1] Martin, T., Hofman, J. M., Sharma, A., Anderson, A., & Watts, D. J. (2016). Exploring Limits to Prediction in Complex Social Systems. Proceedings of the 25th International Conference on World Wide Web - WWW ’16, 683–694. https://doi.org/10.1145/2872427.2883001

[2] Bakshy, E., Hofman, J. M., Mason, W. A., & Watts, D. J. (2011). Everyone’s an Influencer: Quantifying Influence on Twitter. Proceedings of the Fourth ACM International Conference on Web Search and Data Mining - WSDM ’11, 65. https://doi.org/10.1145/1935826.1935845


I would like to see more on the zeta distribution aspect of this, but it doesn't appear anywhere obvious in this repo.


A few things I learned after a bit of research:

• the key to efficiency here seems to be “caching”, more specifically their caching strategy

• traditionally, caching on the web is done by assuming resource access follows the Zipf Distribution[1]

• Zeta Distributions are basically Zipf Distributions[2] so you can effectively re-word the title as “Efficient data loading using caching” (zipf = “caching” & zeta = zipf => zeta = “caching”)

• It’s important to note that Zipf/Zeta don’t model extremes very well, so there’s potential for outliers causing costly cache misses. Monitor your logs!

---

Further reading:

https://pdfs.semanticscholar.org/337e/4b7f57ccbb7485950b93da... (1999)

https://terrytao.wordpress.com/2009/07/03/benfords-law-zipfs...

https://en.wikipedia.org/wiki/Zipf%27s_law

https://www.springer.com/in/book/9781402080494

---

[1] the distribution follows a power law, so the most popular resource is accessed disproportionately more than the second most popular item and so on.

An example is word frequency, modeled as 1/n: the second most popular word occurs 50% as often as the most popular one (1/2), the third most popular 33% as often (1/3), and so on, a steep falloff (power-law, not exponential) with a long tail. It thus makes sense to cache the most popular items, since a small fraction of the resources accounts for a large share of all accesses; that's where the efficiency comes from. This is similar to the Pareto Distribution (20% of the things deliver 80% of the result).

[2] rigorously speaking, the zeta distribution is the infinite-support analogue of the (finite) Zipf distribution. But practically they are similar enough that people use the terms interchangeably.
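To make the caching intuition concrete, here's a small self-contained simulation. The numbers (10,000 resources, a cache of the top 100, 1/r weights) are illustrative assumptions, not the project's actual parameters:

```python
import random
from bisect import bisect

# Simulate requests whose popularity follows a Zipf law (frequency of
# rank r proportional to 1/r), and measure the hit ratio of a simple
# "cache the top-K most popular resources" strategy.
N = 10_000        # distinct resources
K = 100           # cache size: top-K most popular resources
REQUESTS = 50_000

weights = [1 / r for r in range(1, N + 1)]
total = sum(weights)

# Build the cumulative distribution so we can sample ranks by
# inverse-transform sampling.
cdf = []
acc = 0.0
for w in weights:
    acc += w / total
    cdf.append(acc)

random.seed(0)
hits = 0
for _ in range(REQUESTS):
    rank = bisect(cdf, random.random()) + 1  # 1-based popularity rank
    if rank <= K:
        hits += 1

hit_ratio = hits / REQUESTS
print(f"top-{K} cache serves ~{hit_ratio:.0%} of requests")
```

Under a pure 1/r law, caching just 1% of the resources already serves roughly half of all requests (the expected ratio is H(K)/H(N), which here is about 0.53), which is the whole appeal of popularity-based caching.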


Damn, it's like I don't even need to write the paper at all :) Great research work, it does capture the idea of the project.


Given that this is HN, my comment probably came across as a disappointment and most people were already aware of the surface level details of caching.

Hope I kept them at least mildly entertained while waiting for the real deal to drop : )


Here's the recently updated Wiki page, it's not super in-depth, but please let me know what you think :)


Me too! Sometimes the HN titles have to exaggerate some mathy aspect to receive votes...


A man's gotta do... The zeta stuff is coming in my white paper real soon and will be summarized in https://github.com/fed135/ha-store/wiki

Cheers :)


I was just thinking that something similar to this would be really interesting for reading/peer-reviewing scientific papers. Scientific knowledge is built atop of claims made in previous papers, each claim having its own statistical strength. It would be awesome if each footnote in a paper was not just a citation to another publication, but a link to the first time that claim was made, with its p-value and sample size prominently displayed.


Thank you so much for this awesome post! We did think about scientific papers and journals but are not phd level scientists ourselves. We thought more about how hard it is for journal writers to spread their knowledge to a broader audience. Your point is very interesting and novel. Really wish we could discuss it further and possibly implement it! Please get in touch!


This blog post is good evidence of the quality of thought that exists on the political left, and revives my belief that the left will eventually win the misguided attacks on “political correctness” culture. The comment section reinforces this even further.


While I appreciate the intellect behind this blog post (an unlikely combination), all I am getting from the actual content of the post is that the solution to this problem is actively picking non-white, non-male academic works / courses in a sort of Affirmative Action type of system. I don't doubt the success of Affirmative Action personally, but you cannot be surprised that the comment section, and people in general, disagree with this system.


Totally agree, this is just about employing affirmative action, how, and why. And yes, I get the criticism but I haven't heard anything convincing yet. The only baffling thing to me is that its critics believe that pure meritocracy is still the flagship solution, when the flaws seem so obvious. I think the author does a great job explaining why for this case.


Interesting! I figured the `as` in a `with` statement was handled uniquely, but I learned something new about Python today:

   x = open('filename')
   x.closed  # False
   with x:
     print(x.readline())
   x.closed  # True
I think you're right. I prefer the `as` variant for readability.


It's the same for classes that define it like:

    def __enter__(self):
        return self
which is a common pattern (used by open()), but there's no requirement that __enter__() return the same object.

In cases where __enter__() does something different, the variable assigned beforehand and the 'as' variable would have different values (the object of the 'with' statement, x, and the result of calling x.__enter__(), respectively).
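A contrived sketch of that case (the Connection/cursor names here are hypothetical, not from any real library):

```python
# A context manager whose __enter__ returns something other than self,
# so `with conn as cur` binds a different object than `conn`.
class Connection:
    def __enter__(self):
        # Return a "cursor", not the connection itself.
        self.cursor = {"rows": []}
        return self.cursor

    def __exit__(self, exc_type, exc, tb):
        self.cursor = None
        return False  # don't suppress exceptions

conn = Connection()
with conn as cur:
    assert cur is not conn   # the 'as' name is __enter__'s return value
    cur["rows"].append("hello")
```

This is exactly why the `as` form is safer in general: it always gives you whatever __enter__() chose to hand back.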


In my opinion, you're one of the most entertaining and approachable writers on mathematical topics like QC. How did you get good at writing?


It’s funny: when Philip Roth passed away recently, I was rereading some of his stuff, and thinking to myself, “why am I so terrible at writing?”

If I have any tips, I guess they’d be bend-over-backwards honesty, willingness to make an ass of yourself, practice, and more practice.


This is really vague. Javascript is necessary for building rich web applications. What are the uninteresting problems you see as Javascript's main purpose?


I wouldn't have checked this out if I knew it would burn a Medium article access for me.

