  textbox.value = guess;
  guessWord();
  if (document.querySelector(".correct")) {
    console.log("Found the word:", guess);
    break;
  } else if (textbox.placeholder.includes("after")) {
    l = m + 1;
  } else {
    h = m - 1;
  }
}
Here's mine in JavaScript, you can paste it in the console.
Thanks! I should have realised that a solution for this could be implemented in JavaScript as well, so it can run directly in the browser. Here is my JavaScript translation of my earlier Python program:
let lo = 0, hi = dictionary.length - 1
const answer = document.getElementById('guess')
while (document.getElementsByClassName('correct').length === 0) {
  const mid = Math.floor(lo + (hi - lo) / 2)
  answer.value = dictionary[mid]
  guessWord()
  if (answer.placeholder.indexOf('after') !== -1) {
    lo = mid + 1
  } else {
    hi = mid - 1
  }
}
This solution is quite similar to yours. Thanks for this nice idea!
Nahhh, that's the tried-and-true Apple approach to marketing, and OpenAI is well positioned to adopt it for themselves. They act like they invented transformers as much as Apple acts like they invented the smartphone.
I'm the research lead of Anthropic's interpretability team. I've seen some comments like this one, which I worry downplay the importance of @leogao et al's paper due to its similarity to ours. I think these comments are really undervaluing Gao et al's work.
It's not just that this is contemporaneous work (a project like this takes many months at the very least), but also that it introduces a number of novel contributions like TopK activations and new evaluations. It seems very possible that some of these innovations will be very important for this line of work going forward.
More generally, I think it's really unfortunate when we don't value contemporaneous work or replications. Prior to this paper, one could have imagined it being the case that sparse autoencoders worked on Claude due to some idiosyncrasy, but wouldn't work on other frontier models for some reason. This paper gives us increased confidence that they work broadly, and that in itself is something to celebrate. It gives us a more stable foundation to build on.
I'm personally really grateful to all the authors of this paper for their work pushing sparse autoencoders and mechanistic interpretability forward.
The biggest thing I noticed comparing the two was that OpenAI's method directly tackles (and appears to have effectively mitigated) the dead-latents problem with a clever weight initialization and an "auxiliary loss" which (I think) explicitly penalizes dead latents. The TopK activation function is the other main difference I spot between the two.
Now, on the flip side, the Anthropic effort goes much further than the OpenAI one in terms of actually doing something interesting with the outputs of all this. Feature steering and the feature UMAP are both extremely cool, and to my knowledge the OpenAI team stopped short of efforts like that in their paper.
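For anyone unfamiliar with the TopK activation mentioned above: the idea is simply to keep the k largest pre-activations and zero out the rest, so sparsity is enforced exactly rather than via an L1 penalty. Here's a rough JavaScript sketch of the concept (my own illustrative code operating on a plain array, not the paper's actual implementation):

```javascript
// Illustrative TopK activation: keep the k largest values, zero the rest.
// (Hypothetical helper for explanation; real SAEs do this per-sample on
// tensors, with tie-breaking and backward-pass details this sketch skips.)
function topK(preActivations, k) {
  // Find the k-th largest value by sorting a copy in descending order.
  const sorted = [...preActivations].sort((a, b) => b - a);
  const threshold = sorted[k - 1];
  // Zero every entry below the threshold; ties at the threshold are kept.
  return preActivations.map(v => (v >= threshold ? v : 0));
}

console.log(topK([0.2, 1.5, -0.3, 0.9, 0.1], 2)); // [0, 1.5, 0, 0.9, 0]
```

Note the appeal: the latent vector has exactly k nonzero entries by construction, so you tune sparsity with a single integer instead of sweeping an L1 coefficient.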