It was for a dynamically growing ring buffer that also did short-object optimization. The natural implementation was to store the capacity and the offsets in fixed locations with a fixed width, and make the variable part a union of a pointer and an inline byte buffer.
Depending on the element width, you'd have space for different numbers of elements in the inline buffer. Sometimes one, sometimes a few more. Specializing for a one-element inline buffer would have been quite complex for limited gains.
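Roughly the shape of it, sketched with Python's ctypes (the field names and the 16-byte inline size are just illustrative, not the actual implementation):

    import ctypes

    INLINE_BYTES = 16  # illustrative inline capacity, not the real value

    class Storage(ctypes.Union):
        # The variable part: either a pointer to a heap allocation or the
        # inline byte buffer, overlapping in the same memory.
        _fields_ = [("heap", ctypes.c_void_p),
                    ("inline", ctypes.c_ubyte * INLINE_BYTES)]

    class RingBuffer(ctypes.Structure):
        # Fixed-width header: capacity and offsets always live at the same place.
        _fields_ = [("capacity", ctypes.c_uint32),
                    ("head", ctypes.c_uint32),
                    ("tail", ctypes.c_uint32),
                    ("storage", Storage)]

    # How many elements fit inline depends on the element width:
    for width in (1, 4, 8, 16):
        print(width, "byte elements ->", INLINE_BYTES // width, "inline")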
In retrospect trying to use that as a running gag for the blog post did not work well without actually giving the full context, but the full context would have been a distraction.
There are plenty. But it's not the comparison you want to be making. There is too much variability in the number of tokens used for a single response, especially once reasoning models became a thing. And it gets even worse when you put the models into a variable-length output loop.
You really need to look at the cost per task. artificialanalysis.ai has a good composite score, measures the cost of running all the benchmarks, and has a 2D intelligence vs. cost graph.
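A toy example of why per-token price is misleading (all numbers invented): a model that's cheaper per token but chattier, and needs more loop iterations, can easily cost more per task.

    # Toy numbers, purely illustrative: price per million output tokens,
    # average output tokens per step, and steps needed to finish one task.
    models = {
        "cheap_per_token": {"usd_per_mtok": 0.5, "tokens_per_step": 6000, "steps": 12},
        "pricey_per_token": {"usd_per_mtok": 3.0, "tokens_per_step": 800, "steps": 4},
    }

    for name, m in models.items():
        cost_per_task = m["usd_per_mtok"] / 1e6 * m["tokens_per_step"] * m["steps"]
        print(f"{name}: ${cost_per_task:.4f} per task")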
The _Science_ paper linked is paywalled, is anyone aware of a preprint?
I find it a bit curious that they've chosen to use SMS verifications as a proxy for the difficulty of creating an account, when there are similar marketplaces for selling the actual end product of bulk-created accounts. Was there some issue with that kind of data? SMS verification is just one part of the anti-bulk account puzzle, for both the attacker and defender.
The infinite grid is a cool idea, but the game as a whole did not feel very engaging. There was no way I was finishing even one full screen of this, which is the bare minimum threshold for getting any value from the infinite grid. I'm not a word search guy, but it feels like the idea might work better with a different puzzle type.
The drag-to-pan, hold-then-drag-to-mark scheme felt really clumsy with a mouse. The hold delay is just too long. Maybe consider drag with the left button to mark, and click/drag with the right button to pan.
The word list seems odd; half the words I tried were rejected. It doesn't take many rejections of reasonable words to lose faith in the game. Is the idea that only the words at the bottom are in the dictionary? It doesn't feel like it can be, given that it only shows about 20 words, isn't scrollable in any way I can see, and many words not shown in that list were actually accepted.
As a small point of order, they did not get banned for "finding CSAM" like the outrage-bait, clickbait title claims. They got banned for uploading a data set containing child porn to Google Drive. They did not find it themselves, and their later reporting of the data set to an appropriate organization is not why they got banned.
I’m the person who got banned. And just to be clear: the only reason I have my account back is because 404 Media covered it. Nobody else would touch the story because it happened to a nobody. There are probably a lot of “nobodies” in this thread who might someday need a reporter like Emanuel Maiberg to actually step in. I’m grateful he did.
The dataset had been online for six years. In my appeal I told Google exactly where the data came from — they ignored it. I was the one who reported it to C3P, and that’s why it finally came down. Even after Google flagged my Drive, the dataset stayed up for another two months.
So this idea that Google “did a good thing” and 404 somehow did something wrong is just absurd.
>They got banned for uploading child porn to Google Drive
They uploaded the full "widely-used" training dataset, which happened to include CSAM (child sexual abuse material).
While the title of the article is not great, your wording here implies that they purposefully uploaded some independent CSAM pictures, which is not accurate.
No but "They got banned for uploading child porn to Google Drive" is a correct framing and "google banned a developer for finding child porn" is incorrect.
There is important additional context around it, of course, which mitigates (should remove) any criminal legal implications, and should also result in Google unsuspending his account in a reasonable timeframe. But what happened is also reasonable: Google does automated scans of all data uploaded to Drive, caught CP images being uploaded (presumably via hashes from something like NCMEC?), and banned the user. Totally reasonable thing. Google should have an appeal process where a reasonable human can look at it and say "oh shit, the guy just uploaded 100M AI training images and 7 of them were CP, he's not a pedo, unban him, ask him not to do it again, and report this to someone."
The headline frames it like the story was "A developer found CP in AI training data and Google banned him in retaliation for reporting it." Totally disingenuous framing of the situation.
"There is important additional context around it, of course,"
Indeed, which is why a comment that has infinitely more room to expand on the context should include that context when it criticizes the title for being misleading.
Both the title and the comment I replied to are misleading. One because of the framing, the other because of the deliberate exclusion of extremely important context.
Imagine if someone accused you of "Uploading CSAM to Google Drive" without any other context. It's one of the most serious accusations possible! Adding like five extra words of context to make it clear that you are not a pedophile trafficking CSAM is not that much of an ask.
Fair enough. I'd already included the fact about it being a data set in the post once, which seemed clear enough, especially when my actual point was that the author did not "find" the CSAM, and by implication was not aware of it. But I have edited the message and added a repetition of it.
I bet the journalists and editors working for 404 will not correct their intentionally misleading headline. Why hold a random forum post buried in the middle of a large thread to a higher standard than the professionals writing headlines shown in 30-point font on the front page of their publication?
>Why hold a random forum post buried in the middle of a large thread to a higher standard than the professionals writing headlines shown in 30-point font on the front page of their publication?
How many times do I need to repeat that I agree the headline is misleading? Yes, the article here has a shit title. You already made that point, and I have already agreed with it.
If I had an easy and direct line to the editor who came up with the title, I would point that out to them. Unfortunately they aren't on HN, as far as I'm aware, or I could write them a comment similar to yours.
HN already needlessly rewrites headlines with automation, and it's more annoying to see automation go stupidly wrong than to let the original imperfect headline stand. Having outrage about headlines is a choice.
My browser integrates an LLM, so I asked it to restate the headline of this one, and it came up with "Developer Suspended by Google After Uploading AI Dataset Containing CSAM" which seems pretty even-handed. Of course, I would want to dial the snark to 11. Many hacker stories can be headlined "Developer discovers that C still sucks" etc.
> Nobody who is doing this is willing to come clean with hard numbers but there are data points, for example from Meta and (very unofficially) Google.
The Meta link does not support the point. It's actually implying an MTBF of over 5 years at 90% utilization even if you assume there's no bathtub curve. Pretty sure that lines up with the depreciation period.
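Back-of-the-envelope version of that calculation, with placeholder numbers rather than the ones from the Meta report (swap in their reported fleet size, run length, and GPU-attributed failure count):

    # Placeholder numbers, not taken from the Meta report.
    num_gpus = 16_000     # GPUs in the training fleet (assumed)
    run_days = 50         # wall-clock length of the run (assumed)
    gpu_failures = 300    # interruptions attributed to GPUs/HBM (assumed)

    gpu_hours = num_gpus * run_days * 24
    mtbf_years = gpu_hours / gpu_failures / (24 * 365)
    print(f"MTBF per GPU: {mtbf_years:.1f} years")  # well over 5 years with these inputs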
That article makes a big claim but does not link to any source. It vaguely describes the source, but nobody who was actually in that role would describe themselves as the "GenAI principal architect at Alphabet". Like, those are not the words they would use. It would also be pointless to try to stay anonymous if that really were your title.
That is not merely an unofficial source. That is made-up trash that the blog author lapped up despite its obviously unreliable nature, because it confirmed his beliefs.
Besides, if the claim about GPU wear and tear were true, it would show up consistently in GPUs sourced from cryptomining (which was generally done in makeshift compute centers with terrible cooling and other environmental problems), and it just doesn't.
> It's actually implying an MTBF of over 5 years [...] Pretty sure that lines up with the depreciation period.
You're assuming this is normal, for the MTBF to line up with the depreciation schedule. But the MTBF of data center hardware is usually quite a bit longer than the depreciation schedule right? If I recall correctly, for servers it's typically double or triple, roughly. Maybe less for GPUs, I'm not directly familiar, but a quick web search suggests these periods shouldn't line up for GPUs either.
Google is using Nvidia GPUs. More than that, I'd expect Google to still be something like 90% on Nvidia GPUs. You can't really check, of course. Maybe I'm an idiot and it's 50%.
But you can see how that works: go to colab.research.google.com. Type in some code ... "!nvidia-smi" for instance. Click on the down arrow next to "connect" and select "change runtime type". 3 out of 5 GPU options are Nvidia GPUs.
Frankly, unless you rewrite your models you don't really have a choice but to use Nvidia GPUs, thanks to, ironically, Facebook (authors of PyTorch). There is PyTorch/XLA automatic translation to TPU, but it doesn't work for "big" models. And as a point of advice: you want stuff to work on TPUs? Do what Googlers do: use JAX ( https://github.com/jax-ml/jax ). Oh, and look at the commit logs of that repository to get your mind blown, btw.
In other words, Google rents out Nvidia GPUs to their cloud customers (with the hardware physically present in Google datacenters).
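On the JAX point: the same jitted function runs unchanged on CPU, GPU, or TPU, since XLA handles the backend, which is a big part of why it's the path of least resistance on TPUs. A minimal sketch (nothing Google-specific, just the shape of it):

    import jax
    import jax.numpy as jnp

    @jax.jit  # compiles via XLA for whatever backend is available: CPU, GPU, or TPU
    def mlp(params, x):
        w1, b1, w2, b2 = params
        h = jax.nn.relu(x @ w1 + b1)
        return h @ w2 + b2

    key = jax.random.PRNGKey(0)
    k1, k2, kx = jax.random.split(key, 3)
    params = (jax.random.normal(k1, (64, 128)), jnp.zeros(128),
              jax.random.normal(k2, (128, 10)), jnp.zeros(10))
    x = jax.random.normal(kx, (8, 64))

    print(jax.devices())         # shows which backend JAX picked up
    print(mlp(params, x).shape)  # (8, 10)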
> Frankly, unless you rewrite your models you don't really have a choice but using nVidia GPUs, thanks to, ironically, Facebook (authors of pytorch). There is pytorch/XLA automatic translation to TPU but it doesn't work for "big" models. And as a point of advice: you want stuff to work on TPUs?
I don't understand what you mean; most models aren't anywhere near big in terms of code complexity. Once you have the efficient primitives to build on (like an efficient hardware-accelerated matmul, backprop, flash attention, etc.), these models are in sub-thousand-LoC territory and you can even vibe-convert from one environment to another.
It's kind of a shock to realize how simple the logic behind LLMs is.
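For a sense of scale, here's roughly what a decoder block looks like once the heavy lifting is delegated to library kernels. A bare-bones PyTorch sketch, deliberately omitting all the tricks that make real models train well:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DecoderBlock(nn.Module):
        """Bare-bones pre-norm transformer decoder block."""
        def __init__(self, dim: int, n_heads: int):
            super().__init__()
            self.n_heads = n_heads
            self.norm1 = nn.LayerNorm(dim)
            self.norm2 = nn.LayerNorm(dim)
            self.qkv = nn.Linear(dim, 3 * dim)
            self.proj = nn.Linear(dim, dim)
            self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

        def forward(self, x):  # x: (batch, seq, dim)
            b, t, d = x.shape
            q, k, v = self.qkv(self.norm1(x)).chunk(3, dim=-1)
            # split heads: (batch, heads, seq, head_dim)
            q, k, v = (z.view(b, t, self.n_heads, -1).transpose(1, 2) for z in (q, k, v))
            # the library kernel does the heavy lifting (flash attention where available)
            att = F.scaled_dot_product_attention(q, k, v, is_causal=True)
            x = x + self.proj(att.transpose(1, 2).reshape(b, t, d))
            return x + self.mlp(self.norm2(x))

    x = torch.randn(2, 16, 256)
    print(DecoderBlock(256, 8)(x).shape)  # torch.Size([2, 16, 256])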
I still agree with you, Google is most likely still using Nvidia chips in addition to TPUs.
> I don't understand what you mean; most models aren't anywhere near big in terms of code complexity. Once you have the efficient primitives to build on (like an efficient hardware-accelerated matmul, backprop, flash attention, etc.), these models are in sub-thousand-LoC territory and you can even vibe-convert from one environment to another.
You're right, but that doesn't work. Transformers won't perform well without an endless series of tricks, so endless that you can't write that series of tricks yourself. You can't initialize the network correctly when starting from scratch. You can't do the basic training that makes the models good (i.e. the trillions of tokens). Flash attention? Well, that's 2022, it's CUDA assembly, and it only works on Nvidia. Now there are 6 versions of flash attention, all of which are written in CUDA assembly, and they're only fast on Nvidia.
So what do you do? Well, you, as they say, "start with a backbone". That used to always be a Llama model, but Qwen is making serious inroads.
The scary part is that this is what you do for everything now. After all, Llama and Qwen are text transformers. They answer "where is Paris?". They don't do text-to-speech, speech recognition, object tracking, classification, time series, image-in or image-out, OCR, ... and yet all SOTA approaches to all of these can be only slightly inaccurately described as "Llama/Qwen with a different encoder at the start".
That even has the big advantage that mixing becomes easy. All encoders produce a stream of tokens. The same kind of tokens. So you can "just" have a text encoder, a sound encoder, an image encoder, a time series encoder, and concatenate the tokens together (it's not quite that simple, but ...). That actually works!
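Schematically, with toy stand-ins for the encoders and made-up dimensions (real systems need proper projection layers, positional handling, and special tokens):

    import torch
    import torch.nn as nn

    dim = 256  # shared embedding width all encoders project into (made-up size)

    # Stand-ins for real encoders: each maps its modality to (seq_len, dim) tokens.
    text_encoder  = nn.Embedding(32_000, dim)    # token ids -> embeddings
    audio_encoder = nn.Linear(80, dim)           # e.g. mel-spectrogram frames -> tokens
    image_encoder = nn.Linear(16 * 16 * 3, dim)  # flattened image patches -> tokens

    text_tokens  = text_encoder(torch.randint(0, 32_000, (1, 12)))  # (1, 12, dim)
    audio_tokens = audio_encoder(torch.randn(1, 50, 80))            # (1, 50, dim)
    image_tokens = image_encoder(torch.randn(1, 64, 16 * 16 * 3))   # (1, 64, dim)

    # "Just" concatenate along the sequence axis and hand it to the decoder backbone.
    sequence = torch.cat([text_tokens, audio_tokens, image_tokens], dim=1)
    print(sequence.shape)  # torch.Size([1, 126, 256])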
So you need Llama or Qwen to work, not just the inference but the training and finetuning, with all the tricks, not just flash attention, half of which are written in CUDA assembly, because that's what you start from. Speech recognition? SOTA is taking sounds -> "encoding" them into phonemes -> having Qwen correct it. Of course, you prefer to run the literal exact training code from ... well, from either Facebook or Alibaba, with as few modifications as possible, which of course means Nvidia.