
GPT-4 (I haven't really tested other models) is surprisingly adept at "learning" from examples provided as part of the prompt. This could be due to the same underlying mechanism.


I’ve found the opposite in trying to get it to play Wordle. It’ll repeatedly forget things it’s seemingly learned within the same session, all the while confident in its correctness.


LLMs are trained on 'tokens' derived from words and text. Even though some tokens are single letters, the bulk are rough approximations of syllables, as though you were building a dictionary for a data compression algorithm.

It might be more effective to try to play 'tokendle' before trying to play 'wordle'.
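To make the point concrete, here's a toy greedy longest-match tokenizer over a made-up subword vocabulary (the vocabulary and the matching rule are illustrative, not any real model's BPE). It shows why the letters of a word may never appear individually in what the model actually sees:

```python
# Hypothetical subword vocabulary, for illustration only.
VOCAB = {"wor", "dle", "play", "ing", "w", "o", "r", "d", "l", "e"}

def tokenize(text, vocab=VOCAB):
    """Greedy longest-match tokenization, a rough stand-in for BPE."""
    tokens, i = [], 0
    while i < len(text):
        # Prefer the longest vocabulary entry starting at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Character not in the vocabulary: emit it as-is.
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("wordle"))   # the model sees two subwords, not six letters
print(tokenize("playing"))
```

With this vocabulary, "wordle" tokenizes to ['wor', 'dle'], so a model reasoning over those tokens has no direct view of the six individual letters a Wordle player works with.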


Do you know whether LLMs grasp the equivalence of a word expressed as one whole-word token and as a series of single character tokens that spell out the same word? I'm curious if modifying the way some input words are split into tokens could be useful for letter-by-letter reasoning like in Wordle.

Or would an LLM get confused if we altered how the input text is tokenized, since it has probably never encountered alternative token-"spellings" of the same word?
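One workaround people try, rather than changing the tokenizer itself, is to pre-split words in the prompt with separators so a subword tokenizer is forced to emit roughly one token per character. A minimal sketch (the prompt format and helper names are my own, purely illustrative):

```python
def spell_out(word):
    # Hyphen-separating letters usually prevents a BPE tokenizer from
    # merging them into multi-character subwords.
    return "-".join(word.upper())

def wordle_feedback_prompt(guess, feedback):
    # feedback: one flag per letter, e.g. G=green, Y=yellow, X=gray.
    lines = [f"{ch}: {flag}" for ch, flag in zip(guess.upper(), feedback)]
    return f"Guess {spell_out(guess)} scored:\n" + "\n".join(lines)

print(wordle_feedback_prompt("crane", "XGYXX"))
```

Whether this actually improves letter-level reasoning presumably depends on how much letter-separated text the model saw during training.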


From what I understand, anything goes: a token could be a single letter, a whole word, a sentence fragment, or even a concept ('The United States of America'). Think of it as the dictionary for a compression algorithm and you wouldn't be too far off.

https://www.geeksforgeeks.org/lzw-lempel-ziv-welch-compressi...

For 'code table' substitute 'token table'.


What approach are you using to get the LLM to split words into individual letters?


Not really. That's called few-shot (in-context) learning.

It's basically unrelated to what happens during training, which uses gradient descent.
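The distinction: few-shot learning just means placing worked examples in the context window, with no weight updates at all. A sketch of building such a prompt (the "Input:/Output:" format is illustrative, not any particular API's convention):

```python
def few_shot_prompt(examples, query):
    """Assemble a few-shot prompt: demonstrations followed by the query.
    The model's weights stay frozen; any 'learning' happens only through
    attention over these in-context examples, not through gradients."""
    lines = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

prompt = few_shot_prompt([("2 + 2", "4"), ("3 + 5", "8")], "7 + 6")
print(prompt)
```

Training, by contrast, would update the model's parameters via backpropagation on many such pairs; here the examples only shape a single forward pass.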



