That’s currently the case in C, in that you can convert pointers to and from uintptr_t. However, not every number representable in that type needs to correspond to valid memory (that’s true at the assembly level as well), hence the conversion is only defined for valid pointers: round-tripping a valid pointer through uintptr_t gives back a pointer that compares equal to the original, but casting an arbitrary integer to a pointer is not guaranteed to yield anything usable.
> I think a memory address is a number that CPU considers to be a memory address
I meant to say that, indeed, there must be some concept of a CPU for a memory address to have a meaning, and for this concept of a CPU to be as widely applicable as possible, surely defining it as abstractly as possible is the way to go. Ergo, the idea of a C abstract machine.
Anyway, other people in this thread are discussing the matter more accurately and in more detail than I could hope to do, so I'll leave it at that.
I think the article briefly touches on that topic at some point:
> For one, gpt-3.5-turbo-instruct rarely suggests illegal moves, even in the late game. This requires “understanding” chess. If this doesn’t convince you, I encourage you to write a program that can take strings like 1. e4 d5 2. exd5 Qxd5 3. Nc3 and then say if the last move was legal.
However, I can't say whether LLMs fall into the "statistical AI" category.
> Whereas the LLM makes "moves" that clearly indicate no ability to play chess: moving pieces to squares well outside their legal moveset, moving pieces that aren't on the board, etc.
Do you have any evidence of that? TFA doesn't talk about the nature of these errors.
> Yeah like several hundred "Chess IM/GMs react to ChatGPT playing chess" videos on youtube.
If I were to take that sentence literally, I would ask for at least 199 other examples, but I imagine it was just a figure of speech. Nevertheless, if it's only one player complaining (even several times), can we really conclude that ChatGPT cannot play? Is that enough evidence, or is there something else at work?
I suppose indeed one could, if one expected an LLM to be ready to play out of the box, and that would be a fair criticism.
I am in no way trying to judge you; rather, I'm trying to get closer to the truth in this matter, and your input is valuable, as it points out a discrepancy wrt TFA. But it is also subject to caution, since it reports the results of only one chess player (right?). Furthermore, in the case of both TFA and this YouTuber, we don't have full access to their experiments, so we can't reproduce the results, nor can we try to understand why there is a difference.
I might very well be mistaken though, and I am open to criticisms and corrections, of course.
Replace the word with one of your own choosing, if that will help us get to the part where you have a point to make?
I think we are discussing whether LLMs can emulate chess-playing machines, regardless of whether they are actually, literally composed of a flock of stochastic parrots.
But this math analogy is not quite appropriate: there's abstract math and there's arithmetic. A good math practitioner (LLM or human) can be bad at arithmetic yet good at abstract reasoning. The latter doesn't (necessarily) require the former.
In chess, I don't think you can build a good strategy if it relies on illegal moves, because tactics and strategy are tied together.
However, in this specific instance, even if the text cannot be changed, couldn't the server process and signal the error differently, e.g. by returning status code 413[1], since clients ought to recognize that status code anyway?
Since the caller gets this as an error object, instead of as a plain string, it seems likely that this happens within the same process, i.e. a library function returns the MaxBytesError to a higher level in the business logic, without a network transmission in between.
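Where the error does end up crossing an HTTP boundary, though, the mapping suggested above is straightforward. A minimal sketch, assuming Go 1.19+ (which exposes *http.MaxBytesError from net/http); the handler and the 1 MiB limit are made up for illustration:

```go
package main

import (
	"errors"
	"io"
	"log"
	"net/http"
)

// handler is a hypothetical endpoint that caps the request body and
// maps the in-process MaxBytesError to a 413 status for the client.
func handler(w http.ResponseWriter, r *http.Request) {
	r.Body = http.MaxBytesReader(w, r.Body, 1<<20) // 1 MiB cap

	if _, err := io.ReadAll(r.Body); err != nil {
		var maxErr *http.MaxBytesError
		if errors.As(err, &maxErr) {
			// 413: the dedicated "Content Too Large" status code.
			http.Error(w, "request body too large",
				http.StatusRequestEntityTooLarge)
			return
		}
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/upload", handler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```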
Sorry, I understand it was a bit intrusively direct. To bring some context: I toyed a little with neural networks a few years ago and wondered myself about this topic of training a so-called quantized network (I wanted to write a small multilayer-perceptron library parameterized by the coefficient type - floating point or integer of various precisions), but didn't implement it. Since you mentioned your own work in that area, it piqued my interest, but I don't want to waste your time unnecessarily.
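For what it's worth, the "parameterized by the coefficient type" part is the kind of thing generics express directly. A minimal sketch in Go for concreteness, with made-up names, that deliberately ignores the scale/zero-point bookkeeping real quantized inference needs:

```go
package main

import "fmt"

// Coefficient is a hypothetical constraint covering the weight types
// mentioned above: floats or integers of various precisions.
type Coefficient interface {
	~int8 | ~int16 | ~int32 | ~float32 | ~float64
}

// dot is one building block of a perceptron layer, written once for
// every coefficient type. Overflow handling for narrow integer types
// is omitted from this sketch.
func dot[T Coefficient](weights, inputs []T) T {
	var acc T
	for i := range weights {
		acc += weights[i] * inputs[i]
	}
	return acc
}

func main() {
	fmt.Println(dot([]float32{0.5, -0.25}, []float32{2, 1})) // 0.75
	fmt.Println(dot([]int8{64, -32}, []int8{1, 1}))          // 32
}
```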
My intuition when reading the first lines of this article was that, just as when searching exhaustively for the correct combination on a padlock, one would cycle through each subgroup, where each of them would represent a digit on the lock. On the lock, one would take 9 steps (not 10, as that would loop the lock back to a previously seen combination) on the least significant digit, then propagate the carry to the next digits. But it seems that this is more complicated than that, as the steps at which subgroups connect (the carry) are not always the same?
Clearly that first intuition doesn't work. Is the Hamiltonian cycle for decimal numbers perhaps an equivalent of Gray code? And if it exists, is there a connection with the Rubik's cube cycle?
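For the binary case, at least, the Gray code has a closed form, and it's easy to check the one-change-per-step property that makes the lock analogy work. A small sketch:

```go
package main

import "fmt"

// gray returns the i-th binary reflected Gray code. Consecutive
// outputs differ in exactly one bit - a Hamiltonian cycle on the
// corners of the bit-string hypercube.
func gray(i uint) uint { return i ^ (i >> 1) }

func main() {
	for i := uint(0); i < 8; i++ {
		fmt.Printf("%d -> %03b\n", i, gray(i))
	}
	// 000 001 011 010 110 111 101 100: each step flips one bit,
	// like turning a single wheel of the lock by one notch.
}
```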
The reason why it's not so simple is that various operations on the cube do not commute (whereas rotations of different wheels on a combination lock do).
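A tiny way to see this, since cube moves are just permutations of stickers: compose two toy permutations in the two possible orders (values made up for illustration):

```go
package main

import "fmt"

// compose returns the permutation "apply p first, then q".
func compose(p, q []int) []int {
	r := make([]int, len(p))
	for i := range p {
		r[i] = q[p[i]]
	}
	return r
}

func main() {
	a := []int{1, 0, 2} // swap positions 0 and 1
	b := []int{0, 2, 1} // swap positions 1 and 2
	fmt.Println(compose(a, b)) // [2 0 1]
	fmt.Println(compose(b, a)) // [1 2 0] - different order, different result
}
```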
Yes indeed, I realized that it was way more complicated than I had initially imagined.
When I first read the article, the sequence of subgroups that were described evoked that image of a combination lock to me:
< UR >
< U, R >
< U, R, D >
< U, R, D, L >
< U, R, D, L, F >
The behavior of the basic operations on the cube reminds me of the product of the quaternion basis vectors (i, j, k). For instance, the product of i and j yields either k or -k depending on the order of i and j. The point I wanted to make is that on a combination lock, each operation on a wheel affects only that wheel, not the others, so one cannot produce one operation by combining several others, unlike what we see with quaternions. However, on the cube, it is often possible to go from one configuration to another by different sequences of different operations.
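For reference, the full multiplication rules are the standard ones:

$$i^2 = j^2 = k^2 = ijk = -1, \qquad ij = k = -ji, \quad jk = i = -kj, \quad ki = j = -ik$$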
But that may not matter much if all we care about is going through every possible combination exactly once, just as one does when using Gray code on binary numbers (which is why I alluded to it in my other post), and if for that purpose we can find a set of sequences of operations - let's call them large operations - that are orthogonal (thus emulating the rotating-wheel aspect of the combination lock). I suppose that these subgroups represent the large operations. The problem you bring up now is that these large operations are not commutative, so finding a correct way to apply them to build the circuit is more involved than simply spinning the wheels on a lock.
Is that correct?
Edit: I just had a first look at Cayley graphs on Wikipedia, and they use quaternion rotations as an example!
I think you are on the right track (sorry, I did not verify all the individual statements). If you weren't already aware of them, you might wish to learn what normal subgroups are, see how you can have a subgroup that's not a normal subgroup (and probably see what cosets are at the same time), and see how dividing a group by a normal subgroup (to yield a quotient group) works and what properties it has.
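In case it helps, the key definitions are short: a subgroup $N$ of $G$ is normal when it is stable under conjugation, and exactly then multiplication of cosets is well defined and makes them a group:

$$N \trianglelefteq G \iff gNg^{-1} = N \ \text{for all } g \in G, \qquad G/N = \{\, gN : g \in G \,\}, \quad (gN)(hN) = (gh)N$$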
Perhaps the problem could be different. Wouldn't the energy gathered from renewable sources otherwise be used by nature? At what point will the amount we capture be large enough to significantly impact the ecosystem?