That’s currently the case in C, in that you can convert pointers to and from uintptr_t. However, not every number representable in that type needs to correspond to valid memory (that’s true at the assembly level as well), hence the conversion is only defined for valid pointers: round-tripping a valid pointer through uintptr_t gives back a pointer that compares equal to the original, but casting an arbitrary integer to a pointer is not guaranteed to yield anything usable.
> I think a memory address is a number that CPU considers to be a memory address
I meant to say that, indeed, there must be some concept of a CPU for a memory address to have a meaning, and for this concept of a CPU to be as widely applicable as possible, surely defining it as abstractly as possible is the way to go. Ergo, the idea of a C abstract machine.
Anyway, other people in this thread are discussing the matter more accurately and in more detail than I could hope to do, so I'll leave it at that.
I think the article briefly touches on that topic at some point:
> For one, gpt-3.5-turbo-instruct rarely suggests illegal moves, even in the late game. This requires “understanding” chess. If this doesn’t convince you, I encourage you to write a program that can take strings like 1. e4 d5 2. exd5 Qxd5 3. Nc3 and then say if the last move was legal.
However, I can't say whether LLMs fall into the "statistical AI" category.
> Whereas the LLM makes "moves" that clearly indicate no ability to play chess: moving pieces to squares well outside their legal moveset, moving pieces that aren't on the board, etc.
Do you have any evidence of that? TFA doesn't talk about the nature of these errors.
> Yeah like several hundred "Chess IM/GMs react to ChatGPT playing chess" videos on youtube.
If I were to take that sentence literally, I would ask for at least 199 other examples, but I imagine it was just a figure of speech. Nevertheless, if it's only one player complaining (even several times), can we really conclude that ChatGPT cannot play? Is that enough evidence, or is there something else at work?
I suppose indeed one could, if one expected an LLM to be ready to play out of the box, and that would be a fair criticism.
I am in no way trying to judge you; rather, I'm trying to get closer to the truth in this matter, and your input is valuable, as it points out a discrepancy wrt TFA. But it is also subject to caution, since it reports the results of only one chess player (right?). Furthermore, in the case of both TFA and this YouTuber, we don't have full access to their experiments, so we can't reproduce the results, nor can we try to understand why there is a difference.
I might very well be mistaken though, and I am open to criticisms and corrections, of course.
Replace the word with one of your own choosing, if that will help us get to the part where you have a point to make?
I think we are discussing whether LLMs can emulate chess-playing machines, regardless of whether they are actually, literally composed of a flock of stochastic parrots.
But this math analogy is not quite appropriate: there's abstract math and there's arithmetic. A good math practitioner (LLM or human) can be bad at arithmetic yet good at abstract reasoning. The latter doesn't (necessarily) require the former.
In chess, I don't think you can build a good strategy if it relies on illegal moves, because tactics and strategy are tied together.
However, in this specific instance, even if the text cannot be changed, couldn't the server process and signal the error differently, e.g. by returning status code 413[1], since clients ought to recognize that status code anyway?
Since the caller gets this as an error object, instead of as a plain string, it seems likely that this happens within the same process, i.e. a library function returns the MaxBytesError to a higher level in the business logic, without a network transmission in between.
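Where the error does end up crossing an HTTP boundary, though, the mapping suggested above is straightforward. A minimal sketch, assuming Go 1.19+ (which exposes *http.MaxBytesError from net/http); the handler and the 1 MiB limit are made up for illustration:

```go
package main

import (
	"errors"
	"io"
	"log"
	"net/http"
)

// handler is a hypothetical endpoint that caps the request body and
// maps the in-process MaxBytesError to a 413 status for the client.
func handler(w http.ResponseWriter, r *http.Request) {
	r.Body = http.MaxBytesReader(w, r.Body, 1<<20) // 1 MiB cap

	if _, err := io.ReadAll(r.Body); err != nil {
		var maxErr *http.MaxBytesError
		if errors.As(err, &maxErr) {
			// 413: the dedicated "Content Too Large" status code.
			http.Error(w, "request body too large",
				http.StatusRequestEntityTooLarge)
			return
		}
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/upload", handler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```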
Sorry, I understand it was a bit intrusively direct. To bring some context: I toyed a little with neural networks a few years ago and wondered myself about this topic of training a so-called quantized network (I wanted to write a small multilayer-perceptron library parameterized by the coefficient type - floating point or integer of various precisions), but didn't implement it. Since you mentioned your own work in that area, it piqued my interest, but I don't want to waste your time unnecessarily.
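For what it's worth, the "parameterized by the coefficient type" part is the kind of thing generics express directly. A minimal sketch in Go for concreteness, with made-up names, that deliberately ignores the scale/zero-point bookkeeping real quantized inference needs:

```go
package main

import "fmt"

// Coefficient is a hypothetical constraint covering the weight types
// mentioned above: floats or integers of various precisions.
type Coefficient interface {
	~int8 | ~int16 | ~int32 | ~float32 | ~float64
}

// dot is one building block of a perceptron layer, written once for
// every coefficient type. Overflow handling for narrow integer types
// is omitted from this sketch.
func dot[T Coefficient](weights, inputs []T) T {
	var acc T
	for i := range weights {
		acc += weights[i] * inputs[i]
	}
	return acc
}

func main() {
	fmt.Println(dot([]float32{0.5, -0.25}, []float32{2, 1})) // 0.75
	fmt.Println(dot([]int8{64, -32}, []int8{1, 1}))          // 32
}
```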
My intuition when reading the first lines of this article was that, just as when searching exhaustively for the correct combination on a padlock, one would cycle through each subgroup, where each of them would represent a digit on the lock. On the lock, one would take 9 steps (not 10, as that would loop the lock back to a previously seen combination) on the least significant digit, then propagate the carry to the next digits. But it seems that this is more complicated than that, as the steps at which subgroups connect (the carry) are not always the same?
Clearly that first intuition doesn't work. Is the Hamiltonian cycle for decimal numbers perhaps an equivalent of Gray code? And if it exists, is there a connection with the Rubik's cube cycle?
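For the binary case, at least, the Gray code has a closed form, and it's easy to check the one-change-per-step property that makes the lock analogy work. A small sketch:

```go
package main

import "fmt"

// gray returns the i-th binary reflected Gray code. Consecutive
// outputs differ in exactly one bit - a Hamiltonian cycle on the
// corners of the bit-string hypercube.
func gray(i uint) uint { return i ^ (i >> 1) }

func main() {
	for i := uint(0); i < 8; i++ {
		fmt.Printf("%d -> %03b\n", i, gray(i))
	}
	// 000 001 011 010 110 111 101 100: each step flips one bit,
	// like turning a single wheel of the lock by one notch.
}
```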
The reason why it's not so simple is that various operations on the cube do not commute (whereas rotations of different wheels on a combination lock do).
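A tiny way to see this, since cube moves are just permutations of stickers: compose two toy permutations in the two possible orders (values made up for illustration):

```go
package main

import "fmt"

// compose returns the permutation "apply p first, then q".
func compose(p, q []int) []int {
	r := make([]int, len(p))
	for i := range p {
		r[i] = q[p[i]]
	}
	return r
}

func main() {
	a := []int{1, 0, 2} // swap positions 0 and 1
	b := []int{0, 2, 1} // swap positions 1 and 2
	fmt.Println(compose(a, b)) // [2 0 1]
	fmt.Println(compose(b, a)) // [1 2 0] - different order, different result
}
```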
Yes indeed, I realized that it was way more complicated than I had initially imagined.
When I first read the article, the sequence of subgroups that were described evoked that image of a combination lock to me:
< UR >
< U, R >
< U, R, D >
< U, R, D, L >
< U, R, D, L, F >
The behavior of the basic operations on the cube reminds me of the product of the quaternion basis vectors (i, j, k). For instance, the product of i and j yields either k or -k depending on the order of i and j. The point I wanted to make is that on a combination lock, each operation on a wheel affects only that wheel, not the others, so one cannot produce one operation by combining several others, unlike what we see with quaternions. However, on the cube, it is often possible to go from one configuration to another by different sequences of different operations.
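For reference, the full multiplication rules are the standard ones:

$$i^2 = j^2 = k^2 = ijk = -1, \qquad ij = k = -ji, \quad jk = i = -kj, \quad ki = j = -ik$$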
But that may not matter much if all we care about is going through every possible combination exactly once, just as one does when using Gray code on binary numbers (which is why I alluded to it in my other post), and if for that purpose we can find a set of sequences of operations - let's call them large operations - that are orthogonal (thus emulating the rotating-wheel aspect of the combination lock). I suppose that these subgroups represent the large operations. The problem you bring up now is that these large operations are not commutative, so finding a correct way to apply them to build the circuit is more involved than simply spinning the wheels on a lock.
Is that correct?
Edit: I just had a first look at Cayley graphs on Wikipedia, and they use quaternion rotations as an example!
I think you are on the right track (sorry, I did not verify all the individual statements). If you weren't already aware of them, you might wish to learn what normal subgroups are, see how you can have a subgroup that's not a normal subgroup (and probably see what cosets are at the same time), and see how dividing a group by a normal subgroup (to yield a quotient group) works and what properties it has.
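In case it helps, the key definitions are short: a subgroup $N$ of $G$ is normal when it is stable under conjugation, and exactly then multiplication of cosets is well defined and makes them a group:

$$N \trianglelefteq G \iff gNg^{-1} = N \ \text{for all } g \in G, \qquad G/N = \{\, gN : g \in G \,\}, \quad (gN)(hN) = (gh)N$$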
Perhaps the problem could be different. Wouldn't the energy gathered from renewable sources otherwise be used by nature? At what point will the amount we capture be large enough to significantly impact the ecosystem?