
LLMs are extrapolation machines. They have some amount of hardcoded knowledge, and they weave a narrative around this knowledge base while extrapolating claims that are likely given the memorized training data. This extrapolation can take the form of logical entailment, high-probability guesses, or just wild guessing. The training regime doesn't distinguish between these kinds of prediction, so the model never learns to weight logical entailment heavily and suppress wild guessing. It turns out that much of the text we produce is highly amenable to extrapolation, so LLMs learn to be highly effective at bullshitting.
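
To make this concrete, here's a minimal sketch of the standard next-token objective (PyTorch; the shapes and tensors are made up for illustration). Notice that nothing in the loss separates an entailed token from a plausible-sounding guess; every position gets the same cross-entropy against the observed text:

    import torch
    import torch.nn.functional as F

    # stand-in for model output: (batch, seq_len, vocab)
    logits = torch.randn(4, 10, 50257, requires_grad=True)
    # the observed next tokens the model is trained to predict
    targets = torch.randint(0, 50257, (4, 10))

    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten to (batch*seq_len, vocab)
        targets.reshape(-1),                  # every token weighted identically
    )
    loss.backward()  # gradients push toward "likely given the corpus", nothing more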

LLMs are a general purpose computing paradigm. LLMs are circuit builders; the converged parameters define pathways through the architecture that pick out specific programs. Or as Karpathy puts it, LLMs are a differentiable computer[1]. Training LLMs discovers programs that reproduce the input sequence well. Roughly the same architecture can generate passable images, music, or even video.

The sequence of matrix multiplications is the high-level constraint on the space of discoverable programs. But the specific parameters discovered determine the specifics of information flow through the network, and hence what program is defined. The complexity of the trained network is emergent, meaning the internal complexity far surpasses that of the coarse-grained description of the high-level matmul sequence. LLMs are not just matmuls and logits.
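
A toy version of the point (numpy; the weights are contrived for illustration): the architecture below is fixed, a single sequence of matmuls plus a ReLU, yet one parameter setting makes it compute XOR and another makes it compute AND. The "program" lives in the parameters, not in the matmul skeleton:

    import numpy as np

    def net(x, W1, b1, W2):
        # fixed two-layer architecture; behavior is set entirely by the weights
        h = np.maximum(0, x @ W1 + b1)  # hidden layer (ReLU)
        return h @ W2                   # linear readout

    inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

    # one parameter setting picks out the XOR program...
    W1 = np.array([[1.0, 1.0], [1.0, 1.0]])
    b1 = np.array([-1.0, 0.0])
    print(net(inputs, W1, b1, np.array([-2.0, 1.0])))  # [0. 1. 1. 0.] -> XOR

    # ...a different readout picks out AND from the identical circuit
    print(net(inputs, W1, b1, np.array([1.0, 0.0])))   # [0. 0. 0. 1.] -> AND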

[1] https://x.com/karpathy/status/1582807367988654081


> LLMs are a general purpose computing paradigm.

Yes, so is logistic regression.


No, not at all.

Yes at all. I think you misunderstand the significance of "general computing". The binary string 01101110 is a general-purpose computer, for example.

No, that's insane. Computing is a dynamic process. A static string is not a computer.

It may be insane, but it's also true.

https://en.wikipedia.org/wiki/Rule_110


Notice that the Rule 110 string picks out a machine; it is not itself the machine. To get computation out of it, you have to actually do computational work, i.e., compare the current state and perform operations to generate the subsequent state. This doesn't just automatically happen in some non-physical realm once the string is put to paper.
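
Here's roughly what that work looks like (Python sketch): the number 110, binary 01101110, is just a lookup table. The computation is the loop that reads each neighborhood and applies the table to produce the next state; nothing happens until something executes it:

    RULE = 110  # binary 01101110: the entire 'machine description'

    def step(cells):
        # compare current state, generate subsequent state
        n = len(cells)
        out = []
        for i in range(n):
            left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
            pattern = (left << 2) | (center << 1) | right  # neighborhood as 0..7
            out.append((RULE >> pattern) & 1)              # look up the rule bit
        return out

    row = [0] * 15 + [1]  # a single live cell
    for _ in range(8):
        print("".join(".#"[c] for c in row))
        row = step(row)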

Computation doesn't care about its substrate. A simulation of a computation is just a computation.


>If we don't think the candle in a simulated universe is a "real candle", why do we consider the intelligence in a simulated universe possibly "real intelligence"?

I can smell a "real" candle, a "real" candle can burn my hand. The term real here is just picking out a conceptual schema where its objects can feature as relata of the same laws, like a causal compatibility class defined by a shared causal scope. But this isn't unique to the question of real vs simulated. There are causal scopes all over the place. Subatomic particles are a scope. I, as a particular collection of atoms, am not causally compatible with individual electrons and neutrons. Different conceptual levels have their own causal scopes and their own laws (derivative of more fundamental laws) that determine how these aggregates behave. Real (as distinct from simulated) just identifies causal scopes that are derivative of our privileged scope.

Consciousness is not like the candle because everyone's consciousness is its own unique causal scope. There are psychological laws that determine how we process and respond to information. But each of our minds is causally isolated from the others. We can only know of each other's consciousness by judging behavior. There's nothing privileged about a biological substrate when it comes to determining "real" consciousness.


Right, but doesn't your argument imply that the only "real" consciousness is mine?

I'm not against this conclusion ( https://en.wikipedia.org/wiki/Philosophical_zombie ) but it doesn't seem to be compatible with what most people believe in general.


That's a fair reading but not what I was going for. I'm trying to argue for the irrelevance of causal scope when it comes to determining realness for consciousness. We are right to privilege non-virtual existence when it comes to things whose essential nature is to interact with our physical selves. But since no other consciousness directly physically interacts with ours, it being "real" (as in physically grounded in a compatible causal scope) is not an essential part of its existence.

Determining what is real by judging causal scope is generally successful but it misleads in the case of consciousness.


I don't think causal scope is what makes a virtual candle virtual.

If I make a button that lights the candle, and another button that puts it out, and I press those buttons, then the virtual candle is causally connected to our physical world.

But obviously the candle is still considered virtual.

Maybe a candle is not as illustrative, but let's say we're talking about a very realistic and immersive MMORPG. We directly do stuff in the game, and with the right VR hardware it might even feel real, but we call it a virtual reality anyway. Why? And if there's an AI NPC, we say that the NPC's body is virtual -- but when we talk about the AI's intelligence (which at this point is the only AI we know about -- simulated intelligence in computers) why do we not automatically think of this intelligence as virtual in the same way as a virtual candle or a virtual NPC's body?


Yes, causal scope isn't what makes it virtual. It's what makes us say it's not real. The real/virtual dichotomy is what I'm attacking. We treat virtual as the opposite of real, therefore a virtual consciousness is not real consciousness. But this inference is specious. We mistake the causal scope issue for the issue of realness. We say the virtual candle isn't real because it can't burn our hand. What I'm saying is that, actually the virtual candle can't burn our hand because of the disjoint causal scope. But the causal scope doesn't determine what is real, it just determines the space and limitations of potential causal interactions.

Real is about an object having all of the essential properties for that concept. If we take it as essential that candles can burn our hand, then the virtual candle isn't real. But it is not essential to consciousness that it is not virtual.


Your view is missing the forest for the trees. You see individual objects but miss the aggregate whole. You have a hard time conceiving of how exotic computers can be conscious because we are scale chauvinists by design. Our minds engage with the world on certain time and length scales, and so we naturally conceptualize our world based on entities that exist on those scales. But computing is necessarily scale independent. It doesn't matter to the computation if it is running on a 100GHz substrate or a .0001Hz one. It doesn't matter if it's running on a CPU chip the size of a quarter or spread out over the entire planet. Computation is about how information is transformed in semantically meaningful ways. Scale just doesn't matter.

If you were a mind supervening on the behavior of some massive time/space scale computer, how would you know? How could you tell the difference between running on a human making marks with pen and paper and running on a modern CPU? Your experience updates based on information transformations, not based on how fast the fundamental substrate is changing. When your conscious experience changes, that means your current state is substantially different from your prior state and you can recognize this difference. Our human-scale chauvinism gets in the way of properly imagining this. A mind running on a CPU or a large collection of human computers is equally plausible.
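
A toy way to see the scale point (Python; the sleep stands in for substrate speed): the same update function run at wildly different clock rates yields an identical sequence of states. Nothing inside the computation can see how fast the substrate ticks:

    import time

    def update(state):
        # some fixed information transformation
        return (state * 31 + 7) % 1000

    def run(state, steps, seconds_per_step):
        trace = [state]
        for _ in range(steps):
            time.sleep(seconds_per_step)  # the substrate's "clock speed"
            state = update(state)
            trace.append(state)
        return trace

    fast = run(42, 5, 0.0)    # silicon-speed
    slow = run(42, 5, 0.01)   # pen-and-paper pace (scaled down for the demo)
    assert fast == slow       # the trace -- the computation -- is identical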

A common question people like to ask is "where is the consciousness" in such a system. This is an important question if only because it highlights the futility of such questions. Where is Microsoft Word when it is running on my computer? How can you draw a boundary around a computation when a multitude of essential and non-essential parts of the system work together to construct the relevant causal dynamic? It's just not a well-defined question. There is no one place where Microsoft Word occurs, nor is there any one place where consciousness occurs in a system. Is state being properly recorded and correctly leveraged to compute the next state? The consciousness is in this process.


"'where is the consciousness' in such a system": One could ask the same of humans: where is the consciousness? The modern answer is (somewhere) in the brain, and I admit that's likely true. But we have no proof--no evidence, really--that our consciousness is not in some other dimension, and our brains could be receiving different kinds of signals from our souls in that other dimension, like TV sets receive audio and video signals from an old fashioned broadcast TV station.

This brain-receiver idea just isn't a very good theory. For one, it increases the complexity of the model without any corresponding increase in explanatory power. The mystery of consciousness remains, except now you have all this extra mechanism involved.

Another issue is that the brain is overly complex for consciousness to just be received from elsewhere. Typically a radio is much less complex than the signal being received, or at least less complex than the potential space of signals it is possible to receive. We don't see that with consciousness. In fact, consciousness seems to be far less complex than the brain that supports it. The issue of the specificity of brain damage and the corresponding specificity in conscious deficits also points away from the receiver idea.


Now do anorexia, bulimia, or any number of social contagions. The difference between being allowed to be who you are vs. being encouraged into a lifestyle is not easy to distinguish.



>Most of the rest of the world subsidizes student tuition so students don't pay much out of pocket.

And they also severely restrict who can attend university. Of course this is a non-starter in the current US political environment.


In my country the only restriction on attending university is that you have a high school diploma.

Getting into the medical faculty is harder because the government pays for everything and training doctors is expensive; for those programs, the university picks the best and brightest.

The government also has programs in place to send out students to Harvard and MIT as the future elite of the nation.


>Yes, and most with a background in linguistics or computer science have been saying the same since the inception of their disciplines

I'm not sure what authority linguists are supposed to have here. They have gotten approximately nowhere in the last 50 years. "Every time I fire a linguist, the performance of the speech recognizer goes up".

>Grammars are sets of rules on symbols and any form of encoding is very restrictive

But these rules can be arbitrarily complex. Hand-coded rules have pretty severe complexity bounds, but LLMs show these are not in-principle limitations. I'm not saying theory has nothing to add, but perhaps we should consider the track record when placing our bets.
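
To ground that, here's a hypothetical hand-coded "grammar" (a toy, not anyone's real system). It covers a tidy fragment and breaks the moment ordinary text shows up, and every patch multiplies the rule count:

    import re

    # hand-coded rule: determiner, noun, verb, period
    HAND_CODED = re.compile(r"^(the|a) \w+ (sat|ran|slept)\.$")

    print(bool(HAND_CODED.match("the cat sat.")))              # True
    print(bool(HAND_CODED.match("the cat, exhausted, sat.")))  # False
    # one appositive and the rule set already needs revision; learned
    # models sidestep this because the rules are induced, not enumerated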


I'm very confused by your comment, but appreciate that you have precisely made my point. There are no "bets" with regard to these topics. How do you think a computer works? Do you seriously believe LLMs somehow escape the limitations of the machines they run on?


And what are the limitations of the machines they run on?

We're yet to find any process at all that can't be computed with a Turing machine.

Why do you expect that "intelligence" is a sudden outlier? Do you have an actual reason to expect that?


Is everything really just computation? Gravity is (or can be) the result of a Turing machine churning away somewhere?



>We're yet to find any process at all that can't be computed with a Turing machine.

Life. Consciousness. A soul. Imagination. Reflection. Emotions.


Again: why can't any of that run on a sufficiently capable computer?

I can't help but perceive this as pseudo-profound bullshit. "Real soul and real imagination cannot run on a computer" is a canned "profound" statement with no substance to it whatsoever.

If a hunk of wet meat the size of a melon can do it, then why not a server rack full of nanofabricated silicon?


For the same reason you don't sit and talk with rocks. Nobody understands how it is that wet meat can do these things but rocks cannot. And a computer is a rock. As such, we have no idea whether all the hunks of wet meat in the world can figure out how to transform rocks into wet meat.


You don't?

Modern computers can understand natural language, and can reply in natural language. This isn't even particularly new, we've had voice assistants for over a decade. LLMs are just far better at it.

Again: I see no reason why silicon plates can't do the same exact things a mush of wet meat does. And recent advances in AI sure suggest that they can.


What do you think these in principle limitations are that preclude a computer running the right program from reaching general intelligence?


Just when the "brain doesn't finish developing until 25" nonsense has finally waned from the zeitgeist, here comes a new pile of rubbish for people to latch onto. Not that the research itself is rubbish, but how they name/describe the phases certainly is. The "adolescent" and "adult" phases don't have any correspondence to what we normally think of as those developmental periods. That certainly won't stop anyone from using this as justification for whatever normative claim they want to make, though. It's just irresponsible.


LLMs aren't language models, but are a general purpose computing paradigm. LLMs are circuit builders; the converged parameters define pathways through the architecture that pick out specific programs. Or as Karpathy puts it, LLMs are a differentiable computer[1]. Training LLMs discovers programs that reproduce the input sequence well. Roughly the same architecture can generate passable images, music, or even video.

It's not that language generation is all there is to AGI, but that to sufficiently model text that is about the wide range of human experiences, we need to model those experiences. LLMs model the world to varying degrees, and perhaps in the limit of unbounded training data, they can model the human's perspective in it as well.

[1] https://x.com/karpathy/status/1582807367988654081

