
Something I do sometimes is:

- Have an AI chat model come up with an answer to a problem.

- Have it write a report discussing the details of the problem and why its answer is correct, directed at a person or AI model who has no knowledge of the initial problem or technical field.

- Have a second AI model with no knowledge of the problem grade the report, and write its own report either (a) asking for clarification / more information about the problem that the original model didn't provide or (b) pointing out an inconsistency in the argument posed by the original model. Give this report back to the original model and ask it to write its own report back with either the necessary information or changes.

- Repeat until either the second AI model is convinced by the first AI model's explanation or the first AI model has implemented all the changes requested by the second AI model.

It's super clunky but has given pretty good results in the cases where I tried it lol
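
Roughly, the loop looks like the sketch below. chat(model, messages) is a hypothetical helper standing in for whatever API you use, and the CONVINCED marker is just one assumption about how you'd detect convergence:

    # Rough sketch of the two-model report/critique loop described above.
    # chat(model, messages) is a hypothetical helper that returns the
    # assistant's reply as a string -- swap in your actual API client.

    def debate(problem, solver="model-a", reviewer="model-b", max_rounds=5):
        answer = chat(solver, [
            {"role": "user", "content": f"Solve this problem:\n{problem}"}])
        report = chat(solver, [
            {"role": "user", "content":
             "Write a report explaining the problem and why this answer is "
             "correct, for a reader with no knowledge of the problem or field.\n\n"
             f"Problem: {problem}\n\nAnswer: {answer}"}])

        for _ in range(max_rounds):
            # The reviewer only ever sees the report, never the original problem.
            critique = chat(reviewer, [
                {"role": "user", "content":
                 "Grade this report. Either ask for missing information or "
                 "point out an inconsistency in its argument. Reply CONVINCED "
                 f"if neither applies.\n\n{report}"}])
            if "CONVINCED" in critique:
                break
            # The solver revises its report in light of the critique.
            report = chat(solver, [
                {"role": "user", "content":
                 "Revise your report to address this critique.\n\n"
                 f"Report:\n{report}\n\nCritique:\n{critique}"}])
        return report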



Ah, now we know why Spain was out of electricity yesterday.


Here I was thinking cryptocurrency pre-heated the grids (and GPU manufacturing) for us already.


Oh that was a good one XD


For anything semi-adversarial, I have had good results asking the AI to come up with a plan, then having it take the side of the opponent and come up with counter-play / ways to defeat the plan, and finally asking for a revision of the initial plan given the potential reaction from the opponent.

The final plan you obtain is generally a lot more well rounded and thought out.

I find that amusing because the technique also works when I apply it to me. Picking flaws in your plan before revisiting it actually works.
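
The chain is basically three prompts; a sketch (chat() is a hypothetical single-turn helper and the prompt wording is just an assumption):

    # Sketch of the plan -> opponent's counter-play -> revision chain.
    # chat(prompt) is a hypothetical single-turn helper returning a string.

    def adversarial_plan(goal):
        plan = chat(f"Come up with a plan to achieve: {goal}")
        counter = chat(
            "Take the side of the opponent. How would you counter or "
            f"defeat this plan?\n\n{plan}")
        revised = chat(
            "Revise the original plan to account for this likely "
            f"reaction.\n\nPlan:\n{plan}\n\nOpponent's counter:\n{counter}")
        return revised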


To be honest, this is what I assumed this repo was doing from the title. It talks about arguing with itself, but it looks like it's just generating multiple alternative responses in parallel and selecting the best one.

Do you find your method handles "sycophancy" well?


I don’t really know.

I stopped using ChatGPT at some point because I disliked how cagey it became about a lot of topics. I used to enjoy making it write improbable movie mashups when GPT-3 was released, and at some point it became very touchy about IP rights and violence, which was annoying.

I generally use Deepseek nowadays which is not sycophantic and surprisingly doesn’t seem as censored to me especially if you use a version not hosted by Deepseek themselves.


Which hosting service would you recommend?


I do the same, and I have one other technique.

I will often have a few chats going for a project, but with different contexts. For example, one might be tech focused, another marketing focused, another with some context on my personal goals, etc.

So I will take the same question and feed it into the chats with differing context. It is almost like having different perspectives on the same problem. And the conclusions can often differ based on the differing contexts.
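
Mechanically it amounts to something like this sketch, where chat(system, question) is a hypothetical helper and the context strings are placeholders:

    # Sketch: ask the same question against several differently-primed chats.
    # chat(system, question) is a hypothetical helper returning a string.

    contexts = {
        "tech": "You are advising on the technical architecture...",
        "marketing": "You are advising on positioning and marketing...",
        "personal": "Here are my personal goals for this project...",
    }

    question = "Should we build the mobile app before the web version?"

    answers = {name: chat(system, question)
               for name, system in contexts.items()}

    for name, answer in answers.items():
        print(f"--- {name} perspective ---\n{answer}\n")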


This is how I’ve been using Gemini and it’s the first time I’m really seeing consistent value.

I’ll get a context into a solid place with as much information as I can about a project. Usually getting up to 100k tokens.

Then I ask it to give me a summary I can use in a fresh chat, one that will maintain the current context. This lets me reclaim space, bring responsiveness back to sane levels, and have a baseline chat I use to spin up branches for marketing, design (it's pretty helpful at troubleshooting Substance Designer graphs), etc.

I’ve found myself going into sub branches from there… like a marketing context that pushes branches into different marketing channels.
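
In rough code the hand-off looks something like this (a sketch; chat() is a hypothetical helper taking a message list, and the prompt wording is an assumption):

    # Sketch of the "summarize, then branch" workflow.
    # chat(messages) is a hypothetical helper; `history` is the long
    # (~100k-token) conversation built up so far, as a list of messages.

    summary = chat(history + [
        {"role": "user", "content":
         "Summarize everything relevant so far into a briefing I can "
         "paste into a fresh chat to preserve this context."}])

    def branch(focus, question):
        # Each branch starts from the compact summary, not the full history.
        return chat([
            {"role": "user", "content": f"{summary}\n\nFocus: {focus}"},
            {"role": "user", "content": question}])

    marketing = branch("marketing", "Which channels should we test first?")
    design = branch("design", "Why is this Substance Designer graph noisy?")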


This reminds me a lot of the YT video that went over using Monte Carlo Tree Search with LLMs to maximize result quality. Link: https://www.youtube.com/watch?v=mfAV_bigdRA&ab_channel=Treli...

It seemed like a pretty good idea, though I'd guess that it would greatly increase token usage. I'd also be concerned that the LLM as a judge might struggle to grade things accurately if it wasn't also able to generate good enough answers to begin with.
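
The general shape of MCTS over LLM outputs is roughly the sketch below (all assumptions on my part, not the video's code: generate() extends a draft, judge() is an LLM-as-judge returning a score in [0, 1], and the tree policy is plain UCT). The token-usage concern shows up directly, since every iteration costs at least one generation plus one judge call:

    # Very rough sketch of MCTS-style search over LLM drafts.
    # generate(draft) -> a new/extended draft (hypothetical LLM call)
    # judge(draft)    -> score in [0, 1]     (hypothetical LLM-as-judge call)
    import math, random

    class Node:
        def __init__(self, draft, parent=None):
            self.draft, self.parent = draft, parent
            self.children, self.visits, self.value = [], 0, 0.0

        def uct(self, c=1.4):
            if self.visits == 0:
                return float("inf")
            return (self.value / self.visits +
                    c * math.sqrt(math.log(self.parent.visits) / self.visits))

    def search(prompt, iterations=30, branching=3):
        root = Node(prompt)
        for _ in range(iterations):
            # Selection: walk down by UCT until reaching a leaf.
            node = root
            while node.children:
                node = max(node.children, key=Node.uct)
            # Expansion: ask the LLM for a few candidate continuations.
            if node.visits > 0:
                node.children = [Node(generate(node.draft), node)
                                 for _ in range(branching)]
                node = random.choice(node.children)
            # Simulation: the judge scores the draft (this is the token-hungry part).
            reward = judge(node.draft)
            # Backpropagation.
            while node:
                node.visits += 1
                node.value += reward
                node = node.parent
        best = max(root.children, key=lambda n: n.visits)
        return best.draft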


If you think about marginal cost, such experiments can be run for little more than the cost of the extra electricity used for that computation, which in Europe is often effectively zero, at least for those who own the compute.


Kagi’s Assistant feature makes this super easy. Just switch assistants and ask them to check the other’s work.


How?


Ask the AI assistant for instructions.

Pretty soon we'll have new acronyms such as "IDKATFAIA" ["I don't know, ask the f'ing AI already"] as we all succumb to the knowledge soup.


RTFP


Read The Fine Prompt, more or less, right?


Honestly, the AI assistant isn't as smart as I thought - I'm still having to check its work.


I do it all the time in SillyTavern in a group chat: three characters kind of resembling what you just described, plus me, participating in the "conversation", with them going back and forth until they're satisfied.

With a good model role-playing them, it works awesome.


Were there any situations where the AI's first conclusion was completely changed? Can you give some general examples of situations where it changed or significantly improved the overall result? It sounds cool.


I would be interested to know how often "oscillations" occur, where they flip-flop from being too "agreeable" to challenges (which is probably just a sparse latent space). This happens to me pretty frequently: you can repeatedly say "no, that's wrong" and the LLM will do a 180, explaining why it was "in fact" wrong and you are "right", repeat.


Isn't this kind of another way of doing inference-time scaling? It basically produces several chains of thought and then pursues the one that has maximum reward based on an internal function?
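
The degenerate best-of-n version of that is just the sketch below, where generate() and reward() are hypothetical stand-ins for sampling and the internal scoring function:

    # Sketch of best-of-n inference-time scaling:
    # sample several chains of thought, keep the highest-reward one.
    # generate(prompt) and reward(candidate) are hypothetical helpers.

    def best_of_n(prompt, n=8):
        candidates = [generate(prompt) for _ in range(n)]
        return max(candidates, key=reward)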


I've wondered if it might be helpful to randomly "shard" training data between two LLMs; just feed half the training data to one, and the rest to the other, with no overlap.

So instead of using two models, you'd be making two halves of one model do a similar (deliberative) process to yours. I wonder if that would result in a benefit over a single model with the full training set, and if you could continue to do the same thing by sharding the shards.
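
As a toy sketch of what that might look like (train(), answer(), and critique() are hypothetical stand-ins; corpus and problem are assumed inputs):

    # Toy sketch: shard the training data 50/50 with no overlap,
    # train two models, then have each grade the other's answers.
    # train(), answer(), and critique() are hypothetical stand-ins.
    import random

    random.shuffle(corpus)
    half = len(corpus) // 2
    model_a = train(corpus[:half])    # sees only the first shard
    model_b = train(corpus[half:])    # sees only the second shard

    draft = answer(model_a, problem)
    review = critique(model_b, draft)  # the grader never saw model_a's shard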


There's some precedent for that: you can do some useful things with the cross entropy of the two models. And k-fold cross validation might also be relevant.
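
For example, you can compare the two models' next-token distributions directly on held-out text. A sketch using Hugging Face transformers (the model names are placeholders, and it assumes both models share a tokenizer/vocabulary):

    # Sketch: cross-entropy of model B's predictions under model A's
    # distribution, averaged over a text -- one rough measure of how much
    # two disjointly trained models disagree.
    # Model names are placeholders; assumes a shared tokenizer/vocabulary.
    import torch
    import torch.nn.functional as F
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("model-a")
    model_a = AutoModelForCausalLM.from_pretrained("model-a")
    model_b = AutoModelForCausalLM.from_pretrained("model-b")

    ids = tok("some held-out text to compare on", return_tensors="pt").input_ids
    with torch.no_grad():
        logits_a = model_a(ids).logits   # shape (1, seq, vocab)
        logits_b = model_b(ids).logits

    p_a = F.softmax(logits_a, dim=-1)
    log_p_b = F.log_softmax(logits_b, dim=-1)
    cross_entropy = -(p_a * log_p_b).sum(dim=-1).mean()
    print(float(cross_entropy))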


This takes such a long time to do though, no? What problems does this save you time on?


I don't understand, is it doing your schoolwork?



