balderdash?
"Q-star". Yes, the Q as in Q-learning -- optimize a long-term goal. The "star points" are the embedded algorithms discovered and joined within the transformer/NN architecture. Stars were formed after SGD discovered the best representation of said embedded algorithm type.
I'm running a scaled-down version myself -- somewhat impressive. Do it at 1k B parameters? Hold my beer.
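For anyone who hasn't seen the "optimize a long-term goal" part of Q-learning in action, here is a minimal sketch of plain tabular Q-learning on a made-up 4-state chain. Everything in it (the environment, `step`, `train`, `greedy_policy`, the constants) is hypothetical and for illustration only; it says nothing about how any "Q-star" system actually works.

```python
# Plain tabular Q-learning on a hypothetical 4-state chain (illustration only).
N_STATES = 4          # states 0..3; state 3 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
ALPHA = 0.5           # learning rate (assumed value)
GAMMA = 0.9           # discount factor (assumed value)


def step(state, action):
    """Deterministic toy environment: reward 1.0 only on reaching the last state."""
    if action == 0:
        nxt = max(0, state - 1)
    else:
        nxt = min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    reward = 1.0 if done else 0.0
    return nxt, reward, done


def train(sweeps=200):
    """Sweep every (state, action) pair and apply the Q-learning update."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s in range(N_STATES - 1):          # terminal state is never updated
            for a in ACTIONS:
                nxt, r, done = step(s, a)
                # Bootstrapped target: immediate reward plus discounted best future value.
                target = r if done else r + GAMMA * max(q[nxt])
                q[s][a] += ALPHA * (target - q[s][a])
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

The reward only shows up at the far end of the chain, yet the learned values for the earlier states end up favoring "right" because each update pulls in the discounted value of the next state. That is the long-term part.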
Yes, they stole swipekit.com "from me" two days after I searched for it on the GoDaddy site (in 2014). Last I checked, it was owned by one of their employees (who has been taken to court for similar troubles). Can't we f'n do something about that? Or I suppose that's business as usual.
[Ilya Sutskever’s Safe Superintelligence expands in Tel Aviv] https://www.calcalistech.com/ctechnews/article/rk3schwk1x