The visualizations seem to show non-recurrent networks, whereas my understanding is that one of the important differences between GPT-1 and GPT-2/GPT-3 is the use of recurrent networks.
This allows the output to loop back into the network, providing a rudimentary form of memory/context beyond just the current input vector.
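For concreteness, here is a minimal sketch of the kind of recurrence described above: a generic Elman-style RNN cell in NumPy, where the hidden state feeds back into the next step. All names here are illustrative, not taken from any GPT implementation.

```python
import numpy as np

# Illustrative recurrent cell: the hidden state h is fed back in at
# each step, so the output at time t depends on more than the current
# input vector alone.
def rnn_step(x, h, W_xh, W_hh, b_h):
    return np.tanh(x @ W_xh + h @ W_hh + b_h)

rng = np.random.default_rng(0)
d_in, d_hid = 4, 8
W_xh = rng.normal(size=(d_in, d_hid)) * 0.1
W_hh = rng.normal(size=(d_hid, d_hid)) * 0.1
b_h = np.zeros(d_hid)

h = np.zeros(d_hid)  # memory starts empty
for t, x in enumerate(rng.normal(size=(5, d_in))):  # a short input sequence
    h = rnn_step(x, h, W_xh, W_hh, b_h)  # h loops back into the next step
    print(t, h[:3])  # the state accumulates context across steps
```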