Hacker News

I think the big achievement is how it surpassed the previous best models in each individual modality.


That's not true for NLU, at least. It is on par with 2019's RoBERTa on GLUE, and many larger, more advanced language models have come since.

It is still great work, though: a robust masked-representation architecture that works across modalities.
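For anyone curious what that architecture looks like in practice: data2vec trains a student network to predict, at masked positions, a teacher's contextualized representations (an average over the top layers), where the teacher's weights are an exponential moving average of the student's. Here's a toy numpy sketch of that objective; all sizes, the zero-masking, and the MSE loss are simplifications (the paper uses a smooth L1 loss), not the real implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumptions for illustration, not data2vec's real dimensions)
T, D, LAYERS, TOP_K = 8, 16, 4, 2

def encoder(x, weights):
    """A toy multi-layer encoder; returns the hidden states of every layer."""
    states, h = [], x
    for W in weights:
        h = np.tanh(h @ W)
        states.append(h)
    return states

# Student and teacher share the architecture; the teacher's weights
# are an exponential moving average (EMA) of the student's.
student_w = [rng.normal(scale=0.1, size=(D, D)) for _ in range(LAYERS)]
teacher_w = [w.copy() for w in student_w]

x = rng.normal(size=(T, D))    # one sequence of tokens / patches / frames
mask = rng.random(T) < 0.5     # positions the student must reconstruct

# Teacher sees the FULL input; the target is the mean of its top-K layers.
teacher_states = encoder(x, teacher_w)
target = np.mean(teacher_states[-TOP_K:], axis=0)

# Student sees the masked input (masked positions zeroed for simplicity).
x_masked = np.where(mask[:, None], 0.0, x)
student_out = encoder(x_masked, student_w)[-1]

# Regression loss only at masked positions.
loss = np.mean((student_out[mask] - target[mask]) ** 2)

# After each student update, move the teacher toward the student.
tau = 0.999
teacher_w = [tau * tw + (1 - tau) * sw for tw, sw in zip(teacher_w, student_w)]
```

Because the target is a continuous latent representation rather than words, pixels, or audio units, the same recipe applies to all three modalities; only the input featurization and masking strategy change.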


They mention that in the article:

> We apply data2vec separately to speech, images and text and it outperformed the previous best single-purpose algorithms for computer vision and speech and it is competitive on NLP tasks.



