Plus, T&S/AI safety is not solved by translation alone; you need lexicons and datasets of examples.
For example, people use someone in Malaysia to label the Arabic spoken by someone playing a video game in Doha, so the cultural context is missing.
The best proxy I found for the degree of lopsidedness was this: https://cdt.org/insights/lost-in-translation-large-language-...
Which in turn had to base its figures on this: https://stats.aclrollingreview.org/submissions/linguistic-di...
As far as I am aware, LLM capability degrades once you move away from English, and many nation states are either building their own LLMs or considering it.