Google: Auto-translated content not indexed (languageops.com)
42 points by luxpir on Nov 12, 2021 | 28 comments


I can say of myself that I kinda pioneered "auto-translating at scale for SEO traffic" in 2008/2009. At that time I was working for a startup incubator, and one of our internal clients was tripwolf.com.

We had major success with an automated, aggregated SEO strategy for 123people.com and wanted to apply the learnings to the travel information space.

So we got high-quality content from a lot of travel guide publishers, combined it with some aggregated yellow-pages data, and translated the mostly German base content into en, fr, es, pt, and so on.

And it worked.

Like crazy. For a short time we attracted more traffic than TripAdvisor and Yelp combined (based on the competitive traffic data we had at the time). Traffic (and my ego) exploded. We also did not go against any Google guidelines, other than one: if you are spammy, you are spammy.

Then the Google guidelines were updated (automatically translated content was classed as "pure spam"), and over the following days Google's hammer came down on different sections of the website, in different markets, and so on. (Much later the company migrated to a native app business model and was quite successful for a few years.) The Portuguese content held out the longest and was still getting substantial traffic a few years later.

To my knowledge we were the biggest auto-translation player at that time, and we did it better than anyone else.

But all in all, it was 2008/09, and at that time the difference between a traffic channel and a product was not yet common knowledge among online startups. Getting traffic via Google plus ads was seen as a sustainable business case. It was not. We had no real product and too much focus on SEO. So, all in all, that strategy had negative long-term value.

Nowadays I refuse to take on any clients who do not have at least an MVP in place, because even more traffic than your servers can handle will never save your startup.


Now that you admitted this I'll admit I feel a mighty urge to downvote and/or flag you since I hate this kind of content deeply.

That said, I won't do it; I'd rather people tell these stories than keep quiet.

That said: everyone else, stop now before we need to dust off the venerable LOIC technology and nuke you from orbit ;-)

(just joking, I won't do that but I also won't do business with anyone who pollutes the Internet with that kind of cr@p if I have alternatives.)


Why do you hate content made available in another language through automatic means? I've read lots of things auto-translated into English and it's always at least acceptable.


Translation from languages I don't know to English is fine with me.

Translation from well written text in languages I know to absolute garbage in my native language, that is what I hate.

Edit: I see you have a point. I missed the fact that GP was translating German to English.

I guess people who know German slightly better than me (I can make myself understood and have used it as a working language for a very short while) will hate auto-translated German almost as much as I and others hate auto-translated English, but not quite as much:

A lot of the problem with auto-translated text is that the physical thing or software we are supposed to use still has an English user interface, so we are left trying to decode what the original text said.


There are obviously translation tools out there if you want to use them for yourself. But it seems fairly obvious to me that it's not desirable to flood the web and search results with a bunch of barely/sort-of understandable text that is the output of ML translation.


It hasn't worked like that for about a decade now, and that's a good thing.

It was at a time when I saw search (engine optimisation) mostly as a technical challenge. It isn't anymore, if it ever was.


I see this a little differently. When I'm in a different country like Japan and I search for something, the results are very relevant.

When I go home to the US and search "something Japan" or "something site:*.jp" the results are worse than useless and completely different from what I get in Japan.

The internet is supposed to be our universal bridge, but here it is unwittingly segmenting us into fractured universes presented as uniform.


It's a bit annoying that the default is like that, but you can change it. On the Google search results page, click on the gear button, go to `Search settings`, and set `Region Settings` to Japan.


Which for whatever reasons doesn't really work very well. You can request that the region be different, but it doesn't seem that the setting is uniformly respected or doesn't work in a useful way.

I ran into this when I was stuck using a connection that was detected as non-USA and was shared by people who would also be non-USA. I gave up on Google Search because I just could not get anything from Google to respect my requested region in a useful way. Even Google Maps on the web was problematic, because it would constantly try to move the view right back to where it thought I was. It's quite annoying to search for something like "$food near Massachusetts, USA" and get a list of places in Mexico one week and Canada the next.


Definitely a welcome change. Low-quality auto-translated results have become a major issue when trying to find anything in my native language. For a lot of searches, literally most of the results end up being an incoherent mess of spammy auto-translated links.


Finally!

Note that the problem with auto-translated StackOverflow clones in Russia is so severe that browser extensions and adblock lists were created just for this purpose. E.g. https://github.com/Nebula-Mechanica/Anti-AutoTranslation-Lis...

And this kind of spam is one of the reasons why I switched from Google to DuckDuckGo for web search.


Yea, that's exactly what I'm experiencing at the moment. I created translations for my website with DeepL. One language I manually corrected, the other I left as it is. You can guess which one is ranking really well and bringing in lots of customers. The automatic translation basically didn't bring any customers at all.

Now I'm hiring translators from upwork to improve the DeepL translations. I pay around $15 per hour. You can go even cheaper too if you want a translation to a language that is spoken in developing countries.

It's about 50% cheaper to have an existing DeepL translation and ask the translators to proofread it than to have them translate from scratch (even though I would've thought they'd base their translation on an automatic one first anyway).


I think it would be interesting to translate 98% of any website into the native tongue of the person viewing the content, but leave the remaining 2% in the website author's original language. The reason would be to eventually have everyone understand some key words in each other's languages. It's a wild concept: eventually the internet would have one main mashed-up universal tongue.


So Google will penalize you for using their own Google translations, and you're supposed to use humans instead? And this is on the blog of a human translation service. I wonder how true this is, given the incentives of the source.

How does Google's machine know whether we are using humans or machine translation?


If I were them, I would keep some kind of hash of any stuff they have translated, so that when they index it they know if it's the output of their own translator.

They probably also have ML models that try to detect machine translations too. If a human can pick up a machine translation easily, a machine can probably be trained to detect it.

I can see why they want to do that, too. Multilingual data is very useful for machine language understanding (the computer effectively has two independent ways to understand what's being said), but contaminating the data with machine translations makes it nearly useless.
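
A minimal sketch of the hashing idea, assuming a hypothetical `TranslationRegistry` that stores a SHA-256 fingerprint of every string a translation service has emitted (this is an illustration only, not how Google actually does it). It normalizes case and whitespace before hashing, and it also shows the obvious weakness: rewording a single phrase yields a different hash, so the lookup misses.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    # Lowercase and collapse runs of whitespace so trivial reformatting
    # does not change the hash; any real rewording still does.
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class TranslationRegistry:
    """Hypothetical store of fingerprints for all translator output."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def record(self, translated: str) -> None:
        # Called by the (hypothetical) translation service for each output.
        self._seen.add(fingerprint(translated))

    def is_own_output(self, page_text: str) -> bool:
        # Called at indexing time: exact match after normalization only.
        return fingerprint(page_text) in self._seen


registry = TranslationRegistry()
registry.record("The museum is open every day from 9 to 5.")

print(registry.is_own_output("The museum is  open every day from 9 to 5."))  # True
print(registry.is_own_output("The museum is open daily from 9 to 5."))  # False
```

An exact-match hash like this only catches verbatim copies, which is why a classifier trained to spot machine-translated text would be the more robust approach.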


>If I were them, I would keep some kind of hash of any stuff they have translated, so that when they index it they know if its the output of their own translator.

That would be trivial to work around.


Then just don't use Google translator. There are some that are better quality anyway.


> The best practice as of 2021 is to take your existing, best-performing content and get it professionally translated or re-written.

Yeah because we've all got money to burn on hiring translators. Another example of small websites being eradicated from the [discoverable] Web.


I don't really understand this complaint. You are running a small website, and you want it to be promoted to users in another language, but you don't want to spend any resources in translating or localizing the content?


Maybe I don't have the spare capital as I'm a fledgling start-up, but still want to be able to reach as many people as possible?


But do people have any interest in seeing text that is machine-translated without editing? MT is good, but you can still tell that text has gone through it; for me, reading it when I haven't triggered the translation myself elicits the same reaction as spam emails with obvious typos.


Maybe you should focus on targeting users who you can write high-quality content for, or invest in quality translations. Don't use machine translation as a crutch.


Machine translation (and transcription) is better than nothing--especially for personal use. But it's pretty mediocre at best. So if you publish it as finished material I'd expect it to be treated like any other low quality content. You don't get some special pass because you're a "small website."


I have to admit that I have never seen a small website using auto-translated content. I see it more frequently on the content mills that no one wants, in Microsoft documentation (oh, the horrors!) and most recently on various shopping websites that want to show you offers from other countries (eBay fails terribly at this). I don't think small website owners are in trouble; the big ones are.


> I see that more frequently on the content mills that no one wants

Some of those Wikipedia-republishing SEO spam sites seem to take the English Wikipedia and then auto-translate that into other languages instead of directly mirroring each country's native Wikipedia. At least they're good for a laugh because whatever translation provider they're using still tends towards those hilariously literal translations that no longer happen that frequently with Google.


Cheer Please vacate leave your location site almost it had been as it is. MT outputs aren't as good as it might look when applied twice, and often needs to be compared against original text to make out intention. They also "sound" robotic and creepy.

Once I've encountered a bunch of product advertised as "applies delight scam". Make a guess at it.


Thanks for your comment. I like it very much. It seems strange to me that people think that machine translated text is perfectly acceptable. Maybe people will appreciate it more after reading this post.


Users are sick of scale at all costs. Global scale with crappy service isn’t a birthright.

