
> Curation doesn't scale and often assumes a single unifying ontology

Wikipedia is a pretty big exception to that assertion. Perhaps DMOZ (a clone of Yahoo circa 1996) is not the only way to do curation. Perhaps Wikipedia could apply what has worked for its articles, i.e. develop a set of POV-neutral criteria for organizing collections of links and then invite everyone to participate.

It's really easy to be negative. But that's something that might at least be an interesting research project for the #1 open-curation system in the world.



You make a fair point. I'm not rubbishing Wikipedia, just questioning the supposed USP. I would also point out that a Wikipedia article and a set of search results are apples and oranges.

The article is written once, then modified or evolved occasionally by (almost exclusively) humans, but very frequently read. It is intended to be intelligible, being structured and written in natural language. It has a very well-defined scope within a flat namespace, and often clear relations to multiple formal ontologies. It is structured to be consumed in part or in whole, and may contain rich media and strong supporting contextual information (related pages).

By contrast a search result summarizes a set of potential information sources that may answer a search query in whole or in part, to various definitions of "answer". It is generally written once, by a computer, and thrown away after some period of caching. It is intended to be concise. Each component result has relatively poor context, relying upon the searcher to interpret timeliness, authority, notability, uniqueness, comprehensibility, etc. with the limited information presented, typically a very short content excerpt. It is structured to be scanned, classically in a ranked fashion from "best hit" to "worst hit", and is generally a wall of text.
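
To make the contrast concrete, here is a rough sketch of the two artifacts as data structures. The class names and fields are hypothetical and purely illustrative; they are not drawn from Wikipedia's or any search engine's actual data model.

    from dataclasses import dataclass, field

    @dataclass
    class EncyclopediaArticle:
        """Written once, revised occasionally by humans, read very frequently."""
        title: str                                               # well-defined scope in a flat namespace
        body: str                                                # structured natural-language prose
        media: list[str] = field(default_factory=list)           # images, audio, etc.
        related_pages: list[str] = field(default_factory=list)   # supporting context

    @dataclass
    class SearchResultItem:
        """One hit: little context beyond a short excerpt."""
        url: str
        excerpt: str                                             # very short content excerpt
        rank: int                                                # position from "best hit" to "worst hit"

    @dataclass
    class SearchResultPage:
        """Generated on the fly by a machine, cached briefly, then thrown away."""
        query: str
        results: list[SearchResultItem]                          # scanned top to bottom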

Wikipedia successfully attracts people to contribute to the former, but the latter - where the information product is generated on the fly and lasting impact is amorphous (nothing particularly concrete for contributors to point to and say "I did that! Warm and fuzzies!") - is a very different beast.

I too believe there is room for innovation ... there is potentially low-hanging fruit like inter-linguistic semantic queries (not keyword search) ... but no such key problem areas are identified in the paper's summary.


The other big problem is that curating search results is inherently about prioritising a position rather than establishing a sourced and reasonably neutral version of the truth.

I'm imagining the edit wars and debates that take place on contentious wordings or facts in some parts of Wikipedia, but on a much wider scale, involving hundreds of SEO consultants, each aware that changing a particular criterion will have a quantifiable impact on their clients' bottom line. It doesn't sound like it would be fun to police.


Wikipedia already curates links to some extent on every page under "External Links". So there is a seed there.

And even the page text is not immune from the problem you describe. Grading and prioritizing sources is a fundamental part of producing a "reasonably neutral version of the truth." It's what determines what gets cited and how prominently it influences the article.

So while I wouldn't equate text and links in terms of the difficulty of managing POV-neutrality, I would say they sit on a spectrum.


There was a remark recently that most Jeopardy answers are Wikipedia titles. Consider Wikipedia as an ontology, with Wikipedia titles as the vocabulary. A search engine could associate articles with relevant Wikipedia titles, and try to do the same with queries. The first step of search is then relatively straightforward.
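
As a very rough illustration of that idea: the sketch below tags both documents and a query with Wikipedia titles and ranks documents by title overlap. The title list, the documents, and the naive verbatim substring matching are all toy placeholders standing in for a real entity linker.

    from collections import defaultdict

    # Hypothetical vocabulary of Wikipedia titles (in reality, millions of them).
    WIKIPEDIA_TITLES = {"Alan Turing", "Enigma machine", "Bletchley Park", "Jeopardy!"}

    def extract_titles(text: str) -> set[str]:
        """Naive matcher: a title 'appears' if it occurs verbatim in the text.
        A real system would need redirects, aliases, and disambiguation."""
        lowered = text.lower()
        return {t for t in WIKIPEDIA_TITLES if t.lower() in lowered}

    def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
        """Map each Wikipedia title to the set of documents that mention it."""
        index: dict[str, set[str]] = defaultdict(set)
        for doc_id, text in documents.items():
            for title in extract_titles(text):
                index[title].add(doc_id)
        return index

    def search(query: str, index: dict[str, set[str]]) -> list[str]:
        """Rank documents by how many query-linked titles they share with the query."""
        scores: dict[str, int] = defaultdict(int)
        for title in extract_titles(query):
            for doc_id in index.get(title, set()):
                scores[doc_id] += 1
        return sorted(scores, key=scores.get, reverse=True)

    docs = {
        "d1": "Alan Turing worked at Bletchley Park on the Enigma machine.",
        "d2": "Jeopardy! clues often have Wikipedia article titles as answers.",
    }
    index = build_index(docs)
    print(search("Who broke the Enigma machine at Bletchley Park?", index))  # ['d1']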



