
"Our feeling on interacting with the semantic web community is that innovation - especially when it conflicts with core ideology - is not welcome."

I wasn't a big fan of the "semantic web" community when it first came out, and the years have only deepened my disrespect, if not outright contempt. The entire argument was "Semantic web will do this and that and the other thing!"

"OK, how exactly will it accomplish this?"

"It would be really cool if it did! Think about what it would enable!"

"OK, fine, but how will this actually work!"

"Graph structures! RDF!"

"Yes, that's a data format. What about the algorithms? How are you going to solve the core problem, which is that nobody can agree on what ontology to apply to data at global scale, and there isn't even a hint of how to solve this problem?"

"So many questions. You must be a bad developer! It would be so cool if this worked, so it'll work!"

There has always been this vacuousness in the claims: they've got a somewhat clear idea of where they want to go, but if you ever try to poke even one layer deeper into how it's going to be solved, you get either:

A: insulted;

B1: claims that it's already solved, just go use this solution (even though it is clearly not solved, since the semantic web promises are still promises and not manifested reality);

B2: claims that it's already solved and the semantic web is already huge (even though the only examples anyone can cite are trivial compared to the grand promises, and the "semantic web" components are borderline irrelevant; the most frequent citation is "those Google boxes that pop up for sites in search results", just as this article does, despite the fact that they're wafer-thin compared to the Semantic Web promises and barely use any "Semantic Web" tech at all); or

C: a simple reiteration of the top-level promises, almost as if the person responding doesn't fundamentally grasp that the ideals need to manifest in real code and real data to work.

This article does nothing to dispel my beliefs about it. The second sentence says it all. As for the rest, while zooming in on the reality may be momentarily impressive, compared to the promises made it is nothing.

The whole thing was structured backwards anyhow. I'd analogize the "semantic web" effort to creating a programming language syntax definition, but failing to create the compiler, the runtime, the standard library, or the community. Sure, it's non-trivial forward progress, but it wasn't really the hard part. The real problem for semantic web and their community is the shared ontology; solve that and the rest would mostly fall into place. The problem is... that's an unsolvable problem. Unsurprisingly, a community and tech all centered around an unsolvable problem haven't been that productive.

A fun exercise (which I seriously recommend if you think this is solvable, let alone easy) is to just consider how to label a work with its author. Or its primary author and secondary authors... or the author, and the subsequent author of the second edition... or, what exactly is an authored work anyhow? And how exactly do we identify an author... consider two people with identical names/titles, for instance. If we have a "primary author" field, do we always have to declare a primary author? If it's optional, how often can you expect a non-expert bulk adding author information in to get it correct? (How would such a person necessarily even know how to pick the "primary author" out of four alphabetically-ordered citations on a paper?)

(I am aware of the fact there are various official solutions to these problems in various domains... the fact that there are various solutions is exactly my point. Even this simple issue is not agreed upon, context-dependent, it's AI-complete to translate between the various schema, and if you speak to an expert using any of them you could get an earful about their deficiencies.)
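To make the exercise concrete, here's a toy sketch (the schemas, field names, and records are all invented for illustration) of what goes wrong the moment you map between two authorship schemas:

```python
# Two hypothetical metadata schemas describing the same paper.
# Translating between them forces lossy, judgment-laden choices.

schema_a = {  # library-style: ordered author list, no roles
    "title": "A Study of Things",
    "authors": ["Smith, J.", "Smith, J.", "Lee, K."],  # two distinct J. Smiths!
}

def a_to_b(record):
    """Naive mapping into a publisher-style schema where a 'primary author'
    is mandatory. The assumption that the first listed author is primary is
    exactly the kind of unstated ontology choice that breaks at scale."""
    return {
        "title": record["title"],
        "primary_author": record["authors"][0],
        "secondary_authors": record["authors"][1:],
    }

converted = a_to_b(schema_a)
print(converted["primary_author"])     # "Smith, J." -- but which Smith?
print(converted["secondary_authors"])  # the other Smith is now indistinguishable
```

Note that nothing in either record disambiguates the two Smiths, and nothing justifies the "first listed is primary" rule; both gaps have to be papered over by whoever writes the mapping.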



Yes. I had pretty much this conversation a while back with some non-technically minded people who had been convinced that, by creating an ontology and a set of "semantic business rules", a lot of the writing of actual code could be automated away, leaving the business team to just create rules in a language almost like English and have the machine execute those English-like rules.

I had to explain that they were basically on track to re-implementing COBOL.


It's not about what you have to do, or how; it's that for the first time we have a common model for data interchange (RDF) with which you can model concepts and things in your domain, or more importantly across domains, and simply merge the datasets. Try that with the relational model or JSON. Integration is the main value proposition of RDF today; nobody sane is trying to build a single global ontology of the world.

You can despise the fringe academic research, but how do you explain Knowledge Graph use by FAANG (including powering Alexa and Siri) as well as a number of Fortune 500 companies? Here are the companies looking for SPARQL (RDF query language) developers: http://sparql.club


Many of us who have been in these battles over the decades have decided that the interchange format is almost irrelevant to the real challenge, which is the modeling and semantic alignment. Merging graphs and calling them integrated is a useless parlour trick, much as putting several CSV files into an archive or loading unrelated tables into one RDBMS would be. Yes, you can run a processing engine on the amalgamation, but all the work of establishing a federating model within your query or processing instructions remains to be done.

Over and over, we see that the real world problem is gated on human effort to negotiate about the models and to do data cleaning and transformation. And, the best results almost always require that modeling, cleaning, and transformation be done with an eye towards a specific downstream consumer or analysis. We get tired of having to steer leadership back to reality after they buy into the snake oil suggestion that integration costs can be avoided and unknown applications solved.

The claim that RDF solves federation more than any other serialization format for structured data and models is about the same as fixating on JSON versus XML or YAML or Lisp s-expressions.


Relational model, XML/JSON etc. simply do not have a generic merge operation defined the same way as RDF does. This can be proved with pen and paper.
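The merge claim itself is easy to illustrate on paper: if you treat a graph as a set of (subject, predicate, object) triples, merge is just set union. A minimal Python sketch (invented URIs, and ignoring the blank-node renaming that a proper RDF merge requires):

```python
# An RDF graph modeled as a set of (subject, predicate, object) triples.
# Merge is plain set union: always defined, order-independent, idempotent.
# All URIs below are invented for illustration.

g1 = {
    ("ex:P12345", "ex:encodedBy", "ex:GeneA"),
    ("ex:P12345", "ex:name", "Some protein"),
}
g2 = {
    ("ex:DrugX", "ex:targets", "ex:P12345"),
    ("ex:P12345", "ex:name", "Some protein"),  # duplicate triple
}

merged = g1 | g2          # the generic merge operation
assert len(merged) == 3   # the duplicate triple collapses automatically

# Contrast with JSON: merging {"name": "A"} and {"name": "B"} has no single
# defined answer; you must pick a policy (overwrite, collect into a list, error).
```

Whether this mechanical merge amounts to meaningful integration is, of course, exactly what the comment above disputes.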

And you still haven't addressed my second point about widespread industry use. It seems that SemWeb haters/sceptics always try to avoid this; why could that be?


"simply do not have a generic merge operation defined the same way as RDF does."

Who cares? This is not a problem anyone has, which is precisely why so few formats have a solution.

"widespread industry use"

It's not in "widespread" use. It's in niche use, and it's been in niche use for about two decades, and shows no sign of escaping that niche.

Human perception is a bit broken here. You show a list of 100 users and it looks like a tech is in "widespread use"... because you don't intuit that the broader market has hundreds of thousands of users, if not millions. (I'm being conservative. It's almost certainly millions.) RDF is niche. You can comfortably read an effectively-complete list of its users over a coffee break. Try that trick with JSON.

Also, to be honest, referring to "haters" rather proves my point about just how quickly insults get trotted out. You almost literally just said "RDF!" with no further substantive conversation exactly the way I mentioned! I know about RDF. I used it ~2005 when working on some Mozilla stuff. It had every opportunity to overtake JSON, and was never in any danger of it.

In fact my current job for the last few weeks has been working on a massively cross-team data lake in the company I work for... and nobody is talking about RDF. Not me (and I do know it, actually), not any vendor that might provide useful technology, not any vendor that consumes data to provide reports on it (nobody consumes RDF in this space), nobody. Nominally a core use case for "semanticness", and it's a complete non-starter.


Yes, RDF is in its own niche: data interchange. And that's where merge matters, when you for example need to merge protein data with genes and drugs etc. A bunch of pharma companies are using RDF Knowledge Graphs for that purpose. The need for data interchange comes with a certain company size, and at that point RDF becomes the solution because there are no real alternatives.

I'm not talking about replacing JSON with RDF. Don't need data interchange -- don't use RDF. RDF is both at a different level of abstraction and solving problems of different scope.


> merge protein data with genes and drugs

Could you perhaps recommend some industry case studies or publications on that specific problem area of biopharmaceuticals?


This is one recent meta-study: https://www.nature.com/articles/s41597-021-00797-y

One of the main datasources is uniprot.org.

I know for a fact that AstraZeneca, Novo Nordisk, Novartis, Roche, Boehringer Ingelheim are all using RDF Knowledge Graphs, and there are probably many others. It would take some time to find the references though.

Check out our company page, maybe we can help ;) https://atomgraph.com/



