
Can you provide a "so what?" summary?


>We test three representative tasks in materials chemistry: linking dopants and host materials, cataloging metal-organic frameworks, and general composition/phase/morphology/application information extraction. Records are extracted from single sentences or entire paragraphs, and the output can be returned as simple English sentences or a more structured format such as a list of JSON objects. This approach represents a simple, accessible, and highly flexible route to obtaining large databases of structured specialized scientific knowledge extracted from research papers.


Short answer: It’s a way to generate structured databases for (most) scientific topics. Why? So you can apply data-driven methods to those databases. So what? It’s a powerful way to ask and investigate scientific questions and trends that are otherwise hidden inside a million scientific papers.

Example: Consider what the PDB has done for our understanding of protein folding, as well as the ML/computational techniques it has enabled (e.g., AlphaFold). Most scientific questions and properties are not as data-rich as protein folding. What if they could be?

Longer answer: The last 15 years of computational/ML work in science have shown that structured databases open up entirely new frontiers in discovery (e.g., the Protein Data Bank, the Materials Project). But most scientific topics/properties are NOT in structured DBs; they’re scattered across millions of papers. It’s an especially big problem for some topics in materials science. It’s not that these problems are data-scarce, but that it’s hard to actually collate their data in a structured format. You literally cannot use most ML methods on them because the structured DBs do not exist.

This paper presents a way to generate massive structured databases of specialized, intricate, hierarchical knowledge from the scientific literature. Fine-tuning works; prompt engineering did not (at the time, perhaps this has changed). Once you have a database, you can analyze an entire subfield or topic in science with ML or stats methods.
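To make the "once you have a database" step concrete, here is a minimal sketch of what the downstream side might look like, assuming the model returns each paragraph's records as a JSON list. The schema (`host`, `dopants`), the sample completions, and the helper names `parse_records` and `count_pairs` are all made up for illustration; they are not the paper's actual format or code.

```python
import json
from collections import Counter

# Hypothetical model completions, one per source paragraph.
# The "host"/"dopants" schema is an assumption for illustration only.
completions = [
    '[{"host": "ZnO", "dopants": ["Al", "Ga"]}]',
    '[{"host": "TiO2", "dopants": ["N"]}, {"host": "ZnO", "dopants": ["Al"]}]',
    'not valid JSON',  # malformed output is skipped, not repaired
]


def parse_records(text):
    """Return the dict records in one completion, or [] if it is unusable."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    return [record for record in data if isinstance(record, dict)]


def count_pairs(texts):
    """Count how often each (host, dopant) pair appears across completions."""
    pairs = Counter()
    for text in texts:
        for record in parse_records(text):
            host = record.get("host")
            dopants = record.get("dopants")
            if not isinstance(host, str) or not isinstance(dopants, list):
                continue  # skip records that do not match the assumed schema
            for dopant in dopants:
                if isinstance(dopant, str):
                    pairs[(host, dopant)] += 1
    return pairs


pair_counts = count_pairs(completions)
for (host, dopant), n in pair_counts.most_common():
    print(host, dopant, n)
```

At real scale you would load the records into a proper table or database instead of a Counter, but the idea is the same: once extraction gives you consistent records, questions like "which dopants are most studied for a given host" become a simple aggregation.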



