Hacker News

Until Postgres added JSON fields, I would have retorted that NoSQL is really handy for scraping: you just grab and store the collected data, then parse it later with a separate program into a relational database.
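A minimal sketch of that two-phase approach (hypothetical table and field names, using stdlib `sqlite3` as a stand-in for any RDBMS): the scraper dumps the raw payload as JSON text, and a separate pass parses it later.

```python
import json
import sqlite3

# Phase 1: the scraper just stores whatever the site returned, as JSON text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_scrapes (url TEXT, payload TEXT)")

scraped = {"title": "Example", "price": "9.99"}  # whatever came off the page
conn.execute(
    "INSERT INTO raw_scrapes VALUES (?, ?)",
    ("https://example.com/item/1", json.dumps(scraped)),
)

# Phase 2 (later, a separate program): read raw rows and normalize them.
url, payload = conn.execute("SELECT url, payload FROM raw_scrapes").fetchone()
item = json.loads(payload)
```

The point is that phase 1 never breaks when the site changes; only the phase 2 parser needs updating.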

I learned this by doing it the opposite way for years. However, website schemas frequently change, which breaks your database schema if you're immediately parsing into a normalized structure. In other words, it's brittle.

NoSQL (or JSON fields) lets you store unstructured data, which is more forgiving when a schema changes (e.g. the spelling of a dictionary key).
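For example (hypothetical key names), a tolerant parser over the raw JSON can survive a renamed key, whereas a fixed column mapping would have broken at insert time:

```python
# Sketch: tolerate several historical spellings of the same scraped key.
def get_price(record: dict):
    for key in ("price", "item_price", "Price"):  # newest spelling first
        if key in record:
            return record[key]
    return None

old_record = {"item_price": "9.99"}   # what the site used to return
new_record = {"price": "10.49"}       # after the site renamed the key
```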



What you need is a versioned mapping layer from the scraped data into your own schema.
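One possible shape for such a layer (hypothetical names, a sketch rather than a full migration system): keep one mapper per scraped-format version, tagged on each raw record, so old raw data can always be re-mapped into the current schema.

```python
# Each version of the scraped format gets its own mapping function
# into the target schema; raw records carry the version they were
# scraped under.
MAPPERS = {
    1: lambda raw: {"name": raw["fullName"]},
    2: lambda raw: {"name": raw["first_name"] + " " + raw["last_name"]},
}

def to_schema(raw: dict, version: int) -> dict:
    return MAPPERS[version](raw)
```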

The problem is when you need to update your schema to support data or relations it doesn't yet model.

The problem with SQL is that schemas are sticky and hard to change. People hesitate to change their schemas because the schema changes themselves are difficult, and then you have to update a bunch of backend code, and then your frontend code may do strange denormalized things, and then everything breaks.

I've been thinking about this for a while, and I think what is needed is a visual tool that can show all dependencies of a data schema element (backend, frontend), so that schema changes are easier to make.

All the layers in modern architectures (db access, API, cache, frontend data stores/caches/frameworks) make it virtually impossible to modify a schema without causing chaos. The solution is keeping the data model and query interface as close as possible throughout the entire stack. For example, you should never write manual data-manipulation code (e.g. `people.map(p => p.full_name = p.firstName + ' ' + p.lastName)`) unless it can be traced through the entire system. A monorepo, a typed ORM, and refactoring tooling (e.g. an IDE) can help, but it's usually never set up well enough or integrated closely enough with the db.
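One way to keep that kind of derived field traceable (a sketch, with hypothetical names): define it once on the data model itself rather than in ad-hoc mapping code scattered across layers, so renaming `first_name` becomes a single refactor an IDE can follow.

```python
from dataclasses import dataclass

@dataclass
class Person:
    first_name: str
    last_name: str

    @property
    def full_name(self) -> str:
        # The derived field lives next to the model, not in mapping code.
        return f"{self.first_name} {self.last_name}"
```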


Well, technically you could always have done this in an RDBMS: a key-value table, with the value column as a blob (or just a varchar if it's a string).

Then yes, you'd have to parse it yourself, but you'd be doing that anyway with JSON fields, more or less.
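That "schemaless in any RDBMS" trick might look like this (a sketch using stdlib `sqlite3`; the table and key names are made up): a plain key-value table with the value stored as text, parsed by the application.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")

# The application serializes on the way in...
conn.execute(
    "INSERT INTO kv VALUES (?, ?)",
    ("item:1", json.dumps({"title": "Example", "tags": ["a", "b"]})),
)

# ...and parses on the way out. The database itself knows nothing
# about the value's structure, unlike a native JSON column.
row = conn.execute("SELECT v FROM kv WHERE k = ?", ("item:1",)).fetchone()
doc = json.loads(row[0])
```

The difference with native JSON fields is that the database can index and query inside the value; here, all structure lives in application code.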


Indeed, you are correct. A lot of these lessons came hard-earned from doing it the wrong way first.

I thought I was being such a good little programmer by normalizing my data as soon as possible. All it did was provide me job security (at the cost of headaches) every time the schema changed.





