Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

http://www.postgresql.org/docs/9.2/static/datatype-json.html

> Such data can also be stored as text, but the json data type has the advantage of checking that each stored value is a valid JSON value. There are also related support functions available



That doesn't answer my question. Storing structured data in any single-field, be it checked or unchecked, violates the first normal form, if you're going to be querying directly on sub-fields of that field. If the json document is just getting exported out to the client without any inspection of the document from the SQL side, then it's fine and I see the point of having a JSON type that can validate the data. But to extend the SQL syntax to allow for querying sub-fields of that data is the wrong solution to the problem.

SQL Server has had this feature for quite some time as well, in the ability to store and query on XML documents. It's awful. The few times that it has "saved the day" on certain queries just because the data was already stored as XML, it turned out to be a false profit and became a serious issue down the road.


> That doesn't answer my question. Storing structured data in any single-field, be it checked or unchecked, violates the first normal form...

This seems to be one of the most common mistakes regarding understanding 1NF I come across.

The atomicity requirement doesn't say anything about structured text, or internally atomic data types. That a data type could be further decomposed doesn't impact first normal form at all. Otherwise, using a data type like DATETIME would be a total no-no. Neither does storing arrays, and so forth.

What 1NF requires is that for the domain the value is atomic. If I store in a text field "192.168.1.1/24" that is structured text but it is still atomic. It may include both a netmask and an IP address, but the data taken as a whole represents one domain value. Similarly storing '1-509-555-1212" in a text field contains an area code number, an exchange number, an end terminal number, and a country-code designation, but it represents a single phone number and that's all that is required. I could represent the phone number as ARRAY[1,5,0,9,5,5,5,1,2,1,2] and it would still be atomic as long as the array was properly ordered.

What I am getting at is that all 1NF requires is that the item represents a single value in a domain.

There's no reason why JSON would necessarily violate that iether. Consider {ip: '192.168.1.1', netbits: '24'}

That's no different from storing '192.168.1.1/24' in a text field. And for my phone number, I could store it as {ccode: '1', areacode: '509', exchange: '555', terminal: '1212'} which again is no different from storing '1-509-555-1212' in a text field.

This isn't to say that JSON doesn't often violate 1NF, but 1NF is not as restricting as people sometimes think.


You had to trim my comment to be able to make yours. As I said, "if you're going to be querying directly on sub-fields of that field. If the json document is just getting exported out to the client without any inspection of the document from the SQL side, then it's fine".

You can argue whether or not the Normal Forms are a good idea, but you can't argue that this tool--in extending the SQL syntax to be able to give access to fields within the JSON document--is not meant for violating 1NF.


> As I said, "if you're going to be querying directly on sub-fields of that field. If the json document is just getting exported out to the client without any inspection of the document from the SQL side, then it's fine".

How is that different from, say, pulling area codes from text phone numbers, subnet masks from CIDR designations, or dates from DATETIME fields?

Where first normal form is sometimes (productively) violated is something like storing a list of tags in a varchar[] array. Since the ordering doesn't matter, it is merely a set of tags (not a field which stores a single value as a whole), and violates 1NF, but is can be a reasonable tradeoff in certain cases.


Consider a common use case of wanting to mix SQL structured data and JSON structured data:

You want a logging service that records events in an SQL table. Each event has base attributes common to all events such as date, event name, component name, user id ... these parameters become columns in the SQL table. Some events can also include arbitrary custom data particular to that kind of event, and you want that data to also be in the database but you don't want to play whack-a-mole with the table schema to make sure every possible custom parameter has a column in the SQL table, in this case just stash the custom data as JSON, and hopefully the database server offers JSON comprehensions that let you access any fields within the JSON data via SQL.


I think the question is how does that compare to the usual trick of adding a key/value/attribute table?

  create table misc_attrs (maintable_id foreign key, name text, value text)


When reading data back, it's more efficient to query a single table than to have to join against an extra attributes table

See https://news.ycombinator.com/item?id=6574269 for anecdotal evidence.


there are so many variables in that anecdote that there is nothing that can be gleaned from it.


Alright, these were approximately the fields that were Postgre-fied.

CREATE TABLE event ( "references" text[], (was 2 tables, references and reference_on_event) "entities" text[], (ditto) "actors" text[], (ditto) "targets" text[], (ditto) "payload" json, (EAV) )

Gin index on all the text arrays. There are about 150k events saved per day and PostgreSQL is running on the same kinds of hardware as SQL Server was. The application is quite read heavy.


There are times actually I find it helpful to break 1NF in PostgreSQL. This is certainly not free but it really helps some sorts of queries, and it avoids having to do a sort of pseudo-EAV to get certain things working.


> That doesn't answer my question.

Maybe not, but you significantly edited your post after I answered. I was just replying to, "Why would I use this over a nullable column?" (paraphrasing)


Sorry about that, I realized my original post was rather too whiney and I thought I got in before anyone noticed :)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: