An analytic column store such as Vertica has a schema like a regular SQL database. I don't know what its flexible-schema story is right now.
Instead of storing the columns of a row together, an analytic column store stores each column's values from many rows together in sorted runs. When you do a scan, the disk reads only the columns you have selected. The storage format for each column is optimized for its type and uses type-specific compression; ratios of 10-50x are claimed. This further improves the IO situation. Because the columns are indexed, the store can also zero in on the relevant ranges of data for each column, which reduces the IO requirements further.
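To make the layout concrete, here is a toy sketch in Python. It is not how Vertica or any particular product stores data; the table, the block size, and all function names are hypothetical. It pivots rows into columns sorted on one key, run-length encodes a column (one of several type-specific encodings a real store might use), and keeps a min/max summary per block so a range scan can skip blocks while touching only the two columns the query needs.

```python
from itertools import groupby

# Hypothetical example table: each row is (region, year, amount).
COLUMNS = ("region", "year", "amount")
ROWS = [
    ("east", 2021, 10), ("west", 2022, 7), ("east", 2022, 3),
    ("west", 2021, 5), ("east", 2021, 8), ("west", 2022, 2),
]
BLOCK_SIZE = 2  # rows per block summary; an arbitrary choice for the sketch


def to_columns(rows, sort_index):
    """Sort rows on one column, then store each column's values contiguously."""
    ordered = sorted(rows, key=lambda r: r[sort_index])
    return {name: [r[i] for r in ordered] for i, name in enumerate(COLUMNS)}


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]


def block_ranges(values, block_size):
    """Compute (min, max) per block; a real store would keep these from load time."""
    return [
        (min(values[i:i + block_size]), max(values[i:i + block_size]))
        for i in range(0, len(values), block_size)
    ]


def scan_sum(columns, filter_col, lo, hi, sum_col, ranges, block_size):
    """Sum sum_col where lo <= filter_col <= hi.

    Only the two named columns are touched, and blocks whose (min, max)
    range cannot match are skipped. Returns (total, blocks_read).
    """
    keys, vals = columns[filter_col], columns[sum_col]
    total, blocks_read = 0, 0
    for b, (bmin, bmax) in enumerate(ranges):
        if bmax < lo or bmin > hi:
            continue  # no row in this block can match
        blocks_read += 1
        start = b * block_size
        for k, v in zip(keys[start:start + block_size], vals[start:start + block_size]):
            if lo <= k <= hi:
                total += v
    return total, blocks_read


cols = to_columns(ROWS, sort_index=1)        # sort on year
print(rle_encode(cols["year"]))              # [(2021, 3), (2022, 3)]
print(len(rle_encode(cols["region"])))       # 6
year_ranges = block_ranges(cols["year"], BLOCK_SIZE)
print(scan_sum(cols, "year", 2022, 2022, "amount", year_ranges, BLOCK_SIZE))  # (12, 2)
```

Sorting on year collapses that column to two runs, while region, which is not the sort key, gets no benefit from run-length encoding; that is one reason the sort order matters so much for compression. The 2022 query skips the first block entirely and never reads the region column.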
Where other databases are bound on seeks or sequential throughput, an analytic column store will be bound on CPU, especially the CPU spent in the non-parallel portions of every query.
Obviously, a column store will have a hard time selecting individual rows: the values for a row are not stored together, so materializing the row is expensive. Column stores also have trouble with updates and deletes to already-inserted data, in some cases requiring the data to be reloaded because accumulated updates have dragged performance down.
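The toy Python sketch below illustrates both costs using hypothetical run-length-encoded columns; the data and function names are made up for illustration. Rebuilding one row needs a separate lookup in every column, and a naive in-place update to a sorted, encoded column has to decode and re-encode it, breaking long runs into short ones. This is a simplification: I am assuming, rather than describing any specific product, that a real column store would more likely buffer changes separately and merge them in later.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into a flat list."""
    return [v for v, n in runs for _ in range(n)]


def value_at(runs, position):
    """Find the value at a row position by walking the runs from the start."""
    offset = 0
    for value, count in runs:
        if position < offset + count:
            return value
        offset += count
    raise IndexError(position)


def materialize_row(encoded_columns, position):
    """Rebuild one row: a separate lookup in every column."""
    return {name: value_at(runs, position) for name, runs in encoded_columns.items()}


def update_at(runs, position, new_value):
    """Naive in-place update: decode the whole column, patch it, re-encode."""
    values = rle_decode(runs)
    values[position] = new_value
    return rle_encode(values)


encoded = {
    "region": rle_encode(["east", "west", "east", "west", "east", "west"]),
    "year": rle_encode([2021, 2021, 2021, 2022, 2022, 2022]),
}
print(materialize_row(encoded, 4))  # {'region': 'east', 'year': 2022}
print(update_at(encoded["year"], 1, 2023))
# [(2021, 1), (2023, 1), (2021, 1), (2022, 3)]: one update turned 2 runs into 4
```

With many columns, each lookup in `materialize_row` lands in a different place on disk, and repeated updates like `update_at` steadily erode the compression that made scans cheap.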