Very nice overview. I am just in the final stages of a Master's thesis in data visualization, and this article gives a really good bird's-eye view of the field. The visualization field is too broad for most programmers to be expected to know more than some key points, but given that vision is the highest-bandwidth sense, visual techniques are often given less credit than they deserve. As long as there needs to be a human in the loop, you need good visualizations if your data is more than trivial. D3 is probably good for its domain, but intuition tells me you'll have a problem if you mainly use Javascript to handle a 20GB dataset. (I'm not dismissing this categorically; I am not very familiar with these tools).
Unfortunately, to my knowledge there aren't any comprehensive textbooks that cover visualization from the ground up. We didn't use a single textbook in my 2-year degree; all lectures were heavily based on research papers. Central topics, if you want to read up on this, are perception (which color scales should you use? how many parameters can you plausibly put in one plot?), different visualization techniques for different data (scatterplots, histograms, treemaps, horizon graphs, volume rendering, graph drawing with edge bundling, and so on), interactivity, and applications of basic techniques (Visual Analytics, Interactive Visual Analysis).
A multitude of scientific fields use different visualization tools, so it can be tricky to find the relevant material for whatever it is you're working with. But in general, I think the data mining/big data/analytics fields could do very well with a bigger focus on visual techniques. If you get the right visualizations for your data, the truth often just jumps out of the screen. GPUs can let you work with multi-gigabyte datasets at interactive framerates, although I haven't seen a lot of practical applications of this yet. They can also be used for non-spatial data, if you're clever with CUDA or just use the shader data structures creatively. It would be interesting to hear if anyone in the industry uses this yet.
> D3 is probably good for its domain, but intuition tells me you'll have a problem if you mainly use Javascript to handle a 20GB dataset. (I'm not dismissing this categorically; I am not very familiar with these tools).
I use d3 on the client with an 80GB (currently) dataset, by putting the dataset in elasticsearch. It's a pretty fantastic combination. You can do multi-value aggregation from unstructured data, or geospatial searches, or lightning-quick full-text search.
The server has 8GB of RAM and 2 cores and, with about 1.2 million new documents every hour, barely breaks a sweat.
What type of queries do you run on elasticsearch to pull into D3? I'm doing a very similar project (elasticsearch + web data vis) so I'm legitimately curious.
Basically I'm importing logs and system events. I run queries like "show me the top 10 events over the last 24 hours from this source that were marked critical". Or: for each farm shipping web logs, aggregate on the hosts, then aggregate on the status code, and then give me the number of documents in each, summed for each hour of the day.
At the moment I roll up daily stats and store them in a separate database for longitudinal analysis, but eventually I'd like to ship data that is more than a couple of weeks old to Hadoop.
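For anyone wanting to try something similar, here is a rough sketch of how the two queries described above might look as Elasticsearch query DSL bodies, built as plain Python dicts. This is not the poster's actual setup: the field names (`source`, `severity`, `event_type`, `host`, `status_code`, `@timestamp`) and the function names are made up for illustration, and `calendar_interval` is the parameter name in recent Elasticsearch versions (older ones used `interval`). The histogram returns one bucket per calendar hour; folding those into "hour of the day" is assumed to happen on the client.

```python
import json


def top_critical_events(source, size=10):
    """Top `size` event types marked critical from one source in the last 24 hours.

    All field names here are hypothetical; adjust them to your own mapping.
    """
    return {
        "size": 0,  # only aggregation buckets are needed, not the matching documents
        "query": {
            "bool": {
                "filter": [
                    {"term": {"source": source}},
                    {"term": {"severity": "critical"}},
                    {"range": {"@timestamp": {"gte": "now-24h"}}},
                ]
            }
        },
        "aggs": {
            "top_events": {"terms": {"field": "event_type", "size": size}}
        },
    }


def status_codes_per_host_per_hour():
    """Nested buckets: host, then status code, then document count per hour."""
    return {
        "size": 0,
        "aggs": {
            "by_host": {
                "terms": {"field": "host"},
                "aggs": {
                    "by_status": {
                        "terms": {"field": "status_code"},
                        "aggs": {
                            "per_hour": {
                                "date_histogram": {
                                    "field": "@timestamp",
                                    "calendar_interval": "hour",
                                }
                            }
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the request body that would be sent to the search endpoint.
    print(json.dumps(top_critical_events("web-farm-1"), indent=2))
```

The response to either body is a small tree of buckets with document counts, so only the aggregated numbers reach the browser for D3 to draw, not the 80GB of raw documents.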
> D3 is probably good for its domain, but intuition tells me you'll have a problem if you mainly use Javascript to handle a 20GB dataset. (I'm not dismissing this categorically; I am not very familiar with these tools).
You can use server-side JavaScript, which should handle a 20GB dataset without problems.
What I've found is that nowadays most of the resources on datavis are not about scientific/continuous functions but about categorical/statistical visualization.
I've added a few (well, many) links at the bottom of the article, but if you have any suggestions on resources/software/etc., please let me know.