What kinds of infrastructure/tech do you think will have the most utility for topological data analysis in the near future? E.g., GPUs, Apache Spark, FPGAs, etc.
Any thoughts on an Ayasdi public offering? I'd like to consider investing, but I don't have millions of dollars (yet) :)
A slightly more in-depth blog : https://shapeofdata.wordpress.com/2013/08/27/mapper-and-the-choice-of-scale/
A very accessible book about topology, especially from an algorithms perspective ("Topology for Computing" by Afra Zomorodian) : http://www.amazon.com/Computing-Cambridge-Monographs-Computational-Mathematics/dp/0521136091/ref=sr_1_1?ie=UTF8&qid=1444971634&sr=8-1&keywords=topology+for+computing
Blog post introducing persistent homology : https://normaldeviate.wordpress.com/2012/07/01/topological-data-analysis/
Video introducing persistent homology :
https://www.youtube.com/watch?v=CKfUzmznd9g
Some free software:
Python Mapper by Daniel Müllner : http://danifold.net/mapper/index.html
JPlex library by Harlan Sexton : http://www.math.colostate.edu/~adams/jplex/index.html
Dionysus by Dmitriy Morozov : http://www.mrzv.org/software/dionysus/
Topological Data Analysis in R : https://cran.r-project.org/web/packages/TDA/vignettes/article.pdf
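To make the persistent homology resources above a little more concrete, here is a minimal sketch of 0-dimensional persistence (connected components only) on a point cloud. It is not taken from any of the libraries listed: the function name `zero_dim_persistence` and the toy points are made up for illustration, and it assumes a Vietoris-Rips filtration in which an edge appears once the scale reaches the distance between its endpoints.

```python
from itertools import combinations
from math import dist


def zero_dim_persistence(points):
    """Return (birth, death) pairs for connected components (H0).

    Hypothetical helper for illustration. Every point is born at scale 0;
    a component dies when it merges into another one. The single component
    that survives forever gets death = infinity.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Each edge enters the filtration at a scale equal to its length.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge, so one of them dies at this scale.
            parent[ri] = rj
            pairs.append((0.0, length))
    if n:
        pairs.append((0.0, float("inf")))
    return pairs


# Toy data (an assumption for the example): two tight clusters far apart.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
diagram = zero_dim_persistence(points)
for birth, death in diagram:
    print(birth, death)
```

In the output, the short-lived components are noise within each cluster, while the one component that dies much later reflects the two clusters being well separated. The libraries listed above compute this (and higher-dimensional features such as loops) far more efficiently.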
Infrastructure
Our tech stack is:
Backend
HDFS for storage
Our ML and math code is hand-rolled C++ and assembly (7% of LOC)
All coordination/distributed systems code is in Java
ZMQ for communication
Protocol Buffers for protocol
Frontend
D3
Backbone
Hand-rolled WebGL graph visualization (we open-sourced it at https://github.com/ayasdi/grapher)
We currently don't use GPUs or any other fancy hardware, primarily because our customers run on commodity hardware today, and getting F1000 companies to buy cutting-edge hardware is just plain horrible.
We have an awesome GPU rig at our offices that we test algorithms on, and it can really make our algorithms scream, but again, none of our customers have GPUs or are willing to invest in them.
Apache Spark: it is interesting, but in our experience, making it work for ML algorithms is too much work unless you invest the time to understand the framework and its fundamentals. It performs very well for ETL-type tasks, which is what we use it for.
On a public offering: no comment :)
If you have more questions - I am easy to find :)
I'd love to read a couple of journal articles that you'd recommend for learning about TDA. I do large-scale data analysis on health care data at my university and am always on the lookout for interesting techniques.