Hey HN folks - I am the co-founder and CEO of Ayasdi. If you have questions abou...

steamer25 · on Oct 16, 2015

Do you recommend any good primers on topology? I thought this (https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/) was an interesting article and I see what looks like some great papers and videos available at http://www.ayasdi.com/approach/data-scientist/, but I don't know the difference between homotopy and homology (yet) :) .

What kinds of infrastructure/tech do you think will have the most utility for topological data analysis in the near future? E.g., GPUs, Apache Spark, FPGAs, etc.

Any thoughts on an Ayasdi public offering? I'd like to consider investing but I don't have millions of dollars (yet) :) .

Thanks for your time.

topologix · on Oct 16, 2015

Hey,

	Some reading material:
		A very general blog post about the philosophy behind TDA: http://radar.oreilly.com/2015/07/data-has-a-shape.html

		A slightly more in-depth blog post: https://shapeofdata.wordpress.com/2013/08/27/mapper-and-the-choice-of-scale/

		A very accessible book about topology (especially from an algorithms perspective): http://www.amazon.com/Computing-Cambridge-Monographs-Computational-Mathematics/dp/0521136091/ref=sr_1_1?ie=UTF8&qid=1444971634&sr=8-1&keywords=topology+for+computing

		Blog post explaining persistent homology: https://normaldeviate.wordpress.com/2012/07/01/topological-data-analysis/

		Video explaining persistent homology:
			https://www.youtube.com/watch?v=CKfUzmznd9g
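To get a feel for what these resources describe, here is a minimal sketch of the simplest case: 0-dimensional persistent homology of a Vietoris-Rips filtration, assuming a small point cloud in Euclidean space. Every point is born at scale 0. Edges are added in order of length, and whenever an edge joins two connected components, one component dies at that length. This is Kruskal's algorithm with a union-find. The function name `h0_barcode` is hypothetical, and this is not how the libraries mentioned in this thread are implemented; they also handle higher-dimensional features such as loops and voids.

```python
import itertools
import math


def h0_barcode(points):
    """Return the 0-dimensional persistence barcode of the Vietoris-Rips
    filtration of a finite point cloud as a list of (birth, death) pairs,
    ordered by death.

    Every point is born at scale 0. Edges are processed in order of length;
    when an edge joins two different components, one of them dies at that
    length. The last surviving component never dies (death = math.inf).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length (the filtration value).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )

    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge: one of them dies at this scale.
            parent[ri] = rj
            bars.append((0.0, length))
    if n:
        bars.append((0.0, math.inf))
    return bars
```

For two well-separated pairs of points, `h0_barcode([(0, 0), (0, 1), (10, 0), (10, 1)])` gives two short bars that die at 1.0, one bar that dies at 10.0, and one infinite bar. The long finite bar indicates that the data has two clusters.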

	Some free software:
		Python Mapper by Daniel Müllner: http://danifold.net/mapper/index.html

		JPlex library by Harlan Sexton: http://www.math.colostate.edu/~adams/jplex/index.html

		Dionysus by Dmitriy Morozov: http://www.mrzv.org/software/dionysus/

		Topological Data Analysis in R: https://cran.r-project.org/web/packages/TDA/vignettes/article.pdf
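The core idea behind Mapper fits in a short sketch. Python Mapper implements it far more completely. The sketch assumes points in Euclidean space and a real-valued filter function. It covers the range of the filter with overlapping intervals and clusters the points whose filter values fall in each interval. Each cluster becomes a node, and two nodes are connected when their clusters share a point. The names `mapper_graph` and `single_linkage_clusters`, the default parameters, and the choice of fixed-threshold single-linkage clustering are all hypothetical illustrations, not Python Mapper's API.

```python
import math


def single_linkage_clusters(indices, points, eps):
    """Split the given point indices into clusters: two points share a cluster
    if they are linked by a chain of steps each shorter than eps."""
    remaining = set(indices)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in remaining if math.dist(points[i], points[j]) < eps}
            remaining -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper_graph(points, filter_fn, n_intervals=4, overlap=0.25, eps=1.5):
    """Toy Mapper over a non-empty point list. Returns (nodes, edges), where
    nodes are frozensets of point indices and edges are index pairs (u, v), u < v."""
    values = [filter_fn(p) for p in points]
    lo, hi = min(values), max(values)
    length = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Interval k, widened by overlap * length on each side so that
        # neighbouring intervals overlap.
        a = lo + k * length - overlap * length
        b = lo + (k + 1) * length + overlap * length
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage_clusters(members, points, eps))
    # Nerve of the cover: join two clusters whenever they share a point.
    edges = {
        (u, v)
        for u in range(len(nodes))
        for v in range(u + 1, len(nodes))
        if nodes[u] & nodes[v]
    }
    return nodes, edges
```

Take twelve evenly spaced points on a circle of radius 2 and use the height (y-coordinate) as the filter. This produces six nodes joined in a single cycle, so the graph recovers the loop. Points spread along a line yield a simple path instead.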

	Infrastructure
		Our tech stack is:
			Backend
				HDFS for storage
				Our ML and math code is hand-rolled C++ and Assembly (Assembly is 7% of LOC)
				All coordination/distributed systems code is in Java
				ZMQ for communication
				Protocol Buffers for protocol
			Frontend
				D3
				Backbone
				Hand-rolled WebGL graph visualization (we open sourced it at https://github.com/ayasdi/grapher)

		We currently don't use GPUs or any other fancy hardware, primarily because our customers use commodity hardware today, and getting Fortune 1000 companies to buy cutting-edge hardware is just plain horrible.

		We have an awesome GPU rig at our offices that we test algorithms on, and it can really make our algorithms scream, but again, none of our customers have GPUs or are willing to invest in them.

		Apache Spark: interestingly, in our experience, making it work for ML algorithms takes too much effort unless you invest the time to understand the framework and its fundamentals. It performs very well for ETL-type tasks, which is what we use it for.

	On a public offering: no comment :)

	If you have more questions - I am easy to find :)

Gurjeet

pvnick · on Oct 16, 2015

I'd love to read a couple of journal articles that you recommend to learn about TDA. I do large-scale data analysis on health care data at my university and am always on the lookout for interesting techniques.

topologix · on Oct 16, 2015

Gunnar wrote a review article a few years ago called Topology and Data (http://www.ams.org/journals/bull/2009-46-02/S0273-0979-09-01...). It is an amazingly well-written and accessible paper for a technical audience.

Pair it with Afra's book (http://www.amazon.com/Computing-Cambridge-Monographs-Computa...).

pvnick · on Oct 16, 2015

Thank you!