Hacker News | topologix's comments

I’d happily pay for it to support the devs. I like hypr


I've been using https://www.kiwiforgmail.com/ and really like it.


One quick edit to this description: we (Ayasdi) have generalized the notion of Reeb graphs so that it is no longer limited to a single scalar function. In the single scalar function case, the Mapper algorithm is an (extremely efficient) approximation to the Reeb graph; in the multiple scalar function case, it has no direct theoretical analogue (although the notion of Reeb spaces is similar).
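To make the single scalar function case concrete, here is a minimal Mapper-style sketch. It is not our production code: the function names (`mapper_graph`, `single_linkage`), the parameters (`n_intervals`, `overlap`, `link_dist`) and the choice of a simple distance-threshold clustering are all assumptions made for illustration. The filter function is the x-coordinate of points sampled from a circle, whose Reeb graph is a single loop.

```python
import math
from itertools import combinations

def single_linkage(points, members, link_dist):
    """Split `members` into clusters: connected components of the graph
    that links any two points at distance <= link_dist."""
    clusters, unseen = [], set(members)
    while unseen:
        stack = [unseen.pop()]
        cluster = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in unseen
                    if math.dist(points[i], points[j]) <= link_dist}
            unseen -= near
            cluster |= near
            stack.extend(near)
        clusters.append(frozenset(cluster))
    return clusters

def mapper_graph(points, filter_fn, n_intervals=4, overlap=0.25, link_dist=0.3):
    """Hypothetical Mapper sketch for one scalar filter function.

    Cover the filter range with overlapping intervals, cluster the points
    falling in each interval, and connect clusters that share a point.
    """
    values = [filter_fn(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    nodes = []  # each node is a frozenset of point indices
    for k in range(n_intervals):
        a = lo + k * width - overlap * width
        b = lo + (k + 1) * width + overlap * width
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage(points, members, link_dist))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges

# 40 points on the unit circle, filtered by their x-coordinate
circle = [(math.cos(2 * math.pi * k / 40), math.sin(2 * math.pi * k / 40))
          for k in range(40)]
nodes, edges = mapper_graph(circle, filter_fn=lambda p: p[0])
degree = [sum(1 for e in edges if n in e) for n in range(len(nodes))]
```

With these (assumed) settings the output graph is a cycle: every node has exactly two neighbours, which is the loop you would expect from the Reeb graph of this function.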

We are generally not trying to lay claim to the phrase "Topological Data Analysis" and not going around suing people for using it. In fact we still support research in academia and actively publish in the field. TDA is the basis of what we do so it is the most efficient way of describing it.


Hey HN folks - I am the co-founder and CEO of Ayasdi. If you have questions about the math/CS aspects of this, happy to answer.


Do you recommend any good primers on topology? I thought this (https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/) was an interesting article and I see what looks like some great papers and videos available at http://www.ayasdi.com/approach/data-scientist/, but I don't know the difference between homotopy and homology (yet) :) .

What kinds of infrastructure/tech do you think will have the most utility for topological data analysis in the near future? E.g., GPUs, Apache Spark, FPGAs, etc.

Any thoughts on an Ayasdi public offering? I'd like to consider investing but I don't have millions of dollars (yet) :) .

Thanks for your time.


Hey,

	Some reading material:
		A very general blog about philosophy : http://radar.oreilly.com/2015/07/data-has-a-shape.html

		A slightly more in-depth blog : https://shapeofdata.wordpress.com/2013/08/27/mapper-and-the-choice-of-scale/

		A very accessible book about topology (especially from an algorithms perspective) : http://www.amazon.com/Computing-Cambridge-Monographs-Computational-Mathematics/dp/0521136091/ref=sr_1_1?ie=UTF8&qid=1444971634&sr=8-1&keywords=topology+for+computing

		Blog explaining persistent homology : https://normaldeviate.wordpress.com/2012/07/01/topological-data-analysis/

		Video showing persistent homology :
			https://www.youtube.com/watch?v=CKfUzmznd9g

	Some free software:
		Python Mapper by Daniel Müllner : http://danifold.net/mapper/index.html

		JPlex library by Harlan Sexton : http://www.math.colostate.edu/~adams/jplex/index.html

		Dionysus by Dimitriy Morozov : http://www.mrzv.org/software/dionysus/

		Topological Data Analysis in R : https://cran.r-project.org/web/packages/TDA/vignettes/article.pdf

	Infrastructure
		Our tech stack is:
			Backend
				HDFS for storage
				Our ML and math code is hand-rolled C++ and assembly (7% of LOC)
				All coordination/distributed systems code is in Java
				ZMQ for communication
				Protocol Buffers for protocol
			Frontend
				D3
				Backbone
				Hand-rolled WebGL graph visualization (we open sourced it at https://github.com/ayasdi/grapher)

		We currently don't use GPUs or any other fancy hardware primarily because today, our customers use commodity hardware and getting F1000 companies to buy cutting-edge hardware is just plain horrible.

		We have an awesome GPU rig at our offices that we test algorithms on and it can really make our algorithms scream, but again, none of our customers have/are willing to invest in GPUs.

		Apache Spark: in our experience, making it work for ML algorithms is too much work unless you invest the time to understand the framework and its fundamentals. It performs very well for ETL-type tasks, which is what we use it for.

	On a public offering: no comment :)

	If you have more questions - I am easy to find :)
Gurjeet


I'd love to read a couple of journal articles that you recommend to learn about TDA. I do large scale data analysis on health care data at my university and am always on the look-out for interesting techniques.


Gunnar wrote a review article a few years ago called Topology and Data (http://www.ams.org/journals/bull/2009-46-02/S0273-0979-09-01...). It is an amazingly well written and accessible paper for a technical audience.

Pair it with Afra's book (http://www.amazon.com/Computing-Cambridge-Monographs-Computa...)


Thank you!


Afra Zomorodian has a pretty accessible book called 'Topology for Computing'.

You can also play around with OSS such as pymapper, jplex etc.


Disclaimer: I have written a few academic papers based on persistent homology and co-founded a company which uses it.

Persistent homology was invented to deal with noise (even though nothing that deals with data is ever IMMUNE to noise). The basic idea is to pick out the topological features (Betti numbers) which persist over a range of one or more parameters. Let's take the simple case of single-parameter persistence (call the parameter epsilon). Say that we are given a set of N points equipped with a distance function (i.e. given any two points, we can compute a distance between them (http://en.wikipedia.org/wiki/Distance)). Now, construct a structure composed of sets of varying sizes (a set with a single point in it is called a vertex or a 0-simplex, a set with two points in it is called an edge or a 1-simplex, a set with three points in it is called a triangle or a 2-simplex, and so on). Given a fixed epsilon, we will:
1. draw an edge (1-simplex) between every pair of points which are within epsilon of each other.
2. draw a triangle (2-simplex) for every triple of points which are within epsilon of each other (note that three points can have edges between all pairs without 'filling out' the triangle).
3. draw a tetrahedron (3-simplex) for every set of four points which are within epsilon of each other (remember the note from the previous point).
4. and so on.
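The construction above can be sketched in a few lines of Python. This is only an illustration: the name `rips_complex`, the `max_dim` cutoff and the reading of "within epsilon of each other" as "every pairwise distance is at most epsilon" are my assumptions for the sketch, and the brute-force enumeration is only sensible for tiny point sets.

```python
from itertools import combinations
import math

def rips_complex(points, epsilon, max_dim=2):
    """Return {dimension: list of simplices} up to max_dim.

    A simplex is a sorted tuple of point indices. A set of k+1 points
    becomes a k-simplex when every pair in it is within epsilon
    (assumed here to mean distance <= epsilon).
    """
    n = len(points)

    def close(i, j):
        return math.dist(points[i], points[j]) <= epsilon

    simplices = {0: [(i,) for i in range(n)]}
    for dim in range(1, max_dim + 1):
        simplices[dim] = [
            s for s in combinations(range(n), dim + 1)
            if all(close(i, j) for i, j in combinations(s, 2))
        ]
    return simplices

# Four corners of a unit square: sides have length 1, diagonals sqrt(2)
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
complex_small = rips_complex(square, epsilon=1.0)  # sides only
complex_large = rips_complex(square, epsilon=1.5)  # sides and diagonals
```

At epsilon = 1.0 only the four sides appear as edges and no triangle is drawn; at epsilon = 1.5 the diagonals join in and all four triangles are present.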

Now, given this set of simplices for a fixed epsilon, we can compute the number of holes of various dimensions, which gives us a fixed set of Betti numbers.
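Here is a rough sketch of how the Betti numbers can be computed for a fixed epsilon. It uses the standard linear-algebra recipe: b_k = (number of k-simplices) - rank(boundary_k) - rank(boundary_{k+1}). Working with coefficients mod 2 is my assumption to keep the code short, and the names `rips_complex`, `rank_gf2` and `betti_numbers` are made up for this example.

```python
from itertools import combinations
import math

def rips_complex(points, epsilon, max_dim):
    """Simplices (sorted index tuples) whose points are pairwise within epsilon."""
    n = len(points)
    near = lambda i, j: math.dist(points[i], points[j]) <= epsilon
    simplices = {0: [(i,) for i in range(n)]}
    for dim in range(1, max_dim + 1):
        simplices[dim] = [s for s in combinations(range(n), dim + 1)
                          if all(near(i, j) for i, j in combinations(s, 2))]
    return simplices

def rank_gf2(rows):
    """Rank over GF(2) of a binary matrix whose rows are int bitmasks."""
    pivots = {}  # highest set bit -> reduced row with that leading bit
    for row in rows:
        while row:
            high = row.bit_length() - 1
            if high not in pivots:
                pivots[high] = row
                break
            row ^= pivots[high]  # cancel the leading bit and keep reducing
    return len(pivots)

def betti_numbers(simplices, max_dim):
    """Betti numbers b_0..b_max_dim; needs simplices up to max_dim + 1."""
    ranks = {0: 0}  # a vertex has an empty boundary
    for k in range(1, max_dim + 2):
        index = {s: i for i, s in enumerate(simplices[k - 1])}
        rows = []
        for s in simplices[k]:
            mask = 0
            for face in combinations(s, k):  # the (k-1)-dimensional faces
                mask |= 1 << index[face]
            rows.append(mask)
        ranks[k] = rank_gf2(rows)
    return [len(simplices[k]) - ranks[k] - ranks[k + 1]
            for k in range(max_dim + 1)]

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
betti_small = betti_numbers(rips_complex(square, 1.0, max_dim=2), max_dim=1)
betti_large = betti_numbers(rips_complex(square, 1.5, max_dim=2), max_dim=1)
```

For the unit square, epsilon = 1.0 gives one connected component and one loop (the empty square), while epsilon = 1.5 fills the loop in with triangles, so only the connected component remains.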

Persistent homology allows one to study the evolution of this set of simplices (a simplicial complex) as epsilon increases.

The trick about noise: if features are 'short-lived' (i.e. they exist only over a short range of epsilon), they are likely noise. Persistent homology is great because it identifies the topological features and produces a measure of how long they survive.
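The simplest case to see this in code is betti-0 (connected components), where the lifetimes can be tracked with a union-find structure instead of full homology machinery. This is a sketch under that simplification; the function name `component_lifetimes` and the toy data are made up for illustration.

```python
from itertools import combinations
import math

def component_lifetimes(points):
    """Death values of connected components as epsilon grows from 0.

    Every point starts as its own component (born at epsilon = 0). When
    an edge first joins two components, one of them dies at that edge
    length. The last surviving component never dies.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Edges enter the complex in order of increasing length
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components
            parent[ri] = rj
            deaths.append(length)
    return deaths + [math.inf]

# Two tight clusters that are far apart from each other
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
lifetimes = component_lifetimes(points)
long_lived = [d for d in lifetimes if d > 1.0]
```

Three components die almost immediately (around epsilon = 0.1), which is the 'noise' inside each cluster. Two survive for a long range of epsilon, matching the two clusters in the data.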

I made an example video showing persistent homology in action for a simple 3D dataset (sampled from a torus). Check it out here: https://www.youtube.com/watch?v=CKfUzmznd9g Notice that in this video the long lines in the left frame fall into three groups. The first corresponds to betti-0 (there is a single connected component). The second group, of two lines, corresponds to betti-1 (there are two loops on a torus). The third corresponds to betti-2 (there is a single empty space within the torus).


I find that fascinating. Since you seem to know what you are talking about, can you name a good introductory book on the topic for someone who doesn't have an education in higher-level mathematics besides what's in a standard CS curriculum (I am a PhD student in machine learning/information retrieval)?

EDIT: I just realized that it was covered by the blog post. I am a complete moron. Sorry.


I recommend Afra's book:

http://www.amazon.com/Computing-Cambridge-Monographs-Computa...

Happy to help if you need it!


I like "Computational topology" by Edelsbrunner and Harer. They start with some basic graph theory and build on that to give a solid overview of algebraic topology, persistent homology, and even some Morse theory.

