What's your view on cases like Renaissance Technologies? There's an interview with James Simons online (https://www.youtube.com/watch?v=QNznD9hMEh0) in which he explicitly talks about using math models to detect anomalies, e.g. trends. It's also known that they've used Hidden Markov Models, at least in the early days.
Rentec uses machine learning, but more importantly the firm curates massive amounts of high-signal data. The most significant part of their work lies in process automation and the rapid testing of hypotheses, which empowers the optimal use of mind-numbing amounts of data that most other firms simply can't take advantage of. Very early on their success was due (in part) to the willingness of Simons et al to use correlations in disparate datasets which could be proven but which didn't really make sense, and which wouldn't be explained by anything intuitive.
In other words, Rentec is not just pointing machine learning models at data, they're investing in a very robust data processing pipeline. Everything before the analysis is just as important as the analysis at funds like theirs.
Just want to second this comment: their data processes are the key strength of Medallion. The grandparent comment by murbard2 also talks about the importance of this component to quant work (in the last paragraph: "finding new data feeds that provide valuable information").
While Jim Simons is a mathematician and Rentec has clearly hired many brilliant people with PhDs, it's worth mentioning that the actual mathematics used in their work isn't impossibly difficult or secretive. Many of the PhDs working there hold their degrees in something like physics rather than math, so I would say that if you are familiar with graduate-level math courses you can understand the math needed for this type of work. Math isn't where their edge comes from. Also, Medallion is 30 years old, and their early work in the mid-80s was done on computers with less processing power than your phone. "Machine learning" as the term is used lately, or access to supercomputing hardware no one else knows about, is also not where their edge came from.
Well said. Where most funds have the same problem chollida1 describes here[1], Rentec (and other similar firms) moved past that by establishing the right culture and investing in the right technology from the outset.
They need smart people, but hiring the smartest people and having the most sophisticated models won't do you any good if you can't acquire high signal data, can't clean that data properly and can't rapidly backtest. And if you can't do any of that, adding more data is just going to add more noise.
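To make the "clean the data, then backtest quickly" loop concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the function names (`clean_prices`, `backtest_momentum`), the toy cleaning rule, and the trailing-return momentum signal are hypothetical and are not anything Rentec is known to use.

```python
def clean_prices(raw):
    """Drop missing or non-positive ticks.

    A stand-in for real normalization, which is far more involved.
    """
    return [p for p in raw if p is not None and p > 0]


def simple_returns(prices):
    """One-period simple returns: element t is the return from t to t+1."""
    return [b / a - 1.0 for a, b in zip(prices, prices[1:])]


def backtest_momentum(prices, lookback=3):
    """Toy rule: long if the trailing `lookback`-period return is positive, else short.

    The position chosen at time t uses only prices up to t and earns the
    return from t to t+1, so there is no look-ahead.
    """
    rets = simple_returns(prices)
    pnl = []
    for t in range(lookback, len(prices) - 1):
        trailing = prices[t] / prices[t - lookback] - 1.0
        position = 1.0 if trailing > 0 else -1.0
        pnl.append(position * rets[t])
    return pnl


# Hypothetical raw feed with a gap (None) and a bad tick (-1).
raw_feed = [100, None, 101, 102, -1, 103, 104, 105, 106]
prices = clean_prices(raw_feed)
per_period_pnl = backtest_momentum(prices, lookback=3)
print(len(prices), per_period_pnl)
```

The point is the shape of the loop rather than the rule itself: if cleaning and backtesting are cheap to rerun, you can throw many hypotheses at the same data and discard most of them quickly.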
Given your background, I'd be interested in picking your brain a bit for a few projects I'm working on. If you're looking to remain anonymous would you mind sending me an email (in my profile), or throwing an email up in yours?
Is there really that much high-signal data that is not already being used? Is their advantage in finding signals in data that other people overlook, or finding new data sources?
> Is there really that much high-signal data that is not already being used?
Yes.
> Is their advantage in finding signals in data that other people overlook, or finding new data sources?
Yes.
I won't go into any particular detail, but there is a lot of signal in the market for those who are imaginative. The obvious, low-hanging fruit is long gone, but there are still many places that offer an edge.
I wish there was some sort of full intro tutorial on finding strategies, i.e. an example of a former signal (now traded away), the thought process, the data sourcing, statistical analysis, trading/signal strategy, etc.
The thing is that no one is really motivated to make a complete tutorial on finding strategies because it's economically irrational. You're either giving away specific sources of alpha or you're empowering potential competitors. This is why it's virtually guaranteed that anyone selling courses that teach trading or related skills is a fraud: if their methods actually worked, they would have every incentive to simply ramp up their own trading capital instead.
The industry is also extremely secretive (necessarily so). You'll hardly ever find a good treatise on finding novel signals, but there are tutorials on algo trading in general with examples of production strategies that used to work which have been, as you say, traded away. For that purpose I'd recommend you start here: http://www.decal.org/file/2945
It's public (technically it has to be in order to be strictly legal for use). But for the most part it is unintuitive, unclean (needs to be heavily normalized) and not easily accessible. There are a variety of vendors that source it, clean it and analyze it to make it salable to firms. Quantitative firms also have teams devoted to doing all of that internally.
I like the responses to this already. But I'll add that there's a difference between what I lovingly call throwing poop at the wall, and using machine learning to estimate non-linear functions of structural models or to combine signals that already have alpha.
ML can be very useful if you have some signal or if you have a model.
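As a rough illustration of "combining signals that already have alpha", here is a minimal sketch that fits weights for two existing signals by ordinary least squares (no intercept) in plain Python. The function name `fit_two_signal_weights` and the sample numbers are hypothetical, and plain linear regression is swapped in here as the simplest stand-in for whatever model a real desk would use.

```python
def fit_two_signal_weights(s1, s2, y):
    """Least-squares weights (w1, w2) for y ~ w1*s1 + w2*s2, no intercept.

    Solves the 2x2 normal equations directly with Cramer's rule.
    """
    a11 = sum(a * a for a in s1)
    a12 = sum(a * b for a, b in zip(s1, s2))
    a22 = sum(b * b for b in s2)
    b1 = sum(a * r for a, r in zip(s1, y))
    b2 = sum(b * r for b, r in zip(s2, y))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("signals are collinear; weights are not identifiable")
    w1 = (b1 * a22 - a12 * b2) / det
    w2 = (a11 * b2 - a12 * b1) / det
    return w1, w2


# Hypothetical signals and next-period returns, made up for illustration.
signal_a = [1.0, -1.0, 2.0, 0.0]
signal_b = [0.0, 1.0, 1.0, -2.0]
next_returns = [0.5 * a + 0.25 * b for a, b in zip(signal_a, signal_b)]

w_a, w_b = fit_two_signal_weights(signal_a, signal_b, next_returns)
combined = [w_a * a + w_b * b for a, b in zip(signal_a, signal_b)]
print(w_a, w_b, combined)
```

The fit only has something to work with because each input already carries information about the target; feed it two columns of noise and it will still return weights, which is the "poop at the wall" failure mode.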