Architecturally this is great. Introducing more infra into an ML stack is a huge pain in the short and long run, and I really love that this doesn't do that (or at least, the new pieces are things I understand).
In the past I've always opted for a feature store as a library that operates over an existing database/data warehouse/data lake in the offline case, and that computes features on the fly in the online case. The internal feature cache for scaling an online service is nicely implemented here using Redis. Bravo, that's probably how I would do it too.
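For anyone unfamiliar with that pattern, here's a minimal cache-aside sketch of on-the-fly online features. All names are hypothetical, and an in-memory stand-in replaces the Redis client so the snippet runs standalone; a real deployment would swap in `redis.Redis`, which has the same `get`/`setex` interface:

```python
import time

class FakeRedis:
    """In-memory stand-in for redis.Redis (get/setex subset only)."""
    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired, drop it
            return None
        return value

    def setex(self, key, ttl_seconds, value):
        self._store[key] = (value, time.monotonic() + ttl_seconds)

def compute_feature(user_id: str) -> float:
    # Placeholder for the on-the-fly computation against the source of
    # truth (e.g. a warehouse query or a stream aggregate).
    return float(len(user_id) * 10)

def get_feature(cache, user_id: str, ttl_seconds: int = 60) -> float:
    """Cache-aside: serve from the cache, recompute and fill on a miss."""
    key = f"feature:avg_order_value:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = compute_feature(user_id)
    cache.setex(key, ttl_seconds, value)
    return value
```

The TTL doubles as a staleness bound: you never introduce a separate materialization pipeline, you just accept features that are at most `ttl_seconds` old.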
My one bit of feedback is the API. The code just doesn't look nice: right out of the gate there's a bunch of objects and methods whose purpose I don't immediately understand. I'm sure they're useful, but for getting started I'd expect a lot more from that interface. I'd suggest something higher level that looks pretty and is easy to understand. That would be my one hesitation.
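For concreteness, this is roughly the shape of interface I'd hope to see on page one of the docs. Everything here (`FeatureStore`, `get_features`, the connection string) is made up to illustrate the ergonomics, with a dict standing in for the warehouse:

```python
class FeatureStore:
    """Hypothetical high-level facade over an existing database/warehouse."""
    def __init__(self, uri: str):
        self.uri = uri  # a real version would open a connection here
        # toy offline store: entity id -> feature name -> value
        self._rows = {
            "u1": {"avg_order_value": 42.0, "days_since_signup": 7},
            "u2": {"avg_order_value": 13.5, "days_since_signup": 90},
        }

    def get_features(self, ids, features):
        """One call, no intermediate objects: ids + feature names in, rows out."""
        return {i: {f: self._rows.get(i, {}).get(f) for f in features}
                for i in ids}

store = FeatureStore("postgresql://warehouse/analytics")
rows = store.get_features(ids=["u1", "u2"], features=["avg_order_value"])
```

The registries, entity definitions, and materialization knobs can all still exist underneath; they just shouldn't be mandatory reading for the first ten minutes.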
this catches my eye because it highlights an industry mindset shift to come:
"[...] the concept of integrating power ML methods into the real-time experimental data processing loop to accelerate scientific discovery"
data scientists study the science of what the business does (laundry delivery, manufacturing TVs, tracking patient health), and the point of science is insight and understanding from data to build a theory of how it all works
what this article highlights is that ML can be an exceptional tool for discovery. this is in stark contrast to how ML is usually deployed, which is as some big analytics or product effort. the obvious big reasons for that are that the infra is expensive, the know-how is lacking, and the data sucks. well, all of that is quickly changing, and we're gonna see folks weaving ML into their workflows in a much bigger way
great to see academic scientists leading the charge here too. they stand to gain a lot from that perspective
Beautiful. So many annotation tools focus on "text classification" which assumes you've already got segmented samples. In the real world of documents that's a whole challenge in itself.
Another challenge is that sometimes you're working with PDFs and that means not only ingesting but also displaying. The difficulty is in keeping track of annotations and predictions across the PDF<->text string boundary, both ways.
There are understandably even fewer solutions to that problem because it's a harder UI to build.
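One workable approach to that boundary problem (sketched here with made-up function names) is to record a per-character offset map when you flatten the PDF's pages into one string, so a span annotation can be translated back to page-local coordinates, splitting when it crosses a page break:

```python
def flatten_pages(pages):
    """Join per-page text into one flat string, recording for each
    character its (page_index, offset_within_page) origin."""
    flat, offset_map = [], []
    for page_idx, text in enumerate(pages):
        for char_idx, ch in enumerate(text):
            flat.append(ch)
            offset_map.append((page_idx, char_idx))
    return "".join(flat), offset_map

def span_to_pdf(offset_map, start, end):
    """Translate a [start, end) span in the flat string into per-page
    spans, splitting the annotation at page boundaries."""
    spans = {}
    for pos in range(start, end):
        page, char = offset_map[pos]
        s, e = spans.get(page, (char, char))
        spans[page] = (min(s, char), max(e, char + 1))
    return spans  # page_index -> (start, end) within that page
```

The same map, read in reverse, carries model predictions made on the text back onto the rendered PDF for display. The hard part in practice is that real extraction inserts and reorders characters (hyphenation, ligatures, column order), so the map has to be built by the extractor itself rather than reconstructed afterwards.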
Much appreciated! That's true, and lots of the tools that do support text annotation can be quite restrictive in that they don't let you add attributes or repeatedly annotate the same span of text.
Support for PDFs and other doc types is definitely on the backlog, but I keep holding off due to the challenges you mentioned.
LOL. let the car drive itself, but give up music control? over my dead body!
it's funny to me because you'd expect music to be lower stakes, but it just highlights that driving a car is actually a much better-defined problem than picking the music i like
i think i believe that. what is the easiest thought abstraction that can be captured by our sensors? well the abstraction is largely defined by the UI. i like to think of it like language. UI components (words) come together to enable complex actions (sentences or thoughts). it evokes questions, like what language does the brain speak in certain contexts for certain outputs? that's gonna be interesting to follow. what if we all think super differently and that makes it hard? i can't imagine why, but i don't have a background in real brains
it may be that our current "AI" tools will be helpful--they're really good at composing "languages" that tie together different types of data. tying noisy brain sensor data to our English alphabet seems like it might be an example of that.