> static analysis tools that produce flowcharts and diagrams like this have existed since antiquity, and I'm not seeing any new real innovation other than "letting the LLM produce it".
An inherent limitation of static-analysis-only visualization tools is the lack of flexibility/judgement about what should and should not be surfaced in the final visualization.
The resulting visualizations look like machine code themselves. The advantage of having LLMs produce code visualizations is their judgement/common sense about the resolution at which things should be presented, so the output is intuitive and useful.
Although I haven't personally experienced the feeling of "produced visualizations looking like machine code", I can appreciate the argument you're making wrt judgment-based resolution scaling.
Vector embeddings are not an invention of the last decade. Featurization in ML goes back to the 60s; even deep-learning-based featurization is decades old at a minimum. Like everything else in ML, this became much more useful with scale in data and compute.
Gary Marcus has been taking victory laps on this since mid-2023; nothing to see here. It was patently obvious to all that there would be additional innovations on top of LLMs, such as test-time compute, which are nonetheless structured around LLMs and complementary to them.
Very cool and interesting project. Ideas like this are a threat to traditionally conceived project management platforms like Linear; that said, Linear and others (Monday, ClickUp, etc.) are pushing aggressively into UX built for human/AI collaboration. I guess the question is how quickly they can execute and how many novel features are required to properly bring AI into the human project workspace.
Cheers! Smaller teams, more infrastructure, more testing, tasks requiring review in minutes rather than days: the features the new world needs are totally different from what legacy PM tools are optimised for, and from the customers they have to continue to serve.
This does not take into account the fact that experienced developers working with AI have shifted into roles of management and triage, working on several tasks simultaneously.
It would be interesting (and in fact necessary to draw conclusions from this study) to see the aggregate number of tasks completed per developer with AI augmentation. That is, if time per task has gone up by 20% but we clear 2x as many tasks, that is a pretty important caveat to the results published here.
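To make the caveat concrete, here's a toy latency-vs-throughput calculation (all numbers hypothetical, just the 20%/2x figures from the comment):

```python
# Hypothetical numbers matching the comment: per-task wall-clock time up 20%,
# but twice as many tasks completed in the same period via concurrent triage.
baseline_latency = 1.0            # normalized time per task, pre-AI
ai_latency = 1.2                  # each individual task takes 20% longer
baseline_completed = 50           # tasks completed per month, pre-AI
ai_completed = 100                # 2x tasks completed per month with AI

latency_change = ai_latency / baseline_latency          # 1.2x, the per-task metric
throughput_change = ai_completed / baseline_completed   # 2.0x, actual output

# Reporting latency alone shows a 20% slowdown while total output doubled,
# which is exactly what aggregate task counts would reveal.
print(f"per-task time: {latency_change:.1f}x, throughput: {throughput_change:.1f}x")
```

Measuring only time-per-task would report the developer as slower even though their output doubled.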
This is used in multiple similar publications, including "Guiding Language Models of Code with Global Context using Monitors" (https://arxiv.org/abs/2306.10763), which uses static analysis beyond the type system to filter out, e.g., invalid variable names and invalid control flow.
Yes, this work is super cool too! Note that LSPs cannot guarantee resolving the necessary types that we use to ensure the prefix property, which we leverage to avoid backtracking and generation loops.
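A minimal sketch of what the prefix property buys you (names and scope are hypothetical; real systems hook into an actual static analyzer or type checker rather than a hard-coded set):

```python
def valid_continuations(prefix: str, in_scope: set[str]) -> set[str]:
    """Return the in-scope identifiers a partial identifier can still become.

    Sketch of the prefix property: only emit characters while the partial
    identifier can still extend to something valid, so the decoder never
    has to backtrack over an unsalvageable prefix.
    """
    return {name for name in in_scope if name.startswith(prefix)}

# Hypothetical scope, as a static analyzer might derive from the context.
scope = {"total", "torch", "tokens"}

print(valid_continuations("to", scope))   # all three names remain reachable
print(valid_continuations("tot", scope))  # only "total" is still reachable
print(valid_continuations("txt", scope))  # empty set: this prefix gets masked
```

During decoding, any token whose continuation set is empty is masked out before sampling, so an invalid identifier is never emitted in the first place.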
If you are looking for an alternative that can also chat with you in Slack, create PRs, edit/create/search tickets in Linear, search the web, and more, check out codegen.com
A friend asked me to do diligence on this company circa 2021, given my personal background in ML. The founder was adamant they had a "100% checkout success rate" based on AI, which was clearly false. He was also running two other startups concurrently (?)