This has been exactly my mindset as well (another Seattle SWE/DS). The baseline capability has been improving and compounding, not getting worse. It'd actually be quite convenient if AI's capabilities stayed exactly where they are now; the real problems come if AI does work.
I'm extremely skeptical of the argument that this will end up creating jobs just like other technological advances did. I'm sure that will happen around the edges, but this is the first time thinking itself is being commodified, even if it's rudimentary in its current state. It feels very different from automating physical labor: most folks don't dream of working on an assembly line. But I'm not sure what's left if white collar work and creative work are automated en masse for "efficiency's" sake. Most folks like feeling like they're contributing towards something, despite some people who would rather do nothing.
To me it is clear that this is going to have negative effects on SWE and DS labor, and I'm unsure if I'll have a career in 5 years despite being a senior with a great track record. So, agreed. Save what you can.
Exactly. For example, what happens to open source projects where developers don't have access to the latest proprietary dev tools? Or, what happens to projects like Spring if AI tools can generate framework code from scratch? I've seen maven builds on Java projects that pull in hundreds or even thousands of libraries. 99% of that code is never even used.
The real changes to jobs will be driven by considerations like these. Not saying this will happen but you can't rule it out either.
> I'm extremely skeptical of the argument that this will end up creating jobs just like other technological advances did. I'm sure that will happen around the edges, but this is the first time thinking itself is being commodified, even if it's rudimentary in its current state. It feels very different from automating physical labor: most folks don't dream of working on an assembly line.
Most people do not dream of working most white collar jobs. Many people dream of meaningful physical labor. And many people who worked in mines did not dream of being told to learn to code.
The important piece here is that many people want to contribute to something intellectually, and a huge pathway for that is at risk of being significantly eroded. Permanently.
Your point stands that many people like physical labor, whether they want to craft something artisanally or simply prefer being outside doing physical or even menial work to sitting in an office. True, but that doesn't solve the above issue, just as it didn't in reverse. Telling miners to learn to code was... not great. And from my perspective, neither is outsourcing our thinking en masse to AI.
My idea is to turn a table row into a textual description, feed it into a transformer, and get what is effectively a sentence embedding. This acts as a query embedding. Then make a few value embeddings for the target you are trying to predict, use cosine similarity to pick the right value embedding, and feed that to the ML model as part of the feature set. It works if the categorical values in your table are entities the model might have learned about.
I tried this approach and it did improve overall performance. The next step would be fine-tuning the transformer model; I want to see if I can do it without disturbing the existing weights too much. Here's the library I used to get the embeddings
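To make the idea concrete, here's a minimal sketch (not the poster's actual code): it assumes the sentence-transformers library and a made-up row and target set, since the comment doesn't say which library or data was used.

    # Hedged sketch: turn a table row into text, embed it, and score it
    # against candidate target-value embeddings via cosine similarity.
    # The library, template, row, and targets are all assumptions.
    from sentence_transformers import SentenceTransformer
    import numpy as np

    model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works

    def row_to_text(row: dict) -> str:
        # "column is value" phrasing; the exact template is a free choice
        return ", ".join(f"{col} is {val}" for col, val in row.items())

    row = {"city": "Seattle", "industry": "software", "company_size": "large"}
    targets = ["high income", "medium income", "low income"]  # hypothetical labels

    query_emb = model.encode([row_to_text(row)], normalize_embeddings=True)  # (1, d)
    value_embs = model.encode(targets, normalize_embeddings=True)            # (k, d)

    # Cosine similarity = dot product of unit vectors; append these as extra features
    sims = (query_emb @ value_embs.T).ravel()                                # (k,)
    print(dict(zip(targets, sims.round(3))))

The k similarity scores then get concatenated onto the ordinary tabular feature vector before training the downstream model.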
I've been following Steve Brunton's lab closely on discovering dynamical/control systems via NNs/autoencoders. His videos really helped me understand what is happening in the background when finding sparse solutions to chaotic systems:
https://www.youtube.com/watch?v=KmQkDgu-Qp0
This is really great! It speaks very much to my use-case (building user embeddings and serving them both to analysts + other ML models).
I was wondering if there was a reasonable way to store raw data next to the embeddings such that:
1. Analysts can run queries to filter down to a space they understand (the raw data).
2. Nearest neighbors can be run on top of their selection on the embedding space.
Our main use case is segmentation, so giving analysts access to the raw feature space is very important.
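One possible arrangement, sketched below with made-up column names and synthetic embeddings: keep the raw, interpretable columns in a DataFrame, keep the embeddings in a row-aligned array, let analysts filter on the raw space first, and then run nearest neighbors only on the filtered slice.

    # Hedged sketch of keeping raw features next to embeddings.
    # Columns, filter, and embeddings are illustrative only.
    import numpy as np
    import pandas as pd
    from sklearn.neighbors import NearestNeighbors

    n, d = 10_000, 64
    rng = np.random.default_rng(0)
    raw = pd.DataFrame({
        "user_id": np.arange(n),
        "country": rng.choice(["US", "DE", "JP"], size=n),
        "sessions_30d": rng.integers(0, 100, size=n),
    })
    embeddings = rng.normal(size=(n, d)).astype(np.float32)  # row-aligned with `raw`

    # 1. Analysts filter on the raw, interpretable space
    mask = (raw["country"] == "US") & (raw["sessions_30d"] > 20)
    idx = np.flatnonzero(mask.to_numpy())

    # 2. Nearest neighbors on the embedding space, restricted to that selection
    nn = NearestNeighbors(n_neighbors=10, metric="cosine").fit(embeddings[idx])
    dist, local = nn.kneighbors(embeddings[idx[:1]])      # neighbors of one seed user
    neighbor_ids = raw.iloc[idx[local.ravel()]]["user_id"].tolist()
    print(neighbor_ids)

At larger scale the same split works with a vector index (only over the filtered IDs) instead of brute-force sklearn, but the raw-filter-then-embedding-search order stays the same.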
For purposes of nearest neighbors this seems like an incredibly interesting shape to inscribe into:
The sphere, despite having spherical properties, also maintains linear properties due to the corrugation. To me that means we can try to inscribe orthogonal properties into both of the spaces.
My understanding of these geometries isn't complex enough to make the connections, so my question is this:
Do you think it's feasible to use shapes with this 'corrugated' property to make better nearest-neighbor compression?
My intuition tells me that you can use the shape's linear nature to push apart independent components and inscribe the rest of the details into the spherical components. Or perhaps the opposite way.
I don't have any intelligent comments on your question, but I wanted to say that I am a fan of Quanta magazine, but somehow had missed this really cool article. So thanks for pointing me to this fascinating field. ;)
They should be quite similar.
In the end you coax your embedding space into some consistent measure of what causes samples to diverge from one another.
You can do similarity search and all the sorts of things you do for word embeddings on embeddings generated for other scenarios.
This is spot on with my own observations, especially as we get into modelling more 'abstract' ideas.
As more NN methods become viable, some more savvy data scientists complain to me "this NN is just approximating SVD/PCA/POD/etc!"
Wonderful, that's explicitly the point! The network we're applying to this problem compares/combines multiple approaches to dimension reduction, and it produced a latent space that makes way more semantic sense than just PCA or SVD for this problem (No Free Lunch). It still takes effort and understanding, but the value I've personally gotten over just applying PCA to my problem sets has been incredible. In fact I'm certain it has made my career. Turns out diagonalizing covariance matrices isn't the only dimension reduction game in town!
As someone who's spent 20 years tuning my own genetic algorithms, being swamped by newbs who spout fancy language and don't even want to know how to write the code themselves just feels like what it is - a new generation of recent business grads who swapped "blockchain" for "ML". Soon to be separated into "founders" and real estate agents, while the rest of us toil in the vineyard. So goes it.
Neural networks let anyone bullshit a good game until things get tricky. Back in the day, FrontPage was going to kill web development because anyone could make a website. Now we can slap newer tech we don't understand together and profit will ensue.
I've been doing it for two years and am barely past the "understand none of the words" phase.
It helps to think of each term as an interesting puzzle. For example, SVD. It's fascinating if you dig into it. Most people don't want to, because it feels like work. For me, it's neat understanding ... whatever it is, ha.
I think it's finding the basis eigenvectors in a higher dimensional space, which basically just means that e.g. the eigenvectors of a cube are the X, Y, Z axes you're used to. If you skew it along the X axis, the Y axis bends a bit, along with the cube.
The eigenvectors span a shape, and the volume of that shape is the determinant of the resulting form. So the determinant of a cube's SVD is the volume of the cube.
In higher dimensional spaces, it's the same thing, except it's called "eigenvectors" (named after Sir Eigen of Eigenmadethisup) because mathematicians have reasons for using complicated language, some of which is valid. But as you see from me muddling through this, the underlying concepts are all small simple pieces that fit together.
Or I was nowhere close to the explanation of SVD. But it was close to something interesting, since it leads to the question of "What's the SVD of a sphere? How about a point cloud?" It was easy to figure out for a cube. Not so easy when it's an arbitrary shape. "And why is it useful?" Because it gives a lot of hints about what that object is. In the optimal case, in StyleGAN for example, the SVD can even be the basis vectors like "smile", "age", and so on. (You know in Faceapp how you can drag the "Age" slider and make yourself look older or younger? That's a basis vector in higher dimensional space. It's orthogonal -- more or less -- to "smile", because if you drag the "smile" basis vector around, it doesn't cause you to age older or younger. Except it's not quite orthogonal, because it's a higher dimensional weird-ass shape and therefore can't be orthogonal, so sometimes when you make someone older their hair turns grey even though "blonde hair" is orthogonal to "age" in theory.)
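For what it's worth, the "what's the SVD of a point cloud?" question has an answer you can poke at in a few lines of numpy. This is a hedged, purely illustrative sketch with synthetic data: center the points and take the SVD; the right singular vectors are the cloud's principal directions, and the singular values say how stretched it is along each one.

    # Hedged sketch: SVD of a centered point cloud recovers its principal
    # directions and spreads. Synthetic, axis-aligned data for illustration.
    import numpy as np

    rng = np.random.default_rng(0)
    # A flat-ish ellipsoidal cloud: stretched 5x along x, 2x along y, squashed in z
    points = rng.normal(size=(1000, 3)) * np.array([5.0, 2.0, 0.3])

    centered = points - points.mean(axis=0)
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)

    print(S / np.sqrt(len(points)))  # roughly [5, 2, 0.3]: spread along each direction
    print(Vt)                        # rows roughly the x, y, z axes for this cloud

For an arbitrary shape the directions won't line up with x, y, z anymore, but the recipe is the same, which is why it gives so many hints about what the object is.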
Yada, yada. Rinse and repeat and dive in for a couple years. You'll find it's fun once you jump in.
P.S. All the people reading this that feel offended like "No, you really must start with theory; you can't possibly learn anything if you don't know what you're doing," you better read this: http://thecodist.com/article/the_programming_steamroller_wai...
That steamroller is coming for you. Once the legions of javascript programmers realize that hey, I can do DL just like an ML researcher, you're gonna be doomed. Because a 17yo JS programmer has roughly 10x as much determination as even I can muster these days, let alone someone who clings to the idea that theory is the only path forward.
> savvy data scientists complain to me "this NN is just approximating SVD/PCA ..."
It wouldn't be "approximating" anything. An optimal one-layer linear NN "autoencoder" is PCA. There are learning algorithms for PCA other than gradient descent, but the infrastructure for training NNs on big data sets makes it painless.
As soon as you add activations and layers, you're improving on SVD/PCA. For dimensionality reduction, it means the "manifold" is more complicated than just a linear projection.
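A quick, hedged sketch of the linear-autoencoder-is-PCA claim (synthetic data, arbitrary training budget, not anyone's production code): train a bias-free linear encoder/decoder by gradient descent and compare its reconstruction error against a k-component PCA; the two should come out nearly identical, up to optimization error.

    # Hedged sketch: an optimal linear autoencoder matches PCA's reconstruction.
    import numpy as np
    import torch
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 20)) @ rng.normal(size=(20, 20))  # correlated features
    X = X - X.mean(axis=0)
    k = 5

    # PCA baseline reconstruction error
    pca = PCA(n_components=k).fit(X)
    X_pca = pca.inverse_transform(pca.transform(X))
    err_pca = np.mean((X - X_pca) ** 2)

    # Linear autoencoder: encode to k dims, decode back, no activations or biases
    Xt = torch.tensor(X, dtype=torch.float32)
    enc = torch.nn.Linear(20, k, bias=False)
    dec = torch.nn.Linear(k, 20, bias=False)
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-2)
    for _ in range(3000):
        opt.zero_grad()
        loss = ((dec(enc(Xt)) - Xt) ** 2).mean()
        loss.backward()
        opt.step()

    err_ae = ((dec(enc(Xt)) - Xt) ** 2).mean().item()
    print(err_pca, err_ae)  # should be nearly identical

The learned decoder won't return the PCA basis itself (any invertible mixing of it works), but it spans the same subspace, which is the sense in which the optimal linear autoencoder "is" PCA.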
> As soon as you add activations and layers, you're improving on SVD/PCA
You're expanding the space of realizable functions, which is an improvement in a specific sense, but not in all senses! The SVD, since it is better understood theoretically, is a more straightforward problem to solve robustly. There are fewer hyperparameters (like learning rate) to choose, and you aren't left wondering whether your solution is at a bad local minimum.
I think it's wrong to think that it's an obvious improvement.
Compelling. I wonder what else this could be applied to in addition to psychedelics? Anti-anxiety and other sensory affecting drugs?
If you wanna get Black Mirror-esque, perhaps a Soma-like medication from Brave New World (essentially pacifies/zombifies you by creating endless bliss) could be made. Or the "bliss" drug episode of Doctor Who.