
Vision transformer trained on 300M human images with state of the art results on a bunch of human tasks (keypoints, segmentation, depth, normals).

Disclaimer: Co-author here.



You might want to update the README where it says run "./conda.sh" - it should say there are hard-coded paths in this script that need to be changed (the first line is CONDA_BASE="/home/rawalk/anaconda3").

I wonder if there is something here that requires conda and not a simple requirements.txt or something like that. Every time I try conda, it seems to mess up my entire environment (usually I just use pyenv w/ virtualenv). But trying with conda now, keeping my fingers crossed...

EDIT: yep, as usual, conda failed me. (fresh install of miniconda). "./conda.sh" finished with 0 exit code and said "Installation done!". Yet, now I have no new conda environment (I think I saw some warnings and errors deep in the logging output).

I see now how this has various requirements.txt for the different sub-projects - looks like I'll try to create a pyenv-virtualenv and do things manually to try to get an example working...
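
For anyone hitting the same thing: since the script can exit with 0 without creating anything, it may be worth checking the environment list directly instead of trusting the exit code. A minimal sketch, assuming conda is on your PATH; `my_env` is a placeholder name (check the script for the real one), and both helper functions are my own, not part of the repo:

```python
import json
import subprocess
from pathlib import Path


def env_exists(conda_json: str, name: str) -> bool:
    """Return True if an environment directory called `name` is listed.

    `conda_json` is the text printed by `conda env list --json`,
    which contains an "envs" key holding a list of environment paths.
    """
    envs = json.loads(conda_json).get("envs", [])
    return any(Path(p).name == name for p in envs)


def conda_env_exists(name: str) -> bool:
    """Ask the local conda install for its environments (needs conda on PATH)."""
    out = subprocess.run(
        ["conda", "env", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return env_exists(out, name)


if __name__ == "__main__":
    # "my_env" is a hypothetical name, not necessarily what conda.sh creates.
    print(conda_env_exists("my_env"))
```

The parsing is split from the subprocess call so it can be checked without conda installed.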


> usually I just use pyenv w/ virtualenv

In case you're not aware, because I was only recently: https://github.com/pyenv/pyenv-virtualenv


Always curious what the license allows with these Meta research drops; it seems all over the place. Can this be used commercially (specifically for inference)? It's Creative Commons, and some parts Apache?


The Creative Commons seems to be Non-Commercial [0], meaning it’s very interesting and quite inspiring, but ultimately useless outside of research and side projects.

The Apache parts seem to be dependencies.

[0]: https://github.com/facebookresearch/sapiens/blob/main/LICENS...


> but ultimately useless outside of research and side projects.

“Everything is useless unless it personally, financially benefits me.”


Yes. It is. Just a giant flexing its muscles. Look but don't touch.

Why would you spend any of your finite attention here? It's a signal to researchers and would-be upstarts that Meta already lurks here.


> Why would you spend any of your finite attention here?

Because it is research. And it's open research, unlike OpenAI or (for the most part) modern Google DeepMind or xAI.

It's completely fair game for non-researchers to ignore that, but even non-researchers benefit from a faster pace of understanding how to do this kind of magic.

Thank you people at Meta Research for this release!


Research is fantastic. Why release the models publicly if they can't be openly used?


It also means no self-sustaining project can be built with it.

Which to me is something important to know when thinking about what can be built and the amount of effort it represents.

For better or worse, that ultimately shapes what can reasonably be done with it.


I’ve seen papers that combined pre-trained vision and language models, trained them together on image/text pairs, and then used the new model for things like text extraction. Could your model be plugged into such a design?

I’ve always wanted to scan whole books by just feeding pictures of their pages into an AI, preferably with minimal labeling requirements. I also see this as a way to generate more training data for language models from old cheap books. Do you think your model could help with that?


reposting my comment in hopes you'll see it in your profile:

Um, this looks really, really good.

Yo @yoknapthawa, can this be finetuned on an M3 chip? How much RAM is needed? What are the current low-hanging-fruit tasks you think the community could go at? What's latency like? I didn't see anything on the page / in the paper / on GitHub about speeds.

I'm also curious about the classes you use for the segmentation task -- do you have a list of them somewhere?

Finally, your generalization results are all on photorealistic images. Did you look at paintings / animation / other? I'm curious how broadly the generalization goes.

As always, thank you for opening the weights.



