
sadiq · 2025-09-27T10:18:44 1758968324

It's worth looking at Jan's trophy cabinet at the bottom of https://github.com/ocaml-multicore/multicoretests/

His work has uncovered a number of really tricky bugs in the multicore runtime, but what's brilliant is that the reports normally come with a minimal reproduction. This makes working out the cause so much easier.

Great work Jan.

sadiq · 2025-09-26T08:37:50 1758875870

I would try https://github.com/ucam-eo/tessera-interactive-map; it is relatively easy to get started with and has a nice interface for labeling.

https://github.com/ucam-eo/geotessera has an image showing our current embedding coverage. Blue areas have complete coverage for 2024; green areas are covered for 2017-2024. We're slowly working to populate everything for 2017-2024, but the constraint at the moment is GPU time and storage: each year takes ~20k GPU hours/200k CPU hours and requires storing and serving 200 terabytes of data. The world is big!
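A back-of-the-envelope calculation from the per-year figures above gives a sense of the scale of full 2017-2024 coverage. It assumes every year costs roughly the same (a simplification) and totals GPU and CPU hours separately; the function name is just for illustration.

```python
# Per-year figures quoted above; assumes each year costs roughly the same.
GPU_HOURS_PER_YEAR = 20_000
CPU_HOURS_PER_YEAR = 200_000
STORAGE_TB_PER_YEAR = 200


def full_coverage_cost(first_year: int, last_year: int) -> dict:
    """Rough totals for processing every year in [first_year, last_year]."""
    years = last_year - first_year + 1
    return {
        "years": years,
        "gpu_hours": years * GPU_HOURS_PER_YEAR,
        "cpu_hours": years * CPU_HOURS_PER_YEAR,
        "storage_tb": years * STORAGE_TB_PER_YEAR,
    }


print(full_coverage_cost(2017, 2024))
# {'years': 8, 'gpu_hours': 160000, 'cpu_hours': 1600000, 'storage_tb': 1600}
```

That is roughly 1.6 petabytes for the full archive, before subtracting any areas that are already processed.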

If there is an area you would like prioritised, there's an issue template on the geotessera GitHub repo which we can use to move regions around in the processing queue.

ensocode · 2025-09-26T09:52:09 1758880329

Thanks for your explanation. For my region, 2024 coverage is already available, which should be sufficient to get started. After looking into the library, I just want to make sure I understand the workflow correctly: I would use the Tessera interactive map to mark known locations of Giant Hogweed, label them, and export them as GeoJSON; then train a k-NN model, make predictions, and finally export the results as GeoJSON back to the map. Does that sound right?

sadiq · 2025-09-26T10:08:48 1758881328

The interactive map should handle this workflow for you: you place points and it runs the k-NN classifier over the landscape.
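As a rough illustration of what "running a k-NN classifier over the landscape" means, here is a minimal pure-Python sketch. It is not the interactive map's actual code: the function names are hypothetical, Euclidean distance is an assumed choice of metric, and toy 2-D vectors stand in for TESSERA's 128-dimensional embeddings.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """train: list of (embedding, label) pairs; returns the majority label of the k nearest."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def classify_landscape(train, pixel_embeddings, k=3):
    """Classify every pixel independently (pixel-wise, no spatial context)."""
    return [knn_predict(train, emb, k) for emb in pixel_embeddings]


# Toy labelled points: embeddings looked up at the places you clicked on the map.
train = [
    ([0.0, 0.0], "other"),
    ([0.1, 0.0], "other"),
    ([1.0, 1.0], "hogweed"),
    ([0.9, 1.1], "hogweed"),
    ([1.1, 0.9], "hogweed"),
]
pixels = [[0.05, 0.02], [1.0, 0.95]]
print(classify_landscape(train, pixels))  # ['other', 'hogweed']
```

An odd `k` avoids ties when there are only two classes; with more classes, ties fall back to whichever label `Counter` saw first.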

If you want to go further you can export the GeoJSON and then run it through any machine learning pipeline you like.
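A minimal sketch of the first step of such a pipeline: reading labelled points out of an exported GeoJSON FeatureCollection using only the standard library. The property name holding the label (`label` here) is an assumption; check what your export actually uses.

```python
import json


def load_labelled_points(geojson_text, label_key="label"):
    """Return (lon, lat, label) tuples for the Point features in a FeatureCollection.

    label_key is a hypothetical property name; adjust it to match the export.
    """
    collection = json.loads(geojson_text)
    points = []
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue  # skip polygons, lines, etc.
        lon, lat = geometry["coordinates"][:2]  # GeoJSON order is [lon, lat]
        label = (feature.get("properties") or {}).get(label_key)
        points.append((lon, lat, label))
    return points


example = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [0.1218, 52.2053]},
         "properties": {"label": "hogweed"}},
    ],
})
print(load_labelled_points(example))  # [(0.1218, 52.2053, 'hogweed')]
```

From there the (lon, lat) pairs can be used to look up embeddings and feed whatever model you prefer.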

avsm · 2025-09-26T10:51:45 1758883905

...and if you do build this @ensocode, feel free to open a PR to https://github.com/ucam-eo/geotessera and I'll incorporate it as an example in the repo.

sadiq · 2025-09-26T08:26:38 1758875198

I was a lot more optimistic about Gabriel's model than he was. It is essentially a presence-only species distribution model, whose accuracy depends largely on assumptions about prevalence, and it really needs some presence-absence data to calibrate.
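To illustrate why prevalence matters so much, here is a general Bayes'-rule calculation (not Gabriel's evaluation). It uses a hypothetical classifier with fixed 90% sensitivity and 90% specificity and shows how the fraction of its positive predictions that are correct changes with the true prevalence:

```python
def precision_at_prevalence(sensitivity, specificity, prevalence):
    """P(present | predicted present), by Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Same hypothetical classifier, three different assumed prevalences.
for prevalence in (0.5, 0.1, 0.01):
    print(prevalence, round(precision_at_prevalence(0.9, 0.9, prevalence), 3))
```

Precision drops from about 0.9 to 0.5 to 0.083. With presence-only labels the prevalence can't be estimated from the data itself, which is why some presence-absence data is needed for calibration.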

As I mentioned in one of the other comments, the model is also only pixel-wise. That is, it does not use spatial information when making predictions.

sadiq · 2025-09-26T08:15:23 1758874523

We did note several places during the trip that didn't contain bramble. The hotspot in the middle of the residential area was also entirely isolated.

For a proper evaluation you would need to be more methodical, but as a sanity check we were very happy with it.

One other thing to point out about the bramble model is that it is pixel-wise. That is, each prediction is based exclusively on what is within its 10 metre pixel (give or take the georeferencing error).
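A minimal sketch of what "pixel-wise" means in practice: every label or prediction is tied to exactly one 10 metre cell. The grid layout (north-up, projected coordinates in metres, known top-left origin, as in a UTM tile) and the function name are assumptions for illustration.

```python
import math

PIXEL_SIZE_M = 10.0  # TESSERA embeddings are at 10 metre resolution


def pixel_index(x_m, y_m, origin_x_m, origin_y_m, pixel_size=PIXEL_SIZE_M):
    """Map projected coordinates (metres) to the (row, col) of the containing pixel.

    Assumes a north-up grid whose top-left corner is (origin_x_m, origin_y_m);
    rows increase southwards, columns eastwards.
    """
    col = math.floor((x_m - origin_x_m) / pixel_size)
    row = math.floor((origin_y_m - y_m) / pixel_size)
    return row, col


origin = (500_000.0, 5_800_000.0)  # hypothetical tile origin
print(pixel_index(500_025.0, 5_799_985.0, *origin))  # (1, 2)
# A 5 m georeferencing error eastwards lands in the neighbouring pixel:
print(pixel_index(500_030.0, 5_799_985.0, *origin))  # (1, 3)
```

The second call shows why georeferencing error matters for a pixel-wise model: a few metres of shift can change which cell, and therefore which embedding, a point is matched to.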

sadiq · 2025-09-25T21:29:59 1758835799

It might work. TESSERA's embeddings are at a 10 metre resolution, so it depends on the size of the features you are looking for. If those features have distinct changes in colour or texture over time, or they scatter radar differently from their surroundings, then you should be able to discriminate them.

The easiest way to test is to try out the interactive notebook and drop some labels in known areas.

throwup238 · 2025-09-25T23:53:03 1758844383

Is there a way to cluster the embeddings spatially or look for patterns isolated to some dimensions? (Again, way out of my wheelhouse.)

What I mean is that a vein is usually a few meters wide but can be hundreds of meters long, so ten meter resolution is probably not very helpful unless the embeddings can encode some sort of pattern that stretches across many cells.
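To put rough numbers on this, take a hypothetical vein 3 m wide and 300 m long (illustrative values, not from a real survey) and assume it runs parallel to the 10 m grid:

```python
PIXEL_SIZE_M = 10.0


def footprint(width_m, length_m, pixel_size=PIXEL_SIZE_M):
    """Rough pixel footprint of a long, thin feature aligned with the grid."""
    cells_along = length_m / pixel_size            # pixels it spans lengthwise
    fill_across = min(width_m / pixel_size, 1.0)   # best-case fraction of a pixel it fills
    return cells_along, fill_across


print(footprint(3, 300))  # (30.0, 0.3)
```

So the vein would cross about 30 pixels but fill at most about 30% of any one of them, which is why the signal in a single pixel may be weak and a pattern across neighbouring cells matters.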

sadiq · 2025-09-26T08:30:19 1758875419

It's possible to use embeddings as input to a convolutional network and then train that using labels. We've done that for at least one of the downstream tasks in the TESSERA paper (https://arxiv.org/abs/2506.20380): estimating canopy height.

The downside of that approach is that you need to spend valuable labels on learning the spatial feature extraction during training. To address that, we're building some pre-trained spatial feature extractors that should only need minimal fine-tuning.
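For intuition, the core operation such a network applies is a convolution over the grid of per-pixel embeddings, so each output depends on a neighbourhood rather than a single pixel. Below is a single, untrained filter in pure Python; it is a toy illustration, not the architecture used in the TESSERA paper. In a real network, the kernel weights are what the labels get spent on learning.

```python
def conv2d_embeddings(grid, kernel):
    """'Valid' 2-D convolution (cross-correlation, as in most CNN libraries).

    grid:   H x W list of D-dimensional embedding vectors
    kernel: K x K list of D-dimensional weight vectors (learned in a real CNN)
    Returns an (H-K+1) x (W-K+1) map of scalar responses.
    """
    h, w, k = len(grid), len(grid[0]), len(kernel)
    out = []
    for i in range(h - k + 1):
        out_row = []
        for j in range(w - k + 1):
            total = 0.0
            for di in range(k):
                for dj in range(k):
                    emb = grid[i + di][j + dj]
                    weights = kernel[di][dj]
                    total += sum(e * wt for e, wt in zip(emb, weights))
            out_row.append(total)
        out.append(out_row)
    return out


# 3 x 3 grid of 1-D "embeddings" and a 2 x 2 all-ones kernel.
grid = [[[1.0], [2.0], [3.0]],
        [[4.0], [5.0], [6.0]],
        [[7.0], [8.0], [9.0]]]
kernel = [[[1.0], [1.0]],
          [[1.0], [1.0]]]
print(conv2d_embeddings(grid, kernel))  # [[12.0, 16.0], [24.0, 28.0]]
```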

sadiq · 2025-09-25T21:09:05 1758834545

If you have some GPS locations of truffles, you could use the notebook Anil mentioned here https://news.ycombinator.com/item?id=45378855 and give it a go.

There is the issue of just how visible truffles are from space though, if they grow under cover. That said, it may still work because you can find habitats that are very likely to have truffles. We've had some promising results looking at fungal biomass.

sadiq · 2025-09-25T21:05:31 1758834331

That's actually a great idea! I wonder what kind of feature size would be needed, though: TESSERA's embeddings are at a 10 metre resolution, so for larger structures you might need some kind of spatial aggregation.
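One simple form of spatial aggregation is averaging embeddings over blocks of neighbouring pixels, e.g. 5 x 5 pixels to get 50 metre cells. This is a hypothetical sketch of that idea in pure Python, not a feature of TESSERA or geotessera:

```python
def mean_pool(grid, block):
    """Average D-dimensional embeddings over non-overlapping block x block windows.

    Trailing rows/columns that don't fill a whole block are dropped.
    """
    h, w = len(grid), len(grid[0])
    pooled = []
    for i in range(0, h - block + 1, block):
        pooled_row = []
        for j in range(0, w - block + 1, block):
            cells = [grid[i + di][j + dj] for di in range(block) for dj in range(block)]
            dim = len(cells[0])
            pooled_row.append([sum(c[d] for c in cells) / len(cells) for d in range(dim)])
        pooled.append(pooled_row)
    return pooled


grid = [[[0.0, 0.0], [2.0, 2.0]],
        [[4.0, 4.0], [6.0, 6.0]]]
print(mean_pool(grid, 2))  # [[[3.0, 3.0]]]
```

Averaging is crude: it blurs fine detail and treats every neighbour equally, whereas learned spatial feature extractors can pick up shapes and textures.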

sadiq · 2025-09-25T20:46:31 1758833191

Hyperspectral data is really neat, though it's worth pointing out that TESSERA is only trained on multispectral optical and SAR data.

You're quite right about the temporal aspect, though; that's what makes the representation so powerful. Crops grow and change colour or radar scattering patterns in distinct ways.

It's worth pointing out that the model and training code are under an Apache 2.0 license and the global embeddings are under a CC-BY-A. We have a Python library that makes working with them pretty easy: https://github.com/ucam-eo/geotessera

sadiq · 2025-09-25T20:40:16 1758832816

Yes! TESSERA is very new so we're still exploring how well it works for various things.

We're hoping to try it with a few different things for our next field trip, maybe some that are much harder to find than brambles.

sadiq · 2025-09-25T20:38:04 1758832684

Hi! You can find a bit more about Gabriel's model through some of his posts over the last few weeks: https://gabrielmahler.org/posts/

When it comes to the satellite images, the model actually used TESSERA (https://arxiv.org/abs/2506.20380), a model we trained to produce embeddings for every point on Earth that encode its temporal-spectral properties over a year.

Think of it as compressing potentially fifty or a hundred observations of a particular point on Earth down to a single 128-dimensional vector.
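To make the shapes concrete, here is a toy encoder that maps any number of per-date observations for one point to a fixed 128-dimensional vector: it applies a random linear projection to each observation, then averages over time. This is only a stand-in for illustration; TESSERA's real encoder is a trained network, and the band count of 12 used below is a hypothetical value.

```python
import random

EMBED_DIM = 128


def make_toy_encoder(n_bands, dim=EMBED_DIM, seed=0):
    """Toy encoder: fixed random projection per observation, then mean over time."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(n_bands)] for _ in range(dim)]

    def encode(observations):
        # observations: list of T per-date band vectors; T can differ between points
        if not observations:
            raise ValueError("need at least one observation")
        projected = [[sum(w * x for w, x in zip(row, obs)) for row in weights]
                     for obs in observations]
        return [sum(p[d] for p in projected) / len(projected) for d in range(dim)]

    return encode


encode = make_toy_encoder(n_bands=12)     # hypothetical band count
fifty = encode([[0.1] * 12] * 50)         # 50 observations of one point
hundred = encode([[0.2] * 12] * 100)      # 100 observations of another
print(len(fifty), len(hundred))           # 128 128
```

Whatever the number of observations, the output always has the same length, which is what lets every point be stored and compared as a single vector.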

Happy to answer any other questions.