We run on the same instance types that the larger PlanetScale Metal sizes offer as whole instances. For Intel, that's r6id, i4i, i7i, i3en, and i7ie. For ARM, that's r8gd, i8g, and i8ge. (Right now, at least. AWS is always cookin' up new instance types.) The same story will soon be true for GCP.
If you just make a code change, you don't need to handle translations at the same time; that will be done by the various translation teams closer to the release. However, you do need to make sure that the code is translatable (e.g. injecting pre-formulated English messages into a larger message is problematic).
Basically everyone with even a CDN endpoint in the US falls under the CLOUD Act: Hetzner, OVH, etc. Scaleway may be the only one for which I couldn't find any mention of a US PoP.
I am curious how you prevent private data from leaking into the auto-generated public docs. I imagine this problem doesn't exist for open source projects, but it becomes an issue when not everything discussed in a company's private messenger should be used as context for generating docs.
Absolutely. There are steps in Promptless's agent flow that are designed to prevent this, but this is why users still review Promptless's suggestions to guides before committing/publishing them. I think people will still want to review Promptless's suggestions for a while, but the granularity of oversight will probably decrease as trust increases.
You can generate CLIP embeddings locally on the DB server via:
    SELECT abstract,
           introduction,
           figure1,
           clip_text(abstract)     AS abstract_ai,
           clip_text(introduction) AS introduction_ai,
           clip_image(figure1)     AS figure1_ai
    INTO papers_augmented
    FROM papers;
Then you can run a similarity search against those embeddings via:
    SELECT abstract, introduction
    FROM papers_augmented
    ORDER BY clip_text(query) <=> abstract_ai
    LIMIT 10;
This approach significantly decreases search latency and results in cleaner code.
As an added bonus, EXPLAIN ANALYZE can now show the percentage of time spent in embedding generation vs. search.
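For example, running it over the search query above (with an illustrative literal in place of the query placeholder):

    EXPLAIN ANALYZE
    SELECT abstract, introduction
    FROM papers_augmented
    ORDER BY clip_text('contrastive pretraining for vision') <=> abstract_ai
    LIMIT 10;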
The linked library enables embedding generation for a dozen open-source models and proprietary APIs (list here: <https://lantern.dev/docs/develop/generate>), and adding new ones is really easy.
I have tried CLIP on my personal photo album collection and it worked really well there: I could write detailed scene descriptions of past road trips, and the photos I had in mind would pop up. The model is probably better suited to everyday photos than to icons, though.
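That kind of text-to-image search is just the same pattern as above, since CLIP embeds text and images into a shared space. A sketch with a made-up photos table, where photo_ai would be a clip_image(...) embedding column:

    SELECT path
    FROM photos
    ORDER BY clip_text('sunset over a canyon on a desert road trip') <=> photo_ai
    LIMIT 10;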
Not sure what this library's approach is, but can't you generate a nonce from a larger alphabet, hash the column values with the nonce (`hash(nonce || column)`), and crypto-shred the nonce at the end?
Then, during hashing, you only need constant, immutable state, which effectively expands the hash space without incurring the mutable-state overhead of a replacement-strings strategy.
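A minimal sketch of that idea in Postgres, assuming the pgcrypto extension (the table, column, and nonce value are made up):

    CREATE EXTENSION IF NOT EXISTS pgcrypto;

    -- Pseudonymize a column with one secret nonce; destroying the nonce
    -- afterwards ("crypto-shredding") leaves hashes that can't be
    -- brute-forced from a small input space.
    UPDATE users
    SET email = encode(digest('long-random-nonce' || email, 'sha256'), 'hex');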
I'd be curious to know what the underlying AWS EC2 instance type is.
Is each DB on a dedicated instance?
If not, are there per-customer IOPS bounds?