
Hetzner, Postgres, Rust, SvelteKit

What did you think?


Thank you! I got the idea on December 3 and first released it on December 19.

The scale is there. I'm scraping, cleaning, and token-optimizing dozens of sources every single hour. The lack of money for embedding everything was a temporary problem.

In the direction of "empowering the public with new capabilities they didn't have before", Scry offers, with the copy-paste of a prompt and a conversation with an agent:

1) Full read-only SQL + vector manipulation in a live public database. Most vector DB products expose a much narrower search API, and basically only a few enterprise-level services let you run arbitrary SQL on remote machines. Google BigQuery gives users SQL power, but it mostly doesn't have embeddings, doesn't connect public corpora, doesn't have indexes as good, and doesn't support an agentic research experience. Beyond object-level research, Scry is a good tool for exploring and acquiring intuitions about embedding-space. (There's a sketch of what such a query can look like after this list.)

2) An agent-native text-to-SQL + lexical + semantic deep-research workflow. We have a prompt that's been heavily optimized for taking full advantage of our machine and Claude Code for exploration and answering nuanced questions. Claude fires off many exploratory queries and builds toward really big queries that lean on the SQL query planner. You can interrupt at any time. The compute limits allow lots of exhaustive exploration--often more epistemically powerful than finding a document is being confident that one doesn't exist.

3) Dozens of public commons in one database, with embeddings.
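
To give a flavor of (1), here's a minimal sketch of the kind of read-only query you can run, assuming a pgvector-style table along the lines of chunks(source, url, body, embedding). The table and column names are illustrative, not Scry's actual schema:

  -- Reuse an existing chunk's embedding as the query vector,
  -- then rank everything by cosine distance to it.
  WITH query_vec AS (
    SELECT embedding
    FROM chunks
    WHERE body ILIKE '%instrumental convergence%'
    LIMIT 1
  )
  SELECT c.source, c.url, left(c.body, 200) AS snippet,
         c.embedding <=> (SELECT embedding FROM query_vec) AS cosine_distance
  FROM chunks c
  ORDER BY c.embedding <=> (SELECT embedding FROM query_vec)
  LIMIT 10;

In practice you'd usually get the query vector from the embed endpoint rather than reusing a stored one.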


Thank you! I'll be adding millions more quality, embedded documents; it'll be here, just getting more useful.


Thank you!


You submit a SQL query to run periodically; we run it and store the results. As we ingest more documents (dozens of sources are being ingested every day), we run it again. If the output differs, you get an email.
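
As a hypothetical example of a standing query (table and column names are just for illustration), the stored result below changes whenever a new matching document lands, which is what triggers the email:

  SELECT count(*) AS n_matches,
         max(published_at) AS latest_match
  FROM documents
  WHERE source = 'arxiv'
    AND title ILIKE '%sparse autoencoder%';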


Exactly, people want precision and control sometimes. Also, it's very hard to beat the SQL query planner when you have lots of materialized views and indexes. For most use cases, this is a lot more powerful for exploring these documents than having them all as JSON on your local machine and writing whatever Python you wanted.
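
Rough illustration of why the planner wins (made-up names, not Scry's actual setup): a materialized view plus an index turns a repeated aggregate scan into a cheap lookup.

  -- Precompute an aggregate once, index it, refresh on a schedule.
  CREATE MATERIALIZED VIEW monthly_source_counts AS
  SELECT source,
         date_trunc('month', published_at) AS month,
         count(*) AS n_docs
  FROM documents
  GROUP BY source, date_trunc('month', published_at);

  CREATE INDEX ON monthly_source_counts (source, month);

  -- Readers now hit the precomputed rows instead of rescanning documents.
  REFRESH MATERIALIZED VIEW monthly_source_counts;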

Yeah, I've put a lot of care into rate-limiting and security. We do AST parsing and block certain joins, and Hacker News has not bricked or overloaded my machine yet--there's actually a lot more bandwidth for people to run expensive queries.
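
(For what it's worth, a common complement to application-level AST checks is locking things down on the database side too -- this is a generic sketch of that pattern, not a description of Scry's actual setup:)

  -- Read-only role with hard per-statement limits.
  CREATE ROLE public_reader NOLOGIN;
  GRANT USAGE ON SCHEMA public TO public_reader;
  GRANT SELECT ON ALL TABLES IN SCHEMA public TO public_reader;
  ALTER ROLE public_reader SET statement_timeout = '10s';
  ALTER ROLE public_reader SET default_transaction_read_only = on;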

As for getting good semantic queries for different domains, one thing Claude can do, besides using our embed endpoint to embed arbitrary text as a search vector, is use compositions of centroids (averages) of vectors in our database as search vectors. For example, it can effortlessly average every LessWrong chunk embedding over text mentioning "optimization" and search with that. You can actually ask Claude to run an experiment averaging the "optimization" vectors from different sources, and see what kinds of different results you get when using them on different sources. Then the fun challenge would be figuring out legible vectors that bridge the gap between the different platforms' vectors. Maybe there's half the cosine distance when you average the LessWrong "optimization" vector with embed("convex/nonconvex optimization, SGD, loss landscapes, constrained optimization").
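
Here's roughly what that centroid trick looks like in SQL, assuming a pgvector column and illustrative table names (not Scry's actual schema):

  -- Average the embeddings of LessWrong chunks mentioning "optimization",
  -- then use that centroid as the search vector against another source.
  WITH centroid AS (
    SELECT avg(embedding) AS v
    FROM chunks
    WHERE source = 'lesswrong'
      AND body ILIKE '%optimization%'
  )
  SELECT c.source, left(c.body, 200) AS snippet,
         c.embedding <=> (SELECT v FROM centroid) AS cosine_distance
  FROM chunks c
  WHERE c.source = 'hackernews'
  ORDER BY c.embedding <=> (SELECT v FROM centroid)
  LIMIT 10;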


If performance becomes a problem, statically hosting SQLite DBs with client-side queries and HTTP range requests is an interesting approach:

https://github.com/phiresky/sql.js-httpvfs


Thanks, that's very interesting.

That's a neat thought. What's the granularity of the text getting embedded? I assume that makes a large difference in what the average vector ends up representing?

~300-token chunks right now. We have other exciting embedding strategies in the works.
