I was thinking that writing a DB would be a good project to start getting comfortable with a language I've been learning. However I have no idea about design, strategies or best practices for creating a DB.
What kind of database? An embedded key/value database like TokyoCabinet? An embedded SQL database like SQLite? An in memory data-structure database like Redis? A server SQL database like MySQL? A distributed key/value database like Cassandra? A json document database like CouchDB? I could go on...
In general, if you want to learn a new language, pick a project of limited scope (weeks) that you have done in the past in some other language.
A database is a bad choice. Even the simplest database worth being called a database requires months of effort.
To be precise, I want to code a key/value store exclusively oriented at geolocated queries by key. It doesn't look like a very complex project or am I underestimating it?
I noticed you want to do geolocation. Let's put that aside for now, just looking at key/value.
1. You need to store all your data in "pages". These are fixed-length blocks (that can be chained together).
2. Use a classical "dictionary" data structure to store your index (most use B-Tree), but you place that structure on-disk instead of in-memory. Your index basically points a key to the index of the first page that contains the value.
3. You may need to store other types of data in the file, such as the free page list.
As for geo-location, B-Tree fits nicely here.
1. Intelligently lay out your B-Tree so that nearby nodes are typically within the same node-family-system. Include information about the distance spanned by descendants in the tree (you probably want to use Manhatten distance).
2. When trying to do "near" queries, first locate the node you are interested in and then start traversing up the parents until you find the ancestor that encompasses the distance you are looking for.
3. Grab all the descendants of that ancestor and re-check them against the distance.
This is one of those questions where if you ask it the answer is 'you shouldn't'.
If you want to get comfortable with a language that you are learning my advice would be to pick a simple real-world problem and to try to solve that rather than to tackle a major project like this. That will only cause frustration in the long run.
I think if you really wanted to get comfortable with something that could have you make a performant, database quickly, you might want to start with something like C...
Now, C is a shotgun that you will invariably use to shoot yourself in the foot with, but I think it's absolutely straightforward and powerful manual memory management, as well as raw speed, would make it worth learning.
Like, you could for example just focus on building a database of a given size, in memory -- you could even do it all in RAM, if you wanted.
You could have the database specialize in holding one kind of data structure (for example, how about tries? they're pretty rare, despite how useful they are)
Thanks for the constructive feedback. In fact, I wanted to focus just in latitude-longitude pairs to perform geolocated queries, this is my single goal. I'm trying to get rid of MongoDB for my backend and this is one of the last barriers...
So you were trying to get rid of mongodb as a backend, and you ended up at writing your own? Well I'm not sure how set you are on that, but maybe you should look into some other databases, what did you dislike about mongo?
And yeah, with such a specific goal, you could very well make a very nice niche DB that optimizes for the kind of data you're trying to search! Especially with map data becoming more and more important, I think it's actually a pretty awesome idea.
However, I would say that you should go in with the intent to make a kickass latitude-longitude database, rather than just having it be a piece of something you're trying to push to production (you have better things to worry about with something that's going to production -- don't make the db one of them)
You never know, you might find that the database is the more important project!
If you meant database engine, then you should contribute to open source database projects like PostgreSQL or MariaDB. This assumes that you're comfortable with populating and maintaining databases.
Start with downloading the source code for say MariaDB, reading internals documentation on project Wiki, compiling it on your machine.
I, for one, think you should do it. But you've got to realize, up front, this will be thrown away, will only work for 1 use case, will be slow, etc. Don't build anything general, or anything reusable. Write in Go. Put everything in one big textfile. Start with O(slow) code. Use a simple 'schema', like tsv/csv/json. It will work.
Production-ready, simple, written by you: pick 2. The 'correct' thing to do is choose the first two. Have at it.
One feature at a time, the same as any other program.
Start with something very simple - maybe a single table with a hardcoded schema, or a simple string key/value store (a la memcached). Find a use case for which this is an improvement over what you had before. Add features as and when you need them.
if you have a different angle on querying your data that the current solutions don't solve then you should definitely give it a shot.
You should try to limit what you build though. Can you utilize leveldb, Riak, Cassandra, Redis or even Postgres Foreign Data Wrapper.
Your data will probably need to be queried with other user data so consider that as well.
I would start with figuring out how to query the data you need that different from the current measures and prove to yourself that it beats out the existing strategies. If it does then expand by storing and loading the data to disk.
then work on connecting to it through multiple clients, look into using msgpack, thrift, message pack.
I really don't see how you could suggest to learn assembly in this context. I'm not aware of any production database for commodity hardware written in assembler. (in the embedded world there are a few but that's specialty stuff).
In general, if you want to learn a new language, pick a project of limited scope (weeks) that you have done in the past in some other language.
A database is a bad choice. Even the simplest database worth being called a database requires months of effort.