
If you can limit your search to GBs of logs, I kind of agree with you. It's OK if a log search request takes 2s instead of 100ms, and the "grep" approach is more flexible.

Our users usually search through more than 1TB.

Let's imagine you have to search through 10TB (even after time/tag pruning). Distributing that over 10k cores for 2 seconds is not practical and does not always make economic sense.
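To put a rough number on that, here is a back-of-the-envelope sketch of what each core would have to sustain. It assumes decimal units (1TB = 10^12 bytes) and a perfectly even split with no coordination overhead; the function name is made up for illustration.

```python
# Rough sketch only. Assumes decimal units and a perfectly even
# split of the data across cores, with no network or scheduling overhead.
TB = 10**12
MB = 10**6


def per_core_throughput(total_bytes: int, cores: int, seconds: float) -> float:
    """Bytes per second each core must scan to finish within the deadline."""
    return total_bytes / (cores * seconds)


rate = per_core_throughput(10 * TB, cores=10_000, seconds=2)
print(f"{rate / MB:.0f} MB/s per core")  # prints "500 MB/s per core"
```

So every one of the 10k cores would need to read and match at about 500MB/s for the full 2 seconds, before counting any startup or fan-out cost.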



The question is why someone would need to search through TBs of data.

Unless you are Google Cloud, with workers standing by to stream all the data across x workers in parallel, I would enforce useful limits, and for broad searches I would add a background system.

Start your query, come back later or get streaming results.

On the other hand, if not too many people are constantly searching in parallel and you go with data pods like Backblaze, just add a little more CPU and memory and use the CPUs of the data pods for parallelization. It should still be much cheaper than putting it on S3 / cloud.


I guess I was a little too prescriptive with "a couple seconds". What I really meant was that a timescale of seconds to minutes is fine; five minutes is probably too long.

> Let's imagine you have to search into 10TB (even after time/tag pruning).

I'd love to know more about this. How frequently do users need to scan 10TB of data? Assuming it's all on one machine, on a disk that supports a conservative 250MB/s sequential throughput (and your grep can also run at 250MB/s), that's about 11 hours, so you could get it down to about 4 minutes on a cluster with 150 disks.
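For anyone who wants to check the arithmetic, a minimal sketch follows. It assumes decimal units (1TB = 10^12 bytes, 1MB = 10^6 bytes), that grep keeps up with the disk, and that the work splits perfectly across disks; the helper name is just for illustration.

```python
# Back-of-the-envelope scan time. Assumes decimal units, a grep that is
# not the bottleneck, and a perfectly even split across disks.
TB = 10**12
MB = 10**6


def scan_seconds(total_bytes: int, bytes_per_second_per_disk: int, disks: int = 1) -> float:
    """Seconds needed to read total_bytes sequentially across the given disks."""
    return total_bytes / (bytes_per_second_per_disk * disks)


single = scan_seconds(10 * TB, 250 * MB)              # one disk
cluster = scan_seconds(10 * TB, 250 * MB, disks=150)  # 150 disks in parallel

print(f"one disk:  {single / 3600:.1f} h")     # prints "one disk:  11.1 h"
print(f"150 disks: {cluster / 60:.1f} min")    # prints "150 disks: 4.4 min"
```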

But I still have trouble believing they actually need to scan 10TB each time. I guess a real world example would help.

EDIT: To be clear, I really like quickwit, and what they've done here is really technically impressive! I don't mean to disparage this effort on its technical merits, I just have trouble understanding where the impulse to index everything comes from when applied specifically to the problem of logging and logs analysis. It seems like a poor fit.



