I guess I was a little too prescriptive with "a couple seconds". What I really meant was that a timescale of seconds to minutes is fine; five minutes is probably too long.
> Let's imagine you have to search into 10TB (even after time/tag pruning).
I'd love to know more about this. How frequently do users need to scan 10TB of data? Assuming it's all on one machine on a disk that supports a conservative 250MB/s sequential throughput (and your grep can also run at 250MB/s), that's about 11 hours, so you could get it down to roughly 4 minutes on a cluster with 150 disks.
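Just to show my work, here's the back-of-envelope math as a small sketch (the 250MB/s figure and the 150-disk cluster are my own assumptions, not numbers from quickwit):

```python
# Back-of-envelope: time to brute-force scan a dataset at a given per-disk throughput.
# All numbers here are illustrative assumptions, not measurements.

def scan_time_seconds(total_bytes: float, per_disk_bytes_per_sec: float, num_disks: int = 1) -> float:
    """Time to sequentially read total_bytes spread evenly across num_disks."""
    return total_bytes / (per_disk_bytes_per_sec * num_disks)

TB = 10**12
MB = 10**6

single_disk = scan_time_seconds(10 * TB, 250 * MB)            # ~40,000 s
cluster = scan_time_seconds(10 * TB, 250 * MB, num_disks=150)  # ~267 s

print(f"single disk: {single_disk / 3600:.1f} h")   # ~11.1 h
print(f"150 disks:   {cluster / 60:.1f} min")       # ~4.4 min
```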
But I still have trouble believing they actually need to scan 10TB each time. I guess a real world example would help.
EDIT: To be clear, I really like quickwit, and what they've done here is really technically impressive! I don't mean to disparage this effort on its technical merits, I just have trouble understanding where the impulse to index everything comes from when applied specifically to the problem of logging and logs analysis. It seems like a poor fit.