You can get a spinning disk of 18TB (no need for SSD if you can parallel write) for 224€. Let's round that up to $300 for easy calculations.
To store 100 petabytes of data by purchasing disks yourself, you would need approximately 5556 18TB hard drives totaling $1,666,800.
Of course, you'll pay for more than just the disks.
Add 93 enclosures at $3,000 each ($279,000), controllers and network equipment ($100,000), and power and cooling infrastructure ($50,000, although it's probably already cool where they will host the thing), and the total comes to about $2.1M.
That's the total, and it's for uncompressed data.
You would need 3 times that for redundancy (about $6.3M), but it would still be 40% cheaper over 5 years, and that's using retail prices. With their purchasing power they can get a big discount.
Now, you do have to pay for a team to maintain the whole thing, but if they go that route they likely have their own data center anyway.
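The arithmetic above can be reproduced as a quick sketch. The prices are the ones quoted in this comment; the 60-drive enclosure size is an assumption, picked because 5556 drives across 93 enclosures works out to roughly 60 per enclosure.

```python
import math

# Figures quoted in the comment; DRIVES_PER_ENCLOSURE is a hypothetical
# value inferred from 5556 drives / 93 enclosures (~60 per enclosure).
TB_PER_PB = 1000
CAPACITY_PB = 100
DRIVE_TB = 18
DRIVE_PRICE = 300           # USD, rounded up from 224 EUR retail
DRIVES_PER_ENCLOSURE = 60   # assumption, not stated in the comment
ENCLOSURE_PRICE = 3_000     # USD each
NETWORK = 100_000           # controllers and network equipment
POWER_COOLING = 50_000      # power and cooling infrastructure


def bill_of_materials():
    """Return drive count, enclosure count, and the one-copy capex in USD."""
    drives = math.ceil(CAPACITY_PB * TB_PER_PB / DRIVE_TB)
    enclosures = math.ceil(drives / DRIVES_PER_ENCLOSURE)
    disk_cost = drives * DRIVE_PRICE
    enclosure_cost = enclosures * ENCLOSURE_PRICE
    total = disk_cost + enclosure_cost + NETWORK + POWER_COOLING
    return drives, enclosures, disk_cost, enclosure_cost, total


drives, enclosures, disk_cost, enclosure_cost, total = bill_of_materials()
print(f"{drives} drives (${disk_cost:,}), {enclosures} enclosures "
      f"(${enclosure_cost:,}), total ${total:,}")
```

The exact total is $2,095,800, which is where the "about $2.1M" comes from.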
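A sketch of the redundancy and amortization step, assuming straight-line amortization over 5 years and taking the $2,095,800 single-copy figure as input. It covers hardware capex only; staff, electricity, and hosting fees are left out, and the baseline behind the "40% cheaper" claim isn't given here, so it isn't computed.

```python
# Single-copy hardware estimate from the bill of materials above (USD).
SINGLE_COPY_USD = 2_095_800


def amortized_capex(single_copy_usd, copies=3, years=5):
    """Capex for `copies` full replicas, spread evenly (straight-line) over `years`."""
    capex = single_copy_usd * copies
    return capex, capex / years


capex, per_year = amortized_capex(SINGLE_COPY_USD)
print(f"3 copies: ${capex:,} total, ${per_year:,.0f} per year over 5 years")
```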
> disk of 18TB (no need for SSD if you can parallel write)
Note that you can put maybe 1TB of hot/warm data, at most, on this 18TB drive.
Imagine you run a query and 100GB of the data to be searched sits on a single HDD. You will wait 500-1000s just for this one drive. Now add a bit of concurrency, say 3 to 5 queries searching this HDD at once.
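The 500-1000s figure follows from dividing the data by the drive's sequential throughput. This sketch assumes 100-200 MB/s (a typical range for a spinning disk, and the one that reproduces the numbers above) and that concurrent queries split the bandwidth evenly. Interleaved queries also force extra seeks, which this ignores, so real numbers would be worse.

```python
def scan_seconds(data_gb, mb_per_s, concurrent_queries=1):
    """Seconds to read data_gb from one HDD when its sequential bandwidth is
    shared evenly by `concurrent_queries` (seek overhead ignored)."""
    return data_gb * 1000 / mb_per_s * concurrent_queries


# Assumed sequential throughput range for a single spinning disk.
for mb_per_s in (200, 100):
    for queries in (1, 3, 5):
        t = scan_seconds(100, mb_per_s, queries)
        print(f"{mb_per_s} MB/s, {queries} queries: {t:.0f}s")
```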
You can't fill these drives full with hot or warm data.
> To store 100 petabytes of data by purchasing disks yourself, you would need approximately 5556 18TB hard drives totaling $1,666,800.
You want 1000x more drives, each filled to only 1/1000 of its capacity. Now you can do a parallel read!
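To see why spreading the data out helps, here is a sketch of the ideal parallel scan time when the same 100GB is striped evenly across N drives read at once. The 150 MB/s per-drive throughput is an assumed midpoint, and the model assumes network, CPU, and uneven data placement never become the bottleneck.

```python
def parallel_scan_seconds(data_gb, drives, mb_per_s=150):
    """Ideal seconds to read data_gb striped evenly over `drives` HDDs in parallel.
    mb_per_s is an assumed per-drive sequential throughput."""
    return data_gb * 1000 / (drives * mb_per_s)


for n in (1, 10, 100, 1000):
    print(f"{n:>4} drives: {parallel_scan_seconds(100, n):.2f}s")
```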
For this purpose you would likely not buy ordinary consumer disks but rather bulletproof enterprise HDDs. Otherwise a significant number of the 5556 disks would not survive the first year, assuming they are under constant load.
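How many replacements to expect depends on the annualized failure rate (AFR). The two AFR values below are hypothetical inputs chosen only for illustration, not measurements for any drive model; the sketch just multiplies them out over 5556 drives, assuming each drive fails with probability AFR within a year.

```python
DRIVES = 5556

# Hypothetical AFRs for illustration only; not measured figures.
HYPOTHETICAL_AFR = {
    "enterprise (assumed 1%)": 0.01,
    "consumer under constant load (assumed 5%)": 0.05,
}


def expected_failures(drives, afr):
    """Expected drive failures per year if each drive fails with probability afr."""
    return drives * afr


for label, afr in HYPOTHETICAL_AFR.items():
    per_year = expected_failures(DRIVES, afr)
    print(f"{label}: ~{per_year:.0f} failures/year, ~{per_year / 52:.1f}/week")
```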
Quickwit's big advantage is that you can point it at anything that speaks S3 and it will be happy. So ideally you delegate the whole storage story: hire someone who knows their way around Ceph (erasure coding, load distribution) and call a few DC/colo/hosting providers (for the initial setup and the regular HW replacements).
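Erasure coding is also what makes the "3 times that for redundancy" figure negotiable. In Ceph terms a pool splits each object into k data chunks plus m coding chunks, so raw capacity is usable × (k + m) / k; 3-way replication is the special case k=1, m=2. The 8+3 and 4+2 profiles below are example choices, not a recommendation, and erasure coding trades the saved capacity for more CPU and network work during writes and recovery.

```python
import math


def raw_capacity_pb(usable_pb, k, m):
    """Raw capacity for a k+m scheme: each object stored as k data + m coding chunks."""
    return usable_pb * (k + m) / k


# Example profiles (assumed); replication 3x is k=1, m=2 in this formula.
for name, k, m in (("3x replication", 1, 2), ("EC 4+2", 4, 2), ("EC 8+3", 8, 3)):
    raw = raw_capacity_pb(100, k, m)
    drives = math.ceil(raw * 1000 / 18)  # 18TB drives
    print(f"{name}: {raw:g} PB raw, {drives} drives, survives {m} lost chunks")
```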
The cost per year is much higher; that's using a 5-year amortization.