Sure. First and foremost, do you have permission from the customers whose data you're researching and reporting on here? If you do, great, ignore me. If not, you'd be breaching (my) trust if I were one of them. The data is not yours, and someone who wanted to could possibly infer who these datapoints belong to. Anyone who could do that might gain a competitive advantage or otherwise exploit knowledge of the infrastructure (social engineering, for example).
There is a big difference, IMO, from someone like Backblaze releasing statistics. They own all of the hardware and they choose to release the data themselves. You (on the surface) appear to be harvesting data from your customers, digging through it, and presenting it. You also point out very specific cases, rather than presenting aggregate, pseudonymous data.
You are collecting sensitive data from your customers' environments. This doesn't inspire confidence that you treat it as such.
Sure... so, showing "here is a weird disk pattern -- they were running X on top of it -- consider not running X on SSD" with a sample size of 1 is a logical fallacy and kinda a bizarre post.
For small sample sets, going deep to understand unnecessary writes, tuning the clients, and showing less SSD wear after tuning would be interesting. Or, assuming you have more than one client in each of these situations, aggregating the data to show patterns would be far more useful. As has been mentioned elsewhere, for inspiration, Backblaze has really nice posts analyzing their device wear.
But while we do have a lot of clients, I really think that all of their setups are unique in some way. So, starting from that, we didn't find any other Redis instances on Ceph dumping that much.
> tuning the clients and showing less SSD wear after tuning would be interesting.
That is just not an option for us, as we only do monitoring and don't have any means of tuning.
What potential problems do you see?
Thanks