tru_pablo's comments

tru_pablo · on Aug 27, 2018

Could you please elaborate on the point? How can we improve?

What potential problems do you see?

Thanks

pheleven · on Aug 28, 2018

Sure. First and foremost, do you have permission from the customers you're researching and reporting on here? If you do, great, ignore me. If not, you'd be breaching (my) trust if I were one of them. The data is not yours, and it may be possible to infer who these data points belong to if so desired. Anyone who could do that might gain a competitive advantage or otherwise exploit knowledge of the infrastructure (through social engineering, for example).

There is a big difference, IMO, when someone like Backblaze releases statistics. They own all of the hardware and they choose to release the data themselves. You (on the surface) appear to be harvesting data from your customers, digging through it, and presenting it. You also point out very specific cases, rather than aggregate pseudonymous data.

You are collecting sensitive data from your customers' environments. This doesn't inspire confidence that you treat it as such.

tru_pablo · on Aug 27, 2018

As an author, I would greatly appreciate any suggestions on what kind of stats you would want us to gather and share.

labarna · on Aug 27, 2018

Just FYI, "to wear (out)" has an irregular past tense, and past participle forms: "wore" and "worn". So:

1) These disks, because of constant throughput, wore out.

2) These disks, because of constant throughput, are worn out.

tru_pablo · on Aug 27, 2018

ploxiln · on Aug 27, 2018

don't just trust the drive's SMART attributes - try to independently confirm when and how the drive is failing

techreport did a long-term experiment on this (but with limited samples): https://techreport.com/review/27062/the-ssd-endurance-experi...

tru_pablo · on Aug 27, 2018

I'm the author. We wanted to showcase some scenarios and that's it.

I would love to hear any suggestions of what else we can gather and report.

brokentone · on Aug 28, 2018

Sure... so, showing "here is a weird disk pattern -- they were running X on top of it -- consider not running X on SSD" with a sample set of 1 is a logical fallacy and kinda a bizarre post.

For small sample sets, going deep to understand unnecessary writes, tuning the clients, and showing less SSD wear after tuning would be interesting. Or, assuming you have more than one client in each of these situations, aggregating the data to show patterns would be far more useful. As has been mentioned elsewhere, for inspiration, Backblaze has really nice posts analyzing their device wear.

tru_pablo · on Aug 28, 2018

Thanks. That makes sense.

But while we do have a lot of clients, I really think that all of their setups are unique in some way. Because of that, we didn't find any other Redis instances on Ceph dumping that much.

> tuning the clients and showing less SSD wear after tuning would be interesting.

That is just not an option for us, as we only do monitoring and don't have any way of tuning.