Hello! I'd be interested to hear what you think the correct interpretation of these CIs is in this case. Failing that, can you explain what is wrong with saying something like "with xx% confidence we can conclude that the rate is within these bounds"?
The assumption of using Poisson seems pretty solid to me, given we are talking about x events in some continuum (miles traveled in this case), but always happy to hear any cogent objections.
The Poisson distribution assumes a constant event rate. That seems to me to be an oversimplification, given that AV performance varies over time as changes are made, and given that terrain/environment is a huge factor here, whether looking at one particular vehicle or comparing vehicles across companies (and to drivers in general). Since AV performance will hopefully be improved when an accident occurs, we also cannot meet the assumption of independence between events. And if AVs are simply temporarily stopped after an accident, that also breaks the independence assumption, as we'd have a time period of zero accidents.
The bigger problem though is what you are doing with your confidence interval. A CI is a statement about replication. A 95% confidence level means that in 100 replications of the experiment using similar data, on average 5 of the generated CIs -- which will all have different endpoints -- will _not_ contain the population parameter (although IIRC the math is more complicated in practice, and the realized error rate can be higher). As such, if you generate a CI and multiply the endpoints by some constant, that's a complete violation of what is being expressed: there is vastly more data in 100m driving miles than in 3m miles, which will cause the CI to shrink and the estimate of the parameter to become more accurate. There is no basis for multiplying the endpoints of a CI!
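To see concretely why scaling CI endpoints is wrong, here is a sketch in Python of the exact (Garwood) Poisson interval, which is the kind of interval `poisson.test` reports. The counts and exposures below are illustrative, not the actual data: the same point estimate backed by more exposure gives a much narrower interval, which multiplying endpoints cannot reproduce.

```python
# Sketch (not the original analysis): exact Poisson CI for a rate.
# Exposures are in millions of miles; counts are illustrative.
from scipy.stats import chi2

def exact_poisson_ci(x, T, conf=0.95):
    """Exact (Garwood) CI for a Poisson rate, given x events over exposure T."""
    a = 1 - conf
    lower = chi2.ppf(a / 2, 2 * x) / 2 if x > 0 else 0.0
    upper = chi2.ppf(1 - a / 2, 2 * x + 2) / 2
    return lower / T, upper / T

# 1 event in 3 (million miles): very wide interval
lo1, hi1 = exact_poisson_ci(1, 3)
# 33 events in 100 (million miles): similar point estimate, far more exposure,
# and a much narrower interval -- the width depends on the data, not a scale factor
lo2, hi2 = exact_poisson_ci(33, 100)
print(lo1, hi1)  # roughly (0.008, 1.86) events per million miles
print(lo2, hi2)  # roughly (0.23, 0.46)
```

Note that `exact_poisson_ci(1, 1)` reproduces the (0.0253, 5.572) interval that `poisson.test(1)` prints in R.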
Ultimately, given that the size of the sample affects CI width, you need to conduct an appropriate statistical test to compare the estimated rates: the 1 death in 3m miles for Uber against whatever data generated the 1.18 deaths per 100m miles for sober drivers. There's a lot more that needs to be taken into account here than a simple Poisson test can handle.
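For what such a comparison could look like, here is a sketch of the standard exact conditional test for two Poisson rates: under the null of equal rates, the first count given the total is binomial with success probability proportional to exposure. The sober-driver count and exposure below are invented for illustration -- the real counts behind "1.18 per 100m miles" would be needed to do this properly.

```python
# Sketch: exact conditional test of H0: rate1 == rate2 for two Poisson samples.
# Given x1 events in exposure T1 and x2 in T2, x1 | (x1 + x2) ~ Binomial(x1 + x2, T1/(T1 + T2)) under H0.
from scipy.stats import binomtest

def poisson_rate_compare(x1, T1, x2, T2):
    """Two-sided exact conditional test comparing two Poisson rates."""
    p0 = T1 / (T1 + T2)
    return binomtest(x1, n=x1 + x2, p=p0).pvalue

# Uber: 1 death in 3 million miles.
# Hypothetical comparison data: 118 deaths in 10,000 million miles
# (i.e., a rate of 1.18 per 100m miles -- the count is made up).
p = poisson_rate_compare(1, 3, 118, 10_000)
print(p)
```

Even this ignores the non-constant-rate and dependence problems above; it only fixes the "compare two estimates properly" part.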
Edit: Note the default values of the T and r parameters when you run poisson.test(1, conf.level = 0.95), and also that the p-value of the one-sample exact test you performed is 1. Also, since this is an exact test, at a confidence level of 0.95 the rate of rejecting true null hypotheses is exactly 0.05, but given my reservations about the use of a Poisson distribution here, I don't think that using an exact Poisson test is appropriate.
To be more clear, when you run poisson.test(1, conf.level = 0.95) with the default values of T and r (which are both 1) you are performing the following two-sided hypothesis test:
Null hypothesis: The true rate of events is 1 (r) with a time base of 1 (T).
Alternative hypothesis: The true rate is not equal to 1.
The reason you end up with a p-value of 1 is that you've said you observed 1 event in a time base of 1, with a hypothesized rate of 1. Given this data, of course the probability of observing a result equal to or more extreme than the one observed is 1! As such, you're not actually testing anything about the data that you claim you are testing.
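To make the vacuousness explicit, here is a minimal Python re-implementation (an assumption on my part, mirroring the two-sided exact logic R uses, where the two-sided p-value sums the probabilities of all outcomes no more likely than the observed one) showing why x = 1, T = 1, r = 1 gives p = 1:

```python
# Sketch: two-sided exact one-sample Poisson test of H0: rate == r,
# given x observed events in exposure T.
from scipy.stats import poisson

def exact_poisson_pvalue(x, T=1, r=1):
    m = r * T  # expected count under H0
    px = poisson.pmf(x, m)
    # sum probabilities of all outcomes no more likely than the observed one
    # (the 1e-7 relative tolerance guards against floating-point ties)
    ks = range(int(10 * m) + 10 * x + 40)
    p = sum(poisson.pmf(k, m) for k in ks if poisson.pmf(k, m) <= px * (1 + 1e-7))
    return min(1.0, p)

# With x = 1 and expected count 1, every outcome is "at least as extreme"
# as the observed one, so the p-value is 1: the test tells you nothing.
print(exact_poisson_pvalue(1, T=1, r=1))  # ~1.0
```

By contrast, `exact_poisson_pvalue(5)` gives roughly 0.0037, matching what `poisson.test(5)` reports in R.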
I'm not trying to be harsh here, but please be careful when using statistics!
Fascinating, thank you. Particularly the part about multiplying the CI. I wonder if the analysis could be rescued to some extent? I feel there must be a way to use the information we have to draw some conclusions, at least relative to some explicit assumptions.