And the throttling even seems simple: give each IP address an initial allowance of A requests, then increase the allowance every interval T, up to a maximum of B. Perhaps A=B=10, T=150ms.
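For concreteness, here is a rough sketch of that allowance scheme in Python. The class name `IpThrottle`, its parameters, and the injectable `clock` are made up for illustration, and it assumes the allowance grows by exactly one request per elapsed T, which the description above leaves open.

```python
import time


class IpThrottle:
    """Per-IP allowance: start at `initial` (A), gain one request per
    `interval` seconds (T), never exceeding `maximum` (B).

    Assumption: one request is added per full interval elapsed.
    """

    def __init__(self, initial=10, maximum=10, interval=0.150,
                 clock=time.monotonic):
        self.initial = initial
        self.maximum = maximum
        self.interval = interval
        self.clock = clock      # injectable so the logic can be tested
        self._state = {}        # ip -> [remaining allowance, last refill time]

    def allow(self, ip):
        """Return True and spend one request if `ip` has allowance left."""
        now = self.clock()
        state = self._state.get(ip)
        if state is None:
            # First time we see this IP: hand out the initial allowance.
            state = self._state[ip] = [self.initial, now]
        else:
            # Credit one request per whole interval since the last refill.
            steps = int((now - state[1]) // self.interval)
            if steps > 0:
                state[0] = min(self.maximum, state[0] + steps)
                state[1] += steps * self.interval
        if state[0] >= 1:
            state[0] -= 1
            return True
        return False
```

With the defaults (A=B=10, T=150ms), a burst of ten requests from one address goes through, the eleventh is refused, and one more is allowed roughly every 150ms after that. Note that this keys on the IP address only, so it says nothing about connections or streams.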
It is a little more complicated because a request is a few layers deep. In HTTP/2 you open a connection, start a stream, then send a request over that stream.
Are you tracking per connection? Per stream? Isn't it normal for multiple requests to happen quite quickly? I load a single page with 50 external assets, and those get multiplexed over the current connection, each on its own stream. Is that okay? Is that abusive? Another stream is handling a video player, and it's requesting (HTTP/2) frames of video data. Too much? Too fast?