Motivation:
ff0045e3e1 changed HpackHuffmanDecoder to use a lookup-table which greatly improved performance. We can squeeze out another 3% win by using an ByteProcessor which will reduce the number of bound-checks / reference-count-checks needed by processing byte-by-byte.
Modifications:
Implement logic with ByteProcessor
Result:
Another ~3% perf improvement which shows up when using h2load to simulate load.
`h2load -c 100 -m 100 --duration 60 --warm-up-time 10 http://127.0.0.1:8080`
Before:
```
finished in 70.02s, 620051.67 req/s, 20.70MB/s
requests: 37203100 total, 37203100 started, 37203100 done, 37203100 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 37203100 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 1.21GB (1302108500) total, 41.84MB (43872600) headers (space savings 90.00%), 460.24MB (482598600) data
min max mean sd +/- sd
time for request: 404us 24.52ms 15.93ms 1.45ms 87.90%
time for connect: 0us 0us 0us 0us 0.00%
time to 1st byte: 0us 0us 0us 0us 0.00%
req/s : 6186.64 6211.60 6199.00 5.18 65.00%
```
With this change:
```
finished in 70.02s, 642103.33 req/s, 21.43MB/s
requests: 38526200 total, 38526200 started, 38526200 done, 38526200 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 38526200 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 1.26GB (1348417000) total, 42.39MB (44444900) headers (space savings 90.00%), 466.25MB (488893900) data
min max mean sd +/- sd
time for request: 370us 24.89ms 15.52ms 1.35ms 88.02%
time for connect: 0us 0us 0us 0us 0.00%
time to 1st byte: 0us 0us 0us 0us 0.00%
req/s : 6407.06 6435.19 6419.74 5.62 67.00%
```