I’m glad you added the logging; that should help narrow it down. I’ve looked through that code pretty thoroughly, and had been hoping we could add logging to get a better idea of where it’s hanging.
We did run into the max concurrent requests block (we made a HEAD request that we didn’t close). We fixed that bug and increased ServicePointManager.DefaultConnectionLimit. I’d actually recommend adding a note about that to the Server-Events wiki, since the .NET default is only 2 and the SSE client uses 2 connections by itself (1 connection is always open, and an additional connection is made when sending heartbeats). We also discovered that our status updates could delay heartbeats, or vice versa.
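For anyone else hitting this, here is a minimal sketch of the change, assuming a .NET Framework app where the client goes through HttpWebRequest. The class name and the value 20 are just placeholders for illustration, not what we actually shipped, so pick a limit that fits your own request volume.

```csharp
using System.Net;

// Hypothetical helper; the name and the limit value are illustrative only.
public static class ConnectionConfig
{
    // Call once at startup, before the first outgoing HTTP request,
    // so the new limit applies to connections created afterwards.
    public static void Apply()
    {
        // Raise the per-host connection limit above the default of 2,
        // leaving headroom beyond the 2 connections the SSE client uses.
        ServicePointManager.DefaultConnectionLimit = 20;
    }
}
```

Closing or disposing every response (including HEAD responses) still matters, since an unclosed response keeps holding one of these connections no matter how high the limit is.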
Given the low repro rate, and the fact that we only see this happen in prod, I’m not sure we’ll be able to debug it, but we’ll see what we can do on that front.