We have a fairly frequent, recurring issue with the SSE client. Occasionally it gets stuck after “Prep for Heartbeat” and never recovers. We’ve been trying to track down the root cause, but have been unsuccessful. We have a private network with ~20 clients connected for the majority of each day: 6 of which are browsers (using ss-utils), the rest are Windows apps using the built-in SS framework client.
Exhibit A
Monday, Dec 19th. We did see a short network blip (most clients disconnected, then reconnected) ~1 min before losing this client. At the time this client lost its connection, 1 other client also lost connection - the other client’s SSE did a Stop() & Start() and continued working just fine. This client’s SSE never tried to restart itself.
Here’s the relevant part of our client-side log:
14:41:24.4974 DEBUG [ServerEventsClient] [SSE-CLIENT] Prep for Heartbeat…
14:41:24.5124 DEBUG [ServerEventsClient] [SSE-CLIENT] LastPulseAt: 14:41:24.5124411
14:41:24.5124 DEBUG [ServerEventsClient] [SSE-CLIENT] OnHeartbeatReceived: (ServerEventHeartbeat) #5 on #user35 (vecclient:v003s00-r027, allvecclients)
14:41:24.5124 DEBUG [ServerEventsClient] [SSE-CLIENT] Heartbeat sent to: http://vecadminserver/api/event-heartbeat?id=aBg9k004325kdzXCQxw3
14:41:44.5246 DEBUG [ServerEventsClient] [SSE-CLIENT] Prep for Heartbeat…
14:41:44.5246 DEBUG [ServerEventsClient] [SSE-CLIENT] Heartbeat sent to: http://vecadminserver/api/event-heartbeat?id=aBg9k004325kdzXCQxw3
14:41:44.5246 DEBUG [ServerEventsClient] [SSE-CLIENT] LastPulseAt: 14:41:44.5336282
14:41:44.5446 DEBUG [ServerEventsClient] [SSE-CLIENT] OnHeartbeatReceived: (ServerEventHeartbeat) #6 on #user35 (vecclient:v003s00-r027, allvecclients)
14:42:04.5379 DEBUG [ServerEventsClient] [SSE-CLIENT] Prep for Heartbeat…
14:42:04.5539 DEBUG [ServerEventsClient] [SSE-CLIENT] Heartbeat sent to: http://vecadminserver/api/event-heartbeat?id=aBg9k004325kdzXCQxw3
14:42:04.5539 DEBUG [ServerEventsClient] [SSE-CLIENT] LastPulseAt: 14:42:04.5539860
14:42:04.5670 DEBUG [ServerEventsClient] [SSE-CLIENT] OnHeartbeatReceived: (ServerEventHeartbeat) #7 on #user35 (vecclient:v003s00-r027, allvecclients)
14:42:24.5702 DEBUG [ServerEventsClient] [SSE-CLIENT] Prep for Heartbeat…
14:42:24.5702 DEBUG [ServerEventsClient] [SSE-CLIENT] LastPulseAt: 14:42:24.5792432
14:42:24.5832 DEBUG [ServerEventsClient] [SSE-CLIENT] OnHeartbeatReceived: (ServerEventHeartbeat) #8 on #user35 (vecclient:v003s00-r027, allvecclients)
14:42:24.5702 DEBUG [ServerEventsClient] [SSE-CLIENT] Heartbeat sent to: http://vecadminserver/api/event-heartbeat?id=aBg9k004325kdzXCQxw3
14:42:44.6011 DEBUG [ServerEventsClient] [SSE-CLIENT] Prep for Heartbeat…
14:42:46.1943 DEBUG [ServerEventsClient] [SSE-CLIENT] OnMessageReceived: 9 on #user35 (vecclient:v003s00-r027, allvecclients)
14:44:49.4449 DEBUG [ServerEventsClient] Stop()
Heartbeats are regular until 14:42:44, where it got stuck somehow. The following OnMessageReceived was a command sent from the server, which it received and processed just fine. However, without the active heartbeat, the client was considered disconnected, and we had to manually restart the application.
Interestingly, shortly after the last “Prep for Heartbeat”, the client app did a separate HTTP POST (a status update that it does every 3-15 seconds) that took 12 seconds to respond - something that normally only takes 0.5-1 second. The server logs, however, show that the request was processed sub-second.
On the server side of things, here’s the request that triggered the message that was received on the client at 14:42:46:
14:42:41.1919 DEBUG [AppHost] [GlobalFilters] Request: SessionID=r4QoSAew0PMc8rJ1Rpj1 Id=10351230 Url=/api/vecclient/dimensions/stop
14:42:41.1919 DEBUG [MemoryServerEvents] [SSE-SERVER] Sending vecclient:v003s00-r027@cmd.DimensionStopDto msg to aBg9k004325kdzXCQxw3 on (vecclient:v003s00-r027, allvecclients)
14:42:41.1919 DEBUG [AppHost] [GlobalFilters] Response: SessionID=r4QoSAew0PMc8rJ1Rpj1 Id=10351230 Url=/api/vecclient/dimensions/stop
Our server-side logs show plenty of normal activity going on during that surrounding time period. Requests were still coming in & being processed in a timely fashion.
I’m not sure what other data I can provide that would be useful, but let me know if there is something. It does seem like the network is a trigger for this locked-up state (the network sometimes gets flooded with traffic, or wifi temporarily drops - these are separate issues we’re investigating, but SSE should recover on its own anyway).
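In the meantime we’re considering a client-side watchdog as a stop-gap: track the time of the last heartbeat the client sees, and force the same Stop()/Start() cycle the other client performed on its own whenever no heartbeat arrives within a couple of intervals. A minimal sketch follows - the class name and the stale/check intervals are our assumptions, not part of the framework; it hooks OnCommand (ServerEventHeartbeat arrives as a server command in the C# ServerEventsClient), so if you already assign OnCommand elsewhere you’d need to chain the handlers:

```csharp
// Hypothetical watchdog for a stalled ServerEventsClient heartbeat loop.
// Assumes the ServiceStack C# client's Stop()/Start() and OnCommand callback;
// the interval values are guesses tuned to the ~20s heartbeat seen in our logs.
using System;
using System.Threading;
using ServiceStack;

public class HeartbeatWatchdog : IDisposable
{
    readonly ServerEventsClient client;
    readonly TimeSpan staleAfter;
    readonly Timer timer;
    long lastHeartbeatTicks;

    public HeartbeatWatchdog(ServerEventsClient client, TimeSpan staleAfter)
    {
        this.client = client;
        this.staleAfter = staleAfter;
        lastHeartbeatTicks = DateTime.UtcNow.Ticks;

        // Record the time of every heartbeat the client receives.
        // NOTE: this overwrites any existing OnCommand handler - chain if needed.
        client.OnCommand = e =>
        {
            if (e is ServerEventHeartbeat)
                Interlocked.Exchange(ref lastHeartbeatTicks, DateTime.UtcNow.Ticks);
        };

        // Poll twice per stale window so a stall is caught promptly.
        var period = TimeSpan.FromTicks(staleAfter.Ticks / 2);
        timer = new Timer(_ => CheckPulse(), null, period, period);
    }

    void CheckPulse()
    {
        var last = new DateTime(Interlocked.Read(ref lastHeartbeatTicks), DateTimeKind.Utc);
        if (DateTime.UtcNow - last > staleAfter)
        {
            // Heartbeat loop looks stuck: force a full reconnect,
            // mirroring what the recovering client did on its own.
            client.Stop();
            client.Start();
            Interlocked.Exchange(ref lastHeartbeatTicks, DateTime.UtcNow.Ticks);
        }
    }

    public void Dispose() => timer.Dispose();
}

// Usage sketch: restart if no heartbeat for ~3 missed 20s intervals.
// var watchdog = new HeartbeatWatchdog(sseClient, TimeSpan.FromSeconds(60));
```

This doesn’t fix the underlying hang, but it would at least restore the client automatically instead of requiring a manual application restart.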