Heartbeat Request Never Finishes

Hello mythz,

I noticed a situation where a request to /event-heartbeat never finishes. Below is the trace:

2017-07-06 00:15:05,277 DEBUG eStack.ServerEventsClient - [SSE-CLIENT] Prep for Heartbeat…
2017-07-06 00:15:05,286 DEBUG eStack.ServerEventsClient - [SSE-CLIENT] Sending Heartbeat…
2017-07-06 00:15:05,358 DEBUG eStack.ServerEventsClient - [SSE-CLIENT] Heartbeat sent to: https://myserverurl/event-heartbeat?id=1xq9QNIVyPGpjRbmHpVo
2017-07-06 00:15:15,380 DEBUG eStack.ServerEventsClient - [SSE-CLIENT] Prep for Heartbeat…
2017-07-06 00:15:15,389 DEBUG eStack.ServerEventsClient - [SSE-CLIENT] Sending Heartbeat…

and that’s it… Since the call to ConnectionInfo.HeartbeatUrl.GetStringFromUrlAsync never completes, the heartbeat process stops. I don’t see any other HTTP activity on the client at this time. Nothing sets the IsCancellationRequested either.

Below is the corresponding IIS log entry for the last heartbeat:

2017-07-06 05:15:12 66.240.245.170 GET /event-heartbeat id=1xq9QNIVyPGpjRbmHpVo 443 - XX.XX.XX.XX - - 200 0 1236 7016

This entry shows an sc-win32-status code of 1236 and a time-taken of 7016 ms (it is usually 100-200 ms).
Status code 1236 means ERROR_CONNECTION_ABORTED.

We have approximately 130 clients connected at this time with 10 or so experiencing this issue very often. This issue may be related to network quality as those clients frequently experience heartbeat timeouts and other (SSL/DNS) connection issues.

Would it be more reliable to change the heartbeat timer to fire repeatedly instead of once? Perhaps it could adjust the “next” start time after a successful heartbeat but keep the timer period set to the heartbeat timeout value.

What do you think?

We definitely don’t want to send more heartbeats; they’re sent too often as it is. You can adjust how often they’re sent and when a connection is considered broken with the HeartbeatInterval and IdleTimeout properties on ServerEventsFeature, which default to 10s and 30s respectively.
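For reference, a minimal sketch of where those two properties are set when registering the plugin in your AppHost. The values shown are just the defaults written out explicitly, so treat them as placeholders for whatever suits your network:

```csharp
// Inside AppHost.Configure(Container container).
// The values below are the defaults (10s / 30s) spelled out; adjust as needed.
Plugins.Add(new ServerEventsFeature
{
    // How often clients are told to send a heartbeat
    HeartbeatInterval = TimeSpan.FromSeconds(10),
    // How long the server waits without a heartbeat before dropping the subscription
    IdleTimeout = TimeSpan.FromSeconds(30),
});
```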

If the connection is aborted, it could be that your network is unhealthy, dropping packets and not acknowledging HTTP/network requests. Another issue that can cause HttpWebRequests to hang is using HttpWebRequest elsewhere without disposing of it properly: .NET uses the same connection pool under the hood and only allows a few concurrent connections per host, which you can adjust with:

ServicePointManager.DefaultConnectionLimit = n;

If that doesn’t help, can you make sure all your Request DTOs have the IReturn or IReturnVoid interface marker? It prevents clients from returning an HttpWebResponse instance that would need to be explicitly disposed, as explained in this thread.
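As an illustration, annotated Request DTOs look something like the sketch below. GetOrder, GetOrderResponse and DeleteOrder are hypothetical names used only for this example:

```csharp
using ServiceStack;

// Hypothetical DTOs, for illustration only.
// IReturn<T> tells the typed clients which response type to deserialize into.
public class GetOrder : IReturn<GetOrderResponse>
{
    public int Id { get; set; }
}

public class GetOrderResponse
{
    public string Status { get; set; }
}

// IReturnVoid marks a request whose service returns no response body.
public class DeleteOrder : IReturnVoid
{
    public int Id { get; set; }
}
```

With the marker in place, the typed client APIs hand back the response DTO (or nothing) instead of a raw web response that you would have to dispose yourself.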

IReturn is used for all the HTTP posts, so we are OK there. The network issues are out of our control as clients are connecting to the public web server over the internet. The SSL requests get terminated / dropped all the time. However, the software must be able to recover from those nicely, right?

I’m not proposing to send more heartbeats. However, it would help if the heartbeat timer fired and detected a timeout even when the GetStringFromUrlAsync method / task never finishes (hangs or gets killed).
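To illustrate what I mean, here is a minimal sketch of a timeout guard around a task that may never complete. It is not ServiceStack code; HeartbeatWatchdog and CompletesWithinAsync are hypothetical names, and it assumes the pending heartbeat is available as a Task:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class HeartbeatWatchdog
{
    // Hypothetical helper: returns true if 'request' finished (successfully,
    // faulted or cancelled) within 'timeout', and false if it is still pending.
    public static async Task<bool> CompletesWithinAsync(Task request, TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource())
        {
            var delay = Task.Delay(timeout, cts.Token);

            // Whichever task finishes first wins the race.
            var first = await Task.WhenAny(request, delay).ConfigureAwait(false);
            if (first == request)
            {
                cts.Cancel(); // the request finished, so release the timer
                return true;  // the caller should still inspect 'request' for errors
            }

            return false; // timed out: the request is hung or was silently dropped
        }
    }
}
```

A false result could then be handled the same way as a failed heartbeat, so a hung request would no longer stop the heartbeat process.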

The ServicePointManager.DefaultConnectionLimit is already set to 10.

Thank You!

I’m not proposing to send more heartbeats. However, it would help if the heartbeat timer fired and detected a timeout even when the GetStringFromUrlAsync method / task never finishes (hangs or gets killed).

It’s concerning if the async HttpWebRequest error callback is never being fired - they shouldn’t be lost in the ether like that so I’d really like to understand what is really happening, if we can repro it and if we’re able to handle all failed requests.

It is very strange indeed.

There is a check for IsCancellationRequested just before the logging in both the Success and Error callback methods. I don’t see the flag being set anywhere in ServiceStack or in my code. Can you log a message when this flag is set? That would at least confirm or eliminate the cancellation case.

I have no way to replicate this scenario locally. The best I can do is to enable debug tracing on the customer machine and let it run for a day or two.

@smatsevilo Are you using a desktop client on .NET or Xamarin? And which ServiceStack.Client library do you use: PCL or the full .NET Framework?

Also, can you share the settings you are using for ServerEventsClient and ServerEventsFeature? It might be some combination of HeartbeatInterval and network delays that produces such results.

@xplicit I’m using a .NET client running as a Windows service. It is ServiceStack version 4.5.8 on the full .NET Framework.

The ServerEventsClient is using default settings.

The ServerEventsFeature uses the following settings:

LimitToAuthenticatedUsers = true,
NotifyChannelOfSubscriptions = false

There is also an OnHeartbeatInit handler that implements sliding session expiration:

private void HandleSseHeartbeatInit(IRequest request)
{
  // Implement sliding session (auto-renew if it is about to expire)
  var session = request.GetSession();
  if (session == null)
  {
    Cookie sessionCookie;
    if (request.Cookies.TryGetValue(Keywords.SessionId, out sessionCookie))
    {
      _logger.DebugFormat("Cannot locate session for ID {0}.", sessionCookie.Value);
    }
    else
    {
      _logger.DebugFormat("Cannot locate session ID cookie in request {0}.", request.RawUrl);
    }
  }
  else
  {
    if (session.IsAuthenticated)
    {
      var cache = request.GetCacheClient();
      var ttl = cache.GetSessionTimeToLive(session.Id);
      if (ttl.HasValue && ttl.Value.TotalSeconds <= 300) // 5 minutes or less to live
      {
        _logger.DebugFormat("Session {0} extended for {1}.", session.Id, _sessionExpiry);
        request.SaveSession(session, _sessionExpiry);
      }
    }
    else
    {
      _logger.DebugFormat("Session {0} is not authenticated. Remote IP: {1}, User Auth ID: {2}.",
        session.Id,
        request.RemoteIp,
        session.UserAuthId);
    }
  }
}

Can you try v4.5.13 from MyGet? I added additional logging to show whether the operation is cancelled during a heartbeat. You can check whether the cancellation message appears in the logs, which should help to isolate the issue.

I tried various ways to break the execution by adding delays to network interfaces or dropping TCP packets, but every time the heartbeats were restored after disconnecting from the server. Do you use any other network operations in your code, e.g. HttpWebRequest or HttpClient? Or maybe spawn tasks? I’m thinking of the case where the thread pool is exhausted and a new async HTTP request can’t get a free thread to execute its IO operation.

After looking at many connection failures in the logs I may have a better explanation of what is happening. Below is the sequence of events that occurs very often in the logs:

  1. The connection drops during the request to /event-heartbeat.
  2. The event stream gets dropped as well, which causes the SSE client to call Restart().
  3. Restart() calls cancel?.Cancel(), which sets cancel.IsCancellationRequested to true.
  4. When the Heartbeat() method gets control back, it simply exits without starting a new heartbeat.

All this seems to work correctly as designed.

However, there is an inconsistency in the framework code between the calls to /event-heartbeat and /event-unregister that causes the loss of authentication cookies in subsequent requests.

The heartbeat in the Heartbeat() method is called like this:

ConnectionInfo.HeartbeatUrl.GetStringFromUrlAsync(requestFilter: req => {
  var hold = httpReq;
  if (hold != null)
    req.CookieContainer = hold.CookieContainer;
    . . .
})...

which sends the auth cookies to the server and allows us to implement sliding session expiration.

The unregister in the InternalStop() method is called without sending the same auth cookies:

try {
  ConnectionInfo.UnRegisterUrl.GetStringFromUrl();
} catch (Exception) {}

This sends different values for ss-id and ss-pid to the server, and ss-opt is not sent at all.

As a result the server returns “401” (Unauthorized) to the client, which causes another restart.

Subsequent event-stream and event-heartbeat calls then lose authentication and have to re-authenticate and restart again.

Is there a way to send auth cookies with event-unregister so that it is consistent with other calls?
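Something along these lines is what I have in mind. It is only a sketch that mirrors the requestFilter already used in Heartbeat(); httpReq is the same field as in the snippet above, and I have not tested it against the framework source:

```csharp
try {
    // Sketch only: reuse the event-stream request's cookies, as Heartbeat() does,
    // so that ss-id / ss-pid / ss-opt match the authenticated session.
    ConnectionInfo.UnRegisterUrl.GetStringFromUrl(requestFilter: req => {
        var hold = httpReq;
        if (hold != null)
            req.CookieContainer = hold.CookieContainer;
    });
} catch (Exception) {}
```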

Thank you!

Hello,

I have not received any response to my previous post.

Is it possible to change the call to the event-unregister URL in the InternalStop() method so that it includes the current authentication cookies?

Thank you!

Sorry, I somehow missed your previous post. It’s possible to add this feature and we’re looking into it.

Can you try the latest v4.5.13 from MyGet? Does it help to resolve the issue?