SSE with Redis cache is not recovering from connection error

Hi guys, I wonder if anyone can be so kind and help me out a bit with an SSE problem I’ve been tearing my hair out over for the past few days.

We have a working SSE setup with ServiceStack 5.7.0 on .NET Framework 4.7.2 and a Redis cache. The setup looks like this:

Backend:

private void AddServerEventsFeature()
{
    var se = new ServerEventsFeature
    {
        OnCreated = this.OnCreated,
        OnSubscribe = this.OnSubscribe, // Just a debug log in this method
        OnPublish = (subscription, response, whatever) =>
        {
            Debug.WriteLine(whatever);
        },
        HeartbeatInterval = TimeSpan.FromSeconds(10),
        IdleTimeout = TimeSpan.FromMinutes(20),
        LimitToAuthenticatedUsers = true,
        NotifyChannelOfSubscriptions = false
    };

    AppHost.Plugins.Add(se);
}

private void OnCreated(IEventSubscription eventSub, IRequest request)
{
    var session = (MyUserSession)request.GetSession();

    if (!session.IsAuthenticated)
    {
        throw new HttpError(HttpStatusCode.Unauthorized);
    }

    session.EventsSubscriptionId = eventSub.SubscriptionId;
    request.SaveSession(session);
    // In case a new subscription is established
    // use any existing channels in the session.
    eventSub.UpdateChannels(session.Channels ?? new string[0]);
}

Web js SSE initialization:

	this.eventSource = new EventSource('/api/event-stream');
	this.eventSource.addEventListener('message', this.handleServerSentEvent);

We can see the traffic passing through as it should in OnPublish and OnSubscribe, all is fine and the UI is responsive.

The problem is that SSE does not recover when we simulate an interruption in the Redis connection. For testing we use a local Redis instance running in Docker, which works fine. When we turn it off and on again to simulate the interruption, OnPublish is no longer triggered, and a complete restart of the application is required to get it working again.

When I turn Redis off it blows up at task.Wait() on the next heartbeat or submitted event: a RedisResponseException “Zero length response” is thrown, followed by a “SocketException: No connection could be made because the target machine actively refused it 127.0.0.1:6379”, which is expected when Redis is down:

void IHttpAsyncHandler.EndProcessRequest(IAsyncResult result)
{
    var task = (Task)result;

    task.Wait(); // avoid an exception being thrown on the finalizer thread.

}

However, after the Redis instance is up and running again, I can see a new subscription being made through OnSubscribe, which I guess is internal ServiceStack behavior. But no events or heartbeats are passed anymore: calls are made, but none of them seem to be processed.

Even if I reload the page (it’s a SPA) and retrigger the SSE initialization and its preceding event subscriptions, no heartbeats or events are handled by SSE anymore. It seems the only way to get it working again is to restart the application.

The strangest part is that the RedisResponseException also surfaces through routing after we navigate to another page. It does not matter which route we use, the RedisResponseException is attached to that route, when I feel it should be thrown by the SSE API instead of hanging on until the next request.

    AppHost.UncaughtExceptionHandlers.Add(HandleUnhandledException);

    …

    private void HandleUnhandledException(IRequest request, IResponse response, string operationName, Exception exception)
    {
        /*
         * The Redis exception is caught here, but only after navigating to another route.
         * Which route does not matter: the next page navigation throws the Redis exception,
         * which I expected to be returned by one of the SSE endpoints instead.
         */
    }

So, any help pointing me toward a solution would be great, since I’ve been searching for answers a lot lately and feel stuck. It definitely seems like I’m missing an important step or something is not set up correctly. Any ideas how to proceed?

I’d first enable debug logging and see whether it shows any issues with reconnecting.

The next step is to create a minimal stand-alone repro (and publish on GitHub) that we can run locally to observe the behavior.

As for the SSE going silent after a connection failure, I’ve managed to simulate it in a simple project that can be found here:

The setup is quite similar to how our application is set up; however, I might be missing something about how to deal with connection failures.

I send a few messages, all is ok
I kill Redis and send a few other messages which will cause an exception
I start Redis again
I send a few messages, no response
I reload the page and send a few messages, still no response

Firstly, you shouldn’t change the default to ReuseScope.Request. It is the most unpredictable scope, requiring different backing implementations in different hosts, including test projects. It’s also less efficient for pooled resources and obviously should never be used for singleton instances. Since it’s also more efficient to store any per-request data in IRequest.Items, there’s basically no reason to use Request scope. It definitely shouldn’t be the default, which would conflict with all the docs that rely on the default singleton scope.

//Container.DefaultReuse = ReuseScope.Request; // never do this

E.g. the RedisManagerPool and RedisServerEvents are singleton dependencies that are required to be registered as singletons.

Next, you need to use one of our Server Event Clients, which issue periodic heartbeats and include automatic connection retry when they detect a failed SSE connection. For web pages you can use either the ss-utils.js JavaScript client or the TypeScript client.
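
To illustrate what automatic retry means here, the following is a minimal sketch and not the ss-utils.js implementation. The helper name connectWithRetry, its options, and the fixed retry delay are all assumptions made for illustration; createSource is any function that returns an object with a close() method and an onerror hook, such as a browser EventSource.

```javascript
// Hypothetical helper for illustration only; ss-utils.js has its own reconnect logic.
// createSource: factory returning a connection object (e.g. () => new EventSource(url)).
// schedule: timer function, injectable so the retry can be exercised without real timers.
function connectWithRetry(createSource, { retryDelayMs = 1000, schedule = setTimeout } = {}) {
  let current = null;
  let closedByCaller = false;

  function open() {
    const source = createSource();
    current = source;
    source.onerror = () => {
      // Drop the broken connection, then try again after a pause
      // unless the caller has asked us to stop.
      source.close();
      if (!closedByCaller) schedule(open, retryDelayMs);
    };
  }

  open();

  return {
    // The connection object currently in use.
    current: () => current,
    // Stop retrying and close the active connection.
    close() {
      closedByCaller = true;
      current.close();
    },
  };
}
```

In a browser this could be called as connectWithRetry(() => new EventSource('/api/event-stream?channels=mychannel')), attaching any message listeners inside the factory so that each new connection gets them. The heartbeat side of the built-in clients is left out of this sketch.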

As you’re already using jQuery it’s easiest to use ss-utils.js.

Please see “Run side-by-side with another Framework” for how to properly host ServiceStack on a custom path. Copy the Web.config section from those docs and make sure to specify the custom path in HandlerFactoryPath:

public override void Configure(Container container)
{
    SetConfig(new HostConfig { HandlerFactoryPath = "api" });
}

After you’ve updated your config you can reference the built-in ss-utils.js like:

<script src="/api/js/ss-utils.js"></script>
<script src="/js/default.js"></script>

Your subscribe() function is unnecessary, as the channels are subscribed to when the SSE connection is established, so change it to just initialize the SSE connection instead:

<body onload="initSSE();">

Then rewrite it to use ss-utils.js to manage the connection, e.g:

function initSSE() {
    if (!eventSource) {
        try {
            eventSource = new EventSource('/api/event-stream?channels=mychannel&t=' + new Date().getTime());
            $(eventSource).handleServerEvents({
                handlers: {
                    onConnect: function(e) {
                        console.log('onConnect',e);
                    },
                    onMessage: function (e) {
                        console.log('onMessage',e);
                        addMessage(e);
                    }
                }
            });
        }
        catch(ex)
        {
            addMessage("ERROR: initSSE");
            addMessage("ERROR: " + ex);
        }
    }
    return true;
}
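
For context on what a client library saves you from doing by hand: a raw EventSource 'message' listener only receives the event's data string. The sketch below assumes that string has the form of a selector, a space, then a JSON body (for example cmd.onConnect {...}). That format and the names splitOnFirst and parseServerEvent are assumptions for illustration, not the ss-utils.js source.

```javascript
// Hypothetical helpers for illustration; the message format is an assumption.

// Split text at the first occurrence of separator; returns one element if not found.
function splitOnFirst(text, separator) {
  const pos = text.indexOf(separator);
  return pos >= 0
    ? [text.substring(0, pos), text.substring(pos + separator.length)]
    : [text];
}

// Turn "op.target {json}" into { op, target, message }.
function parseServerEvent(data) {
  const [selector, body] = splitOnFirst(data, ' ');
  const [op, target] = splitOnFirst(selector, '.');
  let message = body;
  if (body !== undefined) {
    try {
      message = JSON.parse(body);
    } catch (e) {
      // Not valid JSON: keep the raw text as the message.
    }
  }
  return { op, target, message };
}
```

With this, a handler could branch on op and target instead of inspecting the raw string, which is roughly the kind of dispatch the handlers option above provides.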

I’ve also changed your addMessage() to serialize objects as JSON when it’s not logging a string:

function addMessage(message) {
    if (typeof message != 'string')
        message = JSON.stringify(message);
//...
}
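
The string check above can also be pulled out into a small pure function, which makes it easy to try outside the page. This is only a sketch: formatMessage is a hypothetical name, and the rest of addMessage() (elided above) is not reproduced here.

```javascript
// Hypothetical helper mirroring the string check in addMessage().
function formatMessage(message) {
  // Strings are shown as-is; anything else is serialized to JSON.
  return typeof message === 'string' ? message : JSON.stringify(message);
}
```

For example, formatMessage('ERROR: initSSE') leaves the string untouched, while formatMessage({ a: 1 }) yields the JSON text for that object.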

I did detect an unhandled connection failure in the underlying RedisPubSubServer that Redis SSE uses, which should now be resolved in the latest v5.8.1 that’s now available on MyGet.

Please test with the latest v5.8.1 and let me know if you’re still able to repro the issue.

A thousand thanks for taking the time to give me such a detailed and informative answer.

After a few tests I’ve verified that SSE is now working as expected; the solution was upgrading to 5.8.1.

As a final question, when will the 5.8.1 fix be generally available?

The reason I ask is that we make quite extensive use of ServiceStack in our system, which is really large and business critical, so we are a bit wary of upgrades that haven’t been extensively tested.

The next major release on NuGet will be v5.9, which won’t be for a while (there are never planned release dates for any version), so you’ll need to use the pre-release version on MyGet to get the changes before then. The packages are built using the same CI and test suite as the official packages, only they’re published to MyGet instead of NuGet. So the NuGet versions aren’t better tested; they go through the same tests.

The only issue is that if you download v5.8.1 packages at different times, you could end up with dirty versions built at different times, which will throw a “Method or Type does not exist” exception. The solution is to clear your NuGet packages cache, which forces the latest version to be redownloaded.