SSE problems in Azure

I need some help getting to the bottom of errors we see constantly with our SSE implementation.
They have been going on for months and now need to be eradicated.

We have two regular ServiceStack web services that communicate using SSE. Service A raises certain events to Service B using SSE:

  1. Service A : Publishes SSE events (on several channels)
  2. Service B: Subscribes to several of the channels.

ServiceStack version 56.

  • We run both services as Cloud Services in Azure in production.
  • We run both services as Cloud Services in the Azure emulator in development (in IISExpress).
  • We are using the default MemoryServerEvents at present because the RedisServerEvents version causes other problems when run.
  • We currently do not have authentication on, but will need to go there once this problem is understood.

In development, when running on the Azure Emulator environment (in IISExpress), once Service A is started we see this exception every 10-12 seconds.

[iisexpress.exe] 09:58:32 [Error] [] [RecorderLogger.Error] "Error publishing notification to: Crib.Services.Profile.Changed@cmd.onJoin": HttpException { WebEventCode: 0, ErrorCode: -2147023667, Message: "The remote host closed the connection. The error code is 0x800704CD.", Data: [], InnerException: null, TargetSite: Void RaiseCommunicationError(Int32, Boolean), StackTrace: "   at System.Web.Hosting.IIS7WorkerRequest.RaiseCommunicationError(Int32 result, Boolean throwOnDisconnect)
   at System.Web.Hosting.IIS7WorkerRequest.ExplicitFlush()
   at System.Web.HttpResponse.Flush(Boolean finalFlush, Boolean async)
   at System.Web.HttpWriter.WriteFromStream(Byte[] data, Int32 offset, Int32 size)
   at ServiceStack.ServerEventsFeature.<.ctor>b__0(IResponse res, String frame)
   at ServiceStack.EventSubscription.Publish(String selector, String message)", HelpLink: null, Source: "System.Web", HResult: -2147023667 }

Where Crib.Services.Profile.Changed is the name of one of the channels we publish to.

We see this exception raised in Service B (again every 10-12 seconds). Presumably the two are related?

[iisexpress.exe] 09:58:32 [Error] [] [RecorderLogger.Error] "[SSE-CLIENT] OnExceptionReceived: Unable to connect to the remote server on #user1": WebException { Status: ConnectFailure, Response: null, Message: "Unable to connect to the remote server", Data: [], InnerException: SocketException { ErrorCode: 10061, Message: "No connection could be made because the target machine actively refused it 127.0.0.1:4433", SocketErrorCode: ConnectionRefused, NativeErrorCode: 10061, Data: [], InnerException: null, TargetSite: Void EndConnect(System.IAsyncResult), StackTrace: "   at System.Net.Sockets.Socket.EndConnect(IAsyncResult asyncResult)
   at System.Net.ServicePoint.ConnectSocketInternal(Boolean connectFailure, Socket s4, Socket s6, Socket& socket, IPAddress& address, ConnectSocketState state, IAsyncResult asyncResult, Exception& exception)", HelpLink: null, Source: "System", HResult: -2147467259 }, TargetSite: System.Net.WebResponse EndGetResponse(System.IAsyncResult), StackTrace: "   at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult)
   at ServiceStack.HttpUtils.<>c__DisplayClass16.<GetResponseAsync>b__14(IAsyncResult iar)", HelpLink: null, Source: "System", HResult: -2146233079 }

In Service A, we are raising the SSE event like this during a service PUT call of one of its services:

ServerEvents.NotifyChannel("Crib.Services.Verification.Changed", new CreateUpdateVerificationEvent());

In Service B, we are listening for this event with a ServerEventsClient, using an instance of a GlobalReceiver. The client is configured once in AppHost.Configure(), like this:

            var baseUrl = "https://localhost:4433/api"; //URL of Service A
            this.serverEventsClient = new ServerEventsClient(baseUrl, Channels.All);
            this.serverEventsClient.RegisterReceiver<GlobalReceiver>();

            container.RegisterAutoWiredTypes(this.serverEventsClient.ReceiverTypes);
            this.serverEventsClient.Resolver = container;

            this.serverEventsClient.Start();

The GlobalReceiver looks like this:

    internal class GlobalReceiver : ServerEventReceiver
    {
        public void UpdateVerification(UpdateVerificationEvent verification)
        {
            try
            {
                 // Doing something useful with the event here             
            }
            catch (Exception ex)
            {
                // Log the exception
            }
        }

        public override void NoSuchMethod(string selector, object message)
        {
            base.NoSuchMethod(selector, message);

            // Log the exception
        }
    }

The assumptions with this setup are:

  • The SSE Server (IServerEvents) can publish during any web service request Service A handles.
  • The SSEClient (ServerEventsClient) will always be alive and listening to its subscribed channels, since its instance is created in the AppHost.Configure() method of Service B.

What could be going wrong here?

The “Unable to connect to the remote server” error message suggests that the remote connection has dropped, so I’d be looking into why the long-lived connections are dropping. Does the connection routinely drop when running on IIS outside of the Azure emulator? If it happens consistently, maybe Azure imposes some additional request timeout limits. Here’s some info on how you can increase Request Timeouts in ASP.NET, although I’m not familiar with whether Azure has some configuration elsewhere, e.g. if you’re going through an Azure load balancer there is some additional configuration to increase Request limits. Some of SignalR’s non-default configuration may also help.

Does the Server Events Client auto-reconnect when this happens?

Thanks Mythz,

Well, we have full end-to-end environment integration tests (Azure Emulator + IISExpress) that prove that SSE events are getting through to the SSE client (despite these exceptions every 10 seconds).
So, from that evidence, I presume that the auto-reconnect is happening to make that work. Is that right?
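
One way to confirm the reconnects directly, rather than infer them from passing tests, is to log the client's lifecycle callbacks. This is only a minimal sketch: it assumes the ServerEventsClient in the ServiceStack version in use exposes OnConnect, OnReconnect and OnException callbacks (worth checking against that version), and SseClientDiagnostics and the log delegate are hypothetical names, not part of ServiceStack.

```csharp
using System;
using ServiceStack;

public static class SseClientDiagnostics
{
    // Hypothetical helper: 'log' is any sink, e.g. Console.WriteLine or a wrapper around ILog.
    public static ServerEventsClient Attach(ServerEventsClient client, Action<string> log)
    {
        // Assumed callbacks; verify that they exist in your ServiceStack version.
        client.OnConnect = e => log("[SSE-CLIENT] connected");
        client.OnReconnect = () => log("[SSE-CLIENT] reconnect");
        client.OnException = ex => log("[SSE-CLIENT] exception: " + ex.Message);
        return client;
    }
}
```

In Service B this would be called before Start(), for example SseClientDiagnostics.Attach(this.serverEventsClient, Console.WriteLine). A "connected" or "reconnect" line following each exception would support the auto-reconnect theory.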

I will investigate the possible IISExpress timeouts.

One question: these exceptions are raised on both client and server, as we can see from the traces above. It would be good if I could catch these exceptions explicitly in our code, to ensure we are recording them in production.
We are not currently seeing these traces in production.
Where could I catch these exceptions in our code?
As you can see we are using a class called RecorderLogger which is an ILog (configured by LogManager.LogFactory = new RecorderLogFactory() before AppHost.Init()) that redirects to our logging framework. I am guessing we are seeing traces from ILog?

I would need to capture and record these traces as exceptions to confirm this is not happening in production.

Ah, not to worry, I can simply record an exception in the ILog.Fatal() and ILog.Error() methods, as well as tracing it.

OK, I’ve also just added a new OnError callback in this commit which will let you handle Server SSE Exceptions, e.g.:

Plugins.Add(new ServerEventsFeature { 
    OnError = (sub,ex) => { ... }
});

Let me know if you want to make use of it now and I’ll publish it to MyGet.

Thanks, it might be useful, but right now I have changed our implementation of ILog to raise the exception (rather than just tracing it, which it was doing). I’ll go with that for now. No need for you to commit that change just for me.

OK, well, I tried all the solutions you recommended for increasing Request Timeouts in ASP.NET, and none of them made a difference.

We are still getting these exceptions on both ends.
I am now verifying whether we are seeing the same in production.

Don’t know, I’m only shooting in the dark. Are you running through public IPs or an Azure load balancer? There’s some additional configuration to change in the comment above. Otherwise, can you capture the HTTP traffic and see if anything weird is going on before the connection drops, e.g. whether the heartbeats are failing?
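
Alongside an HTTP capture, the gap between heartbeats can be logged on the client side to see whether they stop shortly before a drop. This is a rough diagnostic sketch, assuming ServerEventsClient exposes an OnHeartbeat callback in the version in use; HeartbeatMonitor and the log delegate are hypothetical names, not part of ServiceStack.

```csharp
using System;
using ServiceStack;

public class HeartbeatMonitor
{
    // Time of the previous heartbeat; initialised to construction time,
    // so the first logged gap is measured from when the monitor was created.
    private DateTime lastBeatUtc = DateTime.UtcNow;

    // Hypothetical helper: 'log' is any sink, e.g. Console.WriteLine.
    public void Attach(ServerEventsClient client, Action<string> log)
    {
        // Assumed callback; verify that it exists in your ServiceStack version.
        client.OnHeartbeat = () =>
        {
            var now = DateTime.UtcNow;
            var gapSeconds = (now - lastBeatUtc).TotalSeconds;
            log("[SSE-CLIENT] heartbeat, " + gapSeconds.ToString("F1") + "s since previous");
            lastBeatUtc = now;
        };
    }
}
```

A steadily growing gap, or heartbeats that stop just before each exception, would point at the heartbeat requests rather than the event stream itself.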

OK, we are not seeing this problem in Azure in production.
We were not using any special load balancer anyway, or any special networking IPs etc. Just a bog-ordinary Cloud Service.

So it seems to be only a local Azure Emulator problem when using IISExpress.
