Callback missing randomly

Riccardo.Franchi · December 9, 2022, 3:21pm

Good morning,
We have found a big issue using your libraries.

We are using your libraries both by direct call and by callback. We use the callbacks via the NotifySubscription method. It works properly, except that sometimes we stop receiving them. To receive callbacks again we need to reconnect.
We tried sending a callback every ten seconds after a subscription; it runs for a while, then stops for no apparent reason. It also happens when client and server are running on the same machine, so we can be sure that it is not a network issue.
We don’t get any errors when it happens, even if we activate your debug log. The only indication we get is that the OnUnsubscribe event of the ServerEventsFeature plugin is triggered, although we never called it.

It is not a network problem.
We need to solve this problem asap because we are getting big issues with our customers.
Please give us support asap.

Thanks in advance,
Riccardo

mythz · December 9, 2022, 4:28pm

We would need a repro we can run locally to be able to investigate any integration issues like this. If you can put a standalone project on GitHub with the issue I can investigate.

Riccardo.Franchi · December 12, 2022, 5:05pm

It is very difficult for us to reproduce all the conditions that could influence the behavior. We prefer to keep the standalone solution as a last resort. Meanwhile, could you please explain why we see the following log events:

Normal behavior:

SERVER-SIDE: DEBUG: [SSE-SERVER] Sending cmd.CallbackMessage msg to hSO23MuMxOiA4z2gT6CX on ()*
CLIENT-SIDE: Received msg on channel '' on cmd.CallbackMessage {"Model":[…]

Wrong behavior:

SERVER-SIDE: DEBUG: [SSE-SERVER] Sending cmd.CallbackMessage msg to hSO23MuMxOiA4z2gT6CX on ()*
SERVER-SIDE: DEBUG: [SSE-SERVER] Expired cmd.CallbackMessage Sub hSO23MuMxOiA4z2gT6CX on ()*
SERVER-SIDE: ERROR: Subscription hSO23MuMxOiA4z2gT6CX does not exist

Message: Subscription hSO23MuMxOiA4z2gT6CX does not exist

Source: ServiceStack

Target site: Void MoveNext()

Stack trace: at ServiceStack.ServerEventsUnRegisterService.d__7.MoveNext() in /home/runner/work/ServiceStack/ServiceStack/ServiceStack/src/ServiceStack/ServerEventsFeature.cs:line 502
--- End of stack trace from previous location where exception was thrown ---

at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()*
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)*
at ServiceStack.Host.ServiceRunner`1.d__15.MoveNext() in /home/runner/work/ServiceStack/ServiceStack/ServiceStack/src/ServiceStack/Host/ServiceRunner.cs:line 150*

The wrong behavior happens rarely when the client is running on the same machine as the server (loopback), but it happens quite often when the client is running on another machine in the same local network as the server.

We can exclude a physical network issue because it also happens on the same machine.

What could be the cause of the problem? Is there any way to solve it?

Thank you very much.

mythz · December 12, 2022, 5:43pm

“Subscription does not exist” can happen after the server or client terminates the connection; the client should attempt to reconnect after that happens.

Line 502 indicates the connection was explicitly unregistered by the client, which it does after it closes/disposes the connection or the page unloads.

Long running SSE connections can fail which the server uses a heartbeat to detect. When the heartbeat times out, the server removes the subscription, and any attempt by the client to access it results in the “Subscription does not exist” error. You can adjust the HeartbeatInterval and IdleTimeout used to implement heartbeats when registering the ServerEventsFeature plugin.
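To make the mechanism described above concrete, here is a minimal Python sketch of heartbeat-based expiry. It is a toy model written for illustration only, not ServiceStack's implementation: the `SubscriptionRegistry` class, its method names, and the `sub-1` id are all hypothetical. It only shows why a subscription that misses its heartbeat window disappears and why a later access reports that it does not exist.

```python
from dataclasses import dataclass, field


@dataclass
class SubscriptionRegistry:
    """Toy model of heartbeat-based expiry; NOT ServiceStack's implementation."""
    idle_timeout: float  # seconds the server waits for a heartbeat
    last_seen: dict = field(default_factory=dict)  # subscription id -> last heartbeat time

    def subscribe(self, sub_id: str, now: float) -> None:
        self.last_seen[sub_id] = now

    def heartbeat(self, sub_id: str, now: float) -> None:
        # A heartbeat for an already-removed subscription is an error,
        # comparable to the "Subscription ... does not exist" message in the logs.
        if sub_id not in self.last_seen:
            raise KeyError(f"Subscription {sub_id} does not exist")
        self.last_seen[sub_id] = now

    def expire_idle(self, now: float) -> list:
        # Remove every subscription whose last heartbeat is older than idle_timeout.
        expired = [s for s, t in self.last_seen.items() if now - t > self.idle_timeout]
        for s in expired:
            del self.last_seen[s]
        return expired


registry = SubscriptionRegistry(idle_timeout=30.0)
registry.subscribe("sub-1", now=0.0)
registry.heartbeat("sub-1", now=10.0)      # arrives in time
expired = registry.expire_idle(now=45.0)   # 35s since the last heartbeat, so it is removed
```

In this model, a heartbeat that is delayed (or never sent) for longer than `idle_timeout` has the same effect as a dropped connection: the subscription is removed and the client has to reconnect.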

Riccardo.Franchi · December 13, 2022, 8:48am

“Long running SSE connections can fail which the server uses a heartbeat to detect”: in what cases can they fail?
I ask because that is exactly what happens to us, and we don’t expect it on a local network (or worse, on the same machine).

Thanks for your quick reply.

mythz · December 13, 2022, 9:01am

I’ve already covered most of this in my previous answer: when the client closes/disposes the connection, the web page unloads, the server heartbeat fails within the specified interval, the client or server terminates the TCP connection, client/server request timeouts are exceeded, etc. HTTP network connections aren’t persistent; they’ll get dropped eventually.

You’ve linked to a StackTrace showing the client explicitly closed the connection. I’d recommend focusing on creating a repro so you can investigate when and why it’s happening.

Riccardo.Franchi · December 29, 2022, 9:47am

Hi Mythz,
thanks for your suggestions.

As far as we understand, if the heartbeat is lost the client connection will be reset, and one of the reasons for losing the heartbeat could be incorrectly configured heartbeat and idle timeouts.
We are experiencing this kind of issue with a heartbeat timeout of 10 seconds and an idle timeout of 30 seconds. It works fine for a while, then it randomly drops.

Thinking about what you wrote last time, just as a test we set the heartbeat to 1 day and the idle timeout to 3 days, and we check whether we receive a callback within a given period of time. It works better, but it feels like a workaround to us, and we are looking for a correct implementation.

Our clients have the following features:

we always have only one callback active at a time
inside that callback we send several different kinds of serialized models
our packets range in size from a few KB to a few MB at most
we can have from one to a few dozen packets per second

Can you suggest timeout values (or give general advice) so that we can avoid relying on a workaround?

Thank you very much,
Riccardo

mythz · December 29, 2022, 10:34am

My recommendation is always going to be to create a repro so you can identify the problem and what’s dropping the connection, which is a prerequisite for finding an effective solution. Otherwise you’re going to have to resort to tweaking the timeout parameters to whatever suits your environment. It sounds like disabling heartbeats helps, so I would investigate whether all the heartbeats are being sent within the expected interval and whether they’re being received successfully.

I wouldn’t change the HeartbeatInterval, as the issue sounds like heartbeats aren’t being routinely received before IdleTimeout elapses. IdleTimeout is what controls how long the server should wait to receive a heartbeat before the connection is considered dropped. Essentially, I would double the timeout until it’s satisfactory, i.e. from 30s to 60s, 120s, 240s, etc.
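As a rough illustration of that advice, the Python sketch below measures the gaps between heartbeat arrival times and then doubles an idle timeout, starting at 30s, until it exceeds the largest observed gap. The function names, the one-hour `limit`, and the sample timestamps are hypothetical and made up for this example; real values would have to come from your own server logs.

```python
def heartbeat_gaps(timestamps):
    """Return the gaps (in seconds) between consecutive heartbeat arrival times."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def smallest_doubled_timeout(gaps, start=30.0, limit=3600.0):
    """Double the idle timeout from `start` until it exceeds the largest gap.

    Returns None if no value up to `limit` is large enough.
    """
    worst = max(gaps, default=0.0)
    timeout = start
    while timeout <= limit:
        if timeout > worst:
            return timeout
        timeout *= 2
    return None


# Hypothetical arrival times (seconds) logged on the server for one subscription.
arrivals = [0.0, 10.0, 20.0, 75.0, 85.0]
gaps = heartbeat_gaps(arrivals)
late = [g for g in gaps if g > 10.0]        # gaps longer than a 10s heartbeat interval
suggested = smallest_doubled_timeout(gaps)  # 30s is too short for a 55s gap; 60s covers it
```

A large entry in `late` is the interesting part: it says when a heartbeat was delayed, which is a better starting point for finding the cause than the timeout value alone.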

Also I would highly recommend only using the SSE connection to send “events” (i.e. small payloads), and use a separate HTTP Client to send larger payloads.
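One way to read this recommendation is sketched below in Python: the large payload is stored where a normal HTTP request can fetch it, and only a small event referencing it is pushed to the client. Everything here is an assumption made for illustration: `PayloadStore` is an in-memory stand-in for a real HTTP endpoint, and the event shape (`type`, `id`, `size`) is invented, not a ServiceStack format.

```python
import json
import uuid


class PayloadStore:
    """In-memory stand-in for an endpoint that serves large payloads over plain HTTP."""

    def __init__(self):
        self._items = {}

    def put(self, data: bytes) -> str:
        payload_id = uuid.uuid4().hex
        self._items[payload_id] = data
        return payload_id

    def get(self, payload_id: str) -> bytes:
        return self._items[payload_id]


def make_event(store: PayloadStore, data: bytes) -> str:
    """Store the payload and return a small JSON event that only references it."""
    payload_id = store.put(data)
    return json.dumps({"type": "payload-ready", "id": payload_id, "size": len(data)})


def handle_event(store: PayloadStore, event_json: str) -> bytes:
    """Client side: read the small event, then fetch the payload separately."""
    event = json.loads(event_json)
    return store.get(event["id"])


store = PayloadStore()
big = b"x" * (2 * 1024 * 1024)        # a 2 MB payload, at the upper end of the sizes mentioned
event = make_event(store, big)        # this small string is what would travel over SSE
fetched = handle_event(store, event)  # the client retrieves the large payload out of band
```

With this split, the event stream carries only short messages regardless of payload size, so a slow multi-megabyte transfer is less likely to sit in front of a heartbeat on the same connection.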

Riccardo.Franchi · December 29, 2022, 11:56am

Hi,
Thanks for your quick and useful reply.

We investigated with breakpoints and other tools, and we saw that sometimes the heartbeat packet is not lost but is never sent at all, because something is locked internally inside the ServiceStack library. We are still looking for a cause, but we cannot find anything in our code or in any external conditions that can explain that odd behaviour.
Do you have any idea?

Meanwhile, we noticed that sometimes the issue is caused by the combination of large size and high frequency of the packets sent via callback, so we will try to follow your recommendation to use SSE only for small payloads and a separate HTTP client for larger payloads.
On this topic, could you please tell me whether there is any documentation about the payload size limits for callbacks? Or, more generally, any guidelines we should follow to implement communication of large payloads?

Thanks again,
Riccardo

mythz · December 29, 2022, 12:35pm

My recommendation is always going to be to first try to create a repro you can run to reliably reproduce the issue, so you can better identify the problem and what’s dropping the connection, which is a prerequisite for finding an effective solution. Otherwise you’re going to have to resort to blindly tweaking timeout parameters to whatever suits your environment. It sounds like disabling heartbeat detection helps alleviate the dropped connections, so I would investigate whether all the heartbeats are being sent within the expected interval and whether they’re being received successfully.

I wouldn’t change the HeartbeatInterval, as the issue sounds like heartbeats aren’t being routinely received before IdleTimeout elapses. IdleTimeout is what controls how long the server should wait to receive a heartbeat before the connection is considered dropped. Essentially, I would double the timeout until it’s satisfactory, i.e. from 30s to 60s, 120s, 240s, etc.

Also I would recommend only using the SSE connection to send “events” (i.e. small payloads), and use a separate HTTP Client to send larger payloads.