I run a replica set with three nodes and a quorum of two. I am testing failovers and they do not work with ServiceStack.
The machines I run are:
- dbinfra1 (master) IP 172.16.63.51
- dbinfra2 (slave) IP 172.16.63.52
- dbinfra3 (slave) IP 172.16.63.53
Then I shut down server dbinfra1. The sentinel logs show that the master was switched correctly:
18940:X 12 Jun 2019 09:56:32.693 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
18940:X 12 Jun 2019 09:56:32.693 # Redis version=5.0.5, bits=64, commit=00000000, modified=0, pid=18940, just started
18940:X 12 Jun 2019 09:56:32.693 # Configuration loaded
18940:X 12 Jun 2019 09:56:32.693 * supervised by systemd, will signal readiness
18940:X 12 Jun 2019 09:56:32.699 * Running mode=sentinel, port=26387.
18940:X 12 Jun 2019 09:56:32.705 # Sentinel ID is bf4bde3c8a398a84e61b6a2414a98b46b9dd94b1
18940:X 12 Jun 2019 09:56:32.705 # +monitor master redis_6387_config 172.16.63.51 6387 quorum 2
18940:X 12 Jun 2019 09:56:32.707 * +slave slave 172.16.63.53:6387 172.16.63.53 6387 @ redis_6387_config 172.16.63.51 6387
18940:X 12 Jun 2019 09:56:32.715 * +slave slave 172.16.63.52:6387 172.16.63.52 6387 @ redis_6387_config 172.16.63.51 6387
18940:X 12 Jun 2019 09:56:34.571 * +sentinel sentinel d69802fb258357b3c56b29d0f79f35c6f78eb85c 172.16.63.51 26387 @ redis_6387_config 172.16.63.51 6387
18940:X 12 Jun 2019 09:56:34.790 * +sentinel sentinel 1aa056b49df89874c3b01319e48fa0c5726eee8d 172.16.63.53 26387 @ redis_6387_config 172.16.63.51 6387
18940:X 12 Jun 2019 10:14:30.468 # +sdown sentinel d69802fb258357b3c56b29d0f79f35c6f78eb85c 172.16.63.51 26387 @ redis_6387_config 172.16.63.51 6387
18940:X 12 Jun 2019 10:14:30.731 # +sdown master redis_6387_config 172.16.63.51 6387
18940:X 12 Jun 2019 10:14:30.815 # +odown master redis_6387_config 172.16.63.51 6387 #quorum 2/2
18940:X 12 Jun 2019 10:14:30.816 # +new-epoch 1
18940:X 12 Jun 2019 10:14:30.816 # +try-failover master redis_6387_config 172.16.63.51 6387
18940:X 12 Jun 2019 10:14:30.825 # +vote-for-leader bf4bde3c8a398a84e61b6a2414a98b46b9dd94b1 1
18940:X 12 Jun 2019 10:14:30.840 # 1aa056b49df89874c3b01319e48fa0c5726eee8d voted for bf4bde3c8a398a84e61b6a2414a98b46b9dd94b1 1
18940:X 12 Jun 2019 10:14:30.909 # +elected-leader master redis_6387_config 172.16.63.51 6387
18940:X 12 Jun 2019 10:14:30.909 # +failover-state-select-slave master redis_6387_config 172.16.63.51 6387
18940:X 12 Jun 2019 10:14:30.992 # +selected-slave slave 172.16.63.52:6387 172.16.63.52 6387 @ redis_6387_config 172.16.63.51 6387
18940:X 12 Jun 2019 10:14:30.992 * +failover-state-send-slaveof-noone slave 172.16.63.52:6387 172.16.63.52 6387 @ redis_6387_config 172.16.63.51 6387
18940:X 12 Jun 2019 10:14:31.064 * +failover-state-wait-promotion slave 172.16.63.52:6387 172.16.63.52 6387 @ redis_6387_config 172.16.63.51 6387
18940:X 12 Jun 2019 10:14:31.845 # +promoted-slave slave 172.16.63.52:6387 172.16.63.52 6387 @ redis_6387_config 172.16.63.51 6387
18940:X 12 Jun 2019 10:14:31.845 # +failover-state-reconf-slaves master redis_6387_config 172.16.63.51 6387
18940:X 12 Jun 2019 10:14:31.911 * +slave-reconf-sent slave 172.16.63.53:6387 172.16.63.53 6387 @ redis_6387_config 172.16.63.51 6387
18940:X 12 Jun 2019 10:14:32.944 * +slave-reconf-inprog slave 172.16.63.53:6387 172.16.63.53 6387 @ redis_6387_config 172.16.63.51 6387
18940:X 12 Jun 2019 10:14:32.944 * +slave-reconf-done slave 172.16.63.53:6387 172.16.63.53 6387 @ redis_6387_config 172.16.63.51 6387
18940:X 12 Jun 2019 10:14:33.034 # -odown master redis_6387_config 172.16.63.51 6387
18940:X 12 Jun 2019 10:14:33.034 # +failover-end master redis_6387_config 172.16.63.51 6387
18940:X 12 Jun 2019 10:14:33.034 # +switch-master redis_6387_config 172.16.63.51 6387 172.16.63.52 6387
As you can see, at 10:14:30.468 the sentinel notices that 172.16.63.51 is down and starts the election and promotion of a new master. The process ends about 2.5 seconds later, with 172.16.63.52 promoted to the new master.
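To double-check the timing, here is a small sketch that computes the interval between the first +sdown and the +switch-master line and splits the +switch-master payload into old and new master. The timestamps and the log line are copied from the sentinel log above; the class and helper names (FailoverTiming, ParseLogTime) are made up for illustration.

```csharp
using System;
using System.Globalization;

class FailoverTiming
{
    // Hypothetical helper: parses the date/time part of a sentinel log line.
    static DateTime ParseLogTime(string s) =>
        DateTime.ParseExact(s, "dd MMM yyyy HH:mm:ss.fff", CultureInfo.InvariantCulture);

    static void Main()
    {
        // Timestamps of the first +sdown and of +switch-master, taken from the log.
        var firstSdown = ParseLogTime("12 Jun 2019 10:14:30.468");
        var switchMaster = ParseLogTime("12 Jun 2019 10:14:33.034");
        var seconds = (switchMaster - firstSdown).TotalSeconds;
        Console.WriteLine(seconds.ToString("F3", CultureInfo.InvariantCulture)); // prints 2.566

        // +switch-master <master name> <old ip> <old port> <new ip> <new port>
        var line = "+switch-master redis_6387_config 172.16.63.51 6387 172.16.63.52 6387";
        var parts = line.Split(' ');
        Console.WriteLine($"{parts[1]}: {parts[2]}:{parts[3]} -> {parts[4]}:{parts[5]}");
        // prints redis_6387_config: 172.16.63.51:6387 -> 172.16.63.52:6387
    }
}
```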
Unfortunately, ServiceStack.Redis does not recognise this, for whatever reason. It keeps trying the old address 172.16.63.51 FOREVER and never tries another server from the array. Here are a few log extracts which illustrate the problem:
2019-06-12 10:14:30.606 +02:00 [ERR] [ServiceStack.Redis.RedisNativeClient] [ThreadId: 10] n/a n/a (n/a) #18 Could not connect to redis Instance at dbinfra1.tbhome.int:26385
2019-06-12 10:14:30.607 +02:00 [ERR] [ServiceStack.Redis.RedisNativeClient] [ThreadId: 6] n/a n/a (n/a) #4 Could not connect to redis Instance at dbinfra1.tbhome.int:26387
2019-06-12 10:14:30.611 +02:00 [ERR] [ServiceStack.Redis.RedisNativeClient] [ThreadId: 6] n/a n/a (n/a) #4 SocketException in SendReceive, retrying...
System.Net.Internals.SocketExceptionFactory+ExtendedSocketException (111): Connection refused 172.16.63.51:26387
at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
at System.Net.Sockets.Socket.Connect(EndPoint remoteEP)
at System.Net.Sockets.Socket.Connect(IPAddress address, Int32 port)
at ServiceStack.Redis.RedisNativeClient.Connect() in C:\BuildAgent\work\b2a0bfe2b1c9a118\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 100
at ServiceStack.Redis.RedisNativeClient.Reconnect() in C:\BuildAgent\work\b2a0bfe2b1c9a118\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 321
at ServiceStack.Redis.RedisNativeClient.TryConnectIfNeeded() in C:\BuildAgent\work\b2a0bfe2b1c9a118\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 302
at ServiceStack.Redis.RedisNativeClient.SendReceive[T](Byte[][] cmdWithBinaryArgs, Func`1 fn, Action`1 completePipelineFn, Boolean sendWithoutRead) in C:\BuildAgent\work\b2a0bfe2b1c9a118\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 574
2019-06-12 10:14:30.613 +02:00 [ERR] [ServiceStack.Redis.RedisNativeClient] [ThreadId: 10] n/a n/a (n/a) #18 SocketException in SendReceive, retrying...
System.Net.Internals.SocketExceptionFactory+ExtendedSocketException (111): Connection refused 172.16.63.51:26385
at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
at System.Net.Sockets.Socket.Connect(EndPoint remoteEP)
at System.Net.Sockets.Socket.Connect(IPAddress address, Int32 port)
at ServiceStack.Redis.RedisNativeClient.Connect() in C:\BuildAgent\work\b2a0bfe2b1c9a118\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 100
at ServiceStack.Redis.RedisNativeClient.Reconnect() in C:\BuildAgent\work\b2a0bfe2b1c9a118\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 321
at ServiceStack.Redis.RedisNativeClient.TryConnectIfNeeded() in C:\BuildAgent\work\b2a0bfe2b1c9a118\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 302
at ServiceStack.Redis.RedisNativeClient.SendReceive[T](Byte[][] cmdWithBinaryArgs, Func`1 fn, Action`1 completePipelineFn, Boolean sendWithoutRead) in C:\BuildAgent\work\b2a0bfe2b1c9a118\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 574
2019-06-12 10:14:30.613 +02:00 [ERR] [ServiceStack.Redis.RedisNativeClient] [ThreadId: 7] n/a n/a (n/a) #12 Could not connect to redis Instance at dbinfra1.tbhome.int:26386
2019-06-12 10:14:30.655 +02:00 [ERR] [ServiceStack.Redis.RedisNativeClient] [ThreadId: 7] n/a n/a (n/a) #12 SocketException in SendReceive, retrying...
System.Net.Internals.SocketExceptionFactory+ExtendedSocketException (111): Connection refused 172.16.63.51:26386
at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
at System.Net.Sockets.Socket.Connect(EndPoint remoteEP)
at System.Net.Sockets.Socket.Connect(IPAddress address, Int32 port)
at ServiceStack.Redis.RedisNativeClient.Connect() in C:\BuildAgent\work\b2a0bfe2b1c9a118\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 100
at ServiceStack.Redis.RedisNativeClient.Reconnect() in C:\BuildAgent\work\b2a0bfe2b1c9a118\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 321
at ServiceStack.Redis.RedisNativeClient.TryConnectIfNeeded() in C:\BuildAgent\work\b2a0bfe2b1c9a118\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 302
at ServiceStack.Redis.RedisNativeClient.SendReceive[T](Byte[][] cmdWithBinaryArgs, Func`1 fn, Action`1 completePipelineFn, Boolean sendWithoutRead) in C:\BuildAgent\work\b2a0bfe2b1c9a118\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 574
Do not be confused by the different ports: I run six Redis databases on the same machine, each on a different port and each with its own sentinel, which runs at the Redis port + 20000.
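To make the port scheme concrete, here is a minimal sketch of the mapping. It assumes every sentinel listens on its Redis port + 20000, as in my setup; ToSentinelHost is a hypothetical helper name, not part of ServiceStack.Redis.

```csharp
using System;

class SentinelPorts
{
    // Assumption: each sentinel listens on its Redis port + 20000.
    static string ToSentinelHost(string host, int redisPort) =>
        $"{host}:{redisPort + 20000}";

    static void Main()
    {
        // Three of the six Redis ports on dbinfra1; these are the ones in the error log.
        foreach (var port in new[] { 6385, 6386, 6387 })
        {
            Console.WriteLine(ToSentinelHost("dbinfra1.tbhome.int", port));
        }
        // prints:
        // dbinfra1.tbhome.int:26385
        // dbinfra1.tbhome.int:26386
        // dbinfra1.tbhome.int:26387
    }
}
```

These are exactly the host:port pairs that show up in the "Could not connect" errors above, so the failing connections are the ones to the sentinels on dbinfra1.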
My server (which is a Redis client) runs as a Docker container. Even if I restart that container, it does NOT switch; it seems to only ever try the first server in the array of replica set members. Why does it not try the other replica set members?
Here is the part of my C# code that configures the replica set:
// In my Dockerfile I pass some of the parameters for the server program. A sample for one Redis database looks like this:
//"--rediscacheservers", "dbinfra1.tbhome.int:dbinfra2.tbhome.int:dbinfra3.tbhome.int", \
//"--rediscachedbreplicasetname", "redis_6385_cache", \
//"--rediscacheport", "6385", \
// When bootstrapping my server I do
if (redisCfgServer.Contains(':')) // we have a replica set
{
    var replSetName = AppSettings.Get<string>("RedisCfgDbReplicaSetName");
    var redisConfigServerArray = redisCfgServer.Split(':', StringSplitOptions.RemoveEmptyEntries);
    for (var i = 0; i < redisConfigServerArray.Length; i++)
    {
        var server = redisConfigServerArray[i];
        server = $"{server}:2{redisCfgDbPort}"; // sentinel ports are by default Redis port + 20000
        redisConfigServerArray[i] = server;
    }
    var sentinelHosts = redisConfigServerArray;
    try
    {
        var sentinel = new RedisSentinel(sentinelHosts, masterName: replSetName);
        // Applied to every Redis host the sentinel returns: select the DB, set the retry timeout and the password
        sentinel.HostFilter = host => $"{host}?Db={BbConsumerConstants.RedisProdDb}&RetryTimeout=5000" +
                                      $"&password={HttpUtility.UrlEncode(pwdRedisConfig)}";
        IRedisClientsManager redisManager = sentinel.Start();
        container.Register(c => new SysConfigRepository(redisManager));
    }
    catch (Exception e)
    {
        // some logging
        throw;
    }
}
else // standalone setup
...
Any idea what is going wrong here?