Hi Demis
Sorry, I am still struggling with this. I understand how these things work on Linux. I just learned that the settings in the sentinel.conf
files seem to be critical: whether or not I receive the events in my service depends on these values.
What I do NOT understand is the following:
Please remember that I have dbinfra1 (172.16.63.51), dbinfra2 (172.16.63.52) and dbinfra3 (172.16.63.53), CentOS VMs which run the Redis ReplicaSet. dbinfra1 is the MASTER. I stopped the master and Redis decided to promote dbinfra3 as the new master. This is the starting point where all the mess begins!
So let's go through the details step by step:
I SSH to the dbinfra3 server, start redis-cli and issue the command info replication. It returns:
127.0.0.1:6387> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=172.16.63.52,port=6387,state=online,offset=721238,lag=0
master_replid:87d100c193086907184dc0d35740919e12144d81
master_replid2:2528c29c5a58530615842f706b21158efef67ebe
master_repl_offset:721238
second_repl_offset:81834
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:721238
127.0.0.1:6387>
This shows that its ROLE is MASTER and it has ONE SLAVE: 172.16.63.52 (dbinfra2)
Now in my C# code I construct the RedisSentinel as follows:
try
{
    var sentinel = new RedisSentinel(sentinelHosts, masterName: replSetName)
    {
        OnFailover = manager =>
        {
            Logger.Debug($"ONFAILOVER event received from {replSetName} ...");
        },
        OnWorkerError = ex =>
        {
            Logger.Debug($"ONWORKERERROR event received from {replSetName}. Error: {ex.GetAllExceptions()}");
        },
        OnSentinelMessageReceived = (channel, msg) =>
        {
            Logger.Debug($"ONSENTINELMESSAGERECEIVED event received from {replSetName}. Message: '{msg}' " +
                         $"Channel '{channel}'...");
        },
    };
    sentinel.HostFilter = host => $"{host}?Db={BbConsumerConstants.RedisProdDb}&RetryTimeout=2000" +
                                  $"&password={HttpUtility.UrlEncode(pwdRedisConfig)}";
    sentinel.RefreshSentinelHostsAfter = TimeSpan.FromSeconds(60);
    IRedisClientsManager redisManager = sentinel.Start();
    container.Register(c => new SysConfigRepository(redisManager));
}
catch (Exception e)
{
    Logger.Error($"Failed to create, start and register the Redis config database Sentinel server. Error: {e.GetAllExceptions()}");
    throw;
}
The value of sentinelHosts is "dbinfra3.tbhome.int:26387", "dbinfra2.tbhome.int:26387", "dbinfra1.tbhome.int:26387", in exactly this order. The value of replSetName is redis_6387_config.
The BAD THING starts with sentinel.Start():
It seems to hit an exception, but I cannot catch it! I only see the following in the log:
2019-06-13 10:38:54.257 +02:00 [ERR] [ServiceStack.Redis.RedisNativeClient] [ThreadId: 1] #3 Could not connect to redis Instance at dbinfra1.tbhome.int:26387
2019-06-13 10:38:56.607 +02:00 [ERR] [ServiceStack.Redis.RedisNativeClient] [ThreadId: 1] #3 SocketException in SendReceive, retrying...
System.Net.Internals.SocketExceptionFactory+ExtendedSocketException (113): No route to host 172.16.63.51:26387
at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
WHY DOES IT STILL TRY DBINFRA1 (172.16.63.51)??? That is the server I have STOPPED and, as you can see above from the Redis shell, the server DBINFRA3 with IP 172.16.63.53 is the new master.
So where does your library get the information to connect to 172.16.63.51???
I tried to track this down a bit in your source:
I found GetActiveSentinelHosts in the Start() method.
There you loop through the machine names I passed to the constructor. For each machine you create a SentinelWorker. What happens if you use the server which is down?
Then you call GetSentinelHosts. It would be interesting to see what it returns: only the two currently active hosts, or all members defined for the ReplicaSet?
Finally you seem to start this worker thread (GetValidSentinelWorker) which loops through the machines. I don't know if this is ever reached in my case…
I can only state (see my log output above) that it only tries the server which is stopped and it seems to NEVER try the other two machines.
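One way to see what the Sentinels themselves report, independent of the client library, is to query them directly with redis-cli. The following is only a minimal sketch: it assumes the two surviving Sentinels listen on port 26387 (as in sentinelHosts) and that the monitored master name is redis_6387_config (the value of replSetName). It uses the standard Sentinel commands get-master-addr-by-name and sentinels.

```shell
# Assumptions: Sentinel port 26387 and master name redis_6387_config,
# both taken from the values passed to RedisSentinel above.
MASTER_NAME="redis_6387_config"
SENTINEL_PORT=26387

# Ask each surviving Sentinel directly; dbinfra1 is skipped because it is stopped.
for host in dbinfra3.tbhome.int dbinfra2.tbhome.int; do
    echo "== $host =="
    # Address of the master this Sentinel currently advertises
    redis-cli -h "$host" -p "$SENTINEL_PORT" SENTINEL get-master-addr-by-name "$MASTER_NAME"
    # Other Sentinels known to this Sentinel, including their status flags
    redis-cli -h "$host" -p "$SENTINEL_PORT" SENTINEL sentinels "$MASTER_NAME"
done
```

If the second command still lists 172.16.63.51 (typically with an s_down flag), that could be one source of the stale address, because Sentinels keep remembering peers they have seen until they are reset. Whether GetSentinelHosts is built on this same list is an assumption here, not something the log output confirms.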
Any help would be greatly appreciated!