Master promotion not recognized by ServiceStack.Redis

Hi Demis, sorry for the long delay, I was busy releasing a new version of our software…

Does that mean you fully rely on PUB/SUB? If what Bill Anderson wrote in early 2016 is still correct (see the blog post I referenced earlier),

then Redis seems to lose the information about WHO has subscribed to a channel when the primary is shut down. I don’t know whether that is still true in 2019…

But the behaviour when the primary is stopped seems to confirm what Bill writes. I can reproduce the same behaviour every time:

  1. Nodes dbinfra1 to dbinfra5 are up and running.
  2. Node dbinfra1 is the current master.
  3. I shut down the Linux server dbinfra1.
  4. The sentinels immediately recognize this and elect a new master, e.g. dbinfra2.
  5. ServiceStack.RedisSentinel reports connection failures on dbinfra1.
  6. I stop my container (docker container stop <mycontainername>), so the process running the instances of ServiceStack.RedisSentinel is down/stopped/dead.
  7. I restart the container (docker container start <mycontainername>). ServiceStack.RedisSentinel again reports connection errors with dbinfra1!!
  • WHY IS THIS? IT HAS REBOOTED!
  • WHY DOES IT SELECT DBINFRA1 AGAIN EVEN THOUGH THIS INSTANCE IS STILL SHUT DOWN? WHERE DOES THE MACHINE NAME DBINFRA1 COME FROM? IS IT CACHED SOMEWHERE??
  • WHY DOES IT NOT TRY THE NEXT IP IN THE ARRAY OF SUBMITTED NODES??
  8. I start the Linux machine dbinfra1 again, and suddenly the connection errors disappear and ServiceStack.RedisSentinel seems to switch to dbinfra2 as the new MASTER!!

Other Questions:

  • When you write “misconfiguration”, do you mean the ServiceStack.RedisSentinel setup, or do you mean sentinel.conf on the Linux servers??
  • Is there a way to give ServiceStack.RedisSentinel a hint to switch servers? (e.g. in my case, to tell it to use 'dbinfra2' and NOT always 'dbinfra1')??
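For what it’s worth, the usual way to let the client track the current master is to pass it every Sentinel endpoint, not just one, so any live Sentinel can be asked who the current master of the group is. A hedged C# sketch — the host names, ports and the master group name below are assumptions standing in for my real parameters:

```csharp
using ServiceStack.Redis;

// Give the client every Sentinel endpoint, so any reachable Sentinel
// can answer "who is the current master of this group?"
var sentinelHosts = new[] { "dbinfra1", "dbinfra2", "dbinfra3", "dbinfra4", "dbinfra5" };

var sentinel = new RedisSentinel(sentinelHosts, masterName: "mymaster")
{
    // Optional diagnostics: log what the client actually receives
    OnFailover = manager => Console.WriteLine("Sentinel reported a failover"),
    OnSentinelMessageReceived = (channel, msg) => Console.WriteLine($"{channel}: {msg}"),
    OnWorkerError = ex => Console.WriteLine($"Sentinel worker error: {ex.Message}"),
};

IRedisClientsManager redisManager = sentinel.Start();
```

If only dbinfra1 were listed as a Sentinel host, a container restart while dbinfra1 is down would explain the behaviour above, since the client would have nobody left to ask for the new master.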

Sorry, but I don’t care what is typical or standard! Redis is a fast key/value store, but only if you configure and run it correctly on the server side. “Hello World” samples and configs are usually a bad choice for production deployments… According to many messages on the net, Redis is single threaded, so if you run multiple databases in one Redis instance on ONE IP port, your performance degrades massively, especially if all your databases are busy. Read for example this post and also this discussion, just as two samples… You can find many more…

Hmm, here I showed the process list of ONE of my Linux servers. It clearly shows 6 Redis processes and also 6 Sentinel processes, so each Redis database instance is monitored by one Sentinel instance, and every Redis database instance also has its own master group. I also posted the C# containing the App.Configure() implementation of one of my servers, which uses three of the six Redis databases. The name of every master group is taken from an input parameter and is passed to your RedisSentinel constructor for databases 1, 2 and 3.
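In other words, the layout described above amounts to one RedisSentinel per master group. A hedged C# sketch of that wiring — the group names, host names and the per-database Sentinel port layout are assumptions standing in for my input parameters:

```csharp
using ServiceStack.Redis;

// One RedisSentinel per master group / Redis database instance.
// Group names and port layout here are illustrative assumptions.
var groups = new[] { "group-db1", "group-db2", "group-db3" };
var managers = new IRedisClientsManager[groups.Length];

for (var i = 0; i < groups.Length; i++)
{
    var port = 26379 + i; // assumed: each database's Sentinels listen on their own port
    var sentinelHosts = new[] { $"dbinfra1:{port}", $"dbinfra2:{port}", $"dbinfra3:{port}" };
    managers[i] = new RedisSentinel(sentinelHosts, masterName: groups[i]).Start();
}
```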

That’s not an option, since our customers will NEVER accept a public cloud datastore, for privacy reasons!

Forget the geo-redundancy, it has nothing to do with this problem! In the remote backup location I use slave-priority = 0, which means that these slaves will never become primaries. Read this documentation for details if you are interested.
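For reference, that setting is a one-liner in each remote replica’s redis.conf (in Redis 5+ the directive was renamed to replica-priority):

```
# redis.conf on the remote backup replicas
slave-priority 0    # Sentinel will never promote this replica to master
```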

Your use of BOLD SCREAMING CAPS is not in the least bit helpful. If you want me to look into this any further, provide a repro environment, e.g. with Docker Compose, that can be spun up and tested against; I’m not going to make any further assumptions about your environment.

I’m referring to your custom Redis/Sentinel configuration. A client cannot perform a failover for an event it never received, which is what your TCP dump is reporting: the failover events are not propagated to subscribed clients. This is what you should’ve been looking into.

Given your limited experience with configuring Redis/Sentinel clusters, you really should be going with a well-known and tested configuration; otherwise you’re going to run into issues that others aren’t experiencing.

No, it’s just a fast in-memory data store by default; it’s fast because it operates in memory and has a very efficient implementation.

Is this blanket strawman somehow a justification for adopting your own bespoke configuration? Have you tested the actual performance of the standard Redis Sentinel configuration? Redis more than likely provides better performance than you need.

It uses a single thread to process commands, which is the property that ensures every Redis operation is atomic. It is also why Redis is so efficient: it doesn’t spend any time context switching between threads, and most of the time a database is not CPU bound. Although from v4, Redis has started to use multiple threads for different tasks.
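As an illustration of that atomicity, concurrent INCRs from different clients never lose updates, precisely because a single thread applies the commands one at a time. A hedged sketch using the ServiceStack.Redis client — the host and key names are assumptions:

```csharp
using ServiceStack.Redis;

// Two clients incrementing the same key: Redis executes commands
// one at a time on a single thread, so every INCR is applied and
// the final value is exact, with no client-side locking needed.
using var clientA = new RedisClient("dbinfra2", 6379);
using var clientB = new RedisClient("dbinfra2", 6379);

clientA.IncrementValue("hits"); // INCR hits
clientB.IncrementValue("hits"); // INCR hits
// "hits" is now exactly 2 higher than before, regardless of interleaving
```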

Citation required. No, this is not the logical conclusion of being single threaded, nor do multiple DBs magically massively degrade performance. Have you actually tested your hypothesis in any way and benchmarked the standard master/slave multiple-DB version in your environment? Running multiple DBs in a single instance just means it doesn’t scale out to multiple cores; it does not mean performance degrades. You should not take blanket statements about other disk-persisted DBs and apply them carte blanche to Redis. Redis runs in memory, so its operations are CPU bound and it does not need to wait for I/O in order to execute its commands, which it processes very efficiently. Scaling DBs to multiple cores is generally only going to help if your Redis instances have exhausted their CPU capacity.

StackOverflow is one of the top 50 websites in the world; it uses Redis for all its distributed caching with only the 2x standard master/slave configuration, processing 160 Billion operations a month whilst running on avg at <2% CPU, leaving it with a lot more spare capacity. Only if you’ve identified that the standard configuration has insufficient capacity for your workloads should you consider adopting your own non-standard bespoke configuration. The software engineering maxim applies: make it work, make it right, make it fast.
