Can we run Memcached on dotnet core? Having Redis issues

I’m not seeing memcached client libs for netcore. Is that correct?

There’s no Memcached client support for .NET Core; use Redis if you need a distributed memory cache in .NET Core.

That’s what we are using, but it’s giving us one hell of a headache. As our user base increases we get more and more timeouts. So much so that it’s now completely disruptive to the users. I have to get an alternative in place ASAP.

I’m trying DynamoDB right now, but now there are issues with custom cache keys and something about needing a * prefix.

Redis is pretty efficient. Is there an option to increase the capacity of the servers hosting the Redis instances?

You can alleviate some load on the distributed cache by using LocalCache for caches you prefer to store in local memory.
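For example, inside a ServiceStack Service the LocalCache property resolves the in-memory MemoryCacheClient while Cache resolves the registered (Redis) ICacheClient. A sketch only — GetConfig, AppConfig and LoadConfigFromDb are hypothetical names:

```csharp
// Sketch: GetConfig / AppConfig / LoadConfigFromDb are hypothetical
public class ConfigServices : Service
{
    public object Any(GetConfig request)
    {
        // LocalCache is the in-memory MemoryCacheClient; base.Cache
        // would hit the registered distributed (Redis) cache instead
        return LocalCache.GetOrCreate("app:config",
            TimeSpan.FromMinutes(5),
            () => LoadConfigFromDb());
    }

    private AppConfig LoadConfigFromDb() => Db.Select<AppConfig>().First();
}
```

Infrequently-changing, non-user-specific data (app config, lookup tables) is a good candidate to move off the distributed cache this way.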

Some configuration options that might help with timeouts: specify a DefaultConnectTimeout so clients that hang when trying to establish a socket connection will throw, and set DeactivatedClientsExpiry to zero so clients that were connected to a failed Redis instance are disconnected immediately.

RedisConfig.DefaultConnectTimeout = 20 * 1000; //milliseconds
RedisConfig.DeactivatedClientsExpiry = TimeSpan.Zero;

OK, let’s talk Redis then… it would make things easier if I can just get it working.

Most errors are “The timeout period elapsed prior to obtaining a connection from the pool.”
We have been using PooledRedisClientManager but re-reading the docs it appears that RedisManagerPool might be the better option.

We do not manually make any connections, we only use ServiceStack’s Cache methods. Many of them run within parallel loops.

I will make these changes along with the suggested global config options and see how that pans out. I can’t see things getting any worse at this point.

Yes, use RedisManagerPool instead. If the pool size is exceeded it creates a new connection outside of the pool instead of blocking until one is freed, which is likely the cause of your timeouts — unless the Redis server instance itself is unresponsive from too much load.
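A sketch of what the swap looks like in AppHost.Configure(), assuming the placeholder connection string below matches your setup:

```csharp
public override void Configure(Container container)
{
    // "localhost:6379" is a placeholder for your actual Redis host
    container.Register<IRedisClientsManager>(c =>
        new RedisManagerPool("localhost:6379"));

    // Make base.Cache resolve a Redis-backed ICacheClient from the pool
    container.Register(c =>
        c.Resolve<IRedisClientsManager>().GetCacheClient());
}
```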

thanks, have my PR up and hope to get testing in prod asap.

So far, not so good. It worked initially, but then it took down the entire API service. No logs, no errors — it just caused timeouts hitting every possible endpoint and eventually the server died. Not much to go on yet. Reverting it and all was back to normal.

I only added the 2 redis global settings and changed the pool manager class. I will keep posted here any additional information I find.

I would be very surprised if your issue is Redis itself.

Redis can definitely become unresponsive if it has too much load, but it takes a lot of load. I would look into your server health to check its resource usage.

I don’t disagree that it might be (probably is) something we are not doing, or doing wrong… I just don’t know what yet. We have Redis hosted with AWS on an r3.large server, and it’s hardly feeling a thing.

I’m still working on trying to figure it out, but I’m now suspicious of many of our key sizes. Not sure why that would cause max connection pool issues, or why the change above was so catastrophic. But I’m tired now and didn’t get to test as much with it today as I wanted. Will continue tomorrow.

So to continue this saga, the seemingly simple change from PooledRedisClientManager to RedisManagerPool is disastrous. With that being the only change, things initially work fine, but within hours it literally brings the entire API to its knees without warning and without any error logging. The only log message I was able to find was:

ServiceStack.Redis.RedisNativeClient No more data, sPort: 46060, LastCommand:

The symptom was that we were getting constant 504 errors when trying to hit the API. Some endpoints would eventually work, but be very, very slow. Anything that accessed the cache via Redis returned 504.

Rebooting Redis and changing back the pool class, and everything is working… except that we eventually get the no-more-pool-connections/timeout error again. Any thoughts? Looking at the Redis status, it has no resource issues itself.

The error message means the client expected more data from the server, which typically happens if the same Redis client instance was used simultaneously in multiple threads. You should ensure that all access to the Redis client happens within the using (){} scope where it was retrieved from the Redis manager, and that you don’t use any typed clients or server collections outside of that using scope, e.g:

using (var client = redisManager.GetClient())
{
    // only use within the original using scope
    var redisList = client.Lists["mylist"];
    var redisPocos = client.As<Poco>();
}

and that you don’t maintain any static or singleton instances of redis clients.

You can use the RedisConfig.AssertAccessOnlyOnSameThread = true; debug config option to ensure that the Redis client is only called from the same thread that retrieved it from the pool, i.e. within using (var client = redisManager.GetClient()){}. If it detects the client being accessed from a different thread it will throw an InvalidAccessException whose message contains the different Thread Ids and the original StackTrace where the client was resolved from the pool. You can compare this with the StackTrace of the Exception to hopefully identify where the client is being improperly used.
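Since the assertion adds per-call overhead, one way to scope it is to a debug build:

```csharp
#if DEBUG
// Debug-only guard: throws InvalidAccessException (with both Thread Ids
// and the original resolution StackTrace) if a pooled client is used
// from a different thread than the one that resolved it from the pool
RedisConfig.AssertAccessOnlyOnSameThread = true;
#endif
```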

Interesting. Although I think we did this correctly, there’s always a chance we missed something, and knowing there is this way to test that is great. Thanks. I will work on that tomorrow.

So what about direct access to Service.Cache from threads? How do we correctly handle that?

Each thread needs to use its own Redis client instance resolved from the redisManager, i.e. don’t share the same Redis client instance across multiple threads; have each thread fetch its own instance from the pool.
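A sketch of the safe pattern, assuming a redisManager field and hypothetical keys/Poco types:

```csharp
// Unsafe: one client shared by every loop iteration/thread,
// which can corrupt the TCP stream:
// using (var shared = redisManager.GetClient())
//     Parallel.ForEach(keys, key => shared.Get<Poco>(key));

// Safe: each thread resolves and disposes its own client from the pool
Parallel.ForEach(keys, key =>
{
    using (var redis = redisManager.GetClient())
    {
        var poco = redis.Get<Poco>(key);
        // ... use poco within this scope only
    }
});
```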

So I wanted to follow up here as we are continuing to have serious issues. So much so that we are losing our users.

Here’s what we’ve concluded and found so far. Using the new pooling class with the modified global redis settings instantly takes our API servers down without any errors logged.

After setting timeout=120 in the Redis config within AWS there was an instant difference in the active connection count. It was very obvious in the CloudWatch graphs: instead of flatline-then-step, flatline-then-step in connection counts up to the max, we could see connection counts going up and down and staying relatively low (less than 50). There was definitely an issue with connections not being correctly released despite the settings in code; it took setting them in Redis.
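For reference, this maps to the standard Redis timeout directive (on ElastiCache it is set via a parameter group rather than redis.conf):

```
# Close a client connection after it has been idle for 120 seconds
# (the default of 0 means idle connections are never closed)
timeout 120
```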

After some local load testing, and basically by accident, the API was restarted while the load tests were running. We instantly got the same Redis connection pool timeouts and a spike in connections up to the max.

Upon further investigation of logs, we could see that minutes before the redis timeouts, we were seeing API restart messages.

Now, the problem/question is why are we seeing these service restarts? Memory usage remains constant and at about 50%. We see CPU spikes to 100% at the same time, but it’s difficult to say if it’s due to the restart or if it’s a symptom of the problem.

We see upwards of 10-20 API restarts a day followed by the connection pool errors.

Additionally, just before these connection timeouts, we see *Zero length response* errors from Redis when trying to fetch the user session information.

Really struggling to figure out the next steps or what could be going on here.

I couldn’t tell you why from here, but I’d definitely investigate what’s causing the 100% CPU spikes and subsequent restarts; the CPU spikes are likely the cause of the restarts.

Did you try setting RedisConfig.AssertAccessOnlyOnSameThread = true as I mentioned earlier, to find out if you’re using the Redis client on different threads? That can cause TCP corruption issues like the Zero length response errors.

What are the RedisStats reporting?


If they’re showing a high count of TotalFailovers, TotalDeactivatedClients, TotalForcedMasterFailovers or TotalInvalidMasters, you can try upgrading to the latest v5.4.1 on MyGet, which has some improvements for recovering from failed masters.
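A sketch of dumping the client-side counters, assuming a `log` instance is available:

```csharp
// Snapshot of ServiceStack.Redis client-side stats; watch for high
// TotalFailovers, TotalDeactivatedClients, TotalForcedMasterFailovers
// and TotalInvalidMasters counts
foreach (var entry in RedisStats.ToDictionary())
    log.Info($"{entry.Key}: {entry.Value}");
```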

Yes we’ve tried the assert and have cleaned up the single instance where there was an issue.
I was unaware of the redis stats reporting and will investigate what comes from that.

We are already on 5.4.1 from MyGet.

Make sure you’re using the latest pre-release v5.4.1 version by clearing your local NuGet cache:

$ nuget locals all -clear