We are encountering problems with the implementation of RabbitMQ in ServiceStack. Looking at how it's implemented, we believe messages are being NACK'd and then shoveled into the dead-letter queue (DLQ). The problem is we can't figure out why.
The consistent markers are long-lived messages combined with a large backlog in the queue. Because of the implementation, these messages are republished to the DLQ rather than dead-lettered by the broker, so they carry no x-death header. When the issue happens, all messages are rapidly failed and moved to the DLQ.
Our consumers are hosted on IIS with idle timeouts disabled; we made that change to fix a race condition that we thought might be the cause. Should AppHost.OnAfterInit be the place to start the MQ server?
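For context, a minimal sketch of how we wire this up, assuming the standard ServiceStack.RabbitMq API (the DTO and host names are illustrative, not our actual code):

```csharp
public override void Configure(Container container)
{
    // "rabbitmq-host" and ProcessJob are placeholders for illustration
    var mqServer = new RabbitMqServer("rabbitmq-host");
    mqServer.RegisterHandler<ProcessJob>(ExecuteMessage);
    container.Register<IMessageService>(mqServer);

    // Start the MQ server only after the AppHost is fully initialized --
    // the pattern we adopted to avoid the race condition mentioned above:
    AfterInitCallbacks.Add(host => host.Resolve<IMessageService>().Start());
}
```

Is deferring Start() to an AfterInitCallbacks entry like this the recommended approach, or should it live somewhere else entirely?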
At this point we have no idea why this fails. We can run 100k long-lived messages and everything works fine, and then on the next run 70k of the messages will succeed and 30k will fail into the DLQ. It's as though the consumer pulls the message and then immediately NACKs it. I know that under the covers ServiceStack uses basic.get instead of basic.consume. We have another, non-ServiceStack implementation in a completely different application that never encounters this problem.
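To make the suspected failure path concrete, here is a simplified sketch of a basic.get polling loop using the raw RabbitMQ .NET client. This is not ServiceStack's actual internals; it just illustrates the behavior we believe we're seeing, where any handler failure causes the message to be explicitly republished to the DLQ (queue names follow ServiceStack's mq:Type.inq/.dlq convention; Handle is a hypothetical handler):

```csharp
using RabbitMQ.Client;

void PollQueue(IModel channel)
{
    while (true)
    {
        // basic.get pulls one message at a time instead of a push-based basic.consume
        BasicGetResult result = channel.BasicGet("mq:ProcessJob.inq", autoAck: false);
        if (result == null) break; // queue drained

        try
        {
            Handle(result.Body); // hypothetical message handler
            channel.BasicAck(result.DeliveryTag, multiple: false);
        }
        catch
        {
            // Failure path: republish to the DLQ (so no x-death header is added
            // by the broker) and ACK the original delivery.
            channel.BasicPublish("", "mq:ProcessJob.dlq", result.BasicProperties, result.Body);
            channel.BasicAck(result.DeliveryTag, multiple: false);
        }
    }
}
```

If something in that pull/ack cycle were throwing before the handler even ran, it would look exactly like what we observe: messages pulled and immediately failed into the DLQ in bulk.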
Do you have any suggestions or any thoughts on what is happening here? How should we proceed to fix the issue?
For reference, our RabbitMQ server runs on a recently updated RHEL cluster, with updated Erlang and RabbitMQ Server versions.