[DATAREDIS-834] Redis connection using LettuceConnectionFactory locks for a long time when the redis server is down. Created: 15/May/18 Updated: 07/Jun/18 Resolved: 07/Jun/18
|Project:||Spring Data Redis|
|Affects Version/s:||1.8.13 (Ingalls SR13)|
|Reporter:||Ugur Alpay Cenar||Assignee:||Mark Paluch|
|Resolution:||Works as Designed||Votes:||0|
|Remaining Estimate:||Not Specified|
|Time Spent:||Not Specified|
|Original Estimate:||Not Specified|
I am trying to set up a Spring CacheManager using RedisTemplate and LettuceConnectionFactory connected to Redis Sentinel.
Below is the code I used (slightly reduced for simplicity):
A Redis server cannot be expected to be up 100% of the time in a production environment, so I tried to simulate an outage by restarting Redis Sentinel. While Sentinel was down, the application either stopped responding to requests or took 30-50 seconds to respond. It seems that LettuceConnectionFactory blocks while trying to reconnect and builds up a large queue of pending requests, which then takes a very long time to work through as each queued request is retried (I am not sure this is what actually happens, but it is what I understood from reading the documentation).
After a lot of debugging I was able to fix this problem by copying and modifying the afterPropertiesSet method in DefaultLettucePool:
I was therefore wondering if you could add an option to set the ClientOptions of RedisClient when configuring the DefaultLettucePool or find another solution to the locking problem by adding more configuration options. It is very important for the application to keep running as usual without cache when the Redis Server is down.
I tried setting the reconnectDelay and connectionTimeout, but nothing helped. I also tried a setup without the LettucePool. The modification above was the only solution that actually worked.
I also tried JedisConnectionFactory, but the problem there is that it won't reconnect to Redis Sentinel after a Redis server restart.
|Comment by Mark Paluch [ 15/May/18 ]|
This sounds like a Lettuce issue and not necessarily a Spring Data Redis issue. In general, if your server is not available, you can either fail fast with an exception or (what Lettuce is doing) wait a certain time until Redis comes back online. Pooling won't help here: if Redis is down, all pooled connections are disconnected.
What would you expect should happen if your server is not available?
|Comment by Ugur Alpay Cenar [ 15/May/18 ]|
I forgot to explain that I also overrode the CacheErrorHandler so that it just logs the error without doing anything else. The application then runs as usual without using the cache. I expect the application not to fail when Redis is unavailable and to reconnect once it is available again. After my modification to the LettucePool, the Redis CacheManager works as expected.
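For illustration, a log-only error handler of the kind described here could look roughly like the sketch below. It is not the reporter's actual code, which is not included in this ticket. The class name CacheConfig and the log messages are assumptions; CacheErrorHandler and CachingConfigurerSupport are the Spring Framework types for this purpose.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

import org.springframework.cache.Cache;
import org.springframework.cache.annotation.CachingConfigurerSupport;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.interceptor.CacheErrorHandler;
import org.springframework.context.annotation.Configuration;

// Hypothetical configuration class: registers an error handler that logs
// cache failures instead of propagating them to the caller.
@Configuration
@EnableCaching
public class CacheConfig extends CachingConfigurerSupport {

    private static final Logger LOG = Logger.getLogger(CacheConfig.class.getName());

    @Override
    public CacheErrorHandler errorHandler() {
        return new CacheErrorHandler() {
            @Override
            public void handleCacheGetError(RuntimeException e, Cache cache, Object key) {
                // Swallowing the error makes the lookup behave like a cache miss.
                LOG.log(Level.WARNING, "Cache get failed for key " + key, e);
            }

            @Override
            public void handleCachePutError(RuntimeException e, Cache cache, Object key, Object value) {
                LOG.log(Level.WARNING, "Cache put failed for key " + key, e);
            }

            @Override
            public void handleCacheEvictError(RuntimeException e, Cache cache, Object key) {
                LOG.log(Level.WARNING, "Cache evict failed for key " + key, e);
            }

            @Override
            public void handleCacheClearError(RuntimeException e, Cache cache) {
                LOG.log(Level.WARNING, "Cache clear failed", e);
            }
        };
    }
}
```

Note that a handler like this only suppresses the exception. It does not shorten the time a cache operation spends waiting on a disconnected client, which is the delay this ticket is about.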
|Comment by Mark Paluch [ 15/May/18 ]|
These are two separate aspects, and "not fail when Redis is not available" is quite a broad statement. RedisCache operations propagate exceptions if there's an issue with Redis I/O. Consuming those exceptions with your own CacheErrorHandler stops them from propagating any further, which brings you to the point where you can suppress exceptions.
You get the reconnect feature from Lettuce and by tweaking timeout options/disconnected behavior you can tailor the actual behavior in unavailability scenarios.
We revised LettuceConnectionFactory in version 2.0 with specific client configurations where you can set ClientOptions directly. We don't have plans to update the API for 1.8.x versions; please upgrade to a newer version.
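As an illustration of the 2.0-style configuration mentioned above, the following is a minimal sketch, assuming Spring Data Redis 2.0 or later with Lettuce 5 (package io.lettuce.core). The class name RedisConfig, the master name "mymaster", the Sentinel address, and the 2-second command timeout are placeholder assumptions, not values from this ticket.

```java
import java.time.Duration;

import io.lettuce.core.ClientOptions;
import io.lettuce.core.ClientOptions.DisconnectedBehavior;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisSentinelConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;

// Hypothetical configuration class for Spring Data Redis 2.0+.
@Configuration
public class RedisConfig {

    @Bean
    public LettuceConnectionFactory redisConnectionFactory() {
        // Placeholder Sentinel topology; replace with the real master name and nodes.
        RedisSentinelConfiguration sentinel = new RedisSentinelConfiguration()
                .master("mymaster")
                .sentinel("127.0.0.1", 26379);

        // Keep reconnecting in the background, but reject commands while
        // disconnected instead of queueing them until Redis returns.
        ClientOptions options = ClientOptions.builder()
                .autoReconnect(true)
                .disconnectedBehavior(DisconnectedBehavior.REJECT_COMMANDS)
                .build();

        LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
                .clientOptions(options)
                .commandTimeout(Duration.ofSeconds(2)) // assumed value, tune as needed
                .build();

        return new LettuceConnectionFactory(sentinel, clientConfig);
    }
}
```

With REJECT_COMMANDS, commands issued while the client is disconnected fail immediately with an exception rather than waiting, so this is the fail-fast side of the trade-off described earlier in the thread. Whether that is preferable to waiting for Redis to come back depends on the application.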
|Comment by Ugur Alpay Cenar [ 19/May/18 ]|
Sorry for the late answer, I was busy the last couple of days. I tried to tweak the timeout/reconnect options, but nothing helped. The only solution was to modify the RedisClient configuration, which is not possible through the current LettuceConnectionFactory on 1.8.13. I just found it strange that the application stopped responding when Redis was unavailable. I was expecting the ConnectionFactory to stop trying to reconnect after throwing an exception, which was not the case. I will try to upgrade to version 2.0 when I get the chance.
|Comment by Mark Paluch [ 07/Jun/18 ]|
Okay, I'm closing this ticket as we currently cannot do anything further here.