Uploaded image for project: 'Spring AMQP'
  1. Spring AMQP
  2. AMQP-762

Semaphore permit leak leads to “No available channels” exception

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Complete
    • Affects Version/s: 1.6.10, 2.0 M5, 1.7.3
    • Fix Version/s: 1.6.11, 1.7.4, 2.0 RC1
    • Component/s: None
    • Labels:

      Description

      Let's say we have a consumer using CachingConnectionFactory with channelCheckoutTimeout being set to a value greater than zero.
      When we experience connection drop, a semaphore permit is being obtained but is never released.
      This over time leads to exhaust (leak) of permits, and in the end to the AmqpTimeoutException("No available channels") being thrown because there are no more permits to acquire.

      More detailed explanation of the issue and the example on how to reproduce this behavior can be find in linked Stack Overflow question story.

      Code causing this behavior can be find here.

        Activity

        Hide
        grussell Gary Russell added a comment -

        Matej Risek I am having trouble reproducing this issue. When I kill the connection, I see:

        2017-08-21 15:27:51.471  INFO 6775 --- [cTaskExecutor-1] o.s.a.r.l.SimpleMessageListenerContainer : 
        Restarting [email protected]: tags=[{amq.ctag-g3uG6NXrgObajkw6ZtB7Pg=foo}], channel=Cached Rabbit Channel: AMQChannel(amqp://[email protected]:5672/,1), conn: [email protected] Shared Rabbit Connection: [email protected] [delegate=amqp://[email protected]:5672/, localPort= 62008], acknowledgeMode=AUTO local queue size=0
        2017-08-21 15:27:51.471  INFO 6775 --- [cTaskExecutor-1] o.s.a.r.l.SimpleMessageListenerContainer : 
        Restarting [email protected]: tags=[{amq.ctag-FpKbLkgPh7HvedcDUProuA=foo}], channel=Cached Rabbit Channel: AMQChannel(amqp://[email protected]:5672/,2), conn: [email protected] Shared Rabbit Connection: [email protected] [delegate=amqp://[email protected]:5672/, localPort= 62008], acknowledgeMode=AUTO local queue size=0
        

        This is the result of the listener getting a ShutdownSignalException and the restart process closes the old channel (releasing the permit).

        I did this several times (with a cache size of 2) and never experienced any channel starvation.

        What is the nature of this "proxy" you are talking about to fake the network glitch?

        If the glitch is such that the application doesn't get informed of the network failure, I would still expect the heartbeats to deal with that and, eventually, the client will get the ShutdownSignalException. Unless you've disabled heartbeats, that is.

        Show
        grussell Gary Russell added a comment - Matej Risek I am having trouble reproducing this issue. When I kill the connection, I see: 2017-08-21 15:27:51.471 INFO 6775 --- [cTaskExecutor-1] o.s.a.r.l.SimpleMessageListenerContainer : Restarting [email protected]: tags=[{amq.ctag-g3uG6NXrgObajkw6ZtB7Pg=foo}], channel=Cached Rabbit Channel: AMQChannel(amqp://[email protected]:5672/,1), conn: [email protected] Shared Rabbit Connection: [email protected] [delegate=amqp://[email protected]:5672/, localPort= 62008], acknowledgeMode=AUTO local queue size=0 2017-08-21 15:27:51.471 INFO 6775 --- [cTaskExecutor-1] o.s.a.r.l.SimpleMessageListenerContainer : Restarting [email protected]: tags=[{amq.ctag-FpKbLkgPh7HvedcDUProuA=foo}], channel=Cached Rabbit Channel: AMQChannel(amqp://[email protected]:5672/,2), conn: [email protected] Shared Rabbit Connection: [email protected] [delegate=amqp://[email protected]:5672/, localPort= 62008], acknowledgeMode=AUTO local queue size=0 This is the result of the listener getting a ShutdownSignalException and the restart process closes the old channel (releasing the permit). I did this several times (with a cache size of 2) and never experienced any channel starvation. What is the nature of this "proxy" you are talking about to fake the network glitch? If the glitch is such that the application doesn't get informed of the network failure, I would still expect the heartbeats to deal with that and, eventually, the client will get the ShutdownSignalException . Unless you've disabled heartbeats, that is.
        Hide
        grussell Gary Russell added a comment -

        I just tested with a simple proxy and suspended the proxy to stop traffic and got the following...

        2017-08-21 15:57:02.926  INFO 8603 --- [           main] com.example.Amqp762Application           : Started Amqp762Application in 1.349 seconds (JVM running for 1.851)
        2017-08-21 16:01:17.921 ERROR 8603 --- [127.0.0.1:62501] c.r.c.impl.ForgivingExceptionHandler     : An unexpected connection driver error occured
         
        com.rabbitmq.client.MissedHeartbeatException: Heartbeat missing with heartbeat = 60 seconds
        	at com.rabbitmq.client.impl.AMQConnection.handleSocketTimeout(AMQConnection.java:723) ~[amqp-client-4.0.2.jar:4.0.2]
        	at com.rabbitmq.client.impl.AMQConnection.readFrame(AMQConnection.java:642) ~[amqp-client-4.0.2.jar:4.0.2]
        	at com.rabbitmq.client.impl.AMQConnection.access$300(AMQConnection.java:47) ~[amqp-client-4.0.2.jar:4.0.2]
        	at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:572) ~[amqp-client-4.0.2.jar:4.0.2]
        	at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
         
        2017-08-21 16:01:17.925 ERROR 8603 --- [127.0.0.1:62501] o.s.a.r.c.CachingConnectionFactory       : Channel shutdown: connection error
        2017-08-21 16:01:17.925 ERROR 8603 --- [127.0.0.1:62501] o.s.a.r.c.CachingConnectionFactory       : Channel shutdown: connection error
        2017-08-21 16:01:18.779  INFO 8603 --- [cTaskExecutor-1] o.s.a.r.l.SimpleMessageListenerContainer : 
        Restarting [email protected]: tags=[{amq.ctag-nelcV3laikpL1jO2rs8tNQ=foo}], channel=Cached Rabbit Channel: AMQChannel(amqp://[email protected]:62501/,1), conn: [email protected] Shared Rabbit Connection: [email protected] [delegate=amqp://[email protected]:62501/, localPort= 62502], acknowledgeMode=AUTO local queue size=0
        2017-08-21 16:01:18.779  INFO 8603 --- [cTaskExecutor-1] o.s.a.r.l.SimpleMessageListenerContainer : 
        Restarting [email protected]: tags=[{amq.ctag-DA_07tgKCtFFhhxXv6FAcQ=foo}], channel=Cached Rabbit Channel: AMQChannel(amqp://[email protected]:62501/,2), conn: [email protected] Shared Rabbit Connection: [email protected] [delegate=amqp://[email protected]:62501/, localPort= 62502], acknowledgeMode=AUTO local queue size=0
        

        Show
        grussell Gary Russell added a comment - I just tested with a simple proxy and suspended the proxy to stop traffic and got the following... 2017-08-21 15:57:02.926 INFO 8603 --- [ main] com.example.Amqp762Application : Started Amqp762Application in 1.349 seconds (JVM running for 1.851) 2017-08-21 16:01:17.921 ERROR 8603 --- [127.0.0.1:62501] c.r.c.impl.ForgivingExceptionHandler : An unexpected connection driver error occured   com.rabbitmq.client.MissedHeartbeatException: Heartbeat missing with heartbeat = 60 seconds at com.rabbitmq.client.impl.AMQConnection.handleSocketTimeout(AMQConnection.java:723) ~[amqp-client-4.0.2.jar:4.0.2] at com.rabbitmq.client.impl.AMQConnection.readFrame(AMQConnection.java:642) ~[amqp-client-4.0.2.jar:4.0.2] at com.rabbitmq.client.impl.AMQConnection.access$300(AMQConnection.java:47) ~[amqp-client-4.0.2.jar:4.0.2] at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:572) ~[amqp-client-4.0.2.jar:4.0.2] at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]   2017-08-21 16:01:17.925 ERROR 8603 --- [127.0.0.1:62501] o.s.a.r.c.CachingConnectionFactory : Channel shutdown: connection error 2017-08-21 16:01:17.925 ERROR 8603 --- [127.0.0.1:62501] o.s.a.r.c.CachingConnectionFactory : Channel shutdown: connection error 2017-08-21 16:01:18.779 INFO 8603 --- [cTaskExecutor-1] o.s.a.r.l.SimpleMessageListenerContainer : Restarting [email protected]: tags=[{amq.ctag-nelcV3laikpL1jO2rs8tNQ=foo}], channel=Cached Rabbit Channel: AMQChannel(amqp://[email protected]:62501/,1), conn: [email protected] Shared Rabbit Connection: [email protected] [delegate=amqp://[email protected]:62501/, localPort= 62502], acknowledgeMode=AUTO local queue size=0 2017-08-21 16:01:18.779 INFO 8603 --- [cTaskExecutor-1] o.s.a.r.l.SimpleMessageListenerContainer : Restarting [email protected]: tags=[{amq.ctag-DA_07tgKCtFFhhxXv6FAcQ=foo}], channel=Cached Rabbit Channel: AMQChannel(amqp://[email protected]:62501/,2), conn: [email protected] Shared Rabbit Connection: [email protected] [delegate=amqp://[email protected]:62501/, localPort= 62502], acknowledgeMode=AUTO local queue size=0
        Hide
        matejrisek Matej Risek added a comment -

        Sample eclipse project

        Show
        matejrisek Matej Risek added a comment - Sample eclipse project
        Hide
        matejrisek Matej Risek added a comment - - edited

        I've attached the eclipse project I'm testing with. For the proxy I use toxiproxy.

        I've tried explicitly setting requestedHeartBeat to 60 and putting channelCacheSize to 2, as from your example, and I still got the same behavior.
        As for the heartbeat to actually kick-in, I haven't seen that happening. I don't know whether I have the proper logging level for that, will check that later when I have time.

        The steps I do to reproduce this error is:
        1. Enable proxy
        2. Run the app, and wait for init
        3. Disable proxy
        4. Wait for "Restarting Consumer" log and enable proxy
        5. Disable proxy
        6. Enable proxy - at this point all the permits should already be taken. (I had to verify that by putting breakpoint at getChannel(ChannelCachingConnectionProxy connection, boolean transactional) in CachingConnectionFactory as in my case log for "throw new AmqpTimeoutException("No available channels");" is not shown for some reason, will check on that later on)

        Show
        matejrisek Matej Risek added a comment - - edited I've attached the eclipse project I'm testing with. For the proxy I use toxiproxy . I've tried explicitly setting requestedHeartBeat to 60 and putting channelCacheSize to 2, as from your example, and I still got the same behavior. As for the heartbeat to actually kick-in, I haven't seen that happening. I don't know whether I have the proper logging level for that, will check that later when I have time. The steps I do to reproduce this error is: 1. Enable proxy 2. Run the app, and wait for init 3. Disable proxy 4. Wait for "Restarting Consumer" log and enable proxy 5. Disable proxy 6. Enable proxy - at this point all the permits should already be taken. (I had to verify that by putting breakpoint at getChannel(ChannelCachingConnectionProxy connection, boolean transactional) in CachingConnectionFactory as in my case log for " throw new AmqpTimeoutException("No available channels"); " is not shown for some reason, will check on that later on)
        Hide
        grussell Gary Russell added a comment -

        Got it; reproduced; thanks!

        Show
        grussell Gary Russell added a comment - Got it; reproduced; thanks!

          People

          • Assignee:
            grussell Gary Russell
            Reporter:
            matejrisek Matej Risek
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: