Spring AMQP / AMQP-287

Cannot Connect (no server response) prevents container start

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Complete
    • Affects Version/s: 1.1.3
    • Fix Version/s: 1.2.0.M1
    • Component/s: RabbitMQ
    • Labels:
    • Environment: Linux

      Description

      The issue is caused by missing machines (no machine at the target IP, or a crashed server) or by network/firewall problems that break connectivity, specifically by dropping packets.

      Note that Windows and Linux behave differently in the missing-server case: Windows apparently gives some response that the container is happier with, and the container goes into its retry cycle. On Linux, a missing server or any other cause of dropped packets produces the error below. This can be simulated easily on a Linux machine with

      sudo iptables -A OUTPUT -p tcp --destination-port 5672 -j DROP

      (adjusting port as necessary)

      See the referenced forum thread for additional details.

        Activity

        J. Russell Smyth added a comment -

        Stack Trace:
        org.springframework.context.ApplicationContextException: Failed to start bean 'org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer#0'; nested exception is org.springframework.amqp.UncategorizedAmqpException: java.util.concurrent.TimeoutException: Timed out waiting for startup
        at org.springframework.context.support.DefaultLifecycleProcessor.doStart(DefaultLifecycleProcessor.java:170)
        at org.springframework.context.support.DefaultLifecycleProcessor.access$1(DefaultLifecycleProcessor.java:154)
        at org.springframework.context.support.DefaultLifecycleProcessor$LifecycleGroup.start(DefaultLifecycleProcessor.java:339)
        at org.springframework.context.support.DefaultLifecycleProcessor.startBeans(DefaultLifecycleProcessor.java:143)
        at org.springframework.context.support.DefaultLifecycleProcessor.onRefresh(DefaultLifecycleProcessor.java:108)
        at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:926)
        at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:467)
        at org.springframework.web.context.ContextLoader.configureAndRefreshWebApplicationContext(ContextLoader.java:384)
        at org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:283)
        at org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:111)
        at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4205)
        at org.apache.catalina.core.StandardContext.start(StandardContext.java:4704)
        at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
        at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
        at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
        at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
        at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
        at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
        at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1315)
        at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
        at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
        at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1061)
        at org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
        at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
        at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
        at org.apache.catalina.core.StandardService.start(StandardService.java:525)
        at org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
        at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
        at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
        Caused by: org.springframework.amqp.UncategorizedAmqpException: java.util.concurrent.TimeoutException: Timed out waiting for startup
        at org.springframework.amqp.rabbit.connection.RabbitUtils.convertRabbitAccessException(RabbitUtils.java:115)
        at org.springframework.amqp.rabbit.connection.RabbitAccessor.convertRabbitAccessException(RabbitAccessor.java:106)
        at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.start(AbstractMessageListenerContainer.java:362)
        at org.springframework.context.support.DefaultLifecycleProcessor.doStart(DefaultLifecycleProcessor.java:167)
        ... 33 more
        Caused by: java.util.concurrent.TimeoutException: Timed out waiting for startup
        at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.getStartupException(SimpleMessageListenerContainer.java:504)
        at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.doStart(SimpleMessageListenerContainer.java:331)
        at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.start(AbstractMessageListenerContainer.java:360)
        ... 34 more

        Gary Russell added a comment (edited) -

        Can you reduce your SYN retry count?

        I just ran a test (using iptables to drop packets to port 5672) and it works fine for me, although my network timeouts occur within the (currently hard-coded) 60-second limit...

        13:23:56.309 INFO  [main][org.springframework.integration.samples.amqp.Main] 
        =========================================================
                                                                 
                  Welcome to Spring Integration!                 
                                                                 
            For more information please visit:                   
            http://www.springsource.org/spring-integration       
                                                                 
        =========================================================
        13:24:18.306 INFO  [main][org.springframework.integration.samples.amqp.Main] 
        =========================================================
                                                                  
            This is the AMQP Sample -                             
                                                                  
            Please enter some text and press return. The entered  
            Message will be sent to the configured RabbitMQ Queue,
            then again immediately retrieved from the Message     
            Broker and ultimately printed to the command line.    
                                                                  
        =========================================================
        13:24:23.302 WARN  [SimpleAsyncTaskExecutor-1][org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer] Consumer raised exception, processing can restart if the connection factory supports it. Exception summary: org.springframework.amqp.AmqpConnectException: java.net.ConnectException: Connection timed out
        13:24:49.301 WARN  [SimpleAsyncTaskExecutor-2][org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer] Consumer raised exception, processing can restart if the connection factory supports it. Exception summary: org.springframework.amqp.AmqpConnectException: java.net.ConnectException: Connection timed out
        13:25:15.308 WARN  [SimpleAsyncTaskExecutor-3][org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer] Consumer raised exception, processing can restart if the connection factory supports it. Exception summary: org.springframework.amqp.AmqpConnectException: java.net.ConnectException: Connection timed out
        13:25:41.311 WARN  [SimpleAsyncTaskExecutor-4][org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer] Consumer raised exception, processing can restart if the connection factory supports it. Exception summary: org.springframework.amqp.AmqpConnectException: java.net.ConnectException: Connection timed out
        13:26:07.310 WARN  [SimpleAsyncTaskExecutor-5][org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer] Consumer raised exception, processing can restart if the connection factory supports it. Exception summary: org.springframework.amqp.AmqpConnectException: java.net.ConnectException: Connection timed out
        (deleted the rule and connection was opened)
        xxx
        xxx
        

        Looks like my timeouts were about 26 seconds (Ubuntu 10.4).

        The only way I could reproduce your problem was to increase the SYN retries to 20 (default was 5). This forced the timeout to be greater than the 60 seconds allowed.

        sudo sysctl net.ipv4.tcp_syn_retries=20

        I believe we should make that 60-second timeout configurable, but if you can force the connection timeout to occur within 60 seconds, all should be well.
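The race described above, between the container's start timeout and the time a connect attempt takes to give up, can be illustrated with a small stand-alone simulation. This is only a sketch under stated assumptions: it uses no Spring AMQP classes, the class `StartupRace` and method `reachesRetryCycle` are hypothetical names, and the connect attempt is replaced by a sleep. It is not the container's actual implementation.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class StartupRace {

    /**
     * Returns true if the simulated consumer reports its connect failure before
     * the start timeout expires (so startup can move on to retrying), and false
     * if the start timeout expires first (startup fails).
     */
    static boolean reachesRetryCycle(long connectTimeoutMs, long startTimeoutMs) {
        BlockingQueue<String> startupReport = new ArrayBlockingQueue<>(1);

        Thread consumer = new Thread(() -> {
            try {
                // Stands in for a connect attempt whose packets are silently dropped.
                Thread.sleep(connectTimeoutMs);
                startupReport.offer("Connection timed out");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.setDaemon(true);
        consumer.start();

        try {
            // The starting thread waits a bounded time for the consumer to report.
            String report = startupReport.poll(startTimeoutMs, TimeUnit.MILLISECONDS);
            return report != null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for startup", e);
        } finally {
            consumer.interrupt(); // stop the simulated connect attempt
        }
    }

    public static void main(String[] args) {
        // Durations are scaled down from seconds to milliseconds for the demo.
        System.out.println(reachesRetryCycle(100, 600));   // connect gives up first
        System.out.println(reachesRetryCycle(2000, 600));  // start timeout fires first
    }
}
```

The first call models the working case in the log above (the connect timeout is shorter than the start timeout); the second models the reported failure, where a long SYN retry sequence outlasts the start timeout.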

        Gary Russell added a comment -

        Actually, there's a simpler workaround: just ensure that the connection timeout on the underlying RabbitMQ ConnectionFactory is less than 60 seconds...

        
        	<rabbit:connection-factory id="connectionFactory" host="frodo" connection-factory="nativeCF"/>
        
        	<bean id="nativeCF" class="com.rabbitmq.client.ConnectionFactory">
        		<property name="connectionTimeout" value="10000"/>
        	</bean>
        

        That way, the connection will fail fast and go into retry mode after just 10 seconds (in this example).

        This is actually cleaner because it allows the context to initialize faster; extending the consumer start timeout leaves the context in an initializing state for all that time.
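To make the "fail fast" idea concrete, here is a minimal plain-JDK sketch of a TCP connect with an explicit timeout. It assumes that a client-level connection timeout ultimately bounds a socket connect in this way; the class `ConnectProbe` and its methods are hypothetical and are not part of Spring AMQP or the RabbitMQ client.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ConnectProbe {

    /** Tries to open a TCP connection, giving up after timeoutMs milliseconds. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            // A timeout of 0 would mean "wait indefinitely" for Socket.connect.
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers SocketTimeoutException (no reply in time) and
            // ConnectException (for example, connection refused).
            return false;
        }
    }

    /** Probes a loopback listener while it is open and again after it is closed. */
    static boolean[] probeBeforeAndAfterClose() {
        try {
            InetAddress loopback = InetAddress.getLoopbackAddress();
            ServerSocket server = new ServerSocket(0, 50, loopback);
            int port = server.getLocalPort();
            boolean whileListening = canConnect(loopback.getHostAddress(), port, 1000);
            server.close();
            boolean afterClose = canConnect(loopback.getHostAddress(), port, 1000);
            return new boolean[] { whileListening, afterClose };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] results = probeBeforeAndAfterClose();
        System.out.println("while listening: " + results[0]);
        System.out.println("after close:     " + results[1]);
    }
}
```

With dropped packets (as in the iptables simulation from the description), a bounded connect like this returns after at most `timeoutMs` instead of waiting for the operating system's SYN retries to run out.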

        Unless you object, I will close this with a documentation fix reflecting the above.

        Gary Russell added a comment -

        I decided to document this as well as allow the start timeout to be configured...

        https://github.com/SpringSource/spring-amqp/pull/78

        J. Russell Smyth added a comment -

        I verified that setting the connection timeout on the native connection factory to less than 60 seconds resolved our concerns. I would like to see the overall timeout made configurable and, more importantly, the defaults for the overall and native timeouts set so that the default behaviour is retry, not container failure (i.e. default native CF timeout < container timeout).

        Regardless, the documentation fix and the configurable timeout on the container satisfy the need.
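The relationship requested above (native connection timeout shorter than the container's start timeout) could be checked at configuration time with something like the following. This is a hypothetical helper, not an existing Spring AMQP API; the names `TimeoutGuard` and `failsFast` are invented for illustration, and treating a connect timeout of 0 as "no explicit limit" is an assumption.

```java
public class TimeoutGuard {

    /**
     * Returns true when a connect attempt with dropped packets would give up
     * before the container stops waiting for its consumers to start.
     * ASSUMPTION: a connect timeout of 0 means "no explicit limit", so the
     * operating system's own retry behaviour decides how long it takes.
     */
    static boolean failsFast(int connectTimeoutMs, long startTimeoutMs) {
        if (connectTimeoutMs < 0 || startTimeoutMs <= 0) {
            throw new IllegalArgumentException(
                    "connect timeout must be >= 0 and start timeout must be > 0");
        }
        return connectTimeoutMs > 0 && connectTimeoutMs < startTimeoutMs;
    }

    public static void main(String[] args) {
        System.out.println(failsFast(10_000, 60_000)); // 10 s connect vs 60 s start timeout
        System.out.println(failsFast(0, 60_000));      // no explicit connect timeout
    }
}
```

A check like this only compares configured values; it cannot tell how long the operating system will actually keep retrying when no explicit connect timeout is set.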


          People

          • Assignee:
            Gary Russell
            Reporter:
            J. Russell Smyth
          • Votes:
            0
            Watchers:
            3
