Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Complete
    • Affects Version/s: 2.2.0.RELEASE
    • Fix Version/s: 2.3.1
    • Component/s: Core
    • Labels:
      None

      Description

      We have developed a couple of Spring Webservices deployed on Tomcat in a (stateless) Docker container that from time to time somehow fail to start correctly and return a stacktrace instead of the WSDL. Restarting the container will almost always fix it at the first attempt.

      The same issue was reported here by someone else: http://forum.spring.io/forum/spring-projects/web-services/122858-thread-safety-issue-with-requests-to-wsdl

      It looks like it is similar to bugs like these:

      Below the complete stacktrace:

      08:14:35.378 [http-nio-8080-exec-1] DEBUG o.s.w.t.h.MessageDispatcherServlet - Could not complete request
      javax.xml.transform.TransformerException: java.lang.NullPointerException
      at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:752) ~[na:1.8.0_40-internal]
      at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:357) ~[na:1.8.0_40-internal]
      at org.springframework.ws.transport.http.WsdlDefinitionHandlerAdapter.handle(WsdlDefinitionHandlerAdapter.java:144) ~[spring-ws-core-2.2.0.RELEASE.jar:2.2.0.RELEASE]
      at org.springframework.ws.transport.http.MessageDispatcherServlet.doService(MessageDispatcherServlet.java:285) ~[spring-ws-core-2.2.0.RELEASE.jar:2.2.0.RELEASE]
      at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:961) [spring-webmvc-4.0.5.RELEASE.jar:4.0.5.RELEASE]
      at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:852) [spring-webmvc-4.0.5.RELEASE.jar:4.0.5.RELEASE]
      at javax.servlet.http.HttpServlet.service(HttpServlet.java:618) [servlet-api.jar:na]
      at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:837) [spring-webmvc-4.0.5.RELEASE.jar:4.0.5.RELEASE]
      at javax.servlet.http.HttpServlet.service(HttpServlet.java:725) [servlet-api.jar:na]
      at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:291) [catalina.jar:8.0.20]
      at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) [catalina.jar:8.0.20]
      at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52) [tomcat-websocket.jar:8.0.20]
      at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239) [catalina.jar:8.0.20]
      at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) [catalina.jar:8.0.20]
      at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219) [catalina.jar:8.0.20]
      at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106) [catalina.jar:8.0.20]
      at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:501) [catalina.jar:8.0.20]
      at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:142) [catalina.jar:8.0.20]
      at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79) [catalina.jar:8.0.20]
      at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:610) [catalina.jar:8.0.20]
      at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88) [catalina.jar:8.0.20]
      at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:516) [catalina.jar:8.0.20]
      at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1086) [tomcat-coyote.jar:8.0.20]
      at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:659) [tomcat-coyote.jar:8.0.20]
      at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:223) [tomcat-coyote.jar:8.0.20]
      at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1558) [tomcat-coyote.jar:8.0.20]
      at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1515) [tomcat-coyote.jar:8.0.20]
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_40-internal]
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40-internal]
      at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) [tomcat-util.jar:8.0.20]
      at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40-internal]
      Caused by: java.lang.NullPointerException: null
      at com.sun.org.apache.xerces.internal.dom.DeferredElementNSImpl.synchronizeData(DeferredElementNSImpl.java:108) ~[na:1.8.0_40-internal]
      at com.sun.org.apache.xerces.internal.dom.ElementImpl.getNodeName(ElementImpl.java:122) ~[na:1.8.0_40-internal]
      at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:152) ~[na:1.8.0_40-internal]
      at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:230) ~[na:1.8.0_40-internal]
      at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:230) ~[na:1.8.0_40-internal]
      at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:230) ~[na:1.8.0_40-internal]
      at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:230) ~[na:1.8.0_40-internal]
      at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:230) ~[na:1.8.0_40-internal]
      at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:136) ~[na:1.8.0_40-internal]
      at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:98) ~[na:1.8.0_40-internal]
      at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:699) ~[na:1.8.0_40-internal]
      at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:743) ~[na:1.8.0_40-internal]
      ... 30 common frames omitted

        Activity

        Hide
        johannesb JohannesB added a comment -

        CAUSE:
        WsdlDefinitionHandlerAdapter uses an instance of the Source interface named definitionSource.
        According to my debugger this is a javax.xml.transform.dom.DOMSource with its node attribute set to an instance of:
        com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl (from Oracle JDK8).

        This DeferredDocumentImpl is the root of a tree of DeferredNode's (DeferredElementNSImpl, DeferredTextImpl) which are not thread safe, not even for read access, as the whole idea is that these are lazyily initialised. (See for example the FAQ from Xerces: https://xerces.apache.org/xerces2-j/faq-dom.html)

        This object is created by Wsdl4jDefinition.getSource() which is (partially) synchronized but caches the result of wsdlWriter.getDocument(definition) that causes the problem.

        SOLUTION:
        My (succesfully tested) proposal would be to remove the caching of the DOM version of the definition in Wsdl4jDefinition.java

        ALTERNATIVES:

        • In theory there are other DOM implementations like the original Xerces that can be configured to disable this deferred feature. See: https://xerces.apache.org/xerces-j/features.html but we cannot be sure which dom implementation is used and we cannot configure DOM parser features through the JSR-110 WSDLFactory and WSDLWriter.
        • The original race condition only affects parallel initialization of deferred elements. It might be possible to somehow solve the original code by iterating once over all the elements in the DOM tree within the synchronized block in order to initialise all the deferred elements.

        Reproduction and testing the solution was done with a JMeter loadtest with 8 threads that was started before starting the application.

        Show
        johannesb JohannesB added a comment - CAUSE: WsdlDefinitionHandlerAdapter uses an instance of the Source interface named definitionSource. According to my debugger this is a javax.xml.transform.dom.DOMSource with its node attribute set to an instance of: com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl (from Oracle JDK8). This DeferredDocumentImpl is the root of a tree of DeferredNode's (DeferredElementNSImpl, DeferredTextImpl) which are not thread safe, not even for read access, as the whole idea is that these are lazyily initialised. (See for example the FAQ from Xerces: https://xerces.apache.org/xerces2-j/faq-dom.html ) This object is created by Wsdl4jDefinition.getSource() which is (partially) synchronized but caches the result of wsdlWriter.getDocument(definition) that causes the problem. SOLUTION: My (succesfully tested) proposal would be to remove the caching of the DOM version of the definition in Wsdl4jDefinition.java ALTERNATIVES: In theory there are other DOM implementations like the original Xerces that can be configured to disable this deferred feature. See: https://xerces.apache.org/xerces-j/features.html but we cannot be sure which dom implementation is used and we cannot configure DOM parser features through the JSR-110 WSDLFactory and WSDLWriter. The original race condition only affects parallel initialization of deferred elements. It might be possible to somehow solve the original code by iterating once over all the elements in the DOM tree within the synchronized block in order to initialise all the deferred elements. Reproduction and testing the solution was done with a JMeter loadtest with 8 threads that was started before starting the application.
        Hide
        johannesb JohannesB added a comment -

        Today discovered that this bug not only can cause NullPointerException during WSDL retrieval but sometimes corrupts the WSDL:

        We have the same identical Docker container on both servers but this returns different results on different servers:

        curl http://localhost:xxx/app/v1/App.wsdl|egrep -o 'xs:complexType name="MediumIdCollection.

        {50}

        '

        Server 1 returns:
        xs:complexType name="MediumIdCollection"><xs:sequence><xs:element maxOccurs="" minOccurs=

        Server 2 returns:
        xs:complexType name="MediumIdCollection"><xs:sequence><xs:element maxOccurs="unbounded" m

        In this case the race condition made the maxOccurs attribute value disappear, which makes this even more unpredictable.

        Show
        johannesb JohannesB added a comment - Today discovered that this bug not only can cause NullPointerException during WSDL retrieval but sometimes corrupts the WSDL: We have the same identical Docker container on both servers but this returns different results on different servers: curl http://localhost:xxx/app/v1/App.wsdl |egrep -o 'xs:complexType name="MediumIdCollection. {50} ' Server 1 returns: xs:complexType name="MediumIdCollection"><xs:sequence><xs:element maxOccurs="" minOccurs= Server 2 returns: xs:complexType name="MediumIdCollection"><xs:sequence><xs:element maxOccurs="unbounded" m In this case the race condition made the maxOccurs attribute value disappear, which makes this even more unpredictable.
        Hide
        gregturn Greg Turnquist added a comment -

        The cached version of Document has been removed from Wsdl4jDefinition so it should return a clean copy. Code is available on 2.3.1.BUILD-SNAPSHOT if you want to try it out.

        Show
        gregturn Greg Turnquist added a comment - The cached version of Document has been removed from Wsdl4jDefinition so it should return a clean copy. Code is available on 2.3.1.BUILD-SNAPSHOT if you want to try it out.

          People

          • Assignee:
            gregturn Greg Turnquist
            Reporter:
            johannesb JohannesB
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: