[SWS-947] race-condition breaks dynamic WSDL Created: 11/Feb/16  Updated: 23/Aug/16  Resolved: 23/Aug/16

Status: Resolved
Project: Spring Web Services
Component/s: Core
Affects Version/s: 2.2.0.RELEASE
Fix Version/s: 2.3.1

Type: Bug Priority: Major
Reporter: JohannesB Assignee: Greg Turnquist
Resolution: Complete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

We have developed a couple of Spring Webservices deployed on Tomcat in a (stateless) Docker container that from time to time somehow fail to start correctly and return a stacktrace instead of the WSDL. Restarting the container will almost always fix it at the first attempt.

The same issue was reported here by someone else: http://forum.spring.io/forum/spring-projects/web-services/122858-thread-safety-issue-with-requests-to-wsdl

It looks like it is similar to bugs like these:

Below the complete stacktrace:

08:14:35.378 [http-nio-8080-exec-1] DEBUG o.s.w.t.h.MessageDispatcherServlet - Could not complete request
javax.xml.transform.TransformerException: java.lang.NullPointerException
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:752) ~[na:1.8.0_40-internal]
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:357) ~[na:1.8.0_40-internal]
at org.springframework.ws.transport.http.WsdlDefinitionHandlerAdapter.handle(WsdlDefinitionHandlerAdapter.java:144) ~[spring-ws-core-2.2.0.RELEASE.jar:2.2.0.RELEASE]
at org.springframework.ws.transport.http.MessageDispatcherServlet.doService(MessageDispatcherServlet.java:285) ~[spring-ws-core-2.2.0.RELEASE.jar:2.2.0.RELEASE]
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:961) [spring-webmvc-4.0.5.RELEASE.jar:4.0.5.RELEASE]
at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:852) [spring-webmvc-4.0.5.RELEASE.jar:4.0.5.RELEASE]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:618) [servlet-api.jar:na]
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:837) [spring-webmvc-4.0.5.RELEASE.jar:4.0.5.RELEASE]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:725) [servlet-api.jar:na]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:291) [catalina.jar:8.0.20]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) [catalina.jar:8.0.20]
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52) [tomcat-websocket.jar:8.0.20]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239) [catalina.jar:8.0.20]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) [catalina.jar:8.0.20]
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219) [catalina.jar:8.0.20]
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106) [catalina.jar:8.0.20]
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:501) [catalina.jar:8.0.20]
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:142) [catalina.jar:8.0.20]
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79) [catalina.jar:8.0.20]
at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:610) [catalina.jar:8.0.20]
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88) [catalina.jar:8.0.20]
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:516) [catalina.jar:8.0.20]
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1086) [tomcat-coyote.jar:8.0.20]
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:659) [tomcat-coyote.jar:8.0.20]
at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:223) [tomcat-coyote.jar:8.0.20]
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1558) [tomcat-coyote.jar:8.0.20]
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1515) [tomcat-coyote.jar:8.0.20]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_40-internal]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40-internal]
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) [tomcat-util.jar:8.0.20]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40-internal]
Caused by: java.lang.NullPointerException: null
at com.sun.org.apache.xerces.internal.dom.DeferredElementNSImpl.synchronizeData(DeferredElementNSImpl.java:108) ~[na:1.8.0_40-internal]
at com.sun.org.apache.xerces.internal.dom.ElementImpl.getNodeName(ElementImpl.java:122) ~[na:1.8.0_40-internal]
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:152) ~[na:1.8.0_40-internal]
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:230) ~[na:1.8.0_40-internal]
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:230) ~[na:1.8.0_40-internal]
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:230) ~[na:1.8.0_40-internal]
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:230) ~[na:1.8.0_40-internal]
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:230) ~[na:1.8.0_40-internal]
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:136) ~[na:1.8.0_40-internal]
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:98) ~[na:1.8.0_40-internal]
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:699) ~[na:1.8.0_40-internal]
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:743) ~[na:1.8.0_40-internal]
... 30 common frames omitted



 Comments   
Comment by JohannesB [ 08/Aug/16 ]

CAUSE:
WsdlDefinitionHandlerAdapter uses an instance of the Source interface named definitionSource.
According to my debugger this is a javax.xml.transform.dom.DOMSource with its node attribute set to an instance of:
com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl (from Oracle JDK8).

This DeferredDocumentImpl is the root of a tree of DeferredNode's (DeferredElementNSImpl, DeferredTextImpl) which are not thread safe, not even for read access, as the whole idea is that these are lazyily initialised. (See for example the FAQ from Xerces: https://xerces.apache.org/xerces2-j/faq-dom.html)

This object is created by Wsdl4jDefinition.getSource() which is (partially) synchronized but caches the result of wsdlWriter.getDocument(definition) that causes the problem.

SOLUTION:
My (succesfully tested) proposal would be to remove the caching of the DOM version of the definition in Wsdl4jDefinition.java

ALTERNATIVES:

  • In theory there are other DOM implementations like the original Xerces that can be configured to disable this deferred feature. See: https://xerces.apache.org/xerces-j/features.html but we cannot be sure which dom implementation is used and we cannot configure DOM parser features through the JSR-110 WSDLFactory and WSDLWriter.
  • The original race condition only affects parallel initialization of deferred elements. It might be possible to somehow solve the original code by iterating once over all the elements in the DOM tree within the synchronized block in order to initialise all the deferred elements.

Reproduction and testing the solution was done with a JMeter loadtest with 8 threads that was started before starting the application.

Comment by JohannesB [ 09/Aug/16 ]

Today discovered that this bug not only can cause NullPointerException during WSDL retrieval but sometimes corrupts the WSDL:

We have the same identical Docker container on both servers but this returns different results on different servers:

curl http://localhost:xxx/app/v1/App.wsdl|egrep -o 'xs:complexType name="MediumIdCollection.

{50}

'

Server 1 returns:
xs:complexType name="MediumIdCollection"><xs:sequence><xs:element maxOccurs="" minOccurs=

Server 2 returns:
xs:complexType name="MediumIdCollection"><xs:sequence><xs:element maxOccurs="unbounded" m

In this case the race condition made the maxOccurs attribute value disappear, which makes this even more unpredictable.

Comment by Greg Turnquist [ 23/Aug/16 ]

The cached version of Document has been removed from Wsdl4jDefinition so it should return a clean copy. Code is available on 2.3.1.BUILD-SNAPSHOT if you want to try it out.

Generated at Wed Dec 13 03:31:51 UTC 2017 using JIRA 6.4.14#64029-sha1:ae256fe0fbb912241490ff1cecfb323ea0905ca5.