[SWS-302] Need to prevent parsing server-side responses and client-side requests into axiom trees if using axiom and payload caching is off Created: 04/Mar/08  Updated: 21/Jul/08  Resolved: 14/May/08

Status: Closed
Project: Spring Web Services
Component/s: Core
Affects Version/s: 1.5 RC1
Fix Version/s: 1.5.2

Type: Improvement Priority: Major
Reporter: Jim Cummings Assignee: Arjen Poutsma
Resolution: Fixed Votes: 3
Labels: None
Remaining Estimate: 5d
Time Spent: Not Specified
Original Estimate: 5d

Attachments: Text File patch.txt    
Issue Links:
Related
is related to SWS-352 Full streaming WebServiceMessage/Soap... Closed

 Description   

If using the axiom message factory with payload caching turned off, spring-ws currently supports streaming a request on the server-side and a response on the client-side without parsing the stream into an axiom tree. But responses on the server-side and requests on the client-side do get parsed into axiom trees. So turning off payload caching is non-symmetrical.

If this can be prevented, then spring-ws should be able to support large client requests and large server responses with much better performance. This also would even help with medium-sized messages in high volume scenarios.

The AxiomSoapBody's getPayloadResult() method currently creates a custom SAX Content Handler that effectively builds an axiom tree for the payload during transforms involving the payload result. The proposed enhancement is to basically have getPayloadResult() return a subclass of StreamResult feeding a ByteArrayOutputStream, and then in the AxiomSoapMessage's writeTo() method insert an Axiom OMSourcedEle as the payload, containing a custom OMDataSource built around the above ByteArrayOutputStream. This allows a 'placeholder' for the payload in the axiom tree containing the raw payload XML, and when writeTo() serializes the axiom message to the transport's output stream, the raw payload XML is written to the output stream without ever getting parsed. It is a little more involved than the above, but that describes the highlights.

If something like an interceptor, client-side callback, or logging code needs to access the contents of the payload, axiom will automatically parse the contents of the OMDataSource into a fully expanded Axiom tree. So this gives the flexibility of accessing the payload data if need be, but does not parse it into axiom nodes unless absolutely needed.

I am preparing a patch of contribution files that does the above. I mainly just need to wrap up the unit tests and do some performance testing, and then I will attach the patch.



 Comments   
Comment by Arjen Poutsma [ 05/Mar/08 ]

Very interesting.

So if I understand correctly, the response is still stored in memory, just not parsed?

Comment by Jim Cummings [ 06/Mar/08 ]

Arjen - Right, it would still store it in memory, unparsed as a byte array, but allows you to use the normal message factory, endpoints, and interceptors (as long as they don't need access to the payload contents - if they do then it would get parsed). It seems this would be appropriate behavior (ie non-parsed, in memory) for most client -side requests and server-side responses, similar to how client-side responses and server-side requests work with axiom for the general efficiency improvements.

But yeah, I think for really massive payloads (say hundreds of MBs or more), a pure streaming solution would be useful for those specialized scenarios. The above might be able to be extended at some point to also support pure streaming - it would probably involve having the getPayloadResult() return a custom StreamResult that allowed a 'streaming-focused' end point adapter or end point to put an InputStream (or something similar) into the result to allow the payload to actually be somewhere out-of-memory (say maybe on disk). And then when the AxiomSoapMessage's writeTo() method serializes the axiom message, the custom DataSource would just read from that streaming inputStream and write that to the transport output stream. Or alternatively, your thoughts of using a specialized message factory that precluded use of interceptors would probably be fine as well.

I figured the above proposal would be a first step that would improve perf for the general non-massive scenarios (which is what I ran into), and that a specialized solution for pure streaming could be introduced at some point as well.

Does that seem like it might be a reasonable approach?

Comment by Jim Cummings [ 06/Mar/08 ]

I should probably mention I wasn't proposing trying to address the pure streaming issues for really large payloads with this issue/patch. I was hoping to instead reduce the processing (and memory churn to some degree) generally required for medium and largish payloads by avoiding the full-blown axiom trees.

Comment by Sebastien Chatel [ 12/Mar/08 ]

Hello jim,
I've found this issue too.
http://forum.springframework.org/showthread.php?t=50967

IMO the custom OMDataSource could be provided by the MessageEndPoint itself. With a JiBX marshaller, a custom OMDataSource can be used to marshall Objects only when the writeTo() method is called. This trick can avoid an in memory ByteArrayOutputStream.

This process is used by Axis2 for Jibx. An OMElement containing a JibxDataSource is added to the OMContainer.

Could we have something like "OMDataSourceResult" containing a custom OMDataSource ? I'm not expert with Axiom and Spring-WS

Here is an example :

IMarshallable mrshable = (IMarshallable) responseObject;
OMDataSource dataSource = new JiBXDataSource(mrshable, bindingFactory);
int index = mrshable.JiBX_getIndex();
OMNamespace appns = oMFactory.createOMNamespace(bindingFactory.getElementNamespaces()[index], "");
OMElement el = oMFactory.createOMElement(dataSource, bindingFactory.getElementNames()[index], appns);

oMContainer.addChild(el);

Comment by Jim Cummings [ 12/Mar/08 ]

Hi Sebastien,

Delaying the marshalling until the message's writeTo() method (e.g. like in a manner you suggested) could be made to avoid an intermediate byte array stream. I think the trick would be how to get this to work for the other binding technologies' endpoints (assuming it was ok to have Axiom specific code in each), and finding something similar that would work on the client-side too when sending large requests (which also ends up using the body.getPayloadResult()).

Although I have not had the bandwidth to do performance/throughput tests on this yet, I'll go ahead and attach my patch to this Jira issue based on my earlier ByteArrayOutputStream-based suggestion with some modified and new unit tests. This appears to provide a baseline general enhancement that should work with all the existing endpoints and on the client-side as well, but as discussed does marshall the response temporarily to a ByteArrayOutputStream, but does avoid building a full Axiom tree.

It probably wouldn't be too hard to further extend the AxiomSoapBody at some point to allow alternate custom payload results to be returned that could hold other things besides a ByteArrayOutputStream, so that some (e.g. possibly JiBX and pure streaming) endpoints (not sure about the client-side though) could allow even more optimized writing of the payloads to the transport. I'm not clear off-hand what the least invasive way to do this would be though.

Jim.

Comment by Jim Cummings [ 12/Mar/08 ]

Arjen - Here is a patch that seems to prevent the payload from getting built into a full axiom tree before getting written to the transport output stream. It temporarily puts the payload into a ByteArrayOutputStream instead inside the OMDataSource as described. I've modified/added some unit tests, but have not had a chance to properly performance test this.

This should work for all endpoints' server-side responses and on the client-side for client-side requests as well.

The only side-effect I can think this might have is that the payload Axiom element is not put into the body element until the message's writeTo() is called. So if an interceptor needed to access the payload it wouldn't be there - but the javadocs already warn this won't work if payload caching is turned off anyways. If that was concerning, this could easily be keyed off a new flag in the axiom message factory besides payloadCaching.

Comment by Arjen Poutsma [ 03/May/08 ]

Great stuff, Jim. I will add this to 1.5.2. Thanks!

Note that I have created SWS-352 for the full-streaming approach. I'd be interested in your comments on that. Implementing SWS-352 is a lot more work than this issue, so I've set that for the next major version (currently 1.6).

Comment by Arjen Poutsma [ 14/May/08 ]

I've resolved this issue, with a few changes/improvements to the original patch:

  • I've replaced the commons io ByteArrayOutputStream with the standard java.io.ByteArrayOutputStream. I don't want to require the extra dependency to commons io. What we could do instead is detect it: if available, use commons io, but default to java io. This is not done yet, let me know if you want it.
  • The XML declarations are now filtered when written, not when read. This gets rid of the error-prone inputstream filtering in AxiomXMLDeclarationFilter, which only works with ASCII-based encodings (UTF-8), but not with UTF-16, for instance.
  • I use the Axiom ByteArrayDataSource, rather than the AxiomByteArrayDataSource.
  • The byte array element is now attached to the SOAP body when writing is completed. This gets rid of the addNonCachedPayloadElement() in AxiomSoapMessage.

I've also separated the non-caching/caching behavior in AxiomSoapBody into two separate classes. There is an abstract base class Payload in org.springframework.ws.soap.axiom, with two subclasses CachingPayload and NonCachingPayload. AxiomSoapBody creates an instance of these in the constructor, based on the payloadCaching boolean parameter. This gets rid of the if (payloadCaching) {} else {} logic scattered throughout the body.

Jim, could you update SVN or download a snapshot and tell me what you think? I've tried to stay close to the original idea of your patch, but I might have missed something. I'd like to make this part of the release this Friday, and I want to make sure I didn't screw up.

Thanks!

Comment by Jim Cummings [ 14/May/08 ]

Arjen - Your changes/improvements look nice and elegant. I see no problem with them at all.

I think for now using the ByteArrayOutputStream from the JDK is ok, and if anyone has perf issues, it could always be a future improvement to detect the commons.io version of ByteArrayOutputStream.

Two comments on the axiom version. I think there is a mistake in the maven repository for axis, as it shows entries for both axiom 1.2.6 and 1.2.7, and the spring-ws pom references 1.2.7. But I'm pretty sure axiom doesn't have a release 1.2.7 yet, because they just came out with 1.2.6.

Also, I wanted to make sure you knew that ByteArrayDataSource didn't come out until 1.2.6, so spring-ws will no longer work with axiom 1.2.5. I don't have any issues with that myself. I really wanted to use 1.2.6 as I had to do a few kind of ugly things due to some 1.2.5 limitations, so I'm happier seeing you use 1.2.6.

BTW - I've been meaning to comment on SWS-352 for the full streaming approach but haven't had the change to, but I'll try to soon.

Thanks for getting this fix in!!

Comment by Arjen Poutsma [ 14/May/08 ]

I downgraded to 1.2.6, because I couldn't find an announcement for 1.2.7 either.

I am not so worried about requiring 1.2.6, because it is only applicable if you disable payload caching. For the default case, you can still use 1.2.5.

Thanks!

Comment by Erik-Berndt Scheper [ 15/May/08 ]

Axiom 1.2.7 contains a fix for Axis2 1.4 and is included in the Axis2 1.4 distribution. See http://markmail.org/message/brgnp5tb4vmq6ysu?q=axiom+dev+mail for more details.

Therefore it should be fine to use Axiom 1.2.7. (though it's a bit sloppy not to officially announce it on the ws-commons site).

Comment by Erik-Berndt Scheper [ 15/May/08 ]

See also http://forum.springframework.org/showthread.php?t=54331

Comment by Arjen Poutsma [ 15/May/08 ]

OK, i'll upgrade again .

Comment by Arjen Poutsma [ 21/Jul/08 ]

Closing issues in 1.5.2

Generated at Mon Dec 18 20:32:59 UTC 2017 using JIRA 6.4.14#64029-sha1:ae256fe0fbb912241490ff1cecfb323ea0905ca5.