Spring Integration
  INT-1141

Avoid Handler List cloning when delivering messages.

    Details

      Description

      While delivering messages, the AbstractDispatcher generates two copies of the registered handlers collection in order to iterate them. We found a 30% throughput increase after replacing the structure used for storing handlers. The new structure allows iteration and concurrent modification (if any) of the collection, and avoids the need to copy it every time a message needs to be dispatched by a MessageChannel.

      I am attaching the OrderedAwareCopyOnWriteArrayList and the AbstractDispatcher which uses it.

      This implementation improves the most common usage of the registered handlers collection (iteration); as a drawback, handler registration takes O(n). We tried to keep compatibility with the previous implementation, so we add a handler to the list only if it is not already in it. If this is not a requirement and was actually a side effect of using a set as the base structure, we can just add the new handler and reduce the cost from O(n) to ~O(1).
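The add-if-absent semantics described above map directly onto `java.util.concurrent.CopyOnWriteArrayList.addIfAbsent`, which likewise costs O(n) per insertion while keeping iteration copy-free; a minimal sketch (the handler strings are illustrative, not from the attached patch):

```java
import java.util.concurrent.CopyOnWriteArrayList;

public class AddIfAbsentDemo {
    public static void main(String[] args) {
        CopyOnWriteArrayList<String> handlers = new CopyOnWriteArrayList<>();
        // addIfAbsent copies the backing array, so each call is O(n),
        // but readers iterate the current array with no defensive copy.
        boolean first = handlers.addIfAbsent("handlerA"); // true: added
        boolean dup = handlers.addIfAbsent("handlerA");   // false: already present
        System.out.println(first + " " + dup + " size=" + handlers.size());
        // → true false size=1
    }
}
```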

        Issue Links

          Activity

          Diego Belfer added a comment -

          Hi Oleg,

          I have looked at your test case and there are some issues to consider. I will enumerate them in order of importance:

          • The test actually has a bug. While testing the Set implementation for iteration, it actually iterates the "list" instead of the set, so the results are for the same implementation.
          • The major issue with the current Set implementation is that it has to be transformed to an array, then converted to a List, and finally wrapped in an unmodifiableList in order to iterate it, and that is completely missing in the test. The new implementation provides a fast way to iterate the list in a concurrent environment, and this is by far the most common use case for this list, since it is iterated every time a channel dispatches a message.
            Therefore it is important to include in the test the code used to create an iterable list from the current Set.
          AbstractDispatcher.java
          protected List<MessageHandler> getHandlers() {
              return Collections.<MessageHandler>unmodifiableList(Arrays.<MessageHandler>asList(
                      this.handlers.toArray(new MessageHandler[this.handlers.size()])));
          }
          
          • Finally, there is no warm-up to trigger JIT compilation; the test as it is may be optimized due to the particular way in which the classes are used, their scope, etc. I have to say that I don't know much about how it works, but escape analysis and other features may produce undesired effects in this test.
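The warm-up concern above can be addressed with a simple untimed phase before measurement starts; a minimal sketch (the `iterateOnce` workload and iteration counts are illustrative assumptions, and a harness like JMH would be the more rigorous choice):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class WarmupSketch {
    // Stand-in workload: iterate the handler list once.
    static int iterateOnce(List<String> handlers) {
        int n = 0;
        for (String h : handlers) n += h.length();
        return n;
    }

    public static void main(String[] args) {
        List<String> handlers = new CopyOnWriteArrayList<>(List.of("a", "bb", "ccc"));
        int sink = 0;
        // Warm-up: run the workload enough times for the JIT to compile the
        // hot path before any timing starts (thresholds are JVM-dependent).
        for (int i = 0; i < 20_000; i++) sink += iterateOnce(handlers);
        // Measured phase.
        long start = System.nanoTime();
        for (int i = 0; i < 1_000_000; i++) sink += iterateOnce(handlers);
        long elapsed = System.nanoTime() - start;
        // The sink value is printed so the JIT cannot eliminate the loop as dead code.
        System.out.println("elapsed ns: " + elapsed + " (sink=" + sink + ")");
    }
}
```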

          In summary, there is a performance gain while iterating the list; on the other side, adding an element has a big penalty. You probably know more about the scenarios where spring-integration is used; but in our scenario, we have hundreds of channels with fewer than 5 message handlers each, we send millions of messages per second, and we configure the handlers during context creation, so it is actually a big improvement for us.

          I am attaching a test case with concurrent usage where you can see the improvement I mention; check, for instance, the difference in the number of GC passes. These are the results I get on my box:

          Starting test iteration count for: OrderedAwareCopyOnWriteArrayList
          (iterator) With OrderedAwareCopyOnWriteArrayList: 2822
          [GC 115011K->10002K(212928K), 0.0005700 secs]
          [Full GC 10002K->9893K(212928K), 0.0406240 secs]

          Starting test iteration count for: OrderedAwareLinkedHashSet
          [GC 134565K->10021K(212928K), 0.0009850 secs]
          [GC 134693K->9989K(342400K), 0.0014990 secs]
          [GC 259333K->10053K(342464K), 0.0009480 secs]
          [GC 259397K->10021K(591232K), 0.0007960 secs]
          [GC 508709K->10045K(591424K), 0.0006830 secs]
          [GC 508733K->10109K(737728K), 0.0005370 secs]
          [GC 655933K->10013K(737984K), 0.0006980 secs]
          [GC 655837K->10045K(738176K), 0.0006380 secs]
          [GC 657085K->10077K(699456K), 0.0005890 secs]
          [GC 626269K->10045K(670080K), 0.0007140 secs]
          [GC 596925K->10045K(649728K), 0.0007980 secs]
          [GC 569085K->10109K(615872K), 0.0009860 secs]
          [GC 542717K->10045K(597376K), 0.0006340 secs]
          [GC 517501K->10013K(566784K), 0.0009250 secs]
          [GC 493597K->10045K(550144K), 0.0006160 secs]
          [GC 470973K->10013K(522560K), 0.0010030 secs]
          [GC 449373K->10045K(507392K), 0.0005570 secs]
          [GC 428925K->9981K(482560K), 0.0040090 secs]
          (iterator) With OrderedAwareLinkedHashSet: 20163

          Diego Belfer added a comment -

          The test case I mentioned.

          Diego Belfer added a comment -

          I have just seen your comment about keeping the AbstractDispatcher configurable, which seems very logical. We are just happy to have the flexibility to change an implementation that doesn't fit our needs.

          In that case, it is possible to define a common interface between both implementations and add to it a new method (something like getIterableSnapshot()) which returns an iterable version of the list. OrderedAwareCopyOnWriteArrayList would return the unmodifiableList (O(1)) and OrderedAwareLinkedHashSet would return a new List as currently implemented in the AbstractDispatcher (O(n)).
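The interface suggested above could be sketched roughly as follows; the type names and the `getIterableSnapshot()` signature are hypothetical, standing in for the actual OrderedAware* classes, and the backing collections are plain JDK types rather than the attached implementations:

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical common interface, as suggested in the comment.
interface HandlerCollection<T> extends Iterable<T> {
    boolean add(T handler);
    List<T> getIterableSnapshot();
}

// List-backed variant: each mutation already produces a fresh backing array,
// so the snapshot is an O(1) unmodifiable view.
class CopyOnWriteHandlers<T> implements HandlerCollection<T> {
    private final CopyOnWriteArrayList<T> delegate = new CopyOnWriteArrayList<>();
    public boolean add(T handler) { return delegate.addIfAbsent(handler); }
    public List<T> getIterableSnapshot() { return Collections.unmodifiableList(delegate); }
    public Iterator<T> iterator() { return delegate.iterator(); }
}

// Set-backed variant: must copy to a new List on every snapshot, O(n),
// mirroring what AbstractDispatcher.getHandlers() does today.
class SetBackedHandlers<T> implements HandlerCollection<T> {
    private final Set<T> delegate = Collections.synchronizedSet(new LinkedHashSet<>());
    public boolean add(T handler) { return delegate.add(handler); }
    public List<T> getIterableSnapshot() {
        synchronized (delegate) {
            return List.copyOf(delegate);
        }
    }
    public Iterator<T> iterator() { return getIterableSnapshot().iterator(); }
}
```

With this split, the dispatcher iterates `getIterableSnapshot()` without knowing which implementation was configured, and only the set-backed variant pays the per-dispatch copy.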

          Oleg Zhurakousky added a comment -

          Diego, thanks for the clarification; now I see what I missed the first time. Basically the improvement mainly affects the READ path (positively) but also affects the WRITE path (negatively). Right?
          Also, investigating it further allowed me to find a compromise where we can retain the speed of WRITE while greatly improving READ (doing similar things as you did).

          I'll issue a PR later on so you can see it and leave your review comments.
          Thanks again

          Diego Belfer added a comment -

          Oleg, do you know the usage pattern for the add method?

          If most of the handlers are added during the initialization process, it is possible to implement an addAll method which would solve the problem. If handlers are added one by one, then another structure may be needed.
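For the bulk-registration case described above, `CopyOnWriteArrayList` already provides `addAllAbsent`, which performs the duplicate check and the backing-array copy once per batch rather than once per element; a sketch (handler names are illustrative):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class BulkRegistration {
    public static void main(String[] args) {
        CopyOnWriteArrayList<String> handlers = new CopyOnWriteArrayList<>();
        handlers.add("handlerA");
        // addAllAbsent copies the backing array once for the whole batch
        // and skips elements that are already present, returning the
        // number of elements actually added.
        int added = handlers.addAllAbsent(List.of("handlerA", "handlerB", "handlerC"));
        System.out.println("added=" + added + " size=" + handlers.size());
        // → added=2 size=3
    }
}
```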


            People

            • Assignee: Oleg Zhurakousky
            • Reporter: Diego Belfer
            • Votes: 0
            • Watchers: 2


            Time Tracking

            Estimated: 0.5d
            Remaining: 0.5d
            Logged: Not Specified