Spring Batch
BATCH-1908

Inefficient storage of StepExecutionContexts when using partitioning

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Complete
    • Affects Version/s: 2.1.9
    • Fix Version/s: 2.2.0, 2.2.0 - Sprint 17
    • Component/s: Core
    • Labels:
      None

      Description

      When using a PartitionStep, each StepExecutionContext created for the corresponding partitions is saved and committed individually. When a job has a large number of partitions, this leads to long delays. Look into batching the inserts of these records.
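      To illustrate why committing each partition's context individually is slow, here is a minimal sketch of the difference between per-row commits and one batched insert. The class and method names below (CountingRepository, add, addAll) are hypothetical stand-ins, not the actual Spring Batch API; the repository here only counts database round trips.

      ```java
      import java.util.ArrayList;
      import java.util.List;

      public class BatchingSketch {

          // Hypothetical stand-in for a repository; each call that hits the
          // database counts as one round trip (insert + commit).
          static class CountingRepository {
              int commits = 0;

              void add(String context) {           // one insert, one commit
                  commits++;
              }

              void addAll(List<String> contexts) { // one batched insert, one commit
                  commits++;
              }
          }

          public static void main(String[] args) {
              List<String> contexts = new ArrayList<>();
              for (int i = 0; i < 1000; i++) {
                  contexts.add("partition-" + i);
              }

              CountingRepository perRow = new CountingRepository();
              for (String c : contexts) {
                  perRow.add(c);                   // 1,000 round trips
              }

              CountingRepository batched = new CountingRepository();
              batched.addAll(contexts);            // 1 round trip

              System.out.println(perRow.commits + " vs " + batched.commits);
              // prints: 1000 vs 1
          }
      }
      ```

      With 1,000 partitions the per-row approach pays the insert-plus-commit cost a thousand times, which matches the delays described above; batching collapses that into a single round trip.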

        Activity

        Larry Bullock added a comment -

        I don't know if this is the correct workaround, but here is what I did to correct this:

        I modified SimpleStepExecutionSplitter as follows:

        In the split() method, I commented out the call to jobRepository.add(currentStepExecution) and added a call to a new method (addToRepository) just before returning the set.

        Here is the method I created:

        private void addToRepository(Set<StepExecution> stepExecutions) {
            // Register each partition's StepExecution with the repository in one pass.
            for (StepExecution stepExecution : stepExecutions) {
                jobRepository.add(stepExecution);
            }
        }

        Using the old SimpleStepExecutionSplitter, it would take 40-90 minutes to load the BATCH_STEP_EXECUTION table using a job with 1,000 files. Now it is done in 10 minutes.

        I don't know if this is the best way to do this, as I am not very familiar with the Spring Batch source code, but it does represent an improvement.

        David Turanski added a comment - https://github.com/SpringSource/spring-batch/pull/142

          People

          • Assignee:
            David Turanski
            Reporter:
            Michael Minella
          • Votes:
            0
            Watchers:
            3

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              0.5d
              Remaining:
              0.5d
              Logged:
              Not Specified