Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0.m4
    • Fix Version/s: 2.0.0.M2
    • Component/s: Core
    • Labels:
      None

      Description

      Since 1.0.0.m4, a new check has been added when re-executing a failed batch.

      The new check lives in the org.springframework.batch.core.domain.JobExecution class.
      When you restart a failed batch, it checks whether the JobExecution is still running.
      I've verified that 1.0.0.m3 did not perform this check.

      Here is the method:

      /**
       * Test if this {@link JobExecution} indicates that it is running. It should
       * be noted that this does not necessarily mean that it has been persisted
       * as such yet.
       * @return true if the end time is null
       */
      public boolean isRunning() {
          return endTime == null;
      }

      What's the point?

      Consider the following scenario:

      • You start your batch.
      • It creates a new job instance and job execution.
      • You simulate a hardware failure by killing the batch while it runs.
        => The job execution state is still STARTED with NO END DATE, because the batch had no chance to update the DB before it was killed.
      • You restart the same batch.
        => It finds the same JobInstance and JobExecution again, but sees that JobExecution.isRunning() is true, so it returns an error:
        org.springframework.batch.core.repository.JobExecutionAlreadyRunningException: A job execution for this job is already running
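
      The scenario above can be reproduced in isolation with a minimal sketch. The classes below are hypothetical stand-ins, not the Spring Batch types: SketchJobExecution models only the endTime field and isRunning(), and tryRestart is an assumed restart guard that mirrors the reported behaviour.

```java
import java.util.Date;

// Hypothetical stand-in for JobExecution; only the end time is modelled.
class SketchJobExecution {
    private Date endTime; // stays null if the process is killed mid-run

    // Called on a normal shutdown; a killed process never gets here.
    void finish() {
        endTime = new Date();
    }

    // Same rule as the method quoted in the description.
    boolean isRunning() {
        return endTime == null;
    }
}

class RestartCheckSketch {
    // Assumed restart guard: rejects a restart while the last execution
    // still looks like it is running.
    static String tryRestart(SketchJobExecution lastExecution) {
        if (lastExecution != null && lastExecution.isRunning()) {
            return "rejected: A job execution for this job is already running";
        }
        return "restarted";
    }

    public static void main(String[] args) {
        SketchJobExecution killed = new SketchJobExecution(); // never finished
        System.out.println(tryRestart(killed));

        SketchJobExecution completed = new SketchJobExecution();
        completed.finish();
        System.out.println(tryRestart(completed));
    }
}
```

      Because a null end time is the only signal, the guard cannot tell a killed execution apart from one that is genuinely still running.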

      Expected behavior

      My point of view is that a killed batch should be restartable without errors.

      Should I create a JIRA bug?

          Activity

          Lucas Ward added a comment -

          After going back and forth on a lot of different possibilities on this issue (and discussing with Thomas and Dave), I kept it simple and added a check to the SimpleJobRepository to determine whether the status of the JobExecution of the currently executing StepExecution has been set to STOPPING. If so, setTerminateOnly is set to true (throwing an exception would cause the job to fail, not stop) and the step will stop. It's by no means complete, but it took a lot of back and forth to even decide that this was the right direction.
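
          A minimal sketch of that check, under stated assumptions: the types below are simplified, hypothetical stand-ins rather than the real SimpleJobRepository, JobExecution, or StepExecution, and the step loop and the stopAfter parameter exist only to simulate an external stop request.

```java
// Hypothetical, simplified types; not the Spring Batch classes themselves.
enum SketchStatus { STARTED, STOPPING, STOPPED }

class StoppableJobExecution {
    volatile SketchStatus status = SketchStatus.STARTED;
}

class StoppableStepExecution {
    final StoppableJobExecution jobExecution;
    private boolean terminateOnly;

    StoppableStepExecution(StoppableJobExecution jobExecution) {
        this.jobExecution = jobExecution;
    }

    void setTerminateOnly() { terminateOnly = true; }

    boolean isTerminateOnly() { return terminateOnly; }
}

class StopCheckSketch {
    // Assumed repository-side check: flag the step instead of throwing,
    // so that the job ends up stopped rather than failed.
    static void checkForStop(StoppableStepExecution step) {
        if (step.jobExecution.status == SketchStatus.STOPPING) {
            step.setTerminateOnly();
        }
    }

    // Simulated step loop. A stop request arrives after `stopAfter` chunks
    // (pass a negative value for "never"). Returns the chunks processed.
    static int runStep(StoppableStepExecution step, int totalChunks, int stopAfter) {
        int processed = 0;
        for (int i = 0; i < totalChunks; i++) {
            if (step.isTerminateOnly()) {
                break; // the step honours the flag before its next chunk
            }
            processed++;
            if (processed == stopAfter) {
                step.jobExecution.status = SketchStatus.STOPPING; // external request
            }
            checkForStop(step); // runs whenever the step execution is updated
        }
        if (step.isTerminateOnly()) {
            step.jobExecution.status = SketchStatus.STOPPED;
        }
        return processed;
    }
}
```

          In this sketch the step finishes the chunk it is working on and stops before the next one, which is why the flag is checked at the top of the loop.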

          One other issue to note is that we really need to pick some common names and stick to them. In the step we called it 'interruption' (and even have a policy named that), the status is called 'STOPPED' and 'STOPPING', and you set the stepExecution to 'terminateOnly'. I don't care which of the three we go with (stop, interrupt, terminate), but we should be consistent across the board, although I heavily favor stop, since it will go well with pause.

          Lucas Ward added a comment -

          I added the lastUpdated property to JobExecution and a column in its respective database table. There is also a new functional test for ensuring that modifying the status of a JobExecution will cause it to stop. I just need to add some functionality to the DAOs and the JobOperator to support calling stop that way. The column changes I've made were also only done in the immediate hsql schema in src/main/resources. The base ones used for the template need to be updated as well.

          Lucas Ward added a comment -

          I added support for stop to the SimpleJobOperator, which has been committed. I'll need a little more time to get the sample working with it. There's a bit of a mess to untangle with some of the DAOs, etc.

          I'm not sure the stop method in the JobOperator interface really needs to return a boolean. We don't have any way of knowing whether the stop was successful, short of sleeping and checking to see if it's stopped, which is probably not the operator implementation's responsibility.
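
          The "sleep and check" approach could live on the caller's side instead. The helper below is a hypothetical sketch, not part of JobOperator: awaitStopped and its parameters are assumed names, and the isStopped supplier stands in for whatever status lookup the caller has available.

```java
import java.util.function.BooleanSupplier;

class StopPollingSketch {
    // Polls until isStopped reports true or the timeout elapses.
    // Returns true only if the stop was observed within the timeout.
    static boolean awaitStopped(BooleanSupplier isStopped, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (isStopped.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

          With a helper like this, the stop request itself can return immediately, and only callers that care about confirmation pay for the polling.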

          Lucas Ward added a comment -

          Jobs can now successfully be stopped via the JobOperator interface. There's still some polishing that needs to be done between all the collaborators, but I think that should be a separate issue, so I'm resolving this one.

          Lucas Ward added a comment -

          Reassigned for review.


            People

            • Assignee:
              Robert Kasanicky
            • Reporter:
              Gerard COLLIN
            • Votes:
              0
            • Watchers:
              3

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated: 3d
                Remaining: 0.5d
                Logged: 3d