Spring Batch
  1. Spring Batch
  2. BATCH-413

Unhandled IndexOutOfBounds in DrivingQueryItemReader

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 1.0.0.m4
    • Fix Version/s: 1.0.0.rc1
    • Component/s: Infrastructure
    • Labels:
      None

      Description

      In the getCurrentKey() method of org.springframework.batch.io.driving.DrivingQueryItemReader, if currentIndex is 0, you will get an ArrayIndexOutOfBoundsException. This occurred when my KeyGenerator query returned 0 results for a particular batch job.

      Is this supposed to be handled gracefully or is it an error? Is it a condition that my query must return at least 1 result?

        Activity

        Hide
        Dave Syer added a comment -

        I think it should be handled gracefully.

        Show
        Dave Syer added a comment - I think it should be handled gracefully.
        Hide
        Cameron Leach added a comment -

        Looking at the getKeyAsExecutionAttributes(Object key) method in org.springframework.batch.io.driving.support.SingleColumnJdbcKeyGenerator, it does seem like there was an assumption that there would always be at least one result returned (and possibly in more places):

        Assert.notNull(key, "The key must not be null.");

        I guess I'm wondering if there was another intended way to handle queries that could possibly return 0 results (the batch just ends)?

        Show
        Cameron Leach added a comment - Looking at the getKeyAsExecutionAttributes(Object key) method in org.springframework.batch.io.driving.support.SingleColumnJdbcKeyGenerator, it does seem like there was an assumption that there would always be at least one result returned (and possibly in more places): Assert.notNull(key, "The key must not be null."); I guess I'm wondering if there was another intended way to handle queries that could possibly return 0 results (the batch just ends)?
        Hide
        Lucas Ward added a comment -

        I updated the DrivingQueryItemReader to throw an exception if the KeyGenerator returns null or an empty list in the open method. Having nothing to process where you expect to process is a failure scenario. There are a few scenarios that we've called 'mark and sweep' jobs where there might not be anything to process for that particular run. However, this should be known. If we simply return null on the first call to read, we might be able to pull it out of statistics, but someone likely needs to make a domain decision on whether or not having no work to process is a failure or not, which they could easily do with a StepListener, but have few ways of knowing besides becoming stateless unless a specific exception is thrown.

        Show
        Lucas Ward added a comment - I updated the DrivingQueryItemReader to throw an exception if the KeyGenerator returns null or an empty list in the open method. Having nothing to process where you expect to process is a failure scenario. There are a few scenarios that we've called 'mark and sweep' jobs where there might not be anything to process for that particular run. However, this should be known. If we simply return null on the first call to read, we might be able to pull it out of statistics, but someone likely needs to make a domain decision on whether or not having no work to process is a failure or not, which they could easily do with a StepListener, but have few ways of knowing besides becoming stateless unless a specific exception is thrown.
        Hide
        Lucas Ward added a comment -

        As a side note, I created a new issue against 1.0.1 to ensure item readers are consistent in this regard:

        BATCH-414

        Show
        Lucas Ward added a comment - As a side note, I created a new issue against 1.0.1 to ensure item readers are consistent in this regard: BATCH-414
        Hide
        Mattias Hellborg Arthursson added a comment -

        To be honest I really don't understand the reasoning here. I don't agree that this should be considered an error condition. What I want to do in a step is process every item returned by the ItemReader. If there are no items to process in one particular run of the batch this seems to me as being a perfectly valid condition (i.e. no items arrived for me to process today - excellent, I'll just check tomorrow if there's anything new).

        Since the ItemReader is the one throwing the Exception, it would make sense to be able to configure the domain decision about the validity of this condition on the ItemReader itself. Adding a method to configure this on DrivingQueryItemReader would greatly simplify the configuration, and wouldn't change the current behavior - the default interpretation could still be that this is an error condition (even though I still disagree with that interpretation - but changing the default behavior should probably be avoided).

        whyt?

        Show
        Mattias Hellborg Arthursson added a comment - To be honest I really don't understand the reasoning here. I don't agree that this should be considered an error condition. What I want to do in a step is process every item returned by the ItemReader. If there are no items to process in one particular run of the batch this seems to me as being a perfectly valid condition (i.e. no items arrived for me to process today - excellent, I'll just check tomorrow if there's anything new). Since the ItemReader is the one throwing the Exception, it would make sense to be able to configure the domain decision about the validity of this condition on the ItemReader itself. Adding a method to configure this on DrivingQueryItemReader would greatly simplify the configuration, and wouldn't change the current behavior - the default interpretation could still be that this is an error condition (even though I still disagree with that interpretation - but changing the default behavior should probably be avoided). whyt?
        Hide
        Lucas Ward added a comment -

        I could come up with quite a few scenarios where you would want the job to be considered failed if there was no work to process. I'm sure you could come up with just as many scenarios where you would want it to be considered successfully completed. The real question here is what the default should be. In general I dislike no-oping and completing successfully rather than throwing an exception. We had many cases in the past where we no-oped and it caused confusion. For example, if someone expects to have data processed in a run but the job completes successfully with nothing having been written, you'll usually get something to the effect of: "Spring Batch won't process any of my records for some reason, this has to be a bug!" Rather than someone realizing that the file was empty, or the query was wrong. However, if you throw an explicit exception it's pretty obvious what happened. I suppose that's the brunt of my issue, if you no op and complete successfully there's no way to know that there was no work processed (other than the item count). However, if you throw an exception, someone can choose to ignore it (it's fairly easy to swallow it) if it doesn't make sense for their scenario. Furthermore, I would want to be able to log out the fact that nothing was done as well, without an exception I have no way of doing that without making my item reader stateful.

        I'm still open to discussion in general, but having been burnt in the past with no-oping, I'm hesitant to do it. It still feels to me that without an exception you're losing some vital information about the job run, with no good way of accessing it.

        Show
        Lucas Ward added a comment - I could come up with quite a few scenarios where you would want the job to be considered failed if there was no work to process. I'm sure you could come up with just as many scenarios where you would want it to be considered successfully completed. The real question here is what the default should be. In general I dislike no-oping and completing successfully rather than throwing an exception. We had many cases in the past where we no-oped and it caused confusion. For example, if someone expects to have data processed in a run but the job completes successfully with nothing having been written, you'll usually get something to the effect of: "Spring Batch won't process any of my records for some reason, this has to be a bug!" Rather than someone realizing that the file was empty, or the query was wrong. However, if you throw an explicit exception it's pretty obvious what happened. I suppose that's the brunt of my issue, if you no op and complete successfully there's no way to know that there was no work processed (other than the item count). However, if you throw an exception, someone can choose to ignore it (it's fairly easy to swallow it) if it doesn't make sense for their scenario. Furthermore, I would want to be able to log out the fact that nothing was done as well, without an exception I have no way of doing that without making my item reader stateful. I'm still open to discussion in general, but having been burnt in the past with no-oping, I'm hesitant to do it. It still feels to me that without an exception you're losing some vital information about the job run, with no good way of accessing it.
        Hide
        Lucas Ward added a comment -

        I also tossed this issue around with a few of the leading Batch architects here at Accenture via email. Here is one reply from Randy O'Neil:

        "I like the exception. No records has been a very uncommon condition in my experience – and generally something I'd like to be alerted to. I suppose if I had a job where I expected it to have nothing to do frequently I'd want to be able to catch this exception and return success.

        That said, I'd rather be alerted to look into the fact that the job failed because no records were found. If it turns out that was the case, I'd mark the job as successful in the scheduler and let the schedule continue."

        Show
        Lucas Ward added a comment - I also tossed this issue around with a few of the leading Batch architects here at Accenture via email. Here is one reply from Randy O'Neil: "I like the exception. No records has been a very uncommon condition in my experience – and generally something I'd like to be alerted to. I suppose if I had a job where I expected it to have nothing to do frequently I'd want to be able to catch this exception and return success. That said, I'd rather be alerted to look into the fact that the job failed because no records were found. If it turns out that was the case, I'd mark the job as successful in the scheduler and let the schedule continue."
        Hide
        Mattias Hellborg Arthursson added a comment -

        While it might be a less common case I'd still like to argue that it's a very valid one, and one that should be possible to easily configure the behavior for. The key thing here is that we're talking about a single step in a batch here - a batch which potentially contains several steps. For instance, in our situation we're each day receiving new/updated and removed products (products that for some reason are no longer valid). New/updated products are specified in one way and removed ones are specified separately. It seems to make sense to handle the different cases in separate steps. Now, it's a perfectly valid condition that we one day don't get information about any removed products. Similar cases shouldn't be hard to think of.

        Wrapping and swallowing the exception is one possibility, but it is kind of a hack. Being a situation that shouldn't be all that uncommon I still argue that it should be a simple configuration option on the ItemReader specifying whether this is an error condition or not. Hacking around it for the time being though.

        Show
        Mattias Hellborg Arthursson added a comment - While it might be a less common case I'd still like to argue that it's a very valid one, and one that should be possible to easily configure the behavior for. The key thing here is that we're talking about a single step in a batch here - a batch which potentially contains several steps. For instance, in our situation we're each day receiving new/updated and removed products (products that for some reason are no longer valid). New/updated products are specified in one way and removed ones are specified separately. It seems to make sense to handle the different cases in separate steps. Now, it's a perfectly valid condition that we one day don't get information about any removed products. Similar cases shouldn't be hard to think of. Wrapping and swallowing the exception is one possibility, but it is kind of a hack. Being a situation that shouldn't be all that uncommon I still argue that it should be a simple configuration option on the ItemReader specifying whether this is an error condition or not. Hacking around it for the time being though.
        Hide
        Dave Syer added a comment -

        Assume closed as resolved and released

        Show
        Dave Syer added a comment - Assume closed as resolved and released

          People

          • Assignee:
            Lucas Ward
            Reporter:
            Cameron Leach
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 0.5h
              0.5h
              Remaining:
              Remaining Estimate - 0.5h
              0.5h
              Logged:
              Time Spent - Not Specified
              Not Specified