Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.0.4
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      I have no need of a skip limit, and yet Spring Batch forces me to impose an artificial one. I can set that limit suitably sky-high, but this is not an accurate representation of my client's business requirements.

      My ideal solution would be for the skip limit to be optional in the XML configuration, with Spring Batch assuming that the batch job succeeded unless ALL records were skipped.
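      For reference, the workaround in question might look something like the following sketch (assuming the Spring Batch 2.x XML namespace; bean ids and reader/writer refs are illustrative). An alternative to the magic number is wiring in the built-in AlwaysSkipItemSkipPolicy, which removes the cap without inventing a limit:

      ```xml
      <!-- Sketch only: ids and refs are illustrative, not from this issue. -->
      <batch:job id="importJob">
          <batch:step id="importStep">
              <batch:tasklet>
                  <!-- Workaround 1: an artificially huge skip-limit -->
                  <batch:chunk reader="itemReader" writer="itemWriter"
                               commit-interval="100" skip-limit="2147483647">
                      <batch:skippable-exception-classes>
                          <batch:include class="org.springframework.batch.item.file.FlatFileParseException"/>
                      </batch:skippable-exception-classes>
                  </batch:chunk>
              </batch:tasklet>
          </batch:step>
      </batch:job>

      <!-- Workaround 2: a skip policy that always skips, so no limit applies
           <batch:chunk reader="itemReader" writer="itemWriter"
                        commit-interval="100" skip-policy="alwaysSkip"/> -->
      <bean id="alwaysSkip"
            class="org.springframework.batch.core.step.skip.AlwaysSkipItemSkipPolicy"/>
      ```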

        Activity

        mouyang Matthew Ouyang added a comment -

        I disagree with the implication that "there's obviously something wrong" if I "can't even parse a certain number of records." If I have a large number of records that can't be parsed but a small number that can, then I must continue processing that small number of records.

        Setting a hard limit has little value because I am processing batches of anywhere from 100 to 100k records at high frequency. Nor do I think setting a percentage is reasonable, because the total number of records would need to be known in advance.

        Based on these points, I think an infinite skip limit is the only reasonable solution for this situation. I suppose I could set the skip limit to Integer.MAX_VALUE, but I think it's clearer not to specify a skip limit if one really doesn't exist.

        abhaic Abhai added a comment -

        Consider this:
        1) I have a dynamic product feed coming in every day, so I don't know how many records I am receiving.
        2) The only exception possible when I process a product is "product price is approved or not".
        3) On that exception I want to send the record to a product-price-approval queue using a SkipListener.
        4) All other products need to be published to my online catalog.

        There is nothing wrong if all of my records are unapproved and go to the price-approval queue. Since I don't know the feed size, there is no way I can set a hard limit, and the entire run will fail if I misestimate it. Setting a crazily high number is just a hack that I want to avoid.

        bjorn_skogseth Bjorn Skogseth added a comment -

        My use case is:

        • A customer (external party) provides the product owner of my batch job with a flat input file where all lines are supposed to follow a specified format.
        • The product owner runs the batch job using the provided file.
        • She expects that in ALL cases where a particular line in the input file does not follow the format, a line is printed to an error file with the line number and reason for the rejection.
        • In some cases this may be close to 100%, or potentially even 100%, of the lines in the input file (between 1k and 100k); that doesn't matter, everything needs to show up in the error file.

        Setting a specific number at which the job would just stop would be wrong. Using filtering would also be wrong, because these are actual errors; it would also be harder to implement, since the line number would have to be passed to the processor manually. When handling the FlatFileParseException, the line number is easily accessible from the SkipListener with no extra effort.

        I currently use a skip limit of 1000000000 for this job. It's clumsy.
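        A rough sketch of the error-file pattern described above. FlatFileParseException is stubbed inline so the snippet stands alone (the real class in org.springframework.batch.item.file exposes getLineNumber() and getInput() the same way), and the listener method mirrors the SkipListener onSkipInRead callback; the class names here are otherwise hypothetical:

        ```java
        import java.util.ArrayList;
        import java.util.List;

        // Inline stub of org.springframework.batch.item.file.FlatFileParseException;
        // the real class exposes getLineNumber() and getInput() the same way.
        class FlatFileParseException extends RuntimeException {
            private final String input;
            private final int lineNumber;

            FlatFileParseException(String message, String input, int lineNumber) {
                super(message);
                this.input = input;
                this.lineNumber = lineNumber;
            }

            public int getLineNumber() { return lineNumber; }
            public String getInput() { return input; }
        }

        // Plays the role of SkipListener.onSkipInRead: each rejected line becomes
        // one entry in the error report, and processing simply continues.
        class ErrorFileSkipListener {
            final List<String> errorReport = new ArrayList<>();

            public void onSkipInRead(Throwable t) {
                if (t instanceof FlatFileParseException) {
                    FlatFileParseException e = (FlatFileParseException) t;
                    errorReport.add("line " + e.getLineNumber() + ": " + e.getMessage());
                }
            }
        }

        public class SkipListenerDemo {
            public static void main(String[] args) {
                ErrorFileSkipListener listener = new ErrorFileSkipListener();
                listener.onSkipInRead(
                    new FlatFileParseException("bad format", "x;y;z", 17));
                System.out.println(listener.errorReport);  // [line 17: bad format]
            }
        }
        ```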

        jpraet Jimmy Praet added a comment -

        Passing the line number to your processor is supported by the framework: if your item implements the org.springframework.batch.item.ItemCountAware interface, the FlatFileItemReader will automatically inject the line number into your item.

        For the use case above, you could combine this with a ClassifierCompositeItemWriter that writes the rejected records to one file and the processed records to another.
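        A minimal sketch of the pattern Jimmy describes. The interface is reproduced inline so the snippet is self-contained; the real org.springframework.batch.item.ItemCountAware has the same single method, which FlatFileItemReader calls with the item count as it reads. The ProductLine class and sample data are hypothetical:

        ```java
        // Inline stand-in for org.springframework.batch.item.ItemCountAware;
        // the real interface declares the same single method.
        interface ItemCountAware {
            void setItemCount(int count);
        }

        // An item that records the line it was read from.
        class ProductLine implements ItemCountAware {
            private final String raw;
            private int lineNumber;

            ProductLine(String raw) { this.raw = raw; }

            @Override
            public void setItemCount(int count) { this.lineNumber = count; }

            public int getLineNumber() { return lineNumber; }
            public String getRaw() { return raw; }
        }

        public class ItemCountDemo {
            public static void main(String[] args) {
                // Simulate what the reader does: number each item as it is read.
                String[] lines = {"good record", "another good record"};
                int count = 0;
                for (String line : lines) {
                    ProductLine item = new ProductLine(line);
                    item.setItemCount(++count);
                    System.out.println(item.getLineNumber() + ": " + item.getRaw());
                }
            }
        }
        ```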

        bjorn_skogseth Bjorn Skogseth added a comment -

        Jimmy Praet: Great feedback. I'd still prefer to see this issue resolved, since it would feel wrong to not handle a parsing error as an error, which it is.
        But given that this may not be fixed, something along the lines of what you are suggesting might still be an improvement, although I would feel that I am replacing one workaround with another.
        I'll certainly look into it the next time a change request comes up for this batch job.


          People

          • Assignee: david_syer Dave Syer
          • Reporter: caoilte Caoilte O'Connor
          • Votes: 10
          • Watchers: 10

            Dates

            • Created:
              Updated: