  1. Spring XD
  2. XD-2078

"http | hdfs" stream starts throwing exceptions after a few minutes

    Details

    • Type: Bug
    • Status: Done
    • Priority: Major
    • Resolution: Complete
    • Affects Version/s: 1.0 RC1
    • Fix Version/s: 1.0.2, 1.1 M1
    • Component/s: None
    • Labels: None
    • Story Points: 0
    • Rank (Obsolete): 137
    • Sprint: Sprint 37

      Description

      Environment:

      • Hadoop Installation: PHD Service for PCF (PHD1.1 based on Apache Hadoop 2.0.5: 2.0.5-alpha-gphd-2.1.0.0 ) running on vCHS
      • Spring XD running in singlenode mode (version 1.0.0.RC1) on a vCHS VM

      Steps to reproduce:
      1- Set up a stream in the Spring XD shell: "http --port=9000 | hdfs --rollover=10M --idleTimeout=60000" --deploy
      2- Hit port 9000 every second with 1-10KB of JSON data
      3- Observe the temp file being created in HDFS under /xd/<stream name>
      4- Run `hadoop fs tail <file> --follow` to see that data is being written to HDFS
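      Step 2 can be scripted. The following is a minimal sketch of a load generator, not the tool used when the bug was found. It assumes the http source accepts a plain POST at http://localhost:9000 with a JSON body; the function names, the payload shape ({"data": ...}), and the default of 600 requests are all illustrative choices.

```python
import json
import time
import urllib.request


def make_payload(size_bytes):
    """Return a JSON document, as bytes, that is exactly size_bytes long."""
    # Length of the empty envelope {"data": ""}; the rest is ASCII filler.
    overhead = len(json.dumps({"data": ""}))
    if size_bytes < overhead:
        raise ValueError("size_bytes too small for the JSON envelope")
    return json.dumps({"data": "x" * (size_bytes - overhead)}).encode("utf-8")


def post_payload(url, payload, timeout=5.0):
    """POST one payload to the http source and return the HTTP status code."""
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def run(url="http://localhost:9000", count=600, interval=1.0):
    """Send count payloads, cycling sizes from 1 KB to 10 KB, one per interval."""
    for i in range(count):
        size = 1024 * (1 + i % 10)
        print(i, size, post_payload(url, make_payload(size)))
        time.sleep(interval)
```

      Calling run() with the defaults sends one request per second for ten minutes, which is well past the roughly two-minute mark at which the failure was observed.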

      Expected result:

      • The HDFS sink continues to operate and eventually rolls over at 10MB
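      To put "eventually" in numbers, here is a rough back-of-the-envelope calculation. It assumes that --rollover=10M means 10 * 1024 * 1024 bytes, that 1KB means 1024 bytes, and that the sink writes only the payload bytes with no extra framing; none of these is confirmed by the report.

```python
ROLLOVER_BYTES = 10 * 1024 * 1024  # assumption: "10M" is 10 MiB
REQUESTS_PER_SECOND = 1            # step 2: one request per second


def seconds_until_rollover(payload_bytes, rollover_bytes=ROLLOVER_BYTES,
                           rate=REQUESTS_PER_SECOND):
    """Seconds of steady traffic needed to reach the rollover size."""
    return rollover_bytes / (payload_bytes * rate)


def bytes_written_after(seconds, payload_bytes, rate=REQUESTS_PER_SECOND):
    """Bytes written after the given number of seconds of steady traffic."""
    return seconds * rate * payload_bytes


print(seconds_until_rollover(10 * 1024))  # 10 KB payloads -> 1024.0 seconds
print(seconds_until_rollover(1024))       # 1 KB payloads  -> 10240.0 seconds
print(bytes_written_after(120, 1024))     # after 2 minutes, 1 KB payloads
print(bytes_written_after(120, 10 * 1024))  # after 2 minutes, 10 KB payloads
```

      Under these assumptions the first rollover is expected somewhere between about 17 minutes and just under 3 hours after the stream starts, so a failure after about 2 minutes happens with only roughly 1% to 12% of the rollover size written.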

      Actual:

      • After about 2 minutes of successful HDFS writes, the HDFS sink crashes and starts throwing exceptions (see full log attached):
        "java.io.IOException: All datanodes 192.168.109.61:50010 are bad. Aborting..."
      • The temp file is never closed even after the stream is undeployed or destroyed.
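      To find when the failure first appears in a long log, the quoted message can be searched for mechanically. This is a small sketch: the pattern is taken from the single message quoted above, so it assumes other occurrences are worded the same way, and the function name is made up for illustration.

```python
import re

# Pattern based on the one message quoted in this report; other Hadoop
# versions may word the exception differently.
BAD_DATANODES = re.compile(
    r"java\.io\.IOException: All datanodes (?P<nodes>.+?) are bad\. Aborting"
)


def find_bad_datanodes(lines):
    """Yield (line_number, datanode_text) for each line with the exception."""
    for number, line in enumerate(lines, start=1):
        match = BAD_DATANODES.search(line)
        if match:
            yield number, match.group("nodes")
```

      Passing an open log file to find_bad_datanodes gives the line number of each occurrence together with the datanode address it names, which makes it easier to line the first failure up against the time the stream was deployed.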

      Here are some details of our investigation that may be useful:

      • I started both the shell and the singlenode runner with --hadoopDistro phd1; I also configured the hadoop fs namenode correctly in the XD shell.
      • "http <options> | file <options>" works as expected; so does "http <options> | log".
      • "time | hdfs" does not show the same crash. So far, only the http source combined with the hdfs sink shows this problem.
      • Putting a 4-10MB file in HDFS via the `hadoop fs put` command in Spring XD worked fine, so it's not a disk limitation.
      • This could be related to the PHD service running on vCHS, since support for this configuration is fairly new. However, it is consistently reproducible only with Spring XD's "http | hdfs" stream.

            People

            • Assignee: Thomas Risberg (thomas.risberg)
            • Reporter: Sina Sojoodi (ssojoodi)
            • Votes: 0
            • Watchers: 5
