- Hadoop installation: PHD Service for PCF (PHD 1.1, based on Apache Hadoop 2.0.5: 2.0.5-alpha-gphd-126.96.36.199) running on vCHS
- Spring XD running in singlenode mode (version 1.0.0.RC1) on a vCHS VM
Steps to reproduce:
1- Set up a stream in the Spring XD shell: stream create --name <stream name> --definition "http --port=9000 | hdfs --rollover=10M --idleTimeout=60000" --deploy
2- Send 1-10 KB of JSON data to port 9000 every second
3- Observe the temp file being created in HDFS under /xd/<stream name>
4- Run `hadoop fs tail <file> --follow` in the XD shell to confirm that data is being written to HDFS
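Step 2 does not say how the traffic was generated. A minimal Python load generator along these lines should approximate it, assuming the http source accepts plain HTTP POST requests on port 9000 of the XD host. The `localhost` URL, the `seq`/`filler` JSON fields, and the function names are hypothetical, chosen only for illustration.

```python
import json
import random
import string
import time
import urllib.request
from typing import Optional

# Hypothetical target: the XD http source from step 1 (--port=9000).
# Replace "localhost" with the host running the Spring XD singlenode container.
URL = "http://localhost:9000"


def make_payload(target_bytes: int, seq: int = 0) -> bytes:
    """Build a JSON document whose UTF-8 encoding is exactly target_bytes long."""
    doc = {"seq": seq, "filler": ""}
    overhead = len(json.dumps(doc).encode("utf-8"))
    if target_bytes < overhead:
        raise ValueError(f"target_bytes must be at least {overhead}")
    # ASCII letters need no JSON escaping, so each one adds exactly one byte.
    doc["filler"] = "".join(
        random.choices(string.ascii_letters, k=target_bytes - overhead)
    )
    return json.dumps(doc).encode("utf-8")


def post(url: str, body: bytes, timeout: float = 5.0) -> int:
    """POST one JSON body and return the HTTP status code."""
    req = urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def run(url: str = URL, interval_s: float = 1.0, min_kb: int = 1,
        max_kb: int = 10, count: Optional[int] = None) -> None:
    """Send one random 1-10 KB JSON message per interval (forever if count is None)."""
    seq = 0
    while count is None or seq < count:
        size = random.randint(min_kb * 1024, max_kb * 1024)
        status = post(url, make_payload(size, seq))
        print(f"seq={seq} bytes={size} status={status}")
        seq += 1
        time.sleep(interval_s)


if __name__ == "__main__":
    run()
```

Leaving it running while tailing the file (step 4) should show whether the failure tracks elapsed time or total bytes written.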
- Expected: the HDFS sink keeps operating and eventually rolls over at 10MB.
- Actual: after about 2 minutes of successful HDFS writes, the HDFS sink crashes and starts throwing exceptions (see full log attached):
"java.io.IOException: All datanodes 192.168.109.61:50010 are bad. Aborting..."
- The temp file is never closed even after the stream is undeployed or destroyed.
Here are some details of our investigation that may be useful:
- I start both the shell and the singlenode runner with --hadoopDistro phd1, and I have configured the Hadoop namenode correctly in the XD shell.
- "http <options> | file <options>" works as expected, as does "http <options> | log".
- "time | hdfs" does not show the same crash. So far, only the http source combined with the hdfs sink exhibits this problem.
- Putting a 4-10MB file into HDFS with the `hadoop fs put` command in the Spring XD shell worked fine, so this is not a disk-space limitation.
- This could be related to the PHD service running on vCHS, since support for this configuration is fairly new. However, the problem is only reproducible (consistently) with Spring XD's "http | hdfs" stream.