I've spent two days on this and I'm getting frustrated. Everything seems fine: when I run the job on smaller datasets it works perfectly, but when I run it on my main 40 GB dataset it always fails with the error below. The dataset itself looks fine and contains no illegal characters.
15/05/02 21:12:42 INFO mapred.JobClient: map 100% reduce 96%
15/05/02 21:12:43 INFO mapred.JobClient: map 100% reduce 97%
15/05/02 21:12:45 INFO mapred.JobClient: map 100% reduce 98%
15/05/02 21:12:47 INFO mapred.JobClient: map 100% reduce 99%
15/05/02 21:12:52 INFO mapred.JobClient: map 100% reduce 100%
15/05/02 21:12:52 INFO mapred.JobClient: Job complete: job_201505011756_0013
15/05/02 21:12:52 INFO mapred.JobClient: Counters: 30
15/05/02 21:12:52 INFO mapred.JobClient: Map-Reduce Framework
15/05/02 21:12:52 INFO mapred.JobClient: Spilled Records=295830048
15/05/02 21:12:52 INFO mapred.JobClient: Map output materialized bytes=4511435075
15/05/02 21:12:52 INFO mapred.JobClient: Reduce input records=147915024
15/05/02 21:12:52 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1973084037120
15/05/02 21:12:52 INFO mapred.JobClient: Map input records=1479169548
15/05/02 21:12:52 INFO mapred.JobClient: SPLIT_RAW_BYTES=109140
15/05/02 21:12:52 INFO mapred.JobClient: Map output bytes=4215470387
15/05/02 21:12:52 INFO mapred.JobClient: Reduce shuffle bytes=4511435075
15/05/02 21:12:52 INFO mapred.JobClient: Physical memory (bytes) snapshot=268727762944
15/05/02 21:12:52 INFO mapred.JobClient: Map input bytes=68433542634
15/05/02 21:12:52 INFO mapred.JobClient: Reduce input groups=1020
15/05/02 21:12:52 INFO mapred.JobClient: Combine output records=0
15/05/02 21:12:52 INFO mapred.JobClient: Reduce output records=147915024
15/05/02 21:12:52 INFO mapred.JobClient: Map output records=147915024
15/05/02 21:12:52 INFO mapred.JobClient: Combine input records=0
15/05/02 21:12:52 INFO mapred.JobClient: CPU time spent (ms)=1611510
15/05/02 21:12:52 INFO mapred.JobClient: Total committed heap usage (bytes)=209235476480
15/05/02 21:12:52 INFO mapred.JobClient: File Input Format Counters
15/05/02 21:12:52 INFO mapred.JobClient: Bytes Read=68500323818
15/05/02 21:12:52 INFO mapred.JobClient: FileSystemCounters
15/05/02 21:12:52 INFO mapred.JobClient: HDFS_BYTES_READ=68500432958
15/05/02 21:12:52 INFO mapred.JobClient: FILE_BYTES_WRITTEN=9105249650
15/05/02 21:12:52 INFO mapred.JobClient: FILE_BYTES_READ=4511300789
15/05/02 21:12:52 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=3623810291
15/05/02 21:12:52 INFO mapred.JobClient: File Output Format Counters
15/05/02 21:12:52 INFO mapred.JobClient: Bytes Written=3623810291
15/05/02 21:12:52 INFO mapred.JobClient: Job Counters
15/05/02 21:12:52 INFO mapred.JobClient: Launched map tasks=1033
15/05/02 21:12:52 INFO mapred.JobClient: Launched reduce tasks=24
15/05/02 21:12:52 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=2505921
15/05/02 21:12:52 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
15/05/02 21:12:52 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=2009059
15/05/02 21:12:52 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
15/05/02 21:12:52 INFO mapred.JobClient: Data-local map tasks=1033
15/05/02 21:12:52 INFO operations.Sampler: resultSize: 4215470387
15/05/02 21:12:52 INFO operations.Sampler: resultCount: 147915024
15/05/02 21:12:52 INFO operations.Sampler: MapReduce return 0.02487447197431825 of 147915024 records
15/05/02 21:12:52 INFO mapred.FileInputFormat: No block filter specified
15/05/02 21:12:52 INFO mapred.FileInputFormat: Total input paths to process : 22
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.10:50010
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.5:50010
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.7:50010
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.4:50010
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.13:50010
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.14:50010
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.11:50010
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.9:50010
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.8:50010
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.6:50010
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.2:50010
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.3:50010
15/05/02 21:12:53 INFO mapred.JobClient: Running job: job_201505011756_0014
15/05/02 21:12:54 INFO mapred.JobClient: map 0% reduce 0%
15/05/02 21:13:02 INFO mapred.JobClient: Task Id : attempt_201505011756_0014_m_000000_0, Status : FAILED
java.lang.NumberFormatException: empty String
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1842)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at edu.umn.cs.spatialHadoop.io.TextSerializerHelper.consumeDouble(TextSerializerHelper.java:182)
at edu.umn.cs.spatialHadoop.core.Rectangle.fromText(Rectangle.java:276)
at edu.umn.cs.spatialHadoop.core.STPRect.fromText(STPRect.java:41)
at edu.umn.cs.spatialHadoop.operations.Sampler$Map.map(Sampler.java:122)
at edu.umn.cs.spatialHadoop.operations.Sampler$Map.map(Sampler.java:69)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
15/05/02 21:13:02 INFO mapred.JobClient: Task Id : attempt_201505011756_0014_m_000002_0, Status : FAILED
java.lang.NumberFormatException: empty String
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1842)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at edu.umn.cs.spatialHadoop.io.TextSerializerHelper.consumeDouble(TextSerializerHelper.java:182)
at edu.umn.cs.spatialHadoop.core.Rectangle.fromText(Rectangle.java:276)
at edu.umn.cs.spatialHadoop.core.STPRect.from
The dataset looks like this:
32714,13271400,132704,13271400,132704
132715,13271500,132716,13271500,132716
132716,13271600,132717,13271600,132717
132717,13271700,132718,13271700,132718
132718,13271800,132719,13271800,132719
132719,13271900,132709,13271900,132709
132720,13272000,132721,13272000,132721
132721,13272100,132722,13272100,132722
132722,13272200,132723,13272200,132723
132723,13272300,132724,13272300,132724
132724,13272400,132725,13272400,132725
132725,13272500,132726,13272500,132726
132726,13272600,132727,13272600,132727
132727,13272700,132728,13272700,132728
132728,13272800,132729,13272800,132729
132729,13272900,132730,13272900,132730
Any ideas? Please help. Thanks.
1 Answer
My solution was to check whether the current field can actually be parsed as a double before using it. If it can't, skip that record and move on to the next input. Alternatively, you can simply catch the `NumberFormatException` and discard the offending record.
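A minimal standalone sketch of that approach (not tied to the SpatialHadoop `Sampler` map function; the class and method names here are made up for illustration). It catches `NumberFormatException` while parsing each comma-separated field and returns `null` for any record with an empty or non-numeric field, so the caller can skip it:

```java
import java.util.ArrayList;
import java.util.List;

public class SkipBadRecords {

    // Parses one comma-separated line of doubles.
    // Returns null if any field is empty or non-numeric, so the
    // caller can skip the record instead of crashing the task.
    static double[] parseRecord(String line) {
        // -1 keeps trailing empty fields, so "1,2," yields an empty third field
        String[] fields = line.split(",", -1);
        double[] values = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            try {
                values[i] = Double.parseDouble(fields[i].trim());
            } catch (NumberFormatException e) {
                return null; // bad record: empty String, garbage, etc.
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String[] input = {
            "132715,13271500,132716,13271500,132716", // well-formed
            "132716,,132717,13271600,132717",         // empty field -> skipped
            ""                                        // blank line -> skipped
        };
        List<double[]> kept = new ArrayList<>();
        for (String line : input) {
            double[] rec = parseRecord(line);
            if (rec != null) {
                kept.add(rec);
            }
        }
        System.out.println(kept.size()); // prints 1
    }
}
```

In a real mapper you would apply the same guard inside `map()` before calling `fromText()`, emitting nothing (or incrementing a "bad records" counter) when the parse fails. Note that the stack trace points at an empty String, so the 40 GB file likely contains a blank line or a trailing/empty field somewhere that the smaller samples happen not to have.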