Hadoop error: job fails with java.lang.NumberFormatException: empty String when nearly complete

chhqkbe1 · posted 2021-06-02 in Hadoop

I have spent two days on this and I'm getting frustrated. Everything seems fine: when I run the job on a smaller dataset it works, but when I run it on the full 40 GB dataset it always fails with this error. The dataset looks fine and contains no illegal characters.

15/05/02 21:12:42 INFO mapred.JobClient:  map 100% reduce 96%
15/05/02 21:12:43 INFO mapred.JobClient:  map 100% reduce 97%
15/05/02 21:12:45 INFO mapred.JobClient:  map 100% reduce 98%
15/05/02 21:12:47 INFO mapred.JobClient:  map 100% reduce 99%
15/05/02 21:12:52 INFO mapred.JobClient:  map 100% reduce 100%
15/05/02 21:12:52 INFO mapred.JobClient: Job complete: job_201505011756_0013
15/05/02 21:12:52 INFO mapred.JobClient: Counters: 30
15/05/02 21:12:52 INFO mapred.JobClient:   Map-Reduce Framework
15/05/02 21:12:52 INFO mapred.JobClient:     Spilled Records=295830048
15/05/02 21:12:52 INFO mapred.JobClient:     Map output materialized    bytes=4511435075
15/05/02 21:12:52 INFO mapred.JobClient:     Reduce input records=147915024
15/05/02 21:12:52 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1973084037120
15/05/02 21:12:52 INFO mapred.JobClient:     Map input records=1479169548
15/05/02 21:12:52 INFO mapred.JobClient:     SPLIT_RAW_BYTES=109140
15/05/02 21:12:52 INFO mapred.JobClient:     Map output bytes=4215470387
15/05/02 21:12:52 INFO mapred.JobClient:     Reduce shuffle bytes=4511435075
15/05/02 21:12:52 INFO mapred.JobClient:     Physical memory (bytes) snapshot=268727762944
15/05/02 21:12:52 INFO mapred.JobClient:     Map input bytes=68433542634
 15/05/02 21:12:52 INFO mapred.JobClient:     Reduce input groups=1020
15/05/02 21:12:52 INFO mapred.JobClient:     Combine output records=0
15/05/02 21:12:52 INFO mapred.JobClient:     Reduce output records=147915024
15/05/02 21:12:52 INFO mapred.JobClient:     Map output records=147915024
15/05/02 21:12:52 INFO mapred.JobClient:     Combine input records=0
15/05/02 21:12:52 INFO mapred.JobClient:     CPU time spent (ms)=1611510
15/05/02 21:12:52 INFO mapred.JobClient:     Total committed heap usage  (bytes)=209235476480
15/05/02 21:12:52 INFO mapred.JobClient:   File Input Format Counters 
15/05/02 21:12:52 INFO mapred.JobClient:     Bytes Read=68500323818
15/05/02 21:12:52 INFO mapred.JobClient:   FileSystemCounters
15/05/02 21:12:52 INFO mapred.JobClient:     HDFS_BYTES_READ=68500432958
15/05/02 21:12:52 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=9105249650
15/05/02 21:12:52 INFO mapred.JobClient:     FILE_BYTES_READ=4511300789
15/05/02 21:12:52 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=3623810291
15/05/02 21:12:52 INFO mapred.JobClient:   File Output Format Counters 
15/05/02 21:12:52 INFO mapred.JobClient:     Bytes Written=3623810291
15/05/02 21:12:52 INFO mapred.JobClient:   Job Counters 
15/05/02 21:12:52 INFO mapred.JobClient:     Launched map tasks=1033
15/05/02 21:12:52 INFO mapred.JobClient:     Launched reduce tasks=24
15/05/02 21:12:52 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=2505921
15/05/02 21:12:52 INFO mapred.JobClient:     Total time spent by all reduces  waiting after reserving slots (ms)=0
15/05/02 21:12:52 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=2009059
15/05/02 21:12:52 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
15/05/02 21:12:52 INFO mapred.JobClient:     Data-local map tasks=1033
15/05/02 21:12:52 INFO operations.Sampler: resultSize: 4215470387
15/05/02 21:12:52 INFO operations.Sampler: resultCount: 147915024
15/05/02 21:12:52 INFO operations.Sampler: MapReduce return 0.02487447197431825 of 147915024 records
15/05/02 21:12:52 INFO mapred.FileInputFormat: No block filter specified
15/05/02 21:12:52 INFO mapred.FileInputFormat: Total input paths to process : 22
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.10:50010
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.5:50010
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.7:50010
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.4:50010
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.13:50010
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.14:50010
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.11:50010
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.9:50010
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.8:50010
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.6:50010
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.2:50010
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.3:50010
15/05/02 21:12:53 INFO mapred.JobClient: Running job: job_201505011756_0014
15/05/02 21:12:54 INFO mapred.JobClient:  map 0% reduce 0%
15/05/02 21:13:02 INFO mapred.JobClient: Task Id : attempt_201505011756_0014_m_000000_0, Status : FAILED
java.lang.NumberFormatException: empty String
    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1842)
    at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
    at java.lang.Double.parseDouble(Double.java:538)
    at edu.umn.cs.spatialHadoop.io.TextSerializerHelper.consumeDouble(TextSerializerHelper.java:182)
    at edu.umn.cs.spatialHadoop.core.Rectangle.fromText(Rectangle.java:276)
    at edu.umn.cs.spatialHadoop.core.STPRect.fromText(STPRect.java:41)
    at edu.umn.cs.spatialHadoop.operations.Sampler$Map.map(Sampler.java:122)
    at edu.umn.cs.spatialHadoop.operations.Sampler$Map.map(Sampler.java:69)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

15/05/02 21:13:02 INFO mapred.JobClient: Task Id : attempt_201505011756_0014_m_000002_0, Status : FAILED
java.lang.NumberFormatException: empty String
    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1842)
    at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
    at java.lang.Double.parseDouble(Double.java:538)
    at edu.umn.cs.spatialHadoop.io.TextSerializerHelper.consumeDouble(TextSerializerHelper.java:182)
    at edu.umn.cs.spatialHadoop.core.Rectangle.fromText(Rectangle.java:276)
    at edu.umn.cs.spatialHadoop.core.STPRect.from

The dataset looks like this:

32714,13271400,132704,13271400,132704
132715,13271500,132716,13271500,132716
132716,13271600,132717,13271600,132717
132717,13271700,132718,13271700,132718
132718,13271800,132719,13271800,132719
132719,13271900,132709,13271900,132709
132720,13272000,132721,13272000,132721
132721,13272100,132722,13272100,132722
132722,13272200,132723,13272200,132723
132723,13272300,132724,13272300,132724
132724,13272400,132725,13272400,132725
132725,13272500,132726,13272500,132726
132726,13272600,132727,13272600,132727
132727,13272700,132728,13272700,132728
132728,13272800,132729,13272800,132729
132729,13272900,132730,13272900,132730

Any ideas? Please help. Thanks.

snz8szmq1#

My solution was to check whether the current input can actually be parsed as a double. If it can't, log it and move on to the next input.

//assuming the current input is called input and your doubles are positive (no sign or exponent)
if (input.matches("\\d+(\\.\\d*)?"))
{
    //process normally
}
else
{
    //log and continue to the next input
}
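
Since each record here has five comma-separated fields and the exception is thrown while parsing a single field, it may be more robust to validate every field rather than the whole line. Below is a minimal sketch under that assumption; the helper name and field handling are hypothetical, not part of the original answer:

// Hypothetical helper: returns false for blank lines, empty fields
// (e.g. a trailing comma), or any field that is not a valid double.
private static boolean isParseableRecord(String line)
{
    if (line == null || line.trim().isEmpty())
    {
        return false; // a blank line is what produces "empty String"
    }
    String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
    for (String field : fields)
    {
        try
        {
            Double.parseDouble(field.trim());
        }
        catch (NumberFormatException e)
        {
            return false; // empty or non-numeric field
        }
    }
    return true;
}

Passing -1 to split keeps trailing empty fields, so a line ending in a comma is rejected instead of being silently shortened.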

Alternatively, you can catch the NumberFormatException:

try
{
    double d = Double.parseDouble(input);
    //process normally
}
catch (NumberFormatException e)
{
    //log and continue to the next input
}
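
If the failing parse happens inside library code you cannot easily change (the stack trace above goes through SpatialHadoop's Rectangle.fromText), another option is to scan the input for offending lines before launching the job. Here is a minimal standalone sketch, assuming the data is available as a local file; the class name and the file-path argument are hypothetical:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical scanner: prints the line numbers of records that would make
// Double.parseDouble throw (blank lines, empty fields, non-numeric fields).
public class BadLineScanner
{
    public static void main(String[] args) throws IOException
    {
        try (BufferedReader in = new BufferedReader(new FileReader(args[0])))
        {
            String line;
            long lineNo = 0;
            while ((line = in.readLine()) != null)
            {
                lineNo++;
                for (String field : line.split(",", -1))
                {
                    try
                    {
                        Double.parseDouble(field.trim());
                    }
                    catch (NumberFormatException e)
                    {
                        System.out.println("Bad record at line " + lineNo + ": \"" + line + "\"");
                        break;
                    }
                }
            }
        }
    }
}

For data already on HDFS you could first copy one split locally (for example with hadoop fs -copyToLocal) and point the scanner at it; a single blank line or empty field anywhere in the 40 GB file is typically enough to trigger exactly this "empty String" exception, even if the sampled rows all look clean.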
