Problem description
When I run a Nutch job over 1 million URLs, the job fails with:
20/10/14 12:40:34 ERROR fetcher.Fetcher: Fetcher: java.lang.RuntimeException: Fetcher job did not succeed, job status: Failed, reason: Task failed task_1601725692999_0307_m_000004
Job failed as tasks failed. failedMaps:1 failedReduces:0
at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:500)
at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:541)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:514)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:244)
at org.apache.hadoop.util.RunJar.main(RunJar.java:158)
Error running:
/home/hadoop/apache-nutch-1.17/runtime/deploy/bin/nutch fetch -Dmapreduce.map.memory.mb=2048 -Dmapreduce.map.java.opts=-Xmx2048m -Dmapreduce.reduce.memory.mb=2048 -Dmapreduce.reduce.java.opts=-Xmx2048m -Dmapreduce.job.reduces=12 -Dmapreduce.reduce.speculative=false -Dmapreduce.map.speculative=false -Dmapreduce.map.output.compress=true -D fetcher.timelimit.mins=300 s3a://pt-test-1/nutch/1million-crawls//segments/20201014115727 -threads 400
Failed with exit value 255.
Solution
The actual reason for the failure is shown in the log of task_1601725692999_0307_m_000004. It is also visible in the task table of the Hadoop web UI.
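As a sketch, the task log can also be pulled with the YARN CLI. The application ID below is inferred from the failed task ID (task_1601725692999_0307_m_000004 → application_1601725692999_0307); verify it against your ResourceManager before running:

```shell
# Fetch aggregated container logs for the failed application
# (application ID inferred from the task ID; confirm in the RM UI).
yarn logs -applicationId application_1601725692999_0307 > fetch_task_logs.txt

# Search the dump for the failure, e.g. an OutOfMemoryError or a
# "Container killed ... running beyond physical memory limits" message.
grep -n -i -E 'OutOfMemoryError|killed|beyond physical memory' fetch_task_logs.txt
```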
The most likely cause:
-Dmapreduce.map.memory.mb=2048 -Dmapreduce.map.java.opts=-Xmx2048m
mapreduce.map.memory.mb must be larger than the Java heap size (-Xmx), since the YARN container also needs room for off-heap memory; with both set to 2048 MB, the container is killed once the JVM's total footprint exceeds the limit. I would add at least 512 MB of headroom to mapreduce.map.memory.mb.
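A minimal sketch of the corrected invocation, keeping the heap at 2048 MB but raising the container limit to 2560 MB (illustrative headroom; tune for your cluster):

```shell
# Container memory (2560 MB) now exceeds the Java heap (-Xmx2048m) by 512 MB,
# leaving room for JVM metaspace, thread stacks, and native buffers.
/home/hadoop/apache-nutch-1.17/runtime/deploy/bin/nutch fetch \
  -Dmapreduce.map.memory.mb=2560 -Dmapreduce.map.java.opts=-Xmx2048m \
  -Dmapreduce.reduce.memory.mb=2560 -Dmapreduce.reduce.java.opts=-Xmx2048m \
  -Dmapreduce.job.reduces=12 \
  -Dmapreduce.reduce.speculative=false -Dmapreduce.map.speculative=false \
  -Dmapreduce.map.output.compress=true \
  -D fetcher.timelimit.mins=300 \
  s3a://pt-test-1/nutch/1million-crawls//segments/20201014115727 -threads 400
```

Alternatively, keep the container at 2048 MB and lower the heap (e.g. -Xmx1536m); either way the gap between the two settings is what prevents YARN from killing the task.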