You may run into the following problems when running SLS.
Command:
sh $HADOOP_HOME/share/hadoop/tools/sls/bin/slsrun.sh --input-sls=/home/c/sls/output2/sls-jobs.json --nodes=/home/c/sls/output2/sls-nodes.json --output-dir=/home/c/sls/output1 --print-simulation
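It is best to give absolute paths for the files passed to --input-sls and --nodes. If you give only a file name, the file is looked up in the current working directory.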
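As a minimal sketch of the advice above, a relative file name can be turned into an absolute path before it is handed to slsrun.sh. The helper name `abs_path` is hypothetical and not part of Hadoop, and the file names in the commented usage are only placeholders.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): print an absolute version of a path.
# Paths that already start with "/" are returned unchanged; anything else is
# prefixed with the current working directory.
abs_path() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;
        *)  printf '%s/%s\n' "$(pwd)" "$1" ;;
    esac
}

# Example usage against a real installation (file names are placeholders):
# sh "$HADOOP_HOME/share/hadoop/tools/sls/bin/slsrun.sh" \
#     --input-sls="$(abs_path sls-jobs.json)" \
#     --nodes="$(abs_path sls-nodes.json)" \
#     --output-dir="$(abs_path output1)" \
#     --print-simulation
```

This way the command behaves the same no matter which directory it is launched from.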
1. Error:
Exception in thread "main" java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
    at org.apache.hadoop.yarn.sls.SLSRunner.startAMFromSLSTraces(SLSRunner.java:313)
    at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:248)
    at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:145)
    at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:528)
Caused by: java.lang.NullPointerException
    at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:123)
    ... 4 more
**Cause:** sls-runner.xml cannot be found. Only the XML configuration files under /hadoop/etc/hadoop are picked up, but in the current Hadoop version sls-runner.xml is located in /hadoop/share/hadoop/tools/sls/sample-conf. Copying sls-runner.xml into /hadoop/etc/hadoop fixes the problem.
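The copy step can be sketched as a small shell function. The function name `install_sls_runner_conf` is hypothetical, and the layout under the given Hadoop home is assumed to match the paths described above.

```shell
#!/bin/sh
# Hypothetical helper: copy the sample sls-runner.xml into the directory
# from which Hadoop reads its configuration files.
# $1 is the Hadoop installation directory (for example "$HADOOP_HOME").
install_sls_runner_conf() {
    hadoop_home="$1"
    src="$hadoop_home/share/hadoop/tools/sls/sample-conf/sls-runner.xml"
    dst_dir="$hadoop_home/etc/hadoop"

    # Fail early with a readable message if the sample file is missing.
    if [ ! -f "$src" ]; then
        echo "sample config not found: $src" >&2
        return 1
    fi
    cp "$src" "$dst_dir/"
}

# Example usage:
# install_sls_runner_conf "$HADOOP_HOME"
```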
2. Error:
java.lang.NullPointerException at org.apache.hadoop.yarn.sls.web.SLSWebApp.<init>(SLSWebApp.java:86)
**Cause:** the html folder cannot be found. It is located under /hadoop/share/hadoop/tools/sls, so change into that directory and run the slsrun.sh script from there.
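One way to avoid forgetting the directory change is a thin wrapper, sketched below. The function name `run_sls_in_tool_dir` is hypothetical. It assumes slsrun.sh sits in the bin subdirectory of the sls folder, as in the command shown earlier, and it forwards all remaining arguments unchanged.

```shell
#!/bin/sh
# Hypothetical wrapper: run slsrun.sh with the sls tool directory as the
# working directory, so that the html folder is found.
# $1 is the Hadoop installation directory; remaining arguments are passed on.
run_sls_in_tool_dir() {
    hadoop_home="$1"
    shift
    sls_dir="$hadoop_home/share/hadoop/tools/sls"

    # Check for the html folder up front instead of waiting for the NPE.
    if [ ! -d "$sls_dir/html" ]; then
        echo "html directory not found under $sls_dir" >&2
        return 1
    fi

    # The subshell keeps the caller's working directory unchanged.
    ( cd "$sls_dir" && sh bin/slsrun.sh "$@" )
}

# Example usage (absolute paths, since the working directory changes):
# run_sls_in_tool_dir "$HADOOP_HOME" \
#     --input-sls=/home/c/sls/output2/sls-jobs.json \
#     --nodes=/home/c/sls/output2/sls-nodes.json \
#     --output-dir=/home/c/sls/output1 --print-simulation
```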
3. Error:
18/07/11 16:58:48 WARN capacity.CapacityScheduler: Couldn't find application application_1531299523163_0001
18/07/11 16:58:48 WARN resourcemanager.RMAuditLogger: USER=jenkins OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1531299523163_0001 submitted by user jenkins to unknown queue: sls_queue_1 APPID=application_1531299523163_0001
18/07/11 16:58:48 INFO resourcemanager.RMAppManager$ApplicationSummary: appId=application_1531299523163_0001,name=N/A,user=jenkins,queue=sls_queue_1,state=FAILED,trackingUrl=N/A,appMasterHost=N/A,startTime=1531299528010,finishTime=1531299528035,finalStatus=FAILED
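The containers fail to start: as the log shows, the application is submitted to an unknown queue (sls_queue_1) and ends in state FAILED.
**Cause:** yarn-site.xml is not configured properly. /hadoop/etc/hadoop contains an empty yarn-site.xml, which is the file the system loads by default, hence the error. Besides the sls-runner.xml mentioned above, the sls/sample-conf folder also contains a yarn-site.xml prepared specifically for the SLS example. Replacing /hadoop/etc/hadoop/yarn-site.xml with that file fixes the problem.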
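The replacement can be sketched as follows. The function name `use_sample_yarn_site` is hypothetical, and keeping a backup with a `.bak` suffix is an extra precaution assumed here, not something the steps above require.

```shell
#!/bin/sh
# Hypothetical helper: replace etc/hadoop/yarn-site.xml with the sample
# yarn-site.xml shipped for SLS, keeping a backup of the old file.
# $1 is the Hadoop installation directory (for example "$HADOOP_HOME").
use_sample_yarn_site() {
    hadoop_home="$1"
    src="$hadoop_home/share/hadoop/tools/sls/sample-conf/yarn-site.xml"
    dst="$hadoop_home/etc/hadoop/yarn-site.xml"

    if [ ! -f "$src" ]; then
        echo "sample config not found: $src" >&2
        return 1
    fi

    # Assumption: keep the previous file as yarn-site.xml.bak before overwriting.
    if [ -f "$dst" ]; then
        cp "$dst" "$dst.bak"
    fi
    cp "$src" "$dst"
}

# Example usage:
# use_sample_yarn_site "$HADOOP_HOME"
```

If the cluster configuration is also used for real workloads, restoring the backup after the simulation is probably a good idea.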

