Hive
Q: Hive execution fails with the following error:
'org.apache.hadoop.yarn.exceptions.YarnRuntimeException(java.lang.InterruptedException: sleep interrupted)'
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.InterruptedException: sleep interrupted
at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:339)
at org.apache.hadoop.mapred.ClientServiceDelegate.getTaskReports(ClientServiceDelegate.java:444)
at org.apache.hadoop.mapred.YARNRunner.getTaskReports(YARNRunner.java:572)
at org.apache.hadoop.mapreduce.Job$3.run(Job.java:543)
at org.apache.hadoop.mapreduce.Job$3.run(Job.java:541)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapreduce.Job.getTaskReports(Job.java:541)
at org.apache.hadoop.mapred.JobClient.getTaskReports(JobClient.java:639)
at org.apache.hadoop.mapred.JobClient.getMapTaskReports(JobClient.java:629)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:259)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:547)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:426)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:72)
// Error stack 2
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.hdfs.DFSOutputStream$Packet.writeTo(DFSOutputStream.java:285)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:591)
2015-08-10 02:46:18,127 WARN [Thread-13]: hdfs.DFSClient (DFSOutputStream.java:waitForAckedSeqno(2074)) - Slow waitForAckedSeqno took 73275ms (threshold=30000ms)
2015-08-10 02:46:18,140 WARN [Thread-10]: hdfs.DFSClient (DFSOutputStream.java:waitForAckedSeqno(2074)) - Slow waitForAckedSeqno took 73884ms (threshold=30000ms)
2015-08-10 02:46:18,190 WARN [DataStreamer for file /tmp/hadoop-yarn/staging/hhive/.staging/job_1439027917379_19257/job.jar block BP-1797264656-192.168.4.128-1431244532842:blk_1094532259_20796961]: hdfs.DFSClient (DFSOutputStream.java:run(639)) - DataStreamer Exception
A: Problems like this are usually caused by the DataNode (DN) or NameNode (NN) being under too much load to respond in time. They can be mitigated by tuning the following parameters in hdfs-site.xml:
dfs.datanode.handler.count # increase: number of DN server threads, improving the DN's capacity to accept requests and process commands
dfs.namenode.handler.count # increase: number of NN server threads, improving RPC handling capacity
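A minimal hdfs-site.xml sketch; the values below are illustrative starting points, not tuned recommendations (the shipped default for both parameters is 10):
<property>
  <name>dfs.datanode.handler.count</name>
  <value>20</value> <!-- default 10; more DN server threads for requests and commands -->
</property>
<property>
  <name>dfs.namenode.handler.count</name>
  <value>64</value> <!-- default 10; a common rule of thumb scales this with the natural log of the cluster size -->
</property>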
Q: File operation fails:
java.io.EOFException: Premature EOF: no length prefix available
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2109)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
A: The file's lease expired mid-operation; that is, the file was deleted while the DataStreamer was still writing to it. This can be mitigated by tuning the following hdfs-site.xml parameter:
dfs.datanode.max.transfer.threads # increase: raises the number of threads a DN can use for concurrent data transfers
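A sketch of the corresponding hdfs-site.xml entry (the value shown is only an example; 4096 is the default in recent Hadoop releases):
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>8192</value> <!-- default 4096; allows more concurrent block transfers per DN -->
</property>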
Q: Connection reset by peer:
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstreamUnprotected(BlockReceiver.java:1396)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstream(BlockReceiver.java:1335)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1256)
at java.lang.Thread.run(Thread.java:745)
A: The fix is to improve NameNode performance by tuning the following parameters in hdfs-site.xml:
dfs.namenode.handler.count # increase: number of NN server threads handling RPC requests
dfs.namenode.replication.interval # decrease: interval, in seconds, at which the NN periodically recomputes DN replication state
dfs.client.failover.connection.retries # increase (recommended): expert setting; number of retries the IPC client makes on connection failure; raise it on unstable networks
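An illustrative hdfs-site.xml sketch (the values are examples only and should be validated against your cluster):
<property>
  <name>dfs.namenode.handler.count</name>
  <value>64</value> <!-- increase: NN RPC server threads (default 10) -->
</property>
<property>
  <name>dfs.namenode.replication.interval</name>
  <value>1</value> <!-- decrease: default 3 seconds -->
</property>
<property>
  <name>dfs.client.failover.connection.retries</name>
  <value>3</value> <!-- increase: default 0; IPC client retries on failure -->
</property>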
Q: Socket connection timeout:
java.io.IOException: Bad response ERROR for block BP-1797264656-192.168.4.128-1431244532842:blk_1094409843_20674430 from datanode 192.168.4.118:50010
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:840)
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.16.70:50010 remote=/192.168.4.143:52416]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:468)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:772)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:724)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
at java.lang.Thread.run(Thread.java:745)
A: Mitigate by increasing the following HDFS timeout parameters in hdfs-site.xml:
dfs.datanode.socket.write.timeout # increase: timeout for writing data to a DataNode
dfs.client.socket-timeout # increase: timeout for network communication between DFS clients and the cluster
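For example (both values are in milliseconds; the numbers are illustrative, not tuned recommendations):
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>600000</value> <!-- default 480000 ms (8 min); raise on slow or congested networks -->
</property>
<property>
  <name>dfs.client.socket-timeout</name>
  <value>120000</value> <!-- default 60000 ms, which is the "60000 millis timeout" seen in the trace above -->
</property>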