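I run some data-transfer jobs on Flink. These jobs are submitted to the Flink cluster several times a day in detached session mode. Every day something strange happens: a few random jobs (about 5%, none of them memory-intensive) fail because their container is killed for exceeding the physical memory limit.
When the container is killed, the JobManager prints the container's process tree. It shows two duplicate YarnTaskExecutorRunner JVM processes with identical arguments, one the parent of the other.
Flink version: 1.6.2
What is going wrong with my TaskManager containers? Have you seen this problem before?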
jobmanager.log:
2020-07-13 08:20:38,353 INFO org.apache.flink.yarn.YarnResourceManager - Closing TaskExecutor connection container_e17_1594464964771_78288_01_000002 because: Container [pid=126548,containerID=container_e17_1594464964771_78288_01_000002] is running beyond physical memory limits. Current usage: 2.5 GB of 2 GB physical memory used; 7.0 GB of 4.2 GB virtual memory used. Killing container.
Dump of the process-tree for container_e17_1594464964771_78288_01_000002 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 133231 126644 126548 126548 java 0 360 3703341056 324345 /usr/local/jdk1.8.0_112/bin/java -Xms1304m -Xmx1304m -XX:MaxDirectMemorySize=744m -Djob_name=squirrel_mt_netsec.db_conn_ip -Dengine_type=FLINK -Dlog.file=/data1/hadoop/yarn/userlogs/application_1594464964771_78288/container_e17_1594464964771_78288_01_000002/taskmanager.log -Dlogback.configurationFile=file:./logback.xml -Dlog4j.configuration=file:./log4j.properties org.apache.flink.yarn.YarnTaskExecutorRunner --configDir .
|- 126548 126400 126548 126548 bash 0 3 120193024 352 /bin/bash -c /usr/local/jdk1.8.0_112/bin/java -Xms1304m -Xmx1304m -XX:MaxDirectMemorySize=744m -Djob_name=squirrel_mt_netsec.db_conn_ip -Dengine_type=FLINK -Dlog.file=/data1/hadoop/yarn/userlogs/application_1594464964771_78288/container_e17_1594464964771_78288_01_000002/taskmanager.log -Dlogback.configurationFile=file:./logback.xml -Dlog4j.configuration=file:./log4j.properties org.apache.flink.yarn.YarnTaskExecutorRunner --configDir . 1> /data1/hadoop/yarn/userlogs/application_1594464964771_78288/container_e17_1594464964771_78288_01_000002/taskmanager.out 2> /data1/hadoop/yarn/userlogs/application_1594464964771_78288/container_e17_1594464964771_78288_01_000002/taskmanager.err
|- 126644 126548 126548 126548 java 3872 777 3703341056 324345 /usr/local/jdk1.8.0_112/bin/java -Xms1304m -Xmx1304m -XX:MaxDirectMemorySize=744m -Djob_name=squirrel_mt_netsec.db_conn_ip -Dengine_type=FLINK -Dlog.file=/data1/hadoop/yarn/userlogs/application_1594464964771_78288/container_e17_1594464964771_78288_01_000002/taskmanager.log -Dlogback.configurationFile=file:./logback.xml -Dlog4j.configuration=file:./log4j.properties org.apache.flink.yarn.YarnTaskExecutorRunner --configDir .
Container killed on request. Exit code is 143
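For reference, the figures in the kill message can be reproduced from the dump above: YARN's reported usage matches the sum of the RSS and virtual memory of all three processes, so the child JVM (PID 133231) is counted on top of its parent. The sketch below is a hypothetical helper, not part of Flink or YARN. It assumes a 4 KiB page size (typical on x86_64 Linux) and that YARN's "GB" is 1024-based. It also shows that `-Xmx1304m` plus `-XX:MaxDirectMemorySize=744m` already adds up to the full 2 GB container.

```python
# Hypothetical helper: recompute the container memory figures from the
# process-tree dump in jobmanager.log.
# Assumption: 4 KiB pages (typical on x86_64 Linux).
PAGE_SIZE = 4096
GIB = 1024 ** 3  # assumption: YARN's "GB" here is 1024-based

# (pid, ppid, vmem_bytes, rss_pages) copied from the dump
PROCESSES = [
    (133231, 126644, 3703341056, 324345),  # child java process (the duplicate)
    (126548, 126400, 120193024, 352),      # bash launcher
    (126644, 126548, 3703341056, 324345),  # YarnTaskExecutorRunner JVM
]


def container_usage(processes, page_size=PAGE_SIZE):
    """Return (rss_bytes, vmem_bytes) summed over every process in the tree."""
    rss = sum(rss_pages * page_size for _, _, _, rss_pages in processes)
    vmem = sum(vmem_bytes for _, _, vmem_bytes, _ in processes)
    return rss, vmem


rss, vmem = container_usage(PROCESSES)
print(f"physical: {rss / GIB:.1f} GB, virtual: {vmem / GIB:.1f} GB")

# The same sum with the duplicate child (PID 133231) left out
rss_wo, _ = container_usage([p for p in PROCESSES if p[0] != 133231])
print(f"physical without duplicate: {rss_wo / GIB:.2f} GB")

# JVM memory budget taken from the command line: heap + direct memory
heap_mb, direct_mb = 1304, 744
print(f"-Xmx + MaxDirectMemorySize = {heap_mb + direct_mb} MB")
```

The first line prints `physical: 2.5 GB, virtual: 7.0 GB`, matching the kill message. Without the child process the tree would use roughly 1.24 GB, which is under the 2 GB limit. This suggests the duplicate process, not the job itself, is what pushes the container over. The 4.2 GB virtual limit is consistent with 2 GB times YARN's default `yarn.nodemanager.vmem-pmem-ratio` of 2.1.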