Describe the bug
I upgraded my Doris cluster to the master version and got an error when restarting the FE.
Checking the log, I found the following:
2020-09-26 11:37:18,961 WARN (UNKNOWN 172.28.18.140_9010_1591588831143(-1)|1) [Catalog.notifyNewFETypeTransfer():2356] notify new FE type transfer: UNKNOWN
2020-09-26 11:37:20,967 WARN (RepNode 172.28.18.140_9010_1591588831143(-1)|56) [Catalog.notifyNewFETypeTransfer():2356] notify new FE type transfer: MASTER
2020-09-26 11:37:21,162 ERROR (stateListener|67) [EditLog.loadJournal():804] Operation Type 29
java.lang.NullPointerException: null
at org.apache.doris.consistency.ConsistencyChecker.replayFinishConsistencyCheck(ConsistencyChecker.java:373) ~[palo-fe.jar:3.4.0]
at org.apache.doris.persist.EditLog.loadJournal(EditLog.java:332) [palo-fe.jar:3.4.0]
at org.apache.doris.catalog.Catalog.replayJournal(Catalog.java:2497) [palo-fe.jar:3.4.0]
at org.apache.doris.catalog.Catalog.transferToMaster(Catalog.java:1167) [palo-fe.jar:3.4.0]
at org.apache.doris.catalog.Catalog.access$1100(Catalog.java:261) [palo-fe.jar:3.4.0]
at org.apache.doris.catalog.Catalog$4.runOneCycle(Catalog.java:2414) [palo-fe.jar:3.4.0]
at org.apache.doris.common.util.Daemon.run(Daemon.java:116) [palo-fe.jar:3.4.0]
To Reproduce
Steps to reproduce the behavior:
- On the old version of Doris, run the command add follower ... (the follower FE to be added is NOT started yet).
- Run bin/stop_fe.sh to stop the old version of the FE.
- Upgrade the files to the new version of the FE, e.g. lib/* and webroot/*.
- Run bin/start_fe.sh to start the new version of the FE.
- Check the log and find the error shown above.
Workaround to avoid the problem:
- Roll back to the old version.
- On the old version of Doris, run the command drop follower ...
- Upgrade the files and restart the FE.
- The FE starts successfully.
- Run the command add follower ... on the new version.
Expected behavior
- Add a follower on the old version, regardless of whether that follower is running or not.
- Upgrade the FE to the new version.
- The FE restarts successfully.
1 answer
This is strange. The code shows that the NullPointerException happens because the tablet does not exist.
It is hard to debug further without the full fe.log.
However, this is not a very serious problem: we can modify the code to simply skip this tablet.
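The suggested fix (skip the missing tablet while replaying the journal entry instead of dereferencing null) could look roughly like the sketch below. It is a minimal, self-contained illustration under stated assumptions: Tablet, ConsistencyCheckInfo, the HashMap tablet index and the class name SkipMissingTabletSketch are hypothetical, simplified stand-ins, not the real Doris classes or the actual code in ConsistencyChecker.replayFinishConsistencyCheck.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified sketch; these are NOT the real Doris classes.
public class SkipMissingTabletSketch {

    // Hypothetical tablet record that stores the last checked version.
    static final class Tablet {
        final long id;
        long checkedVersion = -1;

        Tablet(long id) {
            this.id = id;
        }
    }

    // Hypothetical journal payload for a "finish consistency check" entry.
    static final class ConsistencyCheckInfo {
        final long tabletId;
        final long checkedVersion;

        ConsistencyCheckInfo(long tabletId, long checkedVersion) {
            this.tabletId = tabletId;
            this.checkedVersion = checkedVersion;
        }
    }

    // Hypothetical in-memory index from tablet id to tablet.
    private final Map<Long, Tablet> tabletIndex = new HashMap<>();

    void addTablet(Tablet tablet) {
        tabletIndex.put(tablet.id, tablet);
    }

    /**
     * Replays one entry. Returns true if it was applied, or false if the
     * tablet no longer exists and the entry was skipped.
     */
    boolean replayFinishConsistencyCheck(ConsistencyCheckInfo info) {
        Tablet tablet = tabletIndex.get(info.tabletId);
        if (tablet == null) {
            // Instead of throwing a NullPointerException and aborting the
            // replay, log a warning and skip this entry.
            System.err.println("tablet " + info.tabletId
                    + " does not exist, skip replaying consistency check");
            return false;
        }
        tablet.checkedVersion = info.checkedVersion;
        return true;
    }

    public static void main(String[] args) {
        SkipMissingTabletSketch checker = new SkipMissingTabletSketch();
        checker.addTablet(new Tablet(10001L));
        boolean applied = checker.replayFinishConsistencyCheck(new ConsistencyCheckInfo(10001L, 5L));
        boolean skipped = !checker.replayFinishConsistencyCheck(new ConsistencyCheckInfo(20002L, 7L));
        // Prints "applied=true skipped=true" (plus a warning on stderr).
        System.out.println("applied=" + applied + " skipped=" + skipped);
    }
}
```

Skipping is reasonable here because the check result only concerns a tablet that is already gone, so ignoring the entry loses no live metadata, while throwing stops the whole journal replay and prevents the FE from becoming MASTER.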