Spark SHC core: log region/region server

Posted by xpcnnkqh on 2022-11-16 in Apache


I'm using the SHC Spark connector by Hortonworks to read an HBase table:
https://github.com/hortonworks-spark/shc
Some of my tasks take a very long time to complete. I suspect this is because of region size skew, and I would like to confirm it by logging which region/region server each task is reading.
I tried turning on debug logs by doing the following in the driver:

Logger.getLogger("org").setLevel(Level.DEBUG);
Logger.getLogger("akka").setLevel(Level.DEBUG);

But it didn't seem to have any effect.
Is it possible to log the above somehow?

apache-spark

Source: https://stackoverflow.com/questions/74131888/spark-shc-core-log-region-regionserver

1 Answer


oalqel3c1#

"it didn't seem to have any effect."
Unfortunately, SHC itself does not log the region or region server name anywhere during execution, which is why enabling DEBUG logging does not help.
"Is it possible to log the above somehow?"
Yes, but only if you customize SHC's source code. You would need to insert your own log statement, then rebuild, test, package, and ship the modified connector with your application.
What to log depends on your goal. For example, you might call logDebug() or logInfo() with the region name during a table-scan task; the relevant source code is HBaseTableScan.
The build, test, and shipping details are described in the documentation in SHC's repo.
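As a rough illustration of what such a custom log statement could report, here is a minimal Java sketch that resolves which region (and region server) contains a given row key. Everything in it is an assumption made for illustration: the class RegionLogSketch, the describe method, the host names, and the use of String keys are hypothetical and are not part of SHC or the HBase client API. Real HBase row keys are byte arrays compared lexicographically, and the region boundaries would come from the cluster rather than a hand-built map.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of SHC or HBase: models a table's regions as a
// sorted map from region start key to the region server hosting that region.
class RegionLogSketch {

    // Returns a log-friendly description of the region containing rowKey.
    // A region covers [startKey, nextStartKey), and the first region has an
    // empty start key, so the owner is the greatest start key <= rowKey.
    static String describe(TreeMap<String, String> serverByStartKey, String rowKey) {
        Map.Entry<String, String> region = serverByStartKey.floorEntry(rowKey);
        if (region == null) {
            return "row=" + rowKey + " region=<unknown>";
        }
        // higherKey returns null for the last region, which has an open end.
        String endKey = serverByStartKey.higherKey(region.getKey());
        return "row=" + rowKey
                + " regionStart='" + region.getKey() + "'"
                + " regionEnd='" + (endKey == null ? "" : endKey) + "'"
                + " server=" + region.getValue();
    }

    public static void main(String[] args) {
        // Made-up region layout: three regions split at "m" and "t".
        TreeMap<String, String> regions = new TreeMap<>();
        regions.put("", "rs1.example.com");
        regions.put("m", "rs2.example.com");
        regions.put("t", "rs3.example.com");
        System.out.println(describe(regions, "potato"));
    }
}
```

In a patched connector, a string like this would be passed to logInfo() or logDebug() inside the scan task. Because tasks run on executors, the message would appear in the executor logs rather than the driver log.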

Answered on 2022-11-16
