MariaDB Galera cluster performance degrades after a node resyncs

ef1yzkbh  posted 12 months ago in Other

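I run a Galera master-master cluster on MariaDB 10.4, with Galera 4 as the replication library. It worked perfectly until network problems occurred between some of the nodes. After a node reconnected and resynced its state, the performance of the whole cluster dropped sharply. The only way to restore performance was to rebuild the entire cluster from a single donor node.
This is what I get in the log: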

2023-03-19  1:23:20 0 [Note] WSREP: ####### Adjusting cert position: 61149358 -> 61149359 
    2023-03-19  1:23:20 0 [Note] WSREP: Service thread queue flushed. 
    2023-03-19  1:23:20 0 [Note] WSREP: Lowest cert index boundary for CC from ist: 61149189 
    2023-03-19  1:23:20 0 [Note] WSREP: Min available from gcache for CC from ist: 60988106 
    2023-03-19  1:23:20 0 [Note] WSREP: Receiving IST...100.0% (545/545 events) complete. 
    2023-03-19  1:23:21 2 [Note] WSREP:
    ================================================ View:   id: 6caf7137-be5b-11ed-952d-8e2998c314a7:61149359   status: primary   protocol_version: 4   capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO   final: no   own_index: 2   members(4):
            0: 38ac2156-c3e1-11ed-b954-ba79e6d7687d, DB1
            1: 637c05dc-c17f-11ed-b3f2-0e364b289bef, DB3
            2: 667386bc-c5db-11ed-ad68-ee03169ac5b9, DB4
            3: c9c34048-c3e0-11ed-8cbc-13957f2241e4, DB2
    ================================================= 
    2023-03-19  1:23:21 2 [Note] WSREP: Server status change initialized -> joined                 
    2023-03-19  1:23:21 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. 
    2023-03-19  1:23:21 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. 
    2023-03-19  1:23:21 2 [Note] WSREP: Draining apply monitors after IST up to 61149359 
    2023-03-19  1:23:21 2 [Note] WSREP: IST received: 6caf7137-be5b-11ed-952d-8e2998c314a7:61149359 
    2023-03-19  1:23:21 2 [Note] WSREP: Lowest cert index boundary for CC from sst: 61149189
    2023-03-19  1:23:21 2 [Note] WSREP: Min available from gcache for CC from sst: 60988107 
    2023-03-19  1:23:21 0 [Note] WSREP: 2.0 (DB4): State transfer from 3.0 (DB2) complete. 
    2023-03-19  1:23:21 0 [Note] WSREP: Shifting JOINER -> JOINED (TO: 61149618) 
    2023-03-19  1:23:21 0 [Note] WSREP: Processing event queue:...  0.0% (  0/260 events) complete. 
    2023-03-19  1:23:32 2 [Note] WSREP: Processing event queue:... 56.8% (208/366 events) complete. 
    2023-03-19  1:23:40 0 [Note] WSREP: Member 2.0 (DB4) synced with group.
    2023-03-19  1:23:40 0 [Note] WSREP: Processing event queue:...100.0% (402/402 events) complete. 
    2023-03-19  1:23:40 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 61149756) 
    2023-03-19  1:23:43 2 [Note] WSREP: Server DB4 synced with group.
    2023-03-19  1:23:43 2 [Note] WSREP: Server status change joined -> synced 
    2023-03-19  1:23:43 2 [Note] WSREP: Synchronized with group, ready for connections 
    2023-03-19  1:23:43 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. 
    2023-03-19  1:23:59 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61150337, -110 (Connection timed out) 
    2023-03-19  1:24:15 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61151132, -110 (Connection timed out) 
    2023-03-19  1:24:17 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61151205, -110 (Connection timed out) 
    2023-03-19  1:24:23 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61151478, -110 (Connection timed out) 
    2023-03-19  1:24:33 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61151724, -110 (Connection timed out) 
    2023-03-19  1:24:38 0 [Note] InnoDB: Buffer pool(s) load completed at 230319  1:24:38 
    2023-03-19  1:24:41 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61152114, -110 (Connection timed out) 
    2023-03-19  1:24:43 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61152261, -110 (Connection timed out) 
    2023-03-19  1:24:46 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61152364, -110 (Connection timed out) 
    2023-03-19  1:27:32 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61158126, -110 (Connection timed out) 
    2023-03-19  1:27:36 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61158269, -110 (Connection timed out) 
    2023-03-19  1:27:40 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61158385, -110 (Connection timed out) 
    2023-03-19  1:27:47 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61158486, -110 (Connection timed out) 
    2023-03-19  1:28:12 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61159789, -110 (Connection timed out) 
    2023-03-19  1:28:15 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61159889, -110 (Connection timed out) 
    2023-03-19  1:28:18 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61160056, -110 (Connection timed out) 
    2023-03-19  1:28:21 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61160171, -110 (Connection timed out)

My wsrep configuration:

[galera]
    wsrep_on=ON
    wsrep_provider=/usr/lib/libgalera_smm.so                      
    wsrep_cluster_address="gcomm://xx.xx.xx.xxx,xxx.xx.xx.xxx,xx.xxx.xx.xxx,xx.xxx.xxx.xx"
    wsrep_sst_method=mariabackup
    wsrep_sst_auth=xxxxxxxxx:xxxxxxxxx
    binlog_format=row
    default_storage_engine=InnoDB
    innodb_autoinc_lock_mode=2
    wsrep_cluster_name="xxxxxxx"
    wsrep_node_address="xx.xxx.xxx.xx"
    wsrep_node_name="DBX"

What causes this performance problem, and how can I fix it? Please help.


2q5ifsrm1#

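Generally speaking, this should not happen. Resyncing a node should not affect that node's performance afterwards; it should work as expected. The resync itself carries significant overhead because data has to be transferred, but once it completes, no further impact is expected.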
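The log you shared is a typical picture of a node joining the cluster and performing a state transfer via IST (Incremental State Transfer). The warnings at the end of the log may indicate some network hiccups, but it is hard to comment based only on the short time window the log covers.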
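To look at those warnings over a longer window than the excerpt, something like the following could help. It is only a rough sketch: it assumes the error log keeps the exact line format shown in the question, and the names `parse_timeouts` and `summarize` are made up for illustration. It counts the "Failed to report last committed" warnings and measures how far the reported seqno advanced between the first and the last one.

```python
import re
from datetime import datetime

# Matches warning lines in the format shown in the question's log, e.g.
# "2023-03-19  1:23:59 0 [Warning] WSREP: Failed to report last committed <uuid>:<seqno>, -110 (...)"
WARNING_RE = re.compile(
    r"^\s*(\d{4}-\d{2}-\d{2})\s+(\d{1,2}:\d{2}:\d{2}).*"
    r"Failed to report last committed [0-9a-f-]+:(\d+), -110"
)

def parse_timeouts(log_text):
    """Return a list of (timestamp, seqno) for every report-timeout warning."""
    events = []
    for line in log_text.splitlines():
        m = WARNING_RE.match(line)
        if m:
            ts = datetime.strptime(f"{m.group(1)} {m.group(2)}", "%Y-%m-%d %H:%M:%S")
            events.append((ts, int(m.group(3))))
    return events

def summarize(events):
    """Summarize the window between the first and the last warning."""
    if len(events) < 2:
        return None
    (t0, s0), (t1, s1) = events[0], events[-1]
    seconds = (t1 - t0).total_seconds()
    if seconds <= 0:
        return None
    return {
        "warnings": len(events),
        "seconds": seconds,
        "seqno_advance": s1 - s0,
        "seqno_per_second": (s1 - s0) / seconds,
    }

# First and last warning copied from the log in the question.
SAMPLE = """\
2023-03-19  1:23:59 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61150337, -110 (Connection timed out)
2023-03-19  1:28:21 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61160171, -110 (Connection timed out)
"""

if __name__ == "__main__":
    print(summarize(parse_timeouts(SAMPLE)))
```

If the warnings keep appearing long after the node has reached SYNCED, that would point more towards an ongoing network problem than towards the resync itself.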
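That said, there are potential reasons why a node may become slower after a rebuild. They should affect only specific queries and should not be visible in short, fast, properly indexed queries.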
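MySQL, or more precisely its storage engine InnoDB, keeps track of what is stored in tables. It estimates what the rows look like: whether a table contains many distinct values or whether most rows hold the same value. It does not analyze all rows, only a small sample. We won't go into the details here, but these statistics influence how the optimizer builds query execution plans. So if you rebuild a node via SST, especially if it is a full rebuild, the statistics may change (a different sample was taken), which may affect query execution plans.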
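To make the sampling effect concrete, here is a toy simulation. It is not InnoDB's actual estimator; the page layout, the skew, and the scaling formula are all assumptions chosen only to show that a different random sample of pages can yield a different distinct-value estimate for the same data.

```python
import random

def build_pages(num_values=2000, rows_per_page=50):
    """Build a toy 'index': sorted values with skewed frequencies, split into pages."""
    rows = []
    for v in range(num_values):
        rows.extend([v] * (1 + v % 10))  # value v appears 1 to 10 times
    return [rows[i:i + rows_per_page] for i in range(0, len(rows), rows_per_page)]

def estimate_distinct(pages, sample_pages, seed):
    """Estimate the number of distinct values from a random sample of pages.

    Toy estimator (an assumption, not InnoDB's algorithm): count the distinct
    values in the sampled pages and scale up by the sampling fraction.
    """
    rng = random.Random(seed)
    sample = rng.sample(pages, sample_pages)
    distinct_in_sample = len({value for page in sample for value in page})
    return round(distinct_in_sample * len(pages) / sample_pages)

if __name__ == "__main__":
    pages = build_pages()
    # Same data, same sample size, different samples.
    for seed in (1, 2, 3):
        print("seed", seed, "estimate", estimate_distinct(pages, 20, seed))
    # Reading every page gives the true distinct count for comparison.
    print("exact", estimate_distinct(pages, len(pages), 0))
```

When the estimates for an indexed column shift enough, the optimizer may judge a different index or join order to be cheaper, which is the mechanism described above.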
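This is unlikely, but you can check it easily. If you know what the query execution plan (EXPLAIN SELECT...) of a particular slow query looked like before, you can check the plan again after the rebuild and see whether it differs.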
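If you have saved the EXPLAIN output from before the rebuild, the comparison can be automated. The sketch below assumes each plan has already been fetched as a list of dicts, one per EXPLAIN row, keyed by the usual column names; `diff_plans`, the table name `orders`, and the index name `idx_customer` are hypothetical and used only for illustration.

```python
# EXPLAIN columns that most directly reflect the optimizer's choice.
PLAN_COLUMNS = ("table", "type", "key", "rows", "Extra")

def diff_plans(before, after, columns=PLAN_COLUMNS):
    """Return a list of (location, old_value, new_value) for every difference."""
    changes = []
    if len(before) != len(after):
        changes.append(("row_count", len(before), len(after)))
    for i, (old_row, new_row) in enumerate(zip(before, after)):
        for col in columns:
            if old_row.get(col) != new_row.get(col):
                changes.append((f"row {i} {col}", old_row.get(col), new_row.get(col)))
    return changes

if __name__ == "__main__":
    # Hypothetical plans: an index lookup before the rebuild, a full scan after.
    plan_before = [{"table": "orders", "type": "ref", "key": "idx_customer",
                    "rows": 12, "Extra": "Using where"}]
    plan_after = [{"table": "orders", "type": "ALL", "key": None,
                   "rows": 480000, "Extra": "Using where"}]
    for change in diff_plans(plan_before, plan_after):
        print(change)
```

A change in `type` or `key` is the interesting signal; a small drift in `rows` alone usually just reflects re-sampled statistics.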
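Finally, the fact that the node was restarted may also affect performance. The buffer pool has to be filled again, either by incoming queries or from the buffer pool dump saved before the restart. This would only be a temporary slowdown, though.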
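One way to see whether the buffer pool is still warming up is to compare the InnoDB counters `Innodb_buffer_pool_read_requests` (logical reads) and `Innodb_buffer_pool_reads` (reads that had to go to disk) from SHOW GLOBAL STATUS at two points in time. The sketch below assumes you have already collected those two snapshots as dicts; the function name and the sample numbers are made up.

```python
def buffer_pool_hit_ratio(before, after):
    """Hit ratio over the interval between two SHOW GLOBAL STATUS snapshots.

    Both arguments are dicts holding the two InnoDB counters as integers.
    Returns None if no read requests happened in the interval.
    """
    requests = (after["Innodb_buffer_pool_read_requests"]
                - before["Innodb_buffer_pool_read_requests"])
    disk_reads = (after["Innodb_buffer_pool_reads"]
                  - before["Innodb_buffer_pool_reads"])
    if requests <= 0:
        return None
    return 1.0 - disk_reads / requests

if __name__ == "__main__":
    # Hypothetical snapshots taken a minute apart shortly after a restart.
    snap_a = {"Innodb_buffer_pool_read_requests": 1_000, "Innodb_buffer_pool_reads": 100}
    snap_b = {"Innodb_buffer_pool_read_requests": 11_000, "Innodb_buffer_pool_reads": 600}
    print(f"hit ratio: {buffer_pool_hit_ratio(snap_a, snap_b):.3f}")
```

A ratio that climbs back towards its usual level within minutes fits the temporary slowdown described above. Note that the log in the question already reports "Buffer pool(s) load completed" at 1:24:38, so a cold cache alone may not explain a lasting drop.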
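We hope this points you in a good direction. The truth is that such cases are very hard to analyze without hands-on access to the database and the data.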
