Apache ZooKeeper: distributing nodes across data centers

00jrzges  posted on 2022-12-16  in Apache

I am working on a brand new SolrCloud - ZooKeeper infrastructure.
Some background information:

  • all other services (mostly web site infrastructure) are distributed across two data centers, with active-active configurations.
  • at the network level, the servers are set up on extended LANs, with dark fibre between the data centers, so latency is minimal.
  • the SolrCloud - ZooKeeper infrastructure will be used by most of these applications.

I have a SolrCloud cluster and a ZooKeeper ensemble running. Implementation at this level is fine.
But I wonder how to distribute my ZooKeeper servers. I must have an odd number of servers, but I only have two data centers, so one of them always hosts the majority. If one data center fails, there is a 50-50 chance that it is the one holding the majority, and the ensemble loses quorum.
What should I do? So far I have thought of:

  • requesting a third data center (not likely to happen, $$$!)
  • host two per data center and two on an external cloud provider (Amazon or ...?). Again $$$
  • set up an odd number at data center 1 and use an observer on site 2. What then happens if site 1 fails? Can SolrCloud work with only one observer?
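
The majority rule behind these options can be sketched in a few lines of Python. This is only an illustration: the site names and voter counts below are assumptions chosen to mirror the options listed, and observers are modelled as zero voters because they do not take part in the quorum.

```python
def has_quorum(voters_per_site, failed_sites=()):
    """Return True if the surviving voters form a strict majority."""
    total = sum(voters_per_site.values())
    alive = sum(
        count for site, count in voters_per_site.items()
        if site not in failed_sites
    )
    return alive > total // 2


def survives_any_single_site_failure(voters_per_site):
    """Return True if quorum holds no matter which single site goes down."""
    return all(
        has_quorum(voters_per_site, {site}) for site in voters_per_site
    )


# Hypothetical layouts (voting servers per site); not taken from a real setup.
layouts = {
    "two sites, 2 + 1 voters": {"dc1": 2, "dc2": 1},
    "three sites, 1 + 1 + 1 voters": {"dc1": 1, "dc2": 1, "site3": 1},
    "3 voters at dc1, observer only at dc2": {"dc1": 3, "dc2": 0},
}

for name, layout in layouts.items():
    print(name, "->", survives_any_single_site_failure(layout))
```

Under these assumptions only the three-site layout keeps a majority after the loss of any one site. With voters at site 1 only, losing site 1 leaves no voters at all, so an observer at site 2 cannot keep the ensemble writable on its own.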

64jmpszr  #1

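If your requirement is to serve all search requests from the local data center (the one where the request originates), then you do not need to deploy ZooKeeper across data centers.
A cross-data-center ZooKeeper deployment is only needed to survive the crash of a whole DC (which most likely will not happen; that is what you are paying $$$$ for), so in this case there is no need to spread the ZooKeeper cluster over multiple data centers.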

eqoofvh9  #2

I got a third site to host a third ZooKeeper instance. This site is another office of my company, not a "full data center". So each site has one ZooKeeper instance.
What allowed me to spread one cluster over three sites is that they are close enough together to have dark fiber between them. The latency is very low and does not impact ZooKeeper performance.
For Solr, I have full replicas in the two main data centers. The third office only hosts a ZooKeeper instance for quorum. With full replicas, all the data is available in each data center. If my Solr needs grow later, I will shard, but for now our index is small.
It has proven solid for four years now, with only one failure, and that was at the third office, not in a data center.
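
As a rough illustration of such a one-voter-per-site layout, the Python sketch below renders the `server.N=host:peerPort:electionPort` lines that go into each node's `zoo.cfg`, plus the comma-separated ZooKeeper address list that Solr is pointed at. The hostnames are hypothetical placeholders, and the ports (2888, 3888, 2181) are the ones conventionally used in the ZooKeeper documentation, not values taken from this setup.

```python
def render_server_lines(hosts, peer_port=2888, election_port=3888):
    """Render zoo.cfg 'server.N=' lines, one voting member per host."""
    return [
        f"server.{index}={host}:{peer_port}:{election_port}"
        for index, host in enumerate(hosts, start=1)
    ]


def render_zk_host(hosts, client_port=2181):
    """Render the comma-separated ZooKeeper address list used by clients."""
    return ",".join(f"{host}:{client_port}" for host in hosts)


# Hypothetical hostnames: one ZooKeeper voter per site.
hosts = [
    "zk.dc1.example.com",
    "zk.dc2.example.com",
    "zk.office3.example.com",
]

# The same server list goes into zoo.cfg on every node; each node
# additionally needs a myid file containing its own N.
print("\n".join(render_server_lines(hosts)))
print(render_zk_host(hosts))
```

The Solr nodes in the two main data centers would all use the same address list, so they can still reach a majority of the ensemble if any single site is unreachable.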
