Explaining Apache ZooKeeper

vecaoik1 posted on 2022-12-09 in Apache

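I am trying to understand ZooKeeper: how it works and what it does. Is there any application which is comparable to ZooKeeper?
If you know, then how would you describe ZooKeeper to a layman?
I have tried the Apache wiki and the ZooKeeper SourceForge page, but I am still not able to relate to it.
I just read through http://zookeeper.sourceforge.net/index.sf.shtml, so aren't there more services like this? Is it as simple as just replicating a server service?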

roejwanj1#

In a nutshell, ZooKeeper helps you build distributed applications.

How it works

You may describe ZooKeeper as a replicated synchronization service with eventual consistency. It is robust, since the persisted data is distributed between multiple nodes (this set of nodes is called an "ensemble") and one client connects to any of them (i.e., a specific "server"), migrating if one node fails; as long as a strict majority of nodes are working, the ensemble of ZooKeeper nodes is alive. In particular, a master node is dynamically chosen by consensus within the ensemble; if the master node fails, the role of master migrates to another node.

How writes are handled

The master is the authority for writes: in this way writes can be guaranteed to be persisted in-order, i.e., writes are linear. Each time a client writes to the ensemble, a majority of nodes persist the information: these nodes include the server for the client, and obviously the master. This means that each write makes the server up-to-date with the master. It also means, however, that you cannot have concurrent writes.
The guarantee of linear writes is the reason for the fact that ZooKeeper does not perform well for write-dominant workloads. In particular, it should not be used for interchange of large data, such as media. As long as your communication involves shared data, ZooKeeper helps you. When data could be written concurrently, ZooKeeper actually gets in the way, because it imposes a strict ordering of operations even if not strictly necessary from the perspective of the writers. Its ideal use is for coordination, where messages are exchanged between the clients.

How reads are handled

This is where ZooKeeper excels: reads are concurrent since they are served by the specific server that the client connects to. However, this is also the reason for the eventual consistency: the "view" of a client may be outdated, since the master updates the corresponding server with a bounded but undefined delay.

In detail

The replicated database of ZooKeeper comprises a tree of znodes, which are entities roughly representing file system nodes (think of them as directories). Each znode may be enriched by a byte array, which stores data. Also, each znode may have other znodes under it, practically forming an internal directory system.

Sequential znodes

Interestingly, the name of a znode can be sequential, meaning that the name the client provides when creating the znode is only a prefix: the full name is also given by a sequential number chosen by the ensemble. This is useful, for example, for synchronization purposes: if multiple clients want to get a lock on a resource, they can each concurrently create a sequential znode on a location: whoever gets the lowest number is entitled to the lock.
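
The lock idea above can be sketched with a small in-memory stand-in. This is not the ZooKeeper client API: `SequentialParent`, `create_sequential` and `lock_holder` are hypothetical names used only for illustration, and the ten-digit zero-padded suffix is an assumption chosen to resemble the names ZooKeeper generates.

```python
import itertools


class SequentialParent:
    """In-memory stand-in for one parent znode that hands out sequence numbers."""

    def __init__(self):
        self._counter = itertools.count()  # one counter per parent, starting at 0
        self.children = {}                 # child name -> client that created it

    def create_sequential(self, prefix, owner):
        # The caller supplies only the prefix; the numeric suffix is assigned here.
        name = f"{prefix}{next(self._counter):010d}"
        self.children[name] = owner
        return name

    def lock_holder(self):
        # Whoever created the child with the lowest sequence number holds the lock.
        lowest = min(self.children, key=lambda n: int(n[-10:]))
        return self.children[lowest]


locks = SequentialParent()
names = [locks.create_sequential("lock-", c)
         for c in ("client-a", "client-b", "client-c")]
print(names)
print(locks.lock_holder())
```

Because the numbering is done in one place, the clients never have to agree among themselves on who came first.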

Ephemeral znodes

Also, a znode may be ephemeral: this means that it is destroyed when the session of the client that created it ends, for example after the client disconnects and its session times out. This is mainly useful in order to know when a client fails, which may be relevant when the client itself has responsibilities that should be taken over by a new client. Taking the example of the lock, as soon as the session of the client holding the lock ends, the other clients can check whether they are entitled to the lock.
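
A toy model of this hand-over, assuming each ephemeral node simply remembers the session that created it. `TinyEnsemble` and its methods are hypothetical illustration names, not ZooKeeper calls, and the paths are made up.

```python
class TinyEnsemble:
    """Toy model: an ephemeral node lives only as long as its owning session."""

    def __init__(self):
        self.nodes = {}  # path -> owning session id, or None for a persistent node

    def create(self, path, session=None):
        self.nodes[path] = session

    def close_session(self, session):
        if session is None:
            raise ValueError("a session id is required")
        # Ending a session removes every ephemeral node that session owned.
        gone = [p for p, owner in self.nodes.items() if owner == session]
        for path in gone:
            del self.nodes[path]
        return gone

    def lock_holder(self, prefix):
        # Zero-padded suffixes make lexical order equal to numeric order.
        candidates = sorted(p for p in self.nodes if p.startswith(prefix))
        return self.nodes[candidates[0]] if candidates else None


zk = TinyEnsemble()
zk.create("/config")                                # persistent
zk.create("/locks/lock-0000000000", session="s1")   # ephemeral, owned by s1
zk.create("/locks/lock-0000000001", session="s2")   # ephemeral, owned by s2

before = zk.lock_holder("/locks/lock-")
zk.close_session("s1")                              # s1 fails or disconnects
after = zk.lock_holder("/locks/lock-")
print(before, after)
```

The persistent `/config` node survives the session ending, while the lock passes to the next-lowest sequence number.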

Watches

The example related to client disconnection may be problematic if we needed to periodically poll the state of znodes. Fortunately, ZooKeeper offers an event system where a watch can be set on a znode. These watches may be set to trigger an event if the znode is specifically changed or removed or new children are created under it. This is clearly useful in combination with the sequential and ephemeral options for znodes.

Where and how to use it

A canonical example of ZooKeeper usage is distributed-memory computation, where some data is shared between client nodes and must be accessed/updated in a very careful way to account for synchronization.
ZooKeeper offers the library to construct your synchronization primitives, while the ability to run a distributed server avoids the single-point-of-failure issue you have when using a centralized (broker-like) message repository.
ZooKeeper is feature-light, meaning that mechanisms such as leader election, locks, barriers, etc. are not already present, but can be written on top of the ZooKeeper primitives. If the C/Java API is too unwieldy for your purposes, you should rely on libraries built on ZooKeeper such as Cages and especially Curator.

Where to read more

Apart from the official documentation, which is pretty good, I suggest reading Chapter 14 of Hadoop: The Definitive Guide, which has ~35 pages explaining essentially what ZooKeeper does, followed by an example of a configuration service.

jrcvhitl2#

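ZooKeeper is one of the best open-source servers and services for reliably coordinating distributed processes. ZooKeeper is a CP system (see the CAP theorem): it provides consistency and partition tolerance. Replicating the ZooKeeper state across all the nodes makes it an eventually consistent distributed service.
Moreover, any newly elected leader will update its followers with the proposals they are missing, or with a snapshot of the state if the followers are missing many proposals.
ZooKeeper also provides an API that is very easy to use. This blog post, Zookeeper Java API examples, has some examples if you are looking for them.
If your distributed service needs centralized, reliable and consistent configuration management, locks, queues, etc., you will find ZooKeeper a reliable choice.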

s5a0g9ez3#

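I understand ZooKeeper in general but had problems with the terms "quorum" and "split brain", so maybe I can share my findings with you (I consider myself a layman too).
Let's say we have a ZooKeeper cluster of 5 servers. One of the servers will become the leader and the others will become followers.

  • These 5 servers form a quorum. Quorum simply means "these servers can vote on who should be the leader" (more precisely, the quorum is the majority of them that must agree, here 3 of 5).
  • So the voting is based on majority. Majority simply means "more than half", so more than half of the servers must agree for a specific server to become the leader.
  • So there is this bad thing that may happen called "split brain". As far as I understand, a split brain is simply this: the cluster of 5 servers splits into two parts, or let's call them "server teams", with maybe one part of 2 servers and the other of 3 servers. This is a really bad situation, because if both "server teams" have to execute a specific command, how would you decide which team should be preferred? They might have received different information from the clients. So it is really important to know which "server team" is still relevant and which one can/should be ignored.
  • Majority is also the reason you should use an odd number of servers. If you have 4 servers and 2 of them get separated, then both "server teams" could say "hey, we want to decide who is the leader!" But how should you decide which 2 servers to choose? With 5 servers it is simple: the server team with 3 servers has the majority and is allowed to select the new leader.
  • Even if you have just 3 servers and one of them fails, the other 2 still form the majority and can agree that one of them will become the new leader.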
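
The majority arithmetic in the list above can be checked with a few lines. This is only a sketch of the counting rule, and `quorum_size` and `has_quorum` are hypothetical helper names.

```python
def quorum_size(ensemble_size):
    """Smallest group that is a strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def has_quorum(group_size, ensemble_size):
    """True if a group of this size may elect a leader and accept writes."""
    return group_size >= quorum_size(ensemble_size)


# (ensemble size, sizes of the two "server teams" after a split or failure)
for ensemble, split in [(5, (3, 2)), (4, (2, 2)), (3, (2, 1))]:
    verdict = [has_quorum(group, ensemble) for group in split]
    print(ensemble, split, verdict)
```

With 4 servers split 2/2, neither team reaches the required 3, so no team can act at all. A fourth server therefore tolerates no more failures than three do, which is the practical argument for odd ensemble sizes.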

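I realize that once you think about it for a while and understand the terms, it is not so complicated anymore. I hope this also helps others understand these terms.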

yws3nbqq4#

What problem does it solve?

Let's imagine we have a million files in a file store and the file count keeps increasing every minute of the day. Our task is to first process and then delete these files. One approach we can think of is to write a script that does this task and run multiple instances in parallel on multiple servers. We can even increase or decrease the server count based on demand. This is basically a distributed compute/data processing application.
Here, how can we ensure that the same file is not picked and processed by multiple servers at the same time? To solve this problem, all the servers should share information between them about which file is currently being processed.
This is where we can use something like ZooKeeper. When the first server wants to read a file, it can write to ZooKeeper the name of the file it's going to process. Now the rest of the servers can look up ZooKeeper and know that this file has already been picked up by the first server.
The above is a crude example and needs a few other guard rails in place, but I hope it gives an idea of what ZooKeeper is. ZK is basically a data store which can be accessed using the ZK APIs. But it should NOT be used as a database. Only a small amount of data should be stored (usually in KBs). The upper limit is 1MB per znode. ZK is specifically built so that distributed applications can communicate with each other.
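
One of the guard rails this example needs is that "look up, then write" must be a single atomic step, otherwise two servers can both see the file as free. A minimal in-memory sketch of that idea follows. `ClaimStore`, `NodeExists`, `try_claim` and the `/processing/...` path are hypothetical names for illustration, not the ZooKeeper API. The sketch assumes a create call that fails when the path already exists, which is the behaviour the claim relies on.

```python
import threading


class NodeExists(Exception):
    """Raised by this toy store when a path is already taken."""


class ClaimStore:
    """Toy stand-in for a store whose create() fails if the path exists."""

    def __init__(self):
        self._lock = threading.Lock()
        self._claims = {}

    def create(self, path, owner):
        with self._lock:  # check and write happen as one atomic step
            if path in self._claims:
                raise NodeExists(path)
            self._claims[path] = owner


def try_claim(store, filename, worker):
    """Return True if this worker won the right to process the file."""
    try:
        store.create(f"/processing/{filename}", worker)
        return True
    except NodeExists:
        return False


store = ClaimStore()
results = {}


def run_worker(name):
    results[name] = try_claim(store, "report-001.csv", name)


threads = [threading.Thread(target=run_worker, args=(f"server-{i}",))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

winners = [name for name, won in results.items() if won]
print(winners)  # exactly one server name; which one depends on scheduling
```

In a real deployment the claim would typically be an ephemeral znode, so that a crashed server's claim disappears and another server can pick the file up.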

Applications of ZK

Out of the box, it can be used for:

  • storing configuration: to store configuration that is accessed across your distributed application.
  • naming service: store information such as service name and IP address mapping in a central place, which enables users and applications to communicate across the network.
  • group membership: all the applications running on distributed servers can connect to ZK and send heartbeats. If any one server/application goes down then ZK can alert other servers/applications regarding this event.

Other features have to be built on top of the ZooKeeper API.

  • locks and queues - useful for distributed synchronization.
  • two phase commits - useful when we have to commit/rollback across servers.
  • leader election - your distributed applications can use ZK to hold leader elections for automatic failovers.
  • shared counter

This page explains how these features can be implemented: https://zookeeper.apache.org/doc/current/recipes.html
ZooKeeper can have many more applications. The features have to be built on top of the ZK APIs based on the requirements of your distributed system.
NOTE: ZK should not be used to store large amounts of data. It's not a cache/database. Use it to exchange the small pieces of information that your distributed applications need to start, operate and fail over.

How is data stored?

Data is stored in a hierarchical tree data structure. Each node in the tree is called a znode. The maximum size of a znode is 1MB. Znodes can have data and child znodes. Think of a znode as a folder on your computer that can contain files with data, except that the folder itself can also hold data, just like a file.
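
The "folder that also holds data" picture can be sketched as a small tree structure. This is an in-memory illustration only: `Znode`, `create` and `MAX_DATA_BYTES` are hypothetical names, the paths are made up, and the limit is simply the 1 MB figure quoted above.

```python
MAX_DATA_BYTES = 1024 * 1024  # assumption: the 1 MB figure quoted in the text


class Znode:
    """A tree node that carries its own data and may also have children."""

    def __init__(self, data=b""):
        self.children = {}  # child name -> Znode
        self.set_data(data)

    def set_data(self, data):
        if len(data) > MAX_DATA_BYTES:
            raise ValueError("znode data exceeds the size limit")
        self.data = data


def create(root, path, data=b""):
    """Create a node at an absolute path; the parent must already exist."""
    parts = path.strip("/").split("/")
    node = root
    for part in parts[:-1]:
        node = node.children[part]  # KeyError if a parent is missing
    node.children[parts[-1]] = Znode(data)


root = Znode()
create(root, "/app", b"app-level settings")   # a "folder" with its own data
create(root, "/app/db", b"host=db.internal")  # a child under that folder

app = root.children["app"]
print(app.data, list(app.children))
```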

Why use ZK instead of our own custom service?

  • Atomicity and Durability
  • ZooKeeper itself is distributed and fault tolerant. The architecture involves one leader node and multiple follower nodes. If a follower node goes down, failover is automatic: client sessions are replicated, so ZK can move clients to a different node. If the leader node goes down, a new leader is elected using the ZK consensus algorithm.
  • Reads are very fast since they are served from an in-memory store.
  • Writes are applied in the sequence in which they arrived, which maintains ordering.
  • Watches send a notification to the client that set the watch on some data. This reduces the need to poll ZK. Note that watches are one-time triggers: if you get a watch event and want to be notified of future changes, you must set another watch.
  • Persistent and ephemeral znodes are available. Both are stored on ZK disks. Persistent here means that the data remains after the client that created it disconnects. Ephemeral means the data is removed automatically when the client disconnects. Ephemeral znodes are not allowed to have children.
  • There are also persistent sequential and ephemeral sequential znodes. Here the names of the znodes get a sequence-number suffix. Similar to DB auto-increment IDs, these sequence numbers keep increasing and are managed by ZK. This can be useful for implementing queues, locks, etc.
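
The one-time nature of watches mentioned in the list above is easy to get wrong, so here is a small in-memory sketch of it. `WatchedStore` and its `get`/`set` methods are hypothetical stand-ins, not the ZooKeeper client API. The only behaviour modelled is that a watch fires once and must then be set again.

```python
class WatchedStore:
    """Toy key-value store with one-shot watches on individual paths."""

    def __init__(self):
        self.data = {}
        self._watches = {}  # path -> list of callbacks waiting for a change

    def get(self, path, watch=None):
        # Reading with a callback registers a watch for the NEXT change only.
        if watch is not None:
            self._watches.setdefault(path, []).append(watch)
        return self.data.get(path)

    def set(self, path, value):
        self.data[path] = value
        # pop() forgets the watches as it fires them: each one triggers once.
        for callback in self._watches.pop(path, []):
            callback(path)


events = []
store = WatchedStore()
store.set("/config", "v1")

store.get("/config", watch=events.append)  # set a watch
store.set("/config", "v2")                 # fires the watch
store.set("/config", "v3")                 # nothing fires: the watch is spent
print(events)

store.get("/config", watch=events.append)  # re-register to keep listening
store.set("/config", "v4")
print(events)
```

A client that wants continuous notifications therefore re-reads the data and sets a new watch each time an event arrives.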

Is there any application which is comparable to ZooKeeper?

etcd - https://etcd.io/docs/v3.3/learning/why/#zookeeper

5jdjgkvh5#

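ZooKeeper is a centralized open-source server for maintaining and managing configuration information, naming conventions and synchronization in a distributed cluster environment. ZooKeeper helps distributed systems reduce their management complexity by providing low latency and high availability. ZooKeeper was initially a sub-project of Hadoop but is now an independent top-level project of the Apache Software Foundation.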
More Information

q9rjltbz6#

I would suggest the following resources:

  1. The paper: https://pdos.csail.mit.edu/6.824/papers/zookeeper.pdf
  2. The lecture offered by MIT 6.824 from 36:00: https://youtu.be/pbmyrNjzdDk?t=2198
I would suggest watching the video, reading the paper, and then watching the video again. It will be easier to understand if you already know Raft.

yzxexxkh7#

My approach to understanding ZooKeeper was to play around with the CLI client, as described in the Getting Started Guide and the Command Line Interface documentation.
From this I learned that ZooKeeper's surface looks very similar to a filesystem: clients can create and delete objects and read or write data.

Example CLI commands

create /myfirstnode mydata
ls /
get /myfirstnode
delete /myfirstnode

Try yourself

How to spin up a ZooKeeper environment within minutes on Docker for Windows, Linux or Mac:
One time set up:

docker network create dn

Run server in a terminal window:

docker run --network dn --name zook -d zookeeper
docker logs -f zook

Run client in a second terminal window:

docker run -it --rm --network dn zookeeper zkCli.sh -server zook

See also documentation of image on dockerhub

qvk1mo1f8#

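Apache ZooKeeper is an open-source technology for coordinating and managing configuration in distributed applications. It simplifies tasks such as maintaining configuration details, enabling distributed synchronization, and managing naming registries.
Its name is apt: think of how a zookeeper takes care of all the animals, maintains their enclosures, feeds them, and so on.
Apache ZooKeeper can be used together with Apache projects such as Apache Pinot or Apache Flink. Apache Kafka also uses ZooKeeper to manage broker, topic and partition information. Since Apache ZooKeeper is open source, you can also pair it with any technology/project of your choice, not only Apache Foundation projects.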
