Pulsar Terminology
Here is a glossary of terms related to Apache Pulsar:
Concepts
Pulsar: Pulsar is a distributed messaging system originally created by Yahoo but now under the stewardship of the Apache Software Foundation.
Message: Messages are the basic unit of Pulsar. They're what producers publish to topics and what consumers then consume from topics.
Topic: A named channel used to pass messages published by producers to consumers who process those messages.
Partitioned Topic: A topic that is served by multiple Pulsar brokers, which enables higher throughput.
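As a rough illustration of how a partitioned topic spreads load, the sketch below routes each message to a partition by hashing its key. The `-partition-N` suffix follows Pulsar's naming for the internal topics behind a partitioned topic. The function name `route_to_partition` and the use of CRC32 are assumptions made for this example; they are not Pulsar's actual routing code.

```python
import zlib


def route_to_partition(topic: str, key: str, num_partitions: int) -> str:
    """Pick the internal partition topic for a message key (illustrative only)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # CRC32 is a stand-in hash chosen for this sketch, not Pulsar's own scheme.
    index = zlib.crc32(key.encode("utf-8")) % num_partitions
    return f"{topic}-partition-{index}"


if __name__ == "__main__":
    topic = "persistent://public/default/orders"  # hypothetical topic name
    for key in ("user-1", "user-2", "user-3"):
        print(key, "->", route_to_partition(topic, key, 4))
```

Because the partition depends only on the key, all messages with the same key land on the same partition, while different keys can be served by different brokers.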
Namespace: A grouping mechanism for related topics.
Namespace Bundle: A virtual group of topics that belong to the same namespace. A namespace bundle is defined as a range between two 32-bit hashes, such as 0x00000000 and 0xffffffff.
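To make the hash-range idea concrete, here is a minimal sketch that splits the 32-bit hash space into equal bundles and finds the bundle that owns a given hash. The helper names (`bundle_boundaries`, `find_bundle_for_hash`, `topic_hash`) are hypothetical, and CRC32 is only a stand-in for whatever hash function a real deployment uses.

```python
import bisect
import zlib
from typing import List, Tuple

HASH_SPACE = 1 << 32  # size of the 32-bit hash space


def bundle_boundaries(num_bundles: int) -> List[int]:
    """Return num_bundles + 1 boundaries splitting 0x00000000..0xffffffff evenly."""
    if num_bundles <= 0:
        raise ValueError("num_bundles must be positive")
    boundaries = [i * HASH_SPACE // num_bundles for i in range(num_bundles)]
    boundaries.append(0xFFFFFFFF)
    return boundaries


def find_bundle_for_hash(h: int, boundaries: List[int]) -> Tuple[int, int]:
    """Return the (lower, upper) range that contains hash h."""
    # Lower bounds are inclusive; the last range also includes 0xffffffff.
    index = min(bisect.bisect_right(boundaries, h) - 1, len(boundaries) - 2)
    return boundaries[index], boundaries[index + 1]


def topic_hash(topic: str) -> int:
    """Stand-in 32-bit hash of a topic name (an assumption for this sketch)."""
    return zlib.crc32(topic.encode("utf-8"))


if __name__ == "__main__":
    bounds = bundle_boundaries(4)
    lower, upper = find_bundle_for_hash(topic_hash("my-topic"), bounds)
    print(f"0x{lower:08x}_0x{upper:08x}")
```

With a single bundle, the only range is 0x00000000 to 0xffffffff, which matches the example range in the definition above.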
Tenant: An administrative unit for allocating capacity and enforcing an authentication/authorization scheme.
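Tenants, namespaces, and topics form a hierarchy that is visible in full topic names of the form `persistent://tenant/namespace/topic`. The sketch below splits such a name into its parts. `TopicName` and `parse_topic_name` are hypothetical helpers written for this glossary, and the parser only handles this three-level form.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str


def parse_topic_name(name: str) -> TopicName:
    """Split a full topic name into domain, tenant, namespace, and topic."""
    domain, sep, rest = name.partition("://")
    if not sep or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"expected a persistent:// or non-persistent:// name: {name!r}")
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic after the scheme: {name!r}")
    tenant, namespace, topic = parts
    return TopicName(domain, tenant, namespace, topic)


if __name__ == "__main__":
    print(parse_topic_name("persistent://public/default/my-topic"))
```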
Subscription: A lease on a topic established by a group of consumers. Pulsar has four subscription types (exclusive, shared, failover, and key_shared).
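The four subscription types differ mainly in how messages are spread across the consumers attached to a subscription. The toy dispatcher below sketches that difference. It is a simplification under stated assumptions: `dispatch` is a hypothetical function, the first consumer in the list stands in for the active consumer of a failover subscription, shared delivery is modeled as plain round-robin, and CRC32 stands in for the key hashing of key_shared.

```python
import zlib
from itertools import cycle
from typing import Dict, List, Tuple


def dispatch(messages: List[Tuple[str, str]], consumers: List[str],
             subscription_type: str) -> Dict[str, List[str]]:
    """Assign (key, payload) messages to consumers; returns consumer -> payloads."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    received: Dict[str, List[str]] = {c: [] for c in consumers}
    if subscription_type == "exclusive":
        # Only a single consumer may attach to an exclusive subscription.
        if len(consumers) != 1:
            raise ValueError("an exclusive subscription allows only one consumer")
        received[consumers[0]] = [payload for _, payload in messages]
    elif subscription_type == "failover":
        # One active consumer gets everything; the others are standbys.
        received[consumers[0]] = [payload for _, payload in messages]
    elif subscription_type == "shared":
        # Messages are spread across consumers in round-robin fashion.
        for consumer, (_, payload) in zip(cycle(consumers), messages):
            received[consumer].append(payload)
    elif subscription_type == "key_shared":
        # Messages with the same key always go to the same consumer.
        for key, payload in messages:
            index = zlib.crc32(key.encode("utf-8")) % len(consumers)
            received[consumers[index]].append(payload)
    else:
        raise ValueError(f"unknown subscription type: {subscription_type!r}")
    return received


if __name__ == "__main__":
    msgs = [("k1", "a"), ("k2", "b"), ("k1", "c")]
    for kind in ("failover", "shared", "key_shared"):
        print(kind, dispatch(msgs, ["c1", "c2"], kind))
```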
Pub-Sub: A messaging pattern in which producer processes publish messages on topics that are then consumed (processed) by consumer processes.
Producer: A process that publishes messages to a Pulsar topic.
Consumer: A process that establishes a subscription to a Pulsar topic and processes messages published to that topic by producers.
Reader: Pulsar readers are message processors much like Pulsar consumers but with two crucial differences:
- you can specify where on a topic readers begin processing messages (consumers always begin with the latest available unacked message);
- readers don't retain data or acknowledge messages.
Cursor: The subscription position for a consumer.
Acknowledgment (ack): A message sent to a Pulsar broker by a consumer to confirm that a message has been successfully processed. An acknowledgment (ack) is Pulsar's way of knowing that the message can be deleted from the system; if a message is not acknowledged, it is retained until it is processed.
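The reader, cursor, and acknowledgment entries above can be pictured with a small in-memory model. This is a sketch only: `ToyTopic` and its methods are hypothetical and do not correspond to any Pulsar API, acknowledgments are treated as cumulative for simplicity, and a new subscription is assumed to start at the end of the topic.

```python
from typing import Dict, List, Optional, Tuple


class ToyTopic:
    """In-memory stand-in for a topic with subscriptions and readers."""

    def __init__(self) -> None:
        self.messages: List[str] = []
        # Subscription name -> cursor (position of the next unacked message).
        self.cursors: Dict[str, int] = {}

    def publish(self, payload: str) -> int:
        self.messages.append(payload)
        return len(self.messages) - 1

    def subscribe(self, subscription: str) -> None:
        # Assumption: a new subscription starts after the messages already published.
        self.cursors.setdefault(subscription, len(self.messages))

    def consume(self, subscription: str) -> Optional[Tuple[int, str]]:
        """Return the next unacked (position, payload), or None if caught up."""
        position = self.cursors[subscription]
        if position >= len(self.messages):
            return None
        return position, self.messages[position]

    def ack(self, subscription: str, position: int) -> None:
        # The cursor only moves forward, and only when the consumer acks.
        self.cursors[subscription] = max(self.cursors[subscription], position + 1)

    def read_from(self, start: int) -> List[Tuple[int, str]]:
        """Reader-style access: caller picks the start; no state is stored."""
        return list(enumerate(self.messages))[start:]


if __name__ == "__main__":
    topic = ToyTopic()
    topic.publish("a")
    topic.subscribe("my-sub")
    topic.publish("b")
    print(topic.consume("my-sub"))  # unacked, so it stays available
    topic.ack("my-sub", 1)
    print(topic.consume("my-sub"))
    print(topic.read_from(0))
```

The consumer's progress lives in the subscription's cursor and survives across calls, whereas the reader passes its own starting position every time and leaves no trace on the topic.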
Negative Acknowledgment (nack): When an application fails to process a particular message, it can send a "negative ack" to Pulsar to signal that the message should be replayed at a later time. (By default, failed messages are replayed after a 1-minute delay.) Be aware that negative acknowledgment on ordered subscription types, such as Exclusive, Failover, and Key_Shared, can cause failed messages to arrive at consumers out of their original order.
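The ordering caveat is easier to see in a small simulation. The sketch below assumes messages are published at a fixed interval and that a nacked message is redelivered once after the redelivery delay; `processing_order` and its parameters are hypothetical names for this example, and the model ignores everything else a real broker does.

```python
import heapq
from typing import Iterable, List, Set, Tuple


def processing_order(messages: List[str], fail_first_attempt: Iterable[str],
                     redelivery_delay: float = 60.0,
                     publish_interval: float = 1.0) -> List[str]:
    """Return the order in which messages end up successfully processed."""
    failing = set(fail_first_attempt)
    # Each queue item is (delivery time, original sequence number, payload).
    queue: List[Tuple[float, int, str]] = [
        (i * publish_interval, i, msg) for i, msg in enumerate(messages)
    ]
    heapq.heapify(queue)
    already_failed: Set[int] = set()
    processed: List[str] = []
    while queue:
        now, seq, msg = heapq.heappop(queue)
        if msg in failing and seq not in already_failed:
            # First attempt fails: nack, then redeliver after the delay.
            already_failed.add(seq)
            heapq.heappush(queue, (now + redelivery_delay, seq, msg))
        else:
            processed.append(msg)
    return processed


if __name__ == "__main__":
    print(processing_order(["m1", "m2", "m3"], {"m2"}))
```

With the 1-minute default delay, `m2` is redelivered long after `m3` has been processed, so the consumer handles `m3` before `m2` even on an ordered subscription type.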
Unacknowledged: A message that has been delivered to a consumer for processing but not yet confirmed as processed by the consumer.
Retention Policy: Size and time limits that you can set on a namespace to configure retention of messages that have already been acknowledged.
Multi-Tenancy: The ability to isolate namespaces, specify quotas, and configure authentication and authorization on a per-tenant basis.
Architecture
Standalone: A lightweight Pulsar broker in which all components run in a single Java Virtual Machine (JVM) process. Standalone clusters can be run on a single machine and are useful for development purposes.
Cluster: A set of Pulsar brokers and BookKeeper servers (aka bookies). Clusters can reside in different geographical regions and replicate messages to one another in a process called geo-replication.
Instance: A group of Pulsar clusters that act together as a single unit.
Geo-Replication: Replication of messages across Pulsar clusters, potentially in different datacenters or geographical regions.
Configuration Store: Pulsar's configuration store (previously known as global ZooKeeper) is a ZooKeeper quorum that is used for configuration-specific tasks. A multi-cluster Pulsar installation requires just one configuration store across all clusters.
Topic Lookup: A service provided by Pulsar brokers that enables connecting clients to automatically determine which Pulsar cluster is responsible for a topic (and thus where message traffic for the topic needs to be routed).
Service Discovery: A mechanism provided by Pulsar that enables connecting clients to use just a single URL to interact with all the brokers in a cluster.
Broker: A stateless component of Pulsar clusters that runs two other components: an HTTP server exposing a REST interface for administration and topic lookup, and a dispatcher that handles all message transfers. Pulsar clusters typically consist of multiple brokers.
Dispatcher: An asynchronous TCP server used for all data transfers in and out of a Pulsar broker. The Pulsar dispatcher uses a custom binary protocol for all communications.
Storage
BookKeeper: Apache BookKeeper is a scalable, low-latency persistent log storage service that Pulsar uses to store data.
Bookie: Bookie is the name of an individual BookKeeper server. It is effectively the storage server of Pulsar.
Ledger: An append-only data structure in BookKeeper that is used to persistently store messages in Pulsar topics.
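A minimal sketch of the append-only idea follows. `ToyLedger` is a hypothetical in-memory class written for this glossary, not the BookKeeper client API: entries can only be added at the end, each gets the next sequential entry ID, and a closed ledger becomes read-only.

```python
from typing import List


class ToyLedger:
    """In-memory illustration of an append-only ledger."""

    def __init__(self, ledger_id: int) -> None:
        self.ledger_id = ledger_id
        self._entries: List[bytes] = []
        self._closed = False

    def add_entry(self, data: bytes) -> int:
        """Append an entry and return its sequential entry ID."""
        if self._closed:
            raise RuntimeError("ledger is closed and can no longer be written")
        self._entries.append(data)
        return len(self._entries) - 1

    def read_entry(self, entry_id: int) -> bytes:
        """Read back a previously appended entry; entries are never modified."""
        if not 0 <= entry_id < len(self._entries):
            raise IndexError(f"no entry {entry_id} in ledger {self.ledger_id}")
        return self._entries[entry_id]

    def close(self) -> None:
        # After closing, the ledger content is fixed.
        self._closed = True


if __name__ == "__main__":
    ledger = ToyLedger(ledger_id=1)
    first = ledger.add_entry(b"message-1")
    ledger.add_entry(b"message-2")
    ledger.close()
    print(first, ledger.read_entry(1))
```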
Functions: Pulsar Functions are lightweight functions that can consume messages from Pulsar topics, apply custom processing logic, and, if desired, publish results to topics.