Using Pulsar for Messaging and Streaming
Communication and data are critical to most modern systems. The world is built around moving data and being able to act on it. More components are being added, more internal and external systems are being integrated, and more data is moving between those components and systems.
As you keep adding systems, processes, and technologies, getting them all to talk to each other becomes difficult, and the teams responsible for making it all work become overwhelmed.
Many organizations struggle with two common challenges: communicating between services and moving data within the organization.
As an organization grows in requirements, use cases, and the amount of data it handles, communication can break down. For example, as more data flows through the system, a component may start to struggle with a spike in traffic or unplanned downtime. Likewise, a growing number of components trying to communicate and interoperate with each other creates its own challenges. It can become difficult to manage the data and difficult to evolve the system.
Message queuing helps you deal with these issues because it facilitates interactions between systems by exchanging messages asynchronously.
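To make the idea of asynchronous exchange concrete, here is a minimal sketch that uses only Python's standard library, not Pulsar itself. The `run_demo` function and the `order-N` message names are hypothetical. The sender drops messages into a buffer and moves on, while a separate worker drains the buffer at its own pace.

```python
import queue
import threading
from typing import List


def run_demo(num_messages: int = 5) -> List[str]:
    """Send messages through an in-memory buffer to a background worker."""
    buffer: queue.Queue = queue.Queue()  # stands in for a message queue
    processed: List[str] = []

    def consumer() -> None:
        # The consumer pulls messages whenever it is ready for them.
        while True:
            message = buffer.get()
            if message is None:  # sentinel: no more messages
                break
            processed.append(message.upper())

    worker = threading.Thread(target=consumer)
    worker.start()

    # The producer only enqueues; it never waits for processing to finish.
    for i in range(num_messages):
        buffer.put(f"order-{i}")
    buffer.put(None)

    worker.join()
    return processed


if __name__ == "__main__":
    print(run_demo(3))
```

Because the buffer sits between the two sides, a short burst of messages accumulates in the queue instead of overwhelming the consumer.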
In addition to facilitating the day-to-day communication between services, moving data from one place in your organization to another is critical. On a small scale, moving data around via files might work well, but as the amount of data increases, that approach breaks down. Companies also start to worry about security (such as encryption), compliance, and so on.
Data streaming is a solution to this problem because it can efficiently move large amounts of data into and throughout a system.
Using Pulsar for Messaging
Messaging provides a way to decouple services by communicating messages asynchronously.
Messages are the most basic unit of Pulsar. You can think of them as analogous to letters in a postal service system.
When you mail a letter, it carries information from you (the sender) to the receiver. You enclose your letter in an envelope and fill out the address. Once the letter is deposited into a collection box, it is picked up by the postal carrier and taken to the local post office. Next, the letter is taken to a processing center, where it is sorted according to shape, size, and zip code. Once the letter is processed, a postmark is applied and the letter is sent to the appropriate post office station. Finally, the postal carrier delivers the letter to the recipient.
As in the "post office" model, an application has a packet of data (not unlike a letter) it wants to send, and that data is sent to the Pulsar "broker" (or the central post office) that handles those messages. From there, the data is delivered to a Pulsar consumer (or in our analogy, the recipient's house). The main differences are:
- messages are sent very fast, in milliseconds.
- the recipient can handle large numbers of messages.
With application messaging, you are handling individual messages (one message at a time), and there are two typical ways to deliver data:
- distributing work using a queue
- fanning out messages to multiple interested parties (message bus)
Pulsar supports these classic use cases as well as advanced messaging patterns (such as scheduled delivery and failure handling). It was designed with messaging in mind, so Pulsar makes it easier to map your current design into the Pulsar model without extensive rework or the need to rearchitect your system around a streaming model.
Using Pulsar for Streaming
Streaming handles many messages (or joins different streams of messages) at a time. These systems deal with events that are interlinked.
You can think of streaming as a "conveyor belt" on which you are sending a sequence of data. Rather than sending individual items, a stream focuses on a range of items. When you look at the messages on the conveyor belt, you are looking at a range of messages in the order they arrived at the application.
As with messaging, the Pulsar broker manages the messages for you and sends the stream of data to the applications. Unlike messaging, streaming applications control when the data is delivered. In a messaging system, applications do not have control over when a message arrives.
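The difference in control can be sketched with a toy in-memory model. This is an illustration only and not the Pulsar client API: `ToyStream`, `ToyReader`, and their methods are hypothetical names. The stream keeps messages in arrival order, and the reader decides when to pull the next range and where to start.

```python
from typing import List


class ToyStream:
    """Append-only, ordered log of messages (hypothetical stand-in for a stream)."""

    def __init__(self) -> None:
        self._log: List[str] = []

    def append(self, message: str) -> int:
        """Store a message and return its position in the log."""
        self._log.append(message)
        return len(self._log) - 1

    def read(self, start: int, max_messages: int) -> List[str]:
        """Return up to max_messages messages beginning at position start."""
        return self._log[start:start + max_messages]


class ToyReader:
    """Tracks its own position, so the application chooses when to read."""

    def __init__(self, stream: ToyStream, position: int = 0) -> None:
        self._stream = stream
        self._position = position

    def read_next(self, max_messages: int) -> List[str]:
        """Pull the next range of messages and advance past them."""
        batch = self._stream.read(self._position, max_messages)
        self._position += len(batch)
        return batch

    def seek(self, position: int) -> None:
        """Move to an earlier or later point on the conveyor belt."""
        self._position = position


if __name__ == "__main__":
    stream = ToyStream()
    for i in range(5):
        stream.append(f"event-{i}")
    reader = ToyReader(stream)
    print(reader.read_next(2))   # the first range of messages
    print(reader.read_next(10))  # whatever remains
```

Nothing reaches the application until it calls `read_next`, which is the opposite of the messaging case, where each message arrives on the sender's schedule.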
Streaming use cases include:
- Moving large amounts of data to another service (logs to real-time ETL)
- Running periodic jobs to move large amounts of data and aggregate it into more traditional stores (logs to S3)
- Computing near real-time aggregates of a message stream (real-time analytics over page views)
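As a rough illustration of the last use case, the sketch below counts page views in fixed time windows. It is plain Python over an in-memory list rather than a Pulsar stream. The event shape (a timestamp in seconds paired with a page path), the function name, and the 60-second window are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

PageView = Tuple[float, str]  # (timestamp in seconds, page path)


def count_views_per_window(
    events: Iterable[PageView], window_seconds: int = 60
) -> Dict[Tuple[int, str], int]:
    """Count views per (window start, page) over an ordered stream of events."""
    counts: Counter = Counter()
    for timestamp, page in events:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, page)] += 1
    return dict(counts)


if __name__ == "__main__":
    views = [(0, "/home"), (30, "/home"), (59.9, "/docs"), (60, "/home")]
    for key, total in sorted(count_views_per_window(views).items()):
        print(key, total)
```

The function looks at a range of messages together instead of reacting to each one in isolation, which is what separates this from the queue and message bus patterns.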
Pulsar's Unified Solution
Pulsar is multi-model and can provide messaging and streaming together in one system.
Historically, messaging and streaming required two different systems. This is not the case here. Pulsar allows the application to select how it wants messages delivered, and it does this through subscription types.
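To show how the choice of subscription can select a delivery pattern, here is a toy in-memory model. It is not the Pulsar client API, and `ToyTopic` and its methods are hypothetical. It assumes shared-style behavior: every subscription sees every message, and consumers attached to the same subscription split the messages between them.

```python
from collections import defaultdict
from typing import DefaultDict, List


class ToyTopic:
    """One topic serving both work-queue and fan-out delivery (illustrative only)."""

    def __init__(self) -> None:
        # subscription name -> list of consumer inboxes
        self._subscriptions: DefaultDict[str, List[List[str]]] = defaultdict(list)
        # subscription name -> number of messages dispatched so far
        self._dispatched: DefaultDict[str, int] = defaultdict(int)

    def subscribe(self, subscription: str) -> List[str]:
        """Attach a consumer to a subscription and return its inbox."""
        inbox: List[str] = []
        self._subscriptions[subscription].append(inbox)
        return inbox

    def publish(self, message: str) -> None:
        """Deliver the message once per subscription."""
        for name, inboxes in self._subscriptions.items():
            # Within one subscription, rotate across its consumers.
            index = self._dispatched[name] % len(inboxes)
            inboxes[index].append(message)
            self._dispatched[name] += 1


if __name__ == "__main__":
    topic = ToyTopic()
    worker_a = topic.subscribe("workers")  # two consumers share one subscription
    worker_b = topic.subscribe("workers")
    audit = topic.subscribe("audit")       # a separate subscription gets everything
    for i in range(4):
        topic.publish(f"msg-{i}")
    print(worker_a, worker_b, audit)
```

In this model the publisher never changes. Whether a message is treated as a unit of work for one consumer or as an event for every interested party depends only on how the consumers subscribe.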
The key advantages of Pulsar's multi-model solution are operational simplicity and the improved experience for developers (because they can do more with one system). Most significantly, Pulsar becomes the hub for all of your organization's real-time data.