Hazelcast Jet is a framework for continuous computations over streams of data. What exactly does this mean?
A data stream is a continuous flow of objects or records with no explicit beginning or end. Conceptually, a stream can be thought of as a:
- Queue in Java
- Topic in a message broker such as Apache Kafka or JMS
- Unix pipe
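The queue analogy can be made concrete with the standard library. The sketch below models a stream as a bounded in-memory queue: producers put records in, consumers take them out, and neither side assumes the flow ever ends. The class and method names are illustrative, not part of any Hazelcast API.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class StreamAsQueue {
    // A bounded in-memory "stream": producers publish records, consumers take them.
    static final BlockingQueue<String> stream = new ArrayBlockingQueue<>(1024);

    static void publish(String record) throws InterruptedException {
        stream.put(record); // blocks when the queue (the stream storage) is full
    }

    static String consume() throws InterruptedException {
        return stream.take(); // blocks until the next record arrives
    }

    public static void main(String[] args) throws InterruptedException {
        publish("record-1");
        publish("record-2");
        System.out.println(consume()); // prints record-1
        System.out.println(consume()); // prints record-2
    }
}
```

Records come out in the order they went in, but nothing in the interface marks a "last" record; the consumer simply keeps taking.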
Data streams don’t originate in these messaging systems, however. Rather, data producers create the objects and publish them to the stream. These producers could be:
- Applications producing log events, business events, transaction records, etc.
- Sensors or devices producing measurement events.
- Databases producing change events. For example, every INSERT, UPDATE, or DELETE could produce a new record in the stream.
Data streams are generally infinite, growing as new objects are produced, but computers have finite memory and disk space. Physical stream storage systems can therefore hold only a portion of the stream. Consumers must work in parallel with producers, processing objects continuously as they are created, or at least within a delay bounded by the capacity of the stream storage system.
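The interplay of bounded storage and a parallel consumer can be sketched with a small producer/consumer pair. Here the queue capacity is deliberately tiny, so the producer is paced by the consumer: it blocks whenever the storage fills up. This is an illustrative sketch with hypothetical names, not Hazelcast Jet code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ContinuousConsumer {
    // Push `count` records through stream storage of the given capacity,
    // consuming in parallel; returns the sum of everything consumed.
    static long consumeAll(int count, int capacity) throws InterruptedException {
        BlockingQueue<Integer> stream = new ArrayBlockingQueue<>(capacity);

        Thread producer = new Thread(() -> {
            for (int i = 0; i < count; i++) {
                try {
                    stream.put(i); // blocks while storage is full, pacing the producer
                } catch (InterruptedException e) {
                    return;
                }
            }
        });
        producer.start();

        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += stream.take(); // the consumer drains records as they appear
        }
        producer.join();
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        // 100 records flow through storage that holds only 4 of them at a time.
        System.out.println(consumeAll(100, 4)); // prints 4950
    }
}
```

Even though the storage can never hold more than four records, all one hundred flow through it, because the consumer keeps pace with the producer.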
Continuous processing, unlike staged (batch) processing, makes the data pipeline real-time: end-to-end processing latency can drop to milliseconds in optimized in-memory systems such as Hazelcast.
The benefits of continuous processing are valuable for data-driven and latency-sensitive use cases such as:
- Analytics and decision-making. Analytical computations driven by incoming data can produce insights in real time.
- Data integration. Systems and applications can stay in sync by sharing the stream of change events.
- Event-driven applications. An application can change its state based on a stream of events.
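The last point, an application changing its state based on a stream of events, can be sketched minimally: the state is mutated only by applying incoming events, one at a time, in order. The account example and its names are hypothetical, chosen only to illustrate the pattern.

```java
import java.util.List;

public class EventDrivenAccount {
    private long balance = 0;

    // The application's state changes only in response to incoming events.
    void apply(long amountEvent) {
        balance += amountEvent;
    }

    long balance() {
        return balance;
    }

    public static void main(String[] args) {
        EventDrivenAccount account = new EventDrivenAccount();
        // A finite slice of a (conceptually infinite) stream of transaction events.
        List.of(100L, -30L, 25L).forEach(account::apply);
        System.out.println(account.balance()); // prints 95
    }
}
```

Because the state is a pure function of the events applied so far, replaying the same event stream reproduces the same state, which is what lets systems stay in sync by sharing change events.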
Hazelcast Jet is a framework for building continuous data processing applications that scale.