Get Started

These guides demonstrate how to get started quickly with Hazelcast IMDG and Hazelcast Jet.

Hazelcast IMDG

This guide shows you how to store and retrieve data from a distributed key-value store using Hazelcast IMDG. You’ll learn how to:

  • Create a cluster of 3 members
  • Start Hazelcast Management Center
  • Add data to the cluster using a sample client in the language of your choice (see the sketch after this list)
  • Add and remove cluster members to demonstrate the data rebalancing capabilities of Hazelcast
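
As a preview of the client step, here is a minimal sketch in Java of writing to and reading from a distributed map. It assumes Hazelcast IMDG 4.x is on the classpath; the "capitals" map and its entries are made-up examples.

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.map.IMap;

    public class GetStartedImdg {
        public static void main(String[] args) {
            // Start a member (or join an existing cluster); run this three times to form a 3-member cluster.
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();

            // "capitals" is an arbitrary example map name; its entries are partitioned across the members.
            IMap<String, String> capitals = hz.getMap("capitals");
            capitals.put("France", "Paris");
            capitals.put("Japan", "Tokyo");

            // Reads are served by whichever member owns the key's partition.
            System.out.println("Capital of Japan: " + capitals.get("Japan"));
        }
    }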

Hazelcast Jet

This guide shows you how to build a distributed data processing pipeline in Java using Hazelcast Jet. You’ll learn how to:

  • Install Hazelcast Jet and form a cluster on your computer
  • Build a simple pipeline that receives a stream of data, performs some calculations, and outputs the results (see the sketch after this list)
  • Submit the pipeline as a job to the cluster and observe the results
  • Scale the cluster up and down while the job is still running
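
The sketch below shows the shape of such a pipeline, assuming the Hazelcast Jet 4.x API and its built-in test source (the guide itself may use a different source). It generates a stream of test items, counts them in one-second windows, and logs the counts.

    import com.hazelcast.jet.Jet;
    import com.hazelcast.jet.JetInstance;
    import com.hazelcast.jet.aggregate.AggregateOperations;
    import com.hazelcast.jet.pipeline.Pipeline;
    import com.hazelcast.jet.pipeline.Sinks;
    import com.hazelcast.jet.pipeline.WindowDefinition;
    import com.hazelcast.jet.pipeline.test.TestSources;

    public class GetStartedJet {
        public static void main(String[] args) {
            Pipeline p = Pipeline.create();
            p.readFrom(TestSources.itemStream(100))     // emit 100 test items per second
             .withIngestionTimestamps()                 // timestamp events as they enter the pipeline
             .window(WindowDefinition.tumbling(1_000))  // group events into 1-second windows
             .aggregate(AggregateOperations.counting()) // count the events in each window
             .writeTo(Sinks.logger());                  // print each window's count to the member logs

            // Connect to (or start) a Jet member and submit the pipeline as a job.
            JetInstance jet = Jet.bootstrappedInstance();
            jet.newJob(p).join();
        }
    }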

Use Case

Real-Time Stream Processing

Overview

The Hazelcast Approach to Stream Processing

Hazelcast Stream Processing Architecture Diagram

Hazelcast provides all the tools necessary to build a real-time stream processing application through its Jet API, a powerful framework for querying data streams on top of an elastic in-memory storage system that can also hold the final results of the processing.

Solutions

How It Works

Hazelcast processing tasks, called jobs, are distributed across the cluster to parallelize the computations. You can elastically and horizontally scale the cluster based on your performance and volume requirements.

For real-time data enrichment, Hazelcast provides a tight integration with in-memory computing to deliver very high-speed data access. You can store large amounts of data in-memory, which are then joined to the data stream with microsecond latency. Moreover, you can reduce the end-to-end latency by using Hazelcast to store temporary data for stateful stream processing tasks.
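
As an illustration of this kind of enrichment, the following sketch joins a stream of country codes against reference data held in an in-memory map. It assumes the Hazelcast Jet 4.x API; the "capitals" map, its contents, and the tiny test source standing in for a real event stream are made-up examples.

    import com.hazelcast.jet.Jet;
    import com.hazelcast.jet.JetInstance;
    import com.hazelcast.jet.pipeline.Pipeline;
    import com.hazelcast.jet.pipeline.Sinks;
    import com.hazelcast.jet.pipeline.test.TestSources;
    import com.hazelcast.map.IMap;

    public class EnrichmentSketch {
        public static void main(String[] args) {
            JetInstance jet = Jet.bootstrappedInstance();

            // Reference data kept in distributed in-memory storage.
            IMap<String, String> capitals = jet.getMap("capitals");
            capitals.put("FR", "Paris");
            capitals.put("JP", "Tokyo");

            Pipeline p = Pipeline.create();
            p.readFrom(TestSources.items("FR", "JP", "FR"))       // stand-in for a real event stream
             .mapUsingIMap(capitals,
                     code -> code,                                 // key to look up for each event
                     (code, capital) -> code + " -> " + capital)   // join the event with the looked-up value
             .writeTo(Sinks.logger());

            jet.newJob(p).join();
        }
    }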

Stream Processing Challenges

Stream processing presents unique challenges that are not relevant to batch processing frameworks. Below are several key challenges and details on how Hazelcast addresses each of these.

Windowing

Streaming data is fundamentally different from batch or micro-batch processing because both inputs and outputs are continuous. In many cases, streaming computations look at how values change over time. Typically, we look at streaming data in terms of “windows,” a specific slice of the data stream constrained to a time period. Hazelcast stream processing supports tumbling, sliding, and session windows. For more information on windows, please read the Jet API documentation page on kinds of windows.
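
As a quick sketch of the three window kinds in the Jet 4.x pipeline API (the millisecond values are arbitrary examples):

    import com.hazelcast.jet.pipeline.WindowDefinition;

    public class WindowKinds {
        public static void main(String[] args) {
            // Tumbling: fixed, non-overlapping 1-minute windows.
            WindowDefinition tumbling = WindowDefinition.tumbling(60_000);

            // Sliding: 1-minute windows that advance every 10 seconds, so consecutive windows overlap.
            WindowDefinition sliding = WindowDefinition.sliding(60_000, 10_000);

            // Session: a window that closes after 30 seconds of inactivity for a given key.
            WindowDefinition session = WindowDefinition.session(30_000);

            System.out.println(tumbling + " / " + sliding + " / " + session);
        }
    }

Any of these definitions can be passed to a stage's window(...) step before aggregating.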

Event Time and Late Events

Hazelcast supports the notion of “event time” in which events can have their own timestamp and may arrive out of order. To handle out-of-order events and especially late-arriving events, stream processors must keep calculations (i.e., aggregations) open until all events have arrived. However, stream processors cannot know if all events have arrived, so they need to discard any extremely late events. To define what constitutes an “extremely late” event, Hazelcast sets a “watermark” that marks a time window in which late-arriving events can still be processed in the appropriate aggregation window. Events arriving from beyond the watermarked time window are discarded.
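
A minimal sketch of declaring event time and a lateness bound with the Jet 4.x pipeline API, using the built-in test source (the 2-second allowed lag is an arbitrary example):

    import com.hazelcast.jet.aggregate.AggregateOperations;
    import com.hazelcast.jet.pipeline.Pipeline;
    import com.hazelcast.jet.pipeline.Sinks;
    import com.hazelcast.jet.pipeline.WindowDefinition;
    import com.hazelcast.jet.pipeline.test.TestSources;

    public class EventTimeSketch {
        public static void main(String[] args) {
            Pipeline p = Pipeline.create();
            p.readFrom(TestSources.itemStream(10))
             // Use each event's own timestamp and allow events to arrive up to 2,000 ms late.
             // The watermark trails the newest timestamp by this lag; events that arrive from
             // before the watermark are dropped.
             .withTimestamps(event -> event.timestamp(), 2_000)
             .window(WindowDefinition.tumbling(1_000))
             .aggregate(AggregateOperations.counting())
             .writeTo(Sinks.logger());
        }
    }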

Streaming Fault Tolerance

Fault tolerance in stream processing systems must deal with preserving data that is not necessarily stored in any permanent medium. This means that stream processors need to know how to handle failures with data-in-motion, or else data can be lost. Hazelcast is fault-tolerant in the face of issues such as network failures, network splits, and node failures. When a fault occurs, Hazelcast takes the latest state snapshot and, from that snapshot, automatically restarts all jobs in which the failed member was a participant. Because these in-memory snapshots are saved to distributed in-memory storage, Hazelcast resumes processing where it left off. Distributed in-memory storage is an integral component of the cluster; multiple replicas of the data are stored across the cluster to increase its resiliency.
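
A minimal sketch of the snapshot-related job settings in the Jet 4.x API (the values are arbitrary examples; periodic snapshots are taken only once a processing guarantee is enabled, as covered in the next section):

    import com.hazelcast.jet.Jet;
    import com.hazelcast.jet.JetInstance;
    import com.hazelcast.jet.config.JobConfig;
    import com.hazelcast.jet.config.ProcessingGuarantee;
    import com.hazelcast.jet.pipeline.Pipeline;
    import com.hazelcast.jet.pipeline.Sinks;
    import com.hazelcast.jet.pipeline.test.TestSources;

    public class SnapshotConfigSketch {
        public static void main(String[] args) {
            Pipeline p = Pipeline.create();
            p.readFrom(TestSources.itemStream(10))
             .withIngestionTimestamps()
             .writeTo(Sinks.logger());

            JobConfig config = new JobConfig()
                    // Enabling a guarantee turns on periodic distributed snapshots of the job state.
                    .setProcessingGuarantee(ProcessingGuarantee.AT_LEAST_ONCE)
                    // Snapshot the job state to distributed in-memory storage every 10 seconds.
                    .setSnapshotIntervalMillis(10_000);

            JetInstance jet = Jet.bootstrappedInstance();
            jet.newJob(p, config).join();
        }
    }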

Processing Guarantees

Event processing systems have to balance tradeoffs between performance and correctness, and some systems may not offer firm processing guarantees, which can make them difficult to program against.

Hazelcast allows you to choose a processing guarantee when you start a job. While there is some performance tradeoff with the higher guarantees, Hazelcast still provides superior processing speed while honoring the chosen guarantee. Hazelcast provides exactly-once processing (the slowest but most correct), at-least-once processing, or no guarantee of correctness (the fastest option).
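
As a sketch of selecting a guarantee at job-submission time (assuming the Jet 4.x API, where the options are values of the ProcessingGuarantee enum):

    import com.hazelcast.jet.config.JobConfig;
    import com.hazelcast.jet.config.ProcessingGuarantee;

    public class GuaranteeChoice {
        public static void main(String[] args) {
            JobConfig config = new JobConfig();

            // Fastest, no correctness guarantee: events may be lost or duplicated after a failure.
            // config.setProcessingGuarantee(ProcessingGuarantee.NONE);

            // Events are never lost, but may be processed more than once after a restart.
            // config.setProcessingGuarantee(ProcessingGuarantee.AT_LEAST_ONCE);

            // Slowest but most correct: each event affects the results exactly once.
            config.setProcessingGuarantee(ProcessingGuarantee.EXACTLY_ONCE);

            System.out.println("Chosen guarantee: " + config.getProcessingGuarantee());
        }
    }

The chosen JobConfig is then passed along with the pipeline when the job is submitted, as in the snapshot example above.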

Free Hazelcast Online Training Center

Whether you're interested in learning the basics of in-memory systems, or you're looking for advanced, real-world production examples and best practices, we've got you covered.

Join Us On Slack