Get Started

These guides demonstrate the operational flexibility and speed of the Hazelcast In-Memory Computing Platform: set up in seconds, access data in microseconds, and stay friendly to both operations and developers.

Hazelcast IMDG

Find out for yourself how to get a Hazelcast IMDG cluster up and running. In this Getting Started guide you’ll learn how to:

  • Create a cluster of three members.
  • Start the Hazelcast Management Center.
  • Add data to the cluster using a sample client in the language of your choice.
  • Add and remove cluster members to demonstrate the automatic rebalancing of data and backups.
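The cluster-creation and data-loading steps above can be sketched in plain Java. This is a minimal single-JVM sketch using the Hazelcast 3.x core API; the "capitals" map and its contents are illustrative, and a real deployment would start each member in its own process:

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class ClusterDemo {
    public static void main(String[] args) {
        // Each call to newHazelcastInstance() starts one member; members on
        // the same network discover each other and form a cluster.
        HazelcastInstance member1 = Hazelcast.newHazelcastInstance();
        HazelcastInstance member2 = Hazelcast.newHazelcastInstance();
        HazelcastInstance member3 = Hazelcast.newHazelcastInstance();

        // Data written through any member is partitioned across the cluster.
        IMap<String, String> capitals = member1.getMap("capitals");
        capitals.put("France", "Paris");
        capitals.put("Japan", "Tokyo");

        // Any other member sees the same distributed map.
        System.out.println(member3.getMap("capitals").get("Japan")); // Tokyo

        Hazelcast.shutdownAll();
    }
}
```

Shutting down one of the three members (or starting a fourth) triggers the automatic rebalancing of partitions and backups described above.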

Hazelcast Jet

Learn how to run a distributed data stream processing pipeline in Java. In this Getting Started guide you’ll learn how to:

  • Start a Hazelcast Jet cluster in your JVM.
  • Build the Word Count application.
  • Execute the application with Jet.
  • Push test data to the cluster and explore the results.
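The Word Count steps above can be sketched as a single pipeline. This is a hedged sketch loosely based on the Jet 3.x Pipeline API; the "lines" list and "counts" map names are illustrative, and method names differ slightly between Jet versions:

```java
import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.Traversers;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;
import static com.hazelcast.jet.aggregate.AggregateOperations.counting;

public class WordCount {
    public static void main(String[] args) {
        // Build the Word Count pipeline: split lines into words,
        // group by word, and count each group.
        Pipeline p = Pipeline.create();
        p.drawFrom(Sources.<String>list("lines"))
         .flatMap(line -> Traversers.traverseArray(line.toLowerCase().split("\\W+")))
         .filter(word -> !word.isEmpty())
         .groupingKey(word -> word)
         .aggregate(counting())
         .drainTo(Sinks.map("counts"));

        // Start an embedded Jet member, push test data, and run the job.
        JetInstance jet = Jet.newJetInstance();
        jet.getList("lines").add("hello world hello jet");
        jet.newJob(p).join();

        System.out.println(jet.getMap("counts").get("hello")); // 2
        Jet.shutdownAll();
    }
}
```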

As a next step, you can explore the code samples and extend your setup with more features and connectors.

Jet Performance

Overview

Hazelcast Jet uses a combination of a directed acyclic graph (DAG) computation model, in-memory processing and storage, data locality, partition mapping affinity, single-producer/single-consumer queues, and green threads to achieve very high performance. These key design decisions are explained below.

Word Count is the classic Big Data application used to compare performance between systems. In our benchmark, Jet 0.4 was faster than the other frameworks tested. See the complete benchmark.

Streaming Word Count involves windowing and out-of-order data processing. Jet's latency remains flat even under higher load; in our benchmark, Flink and Spark were unable to keep up with a 10-second window sliding by 100 milliseconds. See the complete benchmark.

Complete Benchmarks

Jet 3.0 Streaming Benchmark
Jet vs java.util.stream

DAG for modeling computations

Similar to recent big data frameworks, Hazelcast Jet uses a directed acyclic graph (DAG) abstraction to model computations. However, Jet uses some novel approaches to achieve much higher speeds than other engines.

In-memory data locality

Hazelcast Jet achieves high speed and low latency by keeping both the computation and the data storage in memory: running Hazelcast Jet and Hazelcast IMDG on the same servers means that, depending on the use case, some or all of the data Jet processes is already in RAM and on the same machine as the computation. This is data locality.

Partition mapping affinity

Jet allows you to define an arbitrary object-to-partition mapping scheme on each edge. This lets many threads read in parallel from each partition on each member, and it can be used to align partitioning with other distributed data systems such as HDFS, Spark, or Hazelcast IMDG. As a result, when performing DAG processing, local edges can be read and written locally, without incurring a network call and without waiting.

SP/SC queues

Local edges are implemented with the most efficient kind of concurrent queue: the single-producer/single-consumer (SP/SC) bounded queue. It employs wait-free algorithms on both sides and avoids volatile writes by using lazySet. All of these aspects are unique to Hazelcast Jet.
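As a toy illustration of the lazySet idea (not Jet's actual implementation), here is a minimal bounded SP/SC ring buffer in plain Java. Because exactly one thread produces and exactly one consumes, each index has a single writer, so both sides can publish with an ordered store instead of a full volatile write:

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy bounded SP/SC ring buffer: one producer thread, one consumer thread.
// Capacity must be a power of two so index wrapping is a cheap bit mask.
class SpscQueue<E> {
    private final Object[] buffer;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // next slot to read
    private final AtomicLong tail = new AtomicLong(); // next slot to write

    SpscQueue(int capacityPow2) {
        buffer = new Object[capacityPow2];
        mask = capacityPow2 - 1;
    }

    boolean offer(E item) {                  // producer side, wait-free
        long t = tail.get();
        if (t - head.get() == buffer.length) return false; // full
        buffer[(int) t & mask] = item;
        tail.lazySet(t + 1);                 // ordered store, no full fence
        return true;
    }

    @SuppressWarnings("unchecked")
    E poll() {                               // consumer side, wait-free
        long h = head.get();
        if (h == tail.get()) return null;    // empty
        E item = (E) buffer[(int) h & mask];
        buffer[(int) h & mask] = null;       // allow the item to be GC'd
        head.lazySet(h + 1);
        return item;
    }
}
```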

Green threads

With Hazelcast Jet, you can set the number of parallel instances of each vertex (called processors) so that all cores are used, even on the largest machines. With many cores and execution threads, the key to optimal performance is coordinating them smoothly through cooperative multithreading. Hazelcast Jet uses green threads, where cooperative processors run in a loop serviced by the same native thread. This leads to:

  • Practically zero cost of context switching. There is hardly any logic in the worker thread needed to hand over from one processor to the next.
  • (Next to) guaranteed core affinity. Processors don’t jump between threads, and each thread is highly likely to remain pinned to a core. This yields a high CPU cache hit rate.
  • Insight into which processor is ready to run. We can inspect the processor’s input/output queues at very low cost to see whether it can make progress.
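To illustrate the idea (a toy sketch, not Jet's tasklet implementation), here is a single native thread servicing many cooperative processors in a loop; handing over from one processor to the next costs only a method return, not an OS context switch:

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// A cooperative processor does one small slice of work per call and
// returns true once it has finished all of its work.
interface Processor {
    boolean complete();
}

class Worker {
    // One native thread services every processor in round-robin fashion;
    // unfinished processors are simply re-queued.
    static void runAll(List<Processor> processors) {
        Queue<Processor> ready = new ArrayDeque<>(processors);
        while (!ready.isEmpty()) {
            Processor p = ready.poll();
            if (!p.complete()) {
                ready.add(p); // not done yet: reschedule for another slice
            }
        }
    }
}
```

In the real engine the loop also inspects each processor's input/output queues, skipping processors that cannot make progress, which is the low-cost readiness check mentioned above.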

Windows and Frames

Hazelcast Jet uses frames as the building blocks of sliding windows. (Tumbling windows are just a special case of sliding windows.) A frame covers the part of the stream corresponding to one sliding step. When a record arrives, it is added to its frame. For each frame, only a rolling accumulator is stored rather than a buffer of all the items. When the window closes, the frames it covers are combined and the computation is executed. This trades off the smoothness of sliding against the cost of storage and computation.

There is also a deduct function to optimize sliding-window computation. When the window slides, deduct simply removes the trailing frame from the window and adds the new one: two operations, rather than recomputing the entire window from its underlying frames at every sliding step. This keeps the cost per slide flat, even with large windows.
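The combine-and-deduct idea can be illustrated with a toy sliding sum in plain Java (a sketch, not Jet's implementation; each entry of frameSums plays the role of one frame's rolling accumulator):

```java
import java.util.ArrayList;
import java.util.List;

class SlidingSum {
    // Slide a window of windowFrames frames over the per-frame sums.
    // Each step adds the newest frame and deducts the trailing one,
    // so the cost per slide is constant regardless of window size.
    static List<Long> slide(long[] frameSums, int windowFrames) {
        List<Long> results = new ArrayList<>();
        long windowSum = 0;
        for (int i = 0; i < frameSums.length; i++) {
            windowSum += frameSums[i];                    // combine new frame
            if (i >= windowFrames) {
                windowSum -= frameSums[i - windowFrames]; // deduct trailing frame
            }
            if (i >= windowFrames - 1) {
                results.add(windowSum);                   // window is full: emit
            }
        }
        return results;
    }
}
```

For frame sums {1, 2, 3, 4, 5} and a window of 3 frames this emits 6, 9, 12: each result is derived from the previous one in two operations instead of a fresh 3-term sum.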

Free Hazelcast Online Training Center

Whether you're interested in learning the basics of in-memory systems, or you're looking for advanced, real-world production examples and best practices, we've got you covered.
