Get Started

These guides demonstrate the operational flexibility and speed of the Hazelcast In-Memory Computing Platform: setup in seconds, data access in microseconds, and friendly to both operations teams and developers.

Hazelcast IMDG

Find out for yourself how to get a Hazelcast IMDG cluster up and running. In this Getting Started guide you’ll learn how to:

  • Create a cluster of three members.
  • Start the Hazelcast Management Center.
  • Add data to the cluster using a sample client in the language of your choice.
  • Add and remove cluster members to demonstrate the automatic rebalancing of data and backups.

Hazelcast Jet

Learn how to run a distributed data stream processing pipeline in Java. In this Getting Started guide you’ll learn how to:

  • Start a Hazelcast Jet cluster in your JVM.
  • Build the Word Count application.
  • Execute the application with Jet.
  • Push test data to the cluster and explore the results.

As a next step, you will be able to explore the code samples and extend your setup with more features and connectors.

Why Jet

Overview

Cloud-ready, simple to scale, unparalleled speed

Architecture

Jet is a single, embeddable Java library for building fault-tolerant and elastic data processing pipelines. The nodes automatically discover each other and form a cluster. You can add more nodes that immediately share the computation load. Jet continues processing data without loss even if a node fails. Jet runs in any cloud and functions seamlessly in Kubernetes.

Performance

Jet is designed for predictable low latency. It uses a combination of a directed acyclic graph (DAG) computation model, parallel execution, in-memory processing and storage, data locality, partition mapping affinity, single-producer/single-consumer queues, and green threads to achieve very high and predictable performance.

Connectors

Use a rich set of out-of-the-box connectors to integrate with databases, messaging systems, object stores, files and REST services.

Comparisons

Jet comes with integrated APIs for data processing, storage, messaging, and distributed coordination. Learn how it compares to other popular data processing technologies.

Insights

For Java Developers

When working with a single Java Virtual Machine, a developer would use the java.util collections (such as Map and List) to store operational data and java.util.stream as a higher-level API to process the data. Hazelcast Jet shifts this approach to a multi-JVM setup for scalability and high availability.
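As a point of reference for the single-JVM approach described above, a word count written with plain java.util.stream might look like the following sketch. It uses only the JDK; the class name LocalWordCount and the sample input are illustrative assumptions, and the Jet API itself is not shown.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Single-JVM word count using only the JDK; no Hazelcast classes are involved.
// The class name and sample data are illustrative.
class LocalWordCount {

    // Splits each line into lowercase words and counts the occurrences of each word.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("to be or not to be", "that is the question");
        System.out.println(countWords(lines));
    }
}
```

Everything here runs in one JVM over an in-memory list, which is the limitation the multi-JVM setup is meant to remove.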

Hazelcast Jet is a distributed and robust implementation of Java Collections and Concurrency APIs extended with a functional-style, declarative data processing Java API inspired by Java Streams.

  • Implement computations by linking the respective transformations, then connect those to the upstream data sources and downstream sinks such as Hazelcast, Apache Kafka, S3, Hadoop, JMS, databases, and files (see the connectors).
  • Add more JVMs to scale your application. Jet will immediately start sharing the computation load.
  • Connect Java clients to your Hazelcast Jet cluster or embed Hazelcast Jet nodes in your own Java applications (no runtime dependencies!).
  • Coordinate your application using linearizable, distributed implementations of the Java concurrency primitives, such as locks, atomics, semaphores, and latches, backed by the Raft consensus algorithm.
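For readers less familiar with the primitives named in the last item, here is a minimal sketch of their local, single-JVM forms from java.util.concurrent. It does not use Jet's distributed implementations; the class name LocalCoordination and the method runWorkers are hypothetical, chosen for illustration only.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Local (single-JVM) versions of the primitives, taken from java.util.concurrent.
// This is not the distributed API; names are illustrative.
class LocalCoordination {

    // Runs `workers` tasks with at most `maxConcurrent` inside the guarded region.
    // Returns {atomic counter value, lock-guarded counter value}.
    static long[] runWorkers(int workers, int maxConcurrent) throws InterruptedException {
        AtomicLong counter = new AtomicLong();              // atomic: lock-free shared counter
        Semaphore permits = new Semaphore(maxConcurrent);   // semaphore: caps concurrency
        ReentrantLock lock = new ReentrantLock();           // lock: guards a plain variable
        CountDownLatch done = new CountDownLatch(workers);  // latch: caller waits for all tasks
        long[] guarded = {0};
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            for (int i = 0; i < workers; i++) {
                pool.execute(() -> {
                    try {
                        permits.acquire();
                        try {
                            counter.incrementAndGet();
                            lock.lock();
                            try {
                                guarded[0]++;
                            } finally {
                                lock.unlock();
                            }
                        } finally {
                            permits.release();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.countDown();
                    }
                });
            }
            done.await(); // returns once every task has counted down
        } finally {
            pool.shutdown();
        }
        return new long[] {counter.get(), guarded[0]};
    }

    public static void main(String[] args) throws InterruptedException {
        long[] result = runWorkers(8, 3);
        System.out.println(result[0] + " " + result[1]);
    }
}
```

These local versions coordinate threads within one process only; the item above describes counterparts that coordinate across cluster members.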

For Hazelcast Users

Hazelcast Jet builds on its tight integration with Hazelcast IMDG – the robust, distributed in-memory storage with querying and event-driven programming support. The services of Hazelcast IMDG are available in the Jet cluster to be used in conjunction with the data processing to:

  • Load data into the cluster cache using Jet connectors.
  • Query cached data with Jet.
  • Cache your reference data and enrich the event stream with it.
  • Store the results of a computation.
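The enrichment item above boils down to a key lookup per event. The sketch below shows that idea with a plain in-memory Map standing in for the cluster cache; it is an assumption-laden illustration rather than Jet code, and the class name EnrichmentSketch, the ticker symbols, and the company names are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Enriches a sequence of events with cached reference data via a key lookup.
// A plain Map stands in for the cluster cache; all names and data are illustrative.
class EnrichmentSketch {

    // Appends the company name looked up by ticker; unknown tickers are marked as such.
    static List<String> enrich(List<String> tickerEvents, Map<String, String> companyByTicker) {
        return tickerEvents.stream()
                .map(ticker -> ticker + " (" + companyByTicker.getOrDefault(ticker, "unknown") + ")")
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> reference = Map.of("AAPL", "Apple", "MSFT", "Microsoft");
        System.out.println(enrich(List.of("AAPL", "MSFT", "XYZ"), reference));
    }
}
```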

Hazelcast IMDG deployments can be upgraded to Jet. Jet can also use a remote Hazelcast IMDG cluster as a data layer.

For Architects

Hazelcast Jet combines services for stateful streaming, batch processing, caching, operational storage, messaging, and coordination into one integrated package.

  • Compose the high-level Hazelcast Jet APIs to create powerful yet simple data-intensive applications rather than stitching together several heterogeneous components.
  • Scale the cluster up and down by adding and removing cluster members without downtime.
  • Integrate with the ecosystem using the Connectors and Cloud plugins.
  • Rely on the memory-centric design for very high performance with consistently low latency.

Use Cases

Hazelcast Jet is a framework for building continuous data processing applications that scale.

Stream Processing

Jet makes the stream actionable by building and maintaining a queryable view of it. Jet continuously reads the streaming source, processes (deduplicates, aggregates, correlates, joins) new data as it arrives, and updates the cache with fresh results. Consumers fetch the pre-processed data from the cache rather than running queries on raw sequences of records, which reduces their latency. Consumers can query the cache or subscribe to cache updates to get fresh data in milliseconds.
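The pattern described above, folding each event into a pre-aggregated view that consumers read directly, can be sketched in plain Java. This is a single-JVM illustration with a ConcurrentHashMap standing in for the cache, not Jet code; the class MaterializedView and its method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Maintains a continuously updated per-key total that consumers read
// instead of scanning raw events. Names are illustrative.
class MaterializedView {
    private final Map<String, Long> totals = new ConcurrentHashMap<>();

    // Called for each event as it arrives: folds it into the running total.
    void onEvent(String key, long amount) {
        totals.merge(key, amount, Long::sum);
    }

    // Called by consumers: a lookup in the pre-aggregated view, no scan over past events.
    long query(String key) {
        return totals.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        MaterializedView view = new MaterializedView();
        view.onEvent("sensor-1", 5);
        view.onEvent("sensor-1", 7);
        System.out.println(view.query("sensor-1"));
    }
}
```

The cost of aggregation is paid once on the write path, so each read is a single map lookup regardless of how many events have been processed.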

Distributed Compute (Spark replacement)

Use Hazelcast Jet pipelines to replace and speed up your MapReduce, Spark, or custom Java data processing jobs. Load data sets into a cluster cache and run fast compute jobs on top of the cached data. The distributed storage of Jet can reasonably accommodate terabytes of data, and cached data can be reused by multiple jobs. Pipelines can combine in-memory collections with external data sets to cache only the hot, frequently accessed data.

Significant performance gains can be achieved by combining an in-memory approach with job and data co-location plus parallel execution.

Continuous Data Integration

Integrate data in real time with continuous pipelines. Hazelcast Jet talks to various systems (messaging, databases, caches, file systems, RPC services) using connectors to continuously move data from place to place.

The distributed execution engine of Jet makes the ETL process scalable and resilient. Highlights include:

  • Exactly-once processing guarantees even during failures.
  • Distributed transactions with two-phase commits to extend exactly-once guarantees to the entire data pipeline. Example: a record read from a JMS broker isn’t acknowledged until the results are successfully written to a JDBC database.
  • The Jet cluster can be scaled up to accelerate the ETL process.

The in-memory storage layer can be used as an input buffer or to store operational datasets.
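The JMS-to-JDBC example above depends on one ordering rule: acknowledge a record only after its write has succeeded. The sketch below illustrates just that ordering in plain Java, with an in-memory queue standing in for the broker and a hypothetical Sink interface standing in for the database. It is not a two-phase commit and not Jet code: if the process failed between the write and the acknowledgement, the record would be written again on retry, which is the gap the distributed transactions are described as closing.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Shows only the ordering "write first, acknowledge second".
// This is not a two-phase commit; all names are illustrative.
class AckAfterWrite {

    // Hypothetical sink; a real one might wrap a JDBC connection.
    interface Sink {
        void write(String record) throws Exception;
    }

    // Moves records from `pending` to `sink` and returns the acknowledged records.
    // A record is removed from `pending` (acknowledged) only after its write
    // succeeded, so a failed write leaves it in place to be retried later.
    static List<String> drain(Deque<String> pending, Sink sink) {
        List<String> acknowledged = new ArrayList<>();
        while (!pending.isEmpty()) {
            String record = pending.peek();   // read without acknowledging
            try {
                sink.write(record);
            } catch (Exception e) {
                break;                        // stop; the record stays pending
            }
            pending.poll();                   // acknowledge after the successful write
            acknowledged.add(record);
        }
        return acknowledged;
    }

    public static void main(String[] args) {
        Deque<String> pending = new ArrayDeque<>(List.of("a", "b", "c"));
        List<String> stored = new ArrayList<>();
        System.out.println(drain(pending, stored::add));
    }
}
```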

Data-driven Microservices

The Hazelcast Jet server can run either embedded or as a standalone data processing cluster. In embedded mode, package the Jet JAR with the application and start the Jet cluster member from application code. The Jet member runs in the same JVM as the application.

Embedding Jet provides services for data processing, storage, messaging, and distributed coordination at an application level. It simplifies the packaging and deployment because everything is distributed in one self-contained package (such as a JAR or Docker container) with no runtime dependencies.

Embedding Jet is possible because:

  • Jet has no runtime dependencies.
  • All cluster nodes are symmetrical. There is no dedicated master node.
  • Discovery and elasticity are automatic. New members discover and join the running cluster and the processing job, starting the job if it is absent.

Free Hazelcast Online Training Center

Whether you're interested in learning the basics of in-memory systems, or you're looking for advanced, real-world production examples and best practices, we've got you covered.