Hazelcast IMDG 3.8 Deployment and Operations Guide


Table of Contents

Introduction

Purpose of this document

Hazelcast Versions

Network Architecture and Configuration

Topologies

Advantages of Embedded Architecture

Advantages of Client-Server Architecture

Open Binary Client Protocol

Cluster Discovery Protocols

Firewalls, NAT, Network Interfaces and Ports

WAN Replication (Enterprise Feature)

Lifecycle, Maintenance and Updates

Configuration Management

Cluster Startup

Hot Restart Store (Enterprise HD Feature)

Topology Changes: Nodes Joining and Leaving

Maintenance and Software Updates

Hazelcast IMDG Software Updates

Performance Tuning and Optimization

Dedicated, Homogeneous Hardware Resources

Partition Count

Dedicated Network Interface Controller for Hazelcast Members

Network Settings

Garbage Collection

High-Density Memory Store (Enterprise HD Feature)

Azul Zing® and Zulu® Support (Enterprise Feature)

Optimizing Queries

Optimizing Serialization

Serialization Optimization Recommendations

Executor Service Optimizations

Executor Service Tips and Best Practices

Near Cache

Client Executor Pool Size

Clusters with Many (Hundreds) of Nodes or Clients

Basic Optimization Recommendations

Cluster Sizing

Sizing Considerations

Example: Sizing a Cache Use Case

Security and Hardening

Features (Enterprise and Enterprise HD)

Security Defaults

Hardening Recommendations

Secure Context

Deployment and Scaling Runbook

Failure Detection and Recovery

Common Causes of Node Failure

Health Monitoring & Alerts

Recovery from a Partial or Total Failure

Hazelcast Diagnostics Log

Management Center (Subscription and Enterprise Feature)

Cluster-Wide Statistics and Monitoring

Web Interface Home Page

Data Structure and Member Management

Monitoring Cluster Health

Monitoring WAN Replication

Enterprise Cluster Monitoring with JMX and REST (Subscription and Enterprise)

Recommendations for Setting Alerts

Actions and Remedies for Alerts

Guidance for Specific Operating Environments

Solaris Sparc

VMWare ESX

Amazon Web Services

Handling Network Partitions

Split-Brain on Network Partition

Split Brain Protection

Split Brain Resolution

License Management

How to Report Issues to Hazelcast

Hazelcast Support Subscribers

Hazelcast Open Source Users

Introduction

Hazelcast IMDG provides a convenient and familiar interface for developers to work with distributed data structures and other aspects of in-memory computing. For example, in its simplest configuration Hazelcast IMDG can be treated as an implementation of the familiar ConcurrentHashMap that can be accessed from multiple JVMs (Java Virtual Machine), including JVMs that are spread out across the network. However, the Hazelcast IMDG architecture has sufficient flexibility and advanced features that it can be used in a large number of different architectural patterns and styles. The following schematic represents the basic architecture of Hazelcast IMDG.


Figure: Hazelcast IMDG 3.8 architecture

Though Hazelcast IMDG’s architecture is sophisticated, many users are happy to integrate purely at the level of the java.util.concurrent or javax.cache APIs.

The primary capabilities that the core Hazelcast IMDG technology provides include elasticity and redundancy:

Elasticity means that Hazelcast clusters can grow capacity simply by adding new nodes. Redundancy means that you have great flexibility when you configure Hazelcast clusters for data replication policy (which defaults to one synchronous backup copy). To support these capabilities, Hazelcast IMDG has a concept of members: Members are JVMs that join a Hazelcast cluster; a cluster provides a single extended environment where data can be synchronized between (and processed by) its members.

Purpose of this document

If you are a Hazelcast IMDG user planning to go into production with a Hazelcast-backed application, or you are curious about the practical aspects of deploying and running such an application, this guide will provide an introduction to the most important aspects of deploying and operating a successful Hazelcast installation.

In addition to this guide, there is a host of useful resources available online including Hazelcast product documentation, Hazelcast forums, books, webinars, and blog posts. Where applicable, each section of this document provides links to further reading if you would like to delve more deeply into a particular topic.

Hazelcast also offers support, training, and consulting to help you get the most out of the product and to ensure successful deployment and operation. Visit hazelcast.com/pricing for more information.

Hazelcast Versions

This document is current to Hazelcast IMDG version 3.8. It was not written explicitly for earlier versions, but much of it still applies to them.

Network Architecture and Configuration

Topologies

Hazelcast IMDG supports two modes of operation: “embedded member”, where the JVM containing application code joins the Hazelcast cluster directly, and “client plus member”, whereby a secondary JVM (which may be on the same host, or on a different one) is responsible for joining the Hazelcast cluster. These two approaches to topology are shown in the following diagrams.

Here is the embedded approach:


Figure 1: Hazelcast IMDG embedded topology

Here is the client plus member topology:


Figure 2: Hazelcast IMDG client-server topology

Under most circumstances we recommend the client plus member topology, as it provides greater flexibility in terms of cluster mechanics—member JVMs can be taken down and restarted without any impact on the overall application, as the Hazelcast client will simply reconnect to another member of the cluster. Another way of saying this is that client plus member topologies isolate application code from purely cluster-level events.

Hazelcast IMDG allows clients to be configured within the client code (programmatically), by XML, or by properties files. Properties files are handled by the class com.hazelcast.client.config.ClientConfigBuilder, and XML by com.hazelcast.client.config.XmlClientConfigBuilder. Clients have quite a few configurable parameters, including the known members of the cluster. A client discovers the remaining members once it has connected to one of them, so the configuration must contain enough member addresses to ensure that the client can reach the cluster somewhere.
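For illustration, here is a minimal sketch of programmatic client configuration using the 3.x Java client API; the group name, password, member addresses, and map name are placeholder values:

// Assumes the Hazelcast 3.x Java client on the classpath
ClientConfig clientConfig = new ClientConfig();
clientConfig.getGroupConfig().setName("dev");
clientConfig.getGroupConfig().setPassword("dev-pass");
clientConfig.getNetworkConfig().addAddress("10.0.0.1:5701", "10.0.0.2:5701");
HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);
IMap<String, String> map = client.getMap("default");   // reuse this client instance across threads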

In production applications, the Hazelcast client should be reused between threads and operations. It is designed for multithreaded operation, and creation of a new Hazelcast client is relatively expensive, as it handles cluster events, heartbeating, etc., so as to be transparent to the user.

Advantages of Embedded Architecture

The main advantage of using the embedded architecture is its simplicity. Because the Hazelcast IMDG services run in the same JVMs as the application, there are no extra servers to deploy, manage, or maintain. This simplicity applies especially when the Hazelcast cluster is tied directly to the embedded application.

Advantages of Client-Server Architecture

For most use cases, however, there are significant advantages to using the client-server architecture. Broadly, they are as follows:

  1. Cluster member lifecycle is independent of application lifecycle
  2. Resource isolation
  3. Problem isolation
  4. Shared Infrastructure
  5. Better scalability

Cluster Member Node Lifecycle Independent of Application Lifecycle

The practical lifecycle of Hazelcast member nodes is usually different from any particular application instance. When Hazelcast IMDG is embedded in an application instance, the embedded Hazelcast node must necessarily be started and shut down in concert with its co-resident application instance, and vice-versa. This is often not ideal and may lead to increased operational complexity. When Hazelcast nodes are deployed as separate server instances, they and their client application instances may be started and shut down independently of one another.

Resource Isolation

When Hazelcast IMDG is deployed as a member on its own dedicated host, it does not compete with the application for CPU, memory, and I/O resources. This makes Hazelcast performance more predictable
and reliable.

Easier Problem Isolation

Because Hazelcast member activity is isolated on its own server, it’s easier to identify the cause of any pathological behavior. For example, if there is a memory leak in the application causing heap usage to grow without bounds, the memory activity of the application is not obscured by the co-resident memory activity of Hazelcast services. The same holds true for CPU and I/O issues. When application activity is isolated from Hazelcast services, symptoms are automatically isolated and easier to recognize.

Shared Infrastructure

The client-server architecture is appropriate when using Hazelcast IMDG as a shared infrastructure such that multiple applications, especially those under the control of different work groups, use the same cluster or clusters.

Better Scalability

The client-server architecture has a better, more flexible scalability profile. When you need to scale, simply add more Hazelcast IMDG servers. With the client-server deployment model, the separate client and server scalability concerns may be addressed separately.

Achieve Very Low Latency with Client-Server

If you need very low latency data access, but you also want the scalability advantages of the client-server deployment model, consider configuring the clients to use Near Cache. This will ensure that frequently used data is kept in local memory on the application JVM.

Further Reading:

Open Binary Client Protocol

Hazelcast IMDG includes an Open Binary Protocol to facilitate the development of Hazelcast client APIs on any platform. In addition to the protocol documentation itself, there is an implementation guide and a Python client API reference implementation that describes how to implement a new Hazelcast client.

Further Reading:

Cluster Discovery Protocols

Hazelcast IMDG supports four options for cluster creation and discovery when nodes start up: multicast, TCP, Amazon EC2 auto-discovery, and the Cloud Discovery SPI.

Once a node has joined a cluster, all further network communication is performed via TCP.

Multicast

The advantage of multicast discovery is its simplicity and flexibility. As long as Hazelcast IMDG’s local network supports multicast, the cluster members do not need to know each other’s specific IP addresses when they start up; they just multicast to all other members on the subnet. This is especially useful during development and testing. In production environments, if you want to avoid accidentally joining the wrong cluster, then use Group Configuration.

Further Reading:

TCP

When using TCP for cluster discovery, the specific IP address of at least one other cluster member must be specified in configuration. Once a new node discovers another cluster member, the cluster will inform the new node of the full cluster topology, so the complete set of cluster members need not be specified in the configuration. However, we recommend that you specify the addresses of at least two other members in case one of those members is not available at startup time.
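A minimal sketch of the corresponding programmatic TCP/IP join configuration with the 3.x API follows; the member addresses are placeholders:

Config config = new Config();
JoinConfig join = config.getNetworkConfig().getJoin();
join.getMulticastConfig().setEnabled(false);      // disable multicast discovery
join.getTcpIpConfig().setEnabled(true);
join.getTcpIpConfig().addMember("10.0.0.1");      // list at least two known members
join.getTcpIpConfig().addMember("10.0.0.2");
HazelcastInstance member = Hazelcast.newHazelcastInstance(config);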

Amazon EC2 Auto Discovery

Hazelcast IMDG on Amazon EC2 supports TCP and EC2 auto-discovery, which is similar to multicast. It is useful when you do not want to or cannot provide the list of possible IP addresses. To configure your cluster to use EC2 Auto Discovery, disable cluster joining over multicast and TCP/IP, enable AWS, and provide your credentials (access and secret keys).
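A hedged sketch of the corresponding programmatic configuration using the 3.x AwsConfig API; the credential values and region are placeholders:

Config config = new Config();
JoinConfig join = config.getNetworkConfig().getJoin();
join.getMulticastConfig().setEnabled(false);
join.getTcpIpConfig().setEnabled(false);
join.getAwsConfig().setEnabled(true);
join.getAwsConfig().setAccessKey("my-access-key");   // placeholder credentials
join.getAwsConfig().setSecretKey("my-secret-key");
join.getAwsConfig().setRegion("us-west-1");          // placeholder region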

Cloud Discovery SPI

As of version 3.6, as the result of the first Hazelcast Enhancement Proposal (HEP), Hazelcast IMDG now provides a Cloud Discovery Service Provider Interface (SPI) to allow for pluggable, third-party discovery implementations.

An example implementation is available in the Hazelcast code samples repository on GitHub:
https://github.com/hazelcast/hazelcast-code-samples/tree/master/spi/discovery

As of January 2016, the following third-party implementations are available:

Further Reading:

For detailed information on cluster discovery and network configuration for Multicast, TCP and EC2, see the following documentation:

Firewalls, NAT, Network Interfaces and Ports

Hazelcast IMDG’s default network configuration is designed to make cluster startup and discovery simple and flexible out of the box. It’s also possible to tailor the network configuration to fit the specific requirements of your production network environment.

If your server hosts have multiple network interfaces, you may customize the specific network interfaces Hazelcast IMDG should use. You may also restrict which hosts are allowed to join a Hazelcast cluster by specifying a set of trusted IP addresses or ranges. If your firewall restricts outbound ports, you may configure Hazelcast IMDG to use specific outbound ports allowed by the firewall. Nodes behind network address translation (NAT) in, for example, a private cloud may be configured to use a public address.

Further Reading:

WAN Replication (Enterprise Feature)

If, for example, you have multiple data centers for geographic locality or disaster recovery and you need to synchronize data across the clusters, Hazelcast IMDG Enterprise supports wide-area network (WAN) replication. WAN replication operates in either active-passive mode, where an active cluster backs up to a passive cluster, or active-active mode, where each participating cluster replicates to all others.

You may configure Hazelcast IMDG to replicate all data, or restrict replication to specific shared data structures. In certain cases, you may need to adjust the replication queue size. The default replication queue size is 100,000, but in high volume cases, a larger queue size may be required to accommodate all of the
replication messages.

Further Reading:

Lifecycle, Maintenance and Updates

When operating a Hazelcast IMDG installation over time, planning for certain lifecycle events will ensure high uptime and smooth operation. Before moving your Hazelcast IMDG application into production, you will want to have policies in place for handling various aspects of your installation, such as configuration management, cluster startup, topology changes as nodes join and leave, and maintenance and software updates.

Configuration Management

Some map configuration options, such as TTL and TTI, may be updated after a cluster has started, via file or programmatic configuration or via the Management Center. Other configuration options cannot be changed on a running cluster: Hazelcast IMDG will neither accept nor communicate a joining node's configuration if it differs from the cluster configuration. The following configuration settings must be identical on all nodes in a cluster and may not be changed after cluster startup:

The use of a file change monitoring tool is recommended to ensure proper configuration across the cluster.

Further Reading:

Cluster Startup

Hazelcast cluster startup is typically as simple as starting all of the nodes. Cluster formation and operation will happen automatically. However, in certain use cases you may need to coordinate the startup of the cluster in a particular way. In a cache use case, for example, where shared data is loaded from an external source such as a database or web service, you may want to ensure the data is substantially loaded into the Hazelcast cluster before initiating normal operation of your application.

Data and Cache Warming

Data from an external source may be loaded via a custom MapLoader implementation either lazily or eagerly, depending on configuration. For lazily loaded maps, the Hazelcast IMDG instance returns immediately from calls to getMap(); for eagerly loaded maps, it blocks calls to getMap() until all of the data has been loaded from the MapLoader.
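A minimal sketch of configuring eager initial loading for a map backed by a custom MapLoader (3.x API); ProductLoader is a hypothetical MapLoader implementation and "products" a placeholder map name:

Config config = new Config();
MapStoreConfig mapStoreConfig = new MapStoreConfig();
mapStoreConfig.setEnabled(true);
mapStoreConfig.setImplementation(new ProductLoader());                     // hypothetical MapLoader
mapStoreConfig.setInitialLoadMode(MapStoreConfig.InitialLoadMode.EAGER);   // getMap() blocks until loading completes
config.getMapConfig("products").setMapStoreConfig(mapStoreConfig);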

Further Reading:

Hot Restart Store (Enterprise HD Feature)

As of version 3.6, Hazelcast IMDG Enterprise HD provides an optional disk-based data-persistence mechanism to enable hot restart. This is especially useful when loading cache data from the canonical data sources is slow or resource-intensive.

Note: The persistence capability supporting the hot restart capability is meant to facilitate cluster restart. It is not intended or recommended for canonical data storage.

With hot restart enabled, each member writes its data to local disk using a log-structured persistence algorithm for minimum write latency. A garbage collector thread runs continuously to remove stale data from storage.

Hot Restart from Planned Shutdown

Hot Restart Store may be used after either a full-cluster shutdown or member-by-member in a rolling-restart. In both cases, care must be taken to transition the whole cluster or individual cluster members from an “ACTIVE” state to an appropriate quiescent state to ensure data integrity. (See the documentation on managing cluster and member states for more information on the operating profile of each state.)

Hot Restart from Full-Cluster Shutdown

To stop and start an entire cluster using Hot Restart Store, the entire cluster must first be transitioned from “ACTIVE” to “PASSIVE” or “FROZEN” state prior to shutdown. Full-cluster shutdown may be initiated in any of the following ways:
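One of those ways is programmatic, via the Cluster API introduced in Hazelcast 3.6; a minimal sketch, assuming a reference to a running HazelcastInstance:

Cluster cluster = hazelcastInstance.getCluster();
cluster.changeClusterState(ClusterState.PASSIVE);   // quiesce the cluster first
cluster.shutdown();                                 // shuts down all members of the cluster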

Hot Restart of Individual Members

Individual members may be stopped and restarted using Hot Restart Store during, for example, a rolling upgrade. Prior to shutdown of any member, the whole cluster must be transitioned from “ACTIVE” to “PASSIVE” or “FROZEN” state. Once the cluster has safely transitioned to the appropriate state, each member may be shut down independently. When a member restarts, it will reload its data from disk and re-join the running cluster.

When all members have been restarted and have joined the cluster, the cluster may be transitioned back to the “ACTIVE” state.

Hot Restart from Unplanned Shutdown

Should an entire cluster crash at once (due, for example, to power or network service interruption), the cluster may be restarted using Hot Restart Store. Each member will attempt to restart using the last saved data. There are some edge cases where the last saved state may be unusable—for example, the cluster crashes during an ongoing partition migration. In such cases, hot restart from local persistence is not possible.

For more information on hot restart, see the documentation on hot restart persistence.

Force Start with Hot Restart Enabled

A member can crash permanently and become unable to recover from the failure. In that case, the restart process cannot complete, since some members either do not start or fail to load their own data. You can then force the cluster to delete its persisted data and make a fresh start. This process is called force start. (See the documentation on force start with hot restart enabled.)

Partial Start with Hot Restart Enabled

When one or more members fail to start, have incorrect (stale or corrupted) Hot Restart data, or fail to load their Hot Restart data, the cluster will become incomplete and the restart mechanism cannot proceed. One solution is to use force start and make a fresh start with the existing members. Another solution is to do a partial start.

Partial start means that the cluster will start with an incomplete member set. Data belonging to the missing members is assumed lost, and Hazelcast IMDG will try to recover the missing data using the restored backups. For example, if you have a minimum of two backups configured for all maps and caches, then a partial start with up to two missing members will be safe against data loss. If more than two members are missing, or there are maps or caches with fewer than two backups, then data loss is expected. (See the documentation on partial start with hot restart enabled.)

Moving/Copying Hot Restart Data

After the Hazelcast member owning the Hot Restart data has been shut down, the Hot Restart base directory can be copied or moved to a different server (which may have a different IP address and/or a different number of CPU cores), and the Hazelcast member can be restarted using the existing Hot Restart data on that new server. A new IP address does not affect Hot Restart, since it does not rely on the IP address of the server but instead uses the member UUID as a unique identifier. (See the documentation on moving or copying hot restart data.)

Hot Backup

During Hot Restart operations you can take a snapshot of the Hot Restart Store at a certain point in time. This is useful when you wish to bring up a new cluster with the same data or a subset of the data. The new cluster can then be used to share load with the original cluster, to perform testing or QA, or to reproduce an issue using production data.

Simple file copying of a currently running cluster does not suffice and can produce inconsistent snapshots with problems such as resurrection of deleted values or missing values. (See the documentation on hot backup.)

Topology Changes: Nodes Joining and Leaving

The oldest node in the cluster is responsible for managing a partition table that maps the ownership of Hazelcast IMDG’s data partitions to the nodes in the cluster. When the topology of the cluster changes (a node joins or leaves the cluster), the oldest node rebalances the partitions across the remaining nodes to ensure equitable distribution of data, then initiates the process of moving partitions according to the new partition table. While a partition is in transit to its new node, only requests for data in that partition will block. When a node leaves the cluster, the nodes that hold the backups of the exiting node's partitions promote those backups to primary partitions, which are immediately available for access. To avoid data loss, it is important to ensure that all the data in the cluster has been backed up again before taking down further nodes. To shut down a node gracefully, call the HazelcastInstance.shutdown() method, which blocks until there is no active data migration and at least one backup of that node's partitions is synced with the new primaries. To ensure that the entire cluster (rather than just a single node) is in a safe state, you may call PartitionService.isClusterSafe(); if it returns true, it is safe to take down another node. You may also use the Management Center to determine whether the cluster or a given node is in a safe state. See the Management Center section below.
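For illustration, a minimal sketch of waiting for a safe cluster state before taking the next node down (3.x PartitionService API; hazelcastInstance is an assumed reference to the local member, and InterruptedException handling is omitted for brevity):

PartitionService partitionService = hazelcastInstance.getPartitionService();
while (!partitionService.isClusterSafe()) {
    TimeUnit.SECONDS.sleep(1);   // wait until all partition backups are in sync
}
// safe to shut this member down now
hazelcastInstance.shutdown();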

Non-map data structures, e.g. Lists, Sets, Queues, etc., are backed up according to their backup count configuration, but their data is not distributed across multiple nodes. If a node with a non-map data structure leaves the cluster, its backup node will become the primary for that data structure, and it will be backed up to another node. Because the partition map changes when nodes join and leave the cluster, be sure not to store object data to a local filesystem if you persist objects via MapStore and MapLoader interfaces. The partitions that a particular node is responsible for will almost certainly change over time, rendering locally persisted data inaccessible when the partition table changes.

Further Reading:

Maintenance and Software Updates

When you need to perform software updates and hardware maintenance, you will most likely be able to do so without incurring downtime. Make sure that you have enough memory and CPU headroom in your cluster to allow for smooth operation when a cluster member node that you are updating is removed from service.

There are four flavors of update that each require different considerations vis-à-vis the state of a running cluster and its data:

  1. Hardware, operating system, or JVM updates. All of these may be updated live on a running cluster without scheduling a maintenance window. Note: Hazelcast IMDG supports Java versions 6-8. While not a best practice, JVMs of any supported Java version may be freely mixed and matched between the cluster and its clients and between individual members of a cluster.
  2. Live updates to user application code that executes only on the client side. These updates may be performed against a live cluster with no downtime. Even if the new client-side user code defines new Hazelcast data structures, these are automatically created in the cluster. As other clients are upgraded they will be able to use these new structures. Changes to classes that define existing objects stored in Hazelcast IMDG are subject to some restrictions. Adding new fields to classes of existing objects is always allowed. However, removing fields or changing the type of a field will require special consideration. See the section on object schema changes below.
  3. Live updates to user application code that executes on cluster members and on cluster clients. Clients may be updated and restarted without any interruption to cluster operation.
  4. Updates to Hazelcast libraries. Prior to Hazelcast 3.5, all members and clients of a running cluster had to run the same major and minor version of Hazelcast; patch-level upgrades are guaranteed to work with each other.

Live Updates to Cluster Member Nodes

In most cases, maintenance and updates may be performed on a live, running cluster without incurring downtime. However, when performing a live update, you must take certain precautions to ensure the continuous availability of the cluster and the safety of its data. When you remove a node from service, its data backups on other nodes become active, and the cluster automatically creates new backups and rebalances data across the new cluster topology. Before stopping another member node, you must ensure that the cluster has been fully backed up and is once again in a safe, high-availability state.

The following steps will ensure cluster data safety and high availability when performing maintenance or software updates:

  1. Remove one member node from service. You may either kill the JVM process, call HazelcastInstance.shutdown(), or use the Management Center. Note: when you stop a member, all locks and semaphore permits held by that member will be released.
  2. Perform the required maintenance or updates on that node’s host.
  3. Restart the node. The cluster will once again automatically rebalance its data based on the new
    cluster topology.
  4. Wait until the cluster is in a safe state before removing any more nodes from service. The cluster is in a safe state when all of its members are in a safe state. A member is in a safe state when all of its data has been backed up to other nodes according to the backup count. You may call HazelcastInstance.getPartitionService().isClusterSafe() to determine whether the entire cluster is in a safe state. You may also call HazelcastInstance.getPartitionService().isMemberSafe(Member member) to determine whether a particular node is in a safe state. Likewise, the Management Center displays the current safety of the cluster on its dashboard.
  5. Continue this process for all remaining member nodes.

Live Updates to Clients

A client is a process that is connected to a Hazelcast cluster with either Hazelcast’s client library (Java, C++, C#, .Net), REST, or Memcached interfaces. Restarting clients has no effect on the state of the cluster or its members, so they may be taken out of service for maintenance or updates at any time and in any order. However, any locks or semaphore permits acquired by a client instance will be automatically released. In order to stop a client JVM, you may kill the JVM process or call HazelcastClient.shutdown().

Live Updates to User Application Code that Executes on Both Clients and Cluster Members

Live updates to user application code on cluster member nodes are supported where:

Examples of what is allowed include new EntryProcessors, ExecutorService tasks, Runnables, Callables, Map/Reduce jobs, and Predicates. Because the same code must be present on both clients and members, you should ensure the code is installed on all of the cluster members before invoking that code from a client. As a result, all cluster members must be updated before any client is.

Procedure:

  1. Remove one member node from service
  2. Update the user libraries on the member node
  3. Restart the member node
  4. Wait until the cluster is in a safe state before removing any more nodes from service
  5. Continue this process for all remaining member nodes
  6. Update clients in any order

Object Schema Changes

When you release new versions of user code that uses Hazelcast IMDG data, take care to ensure that the object schema for that data in the new application code is compatible with the existing object data in Hazelcast IMDG, or implement custom deserialization code to convert the old schema into the new schema. Hazelcast IMDG supports a number of different serialization methods, one of which, the Portable interface, directly supports the use of multiple versions of the same class in different class loaders. See below for more information on the different serialization options.

If you are using object persistence via MapStore and MapLoader implementations, be sure to handle object schema changes there as well. Depending on the scope of object schema changes in user code updates, it may be advisable to schedule a maintenance window to perform those updates. This will avoid unexpected problems with deserialization errors associated with updating against a live cluster.

Hazelcast IMDG Software Updates

Prior to Hazelcast version 3.5, all members and clients had to run the same major and minor version of Hazelcast. Different patch-level updates are guaranteed to work with each other. For example, Hazelcast version 3.4.0 will work with 3.4.1 and 3.4.2, allowing for live updates of those versions against a running cluster.

From version 3.6, Hazelcast supports updating clients with different minor versions. For example, Hazelcast 3.6.x clients will work with Hazelcast version 3.7.x.

Between Hazelcast versions 3.5 and 3.8, minor-version updates of cluster members must be performed concurrently, which requires scheduling a maintenance window to bring the cluster down. Only patch-level updates are supported on the members of a running cluster (i.e., as a rolling upgrade).

Starting with Hazelcast IMDG Enterprise 3.8, each new minor version released will be compatible with the previous one. For example, it will be possible to perform a rolling upgrade on a cluster running Hazelcast IMDG Enterprise 3.8 to Hazelcast IMDG Enterprise 3.9 whenever that is released. Rolling upgrades across minor versions are a Hazelcast IMDG Enterprise feature, starting with version 3.8.

The compatibility guarantees described above are given in the context of rolling member upgrades and only apply to GA (general availability) releases. It is never advisable to run a cluster with members running on different patch or minor versions for prolonged periods of time.

Live Updates of Hazelcast Libraries on Clients

Where compatibility is guaranteed, the procedure for updating Hazelcast libraries on clients is as follows:

  1. Take any number of clients out of service
  2. Update the Hazelcast libraries on each client
  3. Restart each client
  4. Continue this process until all clients are updated

Updates to Hazelcast Libraries on Cluster Members

For patch-level Hazelcast IMDG updates, use the procedure for live updates on member nodes described above.

For major and minor-level Hazelcast version updates before Hazelcast IMDG 3.8, use the following procedure:

  1. Schedule a window for cluster maintenance
  2. Start the maintenance window
  3. Stop all cluster members
  4. Update Hazelcast libraries on all cluster member hosts
  5. Restart all cluster members
  6. Return the cluster to service

Rolling Member Upgrades

As stated above, Hazelcast supports rolling upgrades across minor versions starting with version 3.8. The detailed procedure for rolling member upgrades can be found in the documentation. (See the documentation on Rolling Member Upgrades.)

Performance Tuning and Optimization

Aside from standard code optimization in your application, there are a few Hazelcast IMDG-specific optimization considerations to keep in mind when preparing for a new Hazelcast IMDG deployment.

Dedicated, Homogeneous Hardware Resources

The first, easiest, and most effective optimization strategy for Hazelcast IMDG is to ensure that Hazelcast IMDG services are allocated their own dedicated machine resources. Using dedicated, properly sized hardware (or virtual hardware) ensures that Hazelcast IMDG nodes have ample CPU, memory, and network resources without competing with other processes or services.

Hazelcast IMDG distributes load evenly across all of its nodes and assumes that the resources available to each of its nodes is homogeneous. In a cluster with a mix of more powerful and less powerful machines, the weaker nodes will cause bottlenecks, leaving the stronger nodes underutilized. For predictable performance, it’s best to use equivalent hardware for all Hazelcast IMDG nodes.

Partition Count

Hazelcast IMDG’s default partition count is 271, which is a good choice for clusters of up to 50 nodes and ~25–30 GB of data. Up to this threshold, partitions are small enough that any rebalancing of the partition map when nodes join or leave the cluster doesn’t disturb smooth operation of the cluster. However, with larger clusters and bigger data sets, a larger partition count helps to rebalance data more efficiently across nodes.

An optimum partition size is between 50 MB and 100 MB. Therefore, while designing the cluster, size the data that will be distributed across all nodes and choose a partition count such that no partition exceeds 100 MB. If the default count of 271 results in heavily loaded partitions, increase the partition count to the point where the current data load, plus headroom for future growth, keeps the per-partition size under 100 MB.

Important: If you change the partition count from the default, be sure to use a prime number of partitions. This will help minimize collision of keys across partitions, ensuring more consistent lookup times. For further reading on the advantages of using a prime number of partitions, see http://www.quora.com/Does-making-array-size-a-prime-number-help-in-hash-table-implementation-Why.
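As a hypothetical sizing exercise: a cluster expected to hold about 300 GB of total data (active plus backups, including growth headroom) would need at least 300 GB / 100 MB = 3,072 partitions to keep every partition under the 100 MB target; rounding up to a nearby prime such as 3079 satisfies both recommendations.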

Important: If you are an Enterprise customer using the High-Density Data Store with large data sizes, we recommend a large increase in partition count, starting with 5009 or higher.

The partition count cannot be changed after a cluster is created, so if you have a larger cluster, be sure to test for and set an optimum partition count prior to deployment. If you need to change the partition count after a cluster is running, you will need to schedule a maintenance window to update the partition count and restart the cluster.

Dedicated Network Interface Controller for Hazelcast Members

Provisioning a dedicated physical network interface controller (NIC) for Hazelcast member nodes ensures smooth flow of data, including business data and cluster health checks, across servers. Sharing a network interface between a Hazelcast IMDG instance and another application could saturate that interface, causing unpredictable cluster behavior.

Network Settings

Adjust TCP buffer size

TCP uses a congestion window to determine how many packets it can send at one time; the larger the congestion window, the higher the throughput. The maximum congestion window is related to the amount of buffer space that the kernel allocates for each socket. For each socket, there is a default value for the buffer size, which may be changed by using a system library call just before opening the socket. The buffer size for both the receiving and sending sides of the socket may be adjusted.

To achieve maximum throughput, it is critical to use optimal TCP socket buffer sizes for the links you are using to transmit data. If the buffers are too small, the TCP congestion window will never open up fully, therefore throttling the sender. If the buffers are too large, the sender can overrun the receiver such that the sending host is faster than the receiving host, which will cause the receiver to drop packets and the TCP congestion window to shut down.

Hazelcast IMDG, by default, configures I/O buffers to 32KB, but these are configurable properties and may be changed in Hazelcast IMDG configuration with the following configuration parameters:

Typically, throughput may be determined by the following formulae:

TPS = Buffer Size / Latency

and

Buffer Size = RTT (Round Trip Time) * Network Bandwidth

To increase TCP Max Buffer Size in Linux, see the following settings:
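The kernel settings most commonly adjusted for this purpose are listed below as a hedged example; consult your distribution's documentation for appropriate values:

net.core.rmem_max
net.core.wmem_max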

To increase TCP auto-tuning by Linux, see the following settings:
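The per-connection auto-tuning ranges are likewise controlled by the following kernel settings (again a hedged example; confirm the values appropriate to your kernel):

net.ipv4.tcp_rmem
net.ipv4.tcp_wmem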

Further Reading:

Garbage Collection

Keeping track of garbage collection statistics is vital to optimum Java performance, especially if you run the JVM with large heap sizes. Tuning the garbage collector for your use case is often a critical performance practice prior to deployment. Likewise, knowing what baseline garbage collection behavior looks like and monitoring for behavior outside of normal tolerances will keep you apprised of potential memory leaks and other pathological memory usage.

Minimize Heap Usage

The best way to minimize the performance impact of garbage collection is to keep heap usage small. Maintaining a small heap can save countless hours of garbage collection tuning and will provide much higher stability and predictability across your entire application. Even if your application uses very large amounts of data, you can still keep your heap small by using Hazelcast High-Density Memory Store.

Some common off-the-shelf GC tuning parameters for Hotspot and OpenJDK:

-XX:+UseParallelOldGC
-XX:+UseParallelGC
-XX:+UseCompressedOops

To enable GC logging, use the following JVM arguments:

-XX:+PrintGCDetails
-verbose:gc
-XX:+PrintGCTimeStamps

High-Density Memory Store (Enterprise HD Feature)

Hazelcast High-Density Memory Store (HDMS) is an in-memory storage option that uses native, off-heap memory to store object data instead of the JVM heap, allowing you to keep terabytes of data in memory without incurring the overhead of garbage collection. HDMS supports the JCache, Map, Hibernate, and Web Session data structures.

Available to Hazelcast IMDG Enterprise customers, the HDMS is a perfect solution for those who want the performance gains of large amounts of in-memory data, need the predictability of well-behaved Java memory management, and don’t want to waste time and effort on meticulous and fragile garbage collection tuning.

Important: If you are an Enterprise customer using the HDMS with large data sizes, we recommend a large increase in partition count, starting with 5009 or higher. See the Partition Count section above for more information. Also, if you intend to pre-load very large amounts of data into memory (tens, hundreds, or thousands of gigabytes), be sure to profile the data load time and to take that startup time into account prior to deployment.

Further Reading:

Azul Zing® and Zulu® Support (Enterprise Feature)

Azul Systems, the industry’s only company exclusively focused on Java and the Java Virtual Machine (JVM), builds fully supported, certified standards-compliant Java runtime solutions that help enable the real time business. Zing is a JVM designed for enterprise Java applications and workloads that require any combination of low latency, high transaction rates, large working memory, and/or consistent response times. Zulu and Zulu Enterprise are Azul’s certified, freely available open source builds of OpenJDK with a variety of flexible support options, available in configurations for the enterprise as well as custom and embedded systems.

Starting with version 3.6, Azul Zing is certified and supported in Hazelcast IMDG Enterprise. When deployed with Zing, Hazelcast IMDG deployments gain performance, capacity, and operational efficiency within the same capital infrastructure. Additionally, you can directly use Hazelcast IMDG with Zulu without making any changes to your code.

Further Reading:

Optimizing Queries

Add Indexes for Queried Fields

This will cause Hazelcast IMDG to cache a deserialized form of the object under query in memory. This removes the overhead of object deserialization per query, but will increase heap usage.
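A minimal sketch of adding indexes with the 3.x IMap API; the map name, value type, and attribute names are hypothetical:

IMap<String, Employee> employees = hazelcastInstance.getMap("employees");
employees.addIndex("name", false);   // unordered index, suited to equality predicates
employees.addIndex("age", true);     // ordered index, suited to range predicates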

Object “in-memory-format”

An alternative to setting optimize query to “true” is to set the queried object’s in-memory format to “object.” This will force that object to be always kept in object format, resulting in faster access for queries, but also higher heap usage. It will also incur an object serialization step on every remote “get” operation.
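A minimal sketch of setting a map's in-memory format programmatically (3.x API); the map name is a placeholder:

Config config = new Config();
config.getMapConfig("employees").setInMemoryFormat(InMemoryFormat.OBJECT);   // keep values in deserialized form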

Further Reading:

Implement the “Portable” Interface on Queried objects

The Portable interface allows individual fields to be accessed without the overhead of deserialization or reflection, and supports queries and indexing without full-object deserialization.
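A minimal sketch of a Portable value class under the 3.x serialization API; the class, field names, and factory/class IDs are illustrative:

// Assumes imports from com.hazelcast.nio.serialization and java.io.IOException
public class Employee implements Portable {
    public static final int FACTORY_ID = 1;   // must match the registered PortableFactory
    public static final int CLASS_ID = 1;

    private String name;
    private int age;

    public Employee() { }                     // required for deserialization

    public Employee(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public int getFactoryId() { return FACTORY_ID; }

    @Override
    public int getClassId() { return CLASS_ID; }

    @Override
    public void writePortable(PortableWriter writer) throws IOException {
        writer.writeUTF("name", name);
        writer.writeInt("age", age);
    }

    @Override
    public void readPortable(PortableReader reader) throws IOException {
        name = reader.readUTF("name");
        age = reader.readInt("age");
    }
}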

Further Reading:

Optimizing Serialization

Hazelcast IMDG supports a range of object serialization mechanisms, each with their own costs and benefits. Choosing the best serialization scheme for your data and access patterns can greatly increase the performance of your cluster. An in-depth discussion of the various serialization methods is referenced below, but here is an at-a-glance summary:

java.io.Serializable

Benefits:

Costs:

java.io.Externalizable

Benefits over standard Java serialization:

Benefits:

Costs:

com.hazelcast.nio.serialization.DataSerializable

Optimization over standard Java Serialization:

Benefits:

Costs:

com.hazelcast.nio.serialization.IdentifiedDataSerializable

Optimization over standard Java Serialization:

Benefits:

Costs:

com.hazelcast.nio.serialization.Portable

Optimization over other serialization schemes:

Benefits:

Costs:

Pluggable serialization libraries, e.g. Kryo

Benefits:

Costs:

Serialization Optimization Recommendations

Further Reading:

Executor Service Optimizations

Hazelcast’s IExecutorService is an extension of Java’s built-in ExecutorService that allows for distributed execution and control of tasks. There are a number of options to Hazelcast’s executor service that will have an impact on performance.

Number of Threads

An executor queue may be configured to have a specific number of threads dedicated to executing enqueued tasks. Set the number of threads appropriate to the number of cores available for execution. Too few threads will reduce parallelism, leaving cores idle, while too many threads will cause context-switching overhead.

Bounded Execution Queue

An executor queue may be configured to have a maximum number of entries. Setting a bound on the number of enqueued tasks will put explicit back-pressure on enqueuing clients by throwing an exception when the queue is full. This will avoid the overhead of enqueuing a task only to be cancelled because its execution takes too long. It will also allow enqueuing clients to take corrective action rather than blindly filling up work queues with tasks faster than they can be executed.
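A minimal sketch of configuring both the thread count and a bounded queue for an executor (3.x API); the executor name and sizes are placeholders:

Config config = new Config();
ExecutorConfig executorConfig = config.getExecutorConfig("orders-executor");
executorConfig.setPoolSize(16);          // roughly match the cores available for task execution
executorConfig.setQueueCapacity(1000);   // a full queue causes RejectedExecutionException for submitters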

Avoid Blocking Operations in Tasks

Any time spent blocking or waiting in a running task is thread execution time wasted while other tasks wait in the queue. Tasks should be written such that they perform no potentially blocking operations (e.g., network or disk I/O) in their run() or call() methods.

Locality of Reference

By default, tasks may be executed on any member node. Ideally, however, tasks should be executed on the same machine that contains the data the task requires to avoid the overhead of moving remote data to the local execution context. Hazelcast’s executor service provides a number of mechanisms for optimizing locality of reference.
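One such mechanism is submitting a task to the member that owns a given key; a minimal sketch, where UpdateStockTask is a hypothetical Callable and productId a hypothetical key:

IExecutorService executor = hazelcastInstance.getExecutorService("orders-executor");
Future<Integer> result = executor.submitToKeyOwner(new UpdateStockTask(productId), productId);
// the task executes on the member that owns the partition containing productId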

Scaling Executor Services

If you find that your work queues consistently reach their maximum and you have already optimized the number of threads and locality of reference and removed any unnecessary blocking operations in your tasks, you may first try to scale up the hardware of the overburdened members by adding cores and, if necessary, more memory.

When you have reached diminishing returns on scaling up (such that the cost of upgrading a machine outweighs the benefits of the upgrade), you can scale out by adding more nodes to your cluster. The distributed nature of Hazelcast IMDG is perfectly suited to scaling out and you may find in many cases that it is as easy as just configuring and deploying additional virtual or physical hardware.

Executor Services Guarantees

In addition to the regular distributed executor service, durable and scheduled executor services are added to the feature set of Hazelcast IMDG with the versions 3.7 and 3.8. Note that when a node failure occurs, durable and scheduled executor services come with “at least once execution of a task” guarantee while the regular distributed executor service has none.

Executor Service Tips and Best Practices

Work Queue Is Not Partitioned

Each member-specific executor will have its own private work-queue. Once a job is placed on that queue, it will not be taken by another member. This may lead to a condition where one member has a lot of unprocessed work while another is idle. This could be the result of an application call such as the following:

for (;;) {
    iexecutorservice.submitToMember(mytask, member);   // every task lands on that one member's private queue
}

This could also be the result of an imbalance caused by the application, such as in the following scenario: all products by a particular manufacturer are kept in one partition. When a new, very popular product gets released by that manufacturer, the resulting load puts a huge pressure on that single partition while others remain idle.

Work Queue Has Unbounded Capacity by Default

This can lead to OutOfMemoryError because the number of queued tasks can grow without bounds. This can be solved by setting the <queue-capacity> property on the executor service. If a new task is submitted while the queue is full, the call will not block, but will immediately throw a RejectedExecutionException that the application must handle.

No Load Balancing

There is currently no load balancing available for tasks that can run on any member. If load balancing is needed, it may be done by creating an IExecutorService proxy that wraps the one returned by Hazelcast. Using the members from the ClusterService or member information from SPI:MembershipAwareService, it could route “free” tasks to a specific member based on load.

Destroying Executors

An IExecutorService must be shut down with care, because shutting it down will shut down all corresponding executors on every member, and subsequent calls to the proxy will result in a RejectedExecutionException. When the executor is destroyed and HazelcastInstance.getExecutorService() is later called with the ID of the destroyed executor, a new executor will be created as if the old one never existed.

Exceptions in Executors

When a task fails with an exception (or an error), the exception will not be logged by Hazelcast by default. This matches the behavior of Java's ThreadPoolExecutor, but it can make debugging difficult. There are, however, some easy remedies: either add a try/catch in your runnable and log the exception, or wrap the runnable/callable in a proxy that does the logging; the latter option keeps your code a bit cleaner.
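A minimal sketch of such a logging wrapper (names are illustrative; the wrapped task must itself be serializable to be sent to remote members):

// Assumes java.io.Serializable and java.util.logging imports
public class LoggingRunnable implements Runnable, Serializable {
    private final Runnable delegate;

    public LoggingRunnable(Runnable delegate) { this.delegate = delegate; }

    @Override
    public void run() {
        try {
            delegate.run();
        } catch (RuntimeException e) {
            Logger.getLogger(LoggingRunnable.class.getName()).log(Level.SEVERE, "Task failed", e);
            throw e;   // rethrow so the failure is still propagated to callers
        }
    }
}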

Further Reading:

Near Cache

Access to small-to-medium, read-mostly data sets may be sped up by creating a Near Cache. This cache maintains copies of distributed data in local memory for very fast access.
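A minimal sketch of enabling a client-side Near Cache for one map (3.x API); the map name is a placeholder:

ClientConfig clientConfig = new ClientConfig();
NearCacheConfig nearCacheConfig = new NearCacheConfig("products");
nearCacheConfig.setInMemoryFormat(InMemoryFormat.OBJECT);
nearCacheConfig.setInvalidateOnChange(true);   // drop local copies when cluster entries change
clientConfig.addNearCacheConfig(nearCacheConfig);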

Benefits:

Costs:

Further Reading:

Client Executor Pool Size

The Hazelcast client uses an internal executor service (different from the distributed IExecutorService) to perform some of its internal operations. By default, the thread pool for that executor service is configured to be the number of cores on the client machine times five—e.g., on a 4-core client machine, the internal executor service will have 20 threads. In some cases, increasing that thread pool size may increase performance.
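A minimal sketch, assuming the 3.x ClientConfig.setExecutorPoolSize setter:

ClientConfig clientConfig = new ClientConfig();
clientConfig.setExecutorPoolSize(40);   // e.g. double the 20-thread default of a 4-core client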

Further Reading:

Clusters with Many (Hundreds) of Nodes or Clients

Very large clusters of hundreds of nodes are possible with Hazelcast IMDG, but stability will depend heavily on your network infrastructure and ability to monitor and manage that many servers. Distributed executions in such an environment will be more sensitive to your application’s handling of execution errors, timeouts, and the optimization of task code.

In general, you will get better results with smaller clusters of Hazelcast members running on more powerful hardware and a higher number of Hazelcast clients. When running large numbers of clients, network stability will still be a significant factor in overall stability. If you are running in Amazon’s EC2, hosting clients and servers in the same zone is beneficial. Using Near Cache on read-mostly data sets will reduce server load and network overhead. You may also try increasing the number of threads in the client executor pool (see above).

Further Reading:

Basic Optimization Recommendations

Cluster Sizing

To determine the size of the cluster you will need for your use case, you must first be able to answer the following questions:

Sizing Considerations

Once you know the size, access patterns, throughput, latency, and fault tolerance requirements of your application, you can use the following rules of thumb to help you determine the size of your cluster.

Memory Headroom

Once you know the size of your working set of data, you can start sizing your memory requirements. Data in Hazelcast IMDG is both active data and backup data for high availability, so the total memory footprint will be the size of your active data plus the size of your backup data. If your fault tolerance allows for just a single backup, then each member of the Hazelcast cluster will contain a 1:1 ratio of active data to backup data for a total memory footprint of two times the active data. If your fault tolerance requires two backups, then that ratio climbs to 1:2 active to backup data for a total memory footprint of three times your active data set. If you use only heap memory, each Hazelcast node with a 4GB heap should accommodate a maximum of 3.5 GB of total data (active and backup). If you use the High-Density Data Store, up to 75% of your physical memory footprint may be used for active and backup data, with headroom of 25% for normal fragmentation. In both cases, however, you should also keep some memory headroom available to handle any node failure or explicit node shutdown. When a node leaves the cluster, the data previously owned by the newly offline node will be redistributed across the remaining members. For this reason, we recommend that you plan to use only 60% of available memory, with 40% headroom to handle node failure or shutdown.

Recommended Configurations

Hazelcast IMDG performs scaling tests for each version of the software and, based on this testing, specifies scaling maximums. These are defined for each version of the software starting with 3.6. We recommend staying below these numbers. Please contact Hazelcast if you plan to use higher limits.

In the documentation, multisocket clients are called smart clients. Each smart client maintains a connection to each member, while unisocket clients have a single connection to the entire cluster.

Very Low-Latency Requirements

If your application requires very low latency, consider using an embedded deployment. This configuration will deliver the best latency characteristics. Another solution for ultra-low-latency infrastructure could be ReplicatedMap.

ReplicatedMap is a distributed data structure that stores an exact replica of data on each node. This way, all of the data is always present on every node in the cluster, thus preventing a network hop across to other nodes in the case of a map.get() request. Otherwise, the isolation and scalability gains of using a client-server deployment are preferable.

CPU Sizing

As a rule of thumb, we recommend a minimum of 8 cores per Hazelcast server instance. You may need more cores if your application is CPU-heavy in, for example, a high throughput distributed executor
service deployment.

Example: Sizing a Cache Use Case

Consider an application that uses Hazelcast IMDG as a data cache. The active memory footprint will be the total number of objects in the cache times the average object size. The backup memory footprint will be the active memory footprint times the backup count. The total memory footprint is the active memory footprint plus the backup memory footprint:

Total memory footprint = (total objects * average object size) + (total objects * average object size * backup count)

For this example, let’s stipulate the following requirements: 50 GB of active data and a backup count of 2.

Cluster Size Using the High-Density Memory Store

Since we have 50 GB of active data, our total memory footprint will be:

50 GB + 50 GB * 2 (backup count) = 150 GB.

Keeping 40% of memory free as headroom (so data occupies only 60%), you will need a total of 250 GB of RAM for data.

To satisfy this use case, you will need 3 Hazelcast nodes, each running a 4 GB heap with ~84 GB of data off-heap in the High-Density Data Store.

Note: You cannot have a backup count greater than or equal to the number of nodes available in the cluster—Hazelcast IMDG will ignore higher backup counts and will create the maximum number of backup copies possible. For example, Hazelcast IMDG will only create two backup copies in a cluster of three nodes, even if the backup count is set equal to or higher than three.

Note: No node in a Hazelcast cluster will store both a primary copy of a partition and a backup copy of that same partition.

Cluster Size Using Only Heap Memory

Since it is not practical to run JVMs with a heap larger than 4 GB, and a 4 GB JVM yields approximately 3.5 GB of usable storage space, you would need at least 43 such JVMs just to store 150 GB of active and backup data. Add the 40% headroom discussed earlier, for a total of 250 GB of usable heap, and you will need ~72 JVMs, each running with a 4 GB heap, for active and backup data. Considering that each JVM has some memory overhead, and that Hazelcast IMDG's rule of thumb for CPU sizing is eight cores per Hazelcast IMDG server instance, you will need at least 576 cores and upwards of 300 GB of memory.

Summary

150 GB of data, including backups.

High-Density Memory Store: 3 nodes, each with a 4 GB heap and roughly 84 GB of off-heap data.

Heap-only: ~72 JVMs, each with a 4 GB heap.

Security and Hardening

Hazelcast IMDG Enterprise offers a rich set of security features you can use:

Features (Enterprise and Enterprise HD)

The major security features are described below. Please see the Security section of the Hazelcast IMDG Reference Manual for details.

Socket Interceptor

The socket interceptor allows you to intercept socket connections before a node joins a cluster or a client connects to a node. This provides the ability to add custom hooks to the cluster join operation and perform connection procedures (like identity checking using Kerberos, etc.).

Security Interceptor

The security interceptor allows you to intercept every remote operation executed by the client. This lets you add very flexible custom security logic.

Encryption

All socket-level communication among all Hazelcast members can be encrypted. Encryption is based on the Java Cryptography Architecture.

SSL

All Hazelcast members can use SSL socket communication among each other.

Credentials and ClusterLoginModule

The Credentials interface and ClusterLoginModule allow you to implement custom credentials checking.

The default implementation that comes with Hazelcast IMDG uses a username/password scheme.

Cluster Member Security

Hazelcast IMDG Enterprise supports standard Java Security (JAAS) based authentication between cluster members.

Native Client Security

Hazelcast’s client security includes both authentication and authorization via configurable permissions policies.

Further Reading:

Security Defaults

Hazelcast uses port 5701 for all communication by default. Please see the port section in the Reference Manual for the available configuration methods and attributes.

Hardening Recommendations

For enhanced security, we recommend the following hardening measures:

Secure Context

Hazelcast IMDG’s security features can be undermined by a weak security context. The following areas
are critical:

Host Security

Hazelcast IMDG does not encrypt data held in memory. Similarly, the Hot Restart Store does not encrypt data. Finally, encryption passwords and Java keystore passwords are stored in hazelcast.xml and hazelcast-client.xml, which reside on the file system. Management Center passwords are also stored on the Management Center host.

An attacker with sufficient permissions on either a Hazelcast member host or a Hazelcast client host could therefore read data held in memory or on disk, and would be in a position to obtain the key repository, though perhaps not the keys themselves.

Development and Test Security

Because encryption passwords and Java keystore passwords are stored in hazelcast.xml and hazelcast-client.xml on the file system, different passwords should be used for production than for development and test. Otherwise the development and test teams will know the production passwords.

Java Security

Hazelcast IMDG is primarily Java based. Java was designed with security in mind and is less prone to security problems than C; however, the Java version being used should be patched promptly whenever security updates are released.

Deployment and Scaling Runbook

The following is a sample set of procedures for deploying and scaling a Hazelcast cluster:

  1. Ensure you have the appropriate Hazelcast JARs installed (hazelcast-enterprise for Hazelcast IMDG Enterprise). Normally hazelcast-all-<version>.jar is sufficient for all operations, but you may also install the smaller hazelcast-<version>.jar on member nodes and hazelcast-client-<version>.jar for clients.
  2. If not configured programmatically, Hazelcast IMDG looks for a hazelcast.xml configuration file for server operations and a hazelcast-client.xml configuration file for client operations. Place each configuration file where its respective application (Hazelcast server or application client) can pick it up.
  3. In both configurations, provide the IP addresses of at least two Hazelcast server nodes and, if there are more than two nodes in the cluster, the IP address of the joining node itself. This is required to avoid new nodes failing to join the cluster if a configured IP address has no server instance running on it.
    Note: A Hazelcast member looks for a running cluster at the IP addresses provided in its configuration. For an upcoming member to join a cluster, it must be able to detect the running cluster on at least one of the IP addresses provided. The same applies to clients as well.
  4. Enable “smart” routing on clients. This avoids a client routing all of its requests to the cluster through a single Hazelcast member and thereby bottlenecking that member. A smart client connects to all Hazelcast server instances and sends each request directly to the respective member node, which also ensures better latency and throughput when accessing data stored in Hazelcast servers. (A configuration sketch covering steps 3 and 4 follows this list.)
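A rough programmatic sketch of steps 3 and 4, with placeholder IP addresses; the same settings can be expressed declaratively in hazelcast.xml and hazelcast-client.xml:

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.config.Config;
import com.hazelcast.config.JoinConfig;
import com.hazelcast.core.Hazelcast;

public class ClusterJoinExample {
    public static void main(String[] args) {
        // Member side: list at least two known members (placeholder addresses).
        Config config = new Config();
        JoinConfig join = config.getNetworkConfig().getJoin();
        join.getMulticastConfig().setEnabled(false);
        join.getTcpIpConfig().setEnabled(true).addMember("10.0.0.1").addMember("10.0.0.2");
        Hazelcast.newHazelcastInstance(config);

        // Client side: the same addresses, with smart routing enabled.
        ClientConfig clientConfig = new ClientConfig();
        clientConfig.getNetworkConfig()
                .addAddress("10.0.0.1:5701", "10.0.0.2:5701")
                .setSmartRouting(true);
        HazelcastClient.newHazelcastClient(clientConfig);
    }
}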

Further Reading:

  1. Make sure that all nodes are reachable by every other node in the cluster and are also accessible by clients (ports, network, etc).
  2. Start Hazelcast server instances first. This is not mandatory, but it is recommended to avoid clients timing out or complaining that no Hazelcast server is found if the clients are started before the servers.
  3. Enable/start a network log collecting utility. nmon is perhaps the most commonly used tool and is very easy to deploy.
  4. To add more server nodes to an already running cluster, just start a server instance with similar configuration to other nodes with a possible addition of the IP address of the new node. There is no need for a maintenance window to add more nodes to an already running Hazelcast cluster.
    Note: When a node is added or removed in a Hazelcast cluster, clients may see a little pause time, but this is normal. This is essentially the time required by Hazelcast servers to rebalance the data on the arrival or departure of a member node.
    Note: There is no need to change anything on the clients when adding more server nodes to the running cluster. Clients will update themselves automatically to connect to new nodes once the new node has successfully joined the cluster.
    Note: Rebalancing of data (primary plus backup) on arrival or departure (forced or unforced) of a node is an automated process and no manual intervention is required.
  5. Ensure you have configured an adequate backup count based on your SLAs (see the sketch after this list).
  6. When using distributed computing features such as IExecutorService, EntryProcessors, Map/Reduce or Aggregators, any change in application logic or in the implementation of these features must also be installed on the member nodes. All member nodes must be restarted after the new code is deployed, using the typical cluster re-deployment process:
    a. Shut down the servers.
    b. Deploy the new application JAR to the servers’ classpath.
    c. Start the servers.
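As referenced in step 5 above, a hedged sketch of configuring backup counts programmatically for a hypothetical map named "orders"; the equivalent declarative settings are <backup-count> and <async-backup-count> in hazelcast.xml:

import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;
import com.hazelcast.core.Hazelcast;

public class BackupCountExample {
    public static void main(String[] args) {
        Config config = new Config();
        // Hypothetical map with one synchronous and one asynchronous backup copy.
        MapConfig mapConfig = new MapConfig("orders");
        mapConfig.setBackupCount(1);
        mapConfig.setAsyncBackupCount(1);
        config.addMapConfig(mapConfig);
        Hazelcast.newHazelcastInstance(config);
    }
}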

Failure Detection and Recovery

While smooth and predictable operation is the norm, occasional failure of hardware and software is inevitable. But with the right detection, alerts, and recovery processes in place, your cluster will tolerate failure without incurring unscheduled downtime.

Common Causes of Node Failure

The most common causes of node failure are garbage collection pauses and network connectivity issues, both of which can cause a node to fail to respond to health checks and thus be removed from the cluster.

Health Monitoring and Alerts

Hazelcast IMDG provides multi-level tolerance configurations in a cluster:

  1. Garbage collection tolerance: When a node fails to respond to health check probes on the existing socket connection but does respond to health probes sent on a new socket, it can be presumed to be stuck either in a long GC or in another long-running task. Adequate tolerance levels configured here may allow the node to come back from its stuck state within permissible SLAs.
  2. Network tolerance: Temporary network communication errors may make a node unreachable and appear unresponsive. In such a scenario, adequate tolerance levels configured here will allow the node to return to healthy operation within permissible SLAs.

See below for more details:
http://docs.hazelcast.org/docs/latest/manual/html-single/index.html#system-properties
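As a hedged example, these tolerance windows are typically tuned through the heartbeat system properties; the values below are illustrative only and can equally be supplied as -D JVM arguments:

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;

public class ToleranceTuningExample {
    public static void main(String[] args) {
        Config config = new Config();
        // Illustrative values only; tune them against your own SLAs.
        config.setProperty("hazelcast.heartbeat.interval.seconds", "5");
        config.setProperty("hazelcast.max.no.heartbeat.seconds", "300");
        Hazelcast.newHazelcastInstance(config);
    }
}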

You should establish tolerance levels for garbage collection and network connectivity and then set monitors to raise alerts when those tolerance thresholds are crossed. Customers with a Hazelcast subscription can use the extensive monitoring capabilities of the Management Center to set monitors and alerts.

In addition to the Management Center, we recommend using jstat, keeping verbose GC logging turned on, and using a log-scraping tool such as Splunk to monitor GC behavior. Back-to-back full GCs, or heap occupancy above 90% after a full GC, should be cause for alarm.
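For example, on a Java 8 JVM, verbose GC logging is commonly enabled with flags along the following lines (the log file path is a placeholder):

-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-Xloggc:/path/to/your/gc.log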

Hazelcast IMDG dumps a set of information to the console of each instance that may further be used to create alerts. The following is a detail of those properties:

Note: Hazelcast IMDG uses internal executors to perform various operations that read tasks from a dedicated queue. Some of the properties below belong to such executors:

Recovery from a Partial or Total Failure

Under normal circumstances, Hazelcast members are self-recoverable as in the following scenarios:

However, in the rare case when a node is declared unreachable by Hazelcast IMDG because it fails to respond, but the rest of the cluster is still running, use the following procedure for recovery:

  1. Collect Hazelcast server logs from all server nodes, active and unresponsive.
  2. Collect Hazelcast client logs or application logs from all clients.
  3. If the cluster is running and one or more member nodes were ejected from the cluster because they were stuck, take a heap dump of any stuck member nodes.
  4. In the same situation, take thread dumps of the server nodes, including any stuck member nodes. For taking thread dumps, you may use the Java utilities jstack and jconsole, or any other JMX client.
  5. In the same situation, collect nmon logs from all nodes in the cluster.
  6. After collecting all of the necessary artifacts, shut down the rogue node(s) by calling shutdown hooks (see next section, Cluster Member Shutdown, for more details) or through JMX beans if using a JMX client.
  7. After shutdown, start the server node(s) and wait for them to join the cluster. After successful joining, Hazelcast IMDG will rebalance the data across new nodes.

Important: Hazelcast IMDG allows persistence based on Hazelcast callback APIs, which let you store cached data in an underlying data store in a write-through or write-behind pattern and reload it into the cache for cache warm-up or disaster recovery.

See link for more details:
http://docs.hazelcast.org/docs/latest/manual/html-single/index.html#hot-restart-persistence

Cluster Member Shutdown

Hazelcast Diagnostics Log

Hazelcast has an extended set of diagnostic plugins for both client and server. The diagnostics log is a more powerful mechanism than the health monitor, and a dedicated log file is used to write the content. A rolling file approach is used to prevent taking up too much disk space.

Enabling

On the member side the following parameters need to be added:

-Dhazelcast.diagnostics.enabled=true
-Dhazelcast.diagnostics.metric.level=info
-Dhazelcast.diagnostics.invocation.sample.period.seconds=30
-Dhazelcast.diagnostics.pending.invocations.period.seconds=30
-Dhazelcast.diagnostics.slowoperations.period.seconds=30

On the client side the following parameters need to be added:

-Dhazelcast.diagnostics.enabled=true
-Dhazelcast.diagnostics.metric.level=info

You can use the following parameter to specify the location of the log file:

-Dhazelcast.diagnostics.directory=/path/to/your/log/directory

This can run in production without significant overhead. Currently there is no information available regarding data-structure (e.g. IMap or IQueue) specifics.

The diagnostics logfiles can be sent, together with the regular log files, to engineering for analysis.

For more information about the configuration options, see class com.hazelcast.internal.diagnostics.Diagnostics and the surrounding plugins.

Plugins

The Diagnostic system works based on plugins.

BuildInfo

The build info plugin shows details about the build. It shows not only the Hazelcast version and whether Enterprise is enabled, but also the git revision number. This is especially important if you use SNAPSHOT versions.

Every time a new file in the rolling file appender sequence is created, the BuildInfo will be printed in the header. The plugin has very low overhead and can’t be disabled.

System Properties

The System properties plugin shows all properties beginning with:

Because this filtering is applied, the content of the diagnostics log is less at risk of capturing private information. The plugin will also include the arguments that were used to start the JVM, even though these are officially not system properties.

Every time a new file in the rolling file appender sequence is created, the System Properties will be printed in the header. The System Properties plugin is very useful for many purposes, including getting information about the OS and the JVM. The plugin has very low overhead and can’t be disabled.

Config Properties

The Config Properties plugin shows all Hazelcast properties that have been explicitly set (either on the command line or in the configuration).

Every time a new file in the rolling file appender sequence is created, the Config Properties will be printed in the header. The plugin has very low overhead and can’t be disabled.

Metrics

This is one of the richest plugins because it lets you peek into what is happening inside the Hazelcast system. The metrics plugin can be configured using the following properties:

hazelcast.diagnostics.metrics.period.seconds (default: 60 seconds)
  The frequency of dumping to file.

hazelcast.diagnostics.metrics.level (default: MANDATORY)
  The amount of detail. Options available: MANDATORY, INFO, DEBUG.


Slow Operations

The Slow Operation plugin detects two things: slow operations and slow invocations.

The Slow Operation plugin shows all kinds of information about the type of operation and the invocation. If there is some kind of obstruction, e.g. a database call taking a lot of time and therefore making a map get operation slow, the get operation will be shown in the slow operations section. Any invocation that is obstructed by this slow operation will be listed in the slow invocations section.

Properties and their defaults:

hazelcast.diagnostics.slowoperations.period.seconds: 60
hazelcast.slow.operation.detector.enabled: true
hazelcast.slow.operation.detector.threshold.millis: 1000
hazelcast.slow.invocation.detector.threshold.millis: -1


Invocations

The Invocations plugin shows all kinds of statistics about current and past invocations:

The Invocations plugin will periodically sample all invocations in the invocation registry. It will give an impression of which operations are currently executing.

The plugin has very low overhead and can be used in production.

hazelcast.diagnostics.invocation.sample.period.seconds (default: 60)
  The frequency of scanning all pending invocations.

hazelcast.diagnostics.invocation.slow.threshold.seconds (default: 5)
  The threshold above which an invocation is considered to be slow.


Overloaded Connections

The overloaded connections plugin is a debug plugin, and it is dangerous to use in a production environment. It is used internally to figure out what is inside connections and their write queues when the system is behaving badly; the metrics plugin only exposes the number of pending items, not their types.

The overloaded connections plugin samples connections that have more than a certain number of pending packets, deserializes their content, and creates statistics per connection.

hazelcast.diagnostics.overloaded.connections.period.seconds (default: 0)
  The frequency of scanning all connections. 0 indicates disabled.

hazelcast.diagnostics.overloaded.connections.threshold (default: 10000)
  The minimum number of pending packets.

hazelcast.diagnostics.overloaded.connections.samples (default: 1000)
  The maximum number of samples to take.


MemberInfo

The member info plugin periodically displays some basic state of the Hazelcast member: the current members, whether this member is the master, etc. It is useful to get a fast impression of the cluster without needing to analyze a ton of data.

The plugin has very low overhead and can be used in production.

hazelcast.diagnostics.memberinfo.period.seconds (default: 60)
  The frequency at which the member info is printed.


System Log

The System Log plugin listens to what happens in the cluster and will report when a connection is added or removed, a member is added or removed, or there is a change in the lifecycle of the cluster. It was written especially to help make sense of what is going on when a user is running into connection problems, and it includes quite a lot of detail about why, for example, a connection was closed. So if there are connection issues, please look at the System Log plugin output before diving into the regular logs.

The plugin has very low overhead and can be used in production. Beware that if partitions are being logged, you get a lot of logging noise.

hazelcast.diagnostics.systemlog.enabled (default: true)
  If the plugin is enabled.

hazelcast.diagnostics.systemlog.partitions (default: false)
  If the plugin should display information about partition migration. Beware that if enabled, this can become pretty noisy, especially if there are many partitions.


Management Center (Subscription and Enterprise Feature)

The Hazelcast Management Center is a product available to Hazelcast IMDG Enterprise and subscription customers that facilitates monitoring and management of Hazelcast clusters. In addition to monitoring the overall cluster state, Management Center also allows you to analyze and browse your data structures in detail, update map configurations, and take thread dumps from nodes. With its scripting and console module you can run scripts (JavaScript, Ruby, Groovy, and Python) and commands on your nodes.

Cluster-Wide Statistics and Monitoring

While each member node has a JMX management interface that exposes per-node monitoring capabilities, the Management Center collects all of the individual member node statistics to provide cluster-wide JMX and REST management APIs, making it a central hub for all of your cluster’s management data. In a production environment, the Management Center is the best way to monitor the behavior of the entire cluster, both through its web-based user interface and through its cluster-wide JMX and REST APIs.

Web Interface Home Page



Figure 3: Management Center Home

The home page of the Management Center provides a dashboard-style overview. For each node, it displays at-a-glance statistics that may be used to quickly gauge the status and health of each member and the cluster as a whole.

Home page statistics per node:

Home page cluster-wide statistics:


Figure 4: Management Center Tools

Management Center Tools

The toolbar menu provides access to various resources and functions available in the Management Center. These include:

Data Structure and Member Management

The Caches, Maps, Queues, Topics, MultiMaps, and Executors pages each provide a drill-down view into the operational statistics of individual data structures. The Members page provides a drill-down view into the operational statistics of individual cluster members, including CPU and memory utilization, JVM Runtime statistics and properties, and member configuration. It also provides facilities to run GC, take thread dumps, and shut down each member node.

Monitoring Cluster Health

The “Cluster Health” section on the Management Center home page describes current backup and partition migration activity. While a member’s data is being backed up, the Management Center will show an alert indicating that the cluster is vulnerable to data loss if that node is removed from service before the backup is complete.

When a member node is removed from service, the cluster health section will show an alert while the data is re-partitioned across the cluster, indicating that the cluster is vulnerable to data loss if any further nodes are removed from service before the re-partitioning is complete.

You may also set alerts to fire under specific conditions. In the “Alerts” tab, you can set alerts based on the state of cluster members as well as alerts based on the status of particular data types. For one or more members and for one or more data structures of a given type on one or more members, you can set alerts to fire when certain watermarks are crossed.

When an alert fires, it will show up as an orange warning pane overlaid on the Management Center web interface.

Available member alert watermarks:

Available Map and MultiMap alert watermarks (greater than, less than, or equal to a given threshold):

Available Queue alert watermarks (greater than, less than, or equal to a given threshold):

Available executor alert watermarks (greater than, less than, or equal to a given threshold):

Monitoring WAN Replication

You can also monitor the WAN Replication process in the Management Center. WAN Replication schemes are listed under the WAN menu item on the left. When you click on a scheme, a new tab appears on the right for monitoring the targets of that scheme. In this tab you see a WAN Replication Operations Table for each target belonging to the scheme. The following information can be monitored:

Synchronizing Clusters Dynamically with WAN Replication

Starting from Hazelcast IMDG version 3.8, you can use the Management Center to synchronize multiple clusters with WAN Replication. You can start the sync process from the WAN Sync interface of the Management Center without any service interruption. Also in Hazelcast IMDG 3.8, you can add a new WAN Replication endpoint to a running cluster using the Management Center, so at any time you can create a new WAN Replication destination and push a snapshot of your current cluster to it using the sync ability.

Please use the “WAN Sync” screen of the Management Center to display existing WAN Replication configurations. You can use the “Add WAN Replication Config” button to add a new configuration and the “Configure Wan Sync” button to start a new synchronization with the desired configuration.


Figure 5

Further Reading:

Enterprise Cluster Monitoring with JMX and REST (Subscription and Enterprise)

Each Hazelcast node exposes a JMX management interface that includes statistics about distributed data structures and the state of that node’s internals. The Management Center described above provides a centralized JMX and REST management API that collects all of the operational statistics for the entire cluster.

As an example of what you can achieve with JMX beans for an IMap, you may want to raise alerts when the latency of accessing the map increases beyond an expected watermark that you established in your load-testing efforts. This could also be the result of high load, long GC, or other potential problems that you might have already created alerts for, so consider the output of the following bean properties:

localTotalPutLatency
localTotalGetLatency
localTotalRemoveLatency
localMaxPutLatency
localMaxGetLatency
localMaxRemoveLatency
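A hedged sketch of reading these attributes over JMX; the ObjectName pattern used below is an assumption and should be verified against the MBeans your member actually registers:

import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class MapLatencyProbe {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Assumed ObjectName pattern for Hazelcast IMap beans; confirm it with a JMX browser first.
        Set<ObjectName> mapBeans = server.queryNames(new ObjectName("com.hazelcast:type=IMap,*"), null);
        for (ObjectName bean : mapBeans) {
            Object maxGet = server.getAttribute(bean, "localMaxGetLatency");
            Object maxPut = server.getAttribute(bean, "localMaxPutLatency");
            // Compare against the watermark established during load testing and raise an alert if exceeded.
            System.out.println(bean + " localMaxGetLatency=" + maxGet + " localMaxPutLatency=" + maxPut);
        }
    }
}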

Similarly, you may also make use of the HazelcastInstance bean that exposes information about the current node and all other cluster members.

For example, you may use the following properties to raise appropriate alerts or for general monitoring:

Further Reading:

Recommendations for Setting Alerts

We recommend setting alerts for at least the following incidents:

Actions and Remedies for Alerts

When an alert fires on a node, it’s important to gather as much data about the ailing JVM as possible before shutting it down.

Logs: Collect Hazelcast server logs from all server instances. If running in a client-server topology, also collect client application logs before a restart.

Thread dumps: Make sure you take thread dumps of the ailing JVM using either the Management Center or jstack. Take multiple thread dump snapshots at 3 to 4 second intervals.

Heap dumps: Make sure you take heap dumps and histograms of the ailing JVM using jmap.

Further Reading:

Guidance for Specific Operating Environments

Hazelcast IMDG works across a huge range of operating environments. Following is guidance for some specific environments which must be given additional consideration.

Solaris Sparc

Hazelcast IMDG Enterprise HD is certified for Solaris Sparc from version 3.6. Versions prior to that have a known issue with HD Memory due to the Sparc architecture not supporting unaligned memory access.

VMWare ESX

Hazelcast IMDG is certified on VMWare VSphere 5.5/ESXi 6.0.

Generally speaking Hazelcast IMDG can use all of the resources of a full machine, so splitting a machine up into VMs and dividing resources is not required.

Best Practices

Avoid memory overcommitment - always use dedicated physical memory for guests running Hazelcast IMDG.

Common VMWare Operations with Known Issues

Amazon Web Services

See our dedicated AWS Deployment Guide https://hazelcast.com/resources/amazon-ec2-deployment-guide/

Handling Network Partitions

Ideally the network is always fully up or fully down. Though unlikely, it is possible that the network may be partitioned. This chapter deals with how to handle those rare cases.

Split-Brain on Network Partition

Under certain cases of network failure, some cluster members may become unreachable to each other, yet still be fully operational and may even be able to see some, but not all, of the extant cluster members. From the perspective of each node, the unreachable members will appear to have gone offline. Under these circumstances, what was once a single cluster will have been divided into two or more clusters. This is known as network partitioning, or the “Split-Brain Syndrome.”

Consider a five-node cluster as depicted in Figure 6:



Figure 6



Figure 7: Network failure isolates nodes four and five from the rest of the cluster

All five nodes have working network connections to each other and respond to health check heartbeat pings. If a network failure causes communication to fail between nodes four and five and the rest of the cluster (Figure 7), from the perspective of nodes one, two, and three, nodes four and five will appear to have gone offline. However, from the perspective of nodes four and five, the opposite is true: nodes one through three appear to have gone offline (Figure 8).


Figure 8: Split-Brain

How to respond to a split-brain scenario depends on whether consistency of data or availability of your application is of primary concern. In either case, because a Split-Brain scenario is caused by a network failure, you must initiate an effort to identify and correct the network failure. Your cluster cannot be brought back to steady state operation until the underlying network failure is fixed.

If availability is of primary concern, especially if there is little danger of data becoming inconsistent across clusters (e.g., you have a primarily read-only caching use case) then you may keep both clusters running until the network failure has been fixed. Alternately, if data consistency is of primary concern, it may make sense to remove the clusters from service until the split-brain is repaired. If consistency is your primary concern, use Split Brain Protection as discussed following.

Split Brain Protection

Split Brain Protection provides the ability to prevent the smaller cluster in a split brain from being used by your application where consistency is the primary concern.

This is achieved by defining and configuring a split brain protection cluster quorum. A quorum is the minimum cluster size required for operations to occur.

Tip: It is preferable to have an odd-sized initial cluster to prevent a single network partition from creating two equal-sized clusters.

Imagine we have a nine-node cluster and the quorum is configured as 5. If a split brain occurs, any smaller cluster of size 1 to 4 will be prevented from being used; only a cluster of five or more members will continue to serve operations. The following declaration would be added to the Hazelcast configuration:

<quorum name="quorumOf5" enabled="true">
  <quorum-size>5</quorum-size>
</quorum>

Attempts to perform operations against the smaller cluster are rejected, and the rejected operations return a QuorumException to their callers. Write operations, read operations, or both can be protected with split brain protection.

Your application continues normal processing on the larger remaining cluster. Any application instances connected to the smaller cluster will receive exceptions which, depending on the programming and monitoring setup, should raise alerts. The key point is that rather than applications continuing in error with stale data, they are prevented from doing so.

Time Window

Cluster membership is established and maintained by heartbeating. A network partition will present as some members being unreachable. While the timeouts are configurable, it normally takes seconds or tens of seconds before the cluster is adjusted to exclude unreachable members. The cluster size is based on the currently understood number of members.

For this reason there will be a time window between the network partition and the application of split brain protection.

Protected Data Structures

The following data structures are protected:

Each data structure to be protected should have the quorum configuration added to it.
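A hedged programmatic equivalent of the declaration above, attaching the quorum to a hypothetical map named "orders":

import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;
import com.hazelcast.config.QuorumConfig;
import com.hazelcast.core.Hazelcast;

public class QuorumExample {
    public static void main(String[] args) {
        Config config = new Config();

        QuorumConfig quorumConfig = new QuorumConfig();
        quorumConfig.setName("quorumOf5");
        quorumConfig.setEnabled(true);
        quorumConfig.setSize(5);
        config.addQuorumConfig(quorumConfig);

        // Hypothetical map protected by the quorum: operations on it fail with a
        // QuorumException when fewer than five members are present.
        MapConfig mapConfig = new MapConfig("orders");
        mapConfig.setQuorumName("quorumOf5");
        config.addMapConfig(mapConfig);

        Hazelcast.newHazelcastInstance(config);
    }
}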

Further Reading:

Split Brain Resolution

Once the network is repaired, the multiple clusters must be merged back together into a single cluster. This normally happens by default and the multiple sub-clusters created by the split-brain merge again to re-form the original cluster. This is how Hazelcast IMDG resolves the split-brain condition:

  1. Checks whether the sub-clusters are suitable to merge:
    - Sub-clusters should have compatible configurations: the same group name and password, the same partition count, the same joiner types, etc.
    - The sub-clusters’ membership intersection should be empty; they should not have common members. If they do have common members, there is a partial split, and the sub-clusters postpone the merge process until the membership conflicts are resolved.
    - The cluster states of the sub-clusters should be ACTIVE.
  2. Performs an election to determine the winning cluster; the losing side merges into the winning cluster:
    - The bigger sub-cluster, in terms of member count, is chosen as the winner, and the smaller one merges into it.
    - If the sub-clusters have an equal number of members, a pure function with the two sub-clusters given as input is executed on both sides to pick the winner. Since this function produces the same output for the same inputs, the winner can be consistently determined by both sides.
  3. After the election, data from the losing cluster is discarded, except for Map and Cache, where a merge policy can be configured (a configuration sketch follows this list). If none is configured, the data from the smaller cluster is discarded.
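A hedged sketch of configuring a merge policy on a hypothetical map named "orders"; PutIfAbsentMapMergePolicy is one of the built-in policies:

import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;
import com.hazelcast.core.Hazelcast;

public class MergePolicyExample {
    public static void main(String[] args) {
        Config config = new Config();
        MapConfig mapConfig = new MapConfig("orders");
        // Built-in policy: an entry from the losing cluster is kept only if the key
        // is absent on the winning cluster.
        mapConfig.setMergePolicy("com.hazelcast.map.merge.PutIfAbsentMapMergePolicy");
        config.addMapConfig(mapConfig);
        Hazelcast.newHazelcastInstance(config);
    }
}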

Further Reading:

License Management

If you have a license for Hazelcast IMDG Enterprise, you will receive a unique license key from Hazelcast Support that will enable the Hazelcast IMDG Enterprise capabilities. Ensure the license key file is available on the filesystem of each member and configure the path to it using either declarative, programmatic, or Spring configuration, or set the following system property:

-Dhazelcast.enterprise.license.key=/path/to/license/key
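If you prefer programmatic configuration, a hedged sketch of supplying the license key on the Config object (the key string is a placeholder):

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;

public class LicensedMemberExample {
    public static void main(String[] args) {
        Config config = new Config();
        config.setLicenseKey("PASTE-YOUR-ENTERPRISE-LICENSE-KEY-HERE");
        Hazelcast.newHazelcastInstance(config);
    }
}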

How to Upgrade or Renew Your License

If you wish to upgrade your license or renew your existing license before it expires, contact Hazelcast Support to receive a new license. To install the new license, replace the license key on each member host and restart each node, one node at a time, similar to the process described in the “Live Updates to Cluster Member Nodes” section above.

Important: If your license expires in a running cluster or Management Center, do not restart any of the cluster members or the Management Center JVM. Hazelcast IMDG will not start with an expired or invalid license. Reach out to Hazelcast Support to resolve any issues with an expired license.

Further Reading:

How to Report Issues to Hazelcast

Hazelcast Support Subscribers

Offered by the creators of Hazelcast, a support subscription is the best way to leverage the power of Hazelcast IMDG. With timely responses, critical patches, and technical support, subscription customers get the help they need to achieve a higher level of productivity and quality.

Learn more about Hazelcast support subscriptions:
http://hazelcast.com/services/support/

If you are a current Hazelcast support subscriber with an issue, you can contact Hazelcast either through the ticketing system Zendesk:
https://hazelcast.zendesk.com/

Or by direct email to:
support@hazelcast.com

Support subscriptions specify the number of registered support contacts, and those users may log into Zendesk and submit a ticket.

When submitting a ticket with Hazelcast, provide as much information and data as possible:

  1. Detailed description of incident – what happened and when
  2. Details of use case
  3. Hazelcast logs
  4. Thread dumps from all server nodes
  5. Heap dumps
  6. Networking logs
  7. Time of incident
  8. Reproducible test case (optional: Hazelcast engineering may ask for it if required)

Support SLA

Based on your support contract with Hazelcast, you are entitled to pre-defined SLAs for your support issues. However, each issue has a severity level so that it receives the proper attention; therefore, setting an appropriate severity level is very important. You should have received details on the support process and SLAs in a “Welcome to Hazelcast Support” email.

When you register an issue with Hazelcast support, it is also important to provide detailed information as requested by Hazelcast support engineers and to be prompt in your communication with Hazelcast support. This ensures timely resolution of issues.

Hazelcast Open Source Users

Hazelcast has an active open source community of developers and users. If you are a Hazelcast open source user, you will find a wealth of information and a forum for discussing issues with Hazelcast developers and other users at the Hazelcast Google Group and on Stack Overflow:
https://groups.google.com/forum/#!forum/hazelcast
http://stackoverflow.com/questions/tagged/hazelcast

You may also file and review issues on the Hazelcast issue tracker on GitHub:
https://github.com/hazelcast/hazelcast/issues

To see all of the resources available to the Hazelcast community, please visit the community page on Hazelcast.org:
http://hazelcast.org/get-involved/

Hazelcast, Inc
350 Cambridge Ave, Suite 100, Palo Alto, CA 94306 USA

Email: sales@hazelcast.com
Phone: +1 (650) 521-5453

Visit us at www.hazelcast.com