Distributed Java Application Development – Part 6

In this article we will explore some more capabilities which are required to
build Application-Level Server-cluster-aware applications.
While developing standalone java applications we use various java built-in data
structures like Map,Queue,List,Set etc. and built-in concurrency constructs like
Synchronized, Lock, Semaphore, CountDownlatch, ExecutorService etc. These data 
structures/constructs made java development easy for complex applications.
Distributed Data Structures
We should be able to use above data structures/constructs in clustered environment
also. For example, we can take BlockingDeque/HashMap and add something to it on one
server and poll/get it from another server. Or have a distributed ID generator which
would guarantee unique ID across all servers. 
Distributed Locks/Synchronization 
Distributed synchronization allows clustered Java applications to maintain consistency
by serializing access to shared data. Multiple servers that modify shared resources
concurrently may cause interference and data inconsistency. Distributed locks provide
safety of data access, application liveness and simplicity of programming.
Distributed locks ensure safe access to the shared data. At most one thread on
one server may enter the section of code protected by a distributed lock. 
Distributed counter is counter that attempts atomic increments
Distributed ID Generator
In distributed system development, a common requirement is to generate unique ids across
the cluster. Distributed counter/AtomicInteger can be used to generate unique ids.
Distributed ExecutorService
We are familiar with standard Java ExecutorService interface. It is used for asynchronous
execution of tasks.
Distributed ExecutorService is a distributed implementation of ExecutorService,
which will allow us to execute tasks in parallel in a cluster made of many 
servers. By distributing your tasks/jobs within the cluster, you automatically get them
load-balanced across all nodes. Moreover, your computation becomes fault-tolerant and
is guaranteed to execute as long as there is at least one node left. 
Distributed Job Scheduling
On some projects, we may need to execute certain jobs and tasks at an exactly specified
time or at regular time intervals. Developers typically use some Job Scheduler to execute
scheduled tasks. On distributed-systems, we may need distributed task scheduling felicity.
Quartz, Obsidian Java Schedulers have the clustering felicity which brings both high
availability and scalability to your scheduler via fail-over and load balancing
Some of the open-source in-memory data-management tools which can be used to implement above capabilities are
Hazelcast – http://hazelcast.com/
Hazelcast is a clustering and highly scalable data distribution platform for Java. Hazelcast
helps architects and developers to easily design and develop faster, highly scalable and
reliable applications for their businesses.
Distributed implementations of java.util.{Queue, Set, List, Map}
Distributed implementation of java.util.concurrent.ExecutorService
Distributed implementation of java.util.concurrency.locks.Lock
Distributed Topic for publish/subscribe messaging
Transaction support and J2EE container integration via JCA
Distributed listeners and events
Support for cluster info and membership events
Dynamic HTTP session clustering
Dynamic clustering
Dynamic scaling to hundreds of servers
Dynamic partitioning with backups
Dynamic fail-over
Grid Gain – http://www.gridgain.com/
GridGain is Java-based middleware for in-memory processing of big data in a distributed
environment. Developers all over the world are using GridGain to create auto-elastic grids
across any number of machines which then power high performance, data-intensive real time
applications. GridGain typically resides between business, analytics or BI applications
and long term data storage such as RDBMS, ERP or Hadoop HDFS, and provides in-memory data
platform for high performance, low latency data processing and computations.
With GridGain you can process terabytes of data, on 1000s of nodes in under a second – all
the while enjoying in-memory speed and database reliability.
The two main technologies behind GridGain are:
In-Memory Compute Grid
In-Memory Data Grid
The key features of the GridGain In-Memory Compute Grid are:
Direct API for split and aggregation
Pluggable failover, topology and collision resolution
Distributed task session
Distributed continuations & recursive split
Support for Streaming MapReduce
Support for Complex Event Processing (CEP)
Node-local cache
AOP-based, OOP/FP-based, synch/asynch execution modes
Support for direct closure distribution in Java, Scala and Groovy
Cron-based scheduling
Direct redundant mapping support
Zero deployment with P2P class loading
Partial asynchronous reduction
Direct support for weighted and adaptive mapping
State checkpoints for long running tasks
Early and late load balancing
Affinity routing with data grid
Cacheonix is an open source clustered cache and distributed data management framework for
Java that allows developers to scale Java applications in a cluster while preserving the
simplicity of design and coding in a single Java VM. Download Cacheonix binaries and code here.
Key Cacheonix features
Reliable distributed Java cache
Replication for high availability
Cache API with generics
Integration with ORM frameworks
Data partitioning for load balancing
Support for non-multicast networks
High performance computing
Fast local Java cache
Distributed locks
Posted in Application Clustering, Distributed Systems, java, Uncategorized | Tagged , , | Leave a comment

Distributed Java Application Development – Part 5

In this article we will explore Server Group membership/Group communication/Service Registry and Discovery capabilities which are required to build Application-Level Server-cluster-aware applications.

Group membership and Group communication

In distributed systems, application must be able to establish a dynamic process/server
group and track the status of all of the servers/processes in that group. Server group members should be able to communicate each other in the group. They should be able to exchange of data and commands across the server group.   Fig. A shows a example sample server group called “Server Cluster” with three members.

Server Group

Fig A. Server Group

Leader process/server And Task coordination

In distributed computing, leader election is the process of designating a single process as the organizer of some task distributed among several processes. This single process/server can be called as leader process.  After a leader election algorithm has been run, each server throughout the server group recognizes a particular, unique server as the task leader. This election process should be dynamic so that; if a leader server crashes, a new leader can be elected to continue processing application tasks.

This leader process can be used for controlling a task (or) distributed tasks.

Fig B. shows a sample cluster setup, in which component 2 selected as leader process and controls task distribution to other servers.

Leader Server

Fig. B Server Group with elected leader

Distributed Service Registry and Discovery

In SOA/distributed systems, services need to find each other. i.e. a web service might need to find a caching service, etc.. Clients may have to locate service which may be running at multiple servers.  Service Registry and Discovery is the mechanism by which severs register their services and clients find the required services.

Service Registry  provides a mechanism for Services to register their availability
Service Discovery system provides a mechanism for locating a single instance of a particular service. It also notifies when the instances of a service change (service deletion/addition/update).

Fig. C,D shows Service Discovery and Registry implementation using Apache ZooKeeper. In this all services register their availability with ZooKeeper and Clients (Web Server, API Server) can locate services using ZooKeeper.

Fig C. Service Registry and Discovery using ZooKeeper (photo courtesy:  http://engineering.pinterest.com/)

Fig C. Service Registry and Discovery using ZooKeeper (photo courtesy: http://engineering.pinterest.com/)

Fig. D Server Group with  ZooKeeper

Fig. D Server Group with ZooKeeper

Some of the open-source java based tools which can be used to implement above capabilities are

Apache ZooKeeper
Apache Curator
Apache Helix



Posted in Application Clustering, Distributed Systems, java | Tagged | Leave a comment

Scala Collections usage

Scala Collections Usage

Image | Posted on by | Tagged | Leave a comment

Distributed Java Application Development – Part 4

In this article we will explore the distributed caching capability which is required to build Application-Level Server-cluster-aware applications.

Clustered Server

Fig A. Clustered Application

Our sample clustered application  above [Fig. A] likely to encounter major scalability issues as we try to scale and put more load on application. Scalability bottlenecks mostly occurs in database which are used as data store. Normally databases do not scale very well.

Caching is a well-known concept used software worlds to eliminate the database scalability
bottlenecks. Traditionally, caching was a stand-alone mechanism, but that has been evolved
to distributed caching for scalabilty and availability reasons. Distributed caching is a form of caching that allows the cache to span multiple servers so that it can grow in size. Distributed caching comes in various flavors like Replicated Caching , Cache Grid etc.. [Fig. B, C]

Cache Evolution

Fig  B. Cache Evolution (Source: planet.jboss)

Distributed Caching (Source: planet.jboss)

Fig. C Distributed Caching (Source: planet.jboss)


More details about Distributed Caching can be found at

Fig. D shows our sample clustered application with distributed cache. This caching can be used for variety of application specific caches like data, state, configuration caches etc.. These caches are available to all the servers available in the application cluster.  A good  distributed caching solution will help us to build  a Application-Level Server-cluster-aware application.

Application Clustering with  Caching

Fig D. Application Clustering with  Distributed Cache

Some of the available caching tools are


Ehcache is one of the leading java based open-source cache solution.
Ehcache supports basic replication caching.

Terracotta provides leading in-memory data management and Big Data solutions for the enterprise, including BigMemory, Universal Messaging, and more.Terracotta offers commercial distributed cache solutions under the brand name of BigMemoryGO and BigMemoryMax.

Infinipan/JBOSS Data Grid:

Infinispan is an extremely scalable, highly available key/value data store
and data grid platform. It is 100% open source, and written in Java.
It is often used as a distributed cache, but also as a NoSQL key/value store or object database. JBOSS Data Grid is a licenced version with suppoert from Redhat.

Hazelcast is an in-memory Open Source data grid based on Java.

NoSQL Databases:

Many NoSQL database technologies have excellent integrated caching capabilities,
keeping frequently-used data in system memory as much as possible and removing the need for a separate caching layer that must be maintained.

Grid Gain

Next article we will explore  the remaining capabilities/supports required to build Application-Level Server-cluster-aware applications.

Posted in Application Clustering, Distributed Systems, java | Tagged , , | Leave a comment

Distributed Java Application Development – Part 3

This is the third article in the series of articles exploring distributed java
application development. In this article we will explore the capabilities/support
required to build Aplication-Level Server-cluster-aware applications.

We will explore these capabilities with the help of a simple example. In this
example we will take sample usecase of converting standalone application to
clustered application.

Fig A. shows a sample stand-alone application. This sample application process
the real-time data/events/messages supplied by a external system.


Standalone Server

Fig A. Standalone Server


Now for scalability and availability reasons we want to convert this application
into clustered application (Fig B).


Clustered Server

Fig B. Clustered Server


While converting stand-alone application to clustered application, we must
design applications so that they can run as multiple instances on different/same
physical servers. This can be done by splitting the application into smaller
components that can be deployed independently.

Stateless Application Components:

This conversion is easy, if the application components are stateless. Stateless
application components are independent of others and can run independently.
These stateless application components can handle/process the requests/data
equally. We just need a load-balancing mechaninsm to redirect the requests/data
to avilable application servers. If we need to handle more data, then we need
to add more servers and install the application components.

Stateful Application Components:

But most of the practical clustered applications components are stateful.
For instance, the application components may share a common data/configuaration
based on which processing happens. The application components may share a common
cache for fatser processing.

The following the capabilities/supports required to build stateful application

* Distributed State Sharing / Caching
* Distributed Service Registry and Discovery
* Group communication and membership with state maintenance and querying capability:
* Dynamic leader server election and Task co-ordination
* Distributed Locks/Synchronization
* Distributed Data Structures , ID Generator
* Distributed Execution
* Distributed Messaging System
* Distributed Scheduling

Next article we will explore  the above capabilities/supports required to build Application-Level Server-cluster-aware applications.

Posted in Application Clustering, Distributed Systems | Tagged , | Leave a comment

Distributed Java Application Development – Part 2

This is the second article in the series of articles exploring distributed java
application development. In this article we will continue to discuss about
distributed applications.

As discuses in the previous article we need a distributed/clustered system for
the following non-functionality reasons.

High availability
Fault tolerance
Load balancing

Application Server Clustering vs Application(-Level) Clustering

Java 2, Enterprise Edition (J2EE) uses application-server clustering to deliver
mission-critical applications over the web. Within the J2EE framework, clusters
provide mission-critical services to ensure minimal downtime and maximum scalability.
A application server cluster is a group of application servers that transparently
run your J2EE application as if it were a single entity. The clustering support
is available for services/requests like JNDI, EJB, JSP, HttpSession replication
and component fail-over, load-balancing etc..



Most of the Java enterprise servers (JBoss, Resin, WebLogic etc) have built-in support for clustering.

More details about Aplication-Server Clustering available at

Application-Server Clustering alone is not sufficient to build full fledged distributed
application. To build a full fledged distributed application, application should be
aware of other available servers in the cluster. This Application-Level Cluster awareness is required to handle various custom use cases like state sharing , group communication and task co-ordination and distribution, etc..

Next article we will explore capabilities/support required to build Application-Level Server-cluster-aware applications.

Posted in Distributed Systems, java | Tagged , , | Leave a comment

Distributed Java Application Development – Part 1

This is the first article in the series of articles exploring distributed java application development.

In this article we will discuss some basic concepts.

Basic Concepts/Definitions:

Software requirements are divided in to two types, Functional Requirements and
Non-Functional Requirements

Functional Requirements (FRs) defines the specific behavior (or) functionality
of the system. They tell exact usage of the system being designed. We will capture these
requirements in Software Requirement Specification (SRS). We define them as
“System shall do *****” etc.. The plan for implementing FRs is detailed in
the “System Design/Design Document”.

Non-Functional Requirements (NFRs) is a requirement that specifies criteria
that can be used to judge the operation of the system. These are also called
“Qualities” of the system. We define them as “System shall be 99.9% Available”
etc.. The plan for implementing non-functional requirements is detailed in the
system architecture.

The important NFRs are

1. Scalability
2. High-Availability
3. Load-balancing
4. Fault Tolerant/Fail-Over

Scalability :  Ability to handle additional load by adding more computational resources (CPU , RAM, Disk, Network etc..). Scalability can be Vertical scalability and Horizontal scalability.

Vertical scalability (Scaling Up): is handling additional load by adding more power to a single machine. i.e By adding a faster CPU, add more RAM or using a faster Solid-State Disk (SSD).

Horizontal scalability (Scaling Out): is handling additional load by adding more servers.
Horizontal scalability is much harder to achieve as adding servers requires
ensuring data consistency and process synchronization.

The distribution of processing load among the group of servers is known as server load balancing.


A server failure can be because of many reasons; system failures, planned outage, hardware or network problems etc.

High availability: is redundancy in the system. If one server fails, the others take
over the failed server’s load transparently.   The failure of an individual server is invisible to the client. High availability can ensured by not having any  single points  of failure.  Our traditional single-server solution can be good for scalability  (add more memory and CPU), but not for high availability as it has single point of failure.

Fault tolerant service: always guarantees strictly correct behavior despite a certain
number of faults. Failover is another key technology behind clustering to achieve fault
tolerance. By choosing another node in the cluster, the process will continue when the
original node fails.

We traditionally start with single server architecture for our applications.
But sooner or later application will have to process more requests/data than a
single server can handle. Scaling up (Vertical Scalability) (Fig A) can be used.

Horizantal Scaling

Fig A. Vertical  Scaling ( Image shows we need to add more hardware resources)

Scaling up can be a short-term solution. And it’s a limited approach because the cost of
upgrading is disproportionately high relative to the gains in server capability.
For these reasons most successful Internet companies/enterprise applications
follow a scale out (Horizontal Scalability) (Fig C) approach.

We need to horizontally scale our application to multiple servers based on
low-cost hardware and operating systems (cluster of servers).

Horizantal Scaling

Fig C.  Horizontal Scaling

A cluster is a group of application servers that transparently run your application as if it were a single entity. In order to implement  server clustering we need distributed applications/technologies.

There are different types clusterings at different tiers like DB Clustering, Hardware Clustering,  Application Server (AS) Clustering and Application-level Clustering.

Next article we will discuss more about Application-level Clustering




Posted in Distributed Systems, java | Tagged , , | Leave a comment