Distributed Java Application Development – Part 6

In this article we explore some further capabilities that are required to
build application-level, server-cluster-aware applications.
While developing standalone Java applications we use various built-in data
structures such as Map, Queue, List and Set, and built-in concurrency constructs such as
synchronized, Lock, Semaphore, CountDownLatch and ExecutorService. These data
structures and constructs make it easier to develop complex Java applications.
Distributed Data Structures
We should be able to use the above data structures and constructs in a clustered environment
as well. For example, we can add an element to a BlockingDeque or HashMap on one
server and poll or get it from another server, or use a distributed ID generator that
guarantees unique IDs across all servers.
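To make the hand-off idea concrete, here is a minimal single-JVM sketch in which two threads play the roles of two servers sharing one BlockingDeque. The class name SharedDequeSketch and the method handOff are hypothetical names chosen for illustration. In a real cluster we would assume a distributed implementation of the same interface in place of LinkedBlockingDeque; that substitution is not shown here.

```java
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

// Single-JVM stand-in: two threads act as two servers sharing one deque.
// Assumption: a clustered library would supply a distributed deque behind
// the same BlockingDeque-style interface.
class SharedDequeSketch {

    // "Server A" offers a message, "server B" polls it. Returns what B received.
    static String handOff(String message) {
        BlockingDeque<String> deque = new LinkedBlockingDeque<>();
        String[] received = new String[1];

        Thread serverA = new Thread(() -> deque.offerLast(message));
        Thread serverB = new Thread(() -> {
            try {
                // Wait up to five seconds for an element to arrive.
                received[0] = deque.pollFirst(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        serverB.start();
        serverA.start();
        try {
            // join() makes the write to received[0] visible to this thread.
            serverA.join();
            serverB.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received[0];
    }

    public static void main(String[] args) {
        System.out.println(handOff("order-42"));
    }
}
```

Because the code depends only on the BlockingDeque interface, swapping the local deque for a clustered one should not change the producer or consumer logic.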
Distributed Locks/Synchronization 
Distributed synchronization allows clustered Java applications to maintain consistency
by serializing access to shared data. Multiple servers that modify shared resources
concurrently may cause interference and data inconsistency. Distributed locks provide
safety of data access, application liveness and simplicity of programming.
Distributed locks ensure safe access to shared data: at any given time, at most one thread on
one server may execute the section of code protected by a distributed lock.
A distributed counter is a counter shared by the cluster that supports atomic increments.
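The lock-protected counter described above can be sketched as follows. This is a local illustration only: ReentrantLock stands in for a distributed lock, and the threads stand in for servers. The names LockedCounterSketch and runWorkers are hypothetical. The counter is written against the java.util.concurrent.locks.Lock interface, on the assumption that a clustered lock exposing that interface could be passed in instead.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// A counter whose increments are serialized by whichever Lock is supplied.
// Assumption: in a cluster, the Lock would be a distributed implementation.
class LockedCounterSketch {
    private final Lock lock;
    private long value;

    LockedCounterSketch(Lock lock) {
        this.lock = lock;
    }

    long incrementAndGet() {
        lock.lock();
        try {
            // Critical section: only one holder of the lock runs this at a time.
            return ++value;
        } finally {
            // Always release in finally so a failure cannot leave the lock held.
            lock.unlock();
        }
    }

    long get() {
        lock.lock();
        try {
            return value;
        } finally {
            lock.unlock();
        }
    }

    // Starts several workers that increment concurrently and returns the final value.
    static long runWorkers(int workers, int incrementsPerWorker) {
        LockedCounterSketch counter = new LockedCounterSketch(new ReentrantLock());
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerWorker; j++) {
                    counter.incrementAndGet();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4, 1000));
    }
}
```

Without the lock, concurrent `++value` operations could overwrite each other and lose updates; with it, every increment is counted.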
Distributed ID Generator
In distributed system development, a common requirement is to generate unique IDs across
the cluster. A distributed counter (a clustered counterpart of AtomicInteger) can be used to generate unique IDs.
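One possible way to build an ID generator on top of a shared counter is sketched below. Note that block reservation is our own illustrative choice, not something prescribed by this article or by any particular library. Each node reserves a block of IDs with a single atomic add and then hands them out locally. The local AtomicLong stands in for a cluster-wide atomic counter, and the names BlockIdGeneratorSketch and countUniqueIds are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Each generator instance represents one node. All nodes share one counter.
// Assumption: in a cluster, clusterCounter would be a distributed atomic counter.
class BlockIdGeneratorSketch {
    private final AtomicLong clusterCounter;
    private final int blockSize;
    private long next;      // next ID to hand out from the current block
    private long blockEnd;  // exclusive end of the current block

    BlockIdGeneratorSketch(AtomicLong clusterCounter, int blockSize) {
        this.clusterCounter = clusterCounter;
        this.blockSize = blockSize;
    }

    synchronized long nextId() {
        if (next == blockEnd) {
            // Block exhausted (or first call): reserve a fresh range atomically.
            next = clusterCounter.getAndAdd(blockSize);
            blockEnd = next + blockSize;
        }
        return next++;
    }

    // Simulates several nodes generating IDs concurrently and counts distinct IDs.
    static int countUniqueIds(int nodes, int idsPerNode) {
        AtomicLong clusterCounter = new AtomicLong();
        Set<Long> seen = ConcurrentHashMap.newKeySet();
        Thread[] threads = new Thread[nodes];
        for (int i = 0; i < nodes; i++) {
            BlockIdGeneratorSketch generator = new BlockIdGeneratorSketch(clusterCounter, 10);
            threads[i] = new Thread(() -> {
                for (int j = 0; j < idsPerNode; j++) {
                    seen.add(generator.nextId());
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(countUniqueIds(3, 25));
    }
}
```

The trade-off of this design is that IDs are unique but neither contiguous nor globally ordered, and IDs left unused in a block are lost when a node stops. In exchange, a node contacts the shared counter only once per block rather than once per ID.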
Distributed ExecutorService
We are familiar with the standard Java ExecutorService interface, which is used for asynchronous
execution of tasks.
A distributed ExecutorService is a distributed implementation of ExecutorService
that allows us to execute tasks in parallel in a cluster made up of many
servers. When tasks/jobs are distributed within the cluster, they are
load-balanced across the nodes. Moreover, the computation becomes fault-tolerant and
can still execute as long as at least one node remains.
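The programming model stays the same as with a local executor, as the sketch below illustrates. It uses a local fixed thread pool as a stand-in, so the worker threads play the role of cluster nodes. The names ExecutorSketch, SquareTask and sumOfSquares are hypothetical. The task is also marked Serializable, on the assumption that a clustered executor would need to ship tasks to other nodes. A local pool does not require this.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ExecutorSketch {

    // A self-contained task. Serializable is an assumption for cluster use only.
    static class SquareTask implements Callable<Integer>, Serializable {
        private static final long serialVersionUID = 1L;
        private final int input;

        SquareTask(int input) {
            this.input = input;
        }

        @Override
        public Integer call() {
            return input * input;
        }
    }

    // Submits one task per number in 1..n and adds up the results.
    static int sumOfSquares(int n) {
        // Local stand-in for a distributed ExecutorService.
        ExecutorService executor = Executors.newFixedThreadPool(3);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                futures.add(executor.submit(new SquareTask(i)));
            }
            int sum = 0;
            for (Future<Integer> future : futures) {
                sum += future.get(); // blocks until that task has finished
            }
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(4));
    }
}
```

Keeping each task free of shared mutable state, as SquareTask is, is what makes it safe to run on any worker, local or remote.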
Distributed Job Scheduling
On some projects, we may need to execute certain jobs and tasks at an exactly specified
time or at regular time intervals. Developers typically use a job scheduler to execute
scheduled tasks. On distributed systems, we may need a distributed task scheduling facility.
The Quartz and Obsidian Java schedulers have a clustering facility that brings both high
availability and scalability to the scheduler via fail-over and load balancing.
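As a point of reference, the sketch below shows fixed-interval scheduling on a single JVM with the standard ScheduledExecutorService. It does not use Quartz or Obsidian and provides no clustering, fail-over or load balancing, which is exactly what a clustered scheduler would add on top. The names SchedulingSketch and runAtFixedRate are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Single-JVM scheduling only. If this JVM stops, the job stops with it.
class SchedulingSketch {

    // Runs a job every periodMillis until it has executed at least `runs` times,
    // then cancels it and returns the number of executions observed.
    static int runAtFixedRate(int runs, long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger executions = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(runs);

        ScheduledFuture<?> handle = scheduler.scheduleAtFixedRate(() -> {
            executions.incrementAndGet(); // the "job" body
            done.countDown();
        }, 0, periodMillis, TimeUnit.MILLISECONDS);

        try {
            // Wait for the requested number of runs, with a safety timeout.
            done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            handle.cancel(false); // stop future runs without interrupting a running one
            scheduler.shutdown();
        }
        return executions.get();
    }

    public static void main(String[] args) {
        System.out.println("executions: " + runAtFixedRate(3, 50));
    }
}
```

The returned count can be slightly higher than the requested number of runs, because another execution may start before the cancellation takes effect.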
Some of the open-source in-memory data-management tools that can be used to implement the above capabilities are:
Hazelcast – http://hazelcast.com/
Hazelcast is a clustering and highly scalable data distribution platform for Java. Hazelcast
helps architects and developers to easily design and develop faster, highly scalable and
reliable applications for their businesses.
Distributed implementations of java.util.{Queue, Set, List, Map}
Distributed implementation of java.util.concurrent.ExecutorService
Distributed implementation of java.util.concurrent.locks.Lock
Distributed Topic for publish/subscribe messaging
Transaction support and J2EE container integration via JCA
Distributed listeners and events
Support for cluster info and membership events
Dynamic HTTP session clustering
Dynamic clustering
Dynamic scaling to hundreds of servers
Dynamic partitioning with backups
Dynamic fail-over
GridGain – http://www.gridgain.com/
GridGain is Java-based middleware for in-memory processing of big data in a distributed
environment. Developers all over the world are using GridGain to create auto-elastic grids
across any number of machines which then power high performance, data-intensive real time
applications. GridGain typically resides between business, analytics or BI applications
and long-term data storage such as RDBMS, ERP or Hadoop HDFS, and provides an in-memory data
platform for high-performance, low-latency data processing and computations.
With GridGain you can process terabytes of data, on 1000s of nodes in under a second – all
the while enjoying in-memory speed and database reliability.
The two main technologies behind GridGain are:
In-Memory Compute Grid
In-Memory Data Grid
The key features of the GridGain In-Memory Compute Grid are:
Direct API for split and aggregation
Pluggable failover, topology and collision resolution
Distributed task session
Distributed continuations & recursive split
Support for Streaming MapReduce
Support for Complex Event Processing (CEP)
Node-local cache
AOP-based, OOP/FP-based, synch/asynch execution modes
Support for direct closure distribution in Java, Scala and Groovy
Cron-based scheduling
Direct redundant mapping support
Zero deployment with P2P class loading
Partial asynchronous reduction
Direct support for weighted and adaptive mapping
State checkpoints for long running tasks
Early and late load balancing
Affinity routing with data grid
Cacheonix
Cacheonix is an open source clustered cache and distributed data management framework for
Java that allows developers to scale Java applications in a cluster while preserving the
simplicity of design and coding in a single Java VM.
Key Cacheonix features
Reliable distributed Java cache
Replication for high availability
Cache API with generics
Integration with ORM frameworks
Data partitioning for load balancing
Support for non-multicast networks
High performance computing
Fast local Java cache
Distributed locks