Distributed Java Application Development – Part 6

In this article we will explore some more capabilities which are required to
build application-level, server-cluster-aware applications.
While developing standalone Java applications we use various built-in data
structures such as Map, Queue, List and Set, and built-in concurrency constructs such as
synchronized, Lock, Semaphore, CountDownLatch and ExecutorService. These data
structures and constructs make Java development easier for complex applications.
Distributed Data Structures
We should be able to use the above data structures and constructs in a clustered environment
as well. For example, we can take a BlockingDeque or HashMap, add something to it on one
server and poll/get it from another server. Or we can have a distributed ID generator which
guarantees unique IDs across all servers.
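As a single-JVM baseline, the sketch below shows the add-then-poll hand-off with a plain java.util.concurrent.BlockingDeque. It is only an illustration: the class name QueueHandOff is hypothetical, and the two threads merely stand in for two servers. A distributed queue aims to keep this same programming model while the offer and the poll happen on different machines.

```java
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

class QueueHandOff {
    // Offers a message from one thread ("server A") and polls it from the caller ("server B").
    static String handOff(String message) {
        BlockingDeque<String> queue = new LinkedBlockingDeque<>();
        Thread producer = new Thread(() -> queue.offer(message));
        producer.start();
        try {
            // Wait up to one second for the message to arrive.
            String received = queue.poll(1, TimeUnit.SECONDS);
            producer.join();
            return received;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(handOff("order-42"));
    }
}
```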
Distributed Locks/Synchronization 
Distributed synchronization allows clustered Java applications to maintain consistency
by serializing access to shared data. Multiple servers that modify shared resources
concurrently may cause interference and data inconsistency. Distributed locks provide
safety of data access, application liveness and simplicity of programming.
Distributed locks ensure safe access to the shared data. At most one thread on
one server may enter the section of code protected by a distributed lock. 
A distributed counter is a counter that supports atomic increments across the cluster.
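The lock-protected critical section can be sketched with the standard java.util.concurrent.locks.Lock interface. This is a single-JVM sketch using ReentrantLock; the class name LockedCounter is hypothetical. Because the counter only depends on the Lock interface, a cluster-wide Lock implementation could in principle be passed in instead.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class LockedCounter {
    private final Lock lock;   // assumption: could be a cluster-wide Lock implementation
    private long value;

    LockedCounter(Lock lock) { this.lock = lock; }

    long increment() {
        lock.lock();
        try {
            return ++value;    // critical section: one thread at a time
        } finally {
            lock.unlock();     // always release, even if the body throws
        }
    }

    // Runs `threads` workers that each increment `perThread` times; returns the final value.
    static long run(int threads, int perThread) {
        LockedCounter counter = new LockedCounter(new ReentrantLock());
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) counter.increment();
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return counter.value;
    }
}
```

Without the lock, concurrent `++value` calls could lose updates; with it, `LockedCounter.run(4, 1000)` ends at exactly 4000.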
Distributed ID Generator
In distributed system development, a common requirement is to generate unique ids across
the cluster. Distributed counter/AtomicInteger can be used to generate unique ids.
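A shared distributed counter needs a network round trip per ID. As a different, coordination-free technique (not the counter approach described above), each server can combine its own node number with a local AtomicLong. The sketch below assumes every server is configured with a distinct nodeId; the class name and the bit layout are illustrative choices.

```java
import java.util.concurrent.atomic.AtomicLong;

class NodeIdGenerator {
    private static final int SEQUENCE_BITS = 48;   // low 48 bits: per-node sequence
    private final long nodeId;                     // high 15 bits: node number
    private final AtomicLong sequence = new AtomicLong();

    NodeIdGenerator(int nodeId) {
        if (nodeId < 0 || nodeId > 0x7FFF) {
            throw new IllegalArgumentException("nodeId must fit in 15 bits");
        }
        this.nodeId = nodeId;
    }

    // Unique across nodes as long as nodeIds differ and the sequence stays below 2^48.
    long nextId() {
        long seq = sequence.incrementAndGet();
        return (nodeId << SEQUENCE_BITS) | seq;
    }
}
```

IDs from one node increase monotonically, and IDs from different nodes never collide because their high bits differ.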
Distributed ExecutorService
We are familiar with standard Java ExecutorService interface. It is used for asynchronous
execution of tasks.
Distributed ExecutorService is a distributed implementation of ExecutorService,
which allows us to execute tasks in parallel in a cluster made of many
servers. By distributing tasks/jobs within the cluster, the implementation can
load-balance them across the nodes. Depending on the implementation, the computation can
also be made fault-tolerant, so that it keeps executing as long as at least one node is left.
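For reference, this is the standard single-JVM ExecutorService usage that a distributed implementation mirrors. The sketch submits independent Callable tasks and collects their results; the class name ParallelSum and the task itself are made up for illustration. In a cluster, the tasks would additionally need to be serializable so they can be shipped to other nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSum {
    // Computes 1^2 + 2^2 + ... + n^2 by running one small task per number.
    static long sumOfSquares(int n, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final long x = i;
                tasks.add(() -> x * x);
            }
            long total = 0;
            // invokeAll blocks until every task has completed.
            for (Future<Long> f : pool.invokeAll(tasks)) total += f.get();
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```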
Distributed Job Scheduling
On some projects, we may need to execute certain jobs and tasks at an exactly specified
time or at regular time intervals. Developers typically use a job scheduler to execute
scheduled tasks. On distributed systems, we may need a distributed task scheduling facility.
Quartz and Obsidian are Java schedulers with a clustering facility, which brings both high
availability and scalability to the scheduler via fail-over and load balancing.
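The "regular time intervals" case can be sketched on a single JVM with the JDK's ScheduledExecutorService. This is not the Quartz or Obsidian API, and it has no clustering or fail-over; the class name FixedRateJob is hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class FixedRateJob {
    // Runs a job every `periodMillis` and waits until it has fired `runs` times.
    // Returns how often it ran (at least `runs`, unless the 5 second wait timed out).
    static int runRepeatedly(int runs, long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger count = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(runs);
        try {
            scheduler.scheduleAtFixedRate(() -> {
                count.incrementAndGet();
                done.countDown();
            }, 0, periodMillis, TimeUnit.MILLISECONDS);
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();   // stop the periodic job
        }
        return count.get();
    }
}
```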
Some of the open-source in-memory data-management tools which can be used to implement the above capabilities are:
Hazelcast – http://hazelcast.com/
Hazelcast is a clustering and highly scalable data distribution platform for Java. Hazelcast
helps architects and developers to easily design and develop faster, highly scalable and
reliable applications for their businesses.
Distributed implementations of java.util.{Queue, Set, List, Map}
Distributed implementation of java.util.concurrent.ExecutorService
Distributed implementation of java.util.concurrent.locks.Lock
Distributed Topic for publish/subscribe messaging
Transaction support and J2EE container integration via JCA
Distributed listeners and events
Support for cluster info and membership events
Dynamic HTTP session clustering
Dynamic clustering
Dynamic scaling to hundreds of servers
Dynamic partitioning with backups
Dynamic fail-over
GridGain – http://www.gridgain.com/
GridGain is Java-based middleware for in-memory processing of big data in a distributed
environment. Developers all over the world are using GridGain to create auto-elastic grids
across any number of machines which then power high performance, data-intensive real time
applications. GridGain typically resides between business, analytics or BI applications
and long term data storage such as RDBMS, ERP or Hadoop HDFS, and provides in-memory data
platform for high performance, low latency data processing and computations.
According to GridGain, you can process terabytes of data on thousands of nodes in under a
second, all the while enjoying in-memory speed and database reliability.
The two main technologies behind GridGain are:
In-Memory Compute Grid
In-Memory Data Grid
The key features of the GridGain In-Memory Compute Grid are:
Direct API for split and aggregation
Pluggable failover, topology and collision resolution
Distributed task session
Distributed continuations & recursive split
Support for Streaming MapReduce
Support for Complex Event Processing (CEP)
Node-local cache
AOP-based, OOP/FP-based, synch/asynch execution modes
Support for direct closure distribution in Java, Scala and Groovy
Cron-based scheduling
Direct redundant mapping support
Zero deployment with P2P class loading
Partial asynchronous reduction
Direct support for weighted and adaptive mapping
State checkpoints for long running tasks
Early and late load balancing
Affinity routing with data grid
Cacheonix is an open source clustered cache and distributed data management framework for
Java that allows developers to scale Java applications in a cluster while preserving the
simplicity of design and coding in a single Java VM.
Key Cacheonix features
Reliable distributed Java cache
Replication for high availability
Cache API with generics
Integration with ORM frameworks
Data partitioning for load balancing
Support for non-multicast networks
High performance computing
Fast local Java cache
Distributed locks
Posted in Application Clustering, Distributed Systems, java, Uncategorized | Tagged , , | Leave a comment

Distributed Java Application Development – Part 5

In this article we will explore Server Group membership/Group communication/Service Registry and Discovery capabilities which are required to build Application-Level Server-cluster-aware applications.

Group membership and Group communication

In distributed systems, an application must be able to establish a dynamic process/server
group and track the status of all of the servers/processes in that group. Server group members should be able to communicate with each other and exchange data and commands across the group. Fig. A shows a sample server group called “Server Cluster” with three members.

Fig A. Server Group

Leader process/server And Task coordination

In distributed computing, leader election is the process of designating a single process as the organizer of some task distributed among several processes. This single process/server is called the leader process. After a leader election algorithm has been run, each server in the server group recognizes one particular, unique server as the task leader. The election process should be dynamic, so that if the leader server crashes, a new leader can be elected to continue processing application tasks.

This leader process can be used for controlling a task (or) distributed tasks.
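The dynamic re-election idea can be sketched in memory as follows. The rule used here (the live member with the smallest id is the leader) is an assumption chosen for simplicity, and the class and method names are hypothetical. A real cluster would get the membership changes from a coordination service rather than from direct method calls.

```java
import java.util.Optional;
import java.util.TreeSet;

class LeaderElection {
    private final TreeSet<Integer> liveMembers = new TreeSet<>();

    synchronized void join(int serverId)  { liveMembers.add(serverId); }

    // Called when a server crashes or leaves the group.
    synchronized void leave(int serverId) { liveMembers.remove(serverId); }

    // Assumed rule: the live member with the smallest id is the leader.
    synchronized Optional<Integer> leader() {
        return liveMembers.isEmpty() ? Optional.empty() : Optional.of(liveMembers.first());
    }
}
```

When the current leader leaves, the next call to `leader()` returns the next smallest id, so every member that evaluates the same membership view agrees on the new leader.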

Fig. B shows a sample cluster setup in which component 2 is selected as the leader process and controls task distribution to the other servers.

Fig. B Server Group with elected leader

Distributed Service Registry and Discovery

In SOA/distributed systems, services need to find each other; for example, a web service might need to find a caching service. Clients may have to locate a service which is running on multiple servers. Service Registry and Discovery is the mechanism by which servers register their services and clients find the services they need.

Service Registry provides a mechanism for services to register their availability.
Service Discovery provides a mechanism for locating a single instance of a particular service. It also notifies clients when the instances of a service change (service deletion/addition/update).
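The register/discover contract can be sketched as a small in-memory registry. This is only an illustration with hypothetical names; it leaves out change notifications and any replication, which a real registry would provide.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

class ServiceRegistry {
    // service name -> addresses of the instances currently registered
    private final Map<String, List<String>> instances = new ConcurrentHashMap<>();

    // Registry side: a server announces that it offers a service.
    void register(String service, String address) {
        instances.computeIfAbsent(service, k -> new CopyOnWriteArrayList<>()).add(address);
    }

    void unregister(String service, String address) {
        List<String> list = instances.get(service);
        if (list != null) list.remove(address);
    }

    // Discovery side: a client asks for a single instance of a service.
    Optional<String> discover(String service) {
        return instances.getOrDefault(service, List.of()).stream().findFirst();
    }
}
```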

Figs. C and D show a Service Registry and Discovery implementation using Apache ZooKeeper. In this setup, all services register their availability with ZooKeeper, and clients (Web Server, API Server) locate services using ZooKeeper.

Fig C. Service Registry and Discovery using ZooKeeper (photo courtesy: http://engineering.pinterest.com/)

Fig. D Server Group with ZooKeeper

Some of the open-source Java-based tools which can be used to implement the above capabilities are:

Apache ZooKeeper
Apache Curator
Apache Helix




Scala Collections usage


Distributed Java Application Development – Part 4

In this article we will explore the distributed caching capability which is required to build Application-Level Server-cluster-aware applications.

Fig A. Clustered Application

Our sample clustered application above (Fig. A) is likely to encounter major scalability issues as we scale it and put more load on it. Scalability bottlenecks mostly occur in the database used as the data store. Databases normally do not scale very well.

Caching is a well-known concept in the software world for eliminating database scalability
bottlenecks. Traditionally, caching was a stand-alone mechanism, but it has evolved
into distributed caching for scalability and availability reasons. Distributed caching is a form of caching that allows the cache to span multiple servers so that it can grow in size. Distributed caching comes in various flavors, such as replicated caching and cache grids (Fig. B, C).
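The basic idea of putting a cache in front of the database can be sketched on a single JVM as below. The loader function stands in for a database query, and all names are hypothetical. A distributed cache offers a similar get-or-load pattern, with the map spread or replicated across servers.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;                      // stands in for the database query
    private final AtomicInteger loads = new AtomicInteger();  // counts trips to the "database"

    ReadThroughCache(Function<K, V> loader) { this.loader = loader; }

    // Returns the cached value, loading it only on the first request for a key.
    V get(K key) {
        return cache.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    int loadCount() { return loads.get(); }
}
```

Repeated reads of the same key hit the loader once, which is exactly the load the cache takes off the database.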

Fig  B. Cache Evolution (Source: planet.jboss)

Fig. C Distributed Caching (Source: planet.jboss)


More details about Distributed Caching can be found at

Fig. D shows our sample clustered application with a distributed cache. This caching can be used for a variety of application-specific caches, such as data, state and configuration caches. These caches are available to all the servers in the application cluster. A good distributed caching solution helps us build an application-level, server-cluster-aware application.

Fig D. Application Clustering with  Distributed Cache

Some of the available caching tools are:


Ehcache is one of the leading Java-based open-source cache solutions.
Ehcache supports basic replicated caching.

Terracotta provides in-memory data management and Big Data solutions for the enterprise, including BigMemory, Universal Messaging, and more. Terracotta offers commercial distributed cache solutions under the brand names BigMemory Go and BigMemory Max.

Infinispan/JBoss Data Grid:

Infinispan is an extremely scalable, highly available key/value data store
and data grid platform. It is 100% open source, and written in Java.
It is often used as a distributed cache, but also as a NoSQL key/value store or object database. JBoss Data Grid is a licensed version with support from Red Hat.

Hazelcast is an in-memory Open Source data grid based on Java.

NoSQL Databases:

Many NoSQL database technologies have excellent integrated caching capabilities,
keeping frequently-used data in system memory as much as possible and removing the need for a separate caching layer that must be maintained.

GridGain

In the next article we will explore the remaining capabilities required to build application-level, server-cluster-aware applications.


Distributed Java Application Development – Part 3

This is the third article in the series of articles exploring distributed Java
application development. In this article we will explore the capabilities
required to build application-level, server-cluster-aware applications.

We will explore these capabilities with the help of a simple example. In this
example we will take the sample use case of converting a standalone application to
a clustered application.

Fig. A shows a sample stand-alone application. This sample application processes
the real-time data/events/messages supplied by an external system.


Fig A. Standalone Server


Now, for scalability and availability reasons, we want to convert this application
into a clustered application (Fig. B).


Fig B. Clustered Server


While converting a stand-alone application to a clustered application, we must
design the application so that it can run as multiple instances on the same or different
physical servers. This can be done by splitting the application into smaller
components that can be deployed independently.

Stateless Application Components:

This conversion is easy if the application components are stateless. Stateless
application components are independent of each other and can run independently.
Any instance of a stateless component can handle/process any request/data
equally well. We just need a load-balancing mechanism to redirect the requests/data
to the available application servers. If we need to handle more data, we
add more servers and install the application components on them.
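One simple load-balancing mechanism for stateless components is round-robin selection. The sketch below is a minimal in-process version with hypothetical names; a real deployment would more likely rely on a hardware or software load balancer in front of the servers.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class RoundRobinBalancer {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> servers) {
        if (servers.isEmpty()) throw new IllegalArgumentException("no servers");
        this.servers = List.copyOf(servers);
    }

    // Picks servers in turn; floorMod keeps the index valid even after int overflow.
    String pick() {
        int i = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(i);
    }
}
```

Because the components are stateless, it does not matter which server a given request lands on, so handling more load amounts to passing a longer server list.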

Stateful Application Components:

But most practical clustered application components are stateful.
For instance, the application components may share common data/configuration
on which the processing is based. The application components may also share a common
cache for faster processing.

The following capabilities are required to build stateful applications:

* Distributed State Sharing / Caching
* Distributed Service Registry and Discovery
* Group communication and membership with state maintenance and querying capability
* Dynamic leader server election and Task co-ordination
* Distributed Locks/Synchronization
* Distributed Data Structures, ID Generator
* Distributed Execution
* Distributed Messaging System
* Distributed Scheduling

In the next article we will explore the above capabilities required to build application-level, server-cluster-aware applications.


Distributed Java Application Development – Part 2

This is the second article in the series of articles exploring distributed Java
application development. In this article we will continue to discuss
distributed applications.

As discussed in the previous article, we need a distributed/clustered system for
the following non-functional reasons.

High availability
Fault tolerance
Load balancing

Application Server Clustering vs Application(-Level) Clustering

Java 2 Platform, Enterprise Edition (J2EE) uses application-server clustering to deliver
mission-critical applications over the web. Within the J2EE framework, clusters
provide mission-critical services to ensure minimal downtime and maximum scalability.
An application server cluster is a group of application servers that transparently
run your J2EE application as if it were a single entity. Clustering support
is available for services such as JNDI, EJB, JSP and HttpSession replication,
and for component fail-over, load balancing, etc.



Most of the Java enterprise servers (JBoss, Resin, WebLogic etc) have built-in support for clustering.

More details about Application-Server Clustering are available at

Application-server clustering alone is not sufficient to build a full-fledged distributed
application. To build a full-fledged distributed application, the application should be
aware of the other available servers in the cluster. This application-level cluster awareness is required to handle various custom use cases such as state sharing, group communication, and task co-ordination and distribution.

In the next article we will explore the capabilities required to build application-level, server-cluster-aware applications.


Distributed Java Application Development – Part 1

This is the first article in the series of articles exploring distributed java application development.

In this article we will discuss some basic concepts.

Basic Concepts/Definitions:

Software requirements are divided into two types: Functional Requirements and
Non-Functional Requirements.

Functional Requirements (FRs) define the specific behavior or functionality
of the system. They describe exactly how the system being designed is used. We capture these
requirements in the Software Requirement Specification (SRS). We define them as
“System shall do *****”, etc. The plan for implementing FRs is detailed in
the “System Design/Design Document”.

Non-Functional Requirements (NFRs) specify criteria
that can be used to judge the operation of the system. These are also called
the “qualities” of the system. We define them as “System shall be 99.9% available”,
etc. The plan for implementing non-functional requirements is detailed in the
system architecture.

The important NFRs are

1. Scalability
2. High-Availability
3. Load-balancing
4. Fault Tolerance/Fail-Over

Scalability: the ability to handle additional load by adding more computational resources (CPU, RAM, disk, network, etc.). Scalability can be vertical or horizontal.

Vertical scalability (Scaling Up): handling additional load by adding more power to a single machine, for example by adding a faster CPU, more RAM, or a faster solid-state disk (SSD).

Horizontal scalability (Scaling Out): is handling additional load by adding more servers.
Horizontal scalability is much harder to achieve as adding servers requires
ensuring data consistency and process synchronization.

The distribution of processing load among the group of servers is known as server load balancing.


A server failure can have many causes: system failures, planned outages, hardware or network problems, etc.

High availability: is redundancy in the system. If one server fails, the others take
over the failed server’s load transparently. The failure of an individual server is invisible to the client. High availability can be ensured by not having any single point of failure. Our traditional single-server solution can be good for scalability (add more memory and CPU), but not for high availability, as it has a single point of failure.

Fault tolerant service: always guarantees strictly correct behavior despite a certain
number of faults. Failover is another key technology behind clustering to achieve fault
tolerance. By choosing another node in the cluster, the process will continue when the
original node fails.
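The "choose another node when the original node fails" step can be sketched as a simple retry loop over a list of nodes. The names below are hypothetical, and a failure is modelled as a RuntimeException thrown by the call; real clustering products detect failures and pick the next node in more sophisticated ways.

```java
import java.util.List;
import java.util.function.Function;

class Failover {
    // Tries each node in order and returns the first successful result.
    static <R> R callWithFailover(List<String> nodes, Function<String, R> call) {
        RuntimeException last = null;
        for (String node : nodes) {
            try {
                return call.apply(node);
            } catch (RuntimeException e) {
                last = e;   // remember the failure and fail over to the next node
            }
        }
        throw new IllegalStateException("all nodes failed", last);
    }
}
```

The caller only sees an error when every node in the list has failed, which is how failover hides an individual server failure from the client.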

We traditionally start with single server architecture for our applications.
But sooner or later application will have to process more requests/data than a
single server can handle. Scaling up (Vertical Scalability) (Fig A) can be used.

Fig A. Vertical Scaling (image shows that we need to add more hardware resources)

Scaling up can be a short-term solution. And it’s a limited approach because the cost of
upgrading is disproportionately high relative to the gains in server capability.
For these reasons most successful Internet companies/enterprise applications
follow a scale out (Horizontal Scalability) (Fig C) approach.

We need to horizontally scale our application to multiple servers based on
low-cost hardware and operating systems (cluster of servers).

Fig C.  Horizontal Scaling

A cluster is a group of application servers that transparently run your application as if it were a single entity. In order to implement server clustering we need distributed applications/technologies.

There are different types of clustering at different tiers, such as DB clustering, hardware clustering, Application Server (AS) clustering and application-level clustering.

In the next article we will discuss application-level clustering in more detail.





Introduction to Network Fault Management


Fault management (FM) is usually mentioned as the first concern in network management. Its main role is to ensure high availability of a network. Hence, it involves procedures to automatically detect a fault, notify its occurrence and isolate its root cause (root cause analysis, RCA).

The diagram below depicts the view of network operations with and without integrated FM. With an automated FM system we can integrate and monitor multiple technologies from multiple vendors with limited human resources.


Fault Management

FM is the process of locating problems or faults on the network. 

It involves the following steps:

  1.  Discover/Detect the faults
  2.  Isolate the faults
  3.  Fix/Notify/Report the faults


The diagram below shows the FM functionality as part of the FCAPS model.


The main functions of FM are

  • Event/Alarm Discovery
  • Event/Alarm Filtering
  • Event/Alarm Correlation (RCA/SIA)
  • Alarm Forwarding/Notification
  • Alarm Reporting/Analysis
  • Third Party Integration


The functions of FM can be broadly divided into three parts

  • Event Collection
  • Event/Alarm Processing
  • Generating Info/Reports


Event Collection: connecting to the various network elements and collecting events/alarms from them, suppressing unnecessary events/alarms, and managing the retention of events/alarms.

Event/Alarm Processing: event/alarm filtering, event/alarm thresholding, enrichment, event/alarm correlation, event/alarm forwarding, and Root Cause Analysis (RCA)/Service Impact Analysis (SIA).

Generating Info/Reports: event/alarm reporting, event/alarm analysis, integration with other OSS systems to generate further information, and information forwarding.

Event/Alarm Management

What is a Fault?

A fault is a software or hardware defect in a system that disrupts communication or degrades performance.

What is an Event?

An event is a distinct incident that occurs at a specific point in time. Any happening that has an impact on the network performance can be called an event. It can be informational in nature, a cleared event, warning message, a trouble sign or even a critical fault.

All the faults in the system/network are notified as events. Events are the source of information for all the management happenings that take place within the FM system.

Typically an event is associated with the managed object in which it occurs (e.g. ME, PTP, router, switch), together with a specific event type, a specific layer rate, etc. This combination is called the alarmKey; all the events associated with the same fault have the same alarmKey.

Events also have an associated severity. The common severities are Critical, Major, Minor, Warning and Clear.


Examples of faults/events include:

  • Port status change
  • Connectivity loss/Fiber Cut
  • Device reset/Equipment failures
  • Device becoming unreachable by the EMS

What is an Alarm?

The life cycle of a fault scenario is called an alarm. An alarm is characterized by a sequence of related events (having the same alarmKey), such as port-down and port-up. The last event in the sequence determines the severity and state of the alarm. An alarm that ends with an event that has a severity of cleared is called a cleared alarm.

One ManagedObject can have many different alarms with different alarmKeys.


For example, a port-down event with Critical severity results in a Critical alarm. A subsequent port-up event arrives with Cleared severity, which moves the Critical alarm to a Cleared alarm.
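The event-to-alarm relationship can be sketched as a table keyed by alarmKey, in which the latest event decides the alarm's severity. All names are hypothetical, and the key here is simplified to managed object plus event type (the layer rate is left out).

```java
import java.util.HashMap;
import java.util.Map;

class AlarmTable {
    enum Severity { CRITICAL, MAJOR, MINOR, WARNING, CLEAR }

    // alarmKey -> severity of the last event seen for that alarm
    private final Map<String, Severity> alarms = new HashMap<>();

    // Simplified alarmKey: managed object plus event type.
    static String alarmKey(String managedObject, String eventType) {
        return managedObject + "|" + eventType;
    }

    // Every event with the same alarmKey updates the same alarm.
    void onEvent(String managedObject, String eventType, Severity severity) {
        alarms.put(alarmKey(managedObject, eventType), severity);
    }

    // Returns the current severity, or null if no event was seen for this key.
    Severity severityOf(String managedObject, String eventType) {
        return alarms.get(alarmKey(managedObject, eventType));
    }
}
```

A port-down event followed by a port-up event on the same port therefore leaves one alarm in the table, first CRITICAL and then CLEAR.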


Flapping Events

Flapping is a flood of event notifications with toggling severity which are related to the same alarm (having the same alarmKey). Flapping can occur when a fault is unstable and causes repeated event notifications. Flapping can be indicative of configuration problems or real network problems.

A flapping example is illustrated in the diagram below.


A sequence of events is identified as flapping if:

  • All events share the same alarmKey
  • The time interval between consecutive events is less than a configured value.
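The two conditions above can be sketched as a small detector. The class name is hypothetical, and the minEvents threshold (how many closely spaced events count as flapping) is an added assumption, since the conditions above only define the time gap.

```java
import java.util.HashMap;
import java.util.Map;

class FlappingDetector {
    private final long maxGapMillis;   // configured maximum gap between consecutive events
    private final int minEvents;       // assumption: run length that counts as flapping
    private final Map<String, Long> lastSeen = new HashMap<>();
    private final Map<String, Integer> runLength = new HashMap<>();

    FlappingDetector(long maxGapMillis, int minEvents) {
        this.maxGapMillis = maxGapMillis;
        this.minEvents = minEvents;
    }

    // Records an event; returns true once the alarmKey has produced at least minEvents
    // consecutive events, each closer than maxGapMillis to the previous one.
    boolean onEvent(String alarmKey, long timestampMillis) {
        Long previous = lastSeen.put(alarmKey, timestampMillis);
        boolean close = previous != null && timestampMillis - previous < maxGapMillis;
        int run = close ? runLength.getOrDefault(alarmKey, 1) + 1 : 1;
        runLength.put(alarmKey, run);
        return run >= minEvents;
    }
}
```

An event that arrives after a longer pause resets the run, so an alarm that toggles only occasionally is not reported as flapping.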

Event Discovery/Identification

Normally the management systems (EMS/NMS) notify interested parties of events through SNMP/CORBA/TCP mechanisms. Events can also be generated by external systems for threshold events.


The event processor listens for event notification messages, parses them to get more information about each event, and maintains the event information for further processing.

Some of the event properties are

  • Event Source – Associated ManagedObject name.
  • Event Functionality Type – Alarm (Fault Event), TCA (Performance Event)
  • Event Type – ITU-T X.733 Alarm Type (e.g. Communication Alarm, QoS Alarm)
  • Event Description – Indicates the event message
  • Event Severity – Severity of the event

Event enrichment

Event enrichment is the process of populating additional information about the generated event. This process may need to contact third-party systems to get the information. The enriched information can be useful during fault resolution.

Event Correlation and Alarms

Event correlation is the process of establishing relationships between network events.

Main Functionality:

  1.  Filter out redundant and spurious events.
  2.  Identify the root cause of faults in a network (RCA)

Event Filtering

One important aspect of FM is filtering and prioritizing incoming events to identify the serious ones. Based on the event information, the FM can determine whether an event continues to be processed or is dropped. All unwanted/duplicated events can be dropped at the event collection or event processing stage.



When an NE on the network is faulty, the management system (EMS/NMS) reports the network events to the FM. Each fault may trigger multiple events/alarms. Some events may be triggered by the same fault, so they are associated with each other. The alarm correlation function can analyze the events and generate a single alarm for multiple events.


A failure situation on the network usually generates multiple events, because a failure condition on one device may render other devices inaccessible. The events generated indicate that all of the devices are inaccessible.

Network operators use root cause analysis (RCA) to investigate the root cause of events. They can determine which events are the root cause and which events are results of that root cause (symptom events), and this enables them to quickly focus on the events that are causing network problems.

Normally the RCA process uses knowledge of the network topology to establish a point of failure and identify symptom events.

RCA algorithms can be rule-based, predictive or model-based.

If a device fails, the immediate questions that need to be answered are “what business service did it impact?” and “what is the cost to my business?”. This kind of analysis is called Service Impact Analysis (SIA). SIA uses RCA information to find out the impacted services/customers.

Third-Party Integration

The main aim of a network operator is to shorten the fault resolution time. Basic fault detection and reporting features may therefore not be sufficient for efficient fault monitoring. FM systems should integrate with third-party systems such as trouble ticketing systems and performance management systems. These third-party integrations help the operators speed up the fault resolution process.


The Art of Java Application Performance Analysis and Tuning – Part 7

This is the last article in the series of articles exploring the tools and techniques for analyzing, monitoring, and improving the performance of Java applications.

The Art of Java Application Performance Analysis and Tuning-Part 1
The Art of Java Application Performance Analysis and Tuning-Part 2
The Art of Java Application Performance Analysis and Tuning-Part 3
The Art of Java Application Performance Analysis and Tuning-Part 4
The Art of Java Application Performance Analysis and Tuning-Part 5
The Art of Java Application Performance Analysis and Tuning-Part 6

In this blog we are going to discuss performance analysis during development/QA and common Java performance problems.

Performance Analysis During Development/QA

Many of us developers/QA engineers only test for functionality at low load and miss the performance issues that appear at higher load. Under higher load, an application may experience performance issues such as slowness, OutOfMemory errors, and memory/thread leaks.


Developers should be aware of all the tools mentioned above. Thread dump analysis, JConsole and Jmap are a must. Running profilers like YourKit or IBM Health Monitor will give more insight into your program.

All the existing major features and new features must be profiled to identify performance problems early in development.


QA members should be aware of tools like JConsole, JMap, profilers/IBM Health Monitor, and Linux commands such as top, ps, iostat and vmstat. All the existing/new features must be profiled to identify performance problems early in testing.

Common Java Performance Problems



  1. OutOfMemoryError in logs
  2. Program stops responding
  3. JConsole/Jmap shows full memory usage


  1. Use Thread dump analysis to identify the cause of memory leak.
  2. Use JMap (Oracle)/IBM Health Monitor(IBM)-Memory to identify the leaked objects.
  3. Use Thread dump/JConsole for possible thread leak.
  4. Use Yourkit profiler to identify the leaked objects/caused classes/methods.

Thread Leaks


  1. OutOfMemoryError in logs
  2. OS hangs
  3. OS related errors like
   "Unable to fork new process"
   "Cannot allocate memory: fork: Unable to fork new process"


  1. Use Thread dump/JConsole to confirm possible Thread leak.
  2. Use Thread dump analysis/Method Profiling to identify the method causing the thread leak.

Command to find out the number of threads in Linux

   # ps -elfT | wc -l

If a Java process is leaking threads, this number keeps increasing. This can cause your system to reach the maximum allowed number of threads/processes.

To see the threads one specific process is using, you can do the following.

   # ps -p PID -lfT
   # ps -p 2089 -lfT

If the number of threads listed keeps increasing, we can suspect a thread leak.
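The same check can also be made from inside the JVM with the standard ThreadMXBean. The sketch below is a minimal probe (the class name is hypothetical); logging this value periodically gives the same trend as the ps commands, but for the Java process only.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

class ThreadCountProbe {
    // Returns the number of live threads (daemon and non-daemon) in this JVM.
    static int liveThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("live threads: " + liveThreads());
    }
}
```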

The common causes of thread leaks are

  1. Blocked threads
  2. Infinite loops in Thread.run()
  3. Not closing ExecutorService/ThreadPoolExecutor
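For the third cause, a minimal sketch of shutting a pool down reliably is shown below (the class name is hypothetical). The worker threads of an ExecutorService stay alive until shutdown() is called, so a pool that is created per request and never shut down leaks threads.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PoolLifecycle {
    // Runs the task on a short-lived pool and reports whether the pool fully terminated.
    static boolean runAndShutDown(Runnable task) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            pool.submit(task);
        } finally {
            pool.shutdown();   // without this, the worker threads stay alive
        }
        try {
            // Already submitted tasks still run; wait for them to finish.
            return pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In long-running applications it is usually better to create one shared pool at start-up and shut it down once at exit than to create pools repeatedly.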



  1. Program stops responding
  2. Application hangs


  1. Use Thread dumps/JConsole to confirm possible thread leak.



Performance analysis and tuning is an art and an iterative process. Performance tuning exercises should be accompanied by concrete performance requirements and measurement programs. Tools like Java thread dumps, JConsole and IBM Health Monitor
can help us in analyzing applications. Analyzing Java thread dumps is critical for understanding what is really going on inside a Java application server. Caching is important to reduce the load on the database and to improve performance.

Must Read Book

[1] “Java Concurrency In Practice” by Brian Goetz, David Holmes, Doug Lea, Tim Peierls, Joshua Bloch


The Art of Java Application Performance Analysis and Tuning – Part 6

This is the sixth article in the series of articles exploring the tools and techniques for analyzing, monitoring, and improving the performance of Java applications.

The Art of Java Application Performance Analysis and Tuning-Part 1
The Art of Java Application Performance Analysis and Tuning-Part 2
The Art of Java Application Performance Analysis and Tuning-Part 3
The Art of Java Application Performance Analysis and Tuning-Part 4
The Art of Java Application Performance Analysis and Tuning-Part 5
The Art of Java Application Performance Analysis and Tuning-Part 7

In this blog we are going to discuss Java application tuning techniques.

Java Application performance tuning

Java application performance tuning involves tuning the following components.

  1. JVM Tuning
  2. OS/Hardware Tuning
  3. Database (MySQL/Oracle) Tuning
  4. Application (Code) Tuning

In this article we will discuss Application (Code) Tuning.

Application (Code) Tuning

In my experience, the following techniques help to improve performance.

  1. Identify the bottlenecks and solve them
  2. Using Better Logic/Algorithms
  3. Using Less and Efficient DB Queries
  4. Caching
  5. Using Java Concurrency API for Improving Performance

How to identify Bottlenecks?

Using Thread Dump Analysis: Take multiple thread dumps during busy/peak hours. Analyze them to identify the hot (frequently called) methods, then start optimizing that code/logic.

Using Method Profiling: Use a profiler, such as the Profile option in IBM Health Center, to find the hot (frequently called) methods, then start optimizing that code/logic.

Using Application Logs: Use the application logs and their timestamps to measure performance/response time, then start optimizing the slow code/logic.

Using Memory Analysis: Use jmap or the Classes option in IBM Health Center to investigate high memory usage and OutOfMemory errors.

Repeat the above techniques until you meet your performance requirements.
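For the log-based approach, a small timing wrapper is one way to get response times into the application log. This is only a sketch: the class name `TimedCall`, the label text and the choice of `java.util.logging` are assumptions, and a real application would use whatever logging framework it already has.

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

final class TimedCall {

    private static final Logger LOG = Logger.getLogger(TimedCall.class.getName());

    // Runs the given work, logs how long it took, and returns its result.
    static <T> T timed(String label, Supplier<T> work) {
        long start = System.nanoTime(); // monotonic clock, suited to intervals
        try {
            return work.get();
        } finally {
            long elapsedMicros = (System.nanoTime() - start) / 1_000;
            LOG.info(label + " took " + elapsedMicros + " us");
        }
    }

    public static void main(String[] args) {
        // Hypothetical workload: sum the first million integers.
        long sum = timed("sum", () -> {
            long s = 0;
            for (int i = 1; i <= 1_000_000; i++) {
                s += i;
            }
            return s;
        });
        System.out.println(sum);
    }
}
```

Wrapping the suspected hot paths this way gives numbers that can be compared before and after each optimization round.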

Using Better Logic/Algorithms/Implementation

After identifying the bottlenecks we may need to optimize/change the code/logic.

Single-threaded bottlenecks: A common reason for slowness or low throughput is a single-threaded bottleneck, that is, a portion of code where only a single thread is available for processing. Such bottlenecks force other threads, and the data they carry, to wait every time that code executes. Because pending work piles up behind them, they are often also the reason for high memory usage and OutOfMemory problems.


  1. Design concurrent applications around execution of independent tasks (Java Concurrency Executor API).
  2. We can write faster algorithms by using Java Concurrency API/Parallel algorithms.
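As a rough illustration of the first point, the sketch below splits one job into independent tasks and hands them to an `ExecutorService` with `invokeAll`. The class name `IndependentTasks`, the summing workload and the pool sizing are assumptions chosen to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class IndependentTasks {

    // Sums 1..n by splitting the range into independent chunks, each
    // computed by its own task. Expects n >= 0 and chunks >= 1.
    static long parallelSum(int n, int chunks) {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            int chunkSize = (n + chunks - 1) / chunks; // ceiling division
            for (int from = 1; from <= n; from += chunkSize) {
                final int lo = from;
                final int hi = Math.min(n, from + chunkSize - 1);
                tasks.add(() -> {
                    long sum = 0;
                    for (int i = lo; i <= hi; i++) {
                        sum += i;
                    }
                    return sum;
                });
            }
            long total = 0;
            // invokeAll blocks until every task has completed.
            for (Future<Long> partial : pool.invokeAll(tasks)) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while summing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            pool.shutdown(); // always release the worker threads
        }
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(1000, 8));
    }
}
```

The tasks share no mutable state, so no locking is needed; the only coordination point is collecting the partial results.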

Using Less and Efficient DB Queries

Most Java applications are database-bound. Even if you can scale out your application, you cannot scale an RDBMS as easily. Efficient database modeling and query mechanisms are key to database performance.

1. It is difficult to change the database model after project deployment, so we need to get the DB model right the first time.

2. Use batch queries (PreparedStatement.addBatch, Hibernate bulk addition, etc.) wherever possible. In some cases batch queries can be around 100 times faster than serial queries.

3. Always try to query on indexed columns.

4. Care should be taken while doing JOIN queries on huge tables.

5. Sometimes queries need to be tuned for a specific DB (Oracle, MySQL). Use the EXPLAIN PLAN command to find out the cost of a query.

6. Care should be taken while using ORMs like Hibernate, JPA, etc. Understanding how the ORM generates queries helps with query tuning.

7. Poor database deployment/sizing can hurt application performance severely. So monitor the DB (Oracle, MySQL) and tune the DB server accordingly.
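To make point 2 concrete, here is a minimal sketch of batching inserts with `PreparedStatement.addBatch` and `executeBatch`. The class name `BatchInsert`, the `users` table and its `name` column are hypothetical, and transaction handling (auto-commit, commit, rollback) is deliberately left to the caller.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class BatchInsert {

    // Hypothetical table and column, used only for illustration.
    private static final String SQL = "INSERT INTO users (name) VALUES (?)";

    // Inserts all names, sending them to the database in groups of
    // batchSize rows. Returns the number of batches that were sent.
    static int insertAll(Connection conn, List<String> names, int batchSize)
            throws SQLException {
        int flushes = 0;
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            int pending = 0;
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch(); // queue the row instead of executing it now
                if (++pending == batchSize) {
                    ps.executeBatch(); // one round trip for the whole group
                    flushes++;
                    pending = 0;
                }
            }
            if (pending > 0) { // send the final, partially filled batch
                ps.executeBatch();
                flushes++;
            }
        }
        return flushes;
    }
}
```

Flushing in fixed-size groups keeps driver memory bounded for large inputs. A suitable batch size depends on the driver and the row size, so it is worth measuring rather than assuming.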

Excessive load on the database degrades its performance. Caching can be used to decrease the load on the database.

Caching

The current industry mantra for high-performance, scalable applications is

  1. Scale horizontally (multiple/distributed servers)
  2. Cache as much as you can. The more useful data you cache, the better the responsiveness.


1. Cache frequently used data and static data.

2. Cache precomputed results for repeated use.

3. Use thread dump analysis/method profiling to identify the hot methods and try to cache the data they use.

4. Make sure your caches are configurable, so that they can be tuned for project requirements.

5. Make sure your cache configurations are aligned with the available RAM.
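Points 4 and 5 can be sketched with a small size-bounded LRU cache built on `LinkedHashMap`. The class name `LruCache` and the system property `cache.maxEntries` are assumptions for illustration. The class is not thread-safe on its own, so concurrent use would need `Collections.synchronizedMap` or a dedicated caching library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruCache<K, V> extends LinkedHashMap<K, V> {

    private static final long serialVersionUID = 1L;

    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order (LRU first)
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts
    // the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        // Hypothetical property name; run with -Dcache.maxEntries=5000 to tune.
        int max = Integer.getInteger("cache.maxEntries", 1000);
        LruCache<String, String> cache = new LruCache<>(max);
        cache.put("user:1", "Alice");
        System.out.println(cache.get("user:1"));
    }
}
```

Bounding the entry count is what ties the cache to the available RAM: the upper limit multiplied by the typical entry size gives a rough ceiling on the heap the cache can occupy.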

Using Java Concurrency API for Performance

The number of cores in multi-core processors has increased. Based on project requirements/network size we can add more and more processor-cores (scale up) to the hardware.

But the question is: are we using the multi-core processors efficiently?

To keep all processor cores busy we need to write code with finer-grained, more scalable parallelism. Applications that have not been tuned for multi-core systems may suffer from performance problems.

By using the Java Concurrency API, we try to use the multi-core processing resources we already have more effectively.

  1. Design concurrent applications around execution of independent tasks (Java Concurrency Executor API).
  2. We can write faster algorithms by using Java Concurrency API/Parallel algorithms.
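For the second point, the JDK already ships some parallel algorithms. The sketch below uses a parallel stream and `Arrays.parallelSort`, both available since Java 8; the class name `ParallelAlgorithms` and the workloads are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.stream.LongStream;

final class ParallelAlgorithms {

    // Sum of i*i for i in 1..n, with the range split across cores.
    static long sumOfSquares(long n) {
        return LongStream.rangeClosed(1, n)
                .parallel() // work is divided among fork/join worker threads
                .map(i -> i * i)
                .sum();
    }

    // Returns a sorted copy, leaving the caller's array untouched.
    static int[] sortedCopy(int[] data) {
        int[] copy = Arrays.copyOf(data, data.length);
        Arrays.parallelSort(copy); // sorts sub-arrays in parallel, then merges
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(10));
        System.out.println(Arrays.toString(sortedCopy(new int[] {3, 1, 2})));
    }
}
```

For small inputs the cost of splitting and merging can outweigh the gain, so it is worth measuring against the sequential version before adopting either call.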

Must-read book for all Java developers: “Java Concurrency In Practice” by Brian Goetz et al.





More details on this topic in a future article.
