Performance Principles
Common Performance Problems
Performance Methodology
Development and Performance
Monitoring Operating System Performance
Monitor CPU Usage
Monitor Network I/O
Monitor Disk I/O
Monitor Virtual Memory Usage
Monitor and Identify Lock Contention
Monitoring the JVM
HotSpot Generational Garbage Collector
Monitor the Garbage Collector with Command Line Tools
Monitor the Garbage Collector with VisualVM
Monitor the JIT Compiler
Throughput and Responsiveness
Performance Profiling
NetBeans Profiler, Oracle Solaris Studio, and jmap/jhat
Profile CPU Usage
Profile JVM Heap
Find Memory Leaks
Identify Lock Contention
Heap Profiling Anti-patterns
Method Profiling Anti-patterns
Garbage Collection Schemes
Garbage Collection
Generational Garbage Collection
GC Performance Metrics
Garbage Collection Algorithms
Types of Garbage Collectors
JVM Ergonomics
Garbage Collection Tuning
Tune the Garbage Collection
Select the Garbage Collector
Interpret GC Output
Language Level Concerns and Garbage Collection
The best practices for Object Allocation
Invoking the Garbage Collector
Reference Types in Java
The use of Finalizers
Performance Tuning at the Language Level
String-efficient Java Applications
Collection Classes
Using Threads
Using I/O Efficiently
O'Reilly - Java Performance Tuning
On the other hand, good object-oriented design actually encourages many small methods
and significant polymorphism in the method hierarchy.
However, this technique cannot be applied
when it is too difficult to determine method calls at compile time, as is the case for many Java
methods.
A lot of time (in CPU cycles) passes while the user is reacting to the application interface. This time
can be used to anticipate what the user wants to do (using a background low priority thread), so that
precalculated results are ready to assist the user immediately. This makes an application appear
blazingly fast.
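The idea of using idle user time can be sketched as follows. This is only an illustration: the class name `Precompute` and the method `expensiveResult` are hypothetical stand-ins for whatever the application expects the user to ask for next, and thread priority is only a hint to the scheduler.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class Precompute {
    // Hypothetical stand-in for an expensive calculation the user is likely to request next.
    static long expensiveResult(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    // Starts the calculation on a low-priority daemon thread and returns a future for the result.
    static CompletableFuture<Long> precompute(int n) {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread t = new Thread(runnable, "precompute");
            t.setPriority(Thread.MIN_PRIORITY); // a hint: let the interface thread run first
            t.setDaemon(true);                  // never keep the JVM alive just for this work
            return t;
        });
        CompletableFuture<Long> future =
                CompletableFuture.supplyAsync(() -> expensiveResult(n), executor);
        executor.shutdown(); // the task already submitted still runs to completion
        return future;
    }

    public static void main(String[] args) {
        CompletableFuture<Long> ready = precompute(1_000);
        // ... the user is still reading the screen; later the interface asks for the value:
        System.out.println(ready.join());
    }
}
```

If the user asks before the work has finished, `join()` simply waits for the remainder, so the worst case is no slower than computing on demand.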
Multiuser response times depending on the number of users (if applicable)
Systemwide throughput (e.g., number of transactions per minute for the system as a whole,
or response times on a saturated network, again if applicable)
The maximum number of users, data, files, file sizes, objects, etc., the application supports
Any acceptable and expected degradation in performance between minimal, average, and
extreme values of supported resources
You must specify target times for each benchmark. You should specify ranges: for example, best
times, acceptable times, etc.
CPU time (the time allocated on the CPU for a particular procedure)
The number of runnable processes waiting for the CPU (this gives you an idea of CPU
contention)
Paging of processes
Memory sizes
Disk throughput
Disk scanning times
Network traffic, throughput, and latency
Transaction rates
Other system values
For distributed applications, you need to break down measurements into times spent on each
component, times spent preparing data for transfer and from transfer (e.g., marshalling and
unmarshalling objects and writing to and reading from a buffer), and times spent in network
transfer. Each separate machine used on the networked system needs to be monitored during the test
if any system parameters are to be included in the measurements.
Any decoupling, indirection, abstraction, or extra layers in the design are highly likely to be
candidates for causing performance problems. You should include all these elements in your design
if they are called for. But you need to be careful to design using interfaces in such a way that the
concrete implementation allows any possible performance optimizations to be incorporated. Design
elements that block, copy, queue, or distribute also frequently cause performance problems. These
elements can be difficult to optimize, and the design should focus attention on them and ensure that
they can either be replaced or that their performance is targeted.
Asynchronous and background
events can affect times unpredictably, and their effects need to be clearly identified by benchmark
testing.
The behavior and efficiency of the garbage collector used can heavily influence the performance and responsiveness of an application that's taking advantage of it.
Clocks per CPU instruction, usually referred to as CPI, is the ratio of the number of CPU clock ticks an application uses to the number of CPU instructions it executes. CPI is a measure of the efficiency of the code generated by a compiler. A change in the application, JVM, or operating system that reduces the CPI for an application improves its performance, since it takes advantage of better-optimized generated code.
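As a small worked example of the ratio, the sketch below computes CPI from two counter values. The numbers are invented for illustration; in practice they would come from a hardware performance counter tool, and the class name `Cpi` is hypothetical.

```java
final class Cpi {
    // CPI = CPU clock ticks used / CPU instructions executed.
    static double cpi(long clockTicks, long instructions) {
        if (instructions <= 0) {
            throw new IllegalArgumentException("instructions must be positive");
        }
        return (double) clockTicks / instructions;
    }

    public static void main(String[] args) {
        // Illustrative (made-up) counter readings for the same workload before and after a change.
        double before = cpi(3_000_000_000L, 2_000_000_000L);
        double after = cpi(2_400_000_000L, 2_000_000_000L);
        System.out.println("CPI before: " + before);
        System.out.println("CPI after:  " + after);
    }
}
```

A lower value after the change means the same instructions needed fewer clock ticks.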
1. Use the -verbose:gc option with the VM to print garbage collector statistics; it gives an idea of how often the garbage collector is running.
2. Application partitioning, data compression
3. Larger heaps take longer to garbage collect
Young generation, old generation, permanent generation (look into these)
5. Most employ a "stop the world" collection, meaning the running
application must stop processing while the GC is engaged.
Therefore frequent or large amounts of collection affect the
performance of your application. There are four garbage collectors.
6. Use the most up-to-date JDK
7. Hints are added to the command line startup of an application, such as
"-XX:+UseAdaptiveGCBoundary", to allow the Garbage Collector to
change the allotted space between Young and Old Generations.
8. "The Garbage Collector runs too frequently / is too slow."
Try to reduce the number of temporary objects that are created.
If all else fails, carefully consider using a different garbage collector.
The Parallel Collector (good for multiple CPUs where response
time isn't as important as overall throughput) can be selected with
the -XX:+UseParallelGC command line option.
The Parallel Compacting Collector (good for multiple CPUs where
response time is more important) can be selected with the
-XX:+UseParallelOldGC command line option; -XX:ParallelGCThreads=n
only sets the number of garbage collection threads.
9. "The Garbage Collector..." (cont.)
The Concurrent Mark-Sweep Collector (good for apps running on
machines with few CPUs that need more frequent garbage
collection) can be selected with the -XX:+UseConcMarkSweepGC
command line option.
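Before switching collectors it helps to know how often the current one runs and how long it takes. Besides the command line output, the standard java.lang.management API exposes the same counters from inside the application. The sketch below is one possible way to read them; the class name `GcStats` is hypothetical, and the collector names printed depend on the JVM and the collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcStats {
    // Builds one line per collector: its name, how many collections ran, and their total time.
    static String summary() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": collections=").append(gc.getCollectionCount())
              .append(", total ms=").append(gc.getCollectionTime())
              .append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Create many short-lived temporary objects so the young generation has work to do.
        long checksum = 0;
        for (int i = 0; i < 2_000_000; i++) {
            checksum += new byte[64].length;
        }
        System.out.println("allocated bytes: " + checksum);
        System.out.print(summary());
    }
}
```

Comparing the counters before and after a code change (for example, after removing temporary objects) shows whether the change actually reduced collector activity.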
9. Monitor and identify lock contention
10. Select the Garbage Collector
11. Invoking the Garbage Collector
12. Interpret GC Output
13. Garbage Collection Algorithms
Types of Garbage Collectors
14. Tune garbage collectors
15. Tune Just in Time (JIT) compilers
16. Examine and tune 64 bit JVMs
o Optimize the JVM for Multi-core platforms
17. Tune 64 bit JVM for different application requirements
o Tune a 64 bit JVM for a specific application
18. Optimize the JVM for Multi-core platforms
19. Select collector that best fits application characteristics and
requirements
20. Measure frequency and duration of collections
21. The algorithms and parameters used by GC can have dramatic effects on performance
22. An application that spends 10% of its time in garbage collection can lose 75% of its throughput when scaled out to 32 processors
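The 10% and 75% figures in note 22 can be reproduced with Amdahl's law, under the assumption (mine, not stated in the note) that garbage collection runs on a single processor while the rest of the application scales perfectly. The class and method names are hypothetical.

```java
final class GcScaling {
    // Amdahl's law: speedup on `processors` CPUs when `serialFraction` of the work cannot be parallelized.
    static double speedup(double serialFraction, int processors) {
        return 1.0 / (serialFraction + (1.0 - serialFraction) / processors);
    }

    // Fraction of the ideal (linear) throughput that is lost because GC stays serial.
    static double throughputLost(double gcFraction, int processors) {
        return 1.0 - speedup(gcFraction, processors) / processors;
    }

    public static void main(String[] args) {
        double lost = throughputLost(0.10, 32);
        System.out.println("speedup on 32 CPUs: " + speedup(0.10, 32));
        System.out.println("throughput lost:    " + lost);
    }
}
```

With 10% of the time in GC, the speedup on 32 processors is a little under 8 instead of 32, which is a loss of slightly more than 75% of the ideal throughput.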
23. Permanent generation:
holds all the reflective data of the virtual machine itself, such as class and method objects.
24. Young generation: all new objects are created here. The majority of GC activity takes place here and is usually fast (Minor GC).
Old generation: long-lived objects are promoted (or tenured) to the old generation. Fewer GCs occur here, but when they do, they can be lengthy (Major GC).
25. The permanent generation holds objects that the JVM finds convenient to have the garbage collector manage, such as objects describing classes and methods, as well as the classes and methods themselves.
26. Young generation collectors:
Serial Copying Collector
    All J2SEs (1.4.x default)
    Stop-the-world
    Single threaded
Parallel Copying Collector
    -XX:+UseParNewGC
    J2SE 1.4.1+ (1.5.x default)
    Stop-the-world
    Multi-threaded
Parallel Scavenge Collector
    -XX:+UseParallelGC
    J2SE 1.4.1+
    Like Parallel Copying Collector
    Tuned for very large heaps (over 10GB) w/ multiple CPUs
    Must use Mark-Compact Collector in Old Generation
Old generation collectors:
Mark-Compact Collector
    All J2SEs (1.4.x default)
    Stop-the-world
    Single threaded
Train (or Incremental) Collector
    -Xincgc
    About 10% performance overhead
    All J2SEs
    To be replaced by CMS Collector
Concurrent Mark-Sweep (CMS) Collector
    -XX:+UseConcMarkSweepGC
    J2SE 1.4.1+ (1.5.x default (-Xincgc))
    Mostly concurrent
    Single threaded
28. Lots of casts
29. Replace multiple objects with a single object or a few
30. String test1 = new String("test");
String test2 = new String("test");
creates two objects, whereas
String test1 = "test";
String test2 = "test";
makes both variables point to the same object.
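The difference in note 30 can be checked directly by comparing references with ==. A minimal sketch (the class name `StringIdentity` is made up for the example):

```java
final class StringIdentity {
    // Two explicit constructions always yield two distinct objects.
    static boolean constructedAreSameObject() {
        String test1 = new String("test");
        String test2 = new String("test");
        return test1 == test2;
    }

    // Identical literals are interned, so both variables refer to one shared object.
    static boolean literalsAreSameObject() {
        String test1 = "test";
        String test2 = "test";
        return test1 == test2;
    }

    public static void main(String[] args) {
        System.out.println(constructedAreSameObject());           // false
        System.out.println(literalsAreSameObject());              // true
        System.out.println(new String("test").equals("test"));    // true: same contents
    }
}
```

The contents are equal in both cases; only the number of objects allocated differs.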
Vector is synchronized.
Hashtable is synchronized and HashMap is not.
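What the synchronization buys can be seen by letting several threads append to the same list. The sketch below is illustrative only; `SyncDemo` and `fillConcurrently` are hypothetical names.

```java
import java.util.List;
import java.util.Vector;

final class SyncDemo {
    // Lets `threads` workers each append `perThread` elements to the same list, then reports its size.
    static int fillConcurrently(List<Integer> list, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.add(i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return list.size();
    }

    public static void main(String[] args) {
        // Vector.add is synchronized, so no element is lost.
        System.out.println(fillConcurrently(new Vector<>(), 4, 10_000)); // 40000
    }
}
```

Passing an unsynchronized ArrayList to the same method may lose elements or throw an exception. The price of Vector's safety is a lock acquisition on every call, which is wasted work when only one thread ever touches the collection.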
Important to understand
The internal representation of a Java object and the internal representation of a Java class are very similar. From this point on let me just call them Java objects and Java classes and you'll understand that I'm referring to their internal representation. The Java objects and Java classes are similar to the extent that during a garbage collection both are viewed just as objects and are collected in exactly the same way. So why store the Java classes in a separate permanent generation? Why not just store the Java classes in the heap along with the Java objects?
Classes also have classes that describe their content.
important
// Dynamic binding is slower than static binding.
// Minimize subclasses and method overriding.