Cougaar Memory Profiler

Cougaar

NOTE:
As of January 2007, the Cougaar memory profiler should be considered deprecated. The Cougaar memory profiler was written to overcome limitations in Sun's JVMPI implementation, as used by hprof, OptimizeIt(tm), JProbe(tm), etc. Most of these problems have been fixed in Java 1.6.

Instead of using the Cougaar memory profiler, we recommend that developers first try Sun's new, free `jhat` profiler:
  http://java.sun.com/javase/6/docs/technotes/tools/index.html#troubleshoot
See the above link's Developer's Guide for notes on how to use jhat and interpret the results.

For future reference, there may be interest in extending jhat to better identify non-leak memory problems and potential solutions. For example, a developer might want to estimate the potential memory savings of adding caching (e.g. intern the most common Strings) or changing a data structure (e.g. change an "Integer" field to a primitive "int"). These "identify potential memory saving" analysis methods would best be implemented as enhancements to the post-processing jhat dump reader:
  j2se/src/share/classes/com/sun/tools/hat/Main.java
The modified code could be a new Cougaar project (e.g. "jhat_extensions") and offered back to Sun.

 

The remainder of this page describes the Cougaar memory profiler, which still works but is more difficult to configure and use than jhat. Some of the concepts may be of interest for use in future projects, e.g. bytecode instrumentation and weak reference tables.

 



Overview

The Cougaar memory profiler is a tool for debugging memory usage and leaks in any Java application. It features a scalable 100% Java design that is lighter weight than existing JVMPI-based profilers (hprof, OptimizeIt(tm), JProbe(tm), etc).

Memory leaks can arise in Java applications due to unintentional object references. For example, an application may fail to remove callback listeners or close streams, preventing these objects from being garbage collected. Even minor leaks can grow to major problems in deployed systems.

The profiler tracks memory usage within the application by using tables of WeakReferences. The developer selects which classes should be tracked and runs an automated classfile editor (using BCEL) to add profiling instructions to the constructors. The profiler maintains pointers to the live instances and can display useful debugging information:

An interactive servlet-based UI is included that displays all the above profiling data. A servlet-based UI was developed instead of a Swing UI since it matched the Cougaar Agent Architecture's embedded servlet server. It'd be straight-forward to port the servlet to Swing or create a JEditorPane/HTMLEditorKit adaptor. The servlet has minimal Cougaar dependencies and can be modified to run in any servlet server.

Jump to snapshots

The developer controls the profiling overhead by selecting which classes to profile and reducing the level of profiling (e.g. turning off allocation stacktrace data). The runtime overhead is less than JVMPI tools, partially because the JVMPI tools must maintain similar heap tables (see the hprof source in $JDK/j2se/src/share/tools/hprof for details). We've found the JVMI overhead to be prohibitively expensive when debugging large applications. Long-term JSR-163 (Java(tm) Platform Profiling Architecture) may address some of these performance issues, however the draft specification suggests that bytecode-based object profiling may be the recommended approach.

There are a couple noteworthy limitations of this approach vs JVMPI:

The advantages are noted above: scalability and Java-based APIs. Also, since the profiler is 100% Java, it's easier to extend and can be used from within the application itself.

The main project page has the latest release. The download contains a "run" script that runs a simple example application.

Please send questions and comments to the cougaar-developers@cougaar.org mailing list.


Snapshots

Here's an example HTML snapshot:

The above snapshot shows:

Here's a key for each table row:

Type
The class name
Sample%
The percent of allocated instances that were tracked. This is an option specified by the developer in the profiler filter configuration.
Live
The number of live instances of the class, multiplied by 1/sample
GC'ed
The number of garbage collected instances of the class, multiplied by 1/sample
Total
"Live" + "GC'ed"
Bytes Each
The estimated size of each instance, based upon the object header size and number of fields in the object
*Live
"Live" * "Bytes Each"
+Capacity Bytes
"*Live" + "Capacity Bytes", where "Capacity Bytes" is the sum of array lengths multipled by array element size in all the live instances (e.g. for ArrayList, for each "Object[] elementData", the sum of:
  (elementData == null ? 0 : 12 + 4*elementData.length)
Off to the right, out of view in the snapshot, are columns for:
Size Sum
The sum of all "size()" values of the instances (e.g. for Vector, the sum of all "int elementCount" values)
Size Max
The maximum "size()" of the live instances
Size Max Ever
The maximum "Size Max" observed since the start of the run
Size Mean
"Size Sum" / "Live"
Capacity Sum
The sum of all array lengths in all the live instances (e.g. for ArrayList, for each "Object[] elementData", the sum of:
  (elementData == null ? 0 : elementData.length)
Capacity Max
The maximum array length of the live instances
Capacity Max Ever
The maximum "Capacity Max" observed since the start of the run
Capacity Mean
"Capacity Sum" / "Live"

For example, 1% of String allocations were tracked. Currently 56 of 28042 tracked Strings are live, so we can estimate that 5600 are live out of 2804200 allocated (0.2%). Each String costs 24 bytes excluding the char[] data:

  header + fields =
  header + "char[] value" + "int offset" + "int count" + "int hash" =
  8 + 4 + 4 + 4 + 4 =
  24
So there's about 24*5600 = 134400 bytes in Strings, excluding the "char[] value". The "+Capacity Bytes" is 474800, so the sum of all the "char[]" headers (12 bytes each) and chars (2 bytes each) is 474800-134400 = 340400 bytes. This is an estimate, since some of the Strings may point to the same "char[]" (e.g. substrings and StringBuffers). The max "length()" of a String is currently 49, and the max "value.length" is 98, where the discrepency is likely due to substrings.

If we click on the "java.lang.String" row a new page is generated:

This simply repeats the above data and allows the user to submit a query on instances of "java.lang.String". Here we select a decreasing view by "length()" and see:

In this run we disabled "java.*" allocation timestamps and stacktraces, to minimize the performance overhead, but we can see the "toString()" and other useful information. In this example shows the largest 20 of all 30 Strings (of the 1% we tracked).

Additional snapshots and notes can be found here.


Usage

The release contains a "run" script that will run a simple example application. A large HashMap is allocated, serialized, and deserialized, and at each point the profiler is queried to track this map.

BCEL 5.1 is required to modify the classes but not required when running the profiler.

The "profiler_transform/bin/transform_jar" script can be used to instrument complete jars (e.g. "lib/core.jar", "sys/tomcat_40.jar", etc) or JDK packages (e.g. "java.lang", "java.util", etc). For options run:

  ./profiler_impl/bin/transform_jar --help

The "transform_jar" script allows the developer to specify an "OptionsFactory" class that will define the profiling level at runtime, e.g.:

  ./profiler_impl/bin/transform_jar --config=foo.MyConfig xerces
The default implementation is DefaultOptionsFactory, which enables full profiling for all classes. The class must contain a method matching this signature:
    public static final Options getOptions(
        String module,
        String classname) {..}

The recommended usage is to specify a custom implementation that you'll modify between runs. This will allow you to run the "transform_jar" script on your jars only once (to instrument the bytecode and specify your OptionsFactory) and in subsequent runs you'll only need to change your OptionsFactory to change the profiling config.

Here's an example:

  package foo;

  import org.cougaar.profiler.Options;

  public class MyConfig {

    /**
     * Option that will track 100% of allocations,
     * record allocation timestamp,
     * record allocation stacktrace,
     * record size/capacity/context.
     * <p>
     * This is the best detail but the most overhead.
     */
    private static final Options FULL_DETAIL =
      new Options(
          true, // time
          true, // stack
          true, // size
          true, // capacity
          true, // context
          1.00  // sampleRatio
          );

    /**
     * Track only 1% of allocations,
     * record only the instance itself,
     * allow size/capacity.
     * <p>
     * This is very low overhead but only samples a small
     * subset of the instances.  This is ideal for the
     * initial profiling runs, to get a rough idea of the
     * allocation counts.
     */
    private static final Options MIN_DETAIL =
      new Options(
          false, // time
          false, // stack
          true,  // size
          true,  // capacity
          false, // context
          0.01   // sampleRatio
          );

    /*
     * The profiled class will call this method to get its options.
     *
     * @param module optional module name, e.g. "xerces" or null
     * @param classname non-null class name, e.g.
     *   "org.apache.xerces.dom.DocumentImpl$LEntry"
     * @return profiling options for the class, or null if the
     *   class shouldn't be tracked.
     */
    public static final Options getOptions(
        String module,
        String classname) {
      if ("core".equals(module)) {
        // we're interested in all classes in "lib/core.jar".
        // we want to track every allocation, timestamp, and stack
        return FULL_DETAIL;
      } else if (classname.startsWith("java.")) {
        // we're a little interested in "java.*", but
        // not enough to suffer a big overhead.  This is a good
        // "scatter-shot" approach.
        return MIN_DETAIL;
      } else {
        // everything else will be ignored
        return null;
      }
    }
  }

This should be compiled, jar'ed (e.g. "profiler_config.jar"), and placed at the head of the VM's bootclasspath along with the "profiler_impl.jar" and other VM jar overrides, e.g.:

  #!/bin/sh
  ..
  BOOTCLASSPATH="\
    -Xbootclasspath/p:$COUGAAR_INSTALL_PATH/lib/profiler_config.jar\
    -Xbootclasspath/p:$COUGAAR_INSTALL_PATH/lib/profiler_impl.jar\
    -Xbootclasspath/p:$COUGAAR_INSTALL_PATH/lib/javalangpatch.jar\
    -Xbootclasspath/p:$COUGAAR_INSTALL_PATH/lib/javaiopatch.jar"
  ..
  java $BOOTCLASSPATH ..
For details see the example "run" script included in the release.

Custom "OptionsFactory" classes should minimize imports to avoid VM loading errors. For example, a call to "new HashSet()" will fail if HashSet is profiled, due to a stack overflow caused by the circular reference. Similarily, if all of "java.lang" will be profiled, then Strings should be carefully handled to avoid string allocations, including any calls to "System.out"!

The class modifier can also be used within a ClassLoader at class definition time. The "SelfProfiler" defines a standard BCEL ClassLoader method:

  public JavaClass modifyClass(JavaClass clazz) {..}
See BCEL's "org.apache.bcel.util.ClassLoader" for details. This approach would avoid the "transform_jar" script but would add runtime overhead.


ACME Usage

Here are usage notes specific to ACME, which is Cougaar's automation scripting environment (soon to be open-sourced). This is primarily of interest to Cougaar and Ultra*Log developers.

Download the latest profiler install and unzip it into your $CIP. It'll overwrite:
  $CIP/build/data/default.build
to specify the compile dependencies of UL jars (e.g. "ar_mic.jar").

To verify the basic install, run the "$CIP/run" script. It should generate output similar to "$CIP/run.log". After this test, delete the temporary files:
  rm -rf $CIP/tmp
  rm $CIP/lib/javautilpatch*
Also delete the example jar and config, which you won't need for testing:
  rm $CIP/lib/profiler_examples.jar
  rm $CIP/lib/profiler_config.jar

First you must create an "OptionsFactory" Java class to control the level of profiling. An example is shown above named "foo.MyConfig". A modified zip containing the source, jar, and compile script can be found here. This implementation will enable minimal profiling for all classes:
  Track 1% of instance allocations at minimal detail
You may want to modify the "profiler_config/src/foo/MyConfig.java" to profile more/less classes or capture more/less detail; see the above usage section for details.

Next you must transform the jars that you will profile. You can transform all the jars in sys and lib using:

  #!/bin/sh

  CONFIG="--config=foo.MyConfig"

  cd $CIP || exit -1

  # profile everything!  This may take a while...
  #
  # The following errors are expected:
  #   icu4j:        two class files are corrupt
  #   jena:         missing jars (sesame, antlr, sleepycat)
  #   juddi:        missing jars (axis)
  #   log4j:        missing jars (jms, jmx)
  #   openmap:      missing jars (j3d)
  #   tomcat_40:    missing jars (jmx)
  #   vishnuServer: missing jars (vishnu, util)
  # These errors and related NPEs can be ignored, since the classes
  # that depend upon these missing jars will never be loaded at
  # runtime.
  JAVA_PKGS="applet awt beans io lang math net nio rmi security sql text util"
  for x in $JAVA_PKGS; do ./transform_jar $CONFIG java.$x; done
  for x in $CIP/lib/*.jar; do ./transform_jar $CONFIG $x; done
  for x in $CIP/sys/*.jar; do ./transform_jar $CONFIG $x; done
This may take up to an hour on a slow machine (~500mhz). This is a one-time step that is only required when/if you upgrade your jars. After the transform you can delete the temporary files in:
  rm -rf $CIP/tmp
In a profiling session you'll only need to modify the "profiler_config" source and recompile the tiny "$CIP/lib/profiler_config.jar" to change the profiling level.

At this point your $CIP/lib should look something like:
  aggagent.jar -> /mnt/shared/socC/lib/aggagent.jar.new
  aggagent.jar.new
  aggagent.jar.old
  ...
The "*.jar.new" contains the instrumented classes and is signed by the operator "sign" script.

Next you must modify your:
  $CIP/csmart/config/rules/isat/tic_env.rule
by commenting out line 55 (the Xbootclaspath). This allows ACME to use the profiler's VM "-Xbootclasspath" rule:
  $CIP/csmart/config/rules/isat/bootclasspath.rule
I'll submit an ISAT bug to split this create the basic "bootclasspath.rule" in the CSMART module, so in the future you won't need to modify your "tic_env.rule".

You should also modify the above "bootclasspath.rule" to uncomment all the jars (e.g. uncomment the "javarmipatch", etc).

Lastly, you can optionally enable the "wget" action:
  $CIP/csmart/acme_scripting/src/lib/cougaar/wget.rb
by modifying the very end of:
  $CIP/csmart/acme_scripting/src/lib/cougaar/scripting.rb
to append:
  require 'cougaar/wget'
Then add "wget" actions in your "MOAS" script, e.g. just before stage-2 add:
  do_action 'WGetAllNodes', '/profiler', 'profiler_beforeStage2'
to capture "/profiler" servlet snapshots to the archives. I'll submit an ISAT bug to move this action into the CSMART module.

The config is now ready to run. You can run any "MOAS" script and it'll profile the classes as defined in your "foo.MyConfig" class, which in this case will track 1% of the instance allocations. This is fine for the first run, since it'll provide a rough idea of where to focus.

You should now have "/profiler" HTML snapshots to analyze. See the above snapshots and notes for details on the "/profiler" servlet output and example analysis. To enable offline javascript-based column sorting in the snapshots you must save this link as "tablesort.js" and run the following script on "*.html":

  #!/bin/sh

  # replace second line:
  #  <script LANGUAGE="JavaScript" SRC="/$~/profiler?action=script"></script>
  # with:
  #  <script LANGUAGE="JavaScript" SRC="tablesort.js"></script>
  for x in $*; do
    cat <<"EOF" | ed $x
  2c
  <script LANGUAGE="JavaScript" SRC="tablesort.js"></script>
  .
  w
  EOF
  done

Based upon your analysis, you may choose to:


Design Details

The bytecode modification is done by the SelfProfiler class.

The basic idea is to transform a class like:

  public class Bar extends Foo {
    public Bar() {
      System.out.println("Hello, world!");
    }
  }
into:
  import foo.MyConfig; // your config class
  import org.cougaar.profiler.MemoryTracker;

  public class Bar extends Foo {
    public Bar() {
      System.out.println("Hello, world!");
      MY_PROFILER.add(this);
    }
    private static final MemoryTracker MY_PROFILER =
      MemoryTracker.getInstance(
        "Bar", // classname
        8,     // object header + fields size
        MyConfig.getOptions("the_module", "Bar"));
  }

The full transform is more complex, since it must call call the profiler in all constructors, "clone()", and "readObject()". The class must also be tagged with the original serialVersionUID to preserve serialization compatibility.

The "run()" method outlines the basic steps:

  private void run() {
    // make sure the serialVersionUID field exists, so we don't
    // break serialization
    addSerialVersionUID();

    if (ENABLE_SIZE_AND_CAPACITY) {
      // add "$get_size()" and "$get_capacity()" methods
      defineSize();
      defineCapacity();
    }

    // add static profiler field
    defineProfilerField();

    // init profiler field in class init method
    initProfilerField();

    // override "profile_<super>" method to disable super's profiler
    disableSuperProfiler();

    // add our "profile_<class>" method
    addClassProfiler();

    // update constructors with "profile_<class>" calls
    // (excluding constructors that call "this(..)")
    //
    // note that the default constructor always exists
    callProfilerInConstructors();

    // catch "readObject", which is a hidden constructor.
    //
    // we can ignore "readExternal" since it calls the no-arg
    // constructor.
    callProfilerInReadObject();

    // handle "clone", which is yet another hidden constructor.
    callProfilerInClone();
  }

The note that "the default constructor always exists" refers to javac's creation of a default constructor if one is not specified, e.g.:

  class Test {}
Javac will create a "public Test() { super(); }" constructor. The VM requires at least one constructor for any class that will be instantiated, so the bytecode instrumentor doesn't need to add constructors.

"clone" creates an object through "super.clone()", which doesn't call the constructor. The result of "super.clone()" is tracked. If clone doesn't call "super.clone()" then it presumable returns either an existing object or uses a constructor to create a new instance.

"readObject" creates the object without calling the constructor. Here's an example test-case:

 import java.io.*;
 class Test implements Serializable {
    public static void main(String[] args) throws Exception {
      Test t1 = new Test();
      System.out.println("t1: "+t1);
      Test t2 = (Test) testSerial(t1);
      System.out.println("t2: "+t2);
    }
    public Test() {
      System.out.println("constructor");
    }
    private void readObject(ObjectInputStream s) throws Exception {
      System.out.println("readObject");
      s.defaultReadObject();
    }
    private static Object testSerial(Object o) throws Exception {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      ObjectOutputStream os = new ObjectOutputStream(bos);
      os.writeObject(o);
      os.flush();
      o = null;
      byte[] b = bos.toByteArray();
      ByteArrayInputStream bis = new ByteArrayInputStream(b);
      ObjectInputStream is = new ObjectInputStream(bis);
      Object newO = is.readObject();
      return newO;
    }
  }
It generates output similar to:
  constructor
  t1: Main@1a16869
  readObject
  t2: Main@19efb05
ObjectInputStream uses ObjectStreamClass "newInstance()", which uses:
  import sun.reflect.ReflectionFactory;
    private static final ReflectionFactory reflFactory = ..;
      ..
      Constructor cons = reflFactory.newConstructorForSerialization(cl, cons);
      ..
      return cons.newInstance(..);
From there it's muddy, since we're in Sun's private code, but the end result is that "readObject" never calls the constructor.

Externalizable will call the zero-arg constructor. If there's no zero-arg constructor then it'll throw a runtime InvalidClassException. Note that non-static inner classes always pass the outer class as an implicit first parameter, so they can't be Externalizable.

The profiler can't track primitive arrays since they're not created through a class constructor. A possible workaround is to track arrays created by a profiled class by tracking the result of "newarray" bytecodes. This is different than the profiler's normal operation, since arrays constructed by non-profiled classes won't be tracked, as opposed to the current behavior where profiled classes are tracked regardless of which code constructed them. A synthetic "java.lang.Array" class would be ideal but would require VM changes.

A possible enhancement is to create a transform that tracks "new" bytecodes to find allocations by a given class, e.g. track all objects allocated by Xerces classes.