Dealing with memory leaks in Kepler
A page for how to track down performance and memory problems in Kepler, and how to prevent future memory leaks.
Main memory leak tests for Kepler suite 2.2
1. workflow open and close
Status: memory is released after the workflow is closed. See the next section for details.
2. workflow open, execution and close
Status at 1/5/2001: if a big workflow is opened, executed and closed, memory usage after garbage collection increases by about 30 MB.
3. workflow open, edit and close
Status at 1/5/2001: if a big workflow is opened, edited and closed, memory usage after garbage collection increases by about 20 MB.
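The "after garbage collection" figures in the statuses above can be approximated from inside the JVM. The following is a minimal sketch, assuming a hypothetical helper class HeapDelta that is not part of Kepler. The placeholder operation stands in for opening, running and closing a workflow. Because a GC request is only a hint to the JVM, the numbers should be read as rough indications, ideally averaged over several repetitions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

/** Hypothetical helper (not a Kepler class) for measuring heap growth across an operation. */
class HeapDelta {

    /** Requests a garbage collection, then returns the used heap in bytes. */
    static long usedHeapAfterGc() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc(); // equivalent to System.gc(): a request, not a guarantee
        return memory.getHeapMemoryUsage().getUsed();
    }

    /** Runs the operation and returns the change in used heap (bytes), measured after GC. */
    static long measure(Runnable operation) {
        long before = usedHeapAfterGc();
        operation.run();
        long after = usedHeapAfterGc();
        return after - before;
    }

    public static void main(String[] args) {
        // Placeholder operation; in practice this would open, execute and close a workflow.
        long delta = measure(new Runnable() {
            public void run() {
                byte[] scratch = new byte[1024 * 1024]; // becomes garbage once run() returns
                scratch[0] = 1;
            }
        });
        System.out.println("Heap change after GC: " + (delta / 1024) + " KB");
    }
}
```

A delta that keeps growing by a similar amount on every repetition of the same open/close cycle is the pattern described in the tests above.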
Main memory leaks found and fixed for Kepler 2.2
1. Memory not released after a workflow is opened and closed
Phenomenon: each time a workflow is opened and closed, memory usage after garbage collection increases, by about 20 MB for each big workflow.
Reason:
- Several static classes, or static variables in normal classes, do not remove objects that are no longer needed after a workflow is closed. The problematic types are usually collections such as Vector, Hashtable and HashMap.
- Several classes still hold references to the workflow that was just closed.
Tracking method:
- Use JProfiler's ‘Heap Walker’ to inspect the class ‘ptolemy.gui.Top$1’ or ‘org.kepler.gui.KeplerGraphFrame’. Its instance count should equal the number of currently open windows plus 1 (the extra instance is for the Kepler welcome window). If the count is higher, some objects still hold references to the closed windows. In the ‘References’ view of the object, choose ‘Show Paths To GC Root’ from the right-click menu to see why the instances for the closed windows cannot be garbage collected. For each suspicious GC root, check the logic and variables to identify the cause and a solution.
- Other classes (such as org.kepler.provenance.ProvenanceRecorder) can also be used if the relationship between the number of their instances and the number of Kepler windows/workflows is clear. If the instance count is larger than expected, some objects are probably not being released properly.
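The same "instance count versus expected count" comparison can also be done from code during debugging. The sketch below assumes a hypothetical InstanceTracker class (not part of Kepler or Ptolemy) that remembers objects through weak references only, so that tracking them does not itself keep them alive. Since System.gc() is only a hint, a count that is too high is a reason to look at a profiler, not proof of a leak.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical debugging aid (not a Kepler class): counts tracked objects that are still reachable. */
class InstanceTracker {
    private final List<WeakReference<Object>> tracked = new ArrayList<WeakReference<Object>>();

    /** Registers an object without keeping it alive. */
    synchronized void track(Object instance) {
        tracked.add(new WeakReference<Object>(instance));
    }

    /** Requests a GC, drops cleared references and returns how many tracked objects remain. */
    synchronized int liveCount() {
        System.gc(); // only a hint to the JVM
        for (Iterator<WeakReference<Object>> it = tracked.iterator(); it.hasNext();) {
            if (it.next().get() == null) {
                it.remove(); // the referent has been garbage collected
            }
        }
        return tracked.size();
    }

    public static void main(String[] args) {
        InstanceTracker frames = new InstanceTracker();
        Object openWindow = new Object(); // stands in for a window that is still open
        frames.track(openWindow);
        frames.track(new Object());       // stands in for a window that was closed and released
        System.out.println("Live instances: " + frames.liveCount() + " (open windows: 1)");
        System.out.println("Still open: " + openWindow); // use keeps openWindow reachable above
    }
}
```

In a real investigation the track() call would go in the constructor of the class being watched, and liveCount() would be compared against the number of open windows plus 1, as described above.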
Solution:
- For static variables such as sets, lists and maps, make sure their elements are removed once they are no longer needed.
- For static listener lists, it is sometimes hard to remove listeners explicitly. In that case, holding them through weak references allows listeners that are no longer used to be garbage collected even if they are never explicitly removed. More information can be found at using-weakhashmap-for-listener-lists.
- For references to the closed workflow, the corresponding classes have to be redesigned.
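The first two solutions above can be sketched as follows. This is only an illustration under stated assumptions: WorkflowRegistry, CloseListener and all method names are hypothetical and do not correspond to actual Kepler classes. The static map is cleaned up explicitly on close, and the static listener set is backed by a WeakHashMap so that forgotten listeners do not pin their owners in memory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.WeakHashMap;

/** Hypothetical registry; names are illustrative and not Kepler APIs. */
class WorkflowRegistry {
    /** Hypothetical listener interface. */
    interface CloseListener {
        void workflowClosed(String workflowId);
    }

    // Static map: entries must be removed explicitly when the workflow closes.
    private static final Map<String, Object> OPEN_WORKFLOWS = new HashMap<String, Object>();

    // Static listener set with weakly held elements: a listener that nothing else
    // references can be garbage collected even if it was never removed.
    private static final Set<CloseListener> LISTENERS =
            Collections.newSetFromMap(new WeakHashMap<CloseListener, Boolean>());

    static synchronized void opened(String workflowId, Object workflow) {
        OPEN_WORKFLOWS.put(workflowId, workflow);
    }

    static synchronized void addListener(CloseListener listener) {
        LISTENERS.add(listener);
    }

    static synchronized void closed(String workflowId) {
        OPEN_WORKFLOWS.remove(workflowId); // release the static reference to the workflow
        // Copy first so listeners may unregister themselves while being notified.
        List<CloseListener> snapshot = new ArrayList<CloseListener>(LISTENERS);
        for (CloseListener listener : snapshot) {
            listener.workflowClosed(workflowId);
        }
    }

    static synchronized int openCount() {
        return OPEN_WORKFLOWS.size();
    }
}
```

One consequence of the weak set is that the registering code must keep its own strong reference to a listener for as long as it wants to receive events. A listener created inline and handed only to addListener() may be collected at any time.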
2. Database connection cleanup
Phenomenon: database connections are created but are neither closed when no longer needed nor reused for future connections.
Reason: Several classes forget to close their database connections after use. It should still be OK for static objects to keep their database connection open for the life cycle of one Kepler instance. From section 2.1.9: Freeing DBMS Resources: 'Therefore, it is recommended that programmers explicitly close all connections (with the method Connection.close) and statements (with the method Statement.close) as soon as they are no longer needed, thereby freeing DBMS resources as early as possible'.
Tracking method:
- One simple way is to run the FindBugs tool over all of the source code. Suspicious classes are listed under the 'Bug'->'Bad practice'->'Database resource not closed on all paths' category.
- Class “org.kepler.util.sql.DatabaseFactory” is the main class for obtaining database connections. One way to find out which classes need database connections is to check where the functions of DatabaseFactory (such as getDBConnection() and getConnectedDatabaseType()) are called.
Solution:
- Close database connections when they are not used any more.
- Reusing connections gives better performance if many connections need to be created and closed. Java provides connection pooling support for this.
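Closing "on all paths" means that the close calls also run when a query throws. A minimal sketch of that pattern in Java 6 style is shown below. JdbcCleanup and countRowsAndClose are hypothetical names, and how the Connection is obtained (for example through DatabaseFactory) is deliberately left out because its exact signature is not given here.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical helper (not a Kepler class) showing JDBC cleanup on every code path. */
class JdbcCleanup {

    /** Counts the rows returned by a query, then closes the result set, statement and connection. */
    static int countRowsAndClose(Connection connection, String sql) throws SQLException {
        Statement statement = null;
        ResultSet rows = null;
        try {
            statement = connection.createStatement();
            rows = statement.executeQuery(sql);
            int count = 0;
            while (rows.next()) {
                count++;
            }
            return count;
        } finally {
            // Runs on success and on exception; close in reverse order of creation.
            closeQuietly(rows);
            closeQuietly(statement);
            closeQuietly(connection);
        }
    }

    private static void closeQuietly(ResultSet rows) {
        if (rows != null) {
            try { rows.close(); } catch (SQLException ignored) { /* keep closing the rest */ }
        }
    }

    private static void closeQuietly(Statement statement) {
        if (statement != null) {
            try { statement.close(); } catch (SQLException ignored) { /* keep closing the rest */ }
        }
    }

    private static void closeQuietly(Connection connection) {
        if (connection != null) {
            try { connection.close(); } catch (SQLException ignored) { /* nothing left to close */ }
        }
    }
}
```

This sketch assumes the method owns the connection it is given. A long-lived static object that keeps one connection for the lifetime of the Kepler instance would close only the statement and result set here. On Java 7 and later, a try-with-resources statement gives the same guarantee with less code.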
General comments
- Be very clear about the life cycle of each class before implementing it. When and how will objects of the class be created (window open, workflow open, workflow execution, etc.)? When and how will they be disposed of (window closing, end of workflow execution, etc.)? How will these objects map to the objects of other related classes (for example, to each Kepler window and each Kepler workflow)?
- Usually, other leaks won't appear until the currently known leak is fixed, so be patient and be prepared for multiple iterations.
- The current version has been tested on normal workflows and the basic functionality of Kepler and the reporting suite; there may still be memory leaks in specific actors or functionality.
- Ptolemy performance introduction.
- A memory leak handling article from the IBM developerWorks web site.
- A memory leak handling article from the JavaWorld web site.
Useful tools
- Java JDK commands, including jconsole, jps, jmap and jhat: basic tools for checking Java processes and memory information. More information can be found at Monitoring and Managing Java SE 6 Platform Applications. The Java VisualVM in Java 6 is especially useful.
- Memory Analyzer (MAT) from Eclipse: a free tool for analyzing Java heap dumps and finding memory leaks.
- JProfiler: a very powerful tool to trace Java application performance. Its help document on memory leak handling can be found at http://resources.ej-technologies.com/jprofiler/help/doc/helptopics/memory/memoryLeak.html
- FindBugs: an open source tool that checks for possible bugs by static analysis of Java code.
Example of analyzing memory usage/leaks using Java VisualVM
jvisualvm is a free tool distributed with the Java SE 6 JDK update 7 and higher. These steps use 1.6.0_24 (small differences may be noticeable in other versions).
- Verify that you are running java version 1.6.0_07 or higher using "java -version"
- Run the jvisualvm command. Make sure it is in your path; it may not be in the same directory as your java and javac commands. Try "find . -name jvisualvm -print" on Unix-based systems, or search for it using Windows Explorer. If it's not there you'll need to download the full Java SE 6 JDK.
- After the jvisualvm window opens there is a "Start Page" tab that has links to documentation. You can find lots of info there, such as this tutorial screencast.
- Now run Kepler (ant run). You will see an ant process and an org.kepler.Kepler process under Local Applications in jvisualvm; it detects them auto-magically. (Make sure you are running jvisualvm from the same JDK that you are running Kepler with, or you may not see the process, or some of the tabs may not show up after the next step!)
- Double click (or right click->Open) the org.kepler.Kepler process and a tab will open with several sub tabs (Overview, Monitor, Thread, Profiler, etc.)
- Click the Monitor tab to see live graphs of CPU, Heap, Classes, and Threads. Notice the Perform GC (Garbage Collection) and Heap Dump buttons in the upper right.
- Click the Heap Dump button to get a snapshot of what is in the Heap at the time you pushed the button. Sub tabs show Summary, Classes, and Instances.
- In this example we'll focus on the org.kepler.KeplerGraphFrame class. Click on the Classes tab of your Heap Dump, type org.kepler.KeplerGraphFrame into the Filter text area at the bottom and press Enter.
- Notice the number of instances of org.kepler.KeplerGraphFrame. There should only be 1 at this time since there is one frame open.
- Back in the Kepler app, go to the File menu -> New Workflow -> Blank. Another Kepler window appears. We expect there to be one more KeplerGraphFrame instance now.
- Perform another heap dump from the Monitor tab and notice there are now 2 instances of KeplerGraphFrame.
- Close the new window in Kepler and perform another heap dump. Notice that the number of KeplerGraphFrame instances is still 2 even though we would expect there to be only 1 now. This is a memory leak (although hopefully it is fixed by the time you are reading this, so the number may be 1).
- A good way to start solving this issue is to look at all of the inner classes. Non-static inner classes, including anonymous ones, hold an implicit reference to their enclosing instance, so make sure all of those references get released when expected. This step can be complicated by anonymous inner classes because they show up simply as a number (e.g. org.kepler.KeplerGraphFrame$1), making it almost impossible to determine what they are in the code. So a good first step is to convert all of the anonymous inner classes into named inner classes so you can see their names (e.g. org.kepler.KeplerGraphFrame$DeletionListener).
- The next step is to look at all of the classes in the extension hierarchy (for KeplerGraphFrame there are 6 before the system class JFrame: ActorGraphFrame -> ExtendedGraphFrame -> BasicGraphFrame -> PtolemyGraphFrame -> TableauFrame -> Top -> JFrame). You can probably stop when you hit system classes. Check all of these classes and their inner classes for references that need to be freed.
- A useful tool during all of this is the Instances view. In the Heap Dump -> Classes tab, right click on the org.kepler.KeplerGraphFrame row and choose "Show in Instances View". By clicking on the different instances (#1, #2) you can get a sense of what they are from the Fields and References sections (notice that one of them is marked JNI global; that's probably the first window that was opened). After making sure the second window is closed, select the instance that is not JNI global (this is the one that should have been released from memory by now), right click on its first reference field and choose "Show Nearest GC Root". If you're lucky it will be something recognizable and you can then track that reference down in the code. However, if it is some listener somewhere you will see that the GC root is contained in the D3DContext, at which point you have to go hunting through the code for listeners that were added to the object at some point!
- Also, in the instances view, you can click the "Compute Retained Sizes" link to get a better idea of how much memory could be freed up if the reference went away.
- Fun Fun!
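The advice in the steps above about anonymous inner classes can be illustrated with a small sketch. DemoFrame and its nested types are hypothetical stand-ins and not Kepler code. The point is only how each kind of class is named at runtime, which is also the name a heap dump displays.

```java
/** Hypothetical frame class used only to illustrate class naming; not Kepler code. */
class DemoFrame {
    /** Hypothetical listener interface. */
    interface Listener {
        void changed();
    }

    // Anonymous inner class: appears in a heap dump under a numbered name (DemoFrame$1)
    // and can hold an implicit reference to the enclosing DemoFrame instance.
    Listener anonymousListener() {
        return new Listener() {
            public void changed() { }
        };
    }

    // Named inner class: appears as DemoFrame$DeletionListener, which is easy to find in
    // the source. Being non-static, it is still tied to an enclosing DemoFrame instance.
    class DeletionListener implements Listener {
        public void changed() { }
    }

    // Static nested class: has no enclosing instance, so it cannot keep a DemoFrame alive
    // unless a reference is passed to it explicitly.
    static class DetachedListener implements Listener {
        public void changed() { }
    }

    public static void main(String[] args) {
        DemoFrame frame = new DemoFrame();
        System.out.println(frame.anonymousListener().getClass().getName());
        System.out.println(frame.new DeletionListener().getClass().getName());
        System.out.println(new DetachedListener().getClass().getName());
    }
}
```

Converting an anonymous class to a named one only makes the leak easier to identify. Actually releasing the frame still requires removing the listener from whatever holds it, or using a static nested class that does not reference the frame.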