Saturday, September 6, 2008

Weak references

I've been coding in Java for quite some time, but until recently I've never heard of weak references, although it has been part of JDK since version 1.2. Surprisingly, many people who have quite a good programming experience in Java either haven't heard of it or have a very vague idea about it.

Java attempts to solve the problem of dangling pointers and memory-leak by automatic garbage-collection. If an object residing in the heap is not reachable from any variable residing in the stack, the program has no way to use that object and hence, the garbage-collector (GC) can free the memory held by that object. The programmer does not have to worry about the lifetime of the object and free it explicitly (as they have to do in C++).
Unfortunately, the logical set of live objects is sometimes smaller than the set of objects reachable from the stack variables, ie, there can be reachable objects which the programmer does not intend to use any more.
Consider the following scenario: your program creates a number of objects of some class MyClass. You want to associate some external data (eg some id) with each object. One standard way of doing it is to put the <object,data> tuple in a Map. You retrieve the data only when the object (used as key) is still live, ie the program holds some other reference to that object. Now if at some point of time, the object is no more relevant, the program might not keep any reference to that object other than the one maintained in the map. Although the program might never use that object again, GC cannot free that memory allocated to that object, as there is still a reference to it (in the map). If the program keeps on creating such objects and inserts them in the map, it can lead to a memory-leak.
Of course this problem would not occur if the programmer remembers to remove the pair from the map after the last usage of that object, but then the advantages of the automatic garbage-collection will be lost... the programmer has to remember the last usage explicitly. Java offers an elegant solution to this problem through weak references.
A weak reference is basically a Java API java.lang.ref.WeakReference<T>. You can get a weak reference of an object by
WeakReference<MyClass> wr =
new WeakReference<MyClass>(myObj);
You can get back the the original reference using wr.get().
What is interesting about the weak reference is that GC can free the object if all the references to it are weak. In other words, the object can be garbage-collected if it is reachable through weak references only. If the object has been garbage-collected, wr.get() returns null.
Hence, if we use weak references as the key of our map, objects used as keys can be garbage-collected if there are no other references to it and the programmer need not worry about deleting the pair every time an object becomes irrelevant. When we need to get the data from the map (and we have only the strong object reference), we first need to iterate through all the keys and check if any of the weak references in the keys wrap our object reference and then use it to fetch the data. Of course, this is slower than the other implementations of maps eg HashMap.
This usage of weak references as keys to maps is so common that JDK provides a WeakHashMap implementation which can be used just as a HashMap but the keys can be garbage-collected if there are no other (strong) reference to them.
Now, what about the data in the map? While the keys are garbage-collected automatically, the reference to data still resides inside the map and can lead to a memory-leak themselves. It is dangerous to make them weak as the program might not maintain any other reference to them. ReferenceQueue comes to rescue here. A second constructor to WeakReference takes a ReferenceQueue as second argument. When ever an object is garbage-collected, the corresponding weak reference is put into the queue. Once in a while, the program can process the queue and do the necessary cleanup. Again, WeakHashMap does all these things internally.

Three comments before I end this (already long) post:
  1. Java also has Soft and Phantom references. Both of them are related to garbage-collection.
  2. Many other garbage-collected languages like C#, Python etc have some notion of weak references.
  3. Two excellent resources on Java weak reference can be found here and here. This post is highly indebted to them.




3 comments:

AG said...

Never read about these `Weak references' earlier. But, they seem to be useful in very specific situations (like the example u gave), and more than that it would require some thinking on a programmer's part regarding their usage!!

Arnab De said...

@AG: Weak reference is one of the very few features where you need to peek "under the hood" of the JVM. There are few more.
I don't know of any other realistic use of weak reference than WeakHashTable, but WeakHashTable is used in many places.

Yo said...

Consider an event producer, a event handler and an object O that the handler operates on.

The producer needs a reference to the event handler (to broadcast events), and the event handler needs a reference to O.

Now when we're done using O, we should not be prevented from Garbage-collecting O, just because the producer is still producing (perhaps used by other consumers). Therefore either the reference from the producer to the handler, or from the handler to the object O needs to be weak.