Friday, September 12, 2008

Recursive acronyms

Anybody with a non-trivial interest in computer science must be familiar with recursive acronyms: acronym whose expansion contains the abbreviation itself. Most famous of them is probably GNU: GNU's Not Unix.
Some other interesting and famous ones:
  • CYGNUS: Cygnus, Your GNU Support
  • LAME: Lame Ain't an MP3 Encoder
  • PNG: PNG's Not GIF
  • WINE: Wine's Not an Emulator
Thanks to Wikipedia for this list.
But one that beats them all is XINU: Xinu Is Not Unix. It is not only a recursive acronym, but also an anagram!


Saturday, September 6, 2008

Weak references

I've been coding in Java for quite some time, but until recently I've never heard of weak references, although it has been part of JDK since version 1.2. Surprisingly, many people who have quite a good programming experience in Java either haven't heard of it or have a very vague idea about it.

Java attempts to solve the problem of dangling pointers and memory-leak by automatic garbage-collection. If an object residing in the heap is not reachable from any variable residing in the stack, the program has no way to use that object and hence, the garbage-collector (GC) can free the memory held by that object. The programmer does not have to worry about the lifetime of the object and free it explicitly (as they have to do in C++).
Unfortunately, the logical set of live objects is sometimes smaller than the set of objects reachable from the stack variables, ie, there can be reachable objects which the programmer does not intend to use any more.
Consider the following scenario: your program creates a number of objects of some class MyClass. You want to associate some external data (eg some id) with each object. One standard way of doing it is to put the <object,data> tuple in a Map. You retrieve the data only when the object (used as key) is still live, ie the program holds some other reference to that object. Now if at some point of time, the object is no more relevant, the program might not keep any reference to that object other than the one maintained in the map. Although the program might never use that object again, GC cannot free that memory allocated to that object, as there is still a reference to it (in the map). If the program keeps on creating such objects and inserts them in the map, it can lead to a memory-leak.
Of course this problem would not occur if the programmer remembers to remove the pair from the map after the last usage of that object, but then the advantages of the automatic garbage-collection will be lost... the programmer has to remember the last usage explicitly. Java offers an elegant solution to this problem through weak references.
A weak reference is basically a Java API java.lang.ref.WeakReference<T>. You can get a weak reference of an object by
WeakReference<MyClass> wr =
new WeakReference<MyClass>(myObj);
You can get back the the original reference using wr.get().
What is interesting about the weak reference is that GC can free the object if all the references to it are weak. In other words, the object can be garbage-collected if it is reachable through weak references only. If the object has been garbage-collected, wr.get() returns null.
Hence, if we use weak references as the key of our map, objects used as keys can be garbage-collected if there are no other references to it and the programmer need not worry about deleting the pair every time an object becomes irrelevant. When we need to get the data from the map (and we have only the strong object reference), we first need to iterate through all the keys and check if any of the weak references in the keys wrap our object reference and then use it to fetch the data. Of course, this is slower than the other implementations of maps eg HashMap.
This usage of weak references as keys to maps is so common that JDK provides a WeakHashMap implementation which can be used just as a HashMap but the keys can be garbage-collected if there are no other (strong) reference to them.
Now, what about the data in the map? While the keys are garbage-collected automatically, the reference to data still resides inside the map and can lead to a memory-leak themselves. It is dangerous to make them weak as the program might not maintain any other reference to them. ReferenceQueue comes to rescue here. A second constructor to WeakReference takes a ReferenceQueue as second argument. When ever an object is garbage-collected, the corresponding weak reference is put into the queue. Once in a while, the program can process the queue and do the necessary cleanup. Again, WeakHashMap does all these things internally.

Three comments before I end this (already long) post:
  1. Java also has Soft and Phantom references. Both of them are related to garbage-collection.
  2. Many other garbage-collected languages like C#, Python etc have some notion of weak references.
  3. Two excellent resources on Java weak reference can be found here and here. This post is highly indebted to them.




Tuesday, September 2, 2008

Chrome and muli-core processors

Google is releasing their own web browser Chrome tomorrow! Here you can know about the cool features of Chrome, described in a perfect Googly way.
One of the main design decisions for Chrome is making it multi-process, that is, each tab runs as a different process. It has many advantages like less memory leak, sandboxing etc. as they have mentioned in the above link.
One aspect they have not mention is enhanced parallelism and the speedup on a multi-core processor due to that. As all modern processors have more than one cores, any application with multiple threads/ processes should be able to take advantage of that by running truly parallelly (in traditional single-core processors, even if the application is multi-threaded/ multi-process, instructions from different threads/ processes are interleaved, not run parallelly). I hope that the processes in Chrome actually spawn native processes and are not just virtual processes managed by some single-threaded virtual machine (like many Java applications).
I'm eagerly waiting to get my hands on it :)

Update: Chrome does spawn native processes for each tab :)