Friday, September 12, 2008

Recursive acronyms

Anybody with a non-trivial interest in computer science must be familiar with recursive acronyms: acronym whose expansion contains the abbreviation itself. Most famous of them is probably GNU: GNU's Not Unix.
Some other interesting and famous ones:
  • CYGNUS: Cygnus, Your GNU Support
  • LAME: Lame Ain't an MP3 Encoder
  • PNG: PNG's Not GIF
  • WINE: Wine's Not an Emulator
Thanks to Wikipedia for this list.
But one that beats them all is XINU: Xinu Is Not Unix. It is not only a recursive acronym, but also an anagram!


Saturday, September 6, 2008

Weak references

I've been coding in Java for quite some time, but until recently I've never heard of weak references, although it has been part of JDK since version 1.2. Surprisingly, many people who have quite a good programming experience in Java either haven't heard of it or have a very vague idea about it.

Java attempts to solve the problem of dangling pointers and memory-leak by automatic garbage-collection. If an object residing in the heap is not reachable from any variable residing in the stack, the program has no way to use that object and hence, the garbage-collector (GC) can free the memory held by that object. The programmer does not have to worry about the lifetime of the object and free it explicitly (as they have to do in C++).
Unfortunately, the logical set of live objects is sometimes smaller than the set of objects reachable from the stack variables, ie, there can be reachable objects which the programmer does not intend to use any more.
Consider the following scenario: your program creates a number of objects of some class MyClass. You want to associate some external data (eg some id) with each object. One standard way of doing it is to put the <object,data> tuple in a Map. You retrieve the data only when the object (used as key) is still live, ie the program holds some other reference to that object. Now if at some point of time, the object is no more relevant, the program might not keep any reference to that object other than the one maintained in the map. Although the program might never use that object again, GC cannot free that memory allocated to that object, as there is still a reference to it (in the map). If the program keeps on creating such objects and inserts them in the map, it can lead to a memory-leak.
Of course this problem would not occur if the programmer remembers to remove the pair from the map after the last usage of that object, but then the advantages of the automatic garbage-collection will be lost... the programmer has to remember the last usage explicitly. Java offers an elegant solution to this problem through weak references.
A weak reference is basically a Java API java.lang.ref.WeakReference<T>. You can get a weak reference of an object by
WeakReference<MyClass> wr =
new WeakReference<MyClass>(myObj);
You can get back the the original reference using wr.get().
What is interesting about the weak reference is that GC can free the object if all the references to it are weak. In other words, the object can be garbage-collected if it is reachable through weak references only. If the object has been garbage-collected, wr.get() returns null.
Hence, if we use weak references as the key of our map, objects used as keys can be garbage-collected if there are no other references to it and the programmer need not worry about deleting the pair every time an object becomes irrelevant. When we need to get the data from the map (and we have only the strong object reference), we first need to iterate through all the keys and check if any of the weak references in the keys wrap our object reference and then use it to fetch the data. Of course, this is slower than the other implementations of maps eg HashMap.
This usage of weak references as keys to maps is so common that JDK provides a WeakHashMap implementation which can be used just as a HashMap but the keys can be garbage-collected if there are no other (strong) reference to them.
Now, what about the data in the map? While the keys are garbage-collected automatically, the reference to data still resides inside the map and can lead to a memory-leak themselves. It is dangerous to make them weak as the program might not maintain any other reference to them. ReferenceQueue comes to rescue here. A second constructor to WeakReference takes a ReferenceQueue as second argument. When ever an object is garbage-collected, the corresponding weak reference is put into the queue. Once in a while, the program can process the queue and do the necessary cleanup. Again, WeakHashMap does all these things internally.

Three comments before I end this (already long) post:
  1. Java also has Soft and Phantom references. Both of them are related to garbage-collection.
  2. Many other garbage-collected languages like C#, Python etc have some notion of weak references.
  3. Two excellent resources on Java weak reference can be found here and here. This post is highly indebted to them.




Tuesday, September 2, 2008

Chrome and muli-core processors

Google is releasing their own web browser Chrome tomorrow! Here you can know about the cool features of Chrome, described in a perfect Googly way.
One of the main design decisions for Chrome is making it multi-process, that is, each tab runs as a different process. It has many advantages like less memory leak, sandboxing etc. as they have mentioned in the above link.
One aspect they have not mention is enhanced parallelism and the speedup on a multi-core processor due to that. As all modern processors have more than one cores, any application with multiple threads/ processes should be able to take advantage of that by running truly parallelly (in traditional single-core processors, even if the application is multi-threaded/ multi-process, instructions from different threads/ processes are interleaved, not run parallelly). I hope that the processes in Chrome actually spawn native processes and are not just virtual processes managed by some single-threaded virtual machine (like many Java applications).
I'm eagerly waiting to get my hands on it :)

Update: Chrome does spawn native processes for each tab :)


Thursday, August 28, 2008

Prof. Haritsa's views on research

I recently bumped upon this interesting link on Prof. Jayant Haritsa's views on research. Although it is supposed to be Prof. Haritsa's views on MS vs PhD, much of it discusses his views on research life in general. As he says in the beginning, he is heavily biased towards a traditional notion of academics.
I don't think I have enough experience to comment on this topic. The last part is worth quoting:
Finally, in my view, ideal Phd is one where *you* come up with the problem, work alone (modulo advisor) and single-handedly write a definitive thesis on the issue.
Still considering grad school?

P.S. My personal interaction with Dr. Haritsa was primarily during the DBMS course he offers. Highly recommended if you end up at IISc!






Sunday, August 24, 2008

Perlis' epigrams

Alan J. Perlis was the first Turing award winner, "for his influence in the area of advanced programming techniques and compiler construction". He is also famous for his article Epigrams on Programming, which contains 130 one-liners, mostly about his experience with programming. A few of the epigrams are on epigrams... he calls them meta-epigrams (Russell's paradox?)!
Five pearls I found from the article (go dig your own):
  • A language that doesn't affect the way you think about programming, is not worth knowing.
  • There will always be things we wish to say in our programs that in all known languages can only be said poorly.
  • The computer is the ultimate polluter. Its feces are indistinguishable from the food it produces.
  • Computer Science is embarrassed by the computer.
  • The last epigram? Neither eat nor drink them, snuff epigrams.

Tuesday, August 19, 2008

Geeky quotes

Thanks to Sumit for the link.


Here is a link to a bunch of (funny and/or insightful) programming quotes. Some of them are by well known personalities, but some are from Anonymous blah blah.
My favorite:
Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.
Brian W. Kernighan

Sunday, August 17, 2008

Telling an SMT solver to tell them apart

Thanks to Jacob Burnim for this post.


A few days back, I was trying to write a sudoku solver using an SMT solver. The idea is to express the constraints of a sudoku board in a language that an SMT solver can understand. The solver either gives a model from which the solution can be extracted or it declares the constraints to be unsatisfiable in which case no solution exists.
The SMT solver I was using was Yices. I first used the following assertion to tell Yices to make all the values in a column distinct:
(define-type single_dig (subrange 0 9))
(define board::(-> single_dig single_dig single_dig))

(assert
(forall (n1::single_dig n2::single_dig
m::single_dig)
(or (= n1 n2) (/= (board n1 m) (board n2 m)))
))

Neat and compact! But wait... when you give it to Yices, it cowardly backs off, returning unknown! OK, fine, too many quantifiers... lets ask it to do it for each row separately. But no... still no luck! Notice that the subrange we are working on is finite, but somehow Yices fails to understand that. After trying many different expressions with quantifiers, I gave up (Yices is really bad with quantifiers).
And then came Jacob with a smart trick. Just define a function that maps the board values to different integers:
(define f::(-> single_dig single_dig))
(assert (and
(= (f (board 1 1)) 1)
(= (f (board 2 1)) 2)
(= (f (board 3 1)) 3)
(= (f (board 4 1)) 4)
(= (f (board 5 1)) 5)
(= (f (board 6 1)) 6)
(= (f (board 7 1)) 7)
(= (f (board 8 1)) 8)
(= (f (board 9 1)) 9)))
And voila! It automatically makes all board values distinct. Yices does not complain any more. Of course this is not too compact and the model returned by the solver becomes a bit clumsy, but it works! Looks like this is a standard trick to make a finite number of values distinct.