Sunday, May 8, 2011

Java Decompilers

This article isn't anywhere near current, but it is still valid as general commentary on Java decompilers. I'm not planning to re-review the modern crop, but I will note new programs as I become aware of them.

and are they worth worrying about

Generally, the object of a decompiler is to accept a Java .class file as input and produce a compilable source file as its result. In the chaotic world of software development there are many reasons, legitimate and otherwise, to wish for such a tool. The transparent and information-rich structure of Java .class files, which makes Java's dynamic linking so much better than previously common models, also makes such tools particularly easy to build.
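
To give a feel for how transparent the .class format is, here is a minimal sketch that reads the first few fields of a class file: the magic number, the version, and the constant pool count. The class name ClassHeader and its methods are hypothetical and are not part of any of the decompilers discussed here; a real decompiler would go on to read the constant pool, fields, methods, and bytecode.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration only.
class ClassHeader {
    final int minor;
    final int major;
    final int constantPoolCount;

    ClassHeader(int minor, int major, int constantPoolCount) {
        this.minor = minor;
        this.major = major;
        this.constantPoolCount = constantPoolCount;
    }

    // Reads the fixed-size header that starts every class file.
    static ClassHeader parse(byte[] classBytes) {
        // ByteBuffer is big-endian by default, which matches the class file format.
        ByteBuffer buf = ByteBuffer.wrap(classBytes);
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file: bad magic number");
        }
        int minor = buf.getShort() & 0xFFFF;   // u2 minor_version
        int major = buf.getShort() & 0xFFFF;   // u2 major_version
        int cpCount = buf.getShort() & 0xFFFF; // u2 constant_pool_count (entries are numbered 1..count-1)
        return new ClassHeader(minor, major, cpCount);
    }

    // Usage: java ClassHeader path/to/Some.class
    public static void main(String[] args) throws IOException {
        ClassHeader h = parse(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("class file version " + h.major + "." + h.minor
                + ", constant pool count " + h.constantPoolCount);
    }
}
```

Everything after this header is laid out just as regularly, which is a large part of why decompilers are comparatively easy to write.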

What tools are available?

All of these tools are pure Java, so the essential distribution consists of a Java class library and instructions to invoke it. They're all a little quirky to set up and use, a characteristic shared by many standalone Java applications. Once set up, they more or less "just work", producing output that is nearly ready for the compiler.
Mocha is free. Unfortunately, its author is prematurely deceased, and its future as a product is in doubt.
WingDis is a product of Wing Software. A "crippleware" demo is available. [April 2002, still available and supported]
DejaVu is distributed as part of the OEW development environment, but appears to be completely independent of it. OEW is available as a free trial, and DejaVu continues to be functional when the trial has expired. I've reviewed OEW separately.
[April 2002, Open source decompiler Jreverse Pro]

Testing method

I chose a small utility library, consisting of about 15 classes, as my standard test set. I compiled the library using JDK 1.0.2, with -O and without -g. I decompiled with all three decompilers, then manually edited the decompiled sources until they could be successfully recompiled. I then decompiled these three sets of "second generation" binaries with each of the three decompilers, yielding nine sets of "third generation" sources. I then manually compared various pairs of sources, looking for inconsistencies which might indicate incorrectly decompiled code. Since this was "only a test", I had the luxury of referring to the original sources, and the double luxury that I wrote these sources myself; two advantages that would not generally be available to anyone using a decompiler in earnest.
The test set was not specifically designed to validate or torture decompilers, and there is no way to know if the results here are representative of all classes, or if the list of problems encountered is complete. It should, however, give you some idea what to watch for.
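
The comparisons described above were done by hand. As a rough idea of how a first pass could be automated, the sketch below compares two source texts while ignoring layout, and reports the first line that differs. The class SourceCompare and its methods are hypothetical names introduced for illustration. Note the limitation: classes compiled without -g carry no local variable names, so each decompiler invents its own, and a textual comparison like this one would flag those renamings as differences that still need a human eye.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class SourceCompare {
    // Trim each line, collapse runs of whitespace, and drop blank lines,
    // so that pure layout differences between decompilers are ignored.
    static List<String> normalize(String source) {
        List<String> out = new ArrayList<>();
        for (String line : source.split("\n")) {
            String t = line.trim().replaceAll("\\s+", " ");
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Returns the index of the first normalized line that differs,
    // or -1 if the two sources are the same after normalization.
    static int firstDifference(String left, String right) {
        List<String> a = normalize(left);
        List<String> b = normalize(right);
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            if (!a.get(i).equals(b.get(i))) {
                return i;
            }
        }
        return a.size() == b.size() ? -1 : n;
    }
}
```

A result of -1 only says the two texts agree; it says nothing about whether both decompilers made the same mistake, which is why the original sources were still the final reference.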

I organized decompilation errors according to the taxonomy below, based on the general idea that easy-to-spot and easy-to-fix errors were less significant than hidden or hard to fix errors. The very worst thing a decompiler can do is produce code that passes through a compiler without complaint, but which is not functionally equivalent to the original code.

Error Taxonomy

| Class | General description | Examples |
|---|---|---|
| Class 1 errors | Flagged by compiler, easily fixed. | boolean variable incorrectly identified as int; missing, but trivial, type cast |
| Class 2 errors | Flagged by compiler, not easily fixed. | generating code containing goto |
| Class 3 errors | Ugly, incomprehensible, but correct code. | unreconstructed flow control; unreconstructed use of + for string append |
| Class 4 errors | Subtle misprints. Subtly incorrect programs. | failing to use \ to escape characters in string constants; misprinting character constants |
| Class 5 errors | Total failure. | crash without producing output |
| Class 6 errors | Gross errors. Severely damaged semantics. No warning, and hard to identify. | misuse or non-use of "this."; other patently incorrect code |
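
To make "ugly but correct" concrete, here is a small sketch of what an unreconstructed string append can look like. Compilers of that era translated + on strings into a chain of StringBuffer.append calls, and a decompiler that does not recognize the pattern simply prints the chain back. The class and method names are hypothetical, and the "decompiled" form is written by hand as an approximation, not captured from any of the three tools.

```java
// Hypothetical example for illustration only.
class UglyButCorrect {
    // What the programmer wrote.
    static String original(String name, int n, boolean ok) {
        if (ok) {
            return name + "=" + n;
        }
        return "";
    }

    // Roughly what a decompiler might print back: the append chain is left
    // unreconstructed, and the boolean test is spelled out against false.
    static String decompiled(String name, int n, boolean ok) {
        if (ok != false) {
            return new StringBuffer().append(name).append("=").append(n).toString();
        }
        return "";
    }
}
```

Both methods return the same string for the same arguments, so nothing is broken; the second is just harder to read and maintain.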

Decompiler Errors by type

| Decompiler | Class 1 errors | Class 2 errors | Class 3 errors | Class 4 errors | Class 5 errors | Class 6 errors |
|---|---|---|---|---|---|---|
| Mocha, version beta 1 | a few | no | no | no | yes, Mocha crashes on some classes | |
| WingDis, version 2.06 | just one | no | overuse of if(x!=false) and similar constructions | no | no | misuse or non-use of "super."; mistranslation of x=a++; to a++; x=a; |
| DejaVu, version 1.0 | a few | no | major problem with flow analysis | yes | no | no |
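
The x=a++ mistranslation is a good example of why the last class of error is the dangerous one: both forms compile cleanly, but they do not compute the same thing. A minimal sketch, with hypothetical class and method names:

```java
// Hypothetical example for illustration only.
class PostIncrement {
    // Original code: x receives the value of a BEFORE the increment.
    static int[] original(int a) {
        int x = a++;
        return new int[] {x, a};
    }

    // Mistranslated code: the increment happens first, so x receives
    // the value of a AFTER the increment.
    static int[] mistranslated(int a) {
        a++;
        int x = a;
        return new int[] {x, a};
    }
}
```

Called with 5, original returns {5, 6} while mistranslated returns {6, 6}. The compiler has no reason to complain about either, so only a comparison against known-good behavior will catch it.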