Sunday, May 8, 2011

CONTENTS
Introduction
Bytecode Encryption - Straightforward But Totally Flawed
Name Obfuscation
String Encryption
Code and Data Flow Obfuscation
Strengthening Protection with Ahead-Of-Time Compilation
Popular Obfuscators
Further Reading
Appendices:
  Obfuscation Examples
  Impact of Flow Obfuscation on Performance

Protect Your Java Code - Through Obfuscators And Beyond

Reverse engineering of your proprietary applications by unscrupulous competitors or malicious hackers may expose your algorithms and ideas, proprietary data formats, licensing and security mechanisms, and, most importantly, your customers' data. Here is why Java is particularly weak in this respect compared to C++:
Target Instruction Set
C++: Compiles to a low-level instruction set that operates on raw binary data and is specific to the target hardware, such as x86 or PowerPC
Java: Compiles to a higher-level portable bytecode that operates on classes and primitive types.
Compiler Optimizations
C++: Numerous code optimizations are performed at compile time. Inline substitution results in copies of the given (member) function being scattered around the binary image; use of the preprocessor combined with compile-time evaluation of expressions may leave no trace of the constants defined in the source code; and so on.
Java: Relies on dynamic (Just-In-Time) compilation for performance improvement. The standard javac compiler is straightforward: it performs practically none of the compile-time optimizations commonly found in C++ compilers. The idea is to let the JIT compiler perform all optimizations at run time, taking the execution profile into account.
Linkage
C++: Programs are statically linked, and metaprogramming facilities (reflection) are absent in the core language. So the names of classes, members, and variables need not be present in the compiled and linked program, except for names exported from dynamic libraries (DLLs/shared objects).
Java: Dependencies are resolved at run time, when classes are loaded. So the name of the class and names of its methods and fields must be present in a class file, as well as names of all imported classes, called methods, and accessed fields.
Delivery Format
C++: An application is delivered as a monolithic executable (maybe with a few dynamic libraries), so it is not easy to identify all member functions of a given class or reconstruct the class hierarchy.
Java: An application is delivered as a set of jar files, which are just non-encrypted archives containing individual classes.
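To illustrate the delivery format point, here is a minimal sketch showing that a jar can be opened with the JDK's generic ZIP reader and its class entries listed by name. The class `JarListing`, its two helper methods, and the entry name `com/example/LicenseChecker.class` are made up for this example; the sample jar is a throwaway file built only so the snippet is self-contained.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class JarListing {

    /** Lists the entry names of a jar using the plain ZIP API (no jar-specific code). */
    public static List<String> listEntries(Path jar) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    /** Builds a tiny temporary jar with one placeholder entry (hypothetical name). */
    public static Path buildSampleJar() throws IOException {
        Path jar = Files.createTempFile("sample", ".jar");
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jos = new JarOutputStream(out)) {
            jos.putNextEntry(new JarEntry("com/example/LicenseChecker.class"));
            // Placeholder payload: only the class-file magic number, not a real class.
            jos.write(new byte[] {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE});
            jos.closeEntry();
        }
        return jar;
    }

    public static void main(String[] args) throws IOException {
        Path jar = buildSampleJar();
        try {
            listEntries(jar).forEach(System.out::println);
        } finally {
            Files.deleteIfExists(jar);
        }
    }
}
```

Pointing `listEntries` at a real application jar would print the package and class names of everything it ships, which is usually the first thing a reverse engineer looks at.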
As a result, decompiling Java programs is a much simpler task than decompiling C++ programs, and can therefore be fully automated. Class hierarchy, high-level statements, names of classes, methods and fields - all this can be retrieved from class files emitted by the standard javac compiler. Anyone with ordinary programming skills can download a Java decompiler, run your program through it and read the source code almost as if it were open source.
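A decompiler is not even needed to see how much symbolic information a compiled class carries. The sketch below uses standard reflection to print the class, field, and method names of a small class; `NameLeakDemo`, `LicenseChecker`, `secretKey`, and `validateLicense` are hypothetical names invented for this illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class NameLeakDemo {

    /** Hypothetical "proprietary" class; every name in it is invented for this example. */
    public static class LicenseChecker {
        private String secretKey = "demo";

        boolean validateLicense(String user) {
            return secretKey.equals(user);
        }
    }

    /** Collects the names that the compiled class exposes through reflection. */
    public static List<String> memberNames(Class<?> c) {
        List<String> names = new ArrayList<>();
        names.add("class " + c.getName());
        for (Field f : c.getDeclaredFields()) {
            if (!f.isSynthetic()) { // skip compiler-generated members
                names.add("field " + f.getType().getSimpleName() + " " + f.getName());
            }
        }
        for (Method m : c.getDeclaredMethods()) {
            if (!m.isSynthetic()) {
                names.add("method " + m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        memberNames(LicenseChecker.class).forEach(System.out::println);
    }
}
```

Even the private field shows up under its original name, because the names come straight from the class file that javac emitted.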
Let's see what can be done to prevent that.