About Java:-
The Java language was initially named "Oak" in 1991 and was designed for consumer electronic appliances. It was developed by James Gosling, a development leader at Sun Microsystems. In 1995 Oak was redesigned and renamed Java, with the aim of developing applications that can run over the internet. Java programs can be embedded into HTML pages. Java is not limited to web applications; it is also useful for developing stand-alone applications. Java is built on object oriented programming (OOP), which makes it familiar to many programmers. Object oriented programming replaced the older traditional technique, i.e. procedural programming.
Characteristics of Java:-
Simple:-
Java is simpler than earlier languages such as C and C++. Java eliminates the concept of pointers, which is present in C and C++. Java also provides automatic memory allocation and garbage collection, whereas in C/C++ the programmer must allocate and free memory manually, which is a complex task.
Object oriented:-Many earlier programming languages, such as C, are procedural languages, i.e. they follow the paradigm of procedures. Java is object oriented because it is built around the concept of the object. In Java everything depends on objects, i.e. creating objects and making them work together. The overall functionality of a program depends on its objects. Because Java is object oriented, it provides a great degree of reusability, modularity and flexibility.
Distributed:-Java supports internet protocols such as HTTP and FTP in order to access files over the network. Using the libraries that come with Java, a program can easily transfer files over a network that is connected to the internet.
Interpreted:-In order to run Java programs we need an interpreter. When a Java program is compiled, the compiler produces byte code, an intermediate format that the Java interpreter understands. The byte code is machine independent, so it can run on any system that has a Java interpreter. Most compilers convert high level language instructions directly into low-level machine code, because the machine cannot understand high level instructions. That machine code can only be executed on the kind of machine it was compiled for. For example, if source code is compiled on the Windows platform, the resulting executable file cannot be executed on platforms other than Windows. Java is different: the source code is compiled once and the resulting byte code can be run on any platform using a Java interpreter. The main function of the interpreter is to convert the byte code into the machine language of the target machine.
Robust and secure:-Java programs are more reliable. Java checks for errors at execution time as well as at compile time. Bad and error-prone language constructs are eliminated in Java. For example, Java eliminated pointers, so data cannot be corrupted by overwriting memory locations. Java also supports exception handling, which makes it more reliable and robust. Java forces the programmer to write code for the exceptions that may occur during the execution of the program, so that the program can terminate gracefully instead of an error abruptly stopping the flow of execution. Java also provides a lot of security. Security is important over the network because the computer can be attacked by external programs. Java provides security by guarding against applets from untrusted sources.
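As a minimal sketch of how the compiler forces exception handling: the FileReader constructor declares a checked exception, so the method below would not compile unless the exception were caught or declared. The class name SafeRead, the method name firstChar and the file path are hypothetical, chosen only for illustration.

```java
import java.io.FileReader;
import java.io.IOException;

final class SafeRead {
    /** Returns the first character of the file, "" if it is empty, or "?" if it cannot be read. */
    static String firstChar(String path) {
        // FileReader(String) declares FileNotFoundException, a checked exception
        // (a subclass of IOException), so the compiler insists that it is handled.
        try (FileReader reader = new FileReader(path)) {
            int c = reader.read();
            return c < 0 ? "" : String.valueOf((char) c);
        } catch (IOException e) {
            // The program keeps running instead of stopping abruptly.
            return "?";
        }
    }

    public static void main(String[] args) {
        // Hypothetical path that is assumed not to exist.
        System.out.println(firstChar("no-such-directory/no-such-file.txt"));
    }
}
```

Removing the catch block (without adding a throws clause) turns the missing handler into a compile-time error rather than a run-time surprise.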
Architecture-neutral:-Java is an interpreted language, which makes it architecture neutral, i.e. platform independent. We can write a program once and execute it on any platform with the help of the Java Virtual Machine (JVM).
The Java virtual machine can be embedded in the operating system or in a web browser. Once a piece of Java code is loaded into the machine, it is verified. Byte code verification plays a major role, as it checks that the code generated by the compiler will not corrupt the machine on which it is loaded. Verification is done after compilation, when the code is loaded, in order to make sure that the code is accurate and correct. Byte code verification is therefore integral to compilation and execution. Because Java is architecture neutral, it is portable. A program, once written, can be run on any platform without recompilation. Java does not provide any platform specific features. In other languages, such as Ada, the size of the largest integer varies according to the platform on which the program runs, but in Java the ranges of the numeric types are fixed. The Java environment is portable across operating systems and hardware platforms.
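The fixed numeric ranges can be checked directly. A minimal sketch (the class name FixedWidths is hypothetical); the constants it prints belong to the standard java.lang wrapper classes and have the same values on every platform:

```java
final class FixedWidths {
    /** Describes the sizes of int and long, which the Java language fixes for all platforms. */
    static String describe() {
        return "int=" + Integer.SIZE + " bits, long=" + Long.SIZE
                + " bits, max int=" + Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(describe()); // int=32 bits, long=64 bits, max int=2147483647
    }
}
```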
Multi-threaded:-Multi-threading is defined as a program's ability to perform several tasks (or) functions simultaneously. Multi-threading is built into the Java language. A Java program can perform several tasks simultaneously without directly calling operating system procedures, which is what programs in other languages must do in order to use multi-threading.
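A minimal sketch of this built-in support, using only java.lang.Thread and no operating system calls. The class name TwoTasks and the two small tasks are hypothetical:

```java
import java.util.concurrent.atomic.AtomicInteger;

final class TwoTasks {
    /** Runs two tasks on separate threads and returns their combined result. */
    static int runBoth() {
        AtomicInteger total = new AtomicInteger(); // safe to update from several threads
        Thread first = new Thread(() -> total.addAndGet(1));
        Thread second = new Thread(() -> total.addAndGet(10));
        first.start();  // both threads may now run at the same time
        second.start();
        try {
            first.join();  // wait until both tasks have finished
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println(runBoth());
    }
}
```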
Constant Pool:-Every class in Java has an array of constants in heap memory, called the constant pool, which is available to that class. It is usually created by the Java compiler. The constants encode the names of all the methods, variables and constants used by any method of the class. Each class stored in heap memory has a count of how many constants there are and also an offset "which specifies how far in to the class description itself the array of constants begins" (Laura Lemay, Charles L. Perkins, and Michael Morrison, n.d.). When these constants appear in the .class file of a Java class, they are typed by special coded bytes and have a very well defined format. JVM instructions refer to this symbolic information rather than relying on the run time layout of classes, methods and fields.
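The count of constants can be read from the first bytes of any .class file. The following is a minimal sketch, assuming the standard class file header layout (a 4-byte magic number, 2-byte minor and major versions, then the 2-byte constant_pool_count, which is one more than the number of entries); the class name ConstantPoolCount is hypothetical, and main reads the JDK's own java.lang.Object class file purely as a convenient input.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

final class ConstantPoolCount {
    /** Returns the constant_pool_count field from the header of a class file image. */
    static int constantPoolCount(byte[] classBytes) {
        ByteBuffer in = ByteBuffer.wrap(classBytes); // big-endian by default, like class files
        if (in.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        in.getShort();                 // minor_version
        in.getShort();                 // major_version
        return in.getShort() & 0xFFFF; // constant_pool_count as an unsigned 16-bit value
    }

    public static void main(String[] args) throws IOException {
        try (InputStream stream =
                ClassLoader.getSystemResourceAsStream("java/lang/Object.class")) {
            if (stream == null) {
                System.out.println("class file not found");
                return;
            }
            System.out.println("constant_pool_count = "
                    + constantPoolCount(stream.readAllBytes()));
        }
    }
}
```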
Sun Java Wireless Toolkit:-The Sun Java Wireless Toolkit for CLDC (Connected Limited Device Configuration) is a group of tools used to develop applications for mobile phones and other wireless devices. Although the toolkit is based on the MIDP (Mobile Information Device Profile), it also supports many optional packages, which makes it a great tool for developing many kinds of applications. It is supported on Windows and Linux. All users who have an account on the host machine can use the tool, either singly or simultaneously. It allows you to use a byte code obfuscator to reduce the size of your MIDlet suite JAR file. It also supports many standard Application Programming Interfaces (APIs) defined by the Java Community Process (JCP) program.
Even though the Sun Java Wireless Toolkit does not come with an obfuscator, it is configured to support ProGuard. All you need to do is download ProGuard and place it in a location where the toolkit can find it. Because of the flexible nature of the tool, it can also work with any other kind of obfuscator.
BCEL:-BCEL stands for Byte Code Engineering Library. BCEL helps you to dig into the byte code of Java classes. It gives the utmost power over the code because it works at the level of individual JVM instructions, although this power comes at a cost in complexity. Using BCEL we can transform existing classes or construct new ones. The main difference between BCEL and Javassist is that Javassist provides a source code interface, whereas BCEL was developed with the intention of working at the level of JVM assembly language. BCEL's low-level approach is very helpful for controlling the program at the instruction level, but compared to Javassist it is more complex to work with.
BCEL has the capability to inspect, edit and create binary classes in Java. There are two component hierarchies in BCEL: one is used to inspect existing code and the other is used to create new code or to edit (or) update existing code. The class inspection side of BCEL largely duplicates what is available on the Java platform through the Reflection API. This duplication is necessary in classworking because we generally do not want to load the classes we are working on until they have been fully modified. The org.apache.bcel.classfile package provides all the inspection-related definitions, and the org.apache.bcel package provides the basic constant definitions. JavaClass is the starting point of the classfile package. JavaClass plays the same role in accessing class information with BCEL as java.lang.Class does with regular Java reflection. JavaClass has methods to get structural information about the superclass and interfaces, and information about the fields and methods of the class. JavaClass also provides access to some internal information about the class, including the constant pool and identifiers, and to the complete binary class representation as a byte stream. JavaClass instances are created by parsing the actual binary class. To handle the parsing, BCEL provides a class called org.apache.bcel.Repository. By default, BCEL parses and caches representations of classes found on the JVM class path, obtaining the actual binary class representations from an org.apache.bcel.util.Repository instance. org.apache.bcel.util.Repository is an interface for a source of binary class representations.
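For comparison, the regular reflection route that JavaClass parallels can be sketched with the standard library alone. This is not BCEL code: it uses java.lang.Class, which requires the inspected class to be loaded into the JVM, the very step that BCEL's class inspection avoids. The class name ReflectInspect and the choice of java.util.ArrayList as the subject are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

final class ReflectInspect {
    /** Returns the name of the direct superclass, or "(none)" for Object and interfaces. */
    static String superclassName(Class<?> type) {
        Class<?> parent = type.getSuperclass();
        return parent == null ? "(none)" : parent.getName();
    }

    public static void main(String[] args) {
        Class<?> type = java.util.ArrayList.class; // loading the class is unavoidable here
        System.out.println("superclass: " + superclassName(type));
        for (Class<?> implemented : type.getInterfaces()) {
            System.out.println("implements: " + implemented.getName());
        }
        for (Field field : type.getDeclaredFields()) {
            System.out.println("field: " + field.getName());
        }
        for (Method method : type.getDeclaredMethods()) {
            System.out.println("method: " + method.getName());
        }
    }
}
```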
Changing the classes:-Besides giving access to the components of a class, org.apache.bcel.classfile.JavaClass also provides methods for altering the class. The class components can be set to new values using those methods. They are not of much direct use, though, because the other classes in the package do not support constructing new versions of the components. Instead, classes in the org.apache.bcel.generic package provide editable versions of the same components that are represented by the org.apache.bcel.classfile classes. org.apache.bcel.generic.ClassGen is the starting point for creating new classes. It is also useful for modifying existing classes: for this purpose there is a constructor that takes a JavaClass instance and uses it to initialize the ClassGen class information. Once the class has been modified, we get a usable class representation from the ClassGen instance by calling a method that returns a JavaClass, which in turn can be converted into a binary class representation. This is a little confusing; to reduce the confusion, it is better to write a wrapper class that hides some of the differences.
In order to manage the construction of the various class components, org.apache.bcel.generic provides many classes besides ClassGen. ConstantPoolGen is used to handle the constant pool. FieldGen and MethodGen are used to handle the fields and methods of a class. For working with sequences of JVM instructions there is a class called InstructionList. org.apache.bcel.generic also provides classes for every type of instruction that is executed by the JVM. We can sometimes create instances of these classes directly and at other times use the helper class org.apache.bcel.generic.InstructionFactory. The main advantage of this helper class is that it handles the bookkeeping details of constructing each instruction for us (i.e. adding items to the constant pool as required by the instructions).
SandMark:-SandMark is a tool developed to measure the performance of software protection algorithms and the effectiveness of methods that protect software against piracy, tampering and reverse engineering. SandMark can also help to find which algorithm is most resilient to attacks and has the least performance overhead. Many software protections have been proposed, both in software and in hardware. Hardware protections started with dongle protection, and now there is also tamper-proof software. SandMark was developed for implementing and evaluating software-based techniques such as code obfuscation (making code complex to understand) and watermarking.
History of reverse engineering:-Reverse engineering most probably started with DOS (Disk Operating System) based computer games. The aim was to give the player full life and weapons in order to finish the final stage of the game. The technique was simply to find the memory locations where the player's life and number of weapons were stored and to modify the values at those locations, so that the player could get through the final stage and win the game. That is why memory cheating tools such as game hack came into existence.
Reverse Engineering:-Reverse engineering is the process of understanding particular aspects of a program in order to identify the components of the system and the interrelationships between them, to enhance those components, and to improve the performance and scalability of the system (or) subsystem. Software reverse engineering is a technique that converts the machine code of a program (the string of 0's and 1's usually sent to the logic processor) back into programming language statements, called source code. It is done in order to learn how particular parts of the program perform particular operations, so as to improve the program's functionality, to fix bugs in the program, or to find any malicious block of statements in the software. Reverse engineering originated in older industries, where it was applied to machines, but it is now frequently used on computer hardware and software. Through reverse engineering, important content such as data formats, the algorithms the programmer used to implement the software, and the ideas of the programmer (or) company can be revealed to a third party, violating security and privacy.
"Reverse engineering is evolving as a major link in the software lifecycle, but its growth is hampered by confusion" (Elliot J. Chikofsky & James H. Cross II, Jan 1990).
Reverse engineering is generally carried out to improve the quality of a product or to observe competitors' products. Forward engineering is the process of moving from high level abstractions (or) the initial requirements stage (objectives, constraints and a proper solution to the problem), through logical, implementation-independent designs (the specification of the solution), to the final product, i.e. the implementation (coding and testing). Reverse engineering is the process of moving from the final product back to the initial requirements stage in order to understand the system logically, i.e. why a particular function (or) action is being performed. Once the system is understood logically, its flaws and errors can be rectified and its functionality improved even when the source code of the application is not available. This is the purpose for which reverse engineering techniques evolved.
Fig 1: Reverse engineering and related processes are transformations between or within abstraction levels, represented here in terms of life cycle phases. (Elliot J. Chikofsky & James H. Cross II, Jan 1990)
Reverse engineering in and of itself does not mean changing the subject system or developing a new system based on the existing one. It is a process of examination (or) understanding of the program (or) software, not of replication (or) change. Reverse engineering covers a very broad range of activities: starting from the existing implementation, recapturing or recreating the design ideas, and extracting the actual requirements of the existing system. Design recovery is the most vital subset of reverse engineering, in which domain knowledge, external information, and deduction or fuzzy reasoning are added to the observations of the subject system in order to find high level abstractions of the system that cannot normally be obtained by examining the system directly. According to Ted Biggerstaff: "design recovery recreates design abstractions from a combination of code, existing design documentation (if available), personal experience, and general knowledge about problem and application domains. Design recovery must reproduce ..."
Re-engineering, also termed renovation and reclamation, is the examination and alteration of the subject system in order to reconstitute it in a new form, followed by the implementation of the new form. Re-engineering generally involves some form of reverse engineering (to obtain a high level abstraction of the existing system) followed by forward engineering. It may include changes for new requirements that were not previously met by the system. Re-engineering is not a supertype of forward engineering and reverse engineering, but it uses both.
Objectives:-The primary goal of reverse engineering is to improve the overall comprehensibility of a system, both for maintenance and for new development.
Coping with complexity. In order to deal with the complexity and sheer volume of systems, we have to develop better methods, i.e. automated support. Reverse engineering methods and tools should be combined with CASE environments in order to extract the relevant information, so that decision makers can control the process and the product as the system evolves.
Generating alternative views. Comprehension aids such as graphical representations have been accepted for a long time. However, creating and maintaining them is becoming difficult. Reverse engineering tools facilitate the generation or regeneration of graphical representations from other forms. While many designers work from a single diagram type such as data flow diagrams, reverse engineering tools can produce other graphical representations such as control flow diagrams, entity relationship diagrams and structure charts to aid the review and verification process.
Identifying side effects. Both haphazard initial design and intentional modifications to the system can lead to unintended ramifications and side effects that affect system performance. Reverse engineering can provide better observation than a forward engineering perspective, so it helps us to resolve these ramifications and anomalies before users report them as bugs.
Component reuse. Software reuse is becoming an essential part of developing new software products. Reverse engineering can help to detect candidates for reusable components in the present system.
Recovering lost information. The continuous evolution of a long-lived system leads to loss of information about the system design. The "design recovery" techniques of reverse engineering are used to recover this information. Many reverse engineering tools try to extract the structure of legacy systems with the intention of passing this information to software engineers, so that they can re-engineer or reverse engineer the existing components.
Code reverse engineering:-During the evolution of software, many changes are applied to the code in order to add functionality, to rectify defects, and to enhance the system's performance (or) quality. In systems with poor documentation, the code is the only reliable source of information about the system. As a result, the process of reverse engineering has focused on understanding the code.
Thus reverse engineering can be put to both good and bad ends.
Obfuscation:-Java provides platform independence, so that software programs run on any platform. All programs are compiled into an intermediate code format i.e. ...
A class file consists of a stream of ... ... very large amount of information regarding the program's methods, variables and constants, enough to do reverse engineering. Suppose a company develops a program (or) software in Java and sells the product to another organization in intermediate code format, without giving away the original source. The organization that buys the software can simply change (or) modify it by applying reverse engineering techniques, violating the security and privacy of the company that authored it. This reverse engineering can be done by software developers, automated tools and decompilers. Java byte code can be easily decompiled, which makes reverse engineering easier in Java.
In the programming context, obfuscation means making program code more difficult to read and understand, for the security and privacy of the software. Decompilers can easily extract the source code from compiled code, which makes it nearly impossible to keep the code secret. The use of obfuscators has therefore grown rapidly, in order to keep an effective smoke screen around the code. Code obfuscation is one of the most prominent methods of protecting Java code. It makes the program difficult to understand, so that the code is more resistant to reverse engineering.
There are 2 kinds of obfuscation technique: source code obfuscation and byte code obfuscation. Source code obfuscation simply changes the source code of the program, whereas byte code obfuscation changes the class file of the program (the functionality stays the same as that of the source code).
There are several obfuscation techniques to prevent Java byte code from being decompiled.
For example, consider a set of class files, S, which becomes another set of class files, S', after passing through an obfuscator. The class files of S and S' are different, but they produce the same output.
Example:-
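A minimal sketch of such a pair, assuming the original class OHello has a single method gHello. The method bodies and the RenameDemo driver are hypothetical; OHello stands for a class in S and aa for its renamed counterpart in S'.

```java
final class RenameDemo {
    public static void main(String[] args) {
        // Both versions print the same text: only the names differ.
        System.out.println(new OHello().gHello());
        System.out.println(new aa().aa());
    }
}

// Original class, as found in S (the method body is a hypothetical placeholder).
final class OHello {
    String gHello() {
        return "Hello";
    }
}

// The same class after renaming, as found in S'.
final class aa {
    String aa() { // a method may legally share its name with its class
        return "Hello";
    }
}
```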
Observing the above code, the class name OHello is changed to aa and the method name gHello is also changed to aa. The program is more difficult to read with aa than with OHello. In this way less information can be interpreted and understood by reverse engineers. This is just a simple example of renaming class and method names.
Categories of obfuscation techniques:-
Description of Obfuscation techniques:-
One way in which obfuscators obfuscate a program is by replacing a symbol in a class file with an illegal string. The replacement might be a keyword such as private or, even worse, ***.
Another technique that obfuscators usually use, targeting specific decompilers (Mocha and JODE), is inserting a bad instruction into the code.
Let us take an example with a bad instruction. The original code (decompiled) is:
    Method void main(java.lang.String[])
       0 new #4
       3 invokespecial #10
       6 return
After obfuscation the code is as follows (names are not changed, to keep the example simple):
    Method void main(java.lang.String[])
       0 new #4
       3 invokespecial #10
       6 return
       7 pop
In the above routine a pop instruction has been added after the return instruction. The final instruction of a method should be a return, but here a pop has been inserted after it, and this pop can never be executed.
Lexical obfuscation:-Lexical obfuscation changes the lexical structure of a program by scrambling its identifiers. All the names of classes, fields and methods, which carry the meaningful symbolic information of a Java program, are replaced with meaningless names. An example of a lexical obfuscator is Crema. An obfuscator is defined as a program that automatically transforms class files in order to obfuscate them, so as to thwart reverse engineering that tries to reproduce the source code from the class file.
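The renaming step can be sketched as a table that maps each meaningful identifier to a short meaningless one. This is an illustrative sketch only, not the algorithm of Crema or of any other obfuscator; the class NameScrambler and the sample identifiers are hypothetical, and the sketch ignores details such as avoiding Java keywords.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NameScrambler {
    /** Maps each distinct identifier to a meaningless name: a, b, ..., z, aa, ab, ... */
    static Map<String, String> scramble(List<String> identifiers) {
        Map<String, String> renamed = new LinkedHashMap<>();
        for (String identifier : identifiers) {
            if (!renamed.containsKey(identifier)) {
                renamed.put(identifier, shortName(renamed.size()));
            }
        }
        return renamed;
    }

    /** Returns the index-th name in the sequence a..z, aa..az, ba.. and so on. */
    static String shortName(int index) {
        StringBuilder name = new StringBuilder();
        do {
            name.insert(0, (char) ('a' + index % 26));
            index = index / 26 - 1;
        } while (index >= 0);
        return name.toString();
    }

    public static void main(String[] args) {
        // Hypothetical identifiers taken from some program to be obfuscated.
        System.out.println(scramble(Arrays.asList("Account", "balance", "deposit", "balance")));
    }
}
```

An obfuscator would then rewrite every occurrence of each identifier in the class files according to this table, so that the program still links and runs.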
Layout obfuscation:-Layout obfuscation deals with changing the layout structure of the program, which is done by 2 basic methods
Both of these make the program code less informative to reverse engineers. Layout obfuscation techniques use one-way functions such as renaming identifiers with random symbols and removing comments, unused methods and debugging information. Although reverse engineers can still understand code obfuscated by layout obfuscation, it raises the cost of reverse engineering. Layout obfuscation techniques are the most commonly used in code obfuscation; almost all Java obfuscators use them.
Control obfuscation:-Control obfuscation changes the control flow of the program. It is one of the easiest techniques to apply, and it makes it hard for the reverse engineer to find out what exactly the code does. For example, consider code in which there is a method A(). Here another new method called A_Dummy() will be created and in the program ...
Data Obfuscation:-Data obfuscation mainly deals with breaking up the data structures used in the program and encrypting literals. This includes changing inheritance relations, restructuring arrays, making variable names constant, etc. In this way data obfuscation affects the data structures of the program and makes it very difficult to recover the original source code. More viable source code obfuscation methods are based on composite functions: Array Index Transformation, Method Argument Transformation, and Hiding Constants. These techniques make the computation more complex, and extensive use of them makes the software respond slowly. Some source code obfuscation methods are directed at object oriented concepts: Class Coalescing, Class Splitting, and Type Hiding. Other source code obfuscation techniques include false refactoring, restructuring arrays, inlining and outlining methods, cloning methods, splitting variables, converting static data to procedural data, and merging scalar variables. The techniques that work on object oriented concepts, and others such as restructuring arrays, splitting variables and merging scalar variables, may distort the logic of the software, so they must be used carefully. Techniques such as outlining methods, cloning methods and converting static data to procedural data increase the size of a class file without providing any significant advantage. Inlining a method results in an unresolved method call when some other class calls the inlined method.
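Array Index Transformation can be illustrated with a small sketch in which every logical index i is mapped to a physical slot f(i) before the array is touched. The class ScrambledArray, the capacity of 8 and the function f(i) = (5i + 3) mod 8 are all hypothetical choices made for illustration; any function that permutes the valid indices would serve.

```java
final class ScrambledArray {
    private static final int SIZE = 8; // hypothetical capacity
    private final int[] slots = new int[SIZE];

    // f(i) = (5 * i + 3) mod 8. Because 5 and 8 share no common factor,
    // f maps the indices 0..7 onto 0..7 without collisions.
    private static int f(int i) {
        return (5 * i + 3) % SIZE;
    }

    /** Stores a value at logical index i (0..7). */
    void set(int i, int value) {
        slots[f(i)] = value;
    }

    /** Reads the value at logical index i (0..7). */
    int get(int i) {
        return slots[f(i)];
    }

    /** Exposes a copy of the physical layout, which no longer follows the logical order. */
    int[] rawLayout() {
        return slots.clone();
    }
}
```

A caller that uses only set and get sees an ordinary array, while the stored order is scrambled; the extra arithmetic on every access is the kind of overhead that slows the software when such transformations are used extensively.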
Advanced obfuscation techniques for byte code:-There are several obfuscation techniques to prevent Java byte code from being decompiled. Many obfuscation tools simply replace the names of the identifiers stored in the byte code with meaningless names. Many crackers can still understand the actual source code even though the identifier names have been changed, but it takes them more time.
Traditionally, when a program is compiled to machine code, most of the symbolic information is stripped off. After compilation, the variables and functions of the program are denoted by addresses rather than by identifiers. Decompiling such code is difficult, but still possible. A protection technique is considered effective if it makes the time and effort a cracker needs to crack the software sufficiently costly: if cracking the software takes more time than re-writing the program, cracking is of no use and a waste of time.
Java became popular because of the benefits it provides. One of the major benefits is portability, i.e. a compiled program can run on any platform (platform independence). When a program is compiled, it produces platform-independent byte code. Java uses symbolic references rather than traditional memory addresses. Therefore, the names of methods, variables and types are stored in a constant pool within the byte code file.
There are many commercial decompilers (P & C, 2001, Vliot 1996, hoeniche 2001 etc.). When a program is decompiled, the result is almost identical to the original source code. Using a decompiler to extract source code has become a lethal weapon for intellectual property piracy.
Obfuscation is used to hinder decompilation of the byte code. The main aim of obfuscation is to make the decompiled program harder to understand, i.e. to require more time and effort to understand the obfuscated code.
Obfuscation scope:-A Java application consists of one or more packages. A programmer might divide the program into packages, and can also use packages from the standard library and from proprietary libraries. Only the part of the program written by the developer is distributed; the proprietary libraries are not distributed, due to copyright restrictions. The obfuscation scope is the part of the program that is obfuscated by the obfuscation techniques, i.e. the part of the software written by the developer is protected, not the entire software. Packages that belong to the standard library or to proprietary libraries are not obfuscated.
Candidates considered for identifier scrambling:-An identifier can denote the following entities in Java
... the bytecode file. By default, parameters and local variables are stripped, i.e. deleted (or) removed from the byte code. The names of local variables and parameters are stored in the LocalVariableTable in the byte code only if debug information is enabled, and by default the Java compiler does not emit it (it is requested with the javac -g option). If the local variable names are not found, decompilers create names for the local variables and parameters themselves, which makes the decompiled program somewhat understandable. Even if we rename the variables and parameters in the LocalVariableTable, a good decompiler will simply ignore the renamed names, create new names, and extract a program that is the same as the actual program. For these reasons parameters and local variables are not treated as identifiers here: decompilers can successfully extract the source code simply by creating new names.
When a Java application is executed, the JVM dynamically loads and links the referenced types into the runtime environment. The referenced types are located through the symbolic references stored in the byte code file, i.e. the fully qualified names of classes or interfaces. Such symbolic references cannot be changed, i.e. cannot be obfuscated. Entities that belong to the standard library or to proprietary libraries should not be obfuscated.
The following four groups of entities should not be obfuscated:
When a package is in the obfuscation scope, some parts of the package should still be kept outside the scope. For example, the main method is the entry point used to execute the program, so the name of the main method should be retained. A proprietary library may export certain types and methods as the interface of the library. So ...
The GUI package of Java uses callback functions, mainly in its event handling model. When the caller of an instance method M that acts as a callback function is not in the obfuscation scope, the method M should not be obfuscated, because the caller could not find M if its name were changed. If, on the other hand, the caller is also in the obfuscation scope, its symbolic references are changed to the new name of M, so the name M can be obfuscated. All the callback functions that must retain their names come under exception groups 1 and 2.
Statically fields, nested types and static methods are resolved java ...
https://www.cis.nctu.edu.tw/~wuuyang/papers/Obfuscation20011123.doc... the group 2.
Fields, static methods, and nested types are statically determined at ... ... resolved java compiler. JVM will not change any resolution, once the byte code file generated. Therefore fields, nested types and static methods are changed arbitrarily if they are in obfuscation scope. J-T. Chan, W.Yang stated that, is to re-use the identifier as many times as we can. By this the reverse engineer is confused because identifiers are overloaded heavily. After decompiling the source code, the reverse engineer can't understand the functionality of the program just by the names. The reverse engineer can understand the program, if he is able to understand the context of the identifier, which is difficult to understand, if identifier is heavily used. One more advantage is the size of the byte code will be decreased by using shorter and fewer names.
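The idea of reusing short identifiers can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the algorithm of Chan and Yang: the class NameOverloader, its methods and the "kind:name" key format are hypothetical, and the sketch relies only on the fact that fields, methods and nested types occupy separate namespaces in Java, so the same short name can be handed out once per kind.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: assigns the shortest available name to each entity,
// restarting the sequence for every kind so that names are reused.
final class NameOverloader {

    // Maps 0, 1, 2, ... to a, b, ..., z, aa, ab, ...
    static String shortName(int index) {
        StringBuilder sb = new StringBuilder();
        int n = index;
        do {
            sb.insert(0, (char) ('a' + n % 26));
            n = n / 26 - 1;
        } while (n >= 0);
        return sb.toString();
    }

    // Input: entity kind -> original names. Output: "kind:original" -> new name.
    static Map<String, String> rename(Map<String, List<String>> namesByKind) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : namesByKind.entrySet()) {
            int next = 0; // restart per kind: "a" is reused across kinds
            for (String original : entry.getValue()) {
                mapping.put(entry.getKey() + ":" + original, shortName(next++));
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        Map<String, List<String>> input = new LinkedHashMap<>();
        input.put("field", Arrays.asList("balance", "owner"));
        input.put("method", Arrays.asList("deposit", "withdraw"));
        input.put("nestedType", Arrays.asList("Transaction"));
        System.out.println(rename(input));
    }
}
```

A real obfuscator would also have to rewrite every symbolic reference in the constant pool to the new name and skip the entities in the exception groups; the sketch only shows how few distinct names are needed.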
Copy right issues:-Software piracy is defined as copying software without authorization and distributing copyrighted software. Piracy can take the form of selling the software, sharing it with others, installing multiple copies of software licensed for a single installation, or downloading and using software without a proper licence, i.e. by cracking it. When we purchase software, we acquire only the right to use it, not the right to take the code or change it to suit our own use. The software license agreement states how many times we can install and use the program. So when we purchase software we have to read the license document carefully and use the software according to the vendor's license rules. Violating any rule in the software license document counts as software piracy.
So sharing the software with others by making multiple copies is software piracy. By reading the license document we can learn about all of these piracy issues, so piracy cases can be reduced to some extent. The people who develop software spend many days and a lot of effort thinking through and writing it, so software also falls under intellectual property rights. With the rapid growth of the internet, many users post pirated software or software keys online. Many people therefore download and run software without proper authorization, which drives the growth of piracy. Reverse engineering helps us to learn the structure and logic of a program, i.e. how a particular function performs its task. By understanding the program's logic, anybody can change the logical flow of the program. Technically this is called patching, because it involves placing new code over the original code, like a patch on clothes. Patching allows the reverse engineer to add code to the original that changes how a particular method works. It thus makes it possible to maintain code, to delete or disable the functionality of a particular method or class, and to fix security bugs without the source code.
Because reverse engineering involves reconstructing the code, it falls under intellectual property law. Software companies therefore fear reverse engineering, because their secret algorithms and methods are revealed to outsiders far more directly than through external observation of the running program, and those outsiders might copy and use them.
Reverse engineering can be used to remove the copy protection schemes from software. Patching software to delete (or) defeat copy protection schemes or digital rights management is illegal, but reverse engineering itself is not illegal. The main reason software vendors forbid reverse engineering is that their secret code is revealed to outsiders, but this seems a bit silly, because a person who can understand the compiled code has already understood the program. To prevent this, encryption has to be applied to the secret parts of the program. Software companies also forbid reverse engineering because researchers can find security flaws in their code and disclose them to the public, which may damage the image and reputation of the company. If reverse engineering were made illegal, researchers would stop examining the quality of the code produced by companies. In that situation people would have to accept that software is fully secure even when the code is neither secure nor correct.
Software security:-In the present market, software programs are protected by various techniques. Some software is accessible to users if and only if they have registered the product. Reverse engineering is the technique that allows the protection on a program to be removed, which is called cracking.
In general terms, cracking can be described as follows. When we develop a software program, we build the executable file from the source code. Reverse engineering is a technique that allows the source code to be extracted from the executable file. By using reverse engineering techniques, we can understand how the program performs a particular action and can bypass the protection. In simple terms, reverse engineering means making the program work the way the reverse engineer wants rather than the way it was originally intended to work.
Hard coded serial:-This is the simplest technique, in which one serial key is given to all users. When the user enters the serial key, the software compares it with the original key using its checking algorithm. If the user has entered the correct key, the software is registered successfully; otherwise it will not work.
Serial number with name protection:-In this technique the user has to enter both a name and a serial. As with the hard coded serial, the serial entered by the user is checked against the original serial key, which here is derived from the user's name using the same algorithm. This protection can be easy or difficult to break, depending on the algorithm the programmer uses. This kind of technique is seen in WinZip.
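The name-based check can be illustrated with a minimal Java sketch. The derivation used here (a simple rolling hash over the upper-cased name, formatted as ten digits) is purely an assumption made for illustration and is not the algorithm of WinZip or any other product; the class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical serial-with-name check: the expected serial is recomputed
// from the name and compared with what the user typed.
final class NameSerial {

    // Assumed derivation for illustration only: a rolling hash of the name.
    static String serialFor(String name) {
        long h = 17;
        for (char c : name.trim().toUpperCase(Locale.ROOT).toCharArray()) {
            h = (h * 31 + c) % 1_000_000_007L;
        }
        return String.format(Locale.ROOT, "%010d", h);
    }

    // The software never stores a serial; it derives it again on every check.
    static boolean isRegistered(String name, String enteredSerial) {
        return serialFor(name).equals(enteredSerial.trim());
    }

    public static void main(String[] args) {
        String serial = serialFor("Alice");
        System.out.println(isRegistered("Alice", serial)); // same name, derived serial
        System.out.println(isRegistered("Bob", serial));   // serial belongs to another name
    }
}
```

Because the derivation routine ships inside the program, a reverse engineer who locates serialFor can either reproduce it (a key generator) or patch isRegistered to return true, which is why the strength of this protection depends on how well the algorithm is hidden.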
Nag screen:-In this protection technique, every time the user starts the application a window appears showing the number of subscription days left, a reminder to activate the software, or some other information. This is hard to remove. It is somewhat difficult for newcomers to understand, as even programmers find it difficult to understand. It is used by WinZip.
Time trial:-According to +ORC, the following kinds of time protection are used:
'Cinderella' protection, in which a predetermined number of days is given, say 60 days from the day of installation.
'Count down' time protection, in which a fixed amount of time, say 5 minutes or seconds, is given to the user to use the application, after which it asks for product registration. This is mostly seen in game applications.
'BEST_BEFORE' protection, in which there is a fixed finish date independent of the starting date.
A limit on the number of executions, in which the user can run the application only a predetermined number of times. It is strictly independent of time and depends only on how many times the user executes the program.
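Three of the time trial checks listed above reduce to simple comparisons, sketched here in Java. The class TrialPolicy, its method names and its parameters are hypothetical and chosen only for illustration; a real product would also have to store the installation date and run counter somewhere the user cannot easily edit.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical checks for three of the time trial variants.
final class TrialPolicy {

    // 'Cinderella': valid for trialDays counted from the installation date.
    static boolean cinderellaValid(LocalDate installed, LocalDate today, int trialDays) {
        long daysUsed = ChronoUnit.DAYS.between(installed, today);
        return daysUsed >= 0 && daysUsed < trialDays;
    }

    // 'BEST_BEFORE': valid up to a fixed date, whenever it was installed.
    static boolean bestBeforeValid(LocalDate today, LocalDate expiry) {
        return !today.isAfter(expiry);
    }

    // Execution limit: valid while fewer than maxRuns runs have happened.
    static boolean runCountValid(int runsSoFar, int maxRuns) {
        return runsSoFar < maxRuns;
    }

    public static void main(String[] args) {
        LocalDate installed = LocalDate.of(2024, 1, 1);
        System.out.println(cinderellaValid(installed, LocalDate.of(2024, 2, 29), 60));
        System.out.println(bestBeforeValid(LocalDate.of(2024, 6, 1), LocalDate.of(2024, 12, 31)));
        System.out.println(runCountValid(9, 10));
    }
}
```

The negative-days test in cinderellaValid is a design choice of this sketch: it rejects a system clock that has been set back to before the installation date, which is the most common way such trials are extended.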
Dongle protection:-Dongle protection is the toughest technique to crack. This protection uses an EPROM that is connected to a port of the computer. When a person wants to access the software, it first checks the User ID and the Hardware ID, two unique IDs which cannot be changed. If the user supplies the correct IDs, the user can access the software. An RSA algorithm is used for data protection. This kind of protection is difficult to implement, so it is used where the software is especially valuable. The protection is implemented through the I/O LPT hardware: the registered card must be attached to the PC's parallel port in order to access the complete software, otherwise it cannot be accessed. HASP and Sentinel are the most commonly used dongles. DLLs and VxDs are used by the dongle software to perform the "is registered" check.
Commercial protection:-Most software programmers do not want to spend time developing security algorithms for their software, because it is time consuming: developing the protection can take as much time as, or more than, developing the actual software. This creates the need for commercial protection, in which the developer does not write the protection personally. Several companies develop security algorithms (or) protection software for the software that is to be protected. Companies that use commercial protection include Macromedia and Symantec. Commercial protection turns fully functional software into an unregistered version, i.e. the functionality is not exposed until the user registers. After successful registration, the functionality of the software becomes available to the user (or) company that wants to use it.
Other protections:-The other common types of software protection are CD-ROM protection and disabling certain functions in the software. CD-ROM protection is familiar to many computer users: the program runs only when the CD is in the drive, even though the content of the CD has been saved on the PC. This kind of protection is mainly applied to games. Protection by disabling functions means, for example, that we cannot save our work on the PC or cannot use certain functions.
Related work:-Earlier techniques converted the source code into different source code that has the same functionality as the original but is more difficult to understand. The first techniques simply renamed the identifiers with names that are harder to follow. A later source code obfuscation proposal was the transformation of array indexes, which uses composite functions to change the indices of an array. With this technique a third person can still easily reveal which element is indexed. A further problem is that the arrays are not fully used, which wastes memory. So the algorithm used for the transformation of array indexes is unsuccessful. The next technique was array index data transformation (S Praveen and P.Sojan Lal, 2007), in which a single array is split into 3 (or) more arrays. The intention is to spread the data of the single array over multiple arrays, so that the reverse engineer takes more time to understand on what basis the array was split. This technique is useful against reverse engineering, but the program can still be decompiled and understood by reverse engineers, and it slows down the execution of the software. According to the paper by [Praveen Sivadasam and P.Sojan Lal], 4 out of every 10 software products are pirated. The paper also states that global piracy has increased by 40%, a loss of 11 billion US dollars. Many reverse engineers use reverse engineering because they want to cut the development time and cost of the software produced by their own companies. In this way piracy is increasing, and we cannot tell who is pirating our software even though it is illegal.
A program obfuscated by constant hiding takes the same execution time as the source code without obfuscation. In this way the obfuscation technique for constant hiding is more accurate. Its drawback is that when the source code has no constants, hiding constants is ineffective. Class coalescing is a technique that merges several classes into a single class. Class splitting is the opposite technique, in which a single class is split into multiple classes. Both class coalescing and class splitting change the program structure drastically, which hides the design of the software and makes the program difficult to understand. Other techniques use polymorphism, i.e. they encapsulate method return types and parameters in a newly defined class, which hides the information. These techniques, however, increase the program size drastically and decrease performance slightly. Further techniques are method outlining, method inlining and static procedural data, which enlarge the program structure and cause a loss of performance.
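The array splitting idea from the related work can be made concrete with a small Java sketch. The mapping used here (element i is stored in sub-array i mod k at position i div k) is an assumption chosen for simplicity and is not necessarily the scheme of the cited paper; the class name SplitArray is hypothetical.

```java
// Hypothetical illustration of array splitting: one logical array is stored
// as k physical arrays, and every access goes through an index transformation.
final class SplitArray {
    private final int[][] parts;
    private final int parts_count;
    private final int length;

    SplitArray(int[] source, int k) {
        if (k < 1) throw new IllegalArgumentException("k must be at least 1");
        this.parts_count = k;
        this.length = source.length;
        this.parts = new int[k][];
        for (int j = 0; j < k; j++) {
            // sub-array j holds logical indices j, j + k, j + 2k, ...
            parts[j] = new int[(length + k - 1 - j) / k];
        }
        for (int i = 0; i < length; i++) {
            parts[i % k][i / k] = source[i];
        }
    }

    int get(int i) {
        checkIndex(i);
        return parts[i % parts_count][i / parts_count];
    }

    void set(int i, int value) {
        checkIndex(i);
        parts[i % parts_count][i / parts_count] = value;
    }

    int length() {
        return length;
    }

    private void checkIndex(int i) {
        if (i < 0 || i >= length) throw new IndexOutOfBoundsException("index " + i);
    }

    public static void main(String[] args) {
        SplitArray a = new SplitArray(new int[] {10, 20, 30, 40, 50, 60, 70}, 3);
        a.set(4, 99);
        for (int i = 0; i < a.length(); i++) System.out.print(a.get(i) + " ");
        System.out.println();
    }
}
```

Every access now costs an extra modulo and division, which illustrates why the text notes that this kind of transformation slows execution while only delaying, not preventing, understanding of the decompiled code.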
Conclusion:-Reverse engineering is a technique that allows software to be reproduced by decompiling the byte code of the Java class file. Obfuscation techniques are used to protect the software against reverse engineering by decompilation. The main objectives of obfuscation are:
So the main aim of obfuscation is to protect the source code against reverse engineering, which leads to piracy of the software. Piracy means taking the intellectual property of other persons and using their algorithms and logic in one's own software in order to enhance it illegally.
Weakness of the obfuscation:-Obfuscation adds many lines of instructions to the software program, which makes the program very large. The program grows because new classes and methods are created during obfuscation. New statements are added to the existing source code to make the original code look more complex. A single conditional block may be turned into 3 (or) more conditional blocks, which makes it harder for the reverse engineer (or) de-obfuscator to understand. In the same way the arguments of methods are changed: if a method has 2 arguments, the number of arguments is increased beyond 2, so that it is difficult for the reverse engineer to understand the code easily. Likewise a single-level class hierarchy may be turned into a multi-level hierarchy, which is again hard for the reverse engineer to understand. In this way the size of the software program increases drastically, which decreases program efficiency and increases execution time.
For the reasons given in the above paragraph, obfuscating a program increases the size of the file because of the additional statements that are inserted.
Even if we obfuscate the source code, we cannot say that it can never be reverse engineered to recover the source code from the obfuscated file.
So, in order to protect the byte code from reverse engineering tools, reverse engineering techniques and reverse engineers, this project uses 2 techniques that protect Java byte code from reverse engineering. These two byte code obfuscation techniques are implemented using SandMark, the BCEL library and the Sun Java Wireless Toolkit. The project is mainly about removing the names of variables and methods from the constant pool. The second technique prevents a statement from being completed in place: the left-hand side of the expression (or) statement is initialized to dead values, and the actual values are retrieved from a buffer when they are required. Through obfuscation we can protect the software code from people who want to steal the code without purchasing it.
Chapter 2 Literature review
2.1 About Java:-
The Java language was initially named "Oak" in 1991 and was designed for consumer electronic appliances. In 1995 the name was changed to Java. Java was developed by James Gosling, a development leader at Sun Microsystems. Oak was redesigned in 1995 and renamed Java for the development of applications that can run over the internet. Java programs can be embedded into HTML pages. Java is not limited to web applications; it is also useful for developing stand-alone applications. Java is object oriented (OOP), which makes it familiar to many programmers. Object oriented programming replaced the older traditional technique, i.e. procedural programming.
Characteristics of java:-
Simple:-
The Java language is simpler than earlier languages such as C and C++. Java eliminates the concept of pointers, which is present in C and C++. Java also provides automatic memory allocation and garbage collection, whereas in C/C++ the allocation and release of memory must be done by the programmer, which is a complex task.
Object oriented:-Many earlier programming languages, such as C, are procedural languages built around the paradigm of procedures. The Java programming language is object oriented because it is built around the concept of the object. In Java everything depends on objects, i.e. creating objects and making objects work together. The overall functionality of a high level program depends on its objects. Because Java is object oriented, it provides a great degree of reusability, modularity and flexibility.
Distributed:-
Java supports the internet protocols HTTP and FTP for accessing files over the network. Using the libraries that come with Java, programs can easily transfer files over a network that is connected to the internet.
Interpreted:-
In order to run Java programs we need an interpreter. When Java programs are compiled, the compiler produces byte code, which is the language understood by the Java Virtual Machine rather than by any particular hardware. The byte code is machine independent, so it can run on any system that has a Java interpreter. Most compilers convert high level language instructions into low-level machine language, because the machine cannot understand high level instructions. That machine code can be executed only on the native machine for which it was compiled. For example, if source code is compiled on the Windows platform, the resulting executable file cannot be executed on platforms other than Windows. Java is different: the source code is compiled once and the byte code can be run on any platform using the Java interpreter. The main function of the interpreter is to convert the byte code into the machine language of the target machine.
Robust and secure:-Java programming is more reliable. Java reports errors at execution time. Bad and error-prone language constructs are eliminated in Java. Java eliminated concepts such as pointers, so there is no corruption of data and no overwriting of memory locations. Java also supports exception handling, which makes it more reliable and robust. Java forces the programmer to write code for the exceptions that may occur during the execution of the program, so that the program can terminate gracefully instead of having an error abruptly stop the execution flow. Java also provides a lot of security. Security is important over the network because the computer can be attacked by external programs. Java provides security by restricting applets that come from untrusted sources.
Architecture-neutral:-Java is an interpreted language, which makes it architecture neutral, i.e. platform independent. We can write the program once and it can be executed on any platform with the help of the Java Virtual Machine (JVM).
The Java virtual machine can be embedded in the operating system or in a web browser. Once a piece of Java code is loaded into the machine, it is verified. Byte code verification plays a major role, as it checks that the code generated by the compiler will not corrupt the machine on which it is loaded. Byte code verification is done after compilation, when the code is loaded, in order to make sure that the code is accurate and correct. So byte code verification is integral to compilation and execution. Because Java is architecture neutral, it is portable. A program once written can be run on any platform without recompilation. Java does not provide any platform specific features. In other languages, such as Ada, the largest integer varies according to the platform on which the program runs. In Java the ranges of the numeric types are fixed. The Java environment is portable to every operating system and hardware platform.
Multi-threaded:-Multithreading is defined as a program's ability to perform several tasks (or) functions simultaneously. Multithreading is built into the Java language. Java programs can perform several tasks simultaneously without calling any procedures of the operating system directly, which is what other programming languages have to do in order to achieve multithreading.
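A minimal sketch of this built-in support, using only java.lang.Thread: two threads each sum one half of an array and the results are combined after both have finished. The class name ParallelSum and the choice of exactly two threads are assumptions made for the example.

```java
// Minimal multithreading sketch: the two halves of the array are summed
// concurrently by two threads created through the standard Thread class.
final class ParallelSum {

    static long sum(int[] data) {
        final long[] partial = new long[2]; // one slot per thread, no sharing
        final int mid = data.length / 2;

        Thread left = new Thread(() -> {
            for (int i = 0; i < mid; i++) partial[0] += data[i];
        });
        Thread right = new Thread(() -> {
            for (int i = mid; i < data.length; i++) partial[1] += data[i];
        });

        left.start();
        right.start();
        try {
            // join() waits for completion and makes the partial sums visible here
            left.join();
            right.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while summing", e);
        }
        return partial[0] + partial[1];
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(sum(data));
    }
}
```

Each thread writes only to its own slot of the partial array, so no locking is needed in this sketch.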
Constant Pool:-Every class in Java has an array of constants in heap memory called the constant pool, which is available to that class. It is usually created by the Java compiler. The constants encode all the names (of methods, variables and constants) that are used by any method of the class. Each class stored in heap memory has a count of how many constants there are and also an offset "which specifies how far in to the class description itself the array of constants begins" (Laura Lemay, Charles L.Perkins, and Michael Morrison, n.d). The constants are represented as specially coded bytes with a very well defined format when they appear in the .class file of a Java class. JVM instructions refer to this symbolic information rather than relying on the run time layout of classes, methods and fields. All constant pool table entries have a fixed format.
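The count of constants mentioned above sits at a fixed position near the start of every .class file: a 4-byte magic number (0xCAFEBABE), two 2-byte version fields, and then the 2-byte constant_pool_count, whose value is the number of pool entries plus one. The following sketch reads just that header with the standard DataInputStream; the class name ClassFileHeader is hypothetical, and the sketch deliberately stops before parsing the individual entries.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Reads the fixed-size header of a class file up to constant_pool_count.
final class ClassFileHeader {

    static int constantPoolCount(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            int magic = in.readInt();
            if (magic != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort();        // minor_version
            in.readUnsignedShort();        // major_version
            return in.readUnsignedShort(); // constant_pool_count = entries + 1
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ClassFileHeader.java path/to/Some.class
        byte[] bytes = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(args[0]));
        System.out.println("constant_pool_count = " + constantPoolCount(bytes));
    }
}
```

An obfuscator that renames or removes names works on the entries that follow this count, which is why libraries such as BCEL expose the pool as an editable structure.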
Sun Java Wireless Toolkit:-The Sun Java Wireless Toolkit for CLDC (Connected Limited Device Configuration) is a group of tools used to develop applications for mobile phones and other wireless devices. Although the toolkit is based on MIDP (Mobile Information Device Profile), it also supports many optional packages, which makes it a great tool for developing many kinds of applications. It is supported on Windows and Linux. All users who have an account on the host machine can access the tool, either singly or simultaneously. It allows you to use a byte code obfuscator to reduce the size of your MIDlet suite JAR file. It also supports many other standard Application Programming Interfaces (APIs) defined by the Java Community Process (JCP) program.
Even though the Sun Java Wireless Toolkit does not come with an obfuscator, it is configured to support ProGuard. All you need to do is download ProGuard and place it where the toolkit can find it. Because of the flexible nature of the tool, it can also work with other obfuscators.
BCEL:-BCEL stands for Byte Code Engineering Library. BCEL helps you to dig into the byte code of Java classes. It gives the utmost power over the code because it works at the level of individual JVM instructions, although this power comes at a cost in complexity. Using BCEL we can transform existing classes or construct new classes. The main difference between BCEL and Javassist is that Javassist provides a source code interface, whereas BCEL was developed with the intention of working at the level of the JVM assembly language. BCEL is good because its low-level approach is very helpful for controlling the program at the instruction level. Compared to Javassist, BCEL is more complex to work with.
BCEL has the capability to inspect, edit and create binary classes in Java. There are 2 component hierarchies in BCEL: one is used to inspect existing code and the other is used to create new code or edit (or) update existing code. The class inspection side of BCEL largely duplicates what is already available in the Java platform through the Reflection API. This duplication is necessary in classworking because we generally do not want to load the classes we are working on until they are fully modified. The org.apache.bcel.classfile package provides all the inspection-related code. The org.apache.bcel package provides the basic constant definitions. JavaClass is the class that is the starting point of the classfile package. JavaClass plays the same role in accessing class information through BCEL as java.lang.Class does in regular Java reflection. JavaClass has methods to get structural information about the super classes and interfaces, and to get information about the fields and methods of the class. JavaClass also provides access to some internal information about the class, including the constant pool and identifiers. It also exposes the byte stream that is the complete binary class representation. Once the actual binary class has been parsed, we can create a JavaClass instance. To handle the parsing, BCEL provides a class called org.apache.bcel.Repository. By default BCEL parses and caches the representations of classes found on the JVM class path, and obtains the actual binary class representations from an org.apache.bcel.util.Repository instance. org.apache.bcel.util.Repository is an interface that acts as the source of binary class representations.
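For comparison, the kind of structural information described above can be obtained for an already loaded class with the standard Reflection API, which is the facility the text says JavaClass mirrors. The sketch below uses only java.lang.Class and is not BCEL code; the helper class ClassInspector and its method names are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Reflection counterpart of the inspection that BCEL performs on class files.
// Unlike BCEL, this requires the class to be loaded into the JVM first.
final class ClassInspector {

    // Names of all superclasses, nearest first, ending with java.lang.Object.
    static List<String> superclassChain(Class<?> type) {
        List<String> chain = new ArrayList<>();
        for (Class<?> c = type.getSuperclass(); c != null; c = c.getSuperclass()) {
            chain.add(c.getName());
        }
        return chain;
    }

    // Names of the interfaces the class implements directly.
    static List<String> interfaceNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Class<?> i : type.getInterfaces()) names.add(i.getName());
        return names;
    }

    // Names of the fields and methods declared by the class itself.
    static List<String> memberNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) names.add(f.getName());
        for (Method m : type.getDeclaredMethods()) names.add(m.getName() + "()");
        return names;
    }

    public static void main(String[] args) {
        System.out.println(superclassChain(java.util.ArrayList.class));
        System.out.println(interfaceNames(java.util.ArrayList.class));
        System.out.println(memberNames(ClassInspector.class));
    }
}
```

The need to load the class before inspecting it is exactly the limitation that motivates BCEL's separate, file-based representation.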
Changing the classes:-Besides giving access to the components of a class, org.apache.bcel.classfile.JavaClass also provides methods that give the liberty to change (or) alter the class. The class components can be set to new values by using those methods. They are of little direct use, however, because the other classes in the package do not support constructing new versions of the components. The org.apache.bcel.generic package contains classes that provide editable versions of the same components found in the org.apache.bcel.classfile classes. org.apache.bcel.generic.ClassGen is the starting point for creating new classes. It is also useful for modifying existing classes: for this there is a constructor that takes a JavaClass instance in order to initialize the ClassGen class information. Once the class has been modified, we obtain a usable class representation from the ClassGen instance by calling a method that returns a JavaClass. This can later be converted into the binary class information. This is a little confusing, and to reduce the confusion it is better to write a wrapper class that hides some of the differences.
In order to manage the construction of the various class components, org.apache.bcel.generic provides many other classes apart from ClassGen. It has a class called ConstantPoolGen, which is used to handle the constant pool. The FieldGen and MethodGen classes are used to handle the fields and methods of classes. For working with sequences of JVM instructions there is another class called InstructionList. org.apache.bcel.generic also provides classes for each type of instruction that is executed by the JVM. We can sometimes create instances of these classes directly, and at other times by using the helper class org.apache.bcel.generic.InstructionFactory. The main advantage of this helper class is that it handles the bookkeeping details of each instruction it constructs for us (i.e. adding items to the constant pool as required by the instructions).
Sand Mark:-SandMark is a tool developed to measure the performance of software protection algorithms and the effectiveness of methods that protect software from piracy, tampering and reverse engineering. SandMark can also help to find which algorithm is most resilient to attacks and has the least performance overhead.
Many software protections have been proposed, both in software and in hardware. Hardware protections range from dongle protection to, more recently, tamper-proof hardware. The SandMark tool was developed for implementing and evaluating software-based techniques such as code obfuscation (making code complex to understand) and watermarking.
History of reverse engineering:-Reverse engineering most probably started with DOS (disk operating system) based computer games. The aim was to give the player full life and weapons in order to finish the final stage of the game. This is how the technique of reverse engineering came into the picture: it simply meant finding the memory locations where the life and number of weapons were stored and modifying the values at those locations, so that the player could change the values, get through the final stage and win the game. That is why memory cheating tools such as game hack came into existence.
Reverse Engineering:-Reverse engineering is the process of understanding particular aspects of a program, with the aims listed below:
To identify the components of the system and the interrelationships between the components.
To enhance the components of the system and to improve the performance and scalability of the system (or) subsystem.
Software reverse engineering is a technique that converts the machine code of a program (the string of 0's and 1's usually sent to the logic processor) back into programming language statements, which are called source code. Software reverse engineering is done to obtain the source code of a program in order to learn how particular parts of the program perform particular operations, to improve the program's functionality, to fix bugs in the program, or to find any malicious block of statements in the software. Reverse engineering was originally practised in older industries on machines, but it is now frequently used on computer hardware and software. Important content such as data formats, the algorithms the programmer used to implement the software, and the ideas of the programmer (or) company can be revealed to a third person through reverse engineering, violating security and privacy.
"Reverse engineering is evolving as a major link in the software lifecycle, but its growth is hampered by confusion" (Elliot J. Chikofsky & James H. Cross II, Jan 1990).
Reverse engineering is generally used to improve the quality of a product or to study competitors' products. Forward engineering is the process of moving from high-level abstractions, i.e. the initial requirements stage (objectives, constraints and a proper solution to the problem), through logical, implementation-independent designs (the specification of the solution) to the final product, i.e. the implementation (coding and testing). Reverse engineering is the process of moving from the final product back to the initial requirements stage in order to understand the system logically: why a particular function or action is being performed. Once the system is understood logically, its flaws and errors can be rectified and its functionality improved even when the source code of the application is not available. The concept of reverse engineering evolved for this purpose.
Fig 1: Reverse engineering and related processes are transformations between or within abstraction levels, represented here in terms of life-cycle phases. (Elliot J. Chikofsky & James H. Cross II, Jan 1990)
Reverse engineering in and of itself does not mean changing the subject system or developing a new system based on the existing one. It is a process of examination and understanding of the program or software, not of replication or change. Reverse engineering covers a broad range of activities: starting from the existing implementation, it recaptures or recreates the design ideas and extracts the actual requirements of the existing system. Design recovery is the most vital subset of reverse engineering. In design recovery, domain knowledge, external information and deduction or fuzzy reasoning are added to the observations of the subject system in order to find high-level abstractions of the system that cannot normally be obtained by examining the system directly. According to Ted Biggerstaff: "design recovery recreates design abstractions from a combination of code, existing design documentation (if available), personal experience, and general knowledge about problem and application domains. Design recovery must reproduce all of the information required for a person to fully understand what a program does, how it does it, why it does it, and so forth. Thus, it deals with a far wider range of information than found in conventional software-engineering representation of code." (T.J. Biggerstaff, 1989).
Re-engineering, also termed renovation or reclamation, is the examination and alteration of a subject system in order to reconstitute it in a new form, followed by the implementation of that new form. Re-engineering generally involves some form of reverse engineering (to obtain a high-level abstraction of the existing system) followed by forward engineering. It may include changes to meet new requirements that the original system did not implement. Re-engineering is not a supertype of forward engineering and reverse engineering, but it uses both.
Objectives:-The primary goal of reverse engineering is to improve the overall comprehensibility of a system, for both maintenance and new development. The objectives are:
To cope with complexity. To deal with the complexity and sheer volume of systems we have to develop better methods, i.e. automated support. Reverse engineering methods and tools should be combined with CASE environments to extract the relevant information, so that decision makers can control the process and the product during system evolution.
To generate alternative views. Comprehension aids such as graphical representations have long been accepted, but creating and maintaining them is becoming difficult. Reverse engineering facilitates the generation or regeneration of graphical representations from other forms. While many designers work from a single diagram type such as data flow diagrams, reverse engineering tools can produce other graphical representations such as control flow diagrams, entity relationship diagrams and structure charts to aid the review and verification process.
To identify side effects. Both a haphazard initial design and intentional modifications to the system can lead to unintended ramifications and side effects that affect system performance. Reverse engineering can provide observations that are not available from a forward engineering perspective, so these ramifications and anomalies can be resolved before users report them as bugs.
To support component reuse. Software reuse is becoming an essential part of developing new products in the software field. Reverse engineering can help to detect candidates for reusable components in the present system.
To recover lost information. The continuous evolution of a long-lived system leads to loss of information about its design. The design recovery techniques of reverse engineering are used to preserve this information.
Many reverse engineering tools try to extract the structure of legacy systems with the intention of passing this information to software engineers, so that they can re-engineer or reverse engineer the existing components.
Code reverse engineering:-During the evolution of software, many changes are applied to the code: to add new functionality, to rectify defects and to enhance the system's performance or quality. For systems with poor documentation, the code is the only reliable source of information about the system. As a result, the process of reverse engineering is focused on understanding the code.
Thus reverse engineering has good and bad ends.
Obfuscation:-Java provides platform independence, so that programs run unchanged on any platform. Every program is compiled to an intermediate code format, the class file format. A class file contains a very large amount of information about the program's methods, variables and constants, enough to make reverse engineering possible. Suppose a company develops a program in Java and sells it to another organisation in this intermediate format, without the original source. The organisation that buys the program can modify it simply by applying reverse engineering, violating the security and privacy of the company that owns it. Such reverse engineering is carried out by software developers using automated tools and decompilers. Java byte code is easy to decompile, which makes reverse engineering easier in Java.
In a programming context, obfuscation means making program code more difficult to read and understand, for the security and privacy of the software. Because decompilers can easily extract source code from compiled code, keeping the code secret is practically impossible. The use of obfuscators has therefore grown rapidly, as a way of putting an effective smoke screen around the code. Code obfuscation is one of the most prominent methods of protecting Java code: it makes the program difficult to understand, so that the code is more resistant to reverse engineering.
There are two kinds of obfuscation technique: source code obfuscation and byte code obfuscation. Source code obfuscation changes the source code of the program, whereas byte code obfuscation changes the class file of the program; in both cases the functionality stays the same as that of the original source code.
There are several obfuscation techniques that protect Java byte code from decompilation.
For example, suppose an obfuscator transforms a set of class files S into another set of class files S'. The class files in S and S' are different, but they produce the same output.
Example:-
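A minimal sketch of the kind of listing this example refers to, assuming a class OHello with a method gHello as named in the text. Only the names OHello, gHello and aa come from the text; the method body, the greeting string and the wrapper class RenamingSketch are hypothetical and serve only to show that both versions behave identically.

```java
// Hypothetical "before" version: readable class and method names.
class OHello {
    static String gHello(String who) {
        return "Hello, " + who;
    }
}

// The same class after identifier renaming: class and method both become "aa".
// Java allows a method to share its class's name, which adds to the confusion.
class aa {
    static String aa(String who) {
        return "Hello, " + who;
    }
}

class RenamingSketch {
    public static void main(String[] args) {
        // Both calls produce the same text; only the names differ.
        System.out.println(OHello.gHello("world"));
        System.out.println(aa.aa("world"));
    }
}
```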
In the code above, the class name OHello is changed to aa and the method name gHello is also changed to aa. A program that uses aa is harder to read than one that uses OHello, so less information can be interpreted and understood by the reverse engineer. This is only a simple example that renames class and method names.
Categories of obfuscation techniques:-
Description of Obfuscation techniques:-
One way in which obfuscators obfuscate a program is to replace a symbol in the class file with an illegal string. The replacement might be a keyword such as private or, even worse, a string such as ***.
Another technique, which obfuscators use to target specific decompilers (Mocha and Jode), is to insert a bad instruction into the code.
As an example of a bad instruction, take the following original code (decompiled):
Method void main(java.lang.String[])
0 new #4
3 invokespecial #10
6 return
After obfuscation the code is as follows (the names are left unchanged to keep the example simple):
Method void main(java.lang.String[])
0 new #4
3 invokespecial #10
6 return
7 pop
In the routine above, a pop instruction has been added after the return instruction. The return instruction should be the last instruction of the method, so the inserted pop instruction is never executed. It does not change the behaviour of the program, but it can make the targeted decompilers fail.
Lexical obfuscation:-Lexical obfuscation changes the lexical structure of a program by scrambling its identifiers. The names of classes, fields and methods, which carry the meaningful symbolic information of a Java program, are replaced with meaningless names. Crema is an example of a lexical obfuscator. An obfuscator is a program that automatically transforms a class file in order to obfuscate it, so that reverse engineering cannot easily recover the source code from the class file.
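The renaming step can be sketched as follows. This is a minimal illustration, not how Crema or any particular obfuscator is implemented; the class NameScrambler, its methods and the sample identifiers are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that maps meaningful identifiers to short meaningless ones.
class NameScrambler {
    private final Map<String, String> mapping = new LinkedHashMap<>();

    // Converts 0, 1, 2, ... to a, b, ..., z, aa, ab, ... (bijective base 26).
    static String shortName(int index) {
        StringBuilder sb = new StringBuilder();
        int n = index + 1;
        while (n > 0) {
            n--;
            sb.insert(0, (char) ('a' + n % 26));
            n /= 26;
        }
        return sb.toString();
    }

    // Returns the scrambled name, reusing it when the identifier was seen before.
    String rename(String original) {
        String scrambled = mapping.get(original);
        if (scrambled == null) {
            scrambled = shortName(mapping.size());
            mapping.put(original, scrambled);
        }
        return scrambled;
    }

    public static void main(String[] args) {
        NameScrambler s = new NameScrambler();
        System.out.println(s.rename("computeInterest"));
        System.out.println(s.rename("accountBalance"));
        System.out.println(s.rename("computeInterest"));
    }
}
```

A real obfuscator would apply the same mapping consistently to every reference in the constant pool, so that the renamed program still links.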
Layout obfuscation:-Layout obfuscation changes the layout structure of the program, which is done by 2 basic methods
Both make the program code less informative to reverse engineers. Layout obfuscation uses one-way transformations such as renaming identifiers to random symbols and removing comments, unused methods and debugging information. Reverse engineers can still understand code obfuscated in this way, but it raises the cost of reverse engineering. Layout obfuscation techniques are the most commonly used form of code obfuscation; almost all Java obfuscators use them.
Control obfuscation:-Control obfuscation changes the control flow of the program. It is one of the easiest techniques to apply, and it makes it hard for the reverse engineer to find out what the code actually does. For example, consider code that contains a method A(). A new method called A_Dummy() is created and in the program
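One way a dummy method can be used to disturb the control flow is sketched below. The text does not say how A_Dummy() is wired into the program, so the always-true condition (an opaque predicate), the method bodies and the class name ControlFlowSketch are assumptions made purely for illustration.

```java
// Hypothetical illustration: a decoy method guarded by a condition that never fails.
class ControlFlowSketch {
    // The real work.
    static int A(int x) {
        return x + 1;
    }

    // Decoy that looks plausible but is never called at run time.
    static int A_Dummy(int x) {
        return x * 31 - 7;
    }

    static int call(int x) {
        // x*x + x equals x*(x+1), which is always even (also under int overflow),
        // so the first branch is always taken. A reader of the decompiled code
        // has to work this out before A_Dummy can be dismissed.
        if ((x * x + x) % 2 == 0) {
            return A(x);
        }
        return A_Dummy(x);
    }

    public static void main(String[] args) {
        System.out.println(call(41));
    }
}
```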
Data Obfuscation:-Data obfuscation mainly deals with breaking up the data structures used in the program and encrypting the literals. This includes changing the inheritance relations, restructuring arrays, making variable names constant, and so on. In this way data obfuscation affects the data structures of the program and makes it very difficult to recover the original source code.
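Encrypting literals, one of the data obfuscation steps mentioned above, can be sketched as below. The XOR scheme, the key value and the class name LiteralHiding are assumptions chosen for brevity; XOR with a fixed byte only hides the literal from casual inspection and is not real encryption.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a string literal is stored encoded and decoded only when used.
class LiteralHiding {
    private static final byte KEY = 0x5A; // assumed single-byte key

    // Would run at build time, producing the bytes shipped in place of the literal.
    static byte[] encode(String plain) {
        byte[] bytes = plain.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] ^= KEY;
        }
        return bytes;
    }

    // Runs inside the program at the point where the literal is needed.
    static String decode(byte[] hidden) {
        byte[] bytes = hidden.clone();
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] ^= KEY;
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] hidden = encode("license-check");
        System.out.println(decode(hidden));
    }
}
```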
More viable source code obfuscation methods are based on composite functions: Array Index Transformation, Method Argument Transformation and Hiding Constant. Techniques based on composite functions make the computation more complex, and extensive use of them makes the software respond slowly. Some source code obfuscation methods are directed at object oriented concepts: Class Coalescing, Class Splitting and Type Hiding. Other source code obfuscation techniques include false refactoring, restructuring arrays, inlining and outlining methods, cloning methods, splitting variables, converting static data to procedural data, and merging scalar variables. The techniques that work on object oriented concepts, and others such as restructuring arrays, splitting variables and merging scalar variables, may distort the logic of the software, so they must be used carefully. Techniques such as outlining methods, cloning methods and converting static data to procedural data increase the size of a class file without providing any significant advantage. Inlining a method results in an unresolved method call when some other class calls the inlined method.
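An array index transformation built from a composite function can be sketched as below. The particular function f(i) = (A*i + B) mod n, the constants A and B and the class name IndexTransform are assumptions for illustration; the source does not specify which functions are composed.

```java
// Hypothetical sketch: logical index i is stored at physical index f(i).
class IndexTransform {
    private static final int A = 7; // assumed multiplier, must be coprime to n
    private static final int B = 3; // assumed offset
    private final int[] store;
    private final int n;

    IndexTransform(int n) {
        if (n <= 0 || gcd(A, n) != 1) {
            throw new IllegalArgumentException("n must be positive and coprime to " + A);
        }
        this.n = n;
        this.store = new int[n];
    }

    // Composite of a linear map and a modulo; a permutation of 0..n-1 when gcd(A, n) = 1.
    private int f(int i) {
        if (i < 0 || i >= n) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        return (A * i + B) % n;
    }

    void set(int i, int value) {
        store[f(i)] = value;
    }

    int get(int i) {
        return store[f(i)];
    }

    static int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }
}
```

Every access now pays for a multiplication and a modulo, which is the slowdown the paragraph above warns about. The sketch assumes n is small enough that A*i does not overflow an int.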
Advanced obfuscation techniques for byte code:-There are several obfuscation techniques to prevent Java byte code from being decompiled. Many obfuscation tools simply replace the identifier names stored in the byte code with meaningless names. Many crackers can still understand the actual source code even though the identifier names have been changed, but it takes them more time.
Traditionally, when a program is compiled to machine code, most of the symbolic information is stripped off during compilation: the variables and functions denoted by identifiers are replaced by addresses. Decompiling such code is difficult, but it is still possible. A protection technique is considered strong only if cracking the software costs the cracker a great deal of time and effort. If cracking the software takes longer than rewriting the program, cracking it is a waste of time and has no value.
Java became very popular because of the benefits it provides. One of the major benefits is portability: a compiled program is platform independent and can run on any platform, because compilation produces machine-independent byte code. Java uses symbolic references rather than traditional memory addresses. The names of methods, variables and types are therefore stored in a constant pool within the byte code file.
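That these names survive compilation can be observed from inside a running program through reflection, as in the minimal sketch below. The classes SymbolicNames and Account and the method computeInterest are hypothetical; the reflection calls are the standard java.lang.reflect API.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demonstration that method names are kept in the compiled class.
class SymbolicNames {
    static class Account {
        double computeInterest(double balance) {
            return balance * 0.05;
        }
    }

    // Collects the names of the methods declared by the given class.
    static List<String> declaredMethodNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            names.add(m.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // The original identifier is still available at run time.
        System.out.println(declaredMethodNames(Account.class));
    }
}
```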
There are many commercial decompilers (P & C, 2001, Vliet 1996, Hoenicke 2001 etc.). A decompiled program is almost identical to the original source code. Using a decompiler to extract the source code is therefore a lethal weapon for intellectual property piracy.
Obfuscation is used to hinder decompilation of the byte code. Its main aim is to make the decompiled program harder to understand, so that understanding the obfuscated code takes more time and effort.
Obfuscation scope:-A Java application consists of one or more packages. A programmer may divide the program into packages, and may also use packages from the standard library and from proprietary libraries. Only the part of the program written by the developer is distributed; a proprietary library is not distributed, because of copyright restrictions. The obfuscation scope is the part of the program that is obfuscated, i.e. the part of the software written by the developer, not the entire software. The packages of the standard library and of proprietary libraries, which serve as utilities, are not obfuscated.
Candidates considered for identifiers scrambling:-After compilation, not all of the 7 kinds of identifier listed above are kept in the byte code file; only identifiers 1 to 5 from the list are stored. Local variable and parameter names are removed from the byte code by default. They are stored in the LocalVariableTable of the byte code only if debugging information for local variables is enabled, and the Java compiler does not emit it by default. If the local variable names are not found, the decompiler creates names for the local variables and parameters itself, which makes the decompiled program reasonably understandable. Even if we rename the variables and parameters in the LocalVariableTable, a good decompiler will simply ignore the renamed names, create new ones, and extract a program equivalent to the original.
Since parameters and local variables are, for the reasons given above, not treated as identifiers to be scrambled, decompilers can successfully recover the source code simply by creating new names.
When a Java application is executed, the JVM dynamically loads and links the referenced types into the runtime environment. The referenced types are located through the symbolic references stored in the byte code file, i.e. the fully qualified names of classes or interfaces. These symbolic references therefore cannot be changed, i.e. they cannot be obfuscated. Entities that belong to the standard libraries and proprietary libraries should not be obfuscated.
The following four groups of entities should not be obfuscated:
Java supports polymorphism. An instance method is dispatched dynamically at run time on the basis of its signature, i.e. the name of the method and the number and types of its formal parameters. As Jien-Tsai Chan and Wuu Yang (2002) describe, because the name of a method M outside the obfuscation scope is retained, the name of any method inside the obfuscation scope that overrides M must be retained as well. Otherwise the JVM cannot find the overriding method from the signature of M. These retained methods fall under exception groups 1 and 2.
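The effect of renaming an overriding method can be seen in the small sketch below, which uses Object.toString as the method M outside the obfuscation scope. The classes Retained, Renamed and OverrideSketch are hypothetical and the choice of toString is an assumption made for illustration.

```java
// Hypothetical class whose overriding method keeps the name of the overridden method.
class Retained {
    @Override
    public String toString() {
        return "Point(1,2)";
    }
}

// The same class after careless renaming: a() no longer overrides Object.toString().
class Renamed {
    public String a() {
        return "Point(1,2)";
    }
}

class OverrideSketch {
    // Stands in for library code outside the obfuscation scope,
    // which only knows the original method name.
    static String describe(Object o) {
        return String.valueOf(o);
    }

    public static void main(String[] args) {
        System.out.println(describe(new Retained())); // dispatches to the override
        System.out.println(describe(new Renamed()));  // falls back to Object.toString()
    }
}
```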
Even when a package is in the obfuscation scope, some parts of it must be kept outside the scope. For example, the main method is the entry point through which the program is executed, so the name of the main method must be retained. A proprietary library may export certain types and methods as its interface, so the names of the exported types and methods must be retained as well. These fall under exception group 3.
The Java GUI packages use callback functions, mainly in the event handling model. When the caller of an instance method M that acts as a callback function is not in the obfuscation scope, the method M must not be obfuscated, because the caller cannot find M if its name has been changed. If, on the other hand, the caller is also in the obfuscation scope, the symbolic references to M are changed to the new, obfuscated name, so the name M can be obfuscated. All callback functions that must retain their names fall under exception groups 1 and 2.
Fields, nested types and static methods are resolved statically by the Java compiler, and the JVM does not change any resolution once the byte code file has been generated. The names of fields, nested types and static methods can therefore be changed arbitrarily if they are in the obfuscation scope. J.-T. Chan and W. Yang propose re-using the same identifier as many times as possible. This confuses the reverse engineer because the identifiers are heavily overloaded. After decompiling the code, the reverse engineer cannot work out the functionality of the program just from the names. The program can be understood only by working out the context of each identifier, which is difficult when the identifier is heavily re-used. A further advantage is that the size of the byte code decreases, because the names are shorter and fewer.
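Heavy re-use of one identifier can be sketched with Java overloading, as below. The class OverloadSketch, the method bodies and the "was:" names in the comments are hypothetical; they only illustrate several unrelated static methods sharing the single name a.

```java
// Hypothetical result of renaming three unrelated static methods to the same name.
class OverloadSketch {
    // was: increment
    static int a(int x) {
        return x + 1;
    }

    // was: normalize
    static String a(String s) {
        return s.trim();
    }

    // was: isSmaller
    static boolean a(int x, int y) {
        return x < y;
    }

    public static void main(String[] args) {
        // The compiler picks the right method from the argument types;
        // a human reader has to do the same for every call site.
        System.out.println(a(1));
        System.out.println(a("  key  "));
        System.out.println(a(1, 2));
    }
}
```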
Copy right issues:-Software piracy is defined as copying software without authorization and distributing copyrighted software. It can take the form of selling the software, sharing it with others, installing multiple copies where only a single installation is permitted, or downloading the software without a proper licence, i.e. using it by cracking it. When we purchase software we obtain only the right to use it, not the right to take the code and change it to suit our own use. The software license agreement states how many times we may install the program and how we may use it. So when we purchase software we have to read the license document carefully and use the software according to the vendor's license rules. Violating any rule in the software license document counts as software piracy.
Sharing software with others by making multiple copies is therefore software piracy. Reading the license document tells us what counts as piracy, which can reduce piracy cases to some extent. The people who develop software spend many days and a great deal of effort thinking in order to write it, so software also comes under intellectual property rights. With the rapid growth of the internet, many users post pirated software or software keys online. As a result many people download software and run it without proper authorization, which leads to the growth of piracy.
Reverse engineering helps us to learn the structure and logic of a program, i.e. how a particular function performs a particular task. Once the program's logic is understood, anybody can change the logical flow of the program. Technically this is called patching, because it involves placing new code over the original code, like a patch on clothes. Patching allows the reverse engineer to add code to the original code, which may change how a particular method works. It thus makes it possible to maintain closed code, to delete or disable the functionality of a particular method or class, and to fix security bugs without the source code.
Because reverse engineering involves reconstructing the code, it comes under intellectual property law. Software companies fear reverse engineering because it reveals their secret algorithms and methods to outsiders far more directly than external observation of the running program does, and outsiders might then copy and use them.
Reverse engineering can be used to remove the copy protection schemes from software. Patching software to delete or defeat copy protection schemes or digital rights management is illegal, but reverse engineering in itself is not illegal. The main reason software vendors forbid reverse engineering is that their secret code is revealed to outsiders. This seems a little silly, because a person who can understand the compiled code has already understood the program. To prevent this, encryption has to be applied to the secret parts of the program. Software companies also forbid reverse engineering because researchers can find security flaws in their code and make this information public, which may give the company a bad image and damage its reputation. If reverse engineering were made illegal, researchers would have to stop checking the quality of the code a company produces, because they could no longer examine it. People would then have to accept that the software is fully secure even when the code is neither secure nor correct.
Software security:-In the present market, almost all software programs are protected by various techniques. Some software is accessible to users only if they have registered the product. Reverse engineering makes it possible to remove the protection on a program; this is called cracking.
In general terms, cracking can be described as follows. When we develop a software program, we build the executable file from the source code. Reverse engineering is a technique that allows the source code to be extracted from the executable file. Using reverse engineering techniques, we can understand how the program performs a particular action and can bypass its protection. In simple terms, reverse engineering here means making the program work the way the reverse engineer wants rather than the way it was originally intended to work.
Various software protections:-
Hard coded serial:-This is the simplest technique, in which one serial key is given to all users. When the user enters a serial key, the software checks it against the original key using its algorithm. If the user has entered the correct key the software is registered successfully; otherwise it will not work.
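A hard coded serial check can be as small as the sketch below. The class name HardCodedSerial and the serial value are hypothetical; the point is that the same key is compiled into every copy of the program.

```java
// Hypothetical sketch of a hard coded serial check.
class HardCodedSerial {
    // Assumed key shared by all users; it ends up as a string in the class file.
    private static final String SERIAL = "ABCD-1234-EFGH";

    static boolean register(String entered) {
        return entered != null && SERIAL.equals(entered.trim());
    }

    public static void main(String[] args) {
        System.out.println(register("ABCD-1234-EFGH")); // correct key
        System.out.println(register("0000-0000-0000")); // wrong key
    }
}
```

Because the key is a string literal, it is stored in the constant pool of the class file, which is why this scheme is the easiest to defeat.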
Serial number with name protection:-In this technique the user has to enter both a name and a serial number. As with the hard coded serial, the serial entered by the user is checked against the original serial number, but here the original serial number is derived from the user's name by an algorithm. How easy or difficult this protection is to defeat depends on the algorithm the programmer uses. This kind of technique is seen in WinZip.
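A name-derived serial check can be sketched as follows. The derivation (a simple polynomial hash of the upper-cased name, printed as 8 hex digits) and the class name NameSerial are assumptions for illustration; they are not the algorithm used by WinZip or any other product.

```java
import java.util.Locale;

// Hypothetical sketch: the valid serial is computed from the user's name.
class NameSerial {
    // Assumed derivation: polynomial hash over the upper-cased name.
    static String serialFor(String name) {
        int h = 17;
        for (char c : name.toUpperCase(Locale.ROOT).toCharArray()) {
            h = 31 * h + c;
        }
        return String.format("%08X", h);
    }

    // Registration succeeds only if the entered serial matches the derived one.
    static boolean register(String name, String serial) {
        return serial != null && serialFor(name).equalsIgnoreCase(serial.trim());
    }

    public static void main(String[] args) {
        String serial = serialFor("Alice");
        System.out.println(register("Alice", serial)); // matching pair
        System.out.println(register("Bob", serial));   // serial belongs to another name
    }
}
```

A cracker who recovers serialFor from the decompiled class can generate a valid serial for any name, so the strength of the scheme rests entirely on how well that algorithm is hidden.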
Nag screen:-In this protection technique, every time the user starts the application a window appears showing the number of days of subscription left, a reminder to activate the software, or some other information. It is hard to remove, and newcomers in particular find it difficult to understand. This technique is used by WinZip.
Time trial:-According to +ORC, the following kinds of protection technique are used
Dongle protection is the toughest protection to crack. It uses an EPROM connected to a port of the computer. When a person wants to access the software, it first checks the User ID and the Hardware ID, two unique IDs that cannot be changed. If the IDs are correct, the user can access the software. An RSA algorithm is used for data protection. This kind of protection is difficult to implement, so it is used where the software is especially valuable. The protection is implemented through the I/O LPT hardware: the registered device must be attached to the PC's parallel port in order to access the complete software, which otherwise cannot be accessed. HASP and Sentinel are the most commonly used dongles. DLLs and VxDs are used by the dongle to check whether the software is registered.
Commercial protection:-Most software programmers do not want to spend their time developing security algorithms for their software, because it is time consuming: developing them can take as much time as developing the actual software, or more. This is where commercial protection comes in. Instead of the developer writing the security algorithm for the software to be protected, several companies develop security algorithms or protection software for other people's software. Companies that use commercial protection include Macromedia and Symantec. Commercial protection turns fully functional software into an unregistered version, i.e. the functionality is not available to users until they have registered the software. After successful registration, the functionality of the software becomes available to the user or company that wants to use it.
Other protections:-The other most common types of software protection are CD-ROM protection and disabling certain functions of the software. CD-ROM protection is familiar to many computer users: the program runs only while the CD is in the drive, even if the content of the CD has been saved on the PC. This kind of protection is applied mainly to games. The other kind of protection disables functions, for example so that we cannot save our work on the PC or cannot use certain functions at all.
Related work:-The earliest technique converted the source code into other source code with the same functionality as the original but more difficult to understand. The techniques used at that stage simply renamed the identifiers with more obscure names. A later source code obfuscation technique was the transformation of array indexes, which uses composite functions to change the indices of the array. With this technique a third person cannot easily tell which element is being indexed. The problem observed was that the arrays are not used efficiently, which wastes memory, so the algorithm used for transforming array indexes was not successful. The next technique was array index data transformation (S. Praveen and P. Sojan Lal, 2007), in which a single array is split into 3 or more arrays. The intention is to distribute the data of the single array over multiple arrays, so that the reverse engineer needs more time to understand on what basis the array has been split. This technique is useful against reverse engineering, but the code can still be decompiled and understood by reverse engineers, and it makes the execution of the software slower. According to the paper by Praveen Sivadasan and P. Sojan Lal, 4 out of every 10 software products are pirated. The paper states that global piracy has increased by 40%, a loss of 11 billion US dollars. Many people use reverse engineering because they want to cut the development time and cost of the software produced by their own companies. In this way piracy is increasing, and we cannot tell who is pirating our software even though it is illegal. After obfuscation by constant hiding, a program takes the same execution time as the unobfuscated source code.
In this respect the constant hiding technique is more effective. Its drawback is that when the source code contains no constants, constant hiding has no effect. Class coalescing is a technique that merges several classes into a single class. Class splitting is the opposite technique, in which a single class is split into multiple classes. Both class coalescing and class splitting change the program structure drastically, so that the design of the software is hidden and the program becomes difficult to understand. Other techniques use polymorphism, i.e. they encapsulate method return types and parameters in a newly defined class, which hides the information. These techniques, however, cause a drastic increase in program size and a slight decrease in performance. Further techniques are outlining and inlining methods and converting static data to procedural data, which enlarge the program structure and cause a loss of performance.
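The array-splitting idea described above can be sketched as below. The distribution rule (element i goes to part i mod 3, position i / 3) and the class name SplitArray are assumptions; the cited paper may split arrays on a different basis.

```java
// Hypothetical sketch: one logical array stored as three physical arrays.
class SplitArray {
    private static final int PARTS = 3;
    private final int[][] parts = new int[PARTS][];
    private final int length;

    SplitArray(int[] source) {
        length = source.length;
        for (int p = 0; p < PARTS; p++) {
            // Number of indices i < length with i % PARTS == p.
            parts[p] = new int[(length - p + PARTS - 1) / PARTS];
        }
        for (int i = 0; i < length; i++) {
            parts[i % PARTS][i / PARTS] = source[i];
        }
    }

    int get(int i) {
        checkIndex(i);
        return parts[i % PARTS][i / PARTS];
    }

    void set(int i, int value) {
        checkIndex(i);
        parts[i % PARTS][i / PARTS] = value;
    }

    private void checkIndex(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("index " + i);
        }
    }
}
```

Each access costs an extra modulo and division, which is consistent with the slowdown noted in the text.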
Reverse engineering is a technique that allows software to be reproduced by decompiling the byte code of the Java class file. Obfuscation techniques are used to protect the software against reverse engineering by decompilation.
The main objectives of the obfuscation are
The main aim of obfuscation is therefore to protect the source code against reverse engineering, which leads to software piracy. Piracy here means taking the intellectual property of other people and using their algorithms and logic in one's own software, in order to enhance that software illegally.
Weakness of the obfuscation:-Obfuscation increases the number of instructions in a program, which makes the program very large. The program grows because the obfuscation process creates new classes and methods in it, and adds new statements to the existing source code so that the original code looks more complex. If there is one conditional block in the program, it is turned into 3 or more conditional blocks, making it hard for the reverse engineer or de-obfuscator to understand. In the same way the arguments of methods are changed: if a method has 2 arguments, the number of arguments is increased beyond 2, so that the reverse engineer cannot easily understand the code. Similarly, if the class hierarchy has a single level, it is turned into a multi-level hierarchy, which is harder for the reverse engineer to understand. In this way the size of the program increases drastically, which reduces its efficiency and increases its execution time.
In short, obfuscating a program increases the size of its file because of the additional statements that are inserted.
Even if we obfuscate the code, we cannot say that it can never be reverse engineered or that the source code can never be recovered from the obfuscated file.
To protect byte code from reverse engineering tools, reverse engineering techniques and reverse engineers, this project uses 2 byte code obfuscation techniques, implemented with SandMark, the BCEL library and the Sun Java Wireless Toolkit. The first technique removes the names of the variables and methods from the constant pool. The second prevents statements from appearing complete: the left-hand side of an expression or statement is initialized with dead values, and the actual values are retrieved from a buffer when they are required. By obfuscation we can protect the software code from people who want to steal it without purchasing the actual code.
Java language dissertation. (2017, Jun 26).
Retrieved January 15, 2025 , from
https://studydriver.com/java-language-dissertation/