Monday, February 2

Reverse Engineering Techniques


Introduction:- :-

Reverse engineering is the process of analyzing a subject system to identify the system’s components and their interrelationships and create representations of the system in another form or at a higher level of abstraction. The purpose of reverse engineering is to understand a software system in order to facilitate enhancement, correction, documentation, redesign, or reprogramming in a different programming language.

Decompilers Reverse engineering is done with the help of decompilers. A decompiler is a program that reads a program written in a machine language (the source language) and translates it into an equivalent program in a high level language (the target language) (see Figure 1.2). A decompiler, or reverse compiler, thus attempts to reverse the process of a compiler, which translates a high-level language program into a binary or executable program. Basic decompiler techniques are used to decompile binary programs from a wide variety of machine languages to a diversity of high-level languages. The structure of decompilers is based on the structure of compilers; similar principles and techniques are used to perform the analysis of programs.
 
Code Improving Optimizations

This section presents the code-improving transformations used by a decompiler. The aim of these optimizations is to eliminate the low-level language concepts of condition codes, registers, and intermediate instructions, and introduce the high-level concept of expressions of more than two operands. For this purpose, it is noted that push instructions are used in a variety of ways by today's compilers. Parameter passing is the most common use of this instruction, by pushing them before the subroutine call, in the order specified by the calling convention in use. Register spilling is used whenever the compiler runs out of registers to compute an expression.



Structuring Algorithms 

In Decompilation, the aim of a structuring algorithm is to determine the underlying control structures of an arbitrary graph, thus converting it into a functional and semantic equivalent graph. Arbitrary graph stands for any control flow graph, reducible or irreducible, from a structured or unstructured language. Since it is not known what language the initial program was written in, and what compiler was used (e.g. what optimizations were turned on), the use of goto jumps must be allowed in case the graph cannot be structured into a set of generic high-level structures. 

Structuring Loops

In order to structure loops, a loop in terms of a graph representation needs to be defined. This representation must be able to not only determine the extent of a loop, but also provide a nesting order for the loops. The representation of a loop by means of cycles is too fine a representation since loops are not necessarily properly nested or disjoint. The use of strongly connected components as loops is too coarse a representation as there is no nesting order. The use of strongly connected regions does not provide a unique cover of the graph, and does not cover the entire graph. Once a loop has been found, the type of loop is determined.


Finding The Type Of Loop 

The type of a loop is determined by the header and latching nodes of the loop. In a pre-tested loop, the 2-way header node determines whether the loop is executed or not, and the 1-way latching node transfers control back to the header node. A post-tested loop is characterized by a 2-way latching node that branches back to the header of the loop or out of the loop, and any type of header node. Finally, an endless loop has a 1-way latching node that transfers control back to the header node, and any type of header node.

Finding The Loop Follow Node 

The loop follow node is the first node that is reached after the loop is terminated. In a pre-tested loop, the follow node is the successor of the loop header that does not belong to the loop. In a similar way, the follow node of a post-tested loop is the successor of the loop-latching node that does not belong to the loop. Initially, no follow nodes are taken for endless loops as neither the header nor the latching node jump out of the loop. But since an endless loop can have a jump out of the loop in the middle of the loop, it can too have a follow node. Since the follow node is the first node that is reached after the loop is ended, it is desirable to find the closest node that is reached from the loop after an exit is performed. The closest node is the one with the smallest reverse post order numbering; i.e. the one that is closest to the loop (in numbering order). Any other node that is also reached from the loop can be reached from the closest node (because it must have a greater reverse post order numbering), thus, the closest node is considered the follow node of an endless loop.

Generating Code for a Basic Block 

After data flow analysis, the intermediate instructions in a basic block are all high-level instructions; pseudo high-level instructions must have been eliminated from the code before this point. For each basic block, the instructions in the basic block are mapped to an equivalent instruction of the target language. Transfer of control instructions (i.e. jcond and jmp instructions) is dependent on the structure of the graph (i.e. they belong to a loop or a conditional jump (2 and n ways), or be equivalent to a goto), and hence, code is generated for them according to the control flow information. This section illustrates how code is generated for all other instructions of a basic block.

Challenges

1. The main difficulty of a decompiler parser is the separation of code from data, that is, the determination of which bytes in memory represent code and which ones represent data.

2. Less mature. Skilled staff sparse.

3. The model can be imperfect. Salvaging partial information is still useful.

Conclusion 

This seminar report has presented techniques for the reverse compilation or Decompilation of binary programs. Decompilers use similar principles and techniques used in compilers. The data flow analysis phase analyzes the low-level intermediate code and converts it into a high-level intermediate representation available in any high-level language. Finally, the code generation phase generates high-level code based on the high-level intermediate representation and the structured graph of each subroutine. In this way, a decompiler for a different machine can be built by writing a new front-end for that machine, and a decompiler for a different target high-level language can be built by writing a new back-end for the target language.

0 comments:


Home About-us Computer Science Electronics Mechanical Electrical IT Civil
Copyright © 2018 www.seminartopics.org | All Rights Reserved. Design By Templateclue