`Accurate Comparison of Binary Executables
`
`Martial Bourquin Andy King
`University of Kent
`bourquin.martial@gmail.com
`a.m.king@kent.ac.uk
`
`Edward Robbins
`
`er209@kent.ac.uk
`
`Abstract
`As the volume of malware inexorably rises, comparison of binary
`code is of increasing importance to security analysts as a method of
`automatically classifying new malware samples; purportedly new
`examples of malware are frequently a simple evolution of exist-
`ing code, whose differences stem only from a need to avoid de-
`tection. This paper presents a polynomial algorithm for calculat-
`ing the differences between two binaries, obtained by fusing the
`well-known BinDiff algorithm with the Hungarian algorithm for
`bi-partite graph matching. This significantly improves the matching
`accuracy. Additionally a meaningful metric of similarity is calcu-
`lated, based on graph edit distance, from which an informed com-
`parison of the binaries can be made. The accuracy of this method
`over the standard approach is demonstrated.
`
`1. Introduction
`Automated auditing of binary code is a topic of increasing interest
`to both academia and industry. The pace of software development is
`such that governments and other organisations must deal with soft-
`ware updates on an almost daily basis that must be vetted for vul-
`nerabilities. Meanwhile the anti-virus (AV) industry is attempting
`to cope with an almost constant proliferation of new malware, most
`of which are simple evolutions of known code, and needs to iden-
`tify, classify and protect against each. For these tasks binary differ-
`encing is of great necessity; for vulnerability discovery in software
`updates the security analyst wishes to examine only the new code
`and so must identify what is old and what is new quickly, while in
`malware identification/classification the key questions are whether
`the code is a mutation of a known piece of malware, and if so, what
`has changed that needs to be accounted for? In contrast to various
`nefarious applications, such as reversing security patches, binary
`differencing has also found application in legal settings as a litmus
`test for infringement against open source licensing and copyright
`agreements.
`The most obvious cause of changes between different versions
`of the same application is simple addition and removal of code.
`However there are at least two other sources of differences which
`result in some binaries appearing different, while in fact being func-
`tionally identical. The first is so-called `server-side polymorphism',
`typically a result of using different compiler optimisations, a dif-
`
`Permission to make digital or hard copies of all or part of this work for personal or
`classroom use is granted without fee provided that copies are not made or distributed
`for profit or commercial advantage and that copies bear this notice and the full citation
`on the first page. To copy otherwise, to republish, to post on servers or to redistribute
`to lists, requires prior specific permission and/or a fee.
`PPREW '13 Jan 26, 2013 Rome, Italy.
`Copyright © 2013 ACM 978-1-4503-1857-0/13/01...$15.00
`
`ferent compiler version, or a different compiler all together. The
`second is wilful obfuscation in order to bypass detection by signa-
`ture engines of anti-virus software. Regardless of the source of the
`changes, they render simple comparison techniques such as byte-
`for-byte file differencing almost completely useless. Some exam-
`ples of such changes are:
`
`• Sequences of opcodes may remain identical but register alloca-
`tion can change depending on availability at a compile time, i.e.
``ecx' may become `edx' and so on.
`• If instructions do not depend on other instructions they may
`be reordered due to pipeline optimisations without affecting
`operational semantics.
`• Junk/do-nothing code is inserted between instructions in order
`to bypass detection by signature engines of anti-virus software.
`• Obfuscating [7] or diversifying [2, 8] transforms are applied to
`a binary to replace sets of instructions by equivalent ones, thus
`preserving the action of the code but changing its representation
`at the instruction level. In the former case the new code aims to
`be more cryptic; in the latter case it aims to be unique.
`
`This paper seeks to address these issues in binary comparison by
`bringing together two existing approaches to the problem. Both
`techniques perform structural matching [16]. In the context of bi-
`naries this means that they seek to identify/recover key structural
`components from each binary to compare and match, establishing
`an isomorphism between the two binaries.
`The first technique was developed by Thomas Dullien (AKA
`Halvar Flake) and his colleagues in their tool BinDiff [15]. BinDiff
`attempts to reconstruct the Control Flow Graph (CFG) of each
`binary, as it uses the functions and basic blocks as the units for
`comparison.
`Note that the Control Flow Graph can be understood as a graph
`of graphs; at the top level it consists of the Call Graph (CG), which
`links functions together via function calls and returns, while each
`function is itself a CFG linking basic blocks via simple branch-
`es/jumps. To clearly disambiguate whether the term CFG refers to
`the CFG as a graph of graphs of a whole program, or the CFG
`of a single function, henceforth when used in this paper it can be
`assumed that it refers to the graph of a function alone, unless oth-
`erwise stated.
`The BinDiff algorithm compares functions and basic blocks
`based on graph-centric properties derived as identifiers for them,
`such as number of out-going edges, in-coming edges etc. These
`properties make up a tuple for each function or basic block, but it is
`important to note that the tuples are not necessarily unique. BinDiff
`first creates an initial set of matches consisting only of uniquely
`identical functions from each binary. This set is then expanded
`upon by taking each matched pair and searching for more unique
`matches, but only amongst their unmatched neighbours, thus limit-
`ing the search space and increasing the likelihood of finding unique
`matches. This is then repeated with the new matches exhaustively
`
`WIZ, Inc. EXHIBIT - 1047
`WIZ, Inc. v. Orca Security LTD.
`
`Pg. 1
`
`
`
`Algorithm 1: Initial match discovery
`1 function initialMatches(SA, SB);
`2 M
`0;
`3 foreach vertex a% E SA do
`4
`foreach Selector E do
`5
`if (a,, bj) E(a,,SB) then
`6
`4— .A4 U {az
`M
`b3};
`7
`SAVatl;
`8
`Sg\fbil;
`SB
`9
`break;
`
`SA
`
`10 return (M, SA, SB);
`
`until no new unique matches can be found. The treatment of chil-
`dren and parent nodes are discussed subsequently. The BinDiff al-
`gorithm can be applied to match blocks in an analogous way to
`matching functions.
`The second technique is also graph-centric [14], and is based
`on a thread of work in Graph-Edit-Distance (GED) computation
`[17, 23]. GED, from graphing theory, is the minimal number of edit
`operations required to transform a graph gA into a graph gB [30].
`A series of edit operations is called an edit path. With a directed
`labelled graph, an edit operation is either vertex substitution, vertex
`insertion or vertex deletion. A vertex substitution occurs when a
`vertex in graph gA is substituted for one from graph cB. Edge
`substitutions are defined analogously. Vertex insertion occurs when
`a vertex from graph cB is added to graph gA. Edge insertion
`is defined in a corresponding manner. Vertex deletion and edge
`deletion are defined similarly.
`There are multiple edit paths from one graph to another, as one
`would expect, but the desired path is the one with the lowest total
`cost of edit operations. GED also provides a convenient and mean-
`ingful metric for measuring graph, and hence binary, similarity. Un-
`fortunately computation of the GED is an NP-hard problem [33].
`However, it has been shown [26, 30] that an approximation can
`be obtained by transforming the problem into a weighted bipartite
`graph matching problem, which can be solved in cubic time with
`the Hungarian algorithm [24]. Furthermore, Hu et Al. have shown
`that performing automatic analysis and classification of malware
`samples into families using an approximation of the GED is indeed
`possible [17].
`BinDiff is fast and moreover the matches it makes are usually
`accurate. The problem is that the matches it derives are often
`incomplete: It frequently fails to match a pair of functions that
`are conspicuously similar. This issue stems from the fact that a
`function can exist in one binary whose tuple does not uniquely
`match the tuple of a function in the other. The real problem is
`that basing the matching of functions purely on tuples is brittle,
`as they abstract the structural features of a function. This is not a
`problem if the desire is to only match functions which are identical
`as deemed by the abstraction, rather than those that have undergone
`modification and are similar but not identical. It is arguable whether
`this is an issue when two related executables are scrutinised so
`as to diagnose the effect of a patch. However, for the problem of
`malware classification, namely quantifying the degree of similarity
`between two arbitrary binaries, it is important to also match similar
`functions, a concept which is crystalised with the notion of shortest
`edit path. BinDiff does provide a metric for the similarity of the
`graphs, but it is not particularly meaningful as the developers of
`BinDiff themselves concede [22]. On the other hand GED is one of,
`perhaps the, most natural graph metric [11], and its solution always
`provides a suggested match for remaining unmatched nodes.
`
`
`
`Algorithm 2: Match propagation
`1 function propagateMatches(./Vi, SA, SB);
`2 foreach {ai
`bj} E M do
`3
`7F
`foreach
`Property
`do
`4
`7r(ai,
`SA
`SA);
`s
`ir(bj,SB);
`6
`0 then
`if S'A
`0
`A S iB
`7
`E S'A do
`foreach
`vertex
`8
`
`foreach Selector E do
`9
`(aii, cij) 4-- e(aii , SB ) then
`if
`to
`.A4 U {di
`bij};
`11
`SA S'A \{a1};
`12
`S /B\{b /j};
`SB
`13
`SA SAVaa;
`14
`SB\{b'j};
`SB
`15
`break;
`
`16 return (M, SA, SB);
`
`The work presented here performs automated static compari-
`son of binary executable files by combining the two approaches
`outlined. The BinDiff algorithm for matching both functions and
`basic blocks between two binaries is improved by augmentation
`with the Hungarian algorithm, which finds matches with the min-
`imum possible GED over all functions and basic blocks. The net
`effect of overlaying the Hungarian algorithm on top of the Bin-
`Diff algorithm is a more robust matching procedure that mops up
`and matches functions that would otherwise remain unmatched. We
`consider this to be the main contribution of the work. Moreover, the
`mop up phase is cubic, which is pleasing considering the computa-
`tional complexity of GED.
`GED is also adapted to provide an accurate metric for the
`edit operations required to transform one binary into another. The
`system is implemented in a novel way that provides an accurate
`final edit path to the analyst at the end of the diffing process,
`and quantitative accuracy results are presented for the number
`of functions matched that conclusively demonstrate the value of
`this method. As far as we know, our work also represents a step
`change in systematic evaluation; previously accuracy testing has
`been carried out on a few examples, at best.
`
`2. BinDiff
`As briefly explained in section 1, BinDiff [16] associates each basic
`block/function with a tuple that describes some of its properties.
`For example, the tuple for a function is (a, 13, 7) where:
`
`• a is the number of basic blocks in the CFG of the function.
`
`• /3 is the number of edges in the CFG of the function.
`
`• 7 is the number of function calls in the CFG of the function.
`
`In fact, in the later paper by Dullien [9], the concept of matching
`is generalised using the idea of a selector. A selector is a function
`that takes a vertex a, from a graph GA, and a set SB of vertices
`from another graph GB and returns a vertex from SB that uniquely
`matches a, if precisely one exists. Thus matching based on the tuple
`defined above can be thought of as a selector that simply compares
`the elements of the tuples for equality and uniqueness. Another
`example of a selector is a checksum selector based on a cyclic
`redundancy check (CRC32) of the machine code that constitutes a
`block. For unstripped binaries the symbolic names of the functions
`
`Pg. 2
`
`
`
`Algorithm 3: BinDiff
`1 function binDiff(cA, cs);
`2 SAS g A;
`3 SB
`gg;
`4 A4/
`0;
`5 (M, SA, SB)
`initialMatches(SA, SB);
`6 while M' M do
`7
`M;
`8
`(M, SA, SB)
`propagateMatches(M, SA, SB);
`9 return (M, SA, SB);
`
`provide a natural selector [9], however this selector should be
`applied with caution as very different functions can share the same
`name, for instance main.
`The BinDiff algorithm finds an initial set of matches by com-
`paring (using selectors) all vertices SA in the first graph with all
`the vertices SB in the second graph. In Dullien's first publications
`[14, 15] these initial matches were then expanded upon by taking all
`neighbours of a pair of matched vertices, and searching for matches
`amongst those. This limited the vertices that were searched, and in
`so doing increased the likelihood of finding new matches. However,
`Dullien later [9] generalises neighbours to more abstract concepts
`that he refers to as properties. Consider a function that takes a ver-
`tex and returns all neighbours of that vertex; it is in fact taking a
`vertex and returning a subset of the vertices in the graph for which
`the neighbour relationship holds. Properties abstract the construc-
`tion of subsets to arbitrary relationships, for example, the parent
`property can be used to return all parents of a vertex, or the child
`property to return all children.
`
`2.1 BinDiff Algorithm
`Initial matches are found by using all selectors across all the ver-
`tices in the first graph cA. If a selector finds a uniquely matching
`vertex in the second graph gB it returns a match (at, ba). The re-
`turned matches then create an initial mapping M, containing all
`matching vertices. The initial match discovery algorithm is shown
`in Algorithm 1.
`The next step of the BinDiff algorithm is to find more matches
`based on those found initially. The propagateMatches algorithm
`listed in Algorithm 2 does exactly this. For each initial match
`properties are used to create subsets S'A and SiB of the remaining
`unmatched graph vertices, consisting of all the neighbours of the
`vertices in the match, or all the parents, etc., depending on which
`property is used. Selectors are then used on the vertices in the
`subsets to find new matches. Note that, as in initial match discovery,
`after a match is discovered the vertices must be removed from the
`sets so that the algorithm only ever searches amongst unmatched
`vertices.
`Finally the main program loop (Algorithm 3) brings all this
`together by first creating the initial matches, and then by calling
`propagate until no new matches are discovered. Like many iterative
`algorithms BinDiff can be reformulated in terms of so-called delta
`sets so that propogateMatches is not called on all of M, but merely
`on those pairs that have been most recently added.
`
`Illustrated Example
`2.2
`We will now consider an example execution of the BinDiff algo-
`rithm by comparing a simple binary with itself, using only the stan-
`dard 3-tuple selector given previously. Listing 1 is a listing of the
`C source code of the program, while Figure 1 shows the recovered
`call graph (CG). Notice that only 8 functions can be seen in the
`source code of the program, while the CG contains 18 nodes. This
`
`Listing 1. A basic C program
`
`#include <stdio .h>
`#include <stdlib .h>
`#include <time .h>
`
`void DO {
`printf ("\n"); return;
`
`}
`
`void CO {
`DO;
`printf (" \n");
`
`}
`
`void B() {
`CO;
`printf("\n");
`
`}
`
`void A() {
`BO;
`printf("\n");
`
`}
`
`int main() {
`srand(45);
`AO;
`BO;
`C();
`DO;
`rand();
`
`is due to code inserted by the compiler for program initialisation,
`and support functions such as srand. Since we are comparing the
`binary with itself we would expect that all vertices will be matched.
`
`15(1,0,0)
`
`7(791
`
`6(221
`
`3(110
`
`2(1,1,0) 1
`
`5(1,1,0
`
`13(19 22 6
`
`12(6 6 2) 2)
`
`
`2
`,6,2)
`11(6
`
`6
`2)
`
`10(6
`
`9(5 5,1
`
`4(110
`
`14(7,8,3
`
`0(553
`
`8(451
`
`16(1, 0 0
`
`17(5 61 1
`
`Figure 1. Extracted Call-Graph from the binary 'test'
`
`Searching for initial matches selects 9 vertices: 0, 6, 7, 8, 9, 13, 14,
`17 and 18. The matches represent those vertices that are unique.
`For example, note that vertex 7 has been matched since the tuple
`(7, 9, 1) is unique, while vertex 10 is unmatched since the tuple
`(6, 6, 2) also characterises vertex 11.
`BinDiff now proceeds with propagation to find more matches.
`On the first iteration of propagateMatches, vertices 1, 3, 4, 10
`and 16 are matched. To demonstrate how, consider propagation on
`vertex 9 using the child property. We can see that vertex 4, which
`is not unique in the graph, is unique in the subset of children of
`vertex 9, being its only child, and so it is immediately selected.
`Similarly, vertex 10 is matched in this iteration when applying the
`parent property to vertex 9, as it is both unique in the parent subset
`
`Pg. 3
`
`
`
`and the only remaining unmatched parent (vertex 13 having already
`been removed in initial matching).
`The final iteration will match vertices 11 and 12 in a similar
`manner, and after this the algorithm stops since it cannot propagate
`any more matches. Thus, although the compared binaries in this
`example are identical, the algorithm failed to match three vertices;
`2, 5, and 15. Indeed, vertices 2 and 5 have the same tuple, so
`they cannot be matched because the selector will always return the
`empty set.
`However, vertex 15 has a unique label in the unmatched vertex
`sets. We could run the the BinDiff algorithm again, using the
`unmatched sets of vertices from each binary as inputs, and in
`this second run vertex 15 will be matched by the initial match
`function immediately. However a better tactic is to simply modify
`the BinDiff algorithm to call the initial matches function again after
`propagation has finished.
`
`2.3 Similarity Metric
`
`Once BinDiff has computed its matches it presents statistics that
`purport to measure similarity and confidence for each match and
`for matching across the binary as a whole. The BinDiff manual
`explains these values as follows:
`
`"The confidence value displayed by the differ is the average
`algorithm confidence (match quality) used to find a particu-
`lar match weighted by a sigmoid squashing function . .. The
`final similarity value is multiplied by confidence - even a
`seemingly good match is not trustworthy if produced by
`weak algorithms."
`www.zynamics.com/bindiff/manual/#N2049A
`
`Similarity is constructed from a weighted sum, though it is not clear
`how confidence values are derived.
`
`3. Bipartite Graph Matching, GED and the
`Hungarian Algorithm
`The Hungarian algorithm [24] was designed to solve the assign-
`ment problem, which is well known in the combinatorial opti-
`misation literature, and is concerned with finding an optimal as-
`signment between two input sets of the same size. However, the
`algorithm can be reinterpreted as finding an optimal matching in
`a bipartite graph, as first demonstrated by Jonker and Volgenant
`[18]. A bipartite graph matching is defined as a bijective function
`17g, , where VgA is the vertex set of graph cB and
`zt) : VgA
`VgA the vertex set of graph cB. To fully explain the relationship
`between the two problems let us first consider a simple example of
`the assignment problem:
`
`Problem: We have a list of tasks to do and a list of workers, and
`a matrix that shows the cost of a worker undertaking a certain task.
`We want to assign a task to each worker so that the final assignment
`minimises the total cost required to perform all tasks.
`
`The problem of bipartite graph matching is analogous to the assign-
`ment problem, in that the aim is to uniquely assign each vertex of
`a graph gA to a vertex in another graph cB so that the assignment
`minimises the cost (i.e. edit distance) required to transform gA into
`cB. The minimum total cost is then considered the GED between
`the graphs. However, there are some problems:
`
`1. There is no clear way to assign costs to the edit operations
`required to calculate GED.
`
`2. There are three different edit operations possible between any
`two vertices, and each has a separate cost. Therefore determin-
`ing a single cost to be used as an element in the cost matrix is
`not possible.
`
`3. The CG/CFG of different binaries can have different sizes, and
`the standard form of the Hungarian algorithm only works over
`equal size graphs.
`
`Fortunately, the problems outlined above have already been ad-
`dressed by several researchers, most recently Riesen and Bunke
`[29]. They define the cost matrix as follows:
`
`Definition 3.1 (Cost Matrix for Bipartite Graph Matching). Let
`(VA, EA) be the source graph and gB = (VB, EB) be the
`g A =
`target graph, where VA = {a l , • • • an} and VB = {b1,
`bm}.
`The cost matrix is:
`
`[ C1,1
`
`C2,1
`
`C1,2
`
`• • • C1,m
`
`C1, E
`
`00
`
`• • •
`
`OO:
`
`C2,2
`
`"
`
`'
`
`C2,m
`
`00
`
`C2 ,
`
`•
`
`C—
`
`Cn ,1 Cn , 2
`
`C,,1
`
`00
`
`' ' '
`
`.
`
`.
`
`.
`
`Cn ,m
`
`00
`
`00
`
`00
`
`00
`
`Cn,,
`0
`
`oc
`
`0
`
`0
`
`O 0
`
`• • •
`
`00
`
`CE ,m
`
`where
`
`0
`
`• the upper-left quarter contains all costs c,,3 to substitute a vertex
`a, from graph gA with a vertex b3 from graph cB;
`• the diagonal of the upper-right quarter contains all costs c,,E to
`delete a vertex a, from gA;
`• the diagonal of the bottom-left quarter contains all costs cE ,3 to
`insert a vertex b3 from graph cB into graph gA;
`• the bottom-right quarter is zero.
`
`The dimensions of the cost matrix are (n m) x (m n), and
`it is thus square. Costs can be set in such a way that the algorithm
`computes a lower bound of the true edit distance (the minimum
`number of edits) [17], though we define costs using the selectors in
`the BinDiff algorithm to quantify the degree of distortion required
`to transform a vertex from gA to a vertex in c B.
`
`Workerl
`Worker2
`Worker3
`
`15
`3
`
`TaskA TaskB TaskC
`7
`4
`8
`
`9 1
`
`Finding an optimal assignment does not necessary imply that each
`worker will be assigned to the task for which their cost is minimal.
`In this example, the optimal assignment is made by all grey boxes
`and its total cost is 14.
`
`4. BinSlayer
`BinDiff is able to match CG/CFG nodes with a high degree of ac-
`curacy, however, it leaves it to the analyst to deal with unmatched
`sets of functions from compared executables, and it may be a long
`and fastidious process to manually sort them and decide which
`have been deleted, inserted or modified. To address this problem,
`BinSlayer applies the Hungarian algorithm to the unmatched ver-
`tices in gA and gB with costs assigned as follows:
`± 13i — i j ±
`ci,l =
`+ +71
`3
`'
`ce,3 = ce/3 + / 3' + 73
`
`el ,E =
`
`^/j/
`
`Pg. 4
`
`
`
`where (xi, /3j and ry are, as previously stated, the number of basic
`blocks in the function at vertex ai of cA, the number of edges
`in ai and the number of function calls in ai. Likewise
`and 7.; represent the corresponding values for the vertex bj of
`cB. These costs are designed to reflect whether or not an edit
`operation represents a strong modification of the graph. The cost
`ci,,j quantifies the structural difference between vertices ai and
`bj, as characterised by differences in their aj, /3i and ry values.
`Observe that ci, j
`and ci, j < ce, j reflecting that deletion and
`insertion are both stronger than a substitution operation.
`BinSlayer also applies the Hungarian algorithm in the same way
`to the problem of matching basic blocks. This can be achieved
`merely by replacing aj, /3i and ry with structural measures appro-
`priate for blocks, again using the selector used for matching blocks
`in BinDiff [9]. In this case, a
`is assigned to the number of blocks
`on the shortest path from the given block to a function exit; /3i is set
`to the number of blocks on the shortest path from the entry point
`into a function to the given block; and ry is defined to be the num-
`ber of function calls within the block. There is no reason why these
`measures could not be augmented with others [9].
`Assigning non-uniform costs to the edit operations of deletion,
`insertion and substitution leads to a generalised notion of GED
`[29] where the cost of an edit path is defined to be the sum of the
`costs of the component edit operations. Moreover this new notion
`of GED can be normalised, to give a metric between 0 and 1 for the
`similarity (or, in fact, dissimilarity) of two binaries:
`
`Definition 4.1 (Graph Dissimilarity). The dissimilarity (5(gA, cs)
`between two graphs cA and cB is a real value on the interval [0, 11,
`where 0 indicates they are identical whereas a value near 1 implies
`that they are highly dissimilar:
`
`ged(cA, gB)
`g(gA,gB) =
`(ErL, + /31 + 71) + (Ejm=i a ij + /3/j +7 /j)
`where EcA and EcB are the edges of graphs g A and gB respec-
`tively and ged(cA, cs) is the GED between graphs g A and cs.
`
`Observe that the numerator never exceeds the denominator, though
`it can equal it, and hence dissimilarity is guaranteed to give a
`variable in the interval [0, 11.
`Therefore our approach to binary comparison is quite straight-
`forward; first use the BinDiff algorithm to match as many nodes
`in the executables as possible with a high degree of confidence,
`and second use the Hungarian algorithm to match the remaining
`unmatched nodes. Due to the inaccuracies of the Hungarian algo-
`rithm, we have written a validation algorithm to attempt to correct
`the edit path it produces. The validator uses both BinDiff and the
`Hungarian algorithm to improve the accuracy of the matches. Fi-
`nally, a normalisation of the GED is calculated as a metric of graph
`similarity to present to the user.
`
`4.1 Validator
`The Hungarian algorithm is used to find an edit path between two
`binaries that has the minimum overall GED. However, the resultant
`edit path will typically contain errors, where structurally dissimilar
`functions have been substituted one for the other because the edit
`cost for substitution is lower than that of insertion or deletion. The
`validator, Algorithm 4, attempts to correct these errors by adjusting
`the costs of edit operations for functions to better reflect structural
`differences.
`Since it is not clear exactly how BinDiff forms its similarity
`measurement (see section 2.3), we created a simple metric for
`our implementation of BinDiff to measure similarity, based on the
`number of matches it found as a fraction of the maximum number
`of vertices in the compared binaries:
`
`Definition 4.2 (Match Similarity). For a given match .A4 between
`two graphs, cA and cB, the match similarity is defined as:
`6m(gA,CB) = I-Mlimax(IVgAII IVgBD
`The validator checks the assignments made for functions (and only
`functions) against BinDiff, using this similarity measurement to
`decide how good the matches made by the Hungarian algorithm
`actually were. It takes a list of vertex substitutions made by the
`Hungarian algorithm, P, and a threshold t. For each substitution
`{ai
`bj} E P, it compares the structural similarity of the func-
`tion at ai with the function at bj. The CFGs of these functions
`are denoted gai and gb, respectively. Note, the vertices ai and bj
`are actually functions. If aj is sufficiently similar to bj, as deter-
`mined by c5x4 and the similarity threshold t, then the substitution
`{aj
`bj} is added to P'. Otherwise, the cost, ci, j of the substitu-
`tion in the cost matrix is doubled, so as to decrease the likelihood of
`this substitution being generated again by the Hungarian algorithm.
`In addition ai and bj are added to the sets of unmatched vertices
`in readiness for the next application of the Hungarian algorithm.
`When all input substitutions have been considered, the Hungarian
`algorithm is reapplied to the adjusted cost matrix, yielding a new
`set of substitutions. This should increase the accuracy of the match-
`ing.
`
`Algorithm 4: Validator
`1 functiono validator(P, t);
`2 Ui
`3 U2
`0;
`0;
`4P/
`bj} E P do
`5 foreach { ai
`binDiff(gai gb; );
`(M, Si, Sj)
`6
`if sm(ga ,gb;)< t then
`7
`8
`Ui 4— U].
`fail;
`9
`U2 4— U2 L-J {bj};
`ci,j
`2ci,j;
`10
`else L
`11
`"P' u {ai
`bi};
`12
`13 return P'u hungarian(Ui , U2);
`
`Implementation Details
`4.2
`BinSlayer is implemented in a highly configurable C++ program
`with a Qt based GUI. The user is presented with a clear list of
`matched and unmatched nodes, including the source of the match
`(BinDiff/Hungarian algorithm/validator), is able to correct matches
`manually and to view the CFGs and binary code itself.
`As far as we can tell from published sources, most other bi-
`nary comparison implementations use the industry standard IDA
`Pro disassembler to recover the programs CFG, however, IDA Pro
`is known to struggle in this task, especially with obfuscated bina-
`ries. We decided instead to use Dynlnst [1], which excels at CFG
`reconstruction, able to recover the full CFG for many obfuscated
`and stripped binaries, and even succeeding at determining indirect
`jump targets.
`
`5. Experimental Results
`To investigate the accuracy of composing BinDiff with the Hun-
`garian algorithm we have compared the number of matched func-
`tions against the number found by the vanilla BinDiff algorithm.
`Our evaluation focused on various versions of the coreutils suite of
`programs. Coreutils consists of various Unix shell utilities which
`
`Pg. 5
`
`
`
`Coreutils: v6.10 VS v8.19
`
`Coreutils: v8.15 VS v8.19
`
`• Reference
`O BinDiff
`• BinDiff+Hunganan
`• BinDiff+Hunganan+validaror
`• Hungarian
`• Hungarian-W.0.er
`
`250 —
`240 —
`230 —
`220 —
`210 —
`200 —
`190 —
`180 —
`170 —
`160 —
`150 —
`140 —
`130 —
`120 —
`110 —
`100 —
`90 —
`80 —
`70 —
`60 —
`50 —
`40 —
`30 —
`20 —
`10 —
`0
`
`Number of Correctly Matched Symbols
`
`• Reference
`
`• BinDiff+Hunganan
`• BinDiff+Hunganan+validaror
`• Hungarian
`• Hungarian-W.0.er
`
`cat
`
`dd
`
`Is
`
`chown chroot
`
`cp
`
`rm
`
`base64
`
`cat
`
`dd
`
`Is
`
`chown ch oot
`
`cp
`
`rm
`
`base64
`
`Coreutils: v6.10 VS v8.19
`
`Coreutils: v8.15 VS v8.19
`
`• Reference
`• BinDiff+Hungarian
`• BinDiff+Hungarian,yaridator
`• Hungarian
`• Hungarian-O/.0.er
`
`250 —
`240 —
`230 —
`220 —
`210 —
`200 —
`190 —
`180 —
`170 —
`160 —
`150 —
`140 —
`130 —
`120 —
`110 —
`100 —
`90 —
`80 —
`70 —
`60 —
`50 —
`40 —
`30 —
`20 —
`10 -
`0-
`
`Number of Incorrect Matched Symbols
`
`• Reference
`• BinDiff+Hungarian
`• BinDiff+Hungarian,yaridator
`• Hungarian
`• Hungarian-O/.0.er
`
`200 —
`190 —
`180 —
`170 —
`160 —
`150 —
`140 —
`130 —
`120 —
`110 —
`100 —
`90 —
`80 —
`70 —
`60 —
`50 —
`40 —
`30 —
`20 —
`10 —
`0
`
`200 —
`190 —
`180 —
`170 —
`160 —
`150 —
`140 —
`130 —
`120 —
`110 —
`100 —
`90 —
`80 —
`70 —
`60 —
`50 —
`40 —
`30 —
`20 —
`10 —
`o -
`
`Number of Correctly Matched Symbols
`
`Number of Incorrect Matched Symbols
`
`cat
`
`dd
`
`s
`
`chown chroot
`
`cp
`
`m
`
`base64
`
`cat
`
`dd
`
`chown chroot
`
`cp
`
`rm
`
`base64
`
`Figure 2. Number of Correctly Matched Functions
`
`are attractive for the purposes of analysis because many versions
`are readily available. Moreover the relatively small size of the utili-
`ties facilitates hand checking. For the purposes of assessing perfor-
`mance we chose to compare eight different versions of the libxml2
`dynamically linked library for XML file handling. This binary is
`sufficiently large to enable timing differences to be observed. We
`have investigated scalability by measuring the running time of both
`the BinDiff algorithm and the Hungarian algorithm against the size
`of the executable. We chose 12 executables with sizes between 200
`kilobytes and 22 megabytes.
`
`To assess accuracy and more specifically to measure how many
`of the matches were actually correct or incorrect, we ran BinDiff
`using symbols as the only selector. This is guaranteed to match
`all named functions and therefore provides an upper bound on the
`best possible matching, hereafter referred to as the reference. Note
`that we cannot assess correct matching of basic blocks or unnamed
`functions because there is no automatic way to establish if a match
`is correct or not. We thus used named functions because names
`suggest that they share provenance and should be matched. As
`noted in section 1, we consider matches between different versions
`
`Pg. 6
`
`
`
`of the same function as correct, even if they are not structurally
`identical as is required by BinDiff.
`The graphs of Figure 2 present the reference as the leftmost
`column for each of the eight binaries. For the top panes, which
`show correct matches, the second column gives the number of