Malware and Machine Learning
`
`Charles LeDoux and Arun Lakhotia
`
`Abstract Malware analysts use Machine Learning to aid in the fight against
`the unstemmed tide of new malware encountered on a daily, even hourly, basis.
`The marriage of these two fields (malware and machine learning) is a match made
`in heaven: malware contains inherent patterns and similarities due to code and code
`pattern reuse by malware authors; machine learning operates by discovering inherent
`patterns and similarities. In this chapter, we seek to provide an overhead, guiding
`view of machine learning and how it is being applied in malware analysis. We do
`not attempt to provide a tutorial or comprehensive introduction to either malware or
`machine learning, but rather the major issues and intuitions of both fields along with
`an elucidation of the malware analysis problems machine learning is best equipped
`to solve.
`
`1 Introduction
`
`Malware, short for malicious software, is the weapon of cyber warfare. It enables
`online sabotage, cyber espionage, identity theft, credit card theft, and many more
`criminal, online acts. A major challenge in dealing with the menace, however, is its
`sheer volume and rate of growth. Tens of thousands of new and unique malware are
`discovered daily. The total number of new malware has been growing exponentially,
`doubling every year over the last three decades.
`Analyzing and understanding this vast sea of malware manually is simply impos-
`sible. Fortunately for the malware analyst, very few of these unique malware are truly
`novel. Writing software is a hard problem, and this remains the case whether said
`software is benign or malicious. Thus, malware authors often reuse code and code
`patterns in creating new malware. The result is the existence of inherent patterns and
`similarities between related malware, a weakness that can be exploited by malware
`analysts.
`
`C. LeDoux (B) · A. Lakhotia
`Center for Advanced Computer Studies, University of Louisiana at Lafayette,
`PO Box 44330, Lafayette, LA 70504, USA
`e-mail: charles.a.ledoux@gmail.com
`
`A. Lakhotia
`e-mail: arun@louisiana.edu
`
`© Springer International Publishing Switzerland 2015
`R.R. Yager et al. (eds.), Intelligent Methods for Cyber Warfare,
`Studies in Computational Intelligence 563, DOI 10.1007/978-3-319-08624-8_1
`
`In order to capitalize on this inherent similarity and shared patterns between
`malware, the anti-malware industry has turned to the field of Machine Learning, a
`field of research concerned with “teaching” computers to recognize concepts. This
`“learning” occurs through the discovery of indicative patterns in a group of objects
`representing the concept being taught or by looking for similarities between objects.
`Though humans too use patterns in learning, such as using color, shape, sound, and
`smell to recognize objects, machines can find patterns in large swaths of data that
`may be gibberish to a human, such as the patterns in sequences of bits of a collection
`of malware. Thus, Machine Learning has a natural fit with Malware Analysis since
`it can more rapidly learn and find patterns in the ever growing corpus of malware
`than humans.
`Both Machine Learning and Malware Analysis are very diverse and varied fields
`with equally diverse and varied ways in which they overlap. In this chapter, we seek to
`provide a guiding, overhead cartography of these varied landscapes, focusing on the
`areas and ways in which they overlap. We do not seek to provide a comprehensive
`tutorial or introduction to either Malware or Machine Learning research. Instead,
`we strive to elucidate the major ideas, issues, and intuitions for each field, pointing
`to further resources when necessary. It is our intention that a researcher in either
`Malware Analysis or Machine Learning can read this chapter and gain a high-level
`understanding of the other field and the problems in Malware that Machine Learning
`has been, is being, and can be used to solve.
`
`2 A Short History of Malware
`
`The theory of malware is almost as old as the computer itself, tracing back to
`lectures by von Neumann in the late 1940s on self-reproducing automata [1]. The early
`malware, if they can be called such, did little more than demonstrate
`self-reproduction and propagation. For example, one of the earliest malware
`to escape “into the wild” was called Elk Cloner and would simply display a small
`poem every 50th time an infected computer was booted:
`
`Elk Cloner: The program with a personality
`
`It will get on all your disks
`It will infiltrate your chips
`Yes it's Cloner!
`
`It will stick to you like glue
`It will modify ram too
`Send in the Cloner!
`
`The term computer virus was coined in the early 1980s to describe such
`self-replicating programs [2]. The use of the term was influenced by the analogy of
`computer malware to biological viruses. A biological virus comes alive after it infects
`a living organism. Similarly, the early computer viruses required a host, typically
`another program, to be activated. This was necessitated by the limitations of the
`computing infrastructure of the time, which consisted of isolated, stand-alone machines.
`In order to propagate, that is, to infect currently uninfected machines, a computer virus
`necessarily had to copy itself onto various drives, tapes, and folders that would be
`accessed by different machines. In order to ensure that the viral code was executed
`when it reached the new machine, the virus code would attach itself to, i.e. infect,
`another piece of code (a program or boot sector) that would be executed when the
`drive or tape reached another machine. When the now infected code would later
`execute, so would the viral code, furthering the propagation.
`The early viruses remained mostly pranks. Any damage they caused, such as crash-
`ing a computer or exhausting disk space, was largely unintentional and a side effect of
`uncontrolled propagation. However, the number and spread of viruses quickly grew
`to enough of a nuisance that it led to the development of the first anti-virus companies in
`the late 1980s. Those early viruses were simple enough that they could be detected
`by specific sequences of bytes, i.e., signatures.
`The advent of networking, leading to the Internet, changed everything. Since
`data could now be transferred between computers without using an external storage
`device, so could the viruses. This freedom to propagate also meant that a virus no
`longer needed to infect a host program. A new class of malware, called the worm, emerged.
`A worm was a stand-alone program that could propagate from machine to machine
`without necessarily attaching to any other program.
`Malware writing, too, quickly morphed from simple pranks into malicious vandalism,
`such as that done by the ILOVEYOU worm. This worm came as an attachment
`to an email with the (unsurprising) subject line “ILOVEYOU”. When a user would
`open the attachment, the worm would first email itself to the user’s contacts and
`then begin destroying data on the current computer. There were a number of similar
`malware created, designed only to wreak havoc and gain underground notoriety for
`their authors. These “graffiti” malware, however, soon gave way to the true threat:
`malware designed to make money and steal secrets.
`Malware today has little if any resemblance to the malware of the past. For one,
`gone are the simple days of pranks and vandalism conducted by bored teenagers and
`budding hackers. Modern malware is a well-organized activity forming a complete
`underground economy with its own supply chain. Malware is now a tool used by large
`underground organizations for making money and a weapon used by governments
`for espionage and attacks. Malware targeted towards normal, everyday computers
`can be designed to steal bank and credit card information (for direct theft of money),
`harvest email addresses (for selling to spammers), or gain remote control of the
`computer. The major threat from malware, however, comes from malware targeted not
`towards the average computer, but towards a particular corporation or government.
`These malware are designed to facilitate theft of trade or national secrets, steal
`crucial information (such as sensitive emails), or attack infrastructure. For example,
`Stuxnet was malware designed to attack and damage various nuclear facilities in
`Iran. These malware often have large organizations (such as rival corporations) or
`even governments behind them.
`
`3 Types of Malware
`
`Whenever there is a large amount of information or data, it helps to categorize and
`organize it so that it can be managed. Classification also aids in communication
`between people, giving them a common nomenclature. The same is true of malware.
`The industry uses a variety of methods to classify and organize malware. The classi-
`fication is often based on the method of propagation, the method of infection, and the
`objective of the malware. There is, however, no known standard nomenclature that
`is used across the industry. Classifications sometimes also come with legal implications.
`For instance, can a program that inserts advertisements as you browse the
`web be termed malicious? What if the program was downloaded and installed by
`the user, say after being enticed by some free offering? To thwart legal notices the
`industry invented the term potentially unwanted program or PUP to refer to such
`programs.
`Though there is no accepted standard for classification of malware in the industry,
`there is a reasonable agreement on classifying malware on their method of propaga-
`tion into three types: virus, worm, and trojan (short for Trojan horse).
`Virus, despite being often used as a synonym for malware, technically refers to a
`malware that attaches a copy of itself to a host, as described earlier. Propagation by
`infecting removable media was the only method for transmission available prior to
`the Internet, and this method is still in use today. For instance, modern viruses travel
`by infecting USB drives. This method is still necessary to reach computer systems
`that are not connected to the Internet, and is hypothesized as the way Stuxnet was
`transmitted.
`A trojan propagates the same way its namesake entered the city of Troy, by hiding
`inside something that seems perfectly innocent. The earliest trojan was a game called
`ANIMAL. This simple game would ask the user a series of questions and attempt to
`guess what animal the user was thinking of. When the game was executed, a hidden
`program, named PERVADE, would install a copy of itself and ANIMAL to every
`location the user had access to. A common modern example of a trojan is a fake
`antivirus, a program that purports to be an anti-virus system but in fact is a malware
`itself.
`A worm, as mentioned earlier, is essentially a self-propagating malware. Whereas
`a virus, after attaching itself to a program or document, relies on an action from a
`user to be activated and spread, a worm is capable of spreading between network
`connected computers all by itself. This is typically accomplished in one of two ways:
`exploiting vulnerabilities on a networked service or through email. The worm CODE
`RED was an example of the first type of worm. CODE RED exploited a bug in a
`specific type of server that would allow a remote computer to execute code on the
`server. The worm would simply scan the network looking for a vulnerable server.
`Once found, it would attempt to connect to the server and exploit the known bug.
`If successful, it would create another instance of the worm that repeated the whole
`process. The ILOVEYOU worm, discussed earlier, is an example of an email worm
`and spread as an email attachment. When a user opened the attachment, the worm
`would email a copy of itself to everyone in the user’s contact list and damage the
`current machine.
`While the above methods of propagation are the most commonly known, they
`by no means represent all possible ways in which malware can propagate. In general,
`one of two methods is employed to get a malware onto a system: exploit a bug in
`software installed on the computer or exploit the trust (or ignorance) of the user of
`the computer through social engineering. There are many different types of software
`bugs that allow for arbitrary code to be executed and almost as many ways to trick
`a user into installing a malware. Complicating matters further, there is no technical
`reason for a malware to limit its use to only one method of propagation. It is entirely
`conceivable, as was demonstrated by Stuxnet, for a malware to enter a network
`through email or USB, and then spread laterally to other machines by exploiting
`bugs.
`
`4 Malware Analysis Pipeline
`
`The typical end goal of malware analysis is simple: automatically detect malware
`as soon as possible, remove it, and repair any damage it has done. To accomplish
`this goal, software running on the system being protected (desktop, laptop, server,
`mobile device, embedded device, etc.) uses some type of “signatures” to look for
`malware. When a match is made on a “signature”, a removal and repair script is
`triggered. The various portions of the analysis “pipeline” all in one way or another
`support this end goal [3, 4].
`The general phases of creating and using these signatures are illustrated by Fig. 1.
`Creating a signature and removal instructions for a new malware occurs in the “Lab.”
`The input into this malware analysis pipeline is a feed of suspicious programs to
`be analyzed. This feed can come from many sources such as honeypots or other
`companies. This feed first goes through a triage stage to quickly filter out known
`programs and assign an analysis priority to each sample. The remaining programs
`are then analyzed to discover what they look like and what they do. The results of the
`analysis phase are used to create a signature and removal/repair instructions which
`are then verified for correctness and performance concerns. Once verified, these
`signatures are propagated to the end system and used by a scanner to detect, remove,
`and repair malware.
`Each of the various phases of the anti-malware analysis process is attempting to
`accomplish a related, but independent task and thus has its own unique goals and
`performance constraints. As a result, each phase can independently be automated
`and optimized in order to improve the performance of the entire analysis pipeline.
`
`
`Fig. 1 Phases of the malware analysis pipeline
`
`In fact, it is almost a requirement that automation techniques be tailored for the
`specific phase they are applied in, even if the technique could be applied to multiple
`phases. For example, a machine learning algorithm designed to filter out already
`analyzed malware in the triage stage will most likely perform poorly as a scanner.
`While both the triage stage and the scanner are accomplishing the same basic task,
`detecting known malware, the standard by which they are evaluated is different.
`
`4.1 Triage
`
`The first phase of analysis, triage, is responsible for filtering out already analyzed
`malware and assigning analysis priority to the incoming programs. Malware ana-
`lysts receive a very large number of new programs for analysis every day. Many
`of these programs, however, are essentially the same as programs that have already
`been analyzed and for which signatures exist. A time stamp or other trivial detail
`may have been changed causing a hash of the binary to be unique. Thus, while the
`program is technically unique, it does not need to be reanalyzed as the differences are
`inconsequential. One of the purposes of triage is to filter these binaries out.
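As a minimal sketch of why an exact hash alone is not enough for this filtering, the following Python compares two hypothetical samples that differ only in an embedded timestamp. The sample layout, the use of byte 4-grams, and the Jaccard measure are illustrative assumptions on our part, not a technique prescribed by this chapter.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Exact fingerprint: any single-byte change yields a different digest."""
    return hashlib.sha256(data).hexdigest()


def byte_ngrams(data: bytes, n: int = 4) -> set:
    """Set of all overlapping n-byte substrings of the input."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard similarity of the two n-gram sets (1.0 means identical sets)."""
    grams_a, grams_b = byte_ngrams(a, n), byte_ngrams(b, n)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 1.0


# Hypothetical "binaries": a 4-byte timestamp header followed by the same body.
body = bytes(range(256)) * 4
sample_a = (1_400_000_000).to_bytes(4, "little") + body
sample_b = (1_400_000_123).to_bytes(4, "little") + body

print("hashes equal:", sha256_hex(sample_a) == sha256_hex(sample_b))
print("n-gram similarity:", round(jaccard(sample_a, sample_b), 3))
```

The digests differ even though the n-gram similarity stays close to 1, which is the kind of signal a triage filter could use to discard inconsequential variants.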
`In addition to filtering out “exact” matches (programs that are essentially the
`same as already analyzed programs), triage is typically also tasked with assigning
`the incoming programs into malware families when possible. A malware family is
`a group of highly related malware, typically originating from common source code.
`If an incoming program can be assigned to a known malware family, any further
`analysis does not need to start with zero a priori knowledge, but can leverage general
`knowledge about the malware family, such as known intent or purpose.
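Family assignment can be sketched as a nearest-exemplar lookup with a similarity threshold. Everything in the example below is hypothetical: the family names, the text stand-ins for binaries, the n-gram similarity, and the threshold of 0.5 are assumptions chosen only to illustrate the idea.

```python
from typing import Dict, List, Optional, Set


def byte_ngrams(data: bytes, n: int = 4) -> Set[bytes]:
    """Set of all overlapping n-byte substrings of the input."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def similarity(a: Set[bytes], b: Set[bytes]) -> float:
    """Jaccard similarity between two n-gram sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def assign_family(sample: bytes,
                  families: Dict[str, List[bytes]],
                  threshold: float = 0.5) -> Optional[str]:
    """Return the best-matching family name, or None if nothing is close enough."""
    grams = byte_ngrams(sample)
    best_name, best_score = None, 0.0
    for name, exemplars in families.items():
        for exemplar in exemplars:
            score = similarity(grams, byte_ngrams(exemplar))
            if score > best_score:
                best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Hypothetical exemplars; readable text stands in for real binaries.
KNOWN_FAMILIES = {
    "alpha": [b"open socket; read keystrokes; send to server; sleep; " * 4],
    "beta": [b"enumerate files; encrypt each file; show ransom note; " * 4],
}
variant = b"open socket; read keystrokes; send to server; pause; " * 4
unrelated = b"draw window; play music; save high score; " * 4

print(assign_family(variant, KNOWN_FAMILIES))
print(assign_family(unrelated, KNOWN_FAMILIES))
```

A sample that falls below the threshold for every family is left unassigned and would proceed to full analysis with no prior knowledge.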
`A final purpose of the triage stage is to assign analysis priority to incoming
`programs. Humans still are and most likely will remain an integral part of the analysis
`pipeline. Like any other resource, what the available human labor is expended upon
`must be carefully chosen. Not all malware are created equal; it is more important
`that some malware have signatures created before others. For example, malware that
`only affects, say, Microsoft Windows 95 will not have the same priority as malware
`that affects the latest version of Windows.
`The performance concerns for the triage phase are (1) ensuring that programs
`being filtered out truly should be removed and (2) efficient computation in order to
`achieve very high throughput. Programs filtered out by triage are not subjected to
`further analysis and thus it is very important that they do not actually need further
`analysis. Especially dangerous is the case of malware being filtered out as a benign
`program. In this case, that particular malware will remain undetectable. Marking
`a known malware or a benign program as malware for further processing, while
`undesirable, is not disastrous as it can still be filtered out in the later processing stages.
`Along the same lines, it is sufficient that malware be assigned to a particular family
`with only a reasonably high probability rather than near certainty. Finally, speed is
`of the utmost importance in this stage. This stage of the analysis pipeline examines
`the largest number of programs and thus requires the most efficient algorithms.
`Computationally expensive algorithms at this stage would cause a backlog so great
`that analysts would never be able to keep up with malware authors.
`
`4.2 Analysis
`
`In the analysis phase, information about what the program being analyzed does, i.e.
`its behavior, is gathered. This can be done in two ways: statically or dynamically.
`Static analysis is performed without executing the program. Information about
`the behavior of the program is extracted by disassembling the binary, converting
`its machine code back into human-readable form. The result is not high-level source code, such as
`C++, but low-level assembly language. An assembly language is the human-readable
`form of the instructions being given directly to the processor. ARM, PowerPC,
`and x86 are among the better-known examples of assembly languages. After disassembly,
`the assembly code (often just called the malware “code” for short) can be analyzed
`to determine the behavior of the program. The methods for doing this analysis con-
`stitute an entire research field called program analysis and as such are outside the
`scope of this chapter. Nielson et al. [5] have a comprehensive tutorial to this field.
`Static analysis can theoretically provide perfect information about the behavior of
`a program, but in practice provides an over approximation of the behaviors present.
`Only what is in the code is what can be executed, thus the code contains everything
`the program can do. However, extracting this information from a binary can be
`difficult, if not impossible. Perfectly solving many of the problems of static analysis
`is undecidable.
`As an example of the problems faced by static analysis, binary disassembly is
`itself an undecidable problem. Binaries contain both data and code and separating
`the two from each other is undecidable. As a result some disassemblers treat the entire
`binary, including data, as if it were code. This results in a proper extraction of most
`of the original assembly code, along with much code that never originally existed.
`There are many other methods of disassembly, such as the recent work by Schwarz
`et al. [6]. While these methods significantly improve on the resulting disassembly,
`none can guarantee correct disassembly. For instance, it is possible that there exists
`“dead code” in the original binary, i.e. code that can never be reached at runtime.
`In an ideal disassembly, such code ought to be excluded. Thus all of static analysis
`operates on approximations. Most disassemblers used in practice do not guarantee
`either over approximation or under approximation.
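The effect of treating data as code can be illustrated with a toy example. The instruction set below is entirely hypothetical (it is not any real architecture), and the two strategies shown, a linear sweep over every byte and a traversal that follows control flow from the entry point, are simplified sketches rather than the algorithms of any particular disassembler.

```python
# Hypothetical toy instruction set: opcode -> (mnemonic, length in bytes).
OPCODES = {0x00: ("HALT", 1), 0x01: ("PUSH", 2), 0x02: ("ADD", 1),
           0x03: ("JMP", 2), 0x04: ("PRINT", 1)}


def decode(binary: bytes, pc: int):
    """Decode one instruction at pc; undecodable bytes become 1-byte 'DB' entries."""
    name, length = OPCODES.get(binary[pc], ("DB", 1))
    if pc + length > len(binary):
        name, length = "DB", 1
    operand = binary[pc + 1] if length == 2 else None
    return name, length, operand


def linear_sweep(binary: bytes):
    """Treat the entire binary, including embedded data, as code."""
    pc, listing = 0, []
    while pc < len(binary):
        name, length, operand = decode(binary, pc)
        listing.append((pc, name, operand))
        pc += length
    return listing


def recursive_traversal(binary: bytes, entry: int = 0):
    """Decode only the bytes reachable by following control flow from entry."""
    seen, listing, worklist = set(), [], [entry]
    while worklist:
        pc = worklist.pop()
        while 0 <= pc < len(binary) and pc not in seen:
            seen.add(pc)
            name, length, operand = decode(binary, pc)
            listing.append((pc, name, operand))
            if name == "HALT":
                break
            if name == "JMP":
                worklist.append(operand)
                break
            pc += length
    return sorted(listing)


# Layout: JMP 5 | three data bytes | PUSH 7 | PRINT | HALT
BINARY = bytes([0x03, 0x05, 0x01, 0x02, 0x04, 0x01, 0x07, 0x04, 0x00])

for pc, name, operand in linear_sweep(BINARY):
    print("sweep    ", pc, name, operand)
for pc, name, operand in recursive_traversal(BINARY):
    print("traversal", pc, name, operand)
```

The sweep reports a PUSH and a PRINT at offsets 2 and 4 that were never instructions, while the traversal skips them. The traversal is not a guarantee either: in this sketch it would miss any code reached only through a computed jump.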
`Dynamic analysis, in contrast with static analysis, is conducted by actually exe-
`cuting the program and observing what it does. The program can be observed from
`either within or without the executing environment. Observation from within uses the same tools
`and techniques software developers use to debug their own programs. Tools that
`observe the operating system state can be utilized and the analyzed program run in
`a debugger. Observation from without the execution environment occurs by using a
`specially modified virtual machine or emulator. The analyzed program is executed
`within the virtual environment and the tools providing the virtualization observe and
`report the behavior of the program.
`Dynamic analysis, as opposed to static analysis, generally provides an under
`approximation of the behaviors contained in the analyzed program, but guarantees
`that returned behaviors can be exhibited. Behaviors discovered by dynamic analysis
`are obviously guaranteed to be possible as the program was observed performing
`these behaviors. Only the observed behaviors can be returned, however. A single
`execution of a program is not likely to exhibit all the behaviors of the program as
`only a single path of execution through the binary is followed per run. A differing
`execution environment or differing input may reveal previously unseen behaviors.
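A small simulation makes the under-approximation concrete. The "program" below is a hypothetical stand-in written for this illustration, and the behavior names and environment keys are invented; the point is only that each run reveals the behaviors on one path.

```python
from typing import Dict, Iterable, Set


def toy_sample(env: Dict[str, object]) -> Set[str]:
    """Hypothetical program under analysis; returns the behaviors one run exhibits."""
    observed = {"read_config"}
    if env.get("os") == "windows":
        observed.add("write_registry")
    if env.get("network"):
        observed.add("contact_server")
        if env.get("date") == "2015-01-01":
            observed.add("download_update")
    return observed


# Ground truth is known here only because we wrote the toy program ourselves.
ALL_BEHAVIORS = {"read_config", "write_registry", "contact_server", "download_update"}


def analyze(environments: Iterable[Dict[str, object]]) -> Set[str]:
    """Union of the behaviors observed across several executions."""
    seen: Set[str] = set()
    for env in environments:
        seen |= toy_sample(env)
    return seen


single_run = analyze([{"os": "linux", "network": False}])
several_runs = analyze([
    {"os": "linux", "network": False},
    {"os": "windows", "network": True},
])

print("one run:     ", sorted(single_run))
print("several runs:", sorted(several_runs))
print("still unseen:", sorted(ALL_BEHAVIORS - several_runs))
```

Every reported behavior was actually exhibited, but the date-triggered behavior stays hidden until some run happens to supply the right environment.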
`
`4.3 Signatures and Verification
`
`While the most common image conjured by the phrase “malware signatures” is
`specific patterns of bytes (often called strings) used by an Anti-Virus system to detect
`a malware, we do not use the term in that restricted sense. What we mean by signature
`is any method utilized for determining if a program is malware. This can include the
`machine learning system built to recognize malware, a set of behaviors marked as
`malicious, a white list (anything not on the white list is marked as malicious), and
`more. The important thing about a signature is that it can be used to determine if a
`program is malware or not.
`Along with the signatures, instructions for how to remove malware that has
`infected the system and repair any damage it has done must also be created. This
`is usually done manually, utilizing the results of the analysis stage. Observe what
`the malware did, and then reverse it. One major concern here is ensuring that the
`repair instructions do not cause even more damage. If the malware changed a registry
`key, for example, and the original key is unknown, it may be safest to just leave the
`key alone. Changing it to a different value or removing it all together may result
`in corrupting the system being “protected.” Thus repair instructions are often very
`conservative, many times only removing the malware itself.
`Once created, the signatures need to be verified for correctness and, more impor-
`tantly, for accuracy. Even more important than creating a signature that matches the
`malware is creating a signature that only matches the malware. Signatures that also
`match benign programs are worse than useless; they are acting like malware them-
`selves! Saying that benign programs are actually malware, called a false positive,
`is an error that cannot be tolerated once the signatures have been deployed to the
`scanner.
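The verification step can be sketched as measuring a candidate signature against both a malware set and a benign set. The byte strings and sample contents below are invented for illustration, and a simple substring match stands in for whatever matching method a real signature would use.

```python
from typing import Dict, List


def matches(signature: bytes, program: bytes) -> bool:
    """String-style signature: the byte pattern occurs somewhere in the program."""
    return signature in program


def verify(signature: bytes, malware: List[bytes], benign: List[bytes]) -> Dict[str, float]:
    """Fraction of malware detected and fraction of benign programs wrongly flagged."""
    detected = sum(matches(signature, m) for m in malware)
    false_positives = sum(matches(signature, b) for b in benign)
    return {
        "detection_rate": detected / len(malware),
        "false_positive_rate": false_positives / len(benign),
    }


# Hypothetical corpora; real ones would hold many thousands of files.
MALWARE = [b"MZ..\xde\xad\xbe\xefEVIL..CreateFile", b"\x90\x90\xde\xad\xbe\xefEVILCreateFile"]
BENIGN = [b"MZ notepad CreateFile ReadFile", b"MZ calculator", b"MZ editor CreateFile"]

too_generic = verify(b"CreateFile", MALWARE, BENIGN)
specific = verify(b"\xde\xad\xbe\xefEVIL", MALWARE, BENIGN)

print("generic signature: ", too_generic)
print("specific signature:", specific)
```

Both candidates detect every malware sample, but only the second does so without flagging benign programs, so only the second would pass verification.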
`
`4.4 Application
`
`Once created, the signatures are deployed to the end user. At the end system, new
`files are scanned using the created signatures. When a file matches a signature, the
`associated repair instructions are followed.
`The functionality of the scanner will depend on the type of signature created.
`String based signatures will use a scanner that checks for existence of the string in
`the file. A scanner based on Machine Learning signatures will apply what has been
`learned through ML to detect malware. A rule based scanner will check if the file
`matches its rules, and so on and so forth.
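One way to picture a scanner that supports several signature types is to give every signature the same interface: a predicate over the file contents plus the associated repair instructions. The class, helper names, patterns, and repair strings below are hypothetical and serve only as a sketch of that design.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Signature:
    name: str
    matches: Callable[[bytes], bool]  # any method of deciding "is this malware?"
    repair: str                       # stand-in for removal/repair instructions


def string_signature(name: str, pattern: bytes, repair: str) -> Signature:
    """Signature that fires when the byte string occurs in the file."""
    return Signature(name, lambda data: pattern in data, repair)


def rule_signature(name: str, rules: List[Callable[[bytes], bool]], repair: str) -> Signature:
    """Signature that fires only when every rule holds for the file."""
    return Signature(name, lambda data: all(rule(data) for rule in rules), repair)


def scan(data: bytes, signatures: Iterable[Signature]) -> Optional[Signature]:
    """Return the first matching signature, or None if the file looks clean."""
    for signature in signatures:
        if signature.matches(data):
            return signature
    return None


SIGNATURES = [
    string_signature("toy.string", b"\xde\xad\xbe\xefEVIL", "delete file"),
    rule_signature("toy.rules",
                   [lambda d: d.startswith(b"MZ"), lambda d: b"keylog" in d],
                   "quarantine file"),
]

hit = scan(b"MZ...keylog...", SIGNATURES)
print(hit.name if hit else "clean", "->", hit.repair if hit else "no action")
```

A signature backed by a learned model would fit the same interface by wrapping the model's prediction in the `matches` predicate.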
`
`5 Challenges in Malware Analysis
`
`One of the fundamental problems associated with every step of the malware analysis
`pipeline is the reliance on incomplete approximations. In every stage of the pipeline,
`the exact solution is generally impossible. Triage cannot perfectly identify every
`part of every program that has already been identified. Analysis will generate either
`potentially inaccurate or incomplete information. All types of signatures are limited.
`Even verification is limited by what can be practically tested.
`Naturally, malware authors have developed techniques that directly attack each
`stage of the analysis pipeline and shift the error in the inherent approximations to their
`favor. Packing and code morphing are used against triage to increase the number of
`“unique” malware that must be analyzed. Packing, tool detection, and obfuscation are
`used against the analysis stage to increase the difficulty of extracting any meaningful
`information.
`While the ultimate goal of the malware authors is obviously to completely avoid
`detection, simply increasing the difficulty of achieving detection can be considered a
`“win” for the malware authors. The more resources consumed in analyzing a single
`malware, the fewer malware can be analyzed and detected overall. If this singular
`cost is driven high enough, then detection of any but the most critical malware simply
`becomes too expensive.
`
`5.1 Code Morphing
`
`The most common and possibly the most effective attack against the malware analysis
`pipeline targets the first stage: triage. The attack is to simply inundate the pipeline
`with as many unique malware as possible. Unique is not used here to mean novel,
`i.e. does something unique; here it simply means that the triage stage considers it
`something that has not been analyzed before. Analysis stages further down the pipe
`from Triage are allowed to be more expensive because it is assumed Triage has
`filtered out already analyzed malware, severely reducing the number of malware the
`expensive processes are run on. By slipping more malware past Triage and forcing
`the more expensive processes to run, the cost of analysis can be driven up, possibly
`prohibitively high.
`One of the ways this attack is accomplished is through automated morphing of
`the malware’s code into a different but semantically equivalent form. Such malware
`is often called metamorphic or polymorphic. Before infecting a new computer, a
`rewriting engine changes what the code looks like through such means as altering
`control flow, utilizing different instructions, and adding instructions that have no
`semantic effect. The changes performed by the rewriting engine only change the
`look or syntax of the code and leave its function or semantics intact. The result is
`that each “generation” of metamorphic malware is functionally equivalent, but the
`code can be radically different.
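The following sketch shows the idea on a harmless arithmetic program for a hypothetical stack machine. The instruction names, the two rewriting rules (inserting no-ops and replacing "multiply by 2" with "add to itself"), and the hash-based fingerprint are assumptions made for illustration only.

```python
import hashlib
import random


def run(program, x):
    """Interpret a toy stack program with x as the initial stack content."""
    stack = [x]
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "DUP":
            stack.append(stack[-1])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        # "NOP" has no semantic effect.
    return stack[-1]


def morph(program, rng):
    """Rewrite the syntax of the program while preserving its semantics."""
    out, i = [], 0
    while i < len(program):
        if program[i:i + 2] == [("PUSH", 2), ("MUL", None)]:
            out += [("DUP", None), ("ADD", None)]  # x * 2 == x + x
            i += 2
        else:
            out.append(program[i])
            i += 1
        if rng.random() < 0.5:
            out.append(("NOP", None))
    # Guarantee at least one change per generation.
    out.insert(rng.randrange(len(out) + 1), ("NOP", None))
    return out


def fingerprint(program) -> str:
    """Hash of the program text, standing in for a hash of the binary."""
    return hashlib.sha256(repr(program).encode()).hexdigest()


ORIGINAL = [("PUSH", 2), ("MUL", None), ("PUSH", 3), ("ADD", None)]  # computes 2x + 3
rng = random.Random(0)
gen1 = morph(ORIGINAL, rng)
gen2 = morph(gen1, rng)

for label, prog in (("original", ORIGINAL), ("gen1", gen1), ("gen2", gen2)):
    print(label, fingerprint(prog)[:12], run(prog, 5))
```

Each generation has a different fingerprint yet computes the same function, which is exactly what defeats a triage stage that relies on exact matching.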
`While several subtle variations in definitions exist, we view the difference between
`metamorphic and polymorphic malware as where the rewriting engine lies. Metamor-
`phic malware contains its own, internal rewriting engine, that is, the malware binary
`rewrites itself. Polymorphic malware, on the other hand, has a separate mutating
`engine; a separate binary rewrites the malware binary. This mutating engine can
`either be distributed with the malware (client side) or kept on a distribution server
`that simply distributes a different version of the malware every time (server side).
`Metamorphic malware is more limited than polymorphic malware in the transfor-
`mations it can safely perform. Any rewriting engine is going to contain limitations
`as to what it can safely take as input. If the engine is designed to modify the con-
`trol flow of the program, for example, it will only be able to rewrite programs for
`which it can identify the existing control flow. Since metamorphic malware contains
`its own rewriting engine, the output of the rewriting engine must be constrained to
`acceptable input. Without this constraint, further mutations would not be possible.
`Polymorphic malware, however, does not contain this constraint. Since the rewriting
`engine is separate and can thus always operate over the exact same input, the output
`does not need to be constrained to only acceptable input.
`
`5.2 Packing
`
`Packing is a process whereby an arbitrary executable is taken and encrypted and
`compressed into a “packed” form that must be uncompressed and decrypted, i.e.
`“unpacked”, before execution. This packed version of the executable is then pack-
`aged as data inside another executable that will decompress, decrypt, and run the
`original code. Thus, the end result is a new binary that looks very different from the
`original, but when executed performs the exact same task, albeit with some additional
`unpacking work. A program that does packing is referred to as a packer and the newly
`created executable is called the packed executable.
`Packing directly attacks Triage and static analysis. While packing a binary does not
`modify any of the malware’s code, it drastically modifies the binary itself, potentially
`even changing a number of statistical properties. If there is some randomization
`within the packing routine, a binary that appears truly unique will result every time
`the exact same malware is packed. Unless the Triage stage can first unpack the binary,
`it will not be able to match it to any known malware.
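The effect of randomized packing on exact matching can be sketched in a few lines. This is a simplification under stated assumptions: `zlib` compression plus a repeating-key XOR stands in for a real packer's compression and encryption (XOR is not secure encryption), and the `unpack` function plays the role of the unpacking stub that a real packed executable would carry as code.

```python
import hashlib
import os
import zlib


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR; applying it twice with the same key restores the input."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def pack(payload: bytes) -> bytes:
    """Compress, then 'encrypt' with a fresh random key stored in front of the data."""
    key = os.urandom(16)
    return key + xor_bytes(zlib.compress(payload), key)


def unpack(packed: bytes) -> bytes:
    """Stand-in for the unpacking stub: recover the original bytes."""
    key, body = packed[:16], packed[16:]
    return zlib.decompress(xor_bytes(body, key))


# Hypothetical payload; harmless text stands in for an executable.
payload = b"the same program bytes every time " * 32
first, second = pack(payload), pack(payload)

print("packed copies identical:", first == second)
print("hashes:", hashlib.sha256(first).hexdigest()[:12],
      hashlib.sha256(second).hexdigest()[:12])
print("unpacks to original:", unpack(first) == payload == unpack(second))
print("sizes:", len(payload), "->", len(first))
```

Two packings of the same payload share no exact fingerprint, yet both unpack to identical bytes, so matching has to happen after unpacking.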
`Packing does more than simply complicate the triage stage; it also directly attacks
`any use of static analysis. As discussed in Sect. 4.2, the first step in static analysis
`is usually to disassemble the binary. Packing, however, often encrypts the original
`binary, preventing direct disassembly. A disassembler will not be able to meaningfully
`interpret the stored bits unless the binary is first unpacked and the original binary
`recovered.
`Unpacking a program (recovering the original binary) is usually not a straightforward
`task, hence the existence of a challenge. As one might expect, there exist
`very complex packers intentionally designed to foil unpacking. Some packers, for
`example, only decrypt a single instruction at a time while others never fully unpack
`the binary and instead run the packed program in a virtual machine with a randomly
`created instruction set.
`It might seem that simply detecting that an executable was packed would be
`sufficient to determine that it was malware. There are, however, legitimate uses for
`packing. First, packing is capable of reducing the overall size of the binary. The
`compression rate of the original binary is often large enough that even with the
`additional unpacking routine (which can be made fairly small), the packed binary
`is smaller in size than the original binary. Of course, when size is the only concern,
`the encryption part of packing is unnecessary. So, perhaps detecting encryption is
`sufficient? Unfortunately, no. Encryption has a legitimate application in protecting
`intellectual property. A software developer may compress and encrypt the executables
`they sell and ship to prevent a competitor from reversing the program and discovering
`trade secrets.
`
`5.3 Obfuscation
`
`While packing attempts to create code that cannot be interpreted at all, obfuscation
`attempts to make extracting meaning from the code, statically or dynamically, as dif-
`ficult as possible. In general, obfuscation refers to writing or transforming a program
`into a form that hides its true functionality. The simplest example of a source code
`obfuscation is to give all variables meaningless names. Without descriptive names,
`the analyst must determine the purpose of each variable. At the binary level, examples
`of obfuscation include adding dead code (valid code that is never executed), inter-
`leaving several procedures within each other, and running all control flow through a
`single switch statement (called control flow flattening). An in-depth treatment of code
`obfuscation, including methods for deobfuscating the code, is given by Collberg and
`Nagra [7].
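Control flow flattening can be illustrated at the source level with a small, harmless function. The example below is our own sketch: the function, the state numbering, and the use of an if/elif chain in place of a switch statement are illustrative choices, not the output of any particular obfuscator.

```python
def original(n: int) -> int:
    """Sum of 1..n written with ordinary structured control flow."""
    total = 0
    i = 1
    while i <= n:
        total += i
        i += 1
    return total


def flattened(n: int) -> int:
    """Same computation with every basic block routed through one dispatcher."""
    state = 0
    total = i = 0
    while True:
        if state == 0:    # entry block: initialize
            total, i = 0, 1
            state = 1
        elif state == 1:  # loop test
            state = 2 if i <= n else 3
        elif state == 2:  # loop body
            total += i
            i += 1
            state = 1
        elif state == 3:  # exit block
            return total


print(original(10), flattened(10))
```

The flattened version hides the loop structure: an analyst sees a single dispatcher whose next state is data-dependent, and must recover the original block ordering before the logic becomes readable.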
`
`5.4 Tool Detection
`
`A major problem in dynamic analysis is malware detecting that it is being analyzed
`and modifying its behavior. Static analysis has a slight advantage in that the analyzed
`malware has no control over the analysis process. In dynamic analysis, however,
`the malware is actually being executed and so can be made capable of altering its
`behavior. Thus, malware authors will often check to see if any of the observation
`tools often used
