Research Article
All-in-One Framework for Detection, Unpacking, and Verification for Malware Analysis
Mi-Jung Choi, Jiwon Bang, Jongwook Kim, Hajin Kim, and Yang-Sae Moon
Department of Computer Science, Kangwon National University, 1 Kangwondaehak-gil, Chuncheon-si, Gangwon 24341, Republic of Korea
Correspondence should be addressed to Yang-Sae Moon; ysmoon@kangwon.ac.kr
Received 10 April 2019; Revised 21 August 2019; Accepted 5 September 2019; Published 13 October 2019
Academic Editor: Jesús Díaz-Verdejo
Copyright © 2019 Mi-Jung Choi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Packing is the most common analysis avoidance technique for hiding malware. Packing also makes it harder for security researchers to identify the behaviour of malware and increases the analysis time. In order to analyze packed malware, we first need to perform unpacking to release the packing. In this paper, we focus on unpacking and its related technologies for analyzing packed malware. Through extensive analysis of previous unpacking studies, we pay attention to four important drawbacks: no phase integration, no detection combination, no real-restoration, and no unpacking verification. To resolve these four drawbacks, we present an all-in-one structure of the unpacking system that performs the packing detection, unpacking (i.e., restoration), and verification phases in an integrated framework. For this, we first greatly increase the packing detection accuracy in the detection phase by combining four existing and new packing detection techniques. We then improve the unpacking phase by using state-of-the-art static and dynamic unpacking techniques. We also present a verification algorithm that evaluates the accuracy of unpacking results. Experimental results show that the proposed all-in-one unpacking system performs all three phases well in an integrated framework. In particular, the proposed hybrid detection method is superior to the existing methods, and the system unpacks most files with up to 100% restoration accuracy, except for files packed by a few packers.
1. Introduction
Recently, as Internet usage has explosively increased, the risk of malware exposure has also rapidly increased. According to the 2017 AV-Test security report [1], about six billion malware samples are used annually in DDoS (distributed denial of service) attacks, spam mails, and APTs (advanced persistent threats). In addition, due to the advent of new malware exploiting analysis avoidance techniques, there have been many research efforts on personal information protection, malicious code detection, and malware analysis technology [2–10]. Among the analysis avoidance techniques, packing is the most common one used to hide malware. Packing, also known as "executable compression," is a technique for compressing an executable file to reduce the file size while preserving its format. Packing was originally developed to reduce storage space, but malicious users exploit it to hide malware in executable files [11, 12]. According to WildList's 2006 report, more than 92% of malware uses compression technology [13]. Since packing mostly transforms the original code, we need to perform unpacking first before analyzing packed files, which may include malware. In this paper, we focus on such unpacking techniques used for malware detection and analysis.
Hindawi, Security and Communication Networks, Volume 2019, Article ID 5278137, 16 pages, https://doi.org/10.1155/2019/5278137

In order to unpack packed malware, we need a phase that detects whether or not the file is packed. If we conclude that the file is packed, we restore (i.e., unpack) the file and sometimes verify the unpacked file. However, existing work has developed these three phases of packing detection, unpacking, and verification separately, and thus the analyst has difficulty in using all three phases in an integrated manner. Moreover, there are many detection methods [14–19], but there has been no attempt to combine them. We also note that previous unpacking research focuses on finding the OEP (original entry point), the address of the first instruction where the actual program starts, but does not address the actual restoration of packed files. In addition, even if unpacking is successful, there is no verification of the unpacked files to evaluate their unpacking accuracy or reliability. Based on a thorough survey of recent studies and products, we pay attention to four important observations: (1) no integration of necessary phases, (2) no combination of detection techniques, (3) no real restoration of packed malware, and (4) no verification of unpacked images. We briefly call these observations no phase integration, no detection combination, no real-restoration, and no unpacking verification, respectively.
The goal of this paper is to propose an all-in-one unpacking system that solves or improves on the four observations, as shown in Table 1. We explain each observation and its solution in detail as follows. First, no phase integration is a problem in which analysts have to perform each phase separately, since the packing detection, unpacking, and verification phases have been developed separately. To resolve this problem, we present an all-in-one unpacking system that integrates all three phases. As shown in Table 2, each existing method focuses on a particular phase. That is, many studies have examined in depth one of the three phases covered in this paper. In many real applications, however, we often need to apply all three phases at once or sequentially, rather than just one phase. To satisfy this demand, the proposed all-in-one system supports all three phases together. Since the system supports all necessary phases in an integrated framework, we can easily analyze the malware and obtain an objective restoration rate through the actual unpacking and verification phases.
The second observation, no detection combination, is that there has been no attempt to combine various packing detection methods. Based on this observation, we propose a hybrid approach that improves the packing detection accuracy by combining four existing and new packing detection methods. The third observation, no real-restoration, is that previous work tries only to find the OEP when it detects packing, with little discussion of restoring the actual executable file. We improve on this by actually restoring the image of the unpacked file. The fourth observation, no unpacking verification, is that no previous work measures the restoration accuracy of unpacked images. We resolve this problem by presenting a verification algorithm that quantitatively evaluates the restoration accuracy of unpacked file images.
In this paper, we implement the all-in-one unpacking system to reflect the solutions of Table 1 and empirically evaluate the proposed system. First, we verify each phase of the all-in-one system to confirm that the overall mechanism works well. Next, we construct a dataset composed of 2,600 PE (portable executable) files and use it to evaluate the detection, unpacking, and verification phases of the all-in-one system. In the detection phase, we need to determine the entropy range first, and through a preliminary experiment, we set the packing range to be less than 6.00 or greater than 6.85. Experimental results comparing the proposed hybrid method with individual or combined detection stage(s) show that the proposed method achieves the highest detection accuracy, up to 98.4%, without any false positives. We also try to actually unpack all files of the dataset to verify the unpacking phase, and we confirm that all files, including those packed with Yoda's Protector [22], are 100% unpacked. Finally, through the evaluation of the verification phase, we see that most files, except those packed by some packers, show up to 100% restoration accuracy.
The contributions of the paper are as follows. First, this is the first attempt to integrate the detection, unpacking, and verification phases into a single unified framework. The proposed all-in-one concept, which integrates all three phases rather than focusing on only one, allows users to detect and analyze malware more easily. Second, based on empirical experience, we present a hybrid approach to packing detection that exploits four existing and new detection techniques. Third, we perform actual unpacking and restoration beyond detecting OEPs only. Fourth, we propose a verification algorithm that measures the accuracy of unpacking results. Fifth, through various experiments, we demonstrate the superiority of the proposed all-in-one unpacking system.
The rest of the paper is organized as follows. Section 2 describes related work on packing detection and unpacking. Section 3 presents the overall architecture of the proposed all-in-one unpacking system. Section 4 explains the detection, unpacking, and verification phases in detail to show how the all-in-one system works. Section 5 presents experimental results. We finally summarize and conclude the paper in Section 6.
2. Related Work
2.1. Packing Detection Techniques. It takes much time to detect and analyze malware to which packing is applied, and thus there have been many studies on packing detection and unpacking techniques. Choi et al. [15] propose PHAD, which detects packed files by analyzing the header information of PE files. Through heuristic analysis of the PE header, PHAD selects eight characteristic variables that distinguish a general file from a packed file, and based on these variables, it determines whether a file is packed or not. More specifically, it calculates the Euclidean distance over the characteristic vector (CV) formed by the eight selected variables and decides that the file is packed if that distance exceeds a heuristically determined minimum threshold. PHAD shows higher detection accuracy with lower false-negative rates than the commonly used software PEiD [23], but it has the drawback that many false positives occur.
Lyda and Hamrock [17] use entropy-based analysis to detect encrypted or packed malware. Entropy is a measure of the uncertainty of information, and packed files tend to have higher entropy than regular files because they compress the original executable sections or collapse those sections into a few new sections. We calculate the entropy as shown in equation (1), where p(i) is the probability of the i-th unit of information (such as a number) in event x's series of n symbols. This equation generates entropy scores as real numbers [17].
$$H(x) = -\sum_{i=1}^{n} p(i)\,\log_2 p(i). \tag{1}$$
The entropy-based method has the advantage of being easily applied to various packers. However, it causes a false positive if the entropy of a normal file is high, or a false negative if that of a packed file is low.
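As an illustration of equation (1), the following minimal sketch computes the entropy of a byte string. It assumes that the "unit of information" is a single byte, so the score lies between 0 and 8; the function name `shannon_entropy` is ours and is not taken from [17].

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: H(x) = -sum_i p(i) * log2 p(i)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value
    # p(i) is the relative frequency of byte value i in the data.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    print(shannon_entropy(bytes(range(256))))   # all byte values equally likely
    print(shannon_entropy(b"\x00" * 1024))      # a single repeated value
    print(shannon_entropy(b"AAAABBBB"))         # two equally likely values
```

Data in which every byte value is equally frequent reaches the maximum score, while constant data scores zero. Compressed or encrypted sections tend toward the upper end of this scale.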
Han and Lee [16] propose REMINDer, which detects packing based on the entropy value of the entry point section and the WRITE attribute. They note that a large number of false positives occur because the entropy calculation range is too wide, and thus REMINDer uses only the EP (entry point) section in computing the entropy value. In addition to the entropy-based detection method for the EP section, they also use the WRITE attribute test, which checks an essential feature of a packed file, to reduce false positives. This is because, unlike ordinary files, packed PE files require WRITE permission to perform unpacking before the file is executed.
The technique of Rohit Arora et al. [19] is a packing detection technique that analyzes the PE information in a packed file using a heuristic algorithm. It defines various parameters for packing analysis and presents a heuristic algorithm for assigning weights and risk factors to those parameters. Based on the training set to which weights and risk factors are assigned, it determines whether or not a given input file is packed.
2.2. Unpacking Techniques. Unpacking techniques are roughly classified into four types. The first is a direct analysis method in which a person unpacks the file directly using an analysis tool. This method was popular in the early days, and typical analysis tools include OllyDbg PlugIn [24], Immunity Debugger [25], and IDA PlugIn [26]. This manual unpacking is relatively accurate, but it takes too much time to manually analyze all the instructions of the packed file.
The second is a static unpacking method based on the characteristics of the specific packing algorithm(s). Since this method relies on the characteristics of the packing algorithm, it is very useful when the algorithm is known in advance. Static unpacking is commonly used in antivirus programs, and it has the advantages of no infection risk and fast unpacking, since the packed file does not need to be directly executed. However, it has the disadvantage that we cannot use it if we do not know the packing algorithm used or if the packing technique is partially modified.
The third is a dynamic unpacking method that does not depend on the packing algorithm(s). There have been many studies on dynamic unpacking, since it allows unpacking even without knowing the exact packing algorithm used. Jeong et al. [21] propose an entropy-based dynamic unpacking method that finds the OEP (original entry point) based on the characteristic that the entropy increases if the file is packed. Bat-Erdene et al. [27] classify unpacking algorithms into multiple clusters by measuring the entropy changes during the unpacking process. Cesare and Yang [28] propose an algorithm for constructing a control flow graph signature using an inverse transformation technique. They perform the entropy analysis first to determine whether or not the PE file is packed, and if it is packed, they find the hidden code by investigating the end of packing through dynamic analysis. Moreover, recent dynamic methods include OmniUnpack [29], Renovo [30], and PinDemonium [31]. OmniUnpack unpacks the PE file by detecting the executing offset of the original code at the page level instead of at the instruction level. Renovo exploits shadow memory to monitor program execution and memory writes, and through the shadow memory, it extracts the hidden code from the executable. Finally, PinDemonium uses the Dynamic Binary Instrumentation (DBI) technique to perform Import Address Table (IAT) analysis, JUMP instruction detection, and entropy calculation, and through these processes, it unpacks the PE file.
Table 1: Observations and solutions for the proposed all-in-one unpacking system.

Observation | Explanation | Solution
--- | --- | ---
No phase integration | The three unpacking-related phases are developed separately. | Adopt an all-in-one approach integrating all three phases.
No detection combination | There is no attempt to combine various existing methods for packing detection. | Combine four packing detection methods to improve detection accuracy.
No real-restoration | The main goal is to find the OEP. | Restore unpacked files by performing actual unpacking as well as finding the OEP.
No unpacking verification | There is no quantitative way to verify the restoration accuracy. | Present a verification algorithm to evaluate the accuracy of unpacking results.
Table 2: Comparison of analytical phases supported by existing and proposed systems.

Analytical phases | PHAD [15] | REMINDer [16] | PEframe [20] | [11, 13, 21] | All-in-one system
Detection: × ×
Unpacking (static): × ×
Unpacking (dynamic): × ×
Verification: × × ×
The fourth is an integration method that combines static and dynamic unpacking techniques. Representative examples of such integration methods are PolyUnpack [26], CoDisasm [32], and BinUnpack [3]. PolyUnpack and CoDisasm first extract a static model through static analysis before executing the file. Then, they unpack the PE file by comparing the extracted static model with the dynamic model obtained by running the executable. PolyUnpack unpacks the file through iterative comparisons of the two models, and CoDisasm does so by comparing memory snapshots repeatedly. Finally, BinUnpack first performs static analysis by developing a hook-evasion-resistant API monitor, and it then performs dynamic analysis by monitoring API calls with the analyzed information.
3. Overall Architecture of All-in-One Unpacking System
As mentioned in Section 1, we note four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. In this section, we describe the overall architecture of the all-in-one unpacking system that addresses these four observations. Figure 1 shows a flow diagram of the proposed all-in-one system. As shown in the figure, we improve on no phase integration by supporting all three phases of detection, unpacking, and verification in an integrated framework. In the diagram, for a given input PE file (①), the system first determines whether or not it is packed (②). If the file is not packed, the system proceeds to check the next PE file. If it is packed, the system checks whether or not there is an unpacking library for the packer used (③). If the unpacking library exists, the system performs static unpacking using the library (④); otherwise, the system performs dynamic unpacking, which does not depend on the packing algorithm (⑤). After completing the unpacking, the system calculates the restoration accuracy to verify the correctness of unpacking (⑥).
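The control flow of steps ① to ⑥ can be sketched as a small dispatcher. This is only a schematic under our own assumptions: the callables `is_packed`, `find_library`, `dynamic_unpack`, and `verify` are hypothetical placeholders for the three phases, not interfaces of the actual system.

```python
from typing import Callable, Optional

Unpacker = Callable[[bytes], bytes]


def analyze(pe_bytes: bytes,
            is_packed: Callable[[bytes], bool],
            find_library: Callable[[bytes], Optional[Unpacker]],
            dynamic_unpack: Unpacker,
            verify: Callable[[bytes], float]) -> Optional[float]:
    """Run steps 2 to 6 of the flow for one input file (step 1)."""
    if not is_packed(pe_bytes):                # step 2: packing detection
        return None                            # not packed: move on to the next file
    static_unpack = find_library(pe_bytes)     # step 3: is there an unpacking library?
    if static_unpack is not None:
        restored = static_unpack(pe_bytes)     # step 4: static unpacking
    else:
        restored = dynamic_unpack(pe_bytes)    # step 5: dynamic unpacking
    return verify(restored)                    # step 6: restoration accuracy


if __name__ == "__main__":
    # Toy stand-ins: a "packed" file is marked by a b"PK!" prefix.
    accuracy = analyze(
        b"PK!payload",
        is_packed=lambda b: b.startswith(b"PK!"),
        find_library=lambda b: None,           # no library, so the dynamic path is taken
        dynamic_unpack=lambda b: b[3:],
        verify=lambda restored: 1.0 if restored == b"payload" else 0.0,
    )
    print(accuracy)
```

Passing the phases in as callables keeps the sketch independent of any particular detector or unpacker, which mirrors the idea that each phase can be swapped while the overall flow stays fixed.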
We use existing and proposed techniques together to implement each phase of the all-in-one unpacking system. Table 3 summarizes the existing or proposed methods used in each phase. First, in the detection phase, we use four techniques to detect the packing of the input PE file. This hybrid approach improves on no detection combination. Among the four techniques, the EP section test is a new one proposed in this paper; it increases the packing detection rate by checking whether the EP section explicitly exists, and we describe it in detail in Section 4.1. Second, in the unpacking phase, we use either static unpacking or dynamic unpacking, depending on whether or not an unpacking library exists. This unpacking phase restores the actual file and improves on no real-restoration. Third, in the verification phase, we verify the correctness of unpacking by computing the restoration accuracy. We calculate the restoration accuracy by comparing the bytecodes of the two files. This phase improves on no unpacking verification, and we describe the verification algorithm in detail in Section 4.3.
4. Detection Mechanism of All-in-One Unpacking System
4.1. Detection Phase. As explained in Section 3, while there have been various packing detection techniques, there has been no attempt to combine their advantages. Since they have been studied separately, each technique carries out packing detection focusing only on its own characteristics. In this paper, we propose a hybrid approach that improves the packing detection accuracy by combining four different techniques.
The detection phase consists of four tests: the EP section, signature, WRITE attribute, and entropy tests. In this paper, we refer to these techniques as Detection Phase-No EP Section (DP-NEP), Detection Phase-Signature (DP-SIG), Detection Phase-WRITE (DP-WR), and Detection Phase-Entropy (DP-ENT), respectively. Table 4 compares the detection techniques used in existing work and in the proposed approach. As shown in the table, PEiD supports only DP-SIG, [17] supports only DP-ENT, and [16] supports DP-WR and DP-ENT, whereas the proposed method supports all four techniques.
Algorithm 1 presents the proposed hybrid approach to packing detection. It takes a PE file as input and determines whether or not the file is packed. Since DP-SIG, DP-WR, and DP-ENT work on the EP section of a PE file, we first check DP-NEP to confirm that the EP section exists. If the EP section does not exist, we determine that the file is packed (Line 1). On the contrary, if the EP section exists, we perform DP-SIG to check whether or not the PE file has a packer signature. If the signature exists, i.e., if a packer is used, we determine that the file is packed (Line 2). We then perform DP-WR and DP-ENT, where DP-WR checks whether or not the WRITE attribute exists in the EP section, and DP-ENT checks whether or not the entropy of the EP section is in a specific range, Packing-Range. For example, in this paper this range can be under 6.0 or over 6.8. If both conditions hold, we determine that the file is packed and return Packed (Lines 3 and 4). Finally, if none of the DP-NEP, DP-SIG, and combined DP-WR and DP-ENT conditions is satisfied, we return Not-Packed (Line 5).
We now explain the four packing detection cases of Algorithm 1 in detail. First, DP-NEP looks for the EP section in the input PE file. We conclude that the file is packed if it has no EP section. This is because every PE file has an EP section, so a missing EP section means that someone has hidden the section intentionally. We can check the existence of the EP section by investigating the EP section address in the header of the PE file. According to actual experimental results, we could not find the EP section in some PE files packed by packers such as Upack [34] and PESpin [35]. This is because such packers intentionally hide or scramble the EP section. Obviously, we need to detect these PE files as packed in the detection phase, and Algorithm 1 does so: it detects such a "No EP Section" PE file as the DP-NEP case. In other words, by considering DP-NEP in Algorithm 1, we increase the packing detection rate, especially for the Upack and PESpin packers. In the case of DP-NEP, however, we cannot find the EP section itself, and accordingly we cannot proceed to the next phases of unpacking and verification. More precisely, unpacking files from such DP-NEP packers might be possible if we manually modify the raw hexadecimal codes by using a binary editor. However, this manual modification of raw data is beyond the scope of the paper, and we do not consider it. In summary, we use the concept of "No EP Section" in the detection phase of Algorithm 1 to increase the packing detection rate, but we do not consider it in the unpacking and verification phases.
Second, in DP-SIG, we investigate whether or not the signature extracted from the EP section is in the signature database. We construct the signature database with a total of 7,000 signatures, consisting of 4,200 signatures provided by PEiD [23] and 2,800 signatures collected from BobSoft [36].
Figure 2 shows three sample packer signatures stored in the database. For each packer, the first line gives the type and version of the packer and the maker of the signature, the second line gives the hexadecimal values of the packer's signature, and the third line, ep_only, is true or false and indicates whether the signature can be found in the EP section.
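To make the signature lookup concrete, the following sketch matches hexadecimal signatures in the format of Figure 2 against a byte buffer. It is a simplified illustration under stated assumptions: the helper names are ours, `ep_offset` is assumed to be the file offset of the entry point, and the `??` wildcard token is an assumed extension that the three sample signatures do not use.

```python
from typing import Dict, List, Optional, Tuple

Pattern = List[Optional[int]]          # None stands for a wildcard byte
Signature = Tuple[Pattern, bool]       # (pattern, ep_only)


def parse_signature(hex_text: str) -> Pattern:
    """Turn '47 8B ?? 05' into [0x47, 0x8B, None, 0x05]."""
    return [None if tok == "??" else int(tok, 16) for tok in hex_text.split()]


def matches_at(data: bytes, offset: int, pattern: Pattern) -> bool:
    """True if the pattern matches the bytes starting at the given offset."""
    if offset < 0 or offset + len(pattern) > len(data):
        return False
    return all(p is None or data[offset + i] == p for i, p in enumerate(pattern))


def find_packer(data: bytes, ep_offset: int, db: Dict[str, Signature]) -> Optional[str]:
    """Return the name of the first matching signature, or None."""
    for name, (pattern, ep_only) in db.items():
        if ep_only:
            # ep_only = true: compare only at the entry point.
            if matches_at(data, ep_offset, pattern):
                return name
        elif any(matches_at(data, off, pattern)
                 for off in range(len(data) - len(pattern) + 1)):
            return name
    return None


if __name__ == "__main__":
    db = {"Code-Lock vxx": (parse_signature(
        "47 8B C2 05 1E 00 52 8B D0 B8 02 3D CD 21 8B D8 5A"), True)}
    sample = b"\x90\x90" + bytes.fromhex("478BC2051E00528BD0B8023DCD218BD85A") + b"\xCC"
    print(find_packer(sample, 2, db))   # entry point placed right after the two NOP bytes
    print(find_packer(sample, 0, db))   # wrong entry point: no match
```

A linear scan over the database is enough for illustration; a database of several thousand signatures would more plausibly be indexed by its leading bytes.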
Third, in DP-WR, we check the WRITE attribute of the EP section. In the process of unpacking a packed file, the system needs to modify the memory, so the packed file necessarily requires the WRITE attribute. Thus, we suspect that the file is packed if the WRITE attribute exists. However, since a normal file can also have the WRITE attribute, we cannot determine packing only by the existence of the WRITE attribute. Therefore, as proposed in [16], we detect packing using the WRITE attribute and the entropy value together.
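The WRITE attribute test amounts to a single bit check on the section's characteristics field. The sketch below assumes that this 32-bit field has already been read from the section header of the EP section; the flag value is the `IMAGE_SCN_MEM_WRITE` constant of the PE format, while the function name is ours.

```python
IMAGE_SCN_MEM_WRITE = 0x80000000  # PE section flag: the section can be written to


def has_write_attribute(characteristics: int) -> bool:
    """True if the WRITE bit is set in a section's characteristics field."""
    return bool(characteristics & IMAGE_SCN_MEM_WRITE)


if __name__ == "__main__":
    # 0x60000020 = code | execute | read, typical for an ordinary code section.
    print(has_write_attribute(0x60000020))
    # 0xE0000020 additionally sets the write bit.
    print(has_write_attribute(0xE0000020))
```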
Fourth, in DP-ENT, we compute the entropy of the EP section of the input PE file, not of the entire file, using equation (1) and check whether that entropy is in a specific range (Packing-Range in Algorithm 1). This is based on the previous observation [16, 17] that the entropy of a packed file differs from that of an ordinary file. To use DP-ENT, we need to set the Packing-Range used for determining whether or not a file is packed. In this paper, we set this range through the experiment in Section 5.3.
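The decision order of the four tests can be summarized in a short sketch. It assumes that the per-file facts (signature match, WRITE flag, EP section entropy) have already been extracted; the `EpSection` record and the default thresholds of 6.0 and 6.85 are illustrative placeholders rather than the system's actual data structures or tuned values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EpSection:
    has_signature: bool   # DP-SIG: a packer signature matched
    writable: bool        # DP-WR: the WRITE attribute is set
    entropy: float        # DP-ENT: entropy of the EP section


def in_packing_range(entropy: float, low: float = 6.0, high: float = 6.85) -> bool:
    """Packing-Range: entropy below `low` or above `high` (placeholder bounds)."""
    return entropy < low or entropy > high


def detect_packing(ep: Optional[EpSection]) -> str:
    if ep is None:                                      # Line 1: DP-NEP, no EP section
        return "Packed"
    if ep.has_signature:                                # Line 2: DP-SIG
        return "Packed"
    if ep.writable and in_packing_range(ep.entropy):    # Lines 3-4: DP-WR and DP-ENT
        return "Packed"
    return "Not-Packed"                                 # Line 5


if __name__ == "__main__":
    print(detect_packing(None))
    print(detect_packing(EpSection(False, True, 7.5)))
    print(detect_packing(EpSection(False, False, 7.5)))
```

Note that the entropy test never fires on its own: a file with an unusual EP section entropy but no WRITE attribute is reported as Not-Packed, which is how the combination limits false positives.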
4.2. Unpacking Phase. As mentioned in Section 2.2, unpacking that is performed automatically without human intervention can be divided into static unpacking and dynamic unpacking. Static unpacking is based on the characteristics of the packing algorithm, so it is fast and has no risk of infection because the packed file is not directly executed. On the contrary, dynamic unpacking does not depend on a specific packing algorithm, even though it takes a long analysis time. In this paper, we use both static and dynamic methods to maximize the unpacking accuracy. More specifically, if the packer type found in the detection phase provides an unpacking library, we use static unpacking; otherwise, we perform the
Figure 1: Overall working mechanism of the all-in-one unpacking system. Detection phase: ① input PE file, ② packing detection (Yes/No). Unpacking phase: ③ unpacking library check (Yes: ④ static unpacking; No: ⑤ dynamic unpacking). Verification phase: ⑥ verification.
Table 3: Existing or proposed techniques used in each phase of the all-in-one unpacking system.

Analytical phase | Techniques used or proposed
--- | ---
Detection | (i) EP section test (proposed in Section 4.1); (ii) Signature test [23]; (iii) WRITE attribute test [16]; (iv) Entropy test [17]
Unpacking | (i) Static: library-based unpacking [33]; (ii) Dynamic: entropy change-based unpacking [14, 27]
Verification | Verification algorithm (proposed in Section 4.3)
Table 4: Detection techniques used in existing and proposed approaches.

Method | DP-NEP | DP-SIG | DP-WR | DP-ENT
--- | --- | --- | --- | ---
PEiD [23] | × | ✓ | × | ×
[17] | × | × | × | ✓
[16] | × | × | ✓ | ✓
Proposed approach | ✓ | ✓ | ✓ | ✓
dynamic unpacking. We can easily perform static unpacking by using the unpacking library found for each packer, so in this section we focus on explaining how dynamic unpacking works in detail. Our dynamic unpacking method is also known as the "execute-after-write" method. More precisely, the proposed method first restores the original file in memory by finding the OEP of the packed file and then analyzes (i.e., executes) the restored original file in memory. In order to avoid infecting real machines, we perform the unpacking and verification phases in debugging mode.
The OEP is the address of the first instruction executed after the packed file is unpacked, and it represents the entry point of the original program. In other words, since the code following the OEP is the starting code of the original program, finding the OEP is the most important issue in dynamic unpacking. To find the OEP, we use the entropy change-based analysis proposed by Bat-Erdene et al. [27]. When a packed file is executed, the packed data are uncompressed and written to memory, and at this time data might be changed or added. Because of this data change, the entropy value continuously changes during the unpacking process. If unpacking is successful, the entropy value stabilizes, the original file is restored, and the IP (instruction pointer) moves to another section of the restored original file for its execution. Thus, we can find the OEP by examining whether the IP moves to another section when there is no change in the entropy value of each section. (If the heuristic of investigating the change of IP sections is known, someone may create a malicious packing method that evades it. For this evasion, however, execution must reach the OEP without a change of IP sections, and packing a PE file in such a way might not be easy. In our actual analysis, we did not find such a case among the 19 packers we covered. Even though such a case hardly exists, it might occur in the worst case. We leave an advanced approach that works against such evasion as further study.) To accomplish this, we need to execute the target process instruction by instruction and monitor the entropy change; however, measuring the entropy value after every instruction is very time-consuming and inefficient. We here note that the OEP generally appears after a branch instruction, so we calculate the entropy value only when the instruction is JMP, conditional JMP (CJMP), or RETN. We also note that loop iterations during program execution are independent of the OEP, so we store and reuse the entropy values measured after JMP, CJMP, and RETN instructions to avoid duplicated calculations caused by the iterations and to reduce the total analysis time.
Algorithm 2 shows the proposed dynamic unpacking algorithm based on entropy changes; it returns the OEP address for a given PE file. First, it computes the initial entropy of each section (Line 1). Then, it repeats the following steps until the process ends (Lines 2 to 18). If the current IP address is greater than the last address of the current section, it returns Not-Found (Line 4). This is because the IP has moved to the next section without a JMP family instruction, so the OEP cannot be found in this section. On the contrary, if the instruction at IP is JMP, CJMP, or RETN, it executes the file up to IP and stores the address of the next instruction in DstAddr (Lines 5 to 7). If IP exists in IP-History or DstAddr exists in DstAddr-History, the instruction or address belongs to an iteration, so we move on to the next IP (Line 8). Otherwise, it stores IP and DstAddr in their respective histories for the next iteration (Line 9). At this time, if DstAddr is out of the range of the file, it returns Not-Found because it cannot find the OEP within the file (Line 10). If DstAddr is in the normal range, it calculates the entropy of each section again and compares the current entropy with the previous entropy to check whether it is stable (Lines 11 to 13). If the entropy is stable and DstAddr is in a different section, it means that unpacking the file has succeeded and the IP is moved to another section by a JMP family instruction. Thus, it
[Code-Lock vxx]
signature = 47 8B C2 05 1E 00 52 8B D0 B8 02 3D CD 21 8B D8 5A
ep_only = true
[CodeCrypt v014b]
signature = E9 C5 02 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F
ep_only = true
[CodeCrypt v015b]
signature = E9 31 03 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F
ep_only = true
Figure 2: Examples of packer signatures stored in the signature database.
Input: A PE file
Output: Packed or Not-Packed // Packing detection result
begin
(1) if No EP Section then return Packed; // DP-NEP
(2) else if Packer Signature Found then return Packed; // DP-SIG
(3) else if "WRITE" enabled and EP Section Entropy ∈ Packing-Range then
(4)     return Packed; // DP-WR, DP-ENT
(5) else return Not-Packed;
(6) end-if
end

ALGORITHM 1: Packing detection.
stops the execution and returns DstAddr as the OEP (Lines 13 and 14).
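The OEP search can be illustrated offline on a recorded execution trace instead of a live debugger session. The sketch below is a simplification under several assumptions of ours: each `Step` record already carries the per-section entropies observed at that point, "stable" is taken to mean that no section's entropy changed by more than `eps` since the previous measurement, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

BRANCHES = {"JMP", "CJMP", "RETN"}
Sections = Dict[str, Tuple[int, int]]   # section name -> (first address, last address)


@dataclass
class Step:
    ip: int                       # address of the instruction about to run
    mnemonic: str                 # e.g. "MOV", "JMP", "CJMP", "RETN"
    dst: int                      # address executed next
    entropies: Dict[str, float]   # per-section entropy observed at this step


def section_of(addr: int, sections: Sections) -> Optional[str]:
    for name, (start, end) in sections.items():
        if start <= addr <= end:
            return name
    return None


def find_oep(trace: List[Step], sections: Sections,
             initial: Dict[str, float], eps: float = 1e-3) -> Optional[int]:
    prev = dict(initial)                                   # Line 1
    seen_ip, seen_dst = set(), set()
    current = section_of(trace[0].ip, sections) if trace else None
    for step in trace:                                     # Lines 2-3
        if current is None or step.ip > sections[current][1]:
            return None                                    # Line 4: ran past the section end
        if step.mnemonic not in BRANCHES:                  # Line 5: only branches are examined
            continue
        target = section_of(step.dst, sections)            # Line 7
        revisit = step.ip in seen_ip or step.dst in seen_dst   # Line 8: loop iteration
        seen_ip.add(step.ip)                               # Line 9
        seen_dst.add(step.dst)
        if not revisit:
            if target is None:                             # Line 10: destination outside the file
                return None
            stable = all(abs(step.entropies[s] - prev[s]) < eps for s in sections)
            prev = dict(step.entropies)                    # Line 12
            if stable and target != current:               # Line 13
                return step.dst                            # Line 14
        current = target
    return None                                            # Line 19


if __name__ == "__main__":
    sections = {"UPX0": (0x1000, 0x1FFF), "UPX1": (0x2000, 0x2FFF)}
    start = {"UPX0": 0.0, "UPX1": 7.8}
    trace = [
        Step(0x2000, "MOV", 0x2002, {"UPX0": 0.0, "UPX1": 7.8}),
        Step(0x2010, "CJMP", 0x2000, {"UPX0": 3.1, "UPX1": 7.8}),   # unpacking loop
        Step(0x2010, "CJMP", 0x2000, {"UPX0": 5.9, "UPX1": 7.8}),   # repeated, skipped
        Step(0x2020, "JMP", 0x2030, {"UPX0": 6.2, "UPX1": 7.8}),
        Step(0x2040, "JMP", 0x1500, {"UPX0": 6.2, "UPX1": 7.8}),    # stable, new section
    ]
    oep = find_oep(trace, sections, start)
    print(hex(oep) if oep is not None else "Not-Found")
```

In the toy trace, the repeated loop branch is skipped through the histories, and the final jump is reported because the entropies have stopped changing and the destination lies in a different section than the jump itself.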
4.3. Verification Phase. In this section, we present an algorithm for measuring the restoration accuracy by comparing the restored file with its original file. To execute the packed file, the proposed system uses dynamic unpacking and finds the OEP during its execution. If the system finds the OEP, we assume that the restoration of the original file is successful and regard that file as the restored file. In the verification phase, we quantitatively measure the restoration accuracy by comparing the restored file with its original file.
Figure 3 shows an example of comparing some bytecodes of the original file with those of the restored file. For the comparison, we first pack the pspasswd.exe file provided by Microsoft [37] and then unpack it again by dynamic unpacking. The figure shows that the bytecodes of the four sections of the original file, .text, .rdata, .data, and .rsrc, appear identically in the UPX0, UPX1, and .rsrc sections of the restored file. That is, even if the name and location of each section are different, the same bytecodes exist in the restored file. However, the restored file contains padding and garbage values, and some bytecodes may be at locations different from those in the original file. Thus, in order to compare the original and restored files, we need to search for where the original bytecodes are located in the restored file. Since manually comparing the bytecodes of the original and restored files takes a very long time, in this paper we propose an efficient search algorithm using a hash function. We use the well-known MD5 [38] as the hash function of the search algorithm. As a message-digest algorithm, MD5 is widely used for checking the integrity of long data.
Algorithm 3 shows the restoration accuracy calculation algorithm. Its basic principle is to check whether each section of the original file exists in the restored file. We use the MD5 hash function to speed up the scan. The inputs to the algorithm are an original file A and a restored file B, and the output is restorationAccuracy, a list storing the restoration accuracy values of all sections. First, we remove garbage and padding values from the restored file (Line 1). For each section Ai of A, we obtain the length curLen of the section and calculate a hash value H1 by applying the MD5 hash function (Lines 4 to 7). We then calculate another hash value H2 from the restored file B by considering only a length of curLen (Line 9). If H1 equals H2 (Line 10), it means that B contains Ai, so we stop the comparison and calculate the restoration accuracy of the section (Line 14). However, if H1 and H2 are not equal, we perform the hash value comparison again after decrementing the length curLen of Ai (Lines 6 to 13). We repeat this process to calculate the restoration accuracy for the section Ai (Line 14). After completing the accuracy calculation for all sections, the algorithm returns the list restorationAccuracy (Line 16).
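The hash-based comparison can be sketched as follows. This is a deliberately naive illustration under our own assumptions: the restored bytes are taken as already cleaned of garbage and padding, shortening curLen is interpreted as keeping a shorter prefix of the section, and the accuracy of a section is taken to be the matched length divided by the section length. The function names are hypothetical.

```python
import hashlib
from typing import Dict


def md5(data: bytes) -> bytes:
    return hashlib.md5(data).digest()


def section_accuracy(section: bytes, restored: bytes) -> float:
    """Share of `section` whose longest prefix occurs contiguously in `restored`."""
    for cur_len in range(len(section), 0, -1):            # shrink curLen step by step
        h1 = md5(section[:cur_len])                       # H1: hash of the original part
        for off in range(len(restored) - cur_len + 1):
            h2 = md5(restored[off:off + cur_len])         # H2: hash of a same-length window
            if h1 == h2:
                return cur_len / len(section)
    return 0.0


def restoration_accuracy(original: Dict[str, bytes], restored: bytes) -> Dict[str, float]:
    """Accuracy per section of the original file."""
    return {name: section_accuracy(data, restored) for name, data in original.items()}


if __name__ == "__main__":
    original = {".text": b"ABCDEFGH", ".data": b"12345678"}
    restored = b"\x00\x00ABCDEFGH\x00\x0012345XYZ"
    print(restoration_accuracy(original, restored))
```

Rehashing every window is quadratic or worse and is only meant to show the comparison logic on small inputs; for real section sizes, a rolling hash or a direct substring search would be the more practical choice.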
5. Performance Evaluation
5.1. Experimental Environment. In this section, we present the results of the experimental evaluation to show the superiority of the all-in-one unpacking system. We conduct the experiments in a virtual server as well as in a physical server. The reasons for experimenting in two server environments are as follows. First, unpacking results may differ between the virtual server and the physical server if a packed file has anti-unpacking features such as Anti-VM or Anti-Debug. Second, the dynamic unpacking of malware
Input: A PE file
Output: OEP, the start address of the unpacked original file
begin
(1) Measure the entropy value of each section
(2) while the process is not reached to the end do
(3)   IP ← the current instruction pointer
(4)   if IP's address > current section's last address then return Not-Found
(5)   else if IP is JMP or CJMP or RETN then
(6)     Execute instructions until IP
(7)     DstAddr ← the next instruction address
(8)     if IP ∉ IP-History and DstAddr ∉ DstAddr-History then
(9)       Store IP and DstAddr into each history
(10)      if DstAddr is out of the file then return Not-Found
(11)      else
(12)        Re-calculate the entropy value of each section
(13)        if Entropy change is stable and DstAddr is in the different section then
(14)          return DstAddr
(15)        end-if
(16)      end-if
(17)    end-if
(18) end-while
(19) return Not-Found
end
ALGORITHM 2: Dynamic unpacking algorithm.
Security and Communication Networks 7
files may infect the physical server. Thus, we perform unpacking first in the virtual server and then perform the experiment in the physical server only when there is no abnormality. The hardware specifications of the experimental platform are as follows. The virtual server is an Intel Core i7-6700 3.40 GHz with 8 GB RAM, and the physical server is an Intel Core i7-6700 3.40 GHz with 32 GB RAM. The target operating system is 32-bit Windows 7, and we use PEVIEW to view the structure and content of 32-bit PE files. As the debugger, we use Immunity Debugger 1.85 [25] to implement the dynamic unpacking phase of Algorithm 2. To implement the all-in-one unpacking system including the
Figure 3: Comparison of sample bytecodes of the original and restored files. (a) Sections of the original file (.text, .rdata, .data, and .rsrc). (b) Sections of the restored file (UPX0, UPX1, UPX1, and .rsrc).
Input: A, an original file; B, a restored file
Output: restorationAccuracy[1..n], a list of restoration accuracy values for n sections
begin
(1) Remove garbage and padding values from B
(2) A ← {A1, A2, ..., An}, where Ai is the i-th section of A
(3) for each section Ai ∈ A do
(4)   totalLen ← Ai.length
(5)   curLen ← totalLen
(6)   while curLen > 0 do
(7)     H1 ← hash(Ai[1 : curLen]) // hash a part of section Ai
(8)     for j ← 1 to (B.length − curLen + 1) do
(9)       H2 ← hash(B[j : j + curLen − 1]) // hash a part of B
(10)      if H1 = H2 then break
(11)      else curLen ← curLen − 1
(12)    end-for
(13)  end-while
(14)  restorationAccuracy[i] ← (curLen / totalLen) × 100
(15) end-for
(16) return restorationAccuracy
end
ALGORITHM 3: Restoration rate calculation algorithm.
dynamic unpacking algorithm, we run Python scripts on top of Immunity Debugger.
In the experiment, we use a total of 130 PE files, where 90 files are provided by Microsoft [37] and 40 files are randomly collected from the PuTTY [39] and MD5 [40] sites. We pack each of the 130 executable files with each of 19 packers and construct a dataset consisting of a total of 2600 files (2470 packed files plus the 130 original files). The 19 packers used are UPX, ASPack, NSPack, MPRESS, Yoda's Protector, MEW, Packman, RLPack, BeRoEXEPacker, PECompact, Petite, JDpack, Molebox, eXpressor, Yoda's Crypter, FSG, exe32pack, WinUpack, and Neolite. The reason for configuring the dataset by packing normal PE files directly is that we can quantitatively measure the packing detection accuracy, the false-positive rate, and the false-negative rate only if we know in advance which experimental files are packed.
In this paper, we conduct three experiments. The first experiment concerns the integration feature of the all-in-one system and verifies that the system supports all three phases of packing detection, unpacking, and verification in an integrated framework. The second experiment concerns the detection phase, where we first set Packing-Range, an entropy range for judging the packing, and then evaluate the packing detection accuracy using the range. The third experiment concerns the unpacking and verification phases, where we verify whether the system unpacks the PE file correctly by calculating the restoration accuracy.
5.2. Experimental Results on Overall Working Mechanism. In this section, we confirm through experiments that the integration mechanism of the all-in-one unpacking system works correctly. Figure 4 shows operation screenshots of each phase of the all-in-one system. First, Figure 4(a) shows an example screenshot of the detection phase. To confirm that all four cases of packing detection work correctly, we continue executing the remaining cases even if the packing has already been detected by the previous case(s). In Figure 4(a), we first examine DP-NEP of Algorithm 1. We detect the EP section in DP-NEP, so we proceed to DP-SIG (the first two lines). By the DP-SIG test, we detect the UPX packer, so we conclude that the file is packed (the next two lines). Even though detection is completed, we proceed to the DP-ENT and DP-WR tests. The entropy value is 7.93, which is contained in Packing-Range, and the attribute value of the EP section is “0xe0000040,” which includes the WRITE attribute. Thus, we conclude that the file is also packed by the DP-ENT and DP-WR tests (the last three lines). According to Figure 4(a), we can confirm that the DP-NEP, DP-SIG, DP-ENT, and DP-WR tests of the detection phase all work correctly.
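As a worked illustration of the DP-WR test on the attribute value reported above, the following Python sketch decodes a section characteristics value using the section flag constants defined in the Microsoft PE/COFF specification. The function names decode_flags and has_write_attribute are illustrative and are not part of the proposed system.

```python
# Section characteristic flags from the Microsoft PE/COFF specification.
IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_CNT_INITIALIZED_DATA = 0x00000040
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

FLAG_NAMES = {
    IMAGE_SCN_CNT_CODE: "CNT_CODE",
    IMAGE_SCN_CNT_INITIALIZED_DATA: "CNT_INITIALIZED_DATA",
    IMAGE_SCN_MEM_EXECUTE: "MEM_EXECUTE",
    IMAGE_SCN_MEM_READ: "MEM_READ",
    IMAGE_SCN_MEM_WRITE: "MEM_WRITE",
}


def decode_flags(characteristics: int) -> list:
    """List the names of the known flags set in a characteristics value."""
    return [name for mask, name in FLAG_NAMES.items() if characteristics & mask]


def has_write_attribute(characteristics: int) -> bool:
    """DP-WR style check: is the section marked writable?"""
    return bool(characteristics & IMAGE_SCN_MEM_WRITE)


if __name__ == "__main__":
    value = 0xE0000040  # attribute value of the EP section in Figure 4(a)
    print(decode_flags(value))
    print(has_write_attribute(value))
```

The value 0xe0000040 decomposes into 0x80000000 + 0x40000000 + 0x20000000 + 0x00000040, so the section is writable, readable, and executable at once, which is what an unpacking stub needs in order to write the restored code and then run it.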
Figures 4(b) and 4(c) show example screenshots of the unpacking phase. If an unpacking library exists for the packer detected in the detection phase, we perform static unpacking as shown in Figure 4(b); if not, we perform dynamic unpacking as shown in Figure 4(c). Figure 4(b) shows the result of static unpacking by the UPX commercial tool [33] for the UPX-packed file. In the figure, the file size increases from 285,760 bytes to 731,200 bytes by unpacking, which means that the compression ratio is 39.08% (285,760/731,200). Figure 4(c) shows the result of the entropy-based dynamic unpacking. In the figure, the system calculates the entropy of each section whenever IP encounters a JMP or CJMP instruction, and it returns the current DstAddr as the OEP if the entropy change is stable and DstAddr is in a different section.
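The two ingredients of this decision, the per-section entropy and the stability check, can be sketched in Python as follows. The entropy function is the standard Shannon entropy over byte values. The stability rule is only an assumption made for illustration: this section does not define when an entropy change counts as stable, so the tolerance parameter and the function name is_oep_candidate are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def is_oep_candidate(prev_entropy, cur_entropy, src_section, dst_section,
                     tolerance=0.01):
    """Hypothetical rule: entropy unchanged within tolerance and a cross-section jump.

    prev_entropy and cur_entropy map section names to entropy values measured
    at the previous and the current branch instruction, respectively.
    """
    stable = all(
        name in prev_entropy
        and abs(cur_entropy[name] - prev_entropy[name]) <= tolerance
        for name in cur_entropy
    )
    return stable and src_section != dst_section


if __name__ == "__main__":
    print(shannon_entropy(b"\x00" * 16))       # constant data
    print(shannon_entropy(bytes(range(256))))  # every byte value once
    before = {"UPX0": 6.41, "UPX1": 7.90}
    after = {"UPX0": 6.41, "UPX1": 7.90}
    print(is_oep_candidate(before, after, "UPX1", "UPX0"))
```

In a real run, the entropy values would be recomputed from process memory at each branch instruction through the debugger; here they are passed in as plain dictionaries so that the sketch stays self-contained.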
Figure 4(d) shows an example screenshot of the verification phase. In this example, we measure the restoration accuracy of the restored file for the .text and .data sections of the original file. The length of the original .text section is 18,439, and the length of the unpacked file is 872,555. In the hash value comparison, since the portion corresponding to the original .text section exists in the unpacked file, the restoration accuracy becomes 100%. Similarly, we also calculate the restoration accuracy of the .data section as 100%.
5.3. Experimental Results on Detection Phase. In this section, we describe two experimental results of the detection phase. The first experiment sets Packing-Range, and the second measures the detection accuracy of the proposed detection algorithm. In the first experiment, we experimentally determine the Packing-Range used in DP-ENT of Algorithm 1. The previous study by Lyda and Hamrock [17] uses the entropy value of the whole file and sets the entropy range of the packed file to [6.677, 6.926]. On the contrary, Han and Lee [16] focus on the EP section only and compute the entropy value for the EP section instead of the whole file. Through the experiment, they set the entropy threshold to 6.85 and judge a file to be packed if its entropy is over 6.85.
We also determine the entropy range Packing-Range through experiments on various packers. Figure 5 shows the entropy distribution of the EP section for different packers. Figure 5(a) shows the average distribution of the original and 19 packed files. Figures 5(b)–5(m) show the more detailed entropy distribution for each packer. According to Figures 5(b)–5(m), the entropy distribution of the packed file differs depending on the packer type. In particular, as shown in Figures 5(d), 5(e), and 5(k), some packed files have smaller entropy values than the original file, and thus we cannot simply set an upper threshold value as the packing criterion as in the previous study [16]. However, we note that while the entropy of packed files is distributed over a wide range, as shown in Figures 5(c)–5(m), the entropy of original files is distributed only in a specific range, as shown in Figure 5(b). More specifically, the entropy value of the normal file is between 6.00 and 6.85, as shown in Figure 5(b). Based on these experimental results, we thus set Packing-Range to [0, 6.00) or [6.85, ∞] of the EP section. According to [9, 29], the entropy method has the disadvantages of low accuracy and easy evasion. To overcome this point, we use three other detection techniques, DP-NEP, DP-SIG, and DP-WR, together with DP-ENT in a hybrid manner. Using this hybrid approach, we can further improve the detection accuracy.
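The Packing-Range test itself reduces to an interval check on the EP-section entropy. The following Python sketch assumes the boundaries stated above, with the half-open interval read literally (6.00 counts as normal, 6.85 counts as packed); the function name in_packing_range is illustrative.

```python
NORMAL_LOW = 6.00   # lower bound of the normal EP-section entropy range
NORMAL_HIGH = 6.85  # upper bound (exclusive) of the normal range


def in_packing_range(ep_entropy: float) -> bool:
    """Return True if the EP-section entropy lies in [0, 6.00) or [6.85, inf)."""
    if ep_entropy < 0:
        raise ValueError("entropy cannot be negative")
    return ep_entropy < NORMAL_LOW or ep_entropy >= NORMAL_HIGH


if __name__ == "__main__":
    # 7.93 is the EP-section entropy of the UPX-packed example in Figure 4(a).
    for value in (7.93, 6.50, 5.20):
        print(value, in_packing_range(value))
```

A two-sided range of this kind flags both the high-entropy output of compressing packers and the unusually low-entropy EP sections produced by some packers, which a single upper threshold would miss.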
In the second experiment, we evaluate the detection accuracy and the false-negative/false-positive rates. Table 5 compares the three detection techniques of the detection phase with the proposed hybrid approach. Since EP sections
Figure 4: Operation screenshots of detection, unpacking, and verification phases. (a) Detection phase. (b) Unpacking phase (static unpacking). (c) Unpacking phase (dynamic unpacking). (d) Verification phase.
Figure 5: Entropy distributions of the EP section for original and representative packed files (x-axis: entropy of EP section, from 0 to 8 in steps of 0.5; y-axis: number of files). (a) Average of original and 19 packers, plotted as two series: not packed (b) and summation of (c) to (m). (b) Original (not packed). (c) UPX. (d) ASPack. (e) NSPack. (f) MPRESS. (g) Yoda's Protector. (h) RLPack. (i) BeRoEXE. (j) MEW. (k) PACKMAN. (l) WinUpack. (m) exe32pack.
are mandatory in all PE files, we can detect EP sections unless they are intentionally concealed. All files of the dataset used in the experiment also have EP sections, so the detection rate by DP-NEP is trivially 0%. In other words, no file is detected as packed due to the absence of the EP section, so we exclude the experimental result of DP-NEP from Table 5.
We now explain the results of Table 5 column by column. First, we explain the DP-SIG column. In the case of DP-SIG, Not-Packed shows a detection rate of 0% since no signature exists in normal files. That is, no packing is detected in normal files, which means that DP-SIG performs an accurate detection for Not-Packed files. On the other hand, for the packed files, DP-SIG detects 100% of the files for all packers except NSPack. The detection rate of NSPack is merely 53.8% because the signatures of NSPack often exist in sections other than the EP section. In summary, DP-SIG shows 95.0% detection accuracy on average.
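To make the signature test concrete, the following Python sketch matches a byte signature with wildcards at the entry point, in the style of PEiD-like signature databases. It is only an illustration: the signature "DemoPacker" and its byte pattern are hypothetical and are not taken from the signature set used in the experiments, and the function names are invented for this example.

```python
def parse_signature(pattern: str) -> list:
    """Convert a pattern such as '60 BE ?? 8D' to a list of ints, None for '??'."""
    return [None if tok == "??" else int(tok, 16) for tok in pattern.split()]


def matches_at(data: bytes, offset: int, signature: list) -> bool:
    """Check whether the signature matches data starting at offset."""
    if offset < 0 or offset + len(signature) > len(data):
        return False
    return all(b is None or data[offset + i] == b
               for i, b in enumerate(signature))


def detect_packer(data: bytes, ep_offset: int, signatures: dict):
    """Return the name of the first signature matching at the entry point."""
    for name, pattern in signatures.items():
        if matches_at(data, ep_offset, parse_signature(pattern)):
            return name
    return None


if __name__ == "__main__":
    # Hypothetical signature; real databases hold one pattern per packer version.
    signatures = {"DemoPacker": "60 BE ?? ?? ?? ?? 8D BE"}
    image = bytes.fromhex("9090" "60BE00104000" "8DBE" "00000000")
    print(detect_packer(image, 2, signatures))
    print(detect_packer(image, 0, signatures))
```

Because the sketch inspects only the bytes at the entry point, it would miss a signature located elsewhere in the file, which is the same kind of limitation that lowers the NSPack detection rate reported above.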
Second, for the experiment on the detection rate of DP-WR, we intentionally remove the WRITE attribute from 39 files (30% of the packed files) per packer. We perform this removal process for the 18 packers other than exe32pack. The reason for changing the dataset is that normally packed files usually have WRITE attributes, so we arbitrarily remove the WRITE attribute from 30% of the files for the DP-WR experiment. Looking at the DP-WR column in Table 5, DP-WR detects only 70% of the files per packer for the 18 packers since we intentionally remove the WRITE attribute from 30% of the files. In the case of exe32pack, however, we need not perform the removal process since it has no WRITE attribute in the PE header. Accordingly, as shown in the DP-WR column of Table 5, the detection rate of exe32pack by DP-WR is trivially 0%. In the case of Not-Packed, the detection rate is 0% since normal files have no WRITE attribute. In summary, checking the presence of the WRITE attribute is very simple, and DP-WR works correctly if the packed file has the WRITE attribute, but it does not work well if the WRITE attribute does not exist or has been maliciously modified.
Third, the DP-ENT column shows the entropy-based detection rates. As shown in the table, DP-ENT achieves a 100% detection rate for 9 packers, including NSPack, UPX, MEW, and BeRoEXEPacker, but it incurs some false negatives for 10 packers, including ASPack, Yoda's Protector, Packman, and MPRESS, for which the detection rate is lower than 100%. In particular, for MPRESS, Petite, JDpack, eXpressor, and Neolite, it shows a low detection rate below 24.6% because the entropy values characteristic of these packers fall within the entropy distribution of normal files. DP-ENT also has a false-positive rate of 3.84% for Not-Packed. In summary, DP-ENT works very accurately for most packers, but it may incur false negatives for a few specific packers or false positives for some normal files. To overcome this limitation, we use DP-ENT together with the other detection techniques. We can see the improvement in the experimental results of Tables 5 and 6. In the tables, when using only DP-ENT, the detection accuracy is low for some packers (especially MPRESS), but when using it together with the other three techniques, the accuracy is close to 100%.
Fourth, the proposed hybrid approach of integrating four techniques detects 100% of the files for all packers except NSPack and shows no false positives. The reason for this high detection rate is that the proposed method has all the advantages of DP-SIG, DP-WR, and DP-ENT as well as DP-NEP. In the case of NSPack, however, the detection rate is merely 70.0%; this is because, as mentioned above, we forcibly remove the WRITE attribute from 30% of the files. According to the results of Table 5, the detection rate differs according to each detection technique, and some techniques
Table 5: Comparison of packing detection rates of three techniques and the proposed hybrid approach.

Packer             DP-SIG (%)   DP-WR (%)   DP-ENT (%)   Hybrid approach (%)
ASPack             100          70.0        91.5         100
NSPack             53.8         70.0        100          70.0
MPRESS             100          70.0        24.6         100
UPX                100          70.0        100          100
Yoda's Protector   100          70.0        96.9         100
MEW                100          70.0        100          100
BeRoEXEPacker      100          70.0        100          100
Packman            100          70.0        93.8         100
RLPack             100          70.0        99.3         100
PECompact          100          70.0        100          100
Petite             100          70.0        0.00         100
JDpack             100          70.0        56.2         100
Molebox            100          70.0        100          100
eXpressor          100          70.0        0.00         100
Yoda's Crypter     100          70.0        91.5         100
FSG                100          70.0        100          100
exe32pack          100          0.00        100          100
WinUpack           100          70.0        100          100
Neolite            100          70.0        16.1         100
Average            95.0         66.3        73.9         98.4
Not-Packed         0.00         0.00        3.84         0.00
incur false positives or false negatives. On the other hand, the proposed hybrid approach shows the highest detection rate of 98.4% and no false positives.
Table 6 shows the detection accuracy of combinations of two or three detection techniques and of the proposed hybrid approach. First, DP-SIG ENT, a combination of the signature and entropy tests, detects 100% of the packed files but detects 18 files incorrectly, showing a false-positive rate of 0.73%. Second, DP-ENT WR, a combination of the entropy and WRITE tests, shows a low detection rate of 92.0% because the entropy value of MPRESS is within the distribution of normal files. Third, DP-SIG ENT WR, a combination of the signature, entropy, and WRITE tests, and our hybrid approach show 98.4% detection accuracy and a 1.60% false-negative rate. This is because DP-SIG ENT WR and the hybrid approach go through the same detection procedure except for DP-NEP. The false-negative rate of 1.60% is due to the files having no WRITE attributes. According to the results of Tables 5 and 6, we believe that the proposed hybrid approach achieves high detection accuracy by combining the advantages of the existing and new detection techniques.
5.4. Experimental Results on Unpacking and Verification Phases. In this section, we evaluate the unpacking and verification phases by performing the unpacking of Algorithm 2 and calculating the restoration accuracy of the restored files with Algorithm 3. However, the unpacking and verification phases take a lot of time, so we experiment with 40 randomly selected files of the dataset. The unpacking phase consists of static unpacking and dynamic unpacking. In the experiment, we exclude static unpacking since it uses the existing library as it is, and we focus only on the dynamic unpacking of Algorithm 2 since it tries to restore the original file without any information about the packers.
Table 7 shows the restoration accuracy of the proposed method for the .text and .data sections for each of the 19 packers. We cannot directly unpack Yoda's Protector and PECompact since they use Anti-Debug engineering techniques. To overcome this problem, we unpack the files of these packers by bypassing three Anti-Debugging API calls: GetCurrentProcessId(), BlockInput(), and IsDebuggerPresent() [41]. That is, we run Algorithm 2 with the bypassing techniques for Yoda's Protector and PECompact and include the experimental results in Table 7. In general, since only the .text and .data sections of the original file are packed [42], in the experiment we consider only the restoration accuracy of these two sections of the restored file. In Table 7, among the 19 packers, 11 packers, including UPX, NSPack, MEW, RLPack, and BeRoEXEPacker, show 100% restoration accuracy for both the .text and .data sections. On the other hand, 8 packers, including ASPack, Packman, and MPRESS, show less than 100% restoration accuracy because of the manner in which the packers handle the .reloc section. More specifically, if there is a .reloc section in the original file, the unpacking relocates the .text and .data sections, changing the data values in each section. Therefore, the restoration accuracy depends on whether there is a .reloc section in the original file.
Table 8 shows the restoration results in detail according to the presence of the .reloc section. As shown in the table, the restoration accuracy is 100% in the absence of the .reloc section, while it drops into the range of 0.00% to 99.8% in the presence of the .reloc section. However, the low restoration accuracy is due to the nature of the packer, and it does not mean that unpacking does not work correctly. In other words, even if the unpacking is successful, the data values may change due to the packer characteristic, and the restoration accuracy is then calculated to be low. In summary, we can use the proposed verification algorithm to calculate the restoration accuracy of unpacked files, which quantitatively measures the accuracy of the actual restoration.
In addition to the .text, .data, and .reloc sections, we further consider the PE header, which includes the IAT information. In general, the IAT is located in the header of the PE file, and if we actually need to run the unpacked file, we must also restore this header information including the IAT. In fact, the studies of [3, 43] restore the header including the IAT and produce the actual whole executable file. On the other hand, most studies, including this paper, do not restore the actual whole executable file itself but restore the .text, .data, and .reloc sections, which are the major parts of the executable file. This is because the purpose of unpacking is not to execute the restored file but to identify and extract malicious codes included in the restored file. Therefore, in this paper, we do not need to restore the header required for the actual execution, and accordingly, a comparison of IAT information is also not necessary. Instead, we restore and compare the .text, .data, and .reloc sections that are more likely to hide malicious code. Restoring the PE header is beyond the scope of this study; we leave it as a separate research topic.
6. Conclusions
In this paper, we analyzed recent studies of packing detection and unpacking techniques and derived four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. We then proposed an all-in-one unpacking system to address these four observations. First, no phase integration was a difficulty in which analysts had to perform the packing detection, unpacking, and verification phases separately, and we solved this problem by presenting an all-in-one
Table 6: Comparison of packing detection accuracy of two or three combinations and of the hybrid approach.

Metrics               DP-SIG ENT (%)   DP-ENT WR (%)   DP-SIG ENT WR (%)   Hybrid approach (%)
Detection accuracy    95.0             92.0            98.4                98.4
False-negative rate   5.00             8.00            1.60                1.60
False-positive rate   0.73             0.00            0.00                0.00
Table 7: Restoration accuracy of .text and .data sections of each packer.

Packer             .text (%)   .data (%)
UPX                100         100
NSPack             100         100
MEW                100         100
RLPack             100         100
BeRoEXE            100         100
Neolite            100         100
FSG                100         100
eXpressor          100         100
Molebox            100         100
Petite             100         100
JDpack             100         100
ASPack             84.3        84.3
Yoda's Protector   84.0        84.0
exe32pack          80.8        100
MPRESS             78.8        75.3
Yoda's Crypter     74.5        74.5
PECompact          74.0        74.0
WinUpack           68.0        68.0
Packman            55.0        48.5
Average            89.2        90.2
unpacking system that integrates these three phases in a single framework. Second, no detection combination meant that there had been no attempt to combine various packing detection methods, and we solved this problem by presenting a hybrid approach that combines existing and new packing detection methods. Third, no real-restoration meant that there had been little discussion about restoring the actual executable file, and we solved this problem by actually restoring an original image of the packed file. Fourth, no unpacking verification meant that there had been no explicit work to measure the restoration accuracy of unpacked files, and we solved this problem by proposing a verification algorithm that quantitatively evaluates the restoration accuracy.
We constructed a dataset of 2600 PE files and experimentally evaluated the proposed system using the dataset. First, we performed all the phases of the all-in-one unpacking system sequentially and confirmed that its overall working mechanism worked well. Next, through the comparison of the hybrid packing detection approach and existing detection techniques, we showed that the detection accuracy of the proposed method was the highest, at 98.4% on average, with no false positives. For this, we also determined an entropy range of packing, Packing-Range, through extensive experiments on 19 representative packers. Finally, in the unpacking and verification phases, we confirmed that unpacking was successful for all files except the files packed with Yoda's Protector. We also showed that the restoration accuracy of unpacked files was nearly 100% except for a few packers.
As future research, we will focus on unpacking packed files that use Anti-VM, Anti-Debug, and Anti-Reverse engineering techniques. We will also study how to compute the restoration accuracy of files having .reloc sections.
Data Availability
All the data files used in the experiments are available at https://github.com/chesvectain/PackingData.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partly supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (no. 2017-0-00158, Development of Cyber Threat Intelligence (CTI) analysis and information sharing technology for national cyber incident response) and the Korea Electric Power Corporation (no. R18XA05).
References
[1] M. Morgenstern, “AV-TEST Security report 2017/18,” 2019, https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2017-2018.pdf.
[2] T. Ban, R. Isawa, S. Guo, D. Inoue, and K. Nakao, “Efficient malware packer identification using support vector machines with spectrum kernel,” in Proceedings of the Eighth Asia Joint Conference on Information Security, pp. 69–76, Seoul, South Korea, July 2013.
[3] B. Cheng, M. Jiang, J. Fu et al., “Towards paving the way for large-scale Windows malware analysis: generic binary unpacking with orders-of-magnitude performance boost,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 395–411, Toronto, Canada, October 2018.
[4] A. Gupta and M. S. Arya, “Hashing based encryption and anti-debugger support for packing multiple files into single executable,” International Journal of Advanced Research in Computer Science, vol. 9, no. 1, pp. 914–920, 2018.
[5] Y. Kawakoya, M. Iwamura, and M. Itoh, “Memory behavior-based automatic malware unpacking in stealth debugging environment,” in Proceedings of the 5th International Conference on Malicious and Unwanted Software, pp. 39–46, Lorraine, France, October 2010.
[6] G. Liang, J. Pang, Z. Shan, R. Yang, and Y. Chen, “Automatic benchmark generation framework for malware detection,” Security and Communication Networks, vol. 2018, Article ID 4947695, 8 pages, 2018.
[7] A. Pektas and T. Acarman, “A dynamic malware analyzer against virtual machine aware malicious software,” Security and Communication Networks, vol. 7, no. 12, pp. 2245–2257, 2014.
[8] I. Santos, J. Nieves, and P. G. Bringas, “Semi-supervised learning for unknown malware detection,” in International Symposium on Distributed Computing and Artificial Intelligence, A. Abraham, J. M. Corchado, S. R. Gonzalez, and J. F. De Paz Santana, Eds., pp. 415–422, Springer, Berlin, Germany, 2011.
[9] X. Ugarte-Pedrero, I. Santos, B. Sanz, C. Laorden, and P. G. Bringas, “Countering entropy measure attacks on packed software detection,” in Proceedings of the 9th International Conference on Consumer Communications and Networking, pp. 164–168, Las Vegas, NV, USA, January 2012.
[10] H. Won, S.-P. Kim, S. Lee, M.-J. Choi, and Y.-S. Moon, “Secure principal component analysis in multiple distributed nodes,” Security and Communication Networks, vol. 9, no. 14, pp. 2348–2358, 2016.
[11] S. Cesare, Y. Xiang, and W. Zhou, “Malwise–an effective and efficient classification system for packed and polymorphic malware,” IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193–1206, 2013.

Table 8: Restoration accuracy according to the presence of the .reloc section.

Packer             .text, with .reloc (%)   .text, without .reloc (%)   .data, with .reloc (%)   .data, without .reloc (%)
ASPack             0.00                     100                         0.00                     100
Packman            99.8                     100                         88.1                     100
MPRESS             0.00                     100                         0.00                     100
Yoda's Crypter     0.00                     100                         0.04                     100
PECompact          0.00                     100                         0.00                     100
Yoda's Protector   0.00                     100                         0.05                     100
exe32pack          0.00                     100                         100                      100
WinUpack           0.00                     100                         0.00                     100

[12] C. Malin, E. Casey, and J. Aquilina, Malware Forensics: Investigating and Analyzing Malicious Code, Syngress, Burlington, MA, USA, 2008.
[13] W. Yan, Z. Zheng, and A. Nirwan, “Revealing packed malware,” IEEE Security and Privacy, vol. 6, no. 5, pp. 65–69, 2008.
[14] M. Bat-Erdene, T. Kim, H. Park, and H. Lee, “Packer detection for multi-layer executables using entropy analysis,” Entropy, vol. 19, no. 3, p. 125, 2017.
[15] Y.-S. Choi, I.-K. Kim, J.-T. Oh, and J.-C. Ryou, “PE file header analysis-based packed PE file detection technique (PHAD),” in Proceedings of International Symposium on Computer Science and Its Applications, pp. 28–31, Hobart, Australia, October 2008.
[16] S. Han and S. Lee, “Packed PE file detection for malware forensics,” The KIPS Transactions: Part C, vol. 16C, no. 5, pp. 555–562, 2009, in Korean.
[17] R. Lyda and J. Hamrock, “Using entropy analysis to find encrypted and packed malware,” IEEE Security and Privacy, vol. 5, no. 2, pp. 40–45, 2007.
[18] R. Perdisci, A. Lanzi, and W. Lee, “Classification of packed executables for accurate computer virus detection,” Pattern Recognition Letters, vol. 29, no. 14, pp. 1941–1946, 2008.
[19] A. Rohit, A. Singh, H. Pareek, and U. Edara, “A heuristics-based static analysis approach for detecting packed PE binaries,” International Journal of Security and Its Applications, vol. 7, no. 5, pp. 257–268, 2013.
[20] G. Amato, “PEframe,” 2019, https://github.com/guelfoweb/peframe.
[21] G. Jeong, E. Choo, J. Lee, M. Bat-Erdene, and H. Lee, “Generic unpacking using entropy analysis,” Journal of Advanced Information Technology and Convergence, vol. 7, no. 1, pp. 232–238, 2009.
[22] V. DeMarines, “Obfuscation–how to do it and how to crack it,” Network Security, vol. 2008, no. 7, pp. 4–7, 2008.
[23] H. Neil, “PEiD,” 2019, https://github.com/wolfram77web/app-peid.
[24] O. Yuschuk, “OllyDbg,” 2019, http://www.ollydbg.de/download.html.
[25] N. Waisman, “Immunity Debugger,” 2019, https://www.immunityinc.com/products/debugger/index.html.
[26] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee, “PolyUnpack: Automating the hidden-code extraction of unpack-executing malware,” in Proceedings of the 22nd Annual Computer Security Applications Conference, pp. 289–300, Miami Beach, FL, USA, December 2006.
[27] M. Bat-Erdene, Y. Park, H. Li, H. Lee, and M.-S. Choi, “Entropy analysis to classify unknown packing algorithms for malware detection,” International Journal of Information Security, vol. 16, no. 3, pp. 227–248, 2017.
[28] S. Cesare and X. Yang, “Classification of malware using structured control flow,” in Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing, pp. 61–70, Brisbane, Australia, January 2010.
[29] L. Martignoni, M. Christodorescu, and S. Jha, “OmniUnpack: fast, generic, and safe unpacking of malware,” in Proceedings of the 23rd Annual Computer Security Applications Conference, pp. 431–441, Miami Beach, FL, USA, December 2007.
[30] M. G. Kang, P. Poosankam, and H. Yin, “Renovo: a hidden code extractor for packed executables,” in Proceedings of the 5th ACM Workshop on Recurring Malcode, pp. 46–53, Alexandria, VA, USA, November 2007.
[31] S. Mariani, L. Fontana, F. Gritti, and S. D'Alessio, “PinDemonium: a DBI-based generic unpacker for windows executables,” in Proceedings of the Black Hat 2016, Las Vegas, NV, USA, July 2016.
[32] G. Bonfante, J. Fernandez, J.-Y. Marion, B. Rouxel, F. Sabatier, and A. Thierry, “CoDisasm: medium scale concatic disassembly of self-modifying binaries with overlapping instructions,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 745–756, Denver, CO, USA, October 2016.
[33] M. Oberhumer, L. Molnar, and J. Reiser, “UPX,” 2019, https://upx.github.io.
[34] A. Swinnen and A. Mesbahi, “One packer to rule them all: empirical identification, comparison, and circumvention of current antivirus detection techniques,” in Proceedings of the Black Hat 2014, Las Vegas, NV, USA, August 2014.
[35] P. Bania, “Generic unpacking of self-modifying, aggressive, packed binary programs,” 2009, https://arxiv.org/abs/0905.4581.
[36] H. Neil, “BobSoft Signature,” 2019, http://woodmann.com/BobSoft/Files/Other/UserDB.zip.
[37] M. Russinovich, “Microsoft Sysinternals Suite,” 2019, https://download.sysinternals.com/files/SysinternalsSuite.zip.
[38] R. Rivest, “The MD5 message-digest algorithm,” RFC 1321, IETF, Fremont, CA, USA, 1992.
[39] S. Tatham, “Putty,” 2019, https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html.
[40] L. Pascoe, “MD5,” 2019, http://www.md5summer.org/about.html.
[41] P. Ferrie, “Anti-unpacker tricks - part one,” 2019, https://www.virusbulletin.com/virusbulletin/2008/12/anti-unpacker-tricks-part-one.
[42] T.-Y. Wang and C.-H. Wu, “Detection of packed executables using support vector machines,” in Proceedings of the 10th International Conference on Machine Learning and Cybernetics, pp. 717–722, Guilin, China, July 2011.
[43] D. Korczynski, “RePEconstruct: reconstructing binaries with self-modifying code and Import address table destruction,” in Proceedings of the 11th International Conference on Malicious and Unwanted Software, pp. 31–38, Fajardo, Puerto Rico, October 2016.
16 Security and Communication Networks
starts but does not address actual restoration of packed files. In addition, even if unpacking is successful, there is no verification of the unpacked files to evaluate their unpacking accuracy or reliability. Based on a thorough survey of recent studies and products, we pay attention to four important observations: (1) no integration of necessary phases, (2) no combination of detection techniques, (3) no real restoration of packed malware, and (4) no verification of unpacked images. We briefly call these observations no phase integration, no detection combination, no real-restoration, and no unpacking verification, respectively.
The goal of this paper is to propose an all-in-one unpacking system that solves or improves on these four observations, as shown in Table 1. We explain each observation and its solution in detail as follows. First, no phase integration is a problem in which analysts have to perform each phase separately, since the packing detection, unpacking, and verification phases are developed separately. To resolve this problem, we present an all-in-one unpacking system that integrates all three phases of packing detection, unpacking, and verification. As shown in Table 2, each existing method focuses on a particular phase. That is, many studies have examined in depth only one of the three phases covered in this paper. In many real applications, however, we often need to apply all three phases at once or sequentially, rather than just one phase. To satisfy this demand, the proposed all-in-one system supports all three phases together. Since the system supports all necessary phases in an integrated framework, we can easily analyze the malware and obtain an objective restoration rate through the actual unpacking and verification phases.
The second observation, no detection combination, is that there has been no attempt to combine various packing detection methods. Based on this observation, we propose a hybrid approach that improves the packing detection accuracy by combining four existing and new packing detection methods. The third observation, no real-restoration, is that previous work tries to find only the OEP when it detects packing, with little discussion of restoring the actual executable file. We address this problem by actually restoring the image of the unpacked file. The fourth observation, no unpacking verification, is that no previous work measures the restoration accuracy of unpacked images. We resolve this problem by presenting a verification algorithm that quantitatively evaluates the restoration accuracy of unpacked file images.
In this paper, we implement the all-in-one unpacking system to reflect the solutions of Table 1 and empirically evaluate the proposed system. First, we verify each phase of the all-in-one system to confirm that the overall mechanism works well. Next, we construct a dataset composed of 2,600 PE (portable executable) files and use it to evaluate the detection, unpacking, and verification phases of the all-in-one system. In the detection phase, we need to determine the entropy range first; through a preliminary experiment, we set it to be less than 6.00 or greater than 6.85. Experimental results comparing the proposed hybrid method with individual or combined detection stage(s) show that the proposed method achieves the highest detection accuracy, up to 98.4%, without any false positives. We also try to actually unpack all files of the dataset to verify the unpacking phase, and we confirm that all files including Yoda's Protector [22] are 100% unpacked. Finally, through the evaluation of the verification phase, we see that most files, except those packed by some packers, show up to 100% restoration accuracy.
The contributions of the paper are as follows. First, this is the first attempt to integrate the detection, unpacking, and verification phases into a single unified framework. The proposed all-in-one concept, which integrates all three phases rather than focusing on only one phase, allows users to more easily detect and analyze malware. Second, based on empirical experience, we present a hybrid approach to packing detection that exploits four existing and new detection techniques. Third, we perform actual unpacking restoration beyond detecting OEPs only. Fourth, we propose a verification algorithm that measures the accuracy of unpacking results. Fifth, through various experiments, we demonstrate the superiority of the proposed all-in-one unpacking system.
The rest of the paper is organized as follows. Section 2 describes related work on packing detection and unpacking. Section 3 presents the overall architecture of the proposed all-in-one unpacking system. Section 4 explains the detection, unpacking, and verification phases in detail to show how the all-in-one system works. Section 5 presents experimental results. We finally summarize and conclude the paper in Section 6.
2. Related Work
2.1. Packing Detection Techniques. It takes much time to detect and analyze malware to which packing is applied, and thus there have been many studies on packing detection and unpacking techniques. Choi et al. [15] propose PHAD, which detects packed files by analyzing the header information of PE files. PHAD selects eight characteristic variables to distinguish between general files and packed files through heuristic analysis of the PE header, and based on these variables, it determines whether a file is packed or not. More specifically, it calculates the Euclidean distance over the eight variables of the characteristic vector (CV) and considers the file packed if that distance exceeds a heuristically determined minimum threshold. PHAD shows higher detection accuracy with lower false-negative rates than the commonly used software PEiD [23], but has the drawback that many false positives occur.
Lyda and Hamrock [17] use entropy-based analysis to detect encrypted or packed malware. Entropy is a measure of the uncertainty of information, and packed files tend to have higher entropy than regular files because they compress the original executable sections or collapse those sections into a few new sections. We calculate the entropy as shown in equation (1), where p(i) is the probability of the i-th unit of information (such as a number) in event x's series of n symbols. This equation generates entropy scores as real numbers [17].
$$H(x) = -\sum_{i=1}^{n} p(i)\log_{2} p(i) \qquad (1)$$
The entropy-based method has the advantage of being easily applied to various packers. However, it causes a false positive if the entropy of a normal file is low or a false negative if that of a packed file is high.
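As an illustration of equation (1), the following minimal Python sketch computes the entropy of a byte string. It assumes that the "units of information" are byte values (0 to 255), so scores fall between 0 and 8 bits per byte; the function name `shannon_entropy` is hypothetical and is not taken from the paper or from [17].

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy of `data` in bits per byte, following equation (1).

    Assumption: each byte value is one symbol, and p(i) is the relative
    frequency of the i-th distinct byte value in `data`.
    """
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # byte value -> number of occurrences
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A constant buffer scores 0, and a buffer in which all 256 byte values are equally frequent scores 8, which is why compressed or encrypted sections tend toward the upper end of the scale.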
Han and Lee [16] propose REMINDer, which detects packing based on the entropy value of the entry point section and the WRITE attribute. They note that a large number of false positives occur because the entropy calculation range is too wide, and thus REMINDer uses only the EP (entry point) section in computing the entropy value. In addition to the entropy-based detection method of the EP section, they also use the WRITE attribute test, which is an essential feature of the packed file, to reduce false positives. This is because, unlike ordinary files, packed PE files require WRITE permission to perform unpacking before the file is executed.
Arora, proposed by Rohit et al. [19], is a packing detection technique that analyzes the PE information in a packed file using a heuristic algorithm. It defines various parameters for packing analysis and presents a heuristic algorithm for assigning weights and risk factors to those parameters. Based on the training set to which weights and risk factors are assigned, it determines whether or not a given input file is packed.
2.2. Unpacking Techniques. Unpacking techniques are roughly classified into four types. The first is a direct analysis method in which a person unpacks directly using an analysis tool. This method was popular in the early days, and typical analysis tools include OllyDbg PlugIn [24], Immunity Debugger [25], and IDA PlugIn [26]. This manual unpacking is relatively accurate, but it takes too much time to manually analyze all the commands of the packed file.
The second is a static unpacking method based on the characteristics of the specific packing algorithm(s). Since this method is based on the characteristics of the packing algorithm, it is very useful when the algorithm is known in advance. Static unpacking is commonly used in antivirus programs, and it has the advantages of no infection risk and fast unpacking, since the packed file does not need to be directly executed. However, it has the disadvantage that we cannot use it if we do not know the packing algorithm used or if the packing technique is partially modified.
The third is a dynamic unpacking method that does not depend on the packing algorithm(s). There have been many studies on dynamic unpacking, since we can unpack even without knowing the exact packing algorithm used. Jeong et al. [21] propose an entropy-based dynamic unpacking method that finds the OEP (original entry point) based on the characteristic that the entropy increases if the file is packed. Bat-Erdene et al. [27] classify packing algorithms into multiple clusters by measuring the entropy changes during the unpacking process. Cesare and Yang [28] propose an algorithm for constructing a control flow graph signature using an inverse transformation technique. They perform the entropy analysis first to determine whether or not the PE file is packed, and if it is packed, they find the hidden code by investigating the end of packing through dynamic analysis. Moreover, recent dynamic methods include OmniUnpack [29], Renovo [30], and PinDemonium [31]. OmniUnpack unpacks the PE file by detecting the executing offset of the original code at the page level instead of at the instruction level. Renovo exploits shadow memory to monitor program execution and memory writes, and through the shadow memory, it extracts the hidden code from the executable. Finally, PinDemonium uses the Dynamic Binary Instrumentation (DBI) technique to perform Import Address Table (IAT) analysis, JUMP command detection, and entropy calculation, and through these processes, it unpacks the PE file.
Table 1: Observations and solutions for the proposed all-in-one unpacking system.

| Observation | Explanation | Solution |
| --- | --- | --- |
| No phase integration | Unpacking-related three phases are separately developed | Adopt an all-in-one approach integrating all three phases |
| No detection combination | There is no attempt to combine various existing methods for packing detection | Combine four packing detection methods to improve detection accuracy |
| No real-restoration | Main goal is to find OEP | Restore unpacked files by performing actual unpacking as well as finding OEP |
| No unpacking verification | There is no quantitative way to verify the restoration accuracy | Present a verification algorithm to evaluate the accuracy of unpacking results |
Table 2: Comparison of analytical phases supported by existing and proposed systems.
Columns: Analytical phases; PHAD [15]; REMINDer [16]; PEframe [20]; [11, 13, 21]; All-in-one system.
Detection: × ×
Unpacking (static): × ×
Unpacking (dynamic): × ×
Verification: × × ×
The fourth is an integration method of both static and dynamic unpacking techniques. Representative examples of such integration methods are PolyUnpack [26], CoDisasm [32], and BinUnpack [3]. PolyUnpack and CoDisasm first extract the static model through static analysis before executing the file. Then, they unpack the PE file by comparing the extracted static model to the dynamic model obtained by running the executable. PolyUnpack unpacks the file by exploiting iterative comparisons of the two models, and CoDisasm does it by comparing memory snapshots repeatedly. Finally, BinUnpack first performs static analysis by developing a hook-evasion-resistant API monitor, and it then performs dynamic analysis by monitoring API calls with the analyzed information.
3. Overall Architecture of All-in-One Unpacking System
As we mentioned in Section 1, we note four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. In this section, we describe the overall architecture of the all-in-one unpacking system that considers these four observations. Figure 1 shows a flow diagram of the proposed all-in-one system. As shown in the figure, we improve no phase integration by supporting all three phases of detection, unpacking, and verification in an integrated framework. In the diagram, for a given input PE file (①), the system first determines whether or not it is packed (②). If the file is not packed, it proceeds to check the next PE file. If it is packed, the system checks whether or not there is an unpacking library for the packer used (③). If the unpacking library exists, the system performs static unpacking using the library (④); otherwise, the system performs dynamic unpacking, which does not depend on the packing algorithm (⑤). After completing the unpacking, the system calculates the restoration accuracy to verify the correctness of unpacking (⑥).
We use the existing and proposed techniques together to implement each phase of the all-in-one unpacking system. Table 3 summarizes the existing or proposed methods used in each phase. First, in the detection phase, we use four techniques to detect the packing of the input PE file. This hybrid approach improves no detection combination. Among the four techniques, the EP section test is a new one proposed in this paper, which increases the packing detection rate by checking explicitly whether the EP section exists; we describe it in detail in Section 4.1. Second, in the unpacking phase, we use either static unpacking or dynamic unpacking depending on whether or not an unpacking library exists. This unpacking phase restores the actual file and improves no real-restoration. Third, in the verification phase, we verify the correctness of unpacking by computing the restoration accuracy. We calculate the restoration accuracy by comparing the bytecodes of the two files. This phase improves no unpacking verification, and we describe this verification algorithm in detail in Section 4.3.
4. Detection Mechanism of All-in-One Unpacking System
4.1. Detection Phase. As we explained in Section 3, while there have been various packing detection techniques, there has been no attempt to combine the advantages of these techniques. Since they have been studied separately, each technique carries out packing detection focusing on its own characteristics only. In this paper, we propose a hybrid approach to improve the packing detection accuracy by combining four different techniques.
The detection phase consists of four tests: the EP section, signature, WRITE attribute, and entropy tests. In this paper, we refer to these techniques as Detection Phase-No EP Section (DP-NEP), Detection Phase-Signature (DP-SIG), Detection Phase-WRITE (DP-WR), and Detection Phase-Entropy (DP-ENT), respectively. Table 4 compares the detection techniques used in the existing work and the proposed approach. As shown in the table, PEiD supports only DP-SIG, [17] supports only DP-ENT, and [16] supports DP-WR and DP-ENT, but the proposed method supports all four techniques.
Algorithm 1 presents the proposed hybrid approach to packing detection. It takes a PE file as input and determines whether the file is packed or not. Since DP-SIG, DP-WR, and DP-ENT work on the EP section of a PE file, we first check DP-NEP to confirm that the EP section exists. If the EP section does not exist, we determine that the file is packed (Line 1). On the contrary, if the EP section exists, we perform DP-SIG to check whether or not the PE file has a packer signature. If the signature exists, i.e., if a packer is used, we determine that the file is packed (Line 2). We then perform DP-WR and DP-ENT, where DP-WR checks whether or not the WRITE attribute exists in the EP section and DP-ENT investigates whether or not the entropy of the EP section is in a specific range, Packing-Range (for example, below 6.00 or above 6.85, the range set in this paper). If both conditions hold, we determine that the file is packed and return Packed (Lines 3 and 4). Finally, if none of the DP-NEP, DP-SIG, and combined DP-WR and DP-ENT conditions is satisfied, we return Not-Packed (Line 5).
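The decision order of Algorithm 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the `EpSectionInfo` record and the function names are hypothetical, the three inputs are assumed to have been computed elsewhere, and the default bounds of Packing-Range (6.00 and 6.85) are taken from the preliminary experiment mentioned in Section 1 and should be treated as tunable parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EpSectionInfo:
    """Hypothetical summary of the EP section (not a structure from the paper)."""
    has_packer_signature: bool  # result of the signature test (DP-SIG)
    writable: bool              # WRITE attribute set on the EP section (DP-WR)
    entropy: float              # entropy of the EP section (DP-ENT)


def in_packing_range(entropy: float, low: float = 6.00, high: float = 6.85) -> bool:
    """Packing-Range: entropy below `low` or above `high` (assumed defaults)."""
    return entropy < low or entropy > high


def detect_packing(ep: Optional[EpSectionInfo]) -> str:
    """Return "Packed" or "Not-Packed", following the order of Algorithm 1."""
    if ep is None:                                    # Line 1: DP-NEP
        return "Packed"
    if ep.has_packer_signature:                       # Line 2: DP-SIG
        return "Packed"
    if ep.writable and in_packing_range(ep.entropy):  # Lines 3-4: DP-WR and DP-ENT
        return "Packed"
    return "Not-Packed"                               # Line 5
```

Passing `None` models the case where no EP section can be found; a writable EP section with an entropy inside the ordinary range is reported as Not-Packed, since DP-WR and DP-ENT must hold together.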
We now explain the four packing detection cases of Algorithm 1 in detail. First, DP-NEP looks for the EP section of the input PE file. We conclude that the file is packed if it has no EP section. This is because every PE file has an EP section, so the absence of one means that someone has hidden the section intentionally. We can check the existence of the EP section by investigating the EP section address in the header of the PE file. We explain the effect of this DP-NEP case in detail. According to actual experimental results, we could not find the EP section in some PE files packed by packers such as Upack [34] and PESpin [35]. This is because such packers intentionally hide or scramble the EP section. Obviously, we need to detect these PE files as packed in the detection phase, and Algorithm 1 does so: it detects such a "No EP Section" PE file as the DP-NEP case, which improves the packing detection rate, especially for the Upack and PESpin packers. In the case of DP-NEP, however, we cannot find the EP section itself, and accordingly we cannot proceed to the next phases of unpacking and verification. More precisely, unpacking such DP-NEP packers might be possible if we manually modify the raw hexadecimal codes using a binary editor. However, this manual raw data modification is beyond the scope of the paper, and we do not consider it. In summary, we use the concept of "No EP Section" in the detection phase of Algorithm 1 to increase the packing detection rate, but we do not consider it in the unpacking and verification phases.
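One way to picture the DP-NEP check is sketched below. It assumes that "no EP section" means the entry-point address from the PE header does not fall inside any section listed in the section table; this interpretation, the `Section` record, and the function name are assumptions made for illustration, not details confirmed by the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Section:
    """Hypothetical, simplified view of one section-table entry."""
    name: str
    virtual_address: int  # RVA at which the section starts
    virtual_size: int     # size of the section in memory


def find_ep_section(entry_point_rva: int,
                    sections: Sequence[Section]) -> Optional[Section]:
    """Return the section containing the entry point, or None (the DP-NEP case)."""
    for section in sections:
        start = section.virtual_address
        end = start + section.virtual_size  # exclusive upper bound
        if start <= entry_point_rva < end:
            return section
    return None
```

A `None` result corresponds to Line 1 of Algorithm 1: the file is reported as packed, and the later unpacking and verification phases are skipped for it.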
Second, in DP-SIG, we investigate whether or not the signature extracted from the EP section is in the signature database. We construct the signature database with a total of 7,000 signatures, consisting of 4,200 signatures provided by PEiD [23] and 2,800 signatures collected from BobSoft [36]. Figure 2 shows three sample signatures of packers stored in the database. The first line of each entry gives the type and version of the packer and the maker of the signature, the second line gives the hexadecimal values of the packer's signature, and the third line, ep_only, is true or false, indicating whether the signature is to be found in the EP section.
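A signature entry such as those in Figure 2 can be matched against file bytes as in the following sketch. The function names are hypothetical. Two points are assumptions rather than statements from the paper: the token `??` is treated as a one-byte wildcard (the sample signatures in Figure 2 contain none), and `ep_only = false` is interpreted as permission to search the whole byte buffer instead of only the entry-point offset.

```python
from typing import List, Optional

Pattern = List[Optional[int]]  # None marks a wildcard byte


def parse_signature(text: str) -> Pattern:
    """Turn '47 8B ?? 05' into [0x47, 0x8B, None, 0x05]."""
    return [None if token == "??" else int(token, 16) for token in text.split()]


def matches_at(data: bytes, offset: int, pattern: Pattern) -> bool:
    """True if `pattern` matches `data` starting exactly at `offset`."""
    if offset < 0 or offset + len(pattern) > len(data):
        return False
    return all(p is None or data[offset + i] == p for i, p in enumerate(pattern))


def signature_found(data: bytes, ep_offset: int, pattern: Pattern,
                    ep_only: bool) -> bool:
    """Check one signature; with ep_only, test only at the entry-point offset."""
    if ep_only:
        return matches_at(data, ep_offset, pattern)
    last_start = len(data) - len(pattern)
    return any(matches_at(data, off, pattern) for off in range(last_start + 1))
```

In a full DP-SIG test, this check would be repeated for every entry of the signature database until one matches.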
Third, in DP-WR, we check the WRITE attribute of the EP section. In the process of unpacking a packed file, the system needs to modify the memory, so the packed file necessarily requires the WRITE attribute. Thus, we suspect that the file is packed if the WRITE attribute exists. However, since a normal file can also have the WRITE attribute, we cannot determine packing only by the existence of the WRITE attribute. Therefore, as proposed in [16], we detect packing using the WRITE attribute and the entropy value together.
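As a small illustration of the WRITE attribute test, the sketch below inspects the characteristics field of a section header. The flag value 0x80000000 is the `IMAGE_SCN_MEM_WRITE` bit defined by the PE format; the function name is hypothetical, and reading the field from an actual file is left out.

```python
IMAGE_SCN_MEM_WRITE = 0x80000000  # PE section flag: the section can be written to


def is_writable(characteristics: int) -> bool:
    """True if the section's characteristics field has the WRITE bit set."""
    return bool(characteristics & IMAGE_SCN_MEM_WRITE)
```

For example, a characteristics value of 0xE0000020 (code, execute, read, write) passes the test, while 0x60000020 (code, execute, read) does not.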
Fourth, in DP-ENT, we compute the entropy of an input PE file using equation (1) and check whether that entropy is in a specific range (Packing-Range in Algorithm 1). This is based on the previous observation [16, 17] that the entropy of a packed file differs from that of an ordinary file. We compute the entropy value for the EP section only, not for the entire file. To use DP-ENT, we need to set the Packing-Range used for determining whether or not a file is packed. In this paper, we set this range through the experiment in Section 5.3.
4.2. Unpacking Phase. As mentioned in Section 2.2, unpacking that is performed automatically without human intervention can be divided into static unpacking and dynamic unpacking. Static unpacking is based on the characteristics of the packing algorithm, so it is fast and carries no risk of infection because the packed file is not directly executed. On the contrary, dynamic unpacking does not depend on a specific packing algorithm, even though it takes a long analysis time. In this paper, we use both static and dynamic methods to maximize the unpacking accuracy. More specifically, if the packer type found in the detection phase provides an unpacking library, we use static unpacking; otherwise, we perform the
Figure 1: Overall working mechanism of the all-in-one unpacking system. [Flow diagram. Detection phase: ① input PE file, ② packing detection (Yes/No). Unpacking phase: ③ unpacking library (Yes/No), ④ static unpacking, ⑤ dynamic unpacking. Verification phase: ⑥ verification.]
Table 3: Existing or proposed techniques used in each phase of the all-in-one unpacking system.

| Analytical phase | Techniques used or proposed |
| --- | --- |
| Detection | (i) EP section test, to be proposed in Section 4.1; (ii) Signature test [23]; (iii) WRITE attribute test [16]; (iv) Entropy test [17] |
| Unpacking | (i) Static: library-based unpacking [33]; (ii) Dynamic: entropy change-based unpacking [14, 27] |
| Verification | Verification algorithm, to be proposed in Section 4.3 |
Table 4: Detection techniques used in existing and proposed approaches.

| Method | DP-NEP | DP-SIG | DP-WR | DP-ENT |
| --- | --- | --- | --- | --- |
| PEiD [23] | × | ✓ | × | × |
| [17] | × | × | × | ✓ |
| [16] | × | × | ✓ | ✓ |
| Proposed approach | ✓ | ✓ | ✓ | ✓ |
dynamic unpacking. We can easily carry out static unpacking by using the unpacking library found for each packer, so in this section we focus on explaining how dynamic unpacking works in detail. Our dynamic unpacking method is also known as the "execute-after-write" method. More precisely, the proposed method first restores the original file into memory by finding the OEP from the packed file and then analyzes (i.e., executes) the restored original file in memory. In order to avoid infecting real machines, we perform the unpacking and validation phases in debugging mode.
The OEP is the address of the first instruction to execute after the packed file is unpacked, and it represents the entry point of the original program. In other words, since the code following the OEP is the starting code of the original program, finding the OEP is the most important issue in dynamic unpacking. To find the OEP, we use the entropy change-based analysis proposed by Bat-Erdene et al. [27]. When a packed file is executed, the packed data is uncompressed and written to memory, and at this time, data might be changed or added to the file. Because of this data change, the entropy value continuously changes during the unpacking process. If unpacking is successful, the entropy value stabilizes, the original file is restored, and the IP (instruction pointer) moves to another section of the restored original file for its execution. Thus, we can find the OEP by examining whether the IP moves to another section when there is no change in the entropy value of each section. (If the heuristic of investigating the change of IP sections is known, someone may create a malicious packing method that evades it. To evade it, however, the OEP must be reached without a change of IP sections, and packing a PE file in such a way might not be easy. According to our actual analysis, we do not find such cases in the 19 packers we covered. Even though such a case hardly exists, it might occur in the worst case. We leave an advanced approach that works against such evasion as further study.) To accomplish this, we need to execute the target process instruction by instruction and monitor the entropy change; however, measuring the entropy value after every instruction is very time-consuming and inefficient. We note here that the OEP generally follows a branch instruction, so we calculate the entropy value only when the instruction is JMP, conditional JMP (CJMP), or RETN. We also note that loop iterations during program execution are independent of the OEP, so we store and reuse the entropy values measured after JMP, CJMP, and RETN instructions to avoid duplicated calculations caused by the iterations and to reduce the total analysis time.
Algorithm 2 shows the proposed dynamic unpacking algorithm based on entropy changes; it returns the OEP address for a given PE file. First, it computes the initial entropy of each section (Line 1). Then, it repeats the loop until the process ends (Lines 2 to 18). If the current IP address is greater than the last address of the current section, it returns Not-Found (Line 4). This is because the IP moves to the next section without a JMP-family instruction, so it cannot find the OEP in this section. On the contrary, if the instruction at IP is one of JMP, CJMP, or RETN, it executes the file up to IP and stores the address of the next instruction in DstAddr (Lines 5 to 7). If IP exists in IP-History or DstAddr exists in DstAddr-History, it indicates a repeated instruction or address, so we move to the next IP (Line 8). Otherwise, it stores IP and DstAddr in their respective histories for the next iteration (Line 9). At this time, if DstAddr is out of the range of the file, it returns Not-Found because it cannot find the OEP within the file (Line 10). If DstAddr is in the normal range, it calculates the entropy of each section again and compares the current entropy with the previous entropy to check whether it is stable (Lines 11 to 13). If the entropy is stable and DstAddr is in a different section, it means that unpacking the file is successful and the IP is moved to another section by a JMP-family instruction. Thus, it
[Code-Lock vxx]
signature = 47 8B C2 05 1E 00 52 8B D0 B8 02 3D CD 21 8B D8 5A
ep_only = true
[CodeCrypt v014b]
signature = E9 C5 02 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F
ep_only = true
[CodeCrypt v015b]
signature = E9 31 03 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F
ep_only = true
Figure 2: Examples of packer signatures stored in the signature database.
Input: A PE file
Output: Packed or Not-Packed // Packing detection result
begin
  (1) if No EP Section then return Packed // DP-NEP
  (2) else if Packer Signature Found then return Packed // DP-SIG
  (3) else if "WRITE" enabled and EP Section Entropy ∈ Packing-Range then
  (4)   return Packed // DP-WR, DP-ENT
  (5) else return Not-Packed
  (6) end-if
end
ALGORITHM 1: Packing detection.
stops the execution and returns DstAddr as the OEP (Lines 13 and 14).
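The control flow of Algorithm 2 can be replayed over a pre-recorded execution trace, as in the following simplified Python sketch. It is not the debugger-based implementation used in the paper: the `Step` record, the section `layout` map, the tolerance `eps` that defines a "stable" entropy, and the idea of supplying entropies with each trace step are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

BRANCHES = {"JMP", "CJMP", "RETN"}
Layout = Dict[str, Tuple[int, int]]  # section name -> (start, end), end exclusive


@dataclass
class Step:
    """Hypothetical trace record for one executed instruction."""
    ip: int                      # address of the instruction
    mnemonic: str                # e.g. "MOV", "JMP", "CJMP", "RETN"
    dst: int                     # address of the next instruction executed
    entropies: Dict[str, float]  # per-section entropy after this instruction


def section_of(addr: int, layout: Layout) -> Optional[str]:
    for name, (start, end) in layout.items():
        if start <= addr < end:
            return name
    return None


def find_oep(trace: Iterable[Step], layout: Layout,
             initial: Dict[str, float], eps: float = 1e-3) -> Optional[int]:
    """Return the OEP address, or None for Not-Found."""
    prev = dict(initial)                   # Line 1: initial entropies
    seen_ip, seen_dst = set(), set()       # IP-History and DstAddr-History
    current: Optional[str] = None          # section the IP is currently in
    for step in trace:                     # Lines 2-3
        if current is None:
            current = section_of(step.ip, layout)
        if current is None or step.ip >= layout[current][1]:
            return None                    # Line 4: ran past the section end
        if step.mnemonic not in BRANCHES:  # Line 5: only branches matter
            continue
        src, dst_sec = current, section_of(step.dst, layout)
        if dst_sec is not None:
            current = dst_sec              # follow the branch
        if step.ip in seen_ip or step.dst in seen_dst:
            continue                       # Line 8: loop iteration, skip
        seen_ip.add(step.ip)               # Line 9
        seen_dst.add(step.dst)
        if dst_sec is None:
            return None                    # Line 10: target outside the file
        # Lines 12-13: entropy is "stable" if no section changed by eps or more
        stable = all(abs(step.entropies[s] - prev[s]) < eps for s in prev)
        prev = dict(step.entropies)
        if stable and dst_sec != src:
            return step.dst                # Line 14: OEP found
    return None                            # Line 19
```

Because repeated branch addresses are skipped before any entropy comparison, a tight decompression loop costs one comparison rather than one per iteration, which mirrors the reuse of measurements described above.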
4.3. Verification Phase. In this section, we present an algorithm for measuring the restoration accuracy by comparing the restored file with its original file. To execute the packed file, the proposed system uses dynamic unpacking and finds the OEP during its execution. If the system finds the OEP, we assume the restoration of the original file is successful and regard that file as the restored file. In the verification phase, we quantitatively measure the restoration accuracy by comparing the restored file with its original file.
Figure 3 shows an example of comparing some bytecodes of the original file with those of the restored file. For the comparison, we first pack the pspasswd.exe file provided by Microsoft [37] and then unpack it again by dynamic unpacking. The figure shows that the bytecodes of the four sections of the original file (.text, .rdata, .data, and .rsrc) appear identically in the UPX0, UPX1, and .rsrc sections of the restored file. That is, even if the name and location of each section are different, the same bytecodes exist in the restored file. However, the restored file contains padding and garbage values, and some bytecodes may be at locations different from those in the original file. Thus, in order to compare the original and the restored files, we need to search for where the original bytecodes are located in the restored file. Since manually comparing the bytecodes of the original and restored files takes a very long time, in this paper we propose an efficient search algorithm using a hash function. We use the well-known MD5 [38] as the hash function of the search algorithm. As a message-digest algorithm, MD5 is widely used in checking the integrity of long data.
Algorithm 3 shows the restoration accuracy calculation algorithm. Its basic principle is to check whether each section of the original file exists in the restored file. We use the MD5 hash function here to speed up the scan. The inputs to the algorithm are an original file A and a restored file B, and the output is restorationAccuracy, a list storing the restoration accuracy values of all sections. First, we remove garbage and padding values from the restored file (Line 1). For each section Ai of A, we obtain the length curLen of the section and calculate a hash value H1 by applying the MD5 hash function (Lines 4 to 7). We then calculate another hash value H2 from the restored file B by considering only the length curLen (Line 9). If H1 equals H2 (Line 10), it means that B contains Ai, so we stop the comparison and calculate the restoration accuracy of the section (Line 14). However, if H1 and H2 are not equal, we perform the hash value comparison again after decrementing the length curLen of Ai (Lines 6 to 13). We repeat this process to calculate the restoration accuracy for the section Ai (Line 14). After completing the accuracy calculation for all sections, the algorithm returns the list restorationAccuracy (Line 16).
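The hash-based search described above can be sketched in Python with the standard `hashlib` module. Several details are assumptions, since Algorithm 3 itself is not reproduced here: the section is shortened from its end (so the longest matching prefix is found), every offset of the restored file is tried for each length, the accuracy of a section is the matched length divided by the section length, and the garbage and padding removal of Line 1 is omitted. All function names are hypothetical.

```python
import hashlib
from typing import Dict


def md5_digest(data: bytes) -> bytes:
    return hashlib.md5(data).digest()


def section_accuracy(section: bytes, restored: bytes) -> float:
    """Fraction of `section` (its longest prefix) that occurs in `restored`."""
    if not section:
        return 1.0  # assumption: an empty section counts as fully restored
    for cur_len in range(len(section), 0, -1):       # decrement curLen
        h1 = md5_digest(section[:cur_len])           # H1 from the original
        for off in range(len(restored) - cur_len + 1):
            h2 = md5_digest(restored[off:off + cur_len])  # H2 from the restored file
            if h1 == h2:
                return cur_len / len(section)
    return 0.0


def restoration_accuracy(original_sections: Dict[str, bytes],
                         restored: bytes) -> Dict[str, float]:
    """Per-section restoration accuracy, keyed by original section name."""
    return {name: section_accuracy(data, restored)
            for name, data in original_sections.items()}
```

This direct form rehashes every window and is therefore slow on large files; it is meant only to make the comparison logic concrete, not to match the efficiency of the proposed algorithm.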
ALGORITHM 2: Dynamic unpacking algorithm.
Input: a PE file
Output: OEP, the start address of the unpacked original file
begin
(1) Measure the entropy value of each section
(2) while the process has not reached the end do
(3)   IP ⟵ the current instruction pointer
(4)   if IP's address > current section's last address then return Not-Found
(5)   else if IP is JMP or CJMP or RETN then
(6)     Execute instructions until IP
(7)     DstAddr ⟵ the next instruction address
(8)     if IP ∉ IP-History and DstAddr ∉ DstAddr-History then
(9)       Store IP and DstAddr into each history
(10)      if DstAddr is out of the file then return Not-Found
(11)      else
(12)        Re-calculate the entropy value of each section
(13)        if Entropy change is stable and DstAddr is in a different section then
(14)          return DstAddr
(15)        end-if
(16)    end-if
(17)  end-if
(18) end-while
(19) return Not-Found
end

Figure 3: Comparison of sample bytecodes of the original and restored files. (a) Sections of the original file (.text, .rdata, .data, and .rsrc). (b) Sections of the restored file (UPX0, UPX1, UPX1, and .rsrc).

ALGORITHM 3: Restoration rate calculation algorithm.
Input: A, an original file; B, a restored file
Output: restorationAccuracy[1..n], a list of restoration accuracy values for n sections
begin
(1) Remove garbage and padding values from B
(2) A ⟵ {A1, A2, ..., An}, where Ai is the i-th section of A
(3) for each section Ai ∈ A do
(4)   totalLen ⟵ Ai.length
(5)   curLen ⟵ totalLen
(6)   while curLen > 0 do
(7)     H1 ⟵ hash(Ai[1 : curLen])   // hash a part of section Ai
(8)     for j ⟵ 1 to (B.length − curLen + 1) do
(9)       H2 ⟵ hash(B[j : j + curLen − 1])   // hash a part of B
(10)      if H1 = H2 then break
(11)      else curLen ⟵ curLen − 1
(12)    end-for
(13)  end-while
(14)  restorationAccuracy[i] ⟵ (curLen / totalLen) × 100
(15) end-for
(16) return restorationAccuracy
end

5. Performance Evaluation

5.1. Experimental Environment. In this section, we present the results of the experimental evaluation to show the superiority of the all-in-one unpacking system. We conduct the experiments on a virtual server as well as a physical server. The reasons for experimenting in two server environments are as follows. First, unpacking results may differ between the virtual server and the physical server if a packed file has anti-unpacking features such as Anti-VM or Anti-Debug. Second, the dynamic unpacking of malware files may infect the physical server. Thus, we perform unpacking first on the virtual server and then perform the experiment on the physical server only when there is no abnormality. The hardware specifications of the experimental platform are as follows. The virtual server is an Intel Core i7-6700 3.40 GHz with 8 GB RAM, and the physical server is an Intel Core i7-6700 3.40 GHz with 32 GB RAM. The target operating system is 32-bit Windows 7, and we use PEVIEW to view the structure and content of 32-bit PE files. As the debugger, we use Immunity Debugger 1.85 [25] to implement the dynamic unpacking phase of Algorithm 2. To implement the all-in-one unpacking system, including the dynamic unpacking algorithm, we run Python scripts on top of Immunity Debugger.
In the experiment, we use a total of 130 PE files, where 90 files are provided by Microsoft [37] and 40 files are randomly collected from the PuTTY [39] and MD5 [40] sites. We pack each of the 130 executable files with 19 packers and construct a dataset consisting of a total of 2,600 files: 2,470 packed files (130 × 19) and the 130 original files. The 19 packers used are UPX, ASPack, NSPack, MPRESS, Yoda's Protector, MEW, Packman, RLPack, BeRoEXEPacker, PECompact, Petite, JDpack, Molebox, eXpressor, Yoda's Crypter, FSG, exe32pack, WinUpack, and Neolite. The reason for constructing the dataset by packing normal PE files directly is that we can quantitatively measure the packing detection accuracy, the false-positive rate, and the false-negative rate only if we know in advance which experimental files are packed.
In this paper, we conduct three experiments. The first experiment is for the integration feature of the all-in-one system; it verifies that the system supports all three phases of packing detection, unpacking, and verification in an integrated framework. The second experiment is for the detection phase, where we first set Packing-Range, an entropy range for judging the packing, and then evaluate the packing detection accuracy using the range. The third experiment is for the unpacking and verification phases, where we verify whether the system unpacks the PE file correctly by calculating the restoration accuracy.
5.2. Experimental Results on Overall Working Mechanism. In this section, we confirm through experiments that the integration mechanism of the all-in-one unpacking system works correctly. Figure 4 shows an operation screenshot of each phase of the all-in-one system. First, Figure 4(a) shows an example screenshot of the detection phase. To confirm that all four cases of packing detection work correctly, we continue executing the remaining cases even if we detect the packing by a previous case. In Figure 4(a), we first examine DP-NEP of Algorithm 1. We detect the EP section in DP-NEP, so we proceed to DP-SIG (the first two lines). By the DP-SIG test, we detect the UPX packer, so we conclude that the file is packed (the second two lines). Even though detection is completed, we proceed to the next DP-ENT and DP-WR tests. The entropy value is 7.93, which is contained in Packing-Range, and the attribute value of the EP section is "0xe0000040", which includes the WRITE attribute. Thus, we conclude that the file is also packed by the DP-ENT and DP-WR tests (the third three lines). According to Figure 4(a), we can confirm that the DP-NEP, DP-SIG, DP-ENT, and DP-WR tests of the detection phase all work correctly.
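The WRITE check on the attribute value quoted above can be illustrated with the section characteristic flags defined in the PE/COFF format. The sketch below is an illustration only, not the authors' DP-WR implementation; the function names are hypothetical, and only the four flags that are set in 0xe0000040 are listed.

```python
# Section characteristic flags from the PE/COFF format (subset).
IMAGE_SCN_CNT_INITIALIZED_DATA = 0x00000040
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

FLAG_NAMES = {
    IMAGE_SCN_CNT_INITIALIZED_DATA: "INITIALIZED_DATA",
    IMAGE_SCN_MEM_EXECUTE: "EXECUTE",
    IMAGE_SCN_MEM_READ: "READ",
    IMAGE_SCN_MEM_WRITE: "WRITE",
}


def decode_characteristics(value: int) -> list[str]:
    """List the names of the known flags that are set in `value`."""
    return [name for flag, name in FLAG_NAMES.items() if value & flag]


def has_write_attribute(value: int) -> bool:
    """True if the section is marked writable."""
    return bool(value & IMAGE_SCN_MEM_WRITE)


if __name__ == "__main__":
    # The EP section attribute value shown in Figure 4(a).
    print(decode_characteristics(0xE0000040))
    print(has_write_attribute(0xE0000040))
```

The value 0xe0000040 decomposes into the execute, read, and write memory flags plus the initialized-data flag, which is why the test reports the WRITE attribute for this section.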
Figures 4(b) and 4(c) show example screenshots of the unpacking phase. If an unpacking library exists for the packer detected in the detection phase, we perform static unpacking as shown in Figure 4(b); if not, we perform dynamic unpacking as shown in Figure 4(c). Figure 4(b) shows the result of static unpacking by the UPX tool [33] for the UPX-packed file. In the figure, the file size increases from 285,760 bytes to 731,200 bytes by unpacking, which means that the compression ratio is 39.08% (285,760 / 731,200 ≈ 0.3908). Figure 4(c) shows the result of the entropy-based dynamic unpacking. In the figure, the system recalculates the entropy of each section when IP encounters a JMP or CJMP instruction, and it returns the current DstAddr as the OEP if the entropy change is stable and DstAddr is in a different section.
Figure 4(d) shows an example screenshot of the verification phase. In this example, we measure the restoration accuracy of the restored file for the .text and .data sections of the original file. The length of the original .text section is 18,439, and the length of the unpacked file is 872,555. In the hash value comparison, since the portion corresponding to the original .text section exists in the unpacked file, the restoration accuracy becomes 100%. Similarly, we also calculate the restoration accuracy of the .data section as 100%.
5.3. Experimental Results on Detection Phase. In this section, we describe two experimental results of the detection phase. The first experiment is to set Packing-Range, and the second is to measure the detection accuracy of the proposed detection algorithm. In the first experiment, we experimentally determine Packing-Range used in DP-ENT of Algorithm 1. The previous study by Lyda and Hamrock [17] uses the entropy value of the whole file and sets the entropy range of the packed file to [6.677, 6.926]. In contrast, Han and Lee [16] focus on the EP section only and compute the entropy value for the EP section instead of the whole file. Through their experiment, they set the entropy threshold to 6.85 and judge a file to be packed if its entropy is over 6.85.
We also determine the entropy range Packing-Range through experiments on various packers. Figure 5 shows the entropy distribution of the EP section for different packers. Figure 5(a) shows the average distribution of the original and 19 packed files. Figures 5(b)–5(m) show the more detailed entropy distribution for each packer. According to Figures 5(b)–5(m), the entropy distribution of the packed file differs depending on the packer type. In particular, as shown in Figures 5(d), 5(e), and 5(k), some packed files have smaller entropy values than the original file, and thus we cannot simply set an upper threshold value as the packing criterion as in the previous study [16]. However, we note that while the entropy of packed files is distributed over a wide range as shown in Figures 5(c)–5(m), the entropy of original files is distributed only in a specific range as shown in Figure 5(b). More specifically, the entropy value of the normal file is between 6.00 and 6.85, as shown in Figure 5(b). Based on these experimental results, we thus set Packing-Range to [0, 6.00) or [6.85, ∞] for the EP section. According to [9, 29], the entropy method has the disadvantages of low accuracy and easy evasion. To overcome this point, we use three other detection techniques, DP-NEP, DP-SIG, and DP-WR, together with DP-ENT in a hybrid manner. Using this hybrid approach, we can further improve the detection accuracy.
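The entropy test can be sketched as follows. This is a minimal illustration, assuming byte-level Shannon entropy (0 to 8 bits per byte) and reading the two bounds of the normal-file range as 6.00 and 6.85; the function and constant names are hypothetical, and extracting the EP section bytes from a PE file is left out.

```python
import math
from collections import Counter

# Bounds of the entropy range observed for normal (unpacked) files.
LOWER_BOUND = 6.00
UPPER_BOUND = 6.85


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(data).values())


def in_packing_range(entropy: float) -> bool:
    """True if the entropy falls in [0, LOWER_BOUND) or at/above UPPER_BOUND."""
    return entropy < LOWER_BOUND or entropy >= UPPER_BOUND


if __name__ == "__main__":
    sample = bytes(range(256)) * 4  # uniformly distributed bytes
    e = shannon_entropy(sample)
    print(f"entropy = {e:.2f}, in Packing-Range: {in_packing_range(e)}")
```

Under these assumptions, the entropy of 7.93 reported for the UPX-packed file in Figure 4(a) lies above the upper bound and is therefore flagged, whereas a section with entropy 6.5 is not.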
Figure 4: Operation screenshots of the detection, unpacking, and verification phases. (a) Detection phase. (b) Unpacking phase (static unpacking). (c) Unpacking phase (dynamic unpacking). (d) Verification phase.

Figure 5: Entropy distributions of the EP section for original and representative packed files (x-axis: entropy of EP section, from 0 to 8; y-axis: number of files). (a) Average of original and 19 packers, plotted as two series: not packed (b) and summation of (c) to (m). (b) Original (not packed). (c) UPX. (d) ASPack. (e) NSPack. (f) MPRESS. (g) Yoda's Protector. (h) RLPack. (i) BeRoEXE. (j) MEW. (k) Packman. (l) WinUpack. (m) exe32pack.

In the second experiment, we evaluate the detection accuracy and the false-negative and false-positive rates. Table 5 compares the three detection techniques of the detection phase with the proposed hybrid approach. Since EP sections are mandatory in all PE files, we can detect EP sections unless they are intentionally concealed. All files of the dataset used in the experiment also have EP sections, and the detection rate by DP-NEP is trivially 0%. In other words, there is no packing detection case due to the absence of the EP section, so we exclude the experimental result of DP-NEP from Table 5.
We now explain the results of Table 5 column by column. First, we explain the DP-SIG column. In the case of DP-SIG, Not-Packed shows a detection rate of 0% since no signature exists in normal files. That is, no packing detection occurs in normal files, which means that DP-SIG performs an accurate detection for Not-Packed files. On the other hand, for the packed files, DP-SIG detects 100% of the files for all packers except NSPack. The detection rate of NSPack is merely 5.38% because NSPack signatures often exist in sections other than the EP section. In summary, DP-SIG shows a detection accuracy of 95.0% on average.
Second, for the experiment on the detection rate of DP-WR, we intentionally remove the WRITE attribute from 39 files (30% of the packed files) per packer. We perform this removal process for 18 packers, excluding exe32pack. The reason for changing the dataset is that normally packed files usually have WRITE attributes, so we arbitrarily remove the WRITE attribute from 30% of the files for the DP-WR experiment. Looking at the DP-WR column in Table 5, DP-WR detects only 70% of the files per packer for 18 packers since we intentionally remove the WRITE attribute from 30% of the files. In the case of exe32pack, however, we need not perform the removal process since it has no WRITE attribute in the PE header. Accordingly, as shown in the DP-WR column of Table 5, the detection rate of exe32pack by DP-WR is trivially 0%. In the case of Not-Packed, the detection rate is 0% since normal files have no WRITE attribute. In summary, checking the presence of the WRITE attribute is very simple, and DP-WR works correctly if the packed file has the WRITE attribute, but it does not work well if the WRITE attribute does not exist or is maliciously modified.
Third, the DP-ENT column shows the entropy-based detection rates. As shown in the table, DP-ENT shows a 100% detection rate for 9 packers, including NSPack, UPX, MEW, and BeRoEXEPacker, but it incurs some false negatives for 10 packers, including ASPack, Yoda's Protector, Packman, and MPRESS, for which the detection rate is lower than 100%. In particular, for MPRESS, Petite, JDpack, eXpressor, and Neolite, it shows a low detection rate of 24.6% or below because the entropy values produced by these packers fall within the entropy distribution of normal files. DP-ENT also has a false-positive rate of 3.84% for Not-Packed. In summary, DP-ENT works very accurately for most packers, but it may incur false negatives for a few specific packers or false positives for some normal files. To overcome this limitation, we use DP-ENT together with the other detection techniques. We can see the improvement in the experimental results of Tables 5 and 6. In the tables, when using only DP-ENT, the detection accuracy is low for some packers (especially MPRESS), but when using it together with the other three techniques, the accuracy is close to 100%.
Fourth, the proposed hybrid approach of integrating four techniques detects 100% of the files for all packers except NSPack and shows no false positives. The reason for this high detection rate is that the proposed method has all the advantages of DP-SIG, DP-WR, and DP-ENT as well as DP-NEP. In the case of NSPack, however, the detection rate is merely 70.0%; this is because, as mentioned above, we forcibly remove the WRITE attribute from 30% of the files. According to the results of Table 5, the detection rate differs according to each detection technique, and some techniques incur false positives or false negatives; on the other hand, the proposed hybrid approach shows the highest detection rate of 98.4% and no false positives.

Table 5: Comparison of packing detection rates of three techniques and the proposed hybrid approach.

Packer             DP-SIG (%)   DP-WR (%)   DP-ENT (%)   Hybrid approach (%)
ASPack             100          70.0        91.5         100
NSPack             5.38         70.0        100          70.0
MPRESS             100          70.0        24.6         100
UPX                100          70.0        100          100
Yoda's Protector   100          70.0        96.9         100
MEW                100          70.0        100          100
BeRoEXEPacker      100          70.0        100          100
Packman            100          70.0        93.8         100
RLPack             100          70.0        99.3         100
PECompact          100          70.0        100          100
Petite             100          70.0        0.00         100
JDpack             100          70.0        5.62         100
Molebox            100          70.0        100          100
eXpressor          100          70.0        0.00         100
Yoda's Crypter     100          70.0        91.5         100
FSG                100          70.0        100          100
exe32pack          100          0.00        100          100
WinUpack           100          70.0        100          100
Neolite            100          70.0        1.61         100
Average            95.0         66.3        73.9         98.4
Not-Packed         0.00         0.00        3.84         0.00
Table 6 shows the detection accuracy of combinations of two or three detection techniques and of the proposed hybrid approach. First, DP-SIG ENT, a combination of signature and entropy tests, detects 100% of the packed files but detects 18 files incorrectly, showing a false-positive rate of 0.73%. Second, DP-ENT WR, a combination of entropy and WRITE tests, shows a low detection rate of 92.0% because the entropy value of MPRESS is in the distribution of normal files. Third, DP-SIG ENT WR, a combination of signature, entropy, and WRITE tests, and our hybrid approach show 98.4% detection accuracy and a 1.60% false-negative rate. This is because DP-SIG ENT WR and the hybrid approach go through the same detection procedure except for DP-NEP. The false-negative rate of 1.60% is due to the files having no WRITE attributes. According to the results of Tables 5 and 6, we believe that the proposed hybrid approach achieves high detection accuracy by combining the advantages of the existing and new detection techniques.
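The column averages of Table 5 follow directly from the per-packer rates. The sketch below recomputes three of them from the figures stated in this section. It reads the NSPack rate under DP-SIG as 5.38%, which is an assumption about decimal placement, chosen because it reproduces the reported average; the function name is illustrative.

```python
def column_average(rates: list[float]) -> float:
    """Unweighted mean of per-packer detection rates (in %)."""
    return sum(rates) / len(rates)


# 19 packers per column, as described in the text.
dp_sig = [100.0] * 18 + [5.38]   # all packers detected except NSPack
dp_wr = [70.0] * 18 + [0.0]      # WRITE removed from 30% of files; exe32pack has none
hybrid = [100.0] * 18 + [70.0]   # all packers detected except NSPack

for name, rates in (("DP-SIG", dp_sig), ("DP-WR", dp_wr), ("Hybrid", hybrid)):
    print(f"{name}: {column_average(rates):.1f}%")
```

The three means come out at 95.0%, 66.3%, and 98.4%, matching the averages quoted for DP-SIG, DP-WR, and the hybrid approach.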
5.4. Experimental Results on Unpacking and Verification Phases. In this section, we evaluate the unpacking and verification phases by performing the unpacking of Algorithm 2 and calculating the restoration accuracy of the restored files with Algorithm 3. However, the unpacking and verification phases take a lot of time, so we experiment with 40 randomly selected files of the dataset. The unpacking phase consists of static unpacking and dynamic unpacking. In the experiment, we exclude static unpacking since it uses the existing library as it is, and we focus only on the dynamic unpacking of Algorithm 2 since it tries to restore the original file without any information about packers.
Table 7 shows the restoration accuracy of the proposed method for the .text and .data sections for each of the 19 packers. We cannot directly unpack Yoda's Protector and PECompact since they use Anti-Debug techniques. To overcome this problem, we unpack these packers by bypassing three Anti-Debugging API calls: GetCurrentProcessID(), BlockInput(), and IsDebuggerPresent() [41]. That is, we run Algorithm 2 with the bypassing techniques for Yoda's Protector and PECompact and include the experimental results in Table 7. In general, since only the .text and .data sections of the original file are packed [42], in the experiment we consider only the restoration accuracy of these .text and .data sections of the restored file. In Table 7, among 19 packers, 11 packers, including UPX, NSPack, MEW, RLPack, and BeRoEXEPacker, show 100% restoration accuracy for both .text and .data sections. On the other hand, 8 packers, including ASPack, Packman, and MPRESS, show less than 100% restoration accuracy because of the manner in which the packers handle the .reloc section. More specifically, if there is a .reloc section in the original file, the unpacking relocates the .text and .data sections, changing all the data values in each section. Therefore, the restoration accuracy depends on whether there is a .reloc section in the original file.
Table 8 shows the restoration results in detail according to the presence of the .reloc section. As shown in the table, the restoration accuracy is 100% in the absence of the .reloc section, while it is very low, in the range of 0.00% to 9.98%, in the presence of the .reloc section. However, the low restoration accuracy is due to the nature of the packer, and it does not mean that unpacking does not work correctly. In other words, even if the unpacking is successful, the data values may change due to the packer characteristics, and the restoration accuracy is calculated to be low. In summary, we can use the proposed verification algorithm to calculate the restoration accuracy of unpacked files, which quantitatively measures the accuracy of the actual restoration.
In addition to the .text, .data, and .reloc sections, we further consider the PE header, which includes the IAT information. In general, the IAT is located in the header of the PE file, and if we actually need to run the unpacked file, we must also restore this header information, including the IAT. In fact, the studies in [3, 43] restore the header including the IAT and produce the actual whole executable file. On the other hand, most studies, including this paper, do not restore the actual whole executable file itself but restore the .text, .data, and .reloc sections, which are major parts of the executable file. This is because the purpose of unpacking is not to execute the restored file but to identify and extract malicious code included in the restored file. Therefore, in this paper, we do not need to restore the header required for the actual execution, and accordingly a comparison of IAT information is also not necessary. Instead, we restore and compare the .text, .data, and .reloc sections, which are more likely to hide malicious code. Restoring the PE header is beyond the scope of this study; we leave it as a separate research topic.
Table 6: Comparison of packing detection accuracy of two or three combinations and of the hybrid approach.

Metrics               DP-SIG ENT (%)   DP-ENT WR (%)   DP-SIG ENT WR (%)   Hybrid approach (%)
Detection accuracy    95.0             92.0            98.4                98.4
False-negative rate   5.00             8.00            1.60                1.60
False-positive rate   0.73             0.00            0.00                0.00

Table 7: Restoration accuracy of .text and .data sections of each packer.

Packer             .text (%)   .data (%)
UPX                100         100
NSPack             100         100
MEW                100         100
RLPack             100         100
BeRoEXE            100         100
Neolite            100         100
FSG                100         100
eXpressor          100         100
Molebox            100         100
Petite             100         100
JDpack             100         100
ASPack             84.3        84.3
Yoda's Protector   84.0        84.0
exe32pack          80.8        100
MPRESS             78.8        75.3
Yoda's Crypter     74.5        74.5
PECompact          74.0        74.0
WinUpack           68.0        68.0
Packman            55.0        48.5
Average            89.2        90.2

6. Conclusions

In this paper, we analyzed recent studies of packing detection and unpacking techniques and derived four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. We then proposed an all-in-one unpacking system to address these four drawbacks. First, no phase integration meant that analysts had to perform the packing detection, unpacking, and verification phases separately, and we solved this problem by presenting an all-in-one unpacking system that integrates these three phases in a single framework. Second, no detection combination meant that there had been no attempt to combine various packing detection methods, and we solved this problem by presenting a hybrid approach that combines existing and new packing detection methods. Third, no real-restoration meant that there had been little discussion about restoring the actual executable file, and we solved this problem by actually restoring an original image of the packed file. Fourth, no unpacking verification meant that there had been no explicit work to measure the restoration accuracy of unpacked files, and we solved this problem by proposing a verification algorithm that quantitatively evaluates the restoration accuracy.
We constructed a dataset of 2,600 PE files and experimentally evaluated the proposed system using the dataset. First, we performed all the phases of the all-in-one unpacking system sequentially and confirmed that its overall working mechanism worked well. Next, through the comparison of the hybrid packing detection approach and existing detection techniques, we showed that the detection accuracy of the proposed method was the highest at 98.4% on average with no false positives. For this, we also determined an entropy range of packing, Packing-Range, through extensive experiments on 19 representative packers. Finally, in the unpacking and verification phases, we confirmed that unpacking was successful for all files except the files packed with Yoda's Protector. We also showed that the restoration accuracy of unpacked files was nearly 100% except for a few packers.
As future research, we will focus on unpacking files packed with Anti-VM, Anti-Debug, and Anti-Reverse engineering techniques. We will also study how to compute the restoration accuracy of files having .reloc sections.
Data Availability
All the data files used in the experiments are available at https://github.com/chesvectain/PackingData.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partly supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (no. 2017-0-00158, Development of Cyber Threat Intelligence (CTI) analysis and information sharing technology for national cyber incident response) and the Korea Electric Power Corporation (no. R18XA05).
Table 8: Restoration accuracy according to the presence of the .reloc section.

Packer             .text, with .reloc (%)   .text, without .reloc (%)   .data, with .reloc (%)   .data, without .reloc (%)
ASPack             0.00                     100                         0.00                     100
Packman            9.98                     100                         8.81                     100
MPRESS             0.00                     100                         0.00                     100
Yoda's Crypter     0.00                     100                         0.04                     100
PECompact          0.00                     100                         0.00                     100
Yoda's Protector   0.00                     100                         0.05                     100
exe32pack          0.00                     100                         100                      100
WinUpack           0.00                     100                         0.00                     100

References

[1] M. Morgenstern, "AV-TEST Security report 2017/18," 2019, https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2017-2018.pdf.
[2] T. Ban, R. Isawa, S. Guo, D. Inoue, and K. Nakao, "Efficient malware packer identification using support vector machines with spectrum kernel," in Proceedings of the Eighth Asia Joint Conference on Information Security, pp. 69–76, Seoul, South Korea, July 2013.
[3] B. Cheng, M. Jiang, J. Fu et al., "Towards paving the way for large-scale Windows malware analysis: generic binary unpacking with orders-of-magnitude performance boost," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 395–411, Toronto, Canada, October 2018.
[4] A. Gupta and M. S. Arya, "Hashing based encryption and anti-debugger support for packing multiple files into single executable," International Journal of Advanced Research in Computer Science, vol. 9, no. 1, pp. 914–920, 2018.
[5] Y. Kawakoya, M. Iwamura, and M. Itoh, "Memory behavior-based automatic malware unpacking in stealth debugging environment," in Proceedings of the 5th International Conference on Malicious and Unwanted Software, pp. 39–46, Lorraine, France, October 2010.
[6] G. Liang, J. Pang, Z. Shan, R. Yang, and Y. Chen, "Automatic benchmark generation framework for malware detection," Security and Communication Networks, vol. 2018, Article ID 4947695, 8 pages, 2018.
[7] A. Pektas and T. Acarman, "A dynamic malware analyzer against virtual machine aware malicious software," Security and Communication Networks, vol. 7, no. 12, pp. 2245–2257, 2014.
[8] I. Santos, J. Nieves, and P. G. Bringas, "Semi-supervised learning for unknown malware detection," in International Symposium on Distributed Computing and Artificial Intelligence, A. Abraham, J. M. Corchado, S. R. Gonzalez, and J. F. De Paz Santana, Eds., pp. 415–422, Springer, Berlin, Germany, 2011.
[9] X. Ugarte-Pedrero, I. Santos, B. Sanz, C. Laorden, and P. G. Bringas, "Countering entropy measure attacks on packed software detection," in Proceedings of the 9th International Conference on Consumer Communications and Networking, pp. 164–168, Las Vegas, NV, USA, January 2012.
[10] H. Won, S.-P. Kim, S. Lee, M.-J. Choi, and Y.-S. Moon, "Secure principal component analysis in multiple distributed nodes," Security and Communication Networks, vol. 9, no. 14, pp. 2348–2358, 2016.
[11] S. Cesare, Y. Xiang, and W. Zhou, "Malwise–an effective and efficient classification system for packed and polymorphic malware," IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193–1206, 2013.
[12] C. Malin, E. Casey, and J. Aquilina, Malware Forensics: Investigating and Analyzing Malicious Code, Syngress, Burlington, MA, USA, 2008.
[13] W. Yan, Z. Zheng, and A. Nirwan, "Revealing packed malware," IEEE Security and Privacy, vol. 6, no. 5, pp. 65–69, 2008.
[14] M. Bat-Erdene, T. Kim, H. Park, and H. Lee, "Packer detection for multi-layer executables using entropy analysis," Entropy, vol. 19, no. 3, p. 125, 2017.
[15] Y.-S. Choi, I.-K. Kim, J.-T. Oh, and J.-C. Ryou, "PE file header analysis-based packed PE file detection technique (PHAD)," in Proceedings of International Symposium on Computer Science and Its Applications, pp. 28–31, Hobart, Australia, October 2008.
[16] S. Han and S. Lee, "Packed PE file detection for malware forensics," The KIPS Transactions: Part C, vol. 16C, no. 5, pp. 555–562, 2009, in Korean.
[17] R. Lyda and J. Hamrock, "Using entropy analysis to find encrypted and packed malware," IEEE Security and Privacy, vol. 5, no. 2, pp. 40–45, 2007.
[18] R. Perdisci, A. Lanzi, and W. Lee, "Classification of packed executables for accurate computer virus detection," Pattern Recognition Letters, vol. 29, no. 14, pp. 1941–1946, 2008.
[19] A. Rohit, A. Singh, H. Pareek, and U. Edara, "A heuristics-based static analysis approach for detecting packed PE binaries," International Journal of Security and Its Applications, vol. 7, no. 5, pp. 257–268, 2013.
[20] G. Amato, "PEframe," 2019, https://github.com/guelfoweb/peframe.
[21] G. Jeong, E. Choo, J. Lee, M. Bat-Erdene, and H. Lee, "Generic unpacking using entropy analysis," Journal of Advanced Information Technology and Convergence, vol. 7, no. 1, pp. 232–238, 2009.
[22] V. DeMarines, "Obfuscation–how to do it and how to crack it," Network Security, vol. 2008, no. 7, pp. 4–7, 2008.
[23] H. Neil, "PEiD," 2019, https://github.com/wolfram77web/app-peid.
[24] O. Yuschuk, "OllyDbg," 2019, http://www.ollydbg.de/download.html.
[25] N. Waisman, "Immunity Debugger," 2019, https://www.immunityinc.com/products/debugger/index.html.
[26] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee, "PolyUnpack: automating the hidden-code extraction of unpack-executing malware," in Proceedings of the 22nd Annual Computer Security Applications Conference, pp. 289–300, Miami Beach, FL, USA, December 2006.
[27] M. Bat-Erdene, Y. Park, H. Li, H. Lee, and M.-S. Choi, "Entropy analysis to classify unknown packing algorithms for malware detection," International Journal of Information Security, vol. 16, no. 3, pp. 227–248, 2017.
[28] S. Cesare and X. Yang, "Classification of malware using structured control flow," in Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing, pp. 61–70, Brisbane, Australia, January 2010.
[29] L. Martignoni, M. Christodorescu, and S. Jha, "OmniUnpack: fast, generic, and safe unpacking of malware," in Proceedings of the 23rd Annual Computer Security Applications Conference, pp. 431–441, Miami Beach, FL, USA, December 2007.
[30] M. G. Kang, P. Poosankam, and H. Yin, "Renovo: a hidden code extractor for packed executables," in Proceedings of the 5th ACM Workshop on Recurring Malcode, pp. 46–53, Alexandria, VA, USA, November 2007.
[31] S. Mariani, L. Fontana, F. Gritti, and S. D'Alessio, "PinDemonium: a DBI-based generic unpacker for windows executables," in Proceedings of the Black Hat 2016, Las Vegas, NV, USA, July 2016.
[32] G. Bonfante, J. Fernandez, J.-Y. Marion, B. Rouxel, F. Sabatier, and A. Thierry, "CoDisasm: medium scale concatic disassembly of self-modifying binaries with overlapping instructions," in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 745–756, Denver, CO, USA, October 2016.
[33] M. Oberhumer, L. Molnar, and J. Reiser, "UPX," 2019, https://upx.github.io.
[34] A. Swinnen and A. Mesbahi, "One packer to rule them all: empirical identification, comparison and circumvention of current antivirus detection techniques," in Proceedings of the Black Hat 2014, Las Vegas, NV, USA, August 2014.
[35] P. Bania, "Generic unpacking of self-modifying, aggressive, packed binary programs," 2009, https://arxiv.org/abs/0905.4581.
[36] H. Neil, "BobSoft Signature," 2019, http://woodmann.com/BobSoft/Files/Other/UserDB.zip.
[37] M. Russinovich, "Microsoft Sysinternals Suite," 2019, https://download.sysinternals.com/files/SysinternalsSuite.zip.
[38] R. Rivest, "The MD5 message-digest algorithm," RFC 1321, IETF, Fremont, CA, USA, 1992.
[39] S. Tatham, "Putty," 2019, https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html.
[40] L. Pascoe, "MD5," 2019, http://www.md5summer.org/about.html.
[41] P. Ferrie, "Anti-unpacker tricks - part one," 2019, https://www.virusbulletin.com/virusbulletin/2008/12/anti-unpacker-tricks-part-one.
[42] T.-Y. Wang and C.-H. Wu, "Detection of packed executables using support vector machines," in Proceedings of the 10th International Conference on Machine Learning and Cybernetics, pp. 717–722, Guilin, China, July 2011.
[43] D. Korczynski, "RePEconstruct: reconstructing binaries with self-modifying code and import address table destruction," in Proceedings of the 11th International Conference on Malicious and Unwanted Software, pp. 31–38, Fajardo, Puerto Rico, October 2016.
$$H(x) = -\sum_{i=1}^{n} p(i)\,\log_2 p(i) \qquad (1)$$
The entropy-based method has the advantage of being easily applied to various packers. However, it causes a false positive if the entropy of a normal file is low or a false negative if that of a packed file is high.
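As an illustration of equation (1), the following minimal Python sketch computes the entropy of a byte string, taking p(i) as the relative frequency of each byte value. The function name `shannon_entropy` is an assumption made for this example and does not come from the paper.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy of equation (1), with p(i) the relative frequency of byte value i."""
    if not data:
        return 0.0  # assumption: an empty buffer is treated as zero entropy
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value present in data
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

For byte data the value lies between 0 (a single repeated byte) and 8 (all 256 byte values equally frequent), which is why compressed or encrypted regions tend toward the upper end of the scale.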
Han and Lee [16] propose REMINDer, which detects packing based on the entropy value of the entry point section and the WRITE attribute. They note that a large number of false positives occur because the entropy calculation range is too wide, and thus REMINDer uses only the EP (entry point) section in computing the entropy value. In addition to the entropy-based detection of the EP section, they also use the WRITE attribute test, which is an essential feature of a packed file, to reduce false positives. This is because, unlike ordinary files, packed PE files require WRITE permission to perform unpacking before the file is executed.
Arora, proposed by Rohit et al. [19], is a packing detection technique that analyzes the PE information in a packed file using a heuristic algorithm. It defines various parameters for packing analysis and presents a heuristic algorithm for assigning weights and risk factors to those parameters. Based on the training set to which weights and risk factors are assigned, it determines whether or not a given input file is packed.
2.2. Unpacking Techniques. Unpacking techniques are roughly classified into four types. The first is a direct analysis method in which a person unpacks directly using an analysis tool. This method was popular in the early days, and typical analysis tools include OllyDbg PlugIn [24], Immunity Debugger [25], and IDA PlugIn [26]. This manual unpacking is relatively accurate, but it takes too much time to manually analyze all the instructions of the packed file.
The second is a static unpacking method based on the characteristics of the specific packing algorithm(s). Since this method is based on the characteristics of the packing algorithm, it is very useful when the algorithm is known in advance. Static unpacking is commonly used in antivirus programs, and it has the advantages of no infection risk and fast unpacking since the packed file does not need to be executed directly. However, it has the disadvantage that we cannot use it if we do not know the packing algorithm used or if the packing technique is partially modified.
The third is a dynamic unpacking method that does not depend on the packing algorithm(s). There have been many studies on dynamic unpacking since we can unpack even without knowing the exact packing algorithm used. Jeong et al. [21] propose an entropy-based dynamic unpacking method that finds the OEP (original entry point) based on the characteristic that the entropy increases if the file is packed. Bat-Erdene et al. [27] classify unpacking algorithms into multiple clusters by measuring the entropy changes during the unpacking process. Cesare and Yang [28] propose an algorithm for constructing a control flow graph signature using an inverse transformation technique. They perform the entropy analysis first to determine whether or not the PE file is packed, and if it is packed, they find the hidden code by investigating the end of packing through dynamic analysis. Moreover, recent dynamic methods include OmniUnpack [29], Renovo [30], and PinDemonium [31]. OmniUnpack unpacks the PE file by detecting the executing offset of the original code at the page level instead of at the instruction level. Renovo exploits shadow memory to monitor program execution and memory writes, and through the shadow memory, it extracts the hidden code from the executable. Finally, PinDemonium uses the Dynamic Binary Instrumentation (DBI) technique to perform Import Address Table (IAT) analysis, JUMP instruction detection, and entropy calculation, and through these processes, it unpacks the PE file.
Table 1: Observations and solutions for the proposed all-in-one unpacking system.

| Observation | Explanation | Solution |
| No phase integration | Unpacking-related three phases are separately developed | Adopt an all-in-one approach integrating all three phases |
| No detection combination | There is no attempt to combine various existing methods for packing detection | Combine four packing detection methods to improve detection accuracy |
| No real-restoration | Main goal is to find OEP | Restore unpacked files by performing actual unpacking as well as finding OEP |
| No unpacking verification | There is no quantitative way to verify the restoration accuracy | Present a verification algorithm to evaluate the accuracy of unpacking results |
Table 2: Comparison of analytical phases supported by existing and proposed systems.
Systems compared: PHAD [15], REMINDer [16], PEframe [20], [11, 13, 21], All-in-one system
Detection: × ×
Unpacking (Static): × ×
Unpacking (Dynamic): × ×
Verification: × × ×
The fourth is an integration method of both static and dynamic unpacking techniques. Representative examples of such integration methods are PolyUnpack [26], CoDisasm [32], and BinUnpack [3]. PolyUnpack and CoDisasm first extract the static model through static analysis before executing the file. Then, they unpack the PE file by comparing the extracted static model to the dynamic model obtained by running the executable. PolyUnpack unpacks the file by iteratively comparing the two models, and CoDisasm does it by comparing memory snapshots repeatedly. Finally, BinUnpack first performs static analysis by developing a hook-evasion-resistant API monitor, and it then performs dynamic analysis by monitoring API calls with the analyzed information.
3. Overall Architecture of All-in-One Unpacking System
As we mentioned in Section 1, we note four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. In this section, we describe the overall architecture of the all-in-one unpacking system that addresses these four observations. Figure 1 shows a flow diagram of the proposed all-in-one system. As shown in the figure, we resolve no phase integration by supporting all three phases of detection, unpacking, and verification in an integrated framework. In the diagram, for a given input PE file (①), the system first determines whether or not it is packed (②). If the file is not packed, it proceeds to check the next PE file. If it is packed, the system checks whether or not there is an unpacking library for the packer used (③). If the unpacking library exists, the system performs static unpacking using the library (④); otherwise, the system performs dynamic unpacking, which does not depend on the packing algorithm (⑤). After completing the unpacking, the system calculates the restoration accuracy to verify the correctness of unpacking (⑥).
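The control flow of steps ① to ⑥ can be sketched in Python as follows. This is only an outline under stated assumptions: `detect`, `static_unpackers`, `dynamic_unpack`, and `verify` are hypothetical callables standing in for the three phases, and none of these names comes from the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Result:
    packed: bool
    method: Optional[str] = None            # "static" or "dynamic"
    accuracy: Optional[List[float]] = None  # per-section restoration accuracy


def run_pipeline(
    pe: bytes,                                                 # step 1: input PE file
    detect: Callable[[bytes], Tuple[bool, Optional[str]]],     # hypothetical detector
    static_unpackers: Dict[str, Callable[[bytes], bytes]],     # packer name -> library
    dynamic_unpack: Callable[[bytes], bytes],                  # hypothetical fallback
    verify: Callable[[bytes], List[float]],                    # hypothetical verifier
) -> Result:
    is_packed, packer = detect(pe)                 # step 2: packing detection
    if not is_packed:
        return Result(packed=False)                # caller moves on to the next file
    unpacker = static_unpackers.get(packer) if packer else None  # step 3
    if unpacker is not None:
        restored, method = unpacker(pe), "static"           # step 4
    else:
        restored, method = dynamic_unpack(pe), "dynamic"    # step 5
    return Result(True, method, verify(restored))           # step 6
```

Passing the phases in as callables keeps the sketch independent of any particular detector or debugger, which mirrors the integrated but modular structure shown in Figure 1.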
We use existing and proposed techniques together to implement each phase of the all-in-one unpacking system. Table 3 summarizes the existing or proposed methods used in each phase. First, in the detection phase, we use four techniques to detect the packing of the input PE file. This hybrid approach resolves no detection combination. Among the four techniques, the EP section test is a new one proposed in this paper, which increases the packing detection rate by checking whether the EP section explicitly exists; we describe it in detail in Section 4.1. Second, in the unpacking phase, we use either static unpacking or dynamic unpacking depending on whether or not an unpacking library exists. This unpacking phase restores the actual file and resolves no real-restoration. Third, in the verification phase, we verify the correctness of unpacking by computing the restoration accuracy. We calculate the restoration accuracy by comparing the bytecodes of the original and restored files. This phase resolves no unpacking verification, and we describe this verification algorithm in detail in Section 4.3.
4. Detection Mechanism of All-in-One Unpacking System
4.1. Detection Phase. As we explained in Section 3, while there have been various packing detection techniques, there has been no attempt to combine the advantages of these techniques. Since they have been studied separately, each technique carries out packing detection focusing on its own characteristics only. In this paper, we propose a hybrid approach to improve the packing detection accuracy by combining four different techniques.
The detection phase consists of four tests: EP section, signature, WRITE attribute, and entropy tests. In this paper, we refer to these techniques as Detection Phase-No EP Section (DP-NEP), Detection Phase-Signature (DP-SIG), Detection Phase-WRITE (DP-WR), and Detection Phase-Entropy (DP-ENT), respectively. Table 4 compares the detection techniques used in the existing work and the proposed approach. As shown in the table, PEiD supports only DP-SIG, [17] supports only DP-ENT, and [16] supports DP-WR and DP-ENT, whereas the proposed method supports all four techniques.
Algorithm 1 presents the proposed hybrid approach of packing detection. It takes a PE file as input and determines whether the file is packed or not. Since DP-SIG, DP-WR, and DP-ENT work on the EP section of a PE file, we first check DP-NEP to confirm that the EP section exists. If the EP section does not exist, we determine that the file is packed (Line 1). On the contrary, if the EP section exists, we perform DP-SIG to check whether or not the PE file has a packer signature. If the signature exists, i.e., if a packer is used, we determine that the file is packed (Line 2). We then perform DP-WR and DP-ENT, where DP-WR checks whether or not the WRITE attribute exists in the EP section, and DP-ENT investigates whether or not the entropy of the EP section is in a specific range, Packing-Range. For example, this can be under 60 or over 068 in this paper. If both conditions hold, we determine that the file is packed and return Packed (Lines 3 and 4). Finally, if none of the DP-NEP, DP-SIG, and combined DP-WR and DP-ENT conditions is satisfied, we return Not-Packed (Line 5).
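The decision order of Algorithm 1 can be sketched in Python as below. This is a hedged illustration rather than the authors' code: the four inputs are assumed to have been computed already by the individual tests, and `in_packing_range` is a hypothetical predicate standing in for Packing-Range, whose actual bounds are set experimentally in Section 5.3.

```python
from typing import Callable


def detect_packing(
    ep_section_found: bool,      # result of the DP-NEP lookup
    signature_found: bool,       # result of the DP-SIG lookup
    ep_writable: bool,           # DP-WR: WRITE attribute set on the EP section
    ep_entropy: float,           # DP-ENT: entropy of the EP section
    in_packing_range: Callable[[float], bool],  # hypothetical Packing-Range test
) -> str:
    if not ep_section_found:     # Line 1: no EP section means packed
        return "Packed"
    if signature_found:          # Line 2: known packer signature
        return "Packed"
    if ep_writable and in_packing_range(ep_entropy):  # Lines 3 and 4
        return "Packed"
    return "Not-Packed"          # Line 5
```

Note that DP-WR and DP-ENT are combined with a logical AND, so a writable EP section alone, or an unusual entropy alone, does not mark the file as packed.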
We now explain the four packing detection cases of Algorithm 1 in detail. First, DP-NEP looks for the EP section of the input PE file. We conclude that the file is packed if it has no EP section. This is because every PE file has an EP section, so a missing EP section means that someone has hidden the section intentionally. We can check the existence of the EP section by investigating the EP section address in the header of the PE file. According to actual experimental results, we could not find the EP section in some PE files packed by packers such as Upack [34] and PESpin [35]. This is because such packers intentionally hide or scramble the EP section. Obviously, we need to detect these PE files as packed in the detection phase. Algorithm 1 detects such a “No EP Section” PE file as the DP-NEP case, which improves the packing detection rate, especially for the Upack and PESpin packers. In the case of DP-NEP, however, we cannot find the EP section itself, and accordingly we cannot proceed to the next phases of unpacking and verification. More precisely, unpacking files packed by such DP-NEP packers might be possible if we manually modify the raw hexadecimal codes by using a binary editor. However, this manual raw data modification is beyond the scope of the paper, and we do not consider it. In summary, we use the concept of “No EP Section” in the detection phase of Algorithm 1 to increase the packing detection rate, but we do not consider it in the unpacking and verification phases.
Second, in DP-SIG, we investigate whether or not the signature extracted from the EP section is in the signature database. We construct the signature database with a total of 7,000 signatures consisting of 4,200 signatures provided by PEiD [23] and 2,800 signatures collected from BobSoft [36]. Figure 2 shows three sample signatures of the packers stored in the database. For each packer, the first line gives the type and version of the packer and the maker of the signature, the second line gives the hexadecimal values of the packer’s signature, and the third line, ep_only, is true or false, indicating whether the signature can be found in the EP section.
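A DP-SIG lookup over entries in the format of Figure 2 might look like the following Python sketch. It makes several assumptions that are not taken from the paper: signatures are plain hexadecimal bytes without wildcard tokens, only entries with `ep_only = true` are considered, and a match means the bytes at the entry point start with the signature. The helper names `parse_signatures` and `match_signature` are hypothetical.

```python
from typing import Dict, List, Optional


def parse_signatures(text: str) -> List[Dict]:
    """Parse '[name]' / 'signature = ..' / 'ep_only = ..' entries into dicts."""
    entries: List[Dict] = []
    current: Optional[Dict] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            current = {"name": line[1:-1]}
            entries.append(current)
        elif "=" in line and current is not None:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "signature":
                # assumption: no wildcard tokens, so every token is a hex byte
                current["signature"] = bytes.fromhex(value)
            elif key == "ep_only":
                current["ep_only"] = value.lower() == "true"
    return entries


def match_signature(ep_bytes: bytes, entries: List[Dict]) -> Optional[str]:
    """Return the name of the first ep_only signature found at the entry point."""
    for entry in entries:
        sig = entry.get("signature")
        if sig and entry.get("ep_only") and ep_bytes.startswith(sig):
            return entry["name"]
    return None
```

With 7,000 signatures a linear scan is still cheap per file; indexing by the first few signature bytes would be a natural optimization if the database grew much larger.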
Third, in DP-WR, we check the WRITE attribute of the EP section. In the process of unpacking a packed file, the system needs to modify the memory, so the packed file necessarily requires the WRITE attribute. Thus, we suspect that the file is packed if the WRITE attribute exists. However, since a normal file can also have the WRITE attribute, we cannot determine packing only by the existence of the WRITE attribute. Therefore, as proposed in [16], we detect packing using the WRITE attribute and the entropy value together.
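In the PE format, the WRITE attribute is one bit of the 32-bit Characteristics field of a section header. The Python sketch below tests that bit and decodes a few related flags; the constants follow the PE/COFF section flag values, while the function names are assumptions made for this example.

```python
# Section characteristics flags as defined for PE/COFF section headers.
SECTION_FLAGS = {
    0x00000020: "CNT_CODE",
    0x00000040: "CNT_INITIALIZED_DATA",
    0x00000080: "CNT_UNINITIALIZED_DATA",
    0x20000000: "MEM_EXECUTE",
    0x40000000: "MEM_READ",
    0x80000000: "MEM_WRITE",
}
IMAGE_SCN_MEM_WRITE = 0x80000000


def has_write_attribute(characteristics: int) -> bool:
    """DP-WR test: is the WRITE bit set on the section?"""
    return bool(characteristics & IMAGE_SCN_MEM_WRITE)


def decode_characteristics(characteristics: int) -> list:
    """List the names of the known flags set in the characteristics value."""
    return [name for bit, name in SECTION_FLAGS.items() if characteristics & bit]
```

For instance, the attribute value 0xe0000040 reported for the UPX sample in Section 5.2 has the WRITE bit set together with READ and EXECUTE, whereas a typical code section value such as 0x60000020 does not.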
Fourth, in DP-ENT, we compute the entropy of the input PE file using equation (1) and check whether that entropy is in a specific range (Packing-Range in Algorithm 1). This is based on the previous observation [16, 17] that the entropy of a packed file differs from that of an ordinary file. We compute the entropy value for the EP section only, not the entire file. To use DP-ENT, we need to set Packing-Range, which is used for determining whether or not a file is packed. In this paper, we set this range through the experiment in Section 5.3.
4.2. Unpacking Phase. As mentioned in Section 2.2, unpacking that is performed automatically without human intervention can be divided into static unpacking and dynamic unpacking. Static unpacking is based on the characteristics of the packing algorithm, so it is fast and has no risk of infection because the packed file is not executed directly. On the contrary, dynamic unpacking does not depend on a specific packing algorithm, even though it takes a long analysis time. In this paper, we use both static and dynamic methods to maximize the unpacking accuracy. More specifically, if the packer type found in the detection phase provides an unpacking library, we use the static unpacking; otherwise, we perform the
[Figure 1 flow diagram. Detection phase: ① Input PE file, ② Packing detection (Yes/No). Unpacking phase: ③ Unpacking library (Yes/No), ④ Static unpacking, ⑤ Dynamic unpacking. Verification phase: ⑥ Verification.]
Figure 1: Overall working mechanism of the all-in-one unpacking system.
Table 3: Existing or proposed techniques used in each phase of the all-in-one unpacking system.

| Analytical phase | Techniques used or proposed |
| Detection | (i) EP Section test, to be proposed in Section 4.1; (ii) Signature test [23]; (iii) WRITE attribute test [16]; (iv) Entropy test [17] |
| Unpacking | (i) Static: library-based unpacking [33]; (ii) Dynamic: entropy change-based unpacking [14, 27] |
| Verification | Verification algorithm, to be proposed in Section 4.3 |

Table 4: Detection techniques used in existing and proposed approaches.

| Method | DP-NEP | DP-SIG | DP-WR | DP-ENT |
| PEiD [23] | × | ✓ | × | × |
| [17] | × | × | × | ✓ |
| [16] | × | × | ✓ | ✓ |
| Proposed approach | ✓ | ✓ | ✓ | ✓ |
dynamic unpacking. Static unpacking is straightforward once we have the unpacking library for each packer, so we focus on explaining how the dynamic unpacking works in detail in this section. Our dynamic unpacking method is also known as the “execute-after-write” method. More precisely, the proposed method first restores the original file into memory by finding the OEP from the packed file and then analyzes (i.e., executes) the restored original file in memory. In order to avoid infecting real machines, we perform the unpacking and verification phases in the debugging mode.
The OEP is the address of the first instruction to execute after the packed file is unpacked, and it represents the entry point of the original program. In other words, since the code following the OEP is the starting code of the original program, finding the OEP is the most important issue in dynamic unpacking. To find the OEP, we use the entropy change-based analysis proposed by Bat-Erdene et al. [27]. When a packed file is executed, the packed data is uncompressed and written to memory, and at this time, data might be changed or added. Because of this data change, the entropy value continuously changes during the unpacking process. If unpacking is successful, the entropy value stabilizes, the original file is restored, and the IP (instruction pointer) moves to another section of the restored original file for its execution. Thus, we can find the OEP by examining whether the IP moves to another section when there is no change in the entropy value of each section. (If the heuristic of investigating the change of IP sections is known, someone may create a malicious packing method that evades it. However, for this evasion, the OEP must be reached without a change of IP sections, and packing a PE file in such a way might not be easy. In our actual analysis, we did not find such cases among the 19 packers we covered. Even though such a case hardly exists, it might occur in the worst case. We leave an advanced approach that works against such evasion as a further study.) To accomplish this, we need to execute the target process instruction by instruction and monitor the entropy change; however, measuring the entropy value after every instruction is very time-consuming and inefficient. We note that the OEP generally follows a branch instruction, so we calculate the entropy value only when the instruction is JMP, Conditional JMP (CJMP), or RETN. We also note that loop iterations during program execution are independent of the OEP, so we store and reuse the entropy values measured after JMP, CJMP, and RETN instructions to avoid duplicated calculations caused by the iterations and to reduce the total analysis time.
Algorithm 2 shows the proposed dynamic unpacking algorithm based on the entropy changes; it returns the OEP address for a given PE file. First, it computes the initial entropy of each section (Line 1). Then, it repeats the following steps until the process ends (Lines 2 to 18). If the current IP address is greater than the last address of the current section, it returns Not-Found (Line 4). This is because the IP has moved to the next section without a JMP family instruction, so the OEP cannot be found in this section. On the contrary, if the instruction at IP is one of JMP, CJMP, or RETN, it executes the file up to IP and stores the address of the next instruction in DstAddr (Lines 5 to 7). If IP exists in IP-History or DstAddr exists in DstAddr-History, it means a repeated instruction or address, so we move to the next IP (Line 8). Otherwise, it stores IP and DstAddr into each history for the next iteration (Line 9). At this time, if DstAddr is out of the range of the file, it returns Not-Found because it cannot find the OEP within the file (Line 10). If DstAddr is in the normal range, it calculates the entropy of each section again and compares the current entropy with the previous entropy to check whether it is stable (Lines 11 to 13). If the entropy is stable and DstAddr is in a different section, it means that unpacking the file is successful and the IP is moved to another section by a JMP family instruction. Thus, it
[Code-Lock vxx]
signature = 47 8B C2 05 1E 00 52 8B D0 B8 02 3D CD 21 8B D8 5A
ep_only = true
[CodeCrypt v014b]
signature = E9 C5 02 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F
ep_only = true
[CodeCrypt v015b]
signature = E9 31 03 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F
ep_only = true
Figure 2: Examples of packer signatures stored in the signature database.
Input: A PE file
Output: Packed or Not-Packed // Packing detection result
begin
(1) if No EP Section then return Packed // DP-NEP
(2) else if Packer Signature Found then return Packed // DP-SIG
(3) else if “WRITE” enabled and EP Section Entropy ∈ Packing-Range then
(4)     return Packed // DP-WR, DP-ENT
(5) else return Not-Packed
(6) end-if
end
ALGORITHM 1: Packing detection.
stops the execution and returns DstAddr as the OEP (Lines 13 and 14).
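The control logic of Algorithm 2 can be exercised without a debugger by replaying a recorded trace. The Python sketch below is such a simulation under explicit assumptions: `Step`, `section_of`, and `find_oep` are hypothetical names, each trace step carries the per-section entropy that a real run would measure at that point, the "current section" is taken to be the section containing the first traced instruction, and "stable" is modeled as a change of at most `eps` in every section.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

BRANCHES = {"JMP", "CJMP", "RETN"}


@dataclass
class Step:
    ip: int                    # address of the traced instruction
    mnemonic: str              # e.g. "JMP", "CJMP", "RETN", "MOV"
    dst: int                   # address executed next (branch target)
    entropy: Dict[str, float]  # per-section entropy observed at this step


def section_of(addr: int, sections: Dict[str, Tuple[int, int]]) -> Optional[str]:
    for name, (start, end) in sections.items():
        if start <= addr <= end:
            return name
    return None


def find_oep(trace: Iterable[Step], sections: Dict[str, Tuple[int, int]],
             initial_entropy: Dict[str, float], eps: float = 1e-3) -> Optional[int]:
    prev = dict(initial_entropy)                  # Line 1
    seen_ip, seen_dst = set(), set()
    current = None
    for step in trace:                            # Lines 2 and 3
        if current is None:
            current = section_of(step.ip, sections)
            if current is None:
                return None
        if step.ip > sections[current][1]:        # Line 4: ran past the section
            return None
        if step.mnemonic not in BRANCHES:         # Line 5: only branch instructions
            continue
        if step.ip in seen_ip or step.dst in seen_dst:   # Line 8: loop iteration
            continue
        seen_ip.add(step.ip)                      # Line 9
        seen_dst.add(step.dst)
        target = section_of(step.dst, sections)
        if target is None:                        # Line 10: target outside the file
            return None
        stable = all(abs(step.entropy[s] - prev[s]) <= eps for s in prev)  # Line 13
        prev = dict(step.entropy)                 # Line 12: keep latest measurement
        if stable and target != current:
            return step.dst                       # Line 14: OEP found
    return None                                   # Line 19
```

In an actual implementation the `entropy` field would be replaced by a measurement of process memory at each branch, which is the expensive step that the history sets are meant to avoid repeating.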
4.3. Verification Phase. In this section, we present an algorithm for measuring the restoration accuracy by comparing the restored file with its original file. To execute the packed file, the proposed system uses dynamic unpacking and finds the OEP during its execution. If the system finds the OEP, we assume that the restoration of the original file is successful and regard the resulting file as the restored file. In the verification phase, we quantitatively measure the restoration accuracy by comparing this restored file with its original file.
Figure 3 shows an example of comparing some bytecodes of the original file with those of the restored file. For the comparison, we first pack the pspasswd.exe file provided by Microsoft [37] and then unpack it again by dynamic unpacking. The figure shows that the bytecodes of the four sections of the original file, .text, .rdata, .data, and .rsrc, appear identically in the UPX0, UPX1, and .rsrc sections of the restored file. That is, even if the name and location of each section are different, the same bytecodes exist in the restored file. However, the restored file contains padding and garbage values, and some bytecodes may be at locations different from those in the original file. Thus, in order to compare the original and the restored files, we need to search for where the original bytecodes are located in the restored file. Since manually comparing the bytecodes of the original and restored files takes a very long time, in this paper, we propose an efficient search algorithm using a hash function. We use the well-known MD5 [38] as the hash function of the search algorithm. As a message-digest algorithm, MD5 is widely used in checking the integrity of long data.
Algorithm 3 shows the restoration accuracy calculation algorithm. Its basic principle is to check whether each section of the original file exists in the restored file. We use the MD5 hash function to speed up the scan. The inputs to the algorithm are an original file A and a restored file B, and the output is restorationAccuracy, a list storing the restoration accuracy values of all sections. First, we remove garbage and padding values from the restored file (Line 1). For each section Ai of A, we obtain the length curLen of the section and calculate a hash value H1 by applying the MD5 hash function (Lines 4 to 7). We then calculate another hash value H2 from the restored file B by considering only a window of length curLen (Line 9). If H1 equals H2 (Line 10), it means that B contains Ai, so we stop the comparison and calculate the restoration accuracy of the section (Line 14). However, if H1 and H2 are not equal, we perform the hash value comparison again after decrementing the length curLen of Ai (Lines 6 to 13). We repeat this process to calculate the restoration accuracy for the section Ai (Line 14). After completing the accuracy calculation for all sections, the algorithm returns the list restorationAccuracy (Line 16).
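One possible reading of Algorithm 3 is sketched below in Python using the standard `hashlib` module. It is an interpretation rather than the authors' code: it assumes that garbage and padding have already been removed from the restored file (Line 1), it shortens the section prefix only after every offset of B has been tried at the current length, and it reports 0.0 for an empty section. The function names are hypothetical.

```python
import hashlib
from typing import Dict


def md5_digest(data: bytes) -> bytes:
    return hashlib.md5(data).digest()


def section_accuracy(section: bytes, restored: bytes) -> float:
    """Percentage of the section's leading bytes found contiguously in restored."""
    total = len(section)
    if total == 0:
        return 0.0  # assumption: nothing to compare for an empty section
    for cur_len in range(total, 0, -1):             # Line 6: shrink the prefix
        h1 = md5_digest(section[:cur_len])          # Line 7
        for j in range(len(restored) - cur_len + 1):         # Line 8
            if md5_digest(restored[j:j + cur_len]) == h1:    # Lines 9 and 10
                return cur_len / total * 100        # Line 14
    return 0.0


def restoration_accuracy(sections: Dict[str, bytes], restored: bytes) -> Dict[str, float]:
    """Accuracy per original section, keyed by section name."""
    return {name: section_accuracy(data, restored) for name, data in sections.items()}
```

This direct form hashes every window at every length, so it is practical only for small inputs; a rolling hash or a substring search on the raw bytes would give the same answer faster, at the cost of departing from the hash comparison the paper describes.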
5. Performance Evaluation
5.1. Experimental Environment. In this section, we present the results of experimental evaluation to show the superiority of the all-in-one unpacking system. We conduct the experiments in a virtual server as well as a physical server. The reasons for experimenting in two server environments are as follows. First, unpacking results may differ between the virtual server and the physical server if a packed file has anti-unpacking features such as Anti-VM or Anti-Debug. Second, the dynamic unpacking of malware
Input: A PE file
Output: OEP // the start address of the unpacked original file
begin
(1) Measure the entropy value of each section
(2) while the process is not reached to the end do
(3)     IP ← the current instruction pointer
(4)     if IP’s address > current section’s last address then return Not-Found
(5)     else if IP is JMP or CJMP or RETN then
(6)         Execute instructions until IP
(7)         DstAddr ← the next instruction address
(8)         if IP ∉ IP-History and DstAddr ∉ DstAddr-History then
(9)             Store IP and DstAddr into each history
(10)            if DstAddr is out of the file then return Not-Found
(11)            else
(12)                Re-calculate the entropy value of each section
(13)                if Entropy change is stable and DstAddr is in the different section then
(14)                    return DstAddr
(15)                end-if
(16)            end-if
(17)        end-if
(18) end-while
(19) return Not-Found
end
ALGORITHM 2: Dynamic unpacking algorithm.
files may infect the physical server. Thus, we perform unpacking first in the virtual server and then perform the experiment in the physical server only when there is no abnormality. The hardware specifications of the experimental platform are as follows. The virtual server is an Intel Core i7-6700 3.40 GHz with 8 GB RAM, and the physical server is an Intel Core i7-6700 3.40 GHz with 32 GB RAM. The target operating system is 32-bit Windows 7, and we use PEVIEW to view the structure and content of 32-bit PE files. As the debugger, we use Immunity Debugger 1.85 [25] to implement the dynamic unpacking phase of Algorithm 2. To implement the all-in-one unpacking system including the
[Figure 3: hexadecimal dumps of sample bytecodes. (a) .text, .rdata, .data, and .rsrc sections of the original file. (b) UPX0, UPX1, and .rsrc sections of the restored file.]
Figure 3: Comparison of sample bytecodes of the original and restored files. (a) Sections of the original file. (b) Sections of the restored file.
Input: A: an original file; B: a restored file
Output: restorationAccuracy[1..n] // a list of restoration accuracy values for n sections
begin
(1) Remove garbage and padding values from B
(2) A ← {A1, A2, ..., An} where Ai is the i-th section of A
(3) for each section Ai ∈ A do
(4)     totalLen ← Ai.length
(5)     curLen ← totalLen
(6)     while curLen > 0 do
(7)         H1 ← hash(Ai[1 : curLen]) // hash a part of section Ai
(8)         for j ← 1 to (B.length − curLen + 1) do
(9)             H2 ← hash(B[j : j + curLen − 1]) // hash a part of B
(10)            if H1 = H2 then break
(11)            else curLen ← curLen − 1
(12)        end-for
(13)    end-while
(14)    restorationAccuracy[i] ← (curLen / totalLen) × 100
(15) end-for
(16) return restorationAccuracy
end
ALGORITHM 3: Restoration rate calculation algorithm.
dynamic unpacking algorithm, we run Python scripts on top of Immunity Debugger.
In the experiment, we use a total of 130 PE files, where 90 files are provided by Microsoft [37] and 40 files are randomly collected from the putty [39] and MD5 [40] sites. We pack each of the 130 executable files with each of 19 packers and construct a dataset of 2,600 files in total: 130 × 19 = 2,470 packed files plus the original 130 files. The 19 packers used are UPX, ASPack, NSPack, MPRESS, Yoda’s Protector, MEW, Packman, RLPack, BeRoEXEPacker, PECompact, Petite, JDpack, Molebox, eXpressor, Yoda’s Crypter, FSG, exe32pack, WinUpack, and Neolite. The reason for building the dataset by packing normal PE files ourselves is that we can quantitatively measure the packing detection accuracy, the false-positive rate, and the false-negative rate only if we know in advance which experimental files are packed.
In this paper, we conduct three experiments. The first experiment is for the integration feature of the all-in-one system; it verifies that the system supports all three phases of packing detection, unpacking, and verification in an integrated framework. The second experiment is for the detection phase, where we first set Packing-Range, an entropy range for judging packing, and then evaluate the packing detection accuracy using the range. The third experiment is for the unpacking and verification phases, where we verify whether the system unpacks the PE file correctly by calculating the restoration accuracy.
5.2. Experimental Results on Overall Working Mechanism. In this section, we confirm through experiments that the integration mechanism of the all-in-one unpacking system works correctly. Figure 4 shows operation screenshots of each phase of the all-in-one system. First, Figure 4(a) shows an example screenshot of the detection phase. To confirm that all four cases of packing detection work correctly, we continue executing the remaining cases even if an earlier case has already detected the packing. In Figure 4(a), we first examine DP-NEP of Algorithm 1. We detect the EP section in DP-NEP, so we proceed to DP-SIG (the first two lines). By the DP-SIG test, we detect the UPX packer, so we conclude that the file is packed (the next two lines). Even though detection is complete, we proceed to the DP-ENT and DP-WR tests. The entropy value is 7.93, which is contained in Packing-Range, and the attribute value of the EP section is "0xe0000040", which includes the WRITE attribute. Thus, we conclude that the file is also packed by the DP-ENT and DP-WR tests (the last three lines). According to Figure 4(a), we can confirm that the DP-NEP, DP-SIG, DP-ENT, and DP-WR tests of the detection phase all work correctly.
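The WRITE-attribute check on the value shown in Figure 4(a) can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the flag constants are the standard PE/COFF section characteristics, while the function names `has_write_attribute` and `decode_characteristics` are hypothetical and not taken from the paper.

```python
# Standard PE/COFF section characteristic flags (subset).
SECTION_FLAGS = {
    0x00000020: "CNT_CODE",
    0x00000040: "CNT_INITIALIZED_DATA",
    0x00000080: "CNT_UNINITIALIZED_DATA",
    0x20000000: "MEM_EXECUTE",
    0x40000000: "MEM_READ",
    0x80000000: "MEM_WRITE",
}
IMAGE_SCN_MEM_WRITE = 0x80000000


def has_write_attribute(characteristics: int) -> bool:
    """DP-WR style test: is the section marked writable?"""
    return bool(characteristics & IMAGE_SCN_MEM_WRITE)


def decode_characteristics(characteristics: int) -> list:
    """List the names of the known flags set in a characteristics value."""
    return [name for bit, name in sorted(SECTION_FLAGS.items())
            if characteristics & bit]


if __name__ == "__main__":
    value = 0xE0000040  # attribute value of the EP section in Figure 4(a)
    print(has_write_attribute(value))
    print(decode_characteristics(value))
```

For comparison, a typical unpacked code section carries 0x60000020 (code, executable, readable), which does not set the WRITE bit.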
Figures 4(b) and 4(c) show example screenshots of the unpacking phase. If an unpacking library exists for the packer identified in the detection phase, we perform static unpacking as shown in Figure 4(b); if not, we perform dynamic unpacking as shown in Figure 4(c). Figure 4(b) shows the result of static unpacking by the UPX commercial tool [33] for a UPX-packed file. In the figure, the file size increases from 285,760 bytes to 731,200 bytes by unpacking, which means that the compression ratio is 39.08%. Figure 4(c) shows the result of the entropy-based dynamic unpacking. In the figure, the system calculates the entropy of each section; when the IP encounters a JMP or CJMP instruction, the system returns the current DstAddr as the OEP if the entropy change is stable and DstAddr lies in a different section.
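The compression ratio quoted for Figure 4(b) can be checked directly from the two file sizes. The definition used here (packed size divided by unpacked size) is an assumption; it is the one that reproduces the stated value.

```latex
\text{compression ratio}
  = \frac{\text{packed size}}{\text{unpacked size}} \times 100
  = \frac{285{,}760}{731{,}200} \times 100
  \approx 39.08\%
```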
Figure 4(d) shows an example screenshot of the verification phase. In this example, we measure the restoration accuracy of the restored file against the text and data sections of the original file. The length of the original text section is 18,439 and the length of the unpacked file is 872,555. In the hash value comparison, since the portion corresponding to the original text section exists in the unpacked file, the restoration accuracy becomes 100%. Similarly, we also calculate the restoration accuracy of the data section as 100%.
5.3. Experimental Results on Detection Phase. In this section, we describe two experimental results of the detection phase. The first experiment sets Packing-Range, and the second measures the detection accuracy of the proposed detection algorithm. In the first experiment, we experimentally determine the Packing-Range used in DP-ENT of Algorithm 1. The previous study by Lyda and Hamrock [17] uses the entropy value of the whole file and sets the entropy range of the packed file to [6.677, 6.926]. In contrast, Han and Lee [16] focus on the EP section only and compute the entropy value for the EP section instead of the whole file. Through their experiment, they set the entropy threshold to 6.85 and judge a file packed if its entropy is over 6.85.
We also determine the entropy range, Packing-Range, through experiments on various packers. Figure 5 shows the entropy distribution of the EP section for different packers. Figure 5(a) shows the average distribution of the original and 19 packed files. Figures 5(b)–5(m) show the more detailed entropy distribution for each packer. According to Figures 5(b)–5(m), the entropy distribution of the packed file differs depending on the packer type. In particular, as shown in Figures 5(d), 5(e), and 5(k), some packed files have smaller entropy values than the original file, and thus we cannot simply set an upper threshold value as the packing criterion, as in the previous study [16]. However, we note that while the entropy of packed files is distributed over a wide range, as shown in Figures 5(c)–5(m), the entropy of original files is distributed only in a specific range, as shown in Figure 5(b). More specifically, the entropy value of the normal file is between 6.00 and 6.85, as shown in Figure 5(b). Based on these experimental results, we thus set Packing-Range to [0, 6.00) or [6.85, ∞) of the EP section. According to [9, 29], the entropy method has the disadvantages of low accuracy and easy evasion. To overcome this point, we use the three other detection techniques of DP-NEP, DP-SIG, and DP-WR together with DP-ENT in a hybrid manner. Using this hybrid approach, we can further improve the detection accuracy.
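The entropy test behind DP-ENT can be sketched as follows. This is an illustration under assumptions rather than the paper's code: byte-level Shannon entropy (in bits, so values lie between 0 and 8) is assumed as the entropy measure, the function names are hypothetical, and the bounds 6.00 and 6.85 are the Packing-Range limits stated above, with 6.85 itself treated as inside the range.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def in_packing_range(entropy: float, low: float = 6.00, high: float = 6.85) -> bool:
    """True if the entropy falls in [0, low) or [high, inf)."""
    return entropy < low or entropy >= high


if __name__ == "__main__":
    # The EP-section entropy of 7.93 from Figure 4(a) lies in Packing-Range.
    print(in_packing_range(7.93))
    # A value inside the normal-file band [6.00, 6.85) does not.
    print(in_packing_range(6.50))
```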
Figure 4: Operation screenshots of the detection, unpacking, and verification phases. (a) Detection phase. (b) Unpacking phase (static unpacking). (c) Unpacking phase (dynamic unpacking). (d) Verification phase.

Figure 5: Entropy distributions of the EP section for original and representative packed files (x-axis: entropy of EP section, 0 to 8; y-axis: number of files). (a) Average of original and 19 packers, plotting the not-packed distribution of (b) against the summation of (c) to (m). (b) Original (not packed). (c) UPX. (d) ASPack. (e) NSPack. (f) MPRESS. (g) Yoda's Protector. (h) RLPack. (i) BeRoEXE. (j) MEW. (k) Packman. (l) WinUpack. (m) exe32pack.

In the second experiment, we evaluate the detection accuracy and the false-negative and false-positive rates. Table 5 compares the three detection techniques of the detection phase with the proposed hybrid approach. Since EP sections are mandatory in all PE files, we can detect EP sections unless they are intentionally concealed. All files of the dataset used in the experiment also have EP sections, and the detection rate by DP-NEP is trivially 0%. In other words, no file is detected as packed because of an absent EP section, so we exclude the experimental result of DP-NEP from Table 5.
We now explain the results of Table 5 column by column. First, we explain the DP-SIG column. For Not-Packed files, DP-SIG shows a detection rate of 0% since no packer signature exists in normal files; that is, DP-SIG never flags a normal file and is therefore accurate for Not-Packed files. On the other hand, for the packed files, DP-SIG detects 100% of the files for every packer except NSPack. The detection rate of NSPack is merely 5.38% because NSPack signatures often exist in sections other than the EP section. In summary, DP-SIG shows 95.0% detection accuracy on average.
Second, for the experiment on the detection rate of DP-WR, we intentionally remove the WRITE attribute from 39 files (30% of the packed files) per packer. We perform this removal process for 18 packers, that is, all packers except exe32pack. We change the dataset in this way because normally packed files usually have the WRITE attribute, so we arbitrarily remove it from 30% of the files for the DP-WR experiment. Looking at the DP-WR column in Table 5, DP-WR detects only 70.0% of the files per packer for the 18 packers since we intentionally remove the WRITE attribute from 30% of the files. In the case of exe32pack, however, we need not perform the removal process since it has no WRITE attribute in the PE header. Accordingly, as shown in the DP-WR column of Table 5, the detection rate of exe32pack by DP-WR is trivially 0%. In the case of Not-Packed, the detection rate is 0% since normal files have no WRITE attribute. In summary, checking the presence of the WRITE attribute is very simple, and DP-WR works correctly if the packed file has the WRITE attribute, but it does not work well if the WRITE attribute does not exist or has been maliciously modified.
Third, the DP-ENT column shows the entropy-based detection rates. As shown in the table, DP-ENT achieves a 100% detection rate for 9 packers, including NSPack, UPX, MEW, and BeRoEXEPacker, but it incurs some false negatives for 10 packers, including ASPack, Yoda's Protector, Packman, and MPRESS, for which the detection rate is lower than 100%. In particular, for MPRESS, Petite, JDpack, eXpressor, and Neolite, it shows a low detection rate of 24.6% or less because, as a characteristic of these packers, the entropy of their packed files falls within the entropy distribution of normal files. DP-ENT also has a false-positive rate of 3.84% for Not-Packed. In summary, DP-ENT works very accurately for most packers, but it may incur false negatives for a few specific packers or false positives for some normal files. To overcome this limitation, we use DP-ENT together with other detection techniques. We can see the improvement in the experimental results of Tables 5 and 6. In the tables, when using only DP-ENT, the detection accuracy is low for some packers (especially MPRESS), but when using it together with the other three techniques, the accuracy is close to 100%.
Fourth, the proposed hybrid approach, which integrates the four techniques, detects 100% of the files for all packers except NSPack and shows no false positives. The reason for this high detection rate is that the proposed method has all the advantages of DP-SIG, DP-WR, and DP-ENT as well as DP-NEP. In the case of NSPack, however, the detection rate is merely 70.0%; this is because, as mentioned above, we forcibly remove the WRITE attribute from 30% of the files. According to the results of Table 5, the detection rate differs by detection technique, and some techniques incur false positives or false negatives; on the other hand, the proposed hybrid approach shows the highest detection rate of 98.4% and no false positives.

Table 5: Comparison of packing detection rates of three techniques and the proposed hybrid approach.

Packer             DP-SIG (%)   DP-WR (%)   DP-ENT (%)   Hybrid approach (%)
ASPack             100          70.0        91.5         100
NSPack             5.38         70.0        100          70.0
MPRESS             100          70.0        24.6         100
UPX                100          70.0        100          100
Yoda's Protector   100          70.0        96.9         100
MEW                100          70.0        100          100
BeRoEXEPacker      100          70.0        100          100
Packman            100          70.0        93.8         100
RLPack             100          70.0        99.3         100
PECompact          100          70.0        100          100
Petite             100          70.0        0.00         100
JDpack             100          70.0        5.62         100
Molebox            100          70.0        100          100
eXpressor          100          70.0        0.00         100
Yoda's Crypter     100          70.0        91.5         100
FSG                100          70.0        100          100
exe32pack          100          0.00        100          100
WinUpack           100          70.0        100          100
Neolite            100          70.0        1.61         100
Average            95.0         66.3        73.9         98.4
Not-Packed         0.00         0.00        3.84         0.00
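The averages in the last row of Table 5 follow from the per-packer rates. As a small consistency check, assuming the table entries are read as percentages and each of the 19 packers is weighted equally, the DP-WR and hybrid averages can be recomputed as below. The helper name `average_rate` is hypothetical.

```python
def average_rate(rates):
    """Unweighted mean of per-packer detection rates (in percent)."""
    return sum(rates) / len(rates)


# DP-WR: 70.0% for 18 packers, 0% for exe32pack.
dp_wr = [70.0] * 18 + [0.0]
# Hybrid approach: 100% for 18 packers, 70.0% for NSPack.
hybrid = [100.0] * 18 + [70.0]

if __name__ == "__main__":
    print(round(average_rate(dp_wr), 1))
    print(round(average_rate(hybrid), 1))
```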
Table 6 shows the detection accuracy of combinations of two or three detection techniques and of the proposed hybrid approach. First, DP-SIG+ENT, a combination of the signature and entropy tests, detects 100% of the packed files but detects 18 files incorrectly, showing a false-positive rate of 0.73%. Second, DP-ENT+WR, a combination of the entropy and WRITE tests, shows a low detection rate of 92.0% because the entropy value of MPRESS lies within the distribution of normal files. Third, DP-SIG+ENT+WR, a combination of the signature, entropy, and WRITE tests, and our hybrid approach both show 98.4% detection accuracy and a 1.60% false-negative rate. This is because DP-SIG+ENT+WR and the hybrid approach go through the same detection procedure except for DP-NEP. The false-negative rate of 1.60% is due to the files having no WRITE attribute. According to the results of Tables 5 and 6, we believe that the proposed hybrid approach achieves high detection accuracy by combining the advantages of the existing and new detection techniques.
5.4. Experimental Results on Unpacking and Verification Phases. In this section, we verify the unpacking and verification phases by performing the unpacking of Algorithm 2 and calculating the restoration accuracy of the files through the actual restoration of Algorithm 3. However, the unpacking and verification phases take a lot of time, so we experiment with 40 randomly selected files of the dataset. The unpacking phase consists of static unpacking and dynamic unpacking. In the experiment, we exclude static unpacking since it uses the existing library as it is, and we focus only on the dynamic unpacking of Algorithm 2 since it tries to restore the original file without any information about the packers.
Table 7 shows the restoration accuracy of the proposed method for the text and data sections for each of the 19 packers. We cannot directly unpack Yoda's Protector and PECompact since they use Anti-Debug engineering techniques. To overcome this problem, we unpack these packers by bypassing three Anti-Debugging API calls: GetCurrentProcessID(), BlockInput(), and IsDebuggerPresent() [41]. That is, we run Algorithm 2 with the bypassing techniques for Yoda's Protector and PECompact and include the experimental results in Table 7. In general, since only the text and data sections of the original file are packed [42], in the experiment we consider only the restoration accuracy of the text and data sections of the restored file. In Table 7, among the 19 packers, 11 packers, including UPX, NSPack, MEW, RLPack, and BeRoEXEPacker, show 100% restoration accuracy for both the text and data sections. On the other hand, 8 packers, including ASPack, Packman, and MPRESS, show less than 100% restoration accuracy because of the manner in which the packers handle the reloc section. More specifically, if there is a reloc section in the original file, the unpacking relocates the text and data sections, changing all the data values in each section. Therefore, the restoration accuracy depends on whether there is a reloc section in the original file.
Table 8 shows the restoration results in detail according to the presence of the reloc section. As shown in the table, the restoration accuracy is 100% in the absence of the reloc section, while it is mostly very low, ranging from 0.00% to 99.8%, in the presence of the reloc section. However, the low restoration accuracy is due to the nature of the packer, and it does not mean that unpacking does not work correctly. In other words, even if the unpacking is successful, the data values may change due to the packer characteristics, and the restoration accuracy is then calculated to be low. In summary, we can use the proposed verification algorithm to calculate the restoration accuracy of unpacked files, which quantitatively measures the accuracy of the actual restoration.
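To illustrate why relocation can depress the measured accuracy even when unpacking succeeds, the following sketch rebases a single absolute address inside a toy section and then measures the longest matching prefix. Everything here is hypothetical: the byte layout, the offsets, the delta, and the function names are invented for illustration, and a plain substring test stands in for the hash comparison of Algorithm 3.

```python
import struct


def apply_relocation(section: bytes, offsets, delta: int) -> bytes:
    """Add `delta` to each 32-bit little-endian address at the given offsets."""
    out = bytearray(section)
    for off in offsets:
        (addr,) = struct.unpack_from("<I", out, off)
        struct.pack_into("<I", out, off, (addr + delta) & 0xFFFFFFFF)
    return bytes(out)


def longest_prefix_found(section: bytes, image: bytes) -> int:
    """Length of the longest prefix of `section` that occurs in `image`."""
    n = len(section)
    while n > 0 and section[:n] not in image:
        n -= 1
    return n


if __name__ == "__main__":
    # Toy section: 4 filler bytes, one absolute address, 8 filler bytes.
    original = b"\x90" * 4 + struct.pack("<I", 0x00401000) + b"\x90" * 8
    rebased = apply_relocation(original, [4], 0x10000)
    matched = longest_prefix_found(original, rebased)
    print(matched, len(original), matched / len(original) * 100)
```

In this toy case the code is functionally unchanged after rebasing, yet the byte-level match stops at the first patched address, which is the effect the paragraph above attributes to the reloc section.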
In addition to the text, data, and reloc sections, we further consider the PE header, which includes the IAT information. In general, the IAT is located in the header of the PE file, and if we actually need to run the unpacked file, we must also restore this header information including the IAT. In fact, the studies in [3, 43] restore the header including the IAT and produce the actual whole executable file. On the other hand, most studies, including this paper, do not restore the actual whole executable file itself but restore the text, data, and reloc sections, which are the major parts of the executable file. This is because the purpose of unpacking is not to execute the restored file but to identify and extract malicious code included in the restored file. Therefore, in this paper, we do not need to restore the header required for actual execution, and accordingly a comparison of IAT information is also not necessary. Instead, we restore and compare the text, data, and reloc sections, which are more likely to hide malicious code. Restoring the PE header is beyond the scope of this study; we leave it as a separate research topic.
Table 6: Comparison of packing detection accuracy of two or three combinations and of the hybrid approach.

Metrics               DP-SIG+ENT (%)   DP-ENT+WR (%)   DP-SIG+ENT+WR (%)   Hybrid approach (%)
Detection accuracy    95.0             92.0            98.4                98.4
False-negative rate   5.00             8.00            1.60                1.60
False-positive rate   0.73             0.00            0.00                0.00

Table 7: Restoration accuracy of text and data sections of each packer.

Packer             text (%)   data (%)
UPX                100        100
NSPack             100        100
MEW                100        100
RLPack             100        100
BeRoEXE            100        100
Neolite            100        100
FSG                100        100
eXpressor          100        100
Molebox            100        100
Petite             100        100
JDpack             100        100
ASPack             84.3       84.3
Yoda's Protector   84.0       84.0
exe32pack          80.8       100
MPRESS             78.8       75.3
Yoda's Crypter     74.5       74.5
PECompact          74.0       74.0
WinUpack           68.0       68.0
Packman            55.0       48.5
Average            89.2       90.2

6. Conclusions

In this paper, we analyzed recent studies of packing detection and unpacking techniques and derived four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. We then proposed an all-in-one unpacking system to address these four observations. First, no phase integration meant that analysts had to perform the packing detection, unpacking, and verification phases separately, and we solved this problem by presenting an all-in-one unpacking system that integrates these three phases in a single framework. Second, no detection combination meant that there had been no attempt to combine various packing detection methods, and we solved this problem by presenting a hybrid approach that combines existing and new packing detection methods. Third, no real-restoration meant that there had been little discussion about restoring the actual executable file, and we solved this problem by actually restoring an original image of the packed file. Fourth, no unpacking verification meant that there had been no explicit work to measure the restoration accuracy of unpacked files, and we solved this problem by proposing a verification algorithm that quantitatively evaluates the restoration accuracy.
We constructed a dataset of 2600 PE files and experimentally evaluated the proposed system using the dataset. First, we performed all the phases of the all-in-one unpacking system sequentially and confirmed that its overall working mechanism worked well. Next, by comparing the hybrid packing detection approach with existing detection techniques, we showed that the detection accuracy of the proposed method was the highest, at 98.4% on average, with no false positives. For this, we also determined an entropy range of packing, Packing-Range, through extensive experiments on 19 representative packers. Finally, in the unpacking and verification phases, we confirmed that unpacking was successful for all files except the files packed with Yoda's Protector. We also showed that the restoration accuracy of unpacked files was nearly 100% except for a few packers.
As future research, we will focus on unpacking files packed with Anti-VM, Anti-Debug, and Anti-Reverse engineering techniques. We will also study how to compute the restoration accuracy of files that have reloc sections.
Data Availability
All the data files used in the experiments are available at https://github.com/chesvectain/PackingData.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partly supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (no. 2017-0-00158, Development of Cyber Threat Intelligence (CTI) analysis and information sharing technology for national cyber incident response) and the Korea Electric Power Corporation (no. R18XA05).
Table 8: Restoration accuracy according to the presence of the reloc section.

Packer             text, reloc present (%)   text, reloc absent (%)   data, reloc present (%)   data, reloc absent (%)
ASPack             0.00                      100                      0.00                      100
Packman            99.8                      100                      88.1                      100
MPRESS             0.00                      100                      0.00                      100
Yoda's Crypter     0.00                      100                      0.04                      100
PECompact          0.00                      100                      0.00                      100
Yoda's Protector   0.00                      100                      0.05                      100
exe32pack          0.00                      100                      100                       100
WinUpack           0.00                      100                      0.00                      100

References

[1] M. Morgenstern, "AV-TEST Security report 2017/18," 2019, https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2017-2018.pdf.
[2] T. Ban, R. Isawa, S. Guo, D. Inoue, and K. Nakao, "Efficient malware packer identification using support vector machines with spectrum kernel," in Proceedings of the Eighth Asia Joint Conference on Information Security, pp. 69–76, Seoul, South Korea, July 2013.
[3] B. Cheng, M. Jiang, J. Fu et al., "Towards paving the way for large-scale Windows malware analysis: generic binary unpacking with orders-of-magnitude performance boost," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 395–411, Toronto, Canada, October 2018.
[4] A. Gupta and M. S. Arya, "Hashing based encryption and anti-debugger support for packing multiple files into single executable," International Journal of Advanced Research in Computer Science, vol. 9, no. 1, pp. 914–920, 2018.
[5] Y. Kawakoya, M. Iwamura, and M. Itoh, "Memory behavior-based automatic malware unpacking in stealth debugging environment," in Proceedings of the 5th International Conference on Malicious and Unwanted Software, pp. 39–46, Lorraine, France, October 2010.
[6] G. Liang, J. Pang, Z. Shan, R. Yang, and Y. Chen, "Automatic benchmark generation framework for malware detection," Security and Communication Networks, vol. 2018, Article ID 4947695, 8 pages, 2018.
[7] A. Pektas and T. Acarman, "A dynamic malware analyzer against virtual machine aware malicious software," Security and Communication Networks, vol. 7, no. 12, pp. 2245–2257, 2014.
[8] I. Santos, J. Nieves, and P. G. Bringas, "Semi-supervised learning for unknown malware detection," in International Symposium on Distributed Computing and Artificial Intelligence, A. Abraham, J. M. Corchado, S. R. Gonzalez, and J. F. De Paz Santana, Eds., pp. 415–422, Springer, Berlin, Germany, 2011.
[9] X. Ugarte-Pedrero, I. Santos, B. Sanz, C. Laorden, and P. G. Bringas, "Countering entropy measure attacks on packed software detection," in Proceedings of the 9th International Conference on Consumer Communications and Networking, pp. 164–168, Las Vegas, NV, USA, January 2012.
[10] H. Won, S.-P. Kim, S. Lee, M.-J. Choi, and Y.-S. Moon, "Secure principal component analysis in multiple distributed nodes," Security and Communication Networks, vol. 9, no. 14, pp. 2348–2358, 2016.
[11] S. Cesare, Y. Xiang, and W. Zhou, "Malwise–an effective and efficient classification system for packed and polymorphic malware," IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193–1206, 2013.
[12] C. Malin, E. Casey, and J. Aquilina, Malware Forensics: Investigating and Analyzing Malicious Code, Syngress, Burlington, MA, USA, 2008.
[13] W. Yan, Z. Zheng, and A. Nirwan, "Revealing packed malware," IEEE Security and Privacy, vol. 6, no. 5, pp. 65–69, 2008.
[14] M. Bat-Erdene, T. Kim, H. Park, and H. Lee, "Packer detection for multi-layer executables using entropy analysis," Entropy, vol. 19, no. 3, p. 125, 2017.
[15] Y.-S. Choi, I.-K. Kim, J.-T. Oh, and J.-C. Ryou, "PE file header analysis-based packed PE file detection technique (PHAD)," in Proceedings of International Symposium on Computer Science and Its Applications, pp. 28–31, Hobart, Australia, October 2008.
[16] S. Han and S. Lee, "Packed PE file detection for malware forensics," The KIPS Transactions: Part C, vol. 16C, no. 5, pp. 555–562, 2009, in Korean.
[17] R. Lyda and J. Hamrock, "Using entropy analysis to find encrypted and packed malware," IEEE Security and Privacy, vol. 5, no. 2, pp. 40–45, 2007.
[18] R. Perdisci, A. Lanzi, and W. Lee, "Classification of packed executables for accurate computer virus detection," Pattern Recognition Letters, vol. 29, no. 14, pp. 1941–1946, 2008.
[19] A. Rohit, A. Singh, H. Pareek, and U. Edara, "A heuristics-based static analysis approach for detecting packed PE binaries," International Journal of Security and Its Applications, vol. 7, no. 5, pp. 257–268, 2013.
[20] G. Amato, "PEframe," 2019, https://github.com/guelfoweb/peframe.
[21] G. Jeong, E. Choo, J. Lee, M. Bat-Erdene, and H. Lee, "Generic unpacking using entropy analysis," Journal of Advanced Information Technology and Convergence, vol. 7, no. 1, pp. 232–238, 2009.
[22] V. DeMarines, "Obfuscation–how to do it and how to crack it," Network Security, vol. 2008, no. 7, pp. 4–7, 2008.
[23] H. Neil, "PEiD," 2019, https://github.com/wolfram77web/app-peid.
[24] O. Yuschuk, "OllyDbg," 2019, http://www.ollydbg.de/download.html.
[25] N. Waisman, "Immunity Debugger," 2019, https://www.immunityinc.com/products/debugger/index.html.
[26] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee, "PolyUnpack: Automating the hidden-code extraction of unpack-executing malware," in Proceedings of the 22nd Annual Computer Security Applications Conference, pp. 289–300, Miami Beach, FL, USA, December 2006.
[27] M. Bat-Erdene, Y. Park, H. Li, H. Lee, and M.-S. Choi, "Entropy analysis to classify unknown packing algorithms for malware detection," International Journal of Information Security, vol. 16, no. 3, pp. 227–248, 2017.
[28] S. Cesare and X. Yang, "Classification of malware using structured control flow," in Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing, pp. 61–70, Brisbane, Australia, January 2010.
[29] L. Martignoni, M. Christodorescu, and S. Jha, "OmniUnpack: fast, generic, and safe unpacking of malware," in Proceedings of the 23rd Annual Computer Security Applications Conference, pp. 431–441, Miami Beach, FL, USA, December 2007.
[30] M. G. Kang, P. Poosankam, and H. Yin, "Renovo: a hidden code extractor for packed executables," in Proceedings of the 5th ACM Workshop on Recurring Malcode, pp. 46–53, Alexandria, VA, USA, November 2007.
[31] S. Mariani, L. Fontana, F. Gritti, and S. D'Alessio, "PinDemonium: a DBI-based generic unpacker for windows executables," in Proceedings of the Black Hat 2016, Las Vegas, NV, USA, July 2016.
[32] G. Bonfante, J. Fernandez, J.-Y. Marion, B. Rouxel, F. Sabatier, and A. Thierry, "CoDisasm: medium scale concatic disassembly of self-modifying binaries with overlapping instructions," in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 745–756, Denver, CO, USA, October 2016.
[33] M. Oberhumer, L. Molnar, and J. Reiser, "UPX," 2019, https://upx.github.io.
[34] A. Swinnen and A. Mesbahi, "One packer to rule them all: empirical identification, comparison and circumvention of current antivirus detection techniques," in Proceedings of the Black Hat 2014, Las Vegas, NV, USA, August 2014.
[35] P. Bania, "Generic unpacking of self-modifying, aggressive, packed binary programs," 2009, https://arxiv.org/abs/0905.4581.
[36] H. Neil, "BobSoft Signature," 2019, http://woodmann.com/BobSoft/Files/Other/UserDB.zip.
[37] M. Russinovich, "Microsoft Sysinternals Suite," 2019, https://download.sysinternals.com/files/SysinternalsSuite.zip.
[38] R. Rivest, "The MD5 message-digest algorithm," RFC 1321, IETF, Fremont, CA, USA, 1992.
[39] S. Tatham, "Putty," 2019, https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html.
[40] L. Pascoe, "MD5," 2019, http://www.md5summer.org/about.html.
[41] P. Ferrie, "Anti-unpacker tricks - part one," 2019, https://www.virusbulletin.com/virusbulletin/2008/12/anti-unpacker-tricks-part-one.
[42] T.-Y. Wang and C.-H. Wu, "Detection of packed executables using support vector machines," in Proceedings of the 10th International Conference on Machine Learning and Cybernetics, pp. 717–722, Guilin, China, July 2011.
[43] D. Korczynski, "RePEconstruct: reconstructing binaries with self-modifying code and Import address table destruction," in Proceedings of the 11th International Conference on Malicious and Unwanted Software, pp. 31–38, Fajardo, Puerto Rico, October 2016.
The fourth is an integration method of both static and dynamic unpacking techniques. Representative examples of such integration methods are PolyUnpack [26], CoDisasm [32], and BinUnpack [3]. PolyUnpack and CoDisasm first extract a static model through static analysis before executing the file. Then, they unpack the PE file by comparing the extracted static model to the dynamic model obtained by running the executable. PolyUnpack unpacks the file by iteratively comparing the two models, and CoDisasm does so by repeatedly comparing memory snapshots. Finally, BinUnpack first performs static analysis by developing a hook-evasion-resistant API monitor, and it then performs dynamic analysis by monitoring API calls with the analyzed information.
3. Overall Architecture of All-in-One Unpacking System
As we mentioned in Section 1, we note four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. In this section, we describe the overall architecture of the all-in-one unpacking system that addresses these four observations. Figure 1 shows a flow diagram of the proposed all-in-one system. As shown in the figure, we improve no phase integration by supporting all three phases of detection, unpacking, and verification in an integrated framework. In the diagram, for a given input PE file (①), the system first determines whether or not it is packed (②). If the file is not packed, the system proceeds to check the next PE file. If it is packed, the system checks whether or not there is an unpacking library for the packer used in packing (③). If the unpacking library exists, the system performs static unpacking using the library (④); otherwise, the system performs dynamic unpacking, which does not depend on the packing algorithm (⑤). After completing the unpacking, the system calculates the restoration accuracy to verify the correctness of unpacking (⑥).
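The control flow of steps ① to ⑥ can be sketched as a small driver function. This is only an outline under assumptions: the names `Detection` and `run_pipeline` are hypothetical, and the detection, unpacking, and verification routines are passed in as callables rather than implemented here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Detection:
    packed: bool                  # result of the detection phase
    packer: Optional[str] = None  # packer name if one was identified


def run_pipeline(
    pe_file: bytes,
    detect: Callable[[bytes], Detection],
    static_unpackers: Dict[str, Callable[[bytes], bytes]],
    dynamic_unpack: Callable[[bytes], bytes],
    verify: Callable[[bytes], float],
) -> Optional[float]:
    """Run detection, unpacking, and verification on one PE file.

    Returns the restoration accuracy, or None if the file is not packed.
    """
    result = detect(pe_file)                      # step 2: packed or not?
    if not result.packed:
        return None                               # move on to the next file
    # Step 3: use a static unpacking library when one exists for the packer
    # (step 4); otherwise fall back to dynamic unpacking (step 5).
    unpack = static_unpackers.get(result.packer, dynamic_unpack)
    restored = unpack(pe_file)
    return verify(restored)                       # step 6: restoration accuracy
```

Passing the phases in as callables keeps the sketch independent of any particular debugger or unpacking library.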
We use the existing and proposed techniques together to implement each phase of the all-in-one unpacking system. Table 3 summarizes the existing or proposed methods used in each phase. First, in the detection phase, we use four techniques to detect the packing of the input PE file. This hybrid approach improves no detection combination. Among the four techniques, the EP section test is a new one proposed in this paper, which increases the packing detection rate by investigating the explicit existence of the EP section, and we describe it in detail in Section 4.1. Second, in the unpacking phase, we use either static unpacking or dynamic unpacking depending on whether or not an unpacking library exists. This unpacking phase restores the actual file and improves no real-restoration. Third, in the verification phase, we verify the correctness of unpacking by computing the restoration accuracy. We calculate the restoration accuracy by comparing the byte codes of two files. This phase improves no unpacking verification, and we describe this verification algorithm in detail in Section 4.3.
4. Detection Mechanism of All-in-One Unpacking System
4.1. Detection Phase. As we explained in Section 3, while various packing detection techniques exist, there has been no attempt to combine their advantages. Since they have been studied separately, each technique detects packing by focusing only on its own characteristics. In this paper, we propose a hybrid approach that improves the packing detection accuracy by combining four different techniques.
The detection phase consists of four tests: the EP section, signature, WRITE attribute, and entropy tests. In this paper, we refer to these tests as Detection Phase-No EP Section (DP-NEP), Detection Phase-Signature (DP-SIG), Detection Phase-WRITE (DP-WR), and Detection Phase-Entropy (DP-ENT), respectively. Table 4 compares the detection techniques used in existing work and in the proposed approach. As shown in the table, PEiD supports only DP-SIG, [17] supports only DP-ENT, and [16] supports DP-WR and DP-ENT, whereas the proposed method supports all four techniques.
Algorithm 1 presents the proposed hybrid approach to packing detection. It takes a PE file as input and determines whether or not the file is packed. Since DP-SIG, DP-WR, and DP-ENT operate on the EP section of a PE file, we first apply DP-NEP to confirm that the EP section exists. If the EP section does not exist, we determine that the file is packed (Line 1). Otherwise, we perform DP-SIG to check whether or not the PE file has a packer signature. If the signature exists, i.e., if a packer is used, we determine that the file is packed (Line 2). We then perform DP-WR and DP-ENT, where DP-WR checks whether or not the EP section has the WRITE attribute and DP-ENT checks whether or not the entropy of the EP section falls in a specific range, Packing-Range. For example, in this paper the range is under 6.0 or over 6.85. If both conditions hold, we determine that the file is packed and return Packed (Lines 3 and 4). Finally, if none of DP-NEP, DP-SIG, and the combined DP-WR and DP-ENT test is satisfied, we return Not-Packed (Line 5).
We now explain the four packing detection cases of Algorithm 1 in detail. First, DP-NEP looks for the EP section in the input PE file. We conclude that the file is packed if it has no EP section. This is because every PE file has an EP section, so a missing EP section means that someone has hidden the section intentionally. We can check the existence of the EP section by examining the EP section address in the header of the PE file. In our experiments, we could not find the EP section in some PE files packed by packers such as Upack [34] and PESpin [35]. This is because such packers intentionally hide or scramble the EP section. Obviously, we need to detect these PE files as packed in the detection phase, and Algorithm 1 does so: it detects such a "No EP Section" PE file as the DP-NEP case, which improves the packing detection rate. In other words, by considering DP-NEP in Algorithm 1, we increase the packing detection rate,
4 Security and Communication Networks
especially for Upack and PESpin packers. In the DP-NEP case, however, we cannot find the EP section itself, and accordingly we cannot proceed to the subsequent unpacking and verification phases. More precisely, unpacking files from such DP-NEP packers might be possible if we manually modify the raw hexadecimal codes using a binary editor. However, this manual modification of raw data is beyond the scope of this paper, and we do not consider it. In summary, we use the concept of "No EP Section" in the detection phase of Algorithm 1 to increase the packing detection rate, but we do not consider it in the unpacking and verification phases.
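The DP-NEP test described above can be sketched as a lookup of the entry-point address in the section table. This is a minimal illustration that assumes the header fields have already been parsed; `Section` and `find_ep_section` are hypothetical names, and the addresses in the usage lines are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Section:
    name: str
    virtual_address: int  # RVA at which the section starts
    virtual_size: int     # size of the section in memory


def find_ep_section(entry_point_rva: int, sections: List[Section]) -> Optional[Section]:
    """Return the section that contains the entry point, or None (the DP-NEP case)."""
    for sec in sections:
        if sec.virtual_address <= entry_point_rva < sec.virtual_address + sec.virtual_size:
            return sec
    return None


# Hypothetical section table and entry points, for illustration only.
sections = [Section("UPX0", 0x1000, 0x8000), Section("UPX1", 0x9000, 0x4000)]
hit = find_ep_section(0x9A10, sections)   # entry point inside UPX1
miss = find_ep_section(0x0200, sections)  # entry point outside every section
print(hit.name if hit else "No EP Section")
print(miss.name if miss else "No EP Section")
```

A file for which the lookup returns no section would be reported as Packed by Line 1 of Algorithm 1.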
Second, in DP-SIG, we check whether or not the signature extracted from the EP section is in the signature database. We construct the signature database with a total of 7,000 signatures, consisting of 4,200 signatures provided by PEiD [23] and 2,800 signatures collected from BobSoft [36]. Figure 2 shows three sample signatures of packers stored in the database. For each packer, the first line gives the type and version of the packer and the maker of the signature; the second line gives the hexadecimal values of the packer's signature; and the third line, ep_only, is true or false, indicating whether the signature is to be found in the EP section.
Third, in DP-WR, we check the WRITE attribute of the EP section. In the process of unpacking a packed file, the system needs to modify memory, so the packed file necessarily requires the WRITE attribute. Thus, we suspect that a file is packed if the WRITE attribute exists. However, since a normal file can also have the WRITE attribute, we cannot determine packing from the existence of the WRITE attribute alone. Therefore, as proposed in [16], we detect packing using the WRITE attribute and the entropy value together.
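The WRITE attribute test amounts to checking one bit of the section's characteristics field. The sketch below uses the PE flag IMAGE_SCN_MEM_WRITE (0x80000000); the function name `has_write_attribute` is hypothetical, and the first sample value is the EP section attribute reported for the UPX example in Section 5.2.

```python
IMAGE_SCN_MEM_WRITE = 0x80000000  # PE section flag: section is writable


def has_write_attribute(characteristics: int) -> bool:
    """Return True if the section characteristics include the WRITE flag."""
    return bool(characteristics & IMAGE_SCN_MEM_WRITE)


# 0xE0000040 = WRITE | READ | EXECUTE | initialized data
print(has_write_attribute(0xE0000040))
# 0x60000020 = READ | EXECUTE | code, a typical read-only code section
print(has_write_attribute(0x60000020))
```

As the text notes, this bit alone is not decisive, which is why Algorithm 1 combines it with the entropy test.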
Fourth, in DP-ENT, we compute the entropy of the input PE file using equation (1) and check whether that entropy is in a specific range (Packing-Range in Algorithm 1). This is based on the previous observation [16, 17] that the entropy of a packed file differs from that of an ordinary file. We compute the entropy value for the EP section only, not for the entire file. To use DP-ENT, we need to set Packing-Range, which determines whether or not a file is judged packed. In this paper, we set this range through the experiment in Section 5.3.
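As an illustration of the entropy computation, the sketch below assumes that equation (1) is the usual byte-level Shannon entropy, which is consistent with the 0 to 8 value range used in the experiments. The function name `shannon_entropy` is hypothetical, and the input would be the raw bytes of the EP section.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # sum of p * log2(1/p) over the byte values that occur in the data
    return sum((c / total) * math.log2(total / c) for c in counts.values())


uniform = bytes(range(256))  # every byte value exactly once: maximum entropy
constant = b"\x00" * 64      # a single repeated value: minimum entropy
print(shannon_entropy(uniform), shannon_entropy(constant))
```

Compressed or encrypted sections tend toward the upper end of this scale, which is the observation that DP-ENT relies on.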
4.2. Unpacking Phase. As mentioned in Section 2.2, unpacking that is performed automatically without human intervention can be divided into static unpacking and dynamic unpacking. Static unpacking is based on the characteristics of the packing algorithm, so it is fast and carries no risk of infection because the packed file is not executed directly. In contrast, dynamic unpacking does not depend on a specific packing algorithm, although it takes a long analysis time. In this paper, we use both static and dynamic methods to maximize the unpacking accuracy. More specifically, if the packer type found in the detection phase provides an unpacking library, we use static unpacking; otherwise, we perform the
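[Figure 1 (flow diagram): ① Input PE file → ② Packing detection; if not packed (No), the next PE file is checked; if packed (Yes) → ③ Unpacking library?; Yes → ④ Static unpacking; No → ⑤ Dynamic unpacking; then ⑥ Verification. The steps are grouped into the detection, unpacking, and verification phases.]
Figure 1: Overall working mechanism of the all-in-one unpacking system.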
Table 3: Existing or proposed techniques used in each phase of the all-in-one unpacking system.
Analytical phase    Techniques used or proposed
Detection           (i) EP section test (to be proposed in Section 4.1)
                    (ii) Signature test [23]
                    (iii) WRITE attribute test [16]
                    (iv) Entropy test [17]
Unpacking           (i) Static: library-based unpacking [33]
                    (ii) Dynamic: entropy change-based unpacking [14, 27]
Verification        Verification algorithm (to be proposed in Section 4.3)
Table 4: Detection techniques used in existing and proposed approaches (✓: supported, ×: not supported).
Method               DP-NEP   DP-SIG   DP-WR   DP-ENT
PEiD [23]            ×        ✓        ×       ×
[17]                 ×        ×        ×       ✓
[16]                 ×        ×        ✓       ✓
Proposed approach    ✓        ✓        ✓       ✓
dynamic unpacking. Static unpacking is straightforward because it uses the unpacking library found for each packer, so in this section we focus on explaining how dynamic unpacking works in detail. Our dynamic unpacking method is also known as the "execute-after-write" method. More precisely, the proposed method first restores the original file in memory by finding the OEP of the packed file and then analyzes (i.e., executes) the restored original file in memory. To avoid infecting real machines, we perform the unpacking and verification phases in debugging mode.
The OEP is the address of the first instruction to execute after the packed file is unpacked, and it represents the entry point of the original program. In other words, since the code following the OEP is the starting code of the original program, finding the OEP is the most important issue in dynamic unpacking. To find the OEP, we use the entropy change-based analysis proposed by Bat-Erdene et al. [27]. When a packed file is executed, the packed data is uncompressed and written to memory, and at this time the data might be changed or added to the file. Because of this data change, the entropy value changes continuously during the unpacking process. If unpacking is successful, the entropy value stabilizes, the original file is restored, and the IP (instruction pointer) moves to another section of the restored original file for its execution. Thus, we can find the OEP by checking whether the IP moves to another section when the entropy value of each section no longer changes. (If this heuristic of watching for a change of IP section becomes known, someone may create a malicious packing method that evades it. However, to evade it, the packer must reach the OEP without a change of IP section, and packing a PE file in such a way might not be easy. In our analysis, we did not find such cases among the 19 packers we covered. Even though such a case hardly exists, it might occur in the worst case. We leave an advanced approach that works against such evasion as further study.) To accomplish this, we need to execute the target process instruction by instruction and monitor the entropy change; however, measuring the entropy value after every instruction is very time-consuming and inefficient. We note that the OEP generally follows a branch instruction, so we calculate the entropy value only when the instruction is JMP, Conditional JMP (CJMP), or RETN. We also note that loop iterations during program execution are independent of the OEP, so we store and reuse the entropy values measured after JMP, CJMP, and RETN instructions to avoid duplicated calculations across iterations and to reduce the total analysis time.
Algorithm 2 shows the proposed dynamic unpacking algorithm based on entropy changes; it returns the OEP address for a given PE file. First, it computes the initial entropy of each section (Line 1). Then, it repeats the following steps until the process ends (Lines 2 to 18). If the current IP address is greater than the last address of the current section, it returns Not-Found (Line 4). This is because the IP has moved to the next section without a JMP family instruction, so the OEP cannot be found in this section. Otherwise, if the instruction at IP is JMP, CJMP, or RETN, it executes the file up to IP and stores the address of the next instruction in DstAddr (Lines 5 to 7). If IP exists in IP-History or DstAddr exists in DstAddr-History, the instruction or address belongs to a repeated iteration, so we move to the next IP (Line 8). Otherwise, it stores IP and DstAddr in their respective histories for the next iteration (Line 9). At this time, if DstAddr is out of the range of the file, it returns Not-Found because it cannot find the OEP within the file (Line 10). If DstAddr is in the normal range, it calculates the entropy of each section again and compares the current entropy with the previous entropy to check whether it is stable (Lines 11 to 13). If the entropy is stable and DstAddr is in a different section, unpacking is successful and the IP is moved to another section by a JMP family instruction. Thus, it
[Code-Lock vxx]
signature = 47 8B C2 05 1E 00 52 8B D0 B8 02 3D CD 21 8B D8 5A
ep_only = true
[CodeCrypt v014b]
signature = E9 C5 02 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F
ep_only = true
[CodeCrypt v015b]
signature = E9 31 03 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F
ep_only = true
Figure 2: Examples of packer signatures stored in the signature database.
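To illustrate how an entry such as those in Figure 2 could be matched when ep_only is true, the following sketch compares a signature against the bytes at the entry point. The helper names are hypothetical, and the handling of "??" as a wildcard byte is an assumption about the database format; the samples in Figure 2 contain no wildcards.

```python
from typing import List, Optional


def parse_signature(text: str) -> List[Optional[int]]:
    """Convert 'E9 ?? 02' into [0xE9, None, 0x02]; None stands for a wildcard byte."""
    return [None if tok == "??" else int(tok, 16) for tok in text.split()]


def matches_at_entry_point(ep_bytes: bytes, signature: List[Optional[int]]) -> bool:
    """Return True if the signature matches the bytes starting at the entry point."""
    if len(ep_bytes) < len(signature):
        return False
    return all(s is None or s == b for s, b in zip(signature, ep_bytes))


# Signatures copied from Figure 2; the entry-point bytes are constructed for the example.
codecrypt_014b = parse_signature("E9 C5 02 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F")
codecrypt_015b = parse_signature("E9 31 03 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F")
ep_bytes = bytes.fromhex("E9 C5 02 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F") + b"\x90\x90"

print(matches_at_entry_point(ep_bytes, codecrypt_014b))
print(matches_at_entry_point(ep_bytes, codecrypt_015b))
```

The two CodeCrypt entries differ only in their second and third bytes, so the byte-by-byte comparison is enough to tell the versions apart.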
Input: A PE file
Output: Packed or Not-Packed // Packing detection result
begin
(1) if No EP Section then return Packed // DP-NEP
(2) else if Packer Signature Found then return Packed // DP-SIG
(3) else if "WRITE" enabled and EP Section Entropy ∈ Packing-Range then
(4)     return Packed // DP-WR, DP-ENT
(5) else return Not-Packed
(6) end-if
end
ALGORITHM 1: Packing detection.
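The decision logic of Algorithm 1 can be sketched as a single function. This is an illustration under stated assumptions rather than the authors' implementation: the four inputs are taken as already computed by the individual tests, the function and parameter names are hypothetical, and the default bounds 6.00 and 6.85 are the Packing-Range limits reported in Section 5.3.

```python
def is_packed(has_ep_section: bool, signature_found: bool,
              write_enabled: bool, ep_entropy: float,
              normal_low: float = 6.00, normal_high: float = 6.85) -> bool:
    """Return True (Packed) or False (Not-Packed) following Algorithm 1."""
    if not has_ep_section:   # Line 1: DP-NEP
        return True
    if signature_found:      # Line 2: DP-SIG
        return True
    # Packing-Range is [0, normal_low) or [normal_high, infinity)
    in_packing_range = ep_entropy < normal_low or ep_entropy >= normal_high
    if write_enabled and in_packing_range:  # Lines 3 and 4: DP-WR and DP-ENT
        return True
    return False             # Line 5


print(is_packed(True, False, True, 7.93))  # writable EP section with high entropy
print(is_packed(True, False, True, 6.50))  # writable, but entropy in the normal range
```

Because the tests are evaluated in order, the entropy value is only consulted when the EP section exists and no signature has matched.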
stops the execution and returns DstAddr as the OEP (Lines 13 and 14).
4.3. Verification Phase. In this section, we present an algorithm for measuring the restoration accuracy by comparing the restored file with its original file. To execute the packed file, the proposed system uses dynamic unpacking and finds the OEP during execution. If the system finds the OEP, we assume that the restoration of the original file is successful and regard the resulting file as the restored file. In the verification phase, we quantitatively measure the restoration accuracy by comparing this restored file with its original file.
Figure 3 shows an example of comparing some bytecodes of the original file with those of the restored file. For the comparison, we first pack the pspasswd.exe file provided by Microsoft [37] and then unpack it by dynamic unpacking. The figure shows that the bytecodes of the four sections of the original file, .text, .rdata, .data, and .rsrc, appear identically in the UPX0, UPX1, and .rsrc sections of the restored file. That is, even if the name and location of each section are different, the same bytecodes exist in the restored file. However, the restored file contains padding and garbage values, and some bytecodes may be at locations different from those in the original file. Thus, in order to compare the original and restored files, we need to search for where the original bytecodes are located in the restored file. Since manually comparing the bytecodes of the original and restored files takes a very long time, in this paper we propose an efficient search algorithm using a hash function. We use the well-known MD5 [38] as the hash function of the search algorithm. As a message-digest algorithm, MD5 is widely used for checking the integrity of long data.
Algorithm 3 shows the restoration accuracy calculation algorithm. Its basic principle is to check whether each section of the original file exists in the restored file. We use the MD5 hash function to speed up the scan. The inputs to the algorithm are an original file A and a restored file B, and the output is restorationAccuracy, a list storing the restoration accuracy values of all sections. First, we remove garbage and padding values from the restored file (Line 1). For each section Ai of A, we obtain the length curLen of the section and calculate a hash value H1 by applying the MD5 hash function (Lines 4 to 7). We then calculate another hash value H2 from a part of the restored file B of length curLen (Line 9). If H1 equals H2 (Line 10), B contains Ai, so we stop the comparison and calculate the restoration accuracy of the section (Line 14). However, if H1 and H2 are not equal, we perform the hash value comparison again after decrementing the length curLen of Ai (Lines 6 to 13). We repeat this process to calculate the restoration accuracy for the section Ai (Line 14). After completing the accuracy calculation for all sections, the algorithm returns the list restorationAccuracy (Line 16).
5. Performance Evaluation
5.1. Experimental Environment. In this section, we present the results of an experimental evaluation to show the superiority of the all-in-one unpacking system. We conduct the experiments on a virtual server as well as on a physical server. The reasons for experimenting in two server environments are as follows. First, unpacking results may differ between the virtual server and the physical server if a packed file has anti-unpacking features such as Anti-VM or Anti-Debug. Second, the dynamic unpacking of malware
Input: A PE file
Output: OEP // the start address of the unpacked original file
begin
(1) Measure the entropy value of each section
(2) while the process has not reached the end do
(3)     IP ← the current instruction pointer
(4)     if IP's address > current section's last address then return Not-Found
(5)     else if IP is JMP or CJMP or RETN then
(6)         Execute instructions until IP
(7)         DstAddr ← the next instruction address
(8)         if IP ∉ IP-History and DstAddr ∉ DstAddr-History then
(9)             Store IP and DstAddr into each history
(10)            if DstAddr is out of the file then return Not-Found
(11)            else
(12)                Re-calculate the entropy value of each section
(13)                if Entropy change is stable and DstAddr is in the different section then
(14)                    return DstAddr
(15)                end-if
(16)            end-if
(17)        end-if
(18) end-while
(19) return Not-Found
end
ALGORITHM 2: Dynamic unpacking algorithm.
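The control flow of Algorithm 2 can be exercised without a debugger by replaying a recorded trace. The sketch below is a simulation under several assumptions, not the authors' Immunity Debugger script: a hypothetical tracer is assumed to supply, for every executed instruction, its address, a mnemonic class, the branch destination, and the per-section entropy measured at branch instructions. "Stable" is interpreted as a change of at most `tolerance` in every section, and Line 4 is interpreted as execution running past the end of the previous instruction's section without a branch. All names are illustrative.

```python
from typing import Dict, Iterable, Optional, Tuple

BRANCHES = {"JMP", "CJMP", "RETN"}
Sections = Dict[str, Tuple[int, int]]  # section name -> (first address, last address)
# One trace event: (IP, mnemonic, branch destination or None, entropy per section or None)
Event = Tuple[int, str, Optional[int], Optional[Dict[str, float]]]


def section_of(addr: int, sections: Sections) -> Optional[str]:
    """Return the name of the section that contains addr, or None."""
    for name, (first, last) in sections.items():
        if first <= addr <= last:
            return name
    return None


def find_oep(trace: Iterable[Event], sections: Sections,
             initial_entropy: Dict[str, float], tolerance: float = 0.01) -> Optional[int]:
    """Replay a trace and return the OEP, or None for Not-Found."""
    prev_entropy = initial_entropy              # Line 1
    ip_history, dst_history = set(), set()
    prev_section, prev_was_branch = None, False
    for ip, mnemonic, dst, entropy in trace:    # Lines 2 and 3
        # Line 4 (as interpreted here): ran past the section end without a branch
        if prev_section is not None and not prev_was_branch \
                and ip > sections[prev_section][1]:
            return None
        here = section_of(ip, sections)
        if here is None:
            return None
        prev_section, prev_was_branch = here, mnemonic in BRANCHES
        if mnemonic not in BRANCHES:            # Line 5
            continue
        if ip in ip_history or dst in dst_history:
            continue                            # Line 8: repeated loop iteration
        ip_history.add(ip)                      # Line 9
        dst_history.add(dst)
        target = section_of(dst, sections)
        if target is None:
            return None                         # Line 10
        stable = all(abs(entropy[s] - prev_entropy[s]) <= tolerance
                     for s in sections)         # Lines 12 and 13
        prev_entropy = entropy
        if stable and target != here:
            return dst                          # Line 14
    return None                                 # Line 19


# Hypothetical two-section layout and trace, for illustration only.
sections = {"UPX0": (0x1000, 0x1FFF), "UPX1": (0x2000, 0x2FFF)}
initial = {"UPX0": 0.0, "UPX1": 7.8}
trace = [
    (0x2000, "MOV", None, None),
    (0x2010, "CJMP", 0x2004, {"UPX0": 3.1, "UPX1": 7.8}),  # still unpacking
    (0x2004, "MOV", None, None),
    (0x2010, "CJMP", 0x2004, {"UPX0": 5.0, "UPX1": 7.8}),  # loop iteration, skipped
    (0x2020, "JMP", 0x2030, {"UPX0": 6.2, "UPX1": 7.8}),   # entropy still changing
    (0x2030, "MOV", None, None),
    (0x2040, "JMP", 0x1234, {"UPX0": 6.2, "UPX1": 7.8}),   # stable, jumps to UPX0
]
oep = find_oep(trace, sections, initial)
print(hex(oep) if oep is not None else "Not-Found")
```

The two history sets implement the reuse described in Section 4.2: a branch that has already been seen inside a loop does not trigger another entropy comparison.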
files may infect the physical server. Thus, we perform unpacking first on the virtual server and then perform the experiment on the physical server only when there is no abnormality. The hardware specifications of the experimental platform are as follows. The virtual server is an Intel Core i7-6700 3.40 GHz with 8 GB RAM, and the physical server is an Intel Core i7-6700 3.40 GHz with 32 GB RAM. The target operating system is 32-bit Windows 7, and we use PEVIEW to view the structure and content of 32-bit PE files. As the debugger, we use Immunity Debugger 1.85 [25] to implement the dynamic unpacking phase of Algorithm 2. To implement the all-in-one unpacking system, including the
[Figure 3 (hex dump listings): panel (a) lists sample bytecodes from the .text, .rdata, .data, and .rsrc sections of the original file; panel (b) lists the corresponding bytecodes found in the UPX0, UPX1, UPX1, and .rsrc sections of the restored file.]
Figure 3: Comparison of sample bytecodes of the original and restored files. (a) Sections of the original file. (b) Sections of the restored file.
Input: A: an original file; B: a restored file
Output: restorationAccuracy[1..n]: a list of restoration accuracy values for n sections
begin
(1) Remove garbage and padding values from B
(2) A ← {A1, A2, ..., An}, where Ai is the i-th section of A
(3) for each section Ai ∈ A do
(4)     totalLen ← Ai.length
(5)     curLen ← totalLen
(6)     while curLen > 0 do
(7)         H1 ← hash(Ai[1 : curLen]) // hash a part of section Ai
(8)         for j ← 1 to (B.length − curLen + 1) do
(9)             H2 ← hash(B[j : j + curLen − 1]) // hash a part of B
(10)            if H1 = H2 then break
(11)            else curLen ← curLen − 1
(12)        end-for
(13)    end-while
(14)    restorationAccuracy[i] ← (curLen/totalLen) × 100
(15) end-for
(16) return restorationAccuracy
end
ALGORITHM 3: Restoration rate calculation algorithm.
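A compact way to read Algorithm 3 is as a search for the longest prefix of each original section that occurs somewhere in the restored file, with MD5 digests used for the comparison. The sketch below follows that reading, which is one interpretation of the pseudocode and the accompanying text: the length is decremented only after every position in B has been tried, and the search stops at the first match. It assumes garbage and padding have already been removed from the restored bytes (Line 1), and the function names are hypothetical.

```python
import hashlib
from typing import List


def md5_digest(data: bytes) -> bytes:
    return hashlib.md5(data).digest()


def longest_prefix_found(section: bytes, restored: bytes) -> int:
    """Length of the longest prefix of `section` that occurs in `restored`."""
    for cur_len in range(len(section), 0, -1):               # Lines 6 and 11
        h1 = md5_digest(section[:cur_len])                   # Line 7
        for j in range(len(restored) - cur_len + 1):         # Line 8
            if md5_digest(restored[j:j + cur_len]) == h1:    # Lines 9 and 10
                return cur_len
    return 0


def restoration_accuracy(original_sections: List[bytes], restored: bytes) -> List[float]:
    """Per-section restoration accuracy in percent (0.0 for an empty section)."""
    return [100.0 * longest_prefix_found(sec, restored) / len(sec) if sec else 0.0
            for sec in original_sections]                    # Line 14


# Toy byte strings standing in for two sections and a restored file.
sections_a = [b"ABCDEFGH", b"12345678"]
restored_b = b"\x00\x00ABCDEFGH\xff1234XXXX"
print(restoration_accuracy(sections_a, restored_b))
```

This brute-force form hashes many overlapping windows and is only practical for small inputs; it is meant to show the comparison logic, not an efficient scan.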
dynamic unpacking algorithm, we run Python scripts on top of Immunity Debugger.
In the experiment, we use a total of 130 PE files, where 90 files are provided by Microsoft [37] and 40 files are randomly collected from the putty [39] and MD5 [40] sites. We pack each of the 130 executable files with each of 19 packers and construct a dataset consisting of a total of 2,600 files: 2,470 packed files and the original 130 files. The 19 packers used are UPX, ASPack, NSPack, MPRESS, Yoda's Protector, MEW, Packman, RLPack, BeRoEXEPacker, PECompact, Petite, JDpack, Molebox, eXpressor, Yoda's Crypter, FSG, exe32pack, WinUpack, and Neolite. The reason for constructing the dataset by packing normal PE files ourselves is that we can quantitatively measure the packing detection accuracy, the false-positive rate, and the false-negative rate only if we know in advance which experimental files are packed.
In this paper, we conduct three experiments. The first experiment concerns the integration feature of the all-in-one system and verifies that the system supports all three phases of packing detection, unpacking, and verification in an integrated framework. The second experiment concerns the detection phase, where we first set Packing-Range, an entropy range for judging packing, and then evaluate the packing detection accuracy using the range. The third experiment concerns the unpacking and verification phases, where we verify whether the system unpacks the PE file correctly by calculating the restoration accuracy.
5.2. Experimental Results on Overall Working Mechanism. In this section, we confirm through experiments that the integration mechanism of the all-in-one unpacking system works correctly. Figure 4 shows operation screenshots of each phase of the all-in-one system. First, Figure 4(a) shows an example screenshot of the detection phase. To confirm that all four cases of packing detection work correctly, we continue executing the remaining cases even if packing has already been detected by a previous case. In Figure 4(a), we first examine DP-NEP of Algorithm 1. We find the EP section in DP-NEP, so we proceed to DP-SIG (the first two lines). In the DP-SIG test, we detect the UPX packer, so we conclude that the file is packed (the next two lines). Even though detection is complete, we proceed to the DP-ENT and DP-WR tests. The entropy value is 7.93, which is contained in Packing-Range, and the attribute value of the EP section is "0xe0000040", which includes the WRITE attribute. Thus, we conclude that the file is also packed according to the DP-ENT and DP-WR tests (the last three lines). From Figure 4(a), we can confirm that the DP-NEP, DP-SIG, DP-ENT, and DP-WR tests of the detection phase all work correctly.
Figures 4(b) and 4(c) show example screenshots of the unpacking phase. If an unpacking library exists for the packer found in the detection phase, we perform static unpacking, as shown in Figure 4(b); if not, we perform dynamic unpacking, as shown in Figure 4(c). Figure 4(b) shows the result of static unpacking by the UPX commercial tool [33] for a UPX-packed file. In the figure, the file size increases from 285,760 bytes to 731,200 bytes by unpacking, which means that the compression ratio is 39.08%. Figure 4(c) shows the result of the entropy-based dynamic unpacking. In the figure, the system calculates the entropy of each section when the IP encounters JMP and CJMP instructions, and it returns the current DstAddr as the OEP if the entropy is stable and DstAddr is in a different section.
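The compression ratio quoted for Figure 4(b) follows directly from the two file sizes, taking the ratio as packed size over unpacked size. The short check below only reproduces that arithmetic; the variable names are illustrative.

```python
packed_size = 285_760    # bytes, UPX-packed file in Figure 4(b)
unpacked_size = 731_200  # bytes, after static unpacking

# Compression ratio as packed size relative to unpacked size, in percent
ratio = 100 * packed_size / unpacked_size
print(f"{ratio:.2f}%")
```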
Figure 4(d) shows an example screenshot of the verification phase. In this example, we measure the restoration accuracy of the restored file for the .text and .data sections of the original file. The length of the original .text section is 18,439, and the length of the unpacked file is 872,555. In the hash value comparison, since the portion corresponding to the original .text section exists in the unpacked file, the restoration accuracy is 100%. Similarly, we calculate the restoration accuracy of the .data section as 100%.
5.3. Experimental Results on Detection Phase. In this section, we describe two experimental results for the detection phase. The first experiment sets Packing-Range, and the second measures the detection accuracy of the proposed detection algorithm. In the first experiment, we experimentally determine Packing-Range used in DP-ENT of Algorithm 1. The previous study by Lyda and Hamrock [17] uses the entropy value of the whole file and sets the entropy range of the packed file to [6.677, 6.926]. In contrast, Han and Lee [16] focus on the EP section only and compute the entropy value for the EP section instead of the whole file. Through experiments, they set the entropy threshold to 6.85 and judge a file packed if its entropy is over 6.85.
We also determine the entropy range Packing-Range through experiments on various packers. Figure 5 shows the entropy distribution of the EP section for different packers. Figure 5(a) shows the average distribution of the original and 19 packed files. Figures 5(b)–5(m) show the more detailed entropy distribution for each packer. According to Figures 5(b)–5(m), the entropy distribution of the packed file differs depending on the packer type. In particular, as shown in Figures 5(d), 5(e), and 5(k), some packed files have smaller entropy values than the original file, and thus we cannot simply set an upper threshold value as the packing criterion as in the previous study [16]. However, we note that while the entropy of packed files is distributed over a wide range, as shown in Figures 5(c)–5(m), the entropy of original files is distributed only in a specific range, as shown in Figure 5(b). More specifically, the entropy value of the normal file is between 6.00 and 6.85, as shown in Figure 5(b). Based on these experimental results, we thus set Packing-Range to [0, 6.00) or [6.85, ∞] for the EP section. According to [9, 29], the entropy method has the disadvantages of low accuracy and easy evasion. To overcome this, we use three other detection techniques, DP-NEP, DP-SIG, and DP-WR, together with DP-ENT in a hybrid manner. Using this hybrid approach, we can further improve the detection accuracy.
In the second experiment, we evaluate the detection accuracy and the false-negative/false-positive rates. Table 5 compares the three detection techniques of the detection phase with the proposed hybrid approach. Since EP sections
Figure 4: Operation screenshots of the detection, unpacking, and verification phases. (a) Detection phase. (b) Unpacking phase (static unpacking). (c) Unpacking phase (dynamic unpacking). (d) Verification phase.
[Figure 5 (thirteen histograms): x-axis, entropy of EP section (0 to 8 in steps of 0.5); y-axis, number of files. Panel (a) overlays the not-packed files of panel (b) with the summation of panels (c) to (m).]
Figure 5: Entropy distributions of the EP section for original and representative packed files. (a) Average of original and 19 packers. (b) Original (not packed). (c) UPX. (d) ASPack. (e) NSPack. (f) MPRESS. (g) Yoda's Protector. (h) RLPack. (i) BeroEXE. (j) MEW. (k) PACKMAN. (l) WinUpack. (m) exe32pack.
are mandatory in all PE files, we can detect EP sections unless they are intentionally concealed. All files of the dataset used in the experiment also have EP sections, so the detection rate by DP-NEP is trivially 0%. In other words, no packing is detected through the absence of the EP section, so we exclude the experimental result of DP-NEP from Table 5.
We now explain the results of Table 5 column by column. First, we explain the DP-SIG column. For DP-SIG, Not-Packed shows a detection rate of 0% since no signature exists in normal files. That is, no packing is detected in normal files, which means that DP-SIG performs accurate detection for Not-Packed files. On the other hand, for the packed files, DP-SIG detects 100% of the files for all packers except NSPack. The detection rate for NSPack is merely 5.38% because NSPack signatures often exist in sections other than the EP section. In summary, DP-SIG shows a detection accuracy of 95.0% on average.
Second, for the experiment on the detection rate of DP-WR, we intentionally remove the WRITE attribute from 39 files (30% of the packed files) per packer. We perform this removal for the 18 packers other than exe32pack. The reason for changing the dataset is that normally packed files usually have the WRITE attribute, so we arbitrarily remove it from 30% of the files for the DP-WR experiment. Looking at the DP-WR column in Table 5, DP-WR detects only 70% of the files per packer for the 18 packers, since we intentionally removed the WRITE attribute from 30% of the files. In the case of exe32pack, however, we need not perform the removal since it has no WRITE attribute in the PE header. Accordingly, as shown in the DP-WR column of Table 5, the detection rate of exe32pack by DP-WR is trivially 0%. In the case of Not-Packed, the detection rate is 0% since normal files have no WRITE attribute. In summary, checking the presence of the WRITE attribute is very simple, and DP-WR works correctly if the packed file has the WRITE attribute, but it does not work well if the WRITE attribute does not exist or has been maliciously modified.
Third, the DP-ENT column shows the entropy-based detection rates. As shown in the table, DP-ENT achieves a 100% detection rate for 9 packers, including NSPack, UPX, MEW, and BeRoEXEPacker, but it incurs some false negatives for 10 packers, including ASPack, Yoda's Protector, Packman, and MPRESS, where the detection rate is lower than 100%. In particular, for MPRESS, Petite, JDpack, eXpressor, and Neolite, it shows a low detection rate of 24.6% or less because the EP section entropy produced by these packers falls within the entropy distribution of normal files. DP-ENT also has a false-positive rate of 3.84% for Not-Packed. In summary, DP-ENT works very accurately for most packers, but it may incur false negatives for a few specific packers or false positives for some normal files. To overcome this limitation, we use DP-ENT together with other detection techniques. We can see the improvement in the experimental results of Tables 5 and 6. In the tables, when using only DP-ENT, the detection accuracy is low for some packers (especially MPRESS), but when using it together with the other three techniques, the accuracy is close to 100%.
Fourth, the proposed hybrid approach, which integrates the four techniques, detects 100% of the files for all packers except NSPack and shows no false positives. The reason for this high detection rate is that the proposed method has all the advantages of DP-SIG, DP-WR, and DP-ENT as well as DP-NEP. In the case of NSPack, however, the detection rate is merely 70.0%; this is because, as mentioned above, we forcibly remove the WRITE attribute from 30% of the files. According to the results of Table 5, the detection rate differs for each detection technique, and some techniques
Table 5: Comparison of packing detection rates of three techniques and the proposed hybrid approach.
Packer              DP-SIG (%)   DP-WR (%)   DP-ENT (%)   Hybrid approach (%)
ASPack              100          70.0        91.5         100
NSPack              5.38         70.0        100          70.0
MPRESS              100          70.0        24.6         100
UPX                 100          70.0        100          100
Yoda's Protector    100          70.0        96.9         100
MEW                 100          70.0        100          100
BeRoEXEPacker       100          70.0        100          100
Packman             100          70.0        93.8         100
RLPack              100          70.0        99.3         100
PECompact           100          70.0        100          100
Petite              100          70.0        0.00         100
JDpack              100          70.0        5.62         100
Molebox             100          70.0        100          100
eXpressor           100          70.0        0.00         100
Yoda's Crypter      100          70.0        91.5         100
FSG                 100          70.0        100          100
exe32pack           100          0.00        100          100
WinUpack            100          70.0        100          100
Neolite             100          70.0        1.61         100
Average             95.0         66.3        73.9         98.4
Not-Packed          0.00         0.00        3.84         0.00
12 Security and Communication Networks
incur false positives or false negatives on the other hand theproposed hybrid shows the highest detection rate of 984and no false positives
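As a quick arithmetic check of the Average row of Table 5, the DP-WR and hybrid averages can be reproduced from the per-packer rates. This is only a sketch: it assumes that the Average row is the unweighted mean over the 19 packers, which the text does not state explicitly, and the variable names are illustrative.

```python
# Per-packer detection rates (%) as listed in Table 5 (19 packers).
# Assumption: the "Average" row is an unweighted mean over packers.
dp_wr = [70.0] * 18 + [0.0]      # only exe32pack is at 0.00
hybrid = [100.0] * 18 + [70.0]   # only NSPack is below 100


def mean(values):
    """Arithmetic mean of a non-empty list of rates."""
    return sum(values) / len(values)


dp_wr_avg = round(mean(dp_wr), 1)    # (18 * 70.0) / 19
hybrid_avg = round(mean(hybrid), 1)  # (18 * 100 + 70.0) / 19
print(dp_wr_avg, hybrid_avg)
```

Under this assumption the two means round to 66.3 and 98.4, which agrees with the Average row.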
Table 6 shows the detection accuracy of combinations of two or three detection techniques and of the proposed hybrid approach. First, DP-SIG+ENT, a combination of the signature and entropy tests, detects 100% of the packed files but detects 18 files incorrectly, showing a false-positive rate of 0.73%. Second, DP-ENT+WR, a combination of the entropy and WRITE tests, shows a low detection rate of 92.0% because the entropy value of MPRESS lies within the distribution of normal files. Third, DP-SIG+ENT+WR, a combination of the signature, entropy, and WRITE tests, and our hybrid approach both show a 98.4% detection accuracy and a 1.60% false-negative rate. This is because DP-SIG+ENT+WR and the hybrid approach go through the same detection procedure except for DP-NEP. The false-negative rate of 1.60% is due to the files having no WRITE attribute. According to the results of Tables 5 and 6, we believe that the proposed hybrid approach achieves high detection accuracy by combining the advantages of the existing and new detection techniques.
5.4. Experimental Results on Unpacking and Verification Phases. In this section, we evaluate the unpacking and verification phases by performing the unpacking of Algorithm 2 and calculating the restoration accuracy of the actually restored files with Algorithm 3. Because the unpacking and verification phases take a lot of time, we experiment with 40 randomly selected files from the dataset. The unpacking phase consists of static unpacking and dynamic unpacking. In the experiment, we exclude static unpacking, since it uses the existing library as it is, and focus only on the dynamic unpacking of Algorithm 2, since it tries to restore the original file without any information about the packers.
Table 7 shows the restoration accuracy of the proposed method for the text and data sections for each of the 19 packers. We cannot directly unpack Yoda's Protector and PECompact, since they use Anti-Debug engineering techniques. To overcome this problem, we unpack these packers by bypassing three Anti-Debugging API calls: GetCurrentProcessID(), BlockInput(), and IsDebuggerPresent() [41]. (That is, we run Algorithm 2 with the bypassing techniques for Yoda's Protector and PECompact and include the experimental results in Table 7.) In general, only the text and data sections of the original file are packed [42], so in the experiment we consider only the restoration accuracy of the text and data sections of the restored file. In Table 7, among the 19 packers, 11 packers, including UPX, NSPack, MEW, RLPack, and BeRoEXEPacker, show 100% restoration accuracy for both the text and data sections. On the other hand, 8 packers, including ASPack, Packman, and MPRESS, show less than 100% restoration accuracy because of the manner in which the packers handle the reloc section. More specifically, if there is a reloc section in the original file, the unpacking relocates the text and data sections, changing all the data values in each section. Therefore, the restoration accuracy depends on whether there is a reloc section in the original file.
Table 8 shows the restoration results in detail according to the presence of the reloc section. As shown in the table, the restoration accuracy is 100% in the absence of the reloc section, while in the presence of the reloc section it ranges from 0.00% to 99.8% and is very low for most packers. However, the low restoration accuracy is due to the nature of the packer, and it does not mean that unpacking does not work correctly. In other words, even if the unpacking is successful, the data values may change because of the packer's characteristics, and the restoration accuracy is then calculated to be low. In summary, we can use the proposed verification algorithm to calculate the restoration accuracy of unpacked files, which quantitatively measures the accuracy of the actual restoration.
In addition to the text, data, and reloc sections, we further consider the PE header, which includes the IAT information. In general, the IAT is located in the header of the PE file, and if we actually need to run the unpacked file, we must also restore this header information, including the IAT. In fact, the studies in [3, 43] restore the header including the IAT and produce the actual whole executable file. On the other hand, most studies, including this paper, do not restore the actual whole executable file itself but restore the text, data, and reloc sections, which are the major parts of the executable file. This is because the purpose of unpacking is not to execute the restored file but to identify and extract the malicious code included in it. Therefore, in this paper, we do not need to restore the header required for actual execution, and accordingly a comparison of IAT information is also not necessary. Instead, we restore and compare the text, data, and reloc sections, which are more likely to hide malicious code. Restoring the PE header is beyond the scope of this study; we leave it as a separate research topic.
Table 6: Comparison of packing detection accuracy of two or three combinations and of the hybrid approach.

Metrics               DP-SIG+ENT (%)   DP-ENT+WR (%)   DP-SIG+ENT+WR (%)   Hybrid approach (%)
Detection accuracy    95.0             92.0            98.4                98.4
False-negative rate   5.00             8.00            1.60                1.60
False-positive rate   0.73             0.00            0.00                0.00

Table 7: Restoration accuracy of text and data sections of each packer.

Packer             text (%)   data (%)
UPX                100        100
NSPack             100        100
MEW                100        100
RLPack             100        100
BeRoEXE            100        100
Neolite            100        100
FSG                100        100
eXpressor          100        100
Molebox            100        100
Petite             100        100
JDpack             100        100
ASPack             84.3       84.3
Yoda's protector   84.0       84.0
exe32pack          80.8       100
MPRESS             78.8       75.3
Yoda's crypter     74.5       74.5
PECompact          74.0       74.0
WinUpack           68.0       68.0
Packman            55.0       48.5
Average            89.2       90.2

6. Conclusions

In this paper, we analyzed recent studies of packing detection and unpacking techniques and derived four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. We then proposed an all-in-one unpacking system to address these four problems. First, no phase integration meant that analysts had to perform the packing detection, unpacking, and verification phases separately, and we solved this problem by presenting an all-in-one unpacking system that integrates these three phases in a single framework. Second, no detection combination meant that there had been no attempt to combine various packing detection methods, and we solved this problem by presenting a hybrid approach that combines existing and new packing detection methods. Third, no real-restoration meant that there had been little discussion about restoring the actual executable file, and we solved this problem by actually restoring an original image of the packed file. Fourth, no unpacking verification meant that there had been no explicit work to measure the restoration accuracy of unpacked files, and we solved this problem by proposing a verification algorithm that quantitatively evaluates the restoration accuracy.
We constructed a dataset of 2600 PE files and experimentally evaluated the proposed system using the dataset. First, we performed all the phases of the all-in-one unpacking system sequentially and confirmed that its overall mechanism worked well. Next, through a comparison of the hybrid packing detection approach with existing detection techniques, we showed that the detection accuracy of the proposed method was the highest, at 98.4% on average, with no false positives. For this, we also determined an entropy range of packing, Packing-Range, through extensive experiments on 19 representative packers. Finally, in the unpacking and verification phases, we confirmed that unpacking was successful for all files except the files packed with Yoda's Protector. We also showed that the restoration accuracy of unpacked files was nearly 100% except for a few packers.
As future research, we will focus on unpacking files packed with Anti-VM, Anti-Debug, and Anti-Reverse engineering techniques. We will also study how to compute the restoration accuracy of files having reloc sections.
Data Availability
All the data files used in the experiments are available at https://github.com/chesvectain/PackingData.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partly supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (no. 2017-0-00158, Development of Cyber Threat Intelligence (CTI) analysis and information sharing technology for national cyber incident response) and the Korea Electric Power Corporation (no. R18XA05).
References
[1] M. Morgenstern, "AV-TEST Security report 2017/18," 2019, https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2017-2018.pdf.
[2] T. Ban, R. Isawa, S. Guo, D. Inoue, and K. Nakao, "Efficient malware packer identification using support vector machines with spectrum kernel," in Proceedings of the Eighth Asia Joint Conference on Information Security, pp. 69–76, Seoul, South Korea, July 2013.
[3] B. Cheng, M. Jiang, J. Fu et al., "Towards paving the way for large-scale Windows malware analysis: generic binary unpacking with orders-of-magnitude performance boost," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 395–411, Toronto, Canada, October 2018.
[4] A. Gupta and M. S. Arya, "Hashing based encryption and anti-debugger support for packing multiple files into single executable," International Journal of Advanced Research in Computer Science, vol. 9, no. 1, pp. 914–920, 2018.
[5] Y. Kawakoya, M. Iwamura, and M. Itoh, "Memory behavior-based automatic malware unpacking in stealth debugging environment," in Proceedings of the 5th International Conference on Malicious and Unwanted Software, pp. 39–46, Lorraine, France, October 2010.
[6] G. Liang, J. Pang, Z. Shan, R. Yang, and Y. Chen, "Automatic benchmark generation framework for malware detection," Security and Communication Networks, vol. 2018, Article ID 4947695, 8 pages, 2018.
[7] A. Pektas and T. Acarman, "A dynamic malware analyzer against virtual machine aware malicious software," Security and Communication Networks, vol. 7, no. 12, pp. 2245–2257, 2014.
[8] I. Santos, J. Nieves, and P. G. Bringas, "Semi-supervised learning for unknown malware detection," in International Symposium on Distributed Computing and Artificial Intelligence, A. Abraham, J. M. Corchado, S. R. Gonzalez, and J. F. De Paz Santana, Eds., pp. 415–422, Springer, Berlin, Germany, 2011.
[9] X. Ugarte-Pedrero, I. Santos, B. Sanz, C. Laorden, and P. G. Bringas, "Countering entropy measure attacks on packed software detection," in Proceedings of the 9th International Conference on Consumer Communications and Networking, pp. 164–168, Las Vegas, NV, USA, January 2012.
[10] H. Won, S.-P. Kim, S. Lee, M.-J. Choi, and Y.-S. Moon, "Secure principal component analysis in multiple distributed nodes," Security and Communication Networks, vol. 9, no. 14, pp. 2348–2358, 2016.
Table 8: Restoration accuracy according to the presence of the reloc section.

Packer             text, with reloc (%)   text, without reloc (%)   data, with reloc (%)   data, without reloc (%)
ASPack             0.00                   100                       0.00                   100
Packman            99.8                   100                       88.1                   100
MPRESS             0.00                   100                       0.00                   100
Yoda's crypter     0.00                   100                       0.04                   100
PECompact          0.00                   100                       0.00                   100
Yoda's Protector   0.00                   100                       0.05                   100
exe32pack          0.00                   100                       100                    100
WinUpack           0.00                   100                       0.00                   100

[11] S. Cesare, Y. Xiang, and W. Zhou, "Malwise–an effective and efficient classification system for packed and polymorphic malware," IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193–1206, 2013.
[12] C. Malin, E. Casey, and J. Aquilina, Malware Forensics: Investigating and Analyzing Malicious Code, Syngress, Burlington, MA, USA, 2008.
[13] W. Yan, Z. Zheng, and A. Nirwan, "Revealing packed malware," IEEE Security and Privacy, vol. 6, no. 5, pp. 65–69, 2008.
[14] M. Bat-Erdene, T. Kim, H. Park, and H. Lee, "Packer detection for multi-layer executables using entropy analysis," Entropy, vol. 19, no. 3, p. 125, 2017.
[15] Y.-S. Choi, I.-K. Kim, J.-T. Oh, and J.-C. Ryou, "PE file header analysis-based packed PE file detection technique (PHAD)," in Proceedings of International Symposium on Computer Science and Its Applications, pp. 28–31, Hobart, Australia, October 2008.
[16] S. Han and S. Lee, "Packed PE file detection for malware forensics," The KIPS Transactions: PartC, vol. 16C, no. 5, pp. 555–562, 2009, in Korean.
[17] R. Lyda and J. Hamrock, "Using entropy analysis to find encrypted and packed malware," IEEE Security and Privacy, vol. 5, no. 2, pp. 40–45, 2007.
[18] R. Perdisci, A. Lanzi, and W. Lee, "Classification of packed executables for accurate computer virus detection," Pattern Recognition Letters, vol. 29, no. 14, pp. 1941–1946, 2008.
[19] A. Rohit, A. Singh, H. Pareek, and U. Edara, "A heuristics-based static analysis approach for detecting packed PE binaries," International Journal of Security and Its Applications, vol. 7, no. 5, pp. 257–268, 2013.
[20] G. Amato, "PEframe," 2019, https://github.com/guelfoweb/peframe.
[21] G. Jeong, E. Choo, J. Lee, M. Bat-Erdene, and H. Lee, "Generic unpacking using entropy analysis," Journal of Advanced Information Technology and Convergence, vol. 7, no. 1, pp. 232–238, 2009.
[22] V. DeMarines, "Obfuscation–how to do it and how to crack it," Network Security, vol. 2008, no. 7, pp. 4–7, 2008.
[23] H. Neil, "PEiD," 2019, https://github.com/wolfram77web/app-peid.
[24] O. Yuschuk, "OllyDbg," 2019, http://www.ollydbg.de/download.html.
[25] N. Waisman, "Immunity Debugger," 2019, https://www.immunityinc.com/products/debugger/index.html.
[26] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee, "PolyUnpack: automating the hidden-code extraction of unpack-executing malware," in Proceedings of the 22nd Annual Computer Security Applications Conference, pp. 289–300, Miami Beach, FL, USA, December 2006.
[27] M. Bat-Erdene, Y. Park, H. Li, H. Lee, and M.-S. Choi, "Entropy analysis to classify unknown packing algorithms for malware detection," International Journal of Information Security, vol. 16, no. 3, pp. 227–248, 2017.
[28] S. Cesare and X. Yang, "Classification of malware using structured control flow," in Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing, pp. 61–70, Brisbane, Australia, January 2010.
[29] L. Martignoni, M. Christodorescu, and S. Jha, "OmniUnpack: fast, generic, and safe unpacking of malware," in Proceedings of the 23rd Annual Computer Security Applications Conference, pp. 431–441, Miami Beach, FL, USA, December 2007.
[30] M. G. Kang, P. Poosankam, and H. Yin, "Renovo: a hidden code extractor for packed executables," in Proceedings of the 5th ACM Workshop on Recurring Malcode, pp. 46–53, Alexandria, VA, USA, November 2007.
[31] S. Mariani, L. Fontana, F. Gritti, and S. D'Alessio, "PinDemonium: a DBI-based generic unpacker for windows executables," in Proceedings of the Black Hat 2016, Las Vegas, NV, USA, July 2016.
[32] G. Bonfante, J. Fernandez, J.-Y. Marion, B. Rouxel, F. Sabatier, and A. Thierry, "CoDisasm: medium scale concatic disassembly of self-modifying binaries with overlapping instructions," in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 745–756, Denver, CO, USA, October 2016.
[33] M. Oberhumer, L. Molnar, and J. Reiser, "UPX," 2019, https://upx.github.io.
[34] A. Swinnen and A. Mesbahi, "One packer to rule them all: empirical identification, comparison and circumvention of current antivirus detection techniques," in Proceedings of the Black Hat 2014, Las Vegas, NV, USA, August 2014.
[35] P. Bania, "Generic unpacking of self-modifying, aggressive, packed binary programs," 2009, https://arxiv.org/abs/0905.4581.
[36] H. Neil, "BobSoft Signature," 2019, http://woodmann.com/BobSoft/Files/Other/UserDB.zip.
[37] M. Russinovich, "Microsoft Sysinternals Suite," 2019, https://download.sysinternals.com/files/SysinternalsSuite.zip.
[38] R. Rivest, "The MD5 message-digest algorithm," RFC 1321, IETF, Fremont, CA, USA, 1992.
[39] S. Tatham, "Putty," 2019, https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html.
[40] L. Pascoe, "MD5," 2019, http://www.md5summer.org/about.html.
[41] P. Ferrie, "Anti-unpacker tricks - part one," 2019, https://www.virusbulletin.com/virusbulletin/2008/12/anti-unpacker-tricks-part-one.
[42] T.-Y. Wang and C.-H. Wu, "Detection of packed executables using support vector machines," in Proceedings of the 10th International Conference on Machine Learning and Cybernetics, pp. 717–722, Guilin, China, July 2011.
[43] D. Korczynski, "RePEconstruct: reconstructing binaries with self-modifying code and Import address table destruction," in Proceedings of the 11th International Conference on Malicious and Unwanted Software, pp. 31–38, Fajardo, Puerto Rico, October 2016.
especially for Unpack and PESpin packers. In the case of DP-NEP, however, we cannot find the EP section itself, and accordingly we cannot proceed to the next phases of unpacking and verification. More precisely, unpacking such DP-NEP packers might be possible if we manually modified the raw hexadecimal codes by using a binary editor. However, this manual raw data modification is beyond the scope of the paper, and we do not consider it. In summary, we use the concept of "No EP Section" in the detection phase of Algorithm 1 to increase the packing detection rate, but we do not consider it in the unpacking and verification phases.
Second, in DP-SIG, we investigate whether the signature extracted from the EP section is in the signature database. We construct the signature database with a total of 7,000 signatures, consisting of 4,200 signatures provided by PEiD [23] and 2,800 signatures collected from BobSoft [36]. Figure 2 shows three sample signatures of the packers stored in the database. The first line of each entry gives the type and version of the packer and the maker of the signature; the second line gives the hexadecimal values of the packer's signature; and the third line, ep_only, is true or false, indicating whether the signature is to be found in the EP section.
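The signature test can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the function names, the treatment of "??" as a wildcard byte, and the reading of ep_only = false as "scan the whole buffer" are assumptions, and the sample entry simply reuses one signature of the kind shown in Figure 2.

```python
# One database entry in the three-line format of Figure 2 (illustrative).
SAMPLE_DB = """
[CodeCrypt v0.14b]
signature = E9 C5 02 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F
ep_only = true
"""


def parse_signatures(text):
    """Parse '[name]', 'signature = ..', 'ep_only = ..' entries into dicts."""
    entries, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            current = {"name": line[1:-1], "signature": [], "ep_only": False}
            entries.append(current)
        elif current is not None and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "signature":
                # Assumed convention: '??' is a wildcard byte, stored as None.
                current["signature"] = [
                    None if tok == "??" else int(tok, 16) for tok in value.split()
                ]
            elif key == "ep_only":
                current["ep_only"] = value.lower() == "true"
    return entries


def matches_at(data, offset, signature):
    """True if the signature matches the bytes of data starting at offset."""
    if offset < 0 or offset + len(signature) > len(data):
        return False
    return all(b is None or data[offset + i] == b for i, b in enumerate(signature))


def detect_by_signature(data, ep_offset, entries):
    """Return the name of the first matching packer signature, or None."""
    for entry in entries:
        sig = entry["signature"]
        if not sig:
            continue
        # ep_only entries are checked at the entry point only (assumption).
        offsets = [ep_offset] if entry["ep_only"] else range(len(data) - len(sig) + 1)
        if any(matches_at(data, off, sig) for off in offsets):
            return entry["name"]
    return None
```

Here `data` stands for the raw file bytes and `ep_offset` for the file offset of the entry point; both are hypothetical inputs that a PE parser would have to supply.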
Third, in DP-WR, we check the WRITE attribute of the EP section. In the process of unpacking a packed file, the system needs to modify the memory, so the packed file necessarily requires the WRITE attribute. Thus, we suspect that the file is packed if the WRITE attribute exists. However, since a normal file can also have the WRITE attribute, we cannot determine packing only by the existence of the WRITE attribute. Therefore, as proposed in [16], we detect packing using the WRITE attribute and the entropy value together.
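Locating the EP section and testing its WRITE attribute can be sketched as below. The `Section` record and the function names are hypothetical; the only fixed value is the PE section flag `IMAGE_SCN_MEM_WRITE` (0x80000000). Treating "No EP Section" as "the entry point lies in no section" is one plausible reading of DP-NEP, stated here as an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

IMAGE_SCN_MEM_WRITE = 0x80000000  # PE section flag: section is writable


@dataclass
class Section:
    """Hypothetical, simplified view of one PE section header."""
    name: str
    virtual_address: int   # RVA at which the section starts
    virtual_size: int      # size of the section in memory
    characteristics: int   # flag bits from the section header


def find_ep_section(sections: Sequence[Section], entry_point_rva: int) -> Optional[Section]:
    """Return the section containing the entry point, or None (DP-NEP case)."""
    for sec in sections:
        if sec.virtual_address <= entry_point_rva < sec.virtual_address + sec.virtual_size:
            return sec
    return None


def has_write_attribute(section: Section) -> bool:
    """True if the WRITE flag is set in the section characteristics."""
    return bool(section.characteristics & IMAGE_SCN_MEM_WRITE)
```

For example, a section with characteristics 0x60000020 (code, executable, readable) is not writable, whereas 0xE0000040 has the WRITE flag set.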
Fourth, in DP-ENT, we compute the entropy of an input PE file using equation (1) and check whether that entropy is in a specific range (Packing-Range in Algorithm 1). This is based on the previous observation [16, 17] that the entropy of a packed file differs from that of an ordinary file. We compute the entropy value for the EP section only, not the entire file. To use DP-ENT, we need to set the Packing-Range used for determining whether a file is packed. In this paper, we set this range through the experiment in Section 5.3.
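The entropy test and the way the four tests are combined in Algorithm 1 can be sketched as follows. The sketch assumes that equation (1) is the usual byte-level Shannon entropy (between 0 and 8 bits per byte); the function names are illustrative, and `packing_range` is a parameter to be supplied by the caller rather than the range determined in Section 5.3.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    n = len(data)
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())


def detect_packing(ep_section_bytes, signature_found, write_enabled, packing_range):
    """Decision cascade in the order of Algorithm 1.

    ep_section_bytes is None when no EP section exists (DP-NEP).
    packing_range is a (low, high) pair; its values are caller-supplied.
    """
    if ep_section_bytes is None:
        return "Packed"          # DP-NEP
    if signature_found:
        return "Packed"          # DP-SIG
    low, high = packing_range
    if write_enabled and low <= shannon_entropy(ep_section_bytes) <= high:
        return "Packed"          # DP-WR together with DP-ENT
    return "Not-Packed"
```

A call such as `detect_packing(section_bytes, False, True, (6.0, 8.0))` uses a purely illustrative range; the actual Packing-Range has to come from experiments on known packed and normal files.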
4.2. Unpacking Phase. As mentioned in Section 2.2, unpacking, which is performed automatically without human intervention, can be divided into static unpacking and dynamic unpacking. Static unpacking is based on the characteristics of the packing algorithm, so it is fast and carries no risk of infection because the packed file is not directly executed. In contrast, dynamic unpacking does not depend on a specific packing algorithm, although it takes a long analysis time. In this paper, we use both static and dynamic methods to maximize the unpacking accuracy. More specifically, if the packer type found in the detection phase provides an unpacking library, we use static unpacking; otherwise, we perform dynamic unpacking. Static unpacking is easily handled by using the unpacking library found for each packer, so in this section we focus on explaining how dynamic unpacking works in detail. Our dynamic unpacking method is also known as the "execute-after-write" method. More precisely, the proposed method first restores the original file into the memory by finding the OEP from the packed file and then analyzes (i.e., executes) the restored original file in the memory. In order to avoid infecting real machines, we perform the unpacking and validation phases in the debugging mode.

Figure 1: Overall working mechanism of the all-in-one unpacking system. Detection phase: (1) input PE file, (2) packing detection (Yes/No). Unpacking phase: (3) unpacking library (Yes/No), (4) static unpacking, (5) dynamic unpacking. Verification phase: (6) verification.

Table 3: Existing or proposed techniques used in each phase of the all-in-one unpacking system.

Analytical phase   Techniques used or proposed
Detection          (i) EP Section test, to be proposed in Section 4.1
                   (ii) Signature test [23]
                   (iii) WRITE attribute test [16]
                   (iv) Entropy test [17]
Unpacking          (i) Static: library-based unpacking [33]
                   (ii) Dynamic: entropy change-based unpacking [14, 27]
Verification       Verification algorithm, to be proposed in Section 4.3

Table 4: Detection techniques used in existing and proposed approaches (○: used, ×: not used).

Method              DP-NEP   DP-SIG   DP-WR   DP-ENT
PEiD [23]           ×        ○        ×       ×
[17]                ×        ×        ×       ○
[16]                ×        ×        ○       ○
Proposed approach   ○        ○        ○       ○
The OEP is the address of the first instruction to execute after the packed file is unpacked, and it represents the entry point of the original program. In other words, since the codes following the OEP are the starting codes of the original program, finding the OEP is the most important issue in dynamic unpacking. To find the OEP, we use the entropy change-based analysis proposed by Bat-Erdene et al. [27]. When a packed file is executed, the packed data is uncompressed and written to the memory, and at this time data might be changed or added to the file. Because of this data change, the entropy value continuously changes during the unpacking process. If unpacking is successful, the entropy value stabilizes, the original file is restored, and the IP (instruction pointer) moves to another section of the restored original file for its execution. Thus, we can find the OEP by examining whether the IP moves to another section when there is no change in the entropy value of each section. (If the heuristic of investigating the change of IP sections becomes known, someone may create a malicious packing method that evades it. For this evasion, however, the OEP must be reached without a change of IP sections, and packing a PE file in such a way might not be easy. In our actual analysis, we did not find such cases among the 19 packers we covered. Even though such a case hardly exists, it might occur in the worst case. We leave an advanced approach that works against such evasion as a further study.) To accomplish this, we need to execute the target process instruction by instruction and monitor the entropy change; however, measuring the entropy value after every instruction is very time-consuming and inefficient. We note here that the OEP generally exists after a branch instruction, so we calculate the entropy value only when the instruction is JMP, Conditional JMP (CJMP), or RETN. We also note that the iterations during program execution are independent of the OEP, so we store and reuse the entropy values measured after JMP, CJMP, and RETN instructions to avoid duplicated calculations caused by the iterations and to reduce the total analysis time.
[Code-Lock vx.x]
signature = 47 8B C2 05 1E 00 52 8B D0 B8 02 3D CD 21 8B D8 5A
ep_only = true

[CodeCrypt v0.14b]
signature = E9 C5 02 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F
ep_only = true

[CodeCrypt v0.15b]
signature = E9 31 03 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F
ep_only = true

Figure 2: Examples of packer signatures stored in the signature database.

Input: A PE file
Output: Packed or Not-Packed // Packing detection result
begin
(1) if No EP Section then return Packed // DP-NEP
(2) else if Packer Signature Found then return Packed // DP-SIG
(3) else if "WRITE" enabled and EP Section Entropy ∈ Packing-Range then
(4)     return Packed // DP-WR, DP-ENT
(5) else return Not-Packed
(6) end-if
end

Algorithm 1: Packing detection.

Algorithm 2 shows the proposed dynamic unpacking algorithm based on the entropy changes; it returns the OEP address for a given PE file. First, it computes the initial entropy of each section (Line 1). Then, it repeats the following steps until the process ends (Lines 2 to 18). If the current IP address is greater than the last address of the current section, it returns Not-Found (Line 4). This is because the IP has moved to the next section without a JMP family instruction, so the OEP cannot be found in this section. On the contrary, if the instruction at IP is one of JMP, CJMP, or RETN, it executes the file up to IP and stores the address of the next instruction in DstAddr (Lines 5 to 7). If IP exists in IP-History or DstAddr exists in DstAddr-History, it means an iterative instruction or address, so we move to the next IP (Line 8). Otherwise, it stores IP and DstAddr into each history for the next iteration (Line 9). At this time, if DstAddr is out of the range of the file, it returns Not-Found because it cannot find the OEP within the file (Line 10). If DstAddr is in the normal range, it calculates the entropy of each section again and compares the current entropy with the previous entropy to check whether it is stable (Lines 11 to 13). If the entropy is stable and DstAddr is in a different section, it means that unpacking of the file is successful and the IP is moved to another section by a JMP family instruction. Thus, it stops the execution and returns DstAddr as the OEP (Lines 13 and 14).
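The control flow of Algorithm 2 can be illustrated on a pre-recorded instruction trace. The sketch below is not the debugger-based implementation: the `Step` record, the section layout dictionary, the `tolerance` used to decide that the entropy is stable, and the choice of the first instruction's section as the "current section" are all assumptions made for illustration, and the per-section entropies are taken as given instead of being re-measured from memory.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

BRANCHES = {"JMP", "CJMP", "RETN"}  # the JMP family instructions of Algorithm 2


@dataclass
class Step:
    """Hypothetical trace record standing in for one debugger step."""
    ip: int                      # address of the instruction
    mnemonic: str                # e.g. "MOV", "JMP", "CJMP", "RETN"
    dst_addr: int                # address executed next
    entropies: Dict[str, float]  # per-section entropy measured at this step


def section_of(addr: int, layout: Dict[str, Tuple[int, int]]) -> Optional[str]:
    """Name of the section whose [start, end] address range contains addr."""
    for name, (start, end) in layout.items():
        if start <= addr <= end:
            return name
    return None


def find_oep(trace: Iterable[Step], layout, initial_entropies, tolerance=1e-3):
    """Return the OEP address, or None for Not-Found."""
    previous = dict(initial_entropies)           # Line 1
    seen_ip, seen_dst = set(), set()
    current_section = None
    for step in trace:                           # Lines 2-3
        if current_section is None:
            current_section = section_of(step.ip, layout)
            if current_section is None:
                return None
        if step.ip > layout[current_section][1]:  # Line 4
            return None
        if step.mnemonic not in BRANCHES:         # Line 5
            continue
        if step.ip in seen_ip or step.dst_addr in seen_dst:  # Line 8
            continue
        seen_ip.add(step.ip)                      # Line 9
        seen_dst.add(step.dst_addr)
        dst_section = section_of(step.dst_addr, layout)
        if dst_section is None:                   # Line 10
            return None
        stable = all(abs(step.entropies[s] - previous[s]) <= tolerance
                     for s in layout)             # Lines 12-13
        if stable and dst_section != section_of(step.ip, layout):
            return step.dst_addr                  # Line 14
        previous = dict(step.entropies)
    return None                                   # Line 19
```

In a real setting the trace would be produced by single-stepping the packed process in a debugger, and the entropies would be recomputed only at the branch instructions, as described above.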
4.3. Verification Phase. In this section, we present an algorithm for measuring the restoration accuracy by comparing the restored file with its original file. To execute the packed file, the proposed system uses dynamic unpacking and finds the OEP during its execution. If the system finds the OEP, we assume that the restoration of the original file is successful and regard the resulting file as the restored file. In the verification phase, we quantitatively measure the restoration accuracy by comparing this restored file with its original file.
Figure 3 shows an example of comparing some bytecodes of the original file with those of the restored file. For the comparison, we first pack the pspasswd.exe file provided by Microsoft [37] and then unpack it again by dynamic unpacking. The figure shows that the bytecodes of the four sections of the original file, text, rdata, data, and rsrc, appear identically in the UPX0, UPX1, and rsrc sections of the restored file. That is, even if the name and location of each section are different, the same bytecodes exist in the restored file. However, the restored file contains padding and garbage values, and some bytecodes may be at locations different from those in the original file. Thus, in order to compare the original and the restored files, we need to search for where the original bytecodes are located in the restored file. Since manually comparing the bytecodes of the original and restored files takes a very long time, in this paper we propose an efficient search algorithm using a hash function. We use the well-known MD5 [38] as the hash function of the search algorithm. As a message-digest algorithm, MD5 is widely used for checking the integrity of long data.
Algorithm 3 shows the restoration accuracy calculation algorithm. Its basic principle is to check whether each section of the original file exists in the restored file. We use the MD5 hash function to speed up the scan. The inputs to the algorithm are an original file A and a restored file B, and the output is restorationAccuracy, a list storing the restoration accuracy values of all sections. First, we remove garbage and padding values from the restored file (Line 1). For each section Ai of A, we obtain the length curLen of the section and calculate a hash value H1 by applying the MD5 hash function (Lines 4 to 7). We then calculate another hash value H2 from the restored file B by considering only the length curLen (Line 9). If H1 equals H2 (Line 10), it means that B contains Ai, so we stop the comparison and calculate the restoration accuracy of the section (Line 14). However, if H1 and H2 are not equal, we perform the hash value comparison again after decrementing the length curLen of Ai (Lines 6 to 13). We repeat this process to calculate the restoration accuracy for the section Ai (Line 14). After completing the accuracy calculation for all sections, the algorithm returns the list restorationAccuracy (Line 16).
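The shrinking-prefix search described above can be sketched with Python's standard `hashlib`. Several points are assumptions rather than details given in the text: H2 is computed over every window of length curLen in B, the accuracy of a section is taken as the matched prefix length divided by the section length (in percent), and the removal of garbage and padding values (Line 1) is omitted. The function names are illustrative, and the brute-force scan is meant for small inputs only.

```python
import hashlib
from typing import Dict


def md5(data: bytes) -> bytes:
    """MD5 digest of a byte string."""
    return hashlib.md5(data).digest()


def contains_block(restored: bytes, block: bytes) -> bool:
    """True if some window of restored has the same MD5 hash as block."""
    target, size = md5(block), len(block)
    return any(md5(restored[i:i + size]) == target
               for i in range(len(restored) - size + 1))


def restoration_accuracy(original_sections: Dict[str, bytes], restored: bytes) -> Dict[str, float]:
    """Per-section accuracy (%) based on the longest prefix found in restored."""
    accuracy = {}
    for name, section in original_sections.items():
        total_len = len(section)
        cur_len = total_len
        # Shrink the prefix of the section until it is found in the restored file.
        while cur_len > 0 and not contains_block(restored, section[:cur_len]):
            cur_len -= 1
        # Assumed definition: matched prefix length over section length.
        accuracy[name] = 100.0 * cur_len / total_len if total_len else 0.0
    return accuracy
```

With a section that appears unchanged in the restored bytes the sketch yields 100, and with a section of which only the first half survives it yields 50.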
Input: A PE file
Output: OEP // the start address of the unpacked original file
begin
(1) Measure the entropy value of each section
(2) while the process has not reached the end do
(3)     IP ⟵ the current instruction pointer
(4)     if IP's address > current section's last address then return Not-Found
(5)     else if IP is JMP or CJMP or RETN then
(6)         Execute instructions until IP
(7)         DstAddr ⟵ the next instruction address
(8)         if IP ∉ IP-History and DstAddr ∉ DstAddr-History then
(9)             Store IP and DstAddr into each history
(10)            if DstAddr is out of the file then return Not-Found
(11)            else
(12)                Re-calculate the entropy value of each section
(13)                if Entropy change is stable and DstAddr is in the different section then
(14)                    return DstAddr
(15)                end-if
(16)            end-if
(17)        end-if
(18) end-while
(19) return Not-Found
end

Algorithm 2: Dynamic unpacking algorithm.

5. Performance Evaluation

5.1. Experimental Environment. In this section, we present the results of an experimental evaluation to show the superiority of the all-in-one unpacking system. We conduct the experiments in a virtual server as well as a physical server. The reasons for experimenting in two server environments are as follows. First, unpacking results may differ between the virtual server and the physical server if a packed file has anti-unpacking features such as Anti-VM or Anti-Debug. Second, the dynamic unpacking of malware files may infect the physical server. Thus, we perform unpacking first in the virtual server and then perform the experiment in the physical server only when there is no abnormality. The hardware specifications of the experimental platform are as follows. The virtual server is an Intel Core i7-6700 3.40 GHz with 8 GB RAM, and the physical server is an Intel Core i7-6700 3.40 GHz with 32 GB RAM. The target operating system is 32-bit Windows 7, and we use PEVIEW to view the structure and content of 32-bit PE files. As the debugger, we use Immunity Debugger 1.85 [25] to implement the dynamic unpacking phase of Algorithm 2. To implement the all-in-one unpacking system including the
Figure 3: Comparison of sample bytecodes of the original and restored files. (a) Sections of the original file (text, rdata, data, and rsrc sections). (b) Sections of the restored file (UPX0, UPX1, and rsrc sections).
Input: A, an original file; B, a restored file
Output: restorationAccuracy[1..n], a list of restoration accuracy values for n sections
begin
(1) Remove garbage and padding values from B;
(2) A ⟵ {A1, A2, ..., An}, where Ai is the i-th section of A;
(3) for each section Ai ∈ A do
(4)     totalLen ⟵ Ai.length;
(5)     curLen ⟵ totalLen;
(6)     while curLen > 0 do
(7)         H1 ⟵ hash(Ai[1 : curLen]);  // hash a part of section Ai
(8)         for j ⟵ 1 to (B.length − curLen + 1) do
(9)             H2 ⟵ hash(B[j : j + curLen − 1]);  // hash a part of B
(10)            if H1 = H2 then break;
(11)            else curLen ⟵ curLen − 1;
(12)        end-for
(13)    end-while
(14)    restorationAccuracy[i] ⟵ (curLen / totalLen) × 100;
(15) end-for
(16) return restorationAccuracy;
end
ALGORITHM 3: Restoration rate calculation algorithm.
dynamic unpacking algorithm, we run Python scripts on top of Immunity Debugger.
In the experiment, we use a total of 130 PE files, where 90 files are provided by Microsoft [37] and 40 files are randomly collected from the PuTTY [39] and MD5 [40] sites. We pack each of the 130 executable files with 19 packers and construct a dataset of 2,600 files in total: 2,470 packed files (130 × 19) plus the 130 original files. The 19 packers used are UPX, ASPack, NSPack, MPRESS, Yoda's Protector, MEW, Packman, RLPack, BeRoEXEPacker, PECompact, Petite, JDpack, Molebox, eXpressor, Yoda's Crypter, FSG, exe32pack, WinUpack, and Neolite. We build the dataset by packing normal PE files ourselves because we can quantitatively measure the packing detection accuracy, the false-positive rate, and the false-negative rate only if we know in advance which experimental files are packed.
In this paper, we conduct three experiments. The first experiment concerns the integration feature of the all-in-one system; it verifies that the system supports all three phases of packing detection, unpacking, and verification in an integrated framework. The second experiment concerns the detection phase, where we first set Packing-Range, an entropy range for judging whether a file is packed, and then evaluate the packing detection accuracy using the range. The third experiment concerns the unpacking and verification phases, where we verify whether the system unpacks the PE file correctly by calculating the restoration accuracy.
5.2. Experimental Results on Overall Working Mechanism. In this section, we confirm through experiments that the integration mechanism of the all-in-one unpacking system works correctly. Figure 4 shows an operation screenshot of each phase of the all-in-one system. First, Figure 4(a) shows an example screenshot of the detection phase. To confirm that all four cases of packing detection work correctly, we continue executing the remaining cases even if a previous case has already detected the packing. In Figure 4(a), we first examine DP-NEP of Algorithm 1. DP-NEP finds the EP section, so we proceed to DP-SIG (the first two lines). By the DP-SIG test, we detect the UPX packer, so we conclude that the file is packed (the next two lines). Even though detection is completed, we proceed to the DP-ENT and DP-WR tests. The entropy value is 7.93, which is contained in Packing-Range, and the attribute value of the EP section is “0xe0000040”, which includes the WRITE attribute. Thus, we conclude that the file is also packed by the DP-ENT and DP-WR tests (the last three lines). According to Figure 4(a), we can confirm that the DP-NEP, DP-SIG, DP-ENT, and DP-WR tests of the detection phase all work correctly.
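The DP-WR check on the attribute value “0xe0000040” can be illustrated with a minimal Python sketch. The flag constants are the section-characteristic values defined in the PE/COFF specification; the helper names `has_write_attribute` and `decode_flags` are hypothetical and are not taken from the authors' implementation.

```python
# Section-characteristic flags as defined in the PE/COFF specification.
IMAGE_SCN_CNT_INITIALIZED_DATA = 0x00000040
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

FLAG_NAMES = {
    IMAGE_SCN_CNT_INITIALIZED_DATA: "CNT_INITIALIZED_DATA",
    IMAGE_SCN_MEM_EXECUTE: "MEM_EXECUTE",
    IMAGE_SCN_MEM_READ: "MEM_READ",
    IMAGE_SCN_MEM_WRITE: "MEM_WRITE",
}


def has_write_attribute(characteristics: int) -> bool:
    """Return True if the section is writable (hypothetical DP-WR helper)."""
    return bool(characteristics & IMAGE_SCN_MEM_WRITE)


def decode_flags(characteristics: int) -> list:
    """List the names of the known flags set in a characteristics value."""
    return [name for bit, name in sorted(FLAG_NAMES.items())
            if characteristics & bit]


# Attribute value reported for the EP section in Figure 4(a).
ep_characteristics = 0xE0000040
print(decode_flags(ep_characteristics))
print(has_write_attribute(ep_characteristics))
```

Only the four flags that appear in this value are decoded; a fuller decoder would cover the remaining flags of the specification.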
Figures 4(b) and 4(c) show example screenshots of the unpacking phase. If an unpacking library exists for the packer identified in the detection phase, we perform static unpacking as shown in Figure 4(b); if not, we perform dynamic unpacking as shown in Figure 4(c). Figure 4(b) shows the result of static unpacking by the existing UPX tool [33] for a UPX-packed file. In the figure, the file size increases from 285,760 bytes to 731,200 bytes by unpacking, which means that the compression ratio is 39.08% (285,760/731,200 ≈ 0.3908). Figure 4(c) shows the result of the entropy-based dynamic unpacking. In the figure, the system calculates the entropy of each section whenever IP encounters a JMP or CJMP instruction, and it returns the current DstAddr as the OEP if the entropy change is stable and DstAddr is in a different section.
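The decision rule of the entropy-based dynamic unpacking can be sketched offline in Python. This is only an illustration under stated assumptions: it replays a pre-recorded list of branch events instead of driving a debugger, and the `BranchEvent` record, the `find_oep` function, the stability tolerance `eps`, and all addresses and entropy values in the trace are made up for the example rather than taken from the authors' Immunity Debugger scripts.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class BranchEvent:
    """One JMP/CJMP/RETN seen while the packed file runs (hypothetical record)."""
    ip: int                      # address of the branch instruction
    dst: int                     # address the branch transfers control to
    entropies: Dict[str, float]  # per-section entropy after executing up to ip


def section_of(addr: int, sections: Dict[str, Tuple[int, int]]) -> Optional[str]:
    """Return the section whose inclusive [start, end] range contains addr."""
    for name, (start, end) in sections.items():
        if start <= addr <= end:
            return name
    return None


def find_oep(events: List[BranchEvent],
             sections: Dict[str, Tuple[int, int]],
             eps: float = 1e-3) -> Optional[int]:
    """Return the first unseen branch target that lies in a different section
    while all section entropies are unchanged (within eps), else None."""
    seen_ip, seen_dst = set(), set()
    prev: Optional[Dict[str, float]] = None
    for ev in events:
        src = section_of(ev.ip, sections)
        if src is None:                      # IP left the known sections
            return None
        if ev.ip in seen_ip or ev.dst in seen_dst:
            continue                         # branch already examined
        seen_ip.add(ev.ip)
        seen_dst.add(ev.dst)
        dst = section_of(ev.dst, sections)
        if dst is None:                      # destination is outside the file
            return None
        stable = prev is not None and all(
            abs(value - prev.get(name, float("inf"))) <= eps
            for name, value in ev.entropies.items()
        )
        prev = ev.entropies
        if stable and dst != src:
            return ev.dst
    return None


# Made-up layout and trace: UPX1 holds the unpacking stub, UPX0 is filled in.
sections = {"UPX0": (0x401000, 0x40FFFF), "UPX1": (0x410000, 0x41FFFF)}
trace = [
    BranchEvent(0x410010, 0x410020, {"UPX0": 2.0, "UPX1": 7.8}),
    BranchEvent(0x410030, 0x410040, {"UPX0": 5.5, "UPX1": 7.8}),
    BranchEvent(0x410050, 0x410060, {"UPX0": 6.4, "UPX1": 7.8}),
    BranchEvent(0x410070, 0x410080, {"UPX0": 6.4, "UPX1": 7.8}),
    BranchEvent(0x410090, 0x401000, {"UPX0": 6.4, "UPX1": 7.8}),
]
print(hex(find_oep(trace, sections)))
```

In this trace the fourth branch already sees stable entropy but stays inside the same section, so only the fifth branch, which jumps into the other section, is reported as the OEP candidate.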
Figure 4(d) shows an example screenshot of the verification phase. In this example, we measure the restoration accuracy of the restored file for the .text and .data sections of the original file. The length of the original .text section is 18,439, and the length of the unpacked file is 872,555. In the hash value comparison, since the portion corresponding to the original .text section exists in the unpacked file, the restoration accuracy becomes 100%. Similarly, we also calculate the restoration accuracy of the .data section as 100%.
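The hash-based comparison of the verification phase can be sketched in Python as follows. This is one reading of Algorithm 3 rather than the authors' code: it takes the restoration accuracy of a section to be the length of its longest prefix that occurs contiguously in the restored file, divided by the section length. The use of MD5 as the hash function, the function name `restoration_accuracy`, and the sample bytes are assumptions, and step (1) of the algorithm (removing garbage and padding from the restored file) is omitted.

```python
import hashlib
from typing import Dict


def _digest(data: bytes) -> bytes:
    """Hash a byte string (MD5 is assumed here for illustration)."""
    return hashlib.md5(data).digest()


def restoration_accuracy(sections: Dict[str, bytes],
                         restored: bytes) -> Dict[str, float]:
    """For each original section, find the longest prefix whose hash matches
    the hash of some equally long window of the restored file, and report
    its length as a percentage of the section length."""
    result = {}
    for name, section in sections.items():
        total_len = len(section)
        cur_len = total_len
        while cur_len > 0:
            h1 = _digest(section[:cur_len])
            windows = range(len(restored) - cur_len + 1)
            if any(_digest(restored[j:j + cur_len]) == h1 for j in windows):
                break                        # prefix found in restored file
            cur_len -= 1                     # try a shorter prefix
        result[name] = (cur_len / total_len) * 100 if total_len else 0.0
    return result


# Made-up example: .text is fully restored, .data only for its first 2 bytes.
original_sections = {".text": bytes.fromhex("558bec83"), ".data": b"ABCD"}
restored_image = b"\x00\x00" + bytes.fromhex("558bec83") + b"\x90" + b"ABXX"
print(restoration_accuracy(original_sections, restored_image))
```

The exhaustive window scan is written for readability and is far too slow for real sections of tens of thousands of bytes; a practical implementation would need a faster substring search.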
5.3. Experimental Results on Detection Phase. In this section, we describe two experimental results of the detection phase. The first experiment sets Packing-Range, and the second measures the detection accuracy of the proposed detection algorithm. In the first experiment, we experimentally determine Packing-Range used in DP-ENT of Algorithm 1. The previous study by Lyda and Hamrock [17] uses the entropy value of the whole file and sets the entropy range of the packed file to [6.677, 6.926]. In contrast, Han and Lee [16] focus on the EP section only and compute the entropy value for the EP section instead of the whole file. Through experiments, they set the entropy threshold to 6.85 and judge a file to be packed if its entropy is over 6.85.
We also determine the entropy range, Packing-Range, through experiments on various packers. Figure 5 shows the entropy distribution of the EP section for different packers. Figure 5(a) shows the average distribution of the original and 19 packed files. Figures 5(b)–5(m) show the more detailed entropy distribution for each packer. According to Figures 5(b)–5(m), the entropy distribution of the packed file differs depending on the packer type. In particular, as shown in Figures 5(d), 5(e), and 5(k), some packed files have smaller entropy values than the original file, and thus we cannot simply set an upper threshold value as a packing criterion as in the previous study [16]. However, we note that while the entropy of packed files is distributed over a wide range, as shown in Figures 5(c)–5(m), the entropy of original files is distributed only in a specific range, as shown in Figure 5(b). More specifically, the entropy value of the normal file is between 6.00 and 6.85, as shown in Figure 5(b). Based on these experimental results, we set Packing-Range to [0, 6.00) or [6.85, ∞] of the EP section. According to [9, 29], the entropy method has the disadvantages of low accuracy and easy evasion. To overcome this weakness, we use three other detection techniques, DP-NEP, DP-SIG, and DP-WR, together with DP-ENT in a hybrid manner. Using this hybrid approach, we can further improve the detection accuracy.
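The DP-ENT test with this Packing-Range can be sketched in a few lines of Python. The sketch assumes that the entropy in question is the byte-level Shannon entropy of the EP section, measured in bits per byte (so it lies between 0 and 8, matching the axis of Figure 5); the function names `shannon_entropy` and `in_packing_range` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def in_packing_range(entropy: float, low: float = 6.00, high: float = 6.85) -> bool:
    """Return True if entropy falls in Packing-Range = [0, low) or [high, inf]."""
    return entropy < low or entropy >= high


print(shannon_entropy(bytes(range(256))))  # uniform bytes give the maximum
print(in_packing_range(7.93))              # entropy value from Figure 4(a)
print(in_packing_range(6.50))              # inside the normal-file band
```

With these bounds a file is flagged when its EP-section entropy is either unusually low or unusually high, which is what distinguishes Packing-Range from a single upper threshold.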
In the second experiment, we evaluate the detection accuracy and the false-negative/false-positive rates. Table 5 compares the three detection techniques of the detection phase with the proposed hybrid approach. Since EP sections
Figure 4: Operation screenshots of the detection, unpacking, and verification phases. (a) Detection phase. (b) Unpacking phase (static unpacking). (c) Unpacking phase (dynamic unpacking). (d) Verification phase.
Figure 5: Entropy distributions of the EP section for original and representative packed files (x-axis: entropy of the EP section, from 0 to 8; y-axis: number of files). (a) Average of original and 19 packers, with two series: not packed (b) and summation of (c) to (m). (b) Original (not packed). (c) UPX. (d) ASPack. (e) NSPack. (f) MPRESS. (g) Yoda's Protector. (h) RLPack. (i) BeRoEXE. (j) MEW. (k) Packman. (l) WinUpack. (m) exe32pack.
are mandatory in all PE files, we can detect EP sections unless they are intentionally concealed. All files of the dataset used in the experiment also have EP sections, so the detection rate by DP-NEP is trivially 0%. In other words, no file is detected as packed due to the absence of the EP section, so we exclude the experimental result of DP-NEP from Table 5.
We now explain the results of Table 5 column by column. First, we explain the DP-SIG column. For Not-Packed files, DP-SIG shows a detection rate of 0% since no packer signature exists in normal files; that is, DP-SIG never reports a normal file as packed, which means that it performs accurate detection for Not-Packed files. On the other hand, for the packed files, DP-SIG detects 100% of the files for all packers except NSPack. The detection rate of NSPack is merely 53.8% because NSPack signatures often exist in sections other than the EP section. In summary, DP-SIG shows 95.0% detection accuracy on average.
Second, for the experiment on the detection rate of DP-WR, we intentionally remove the WRITE attribute from 39 files (30% of the packed files) per packer. We perform this removal process for 18 packers, excluding exe32pack. The reason for changing the dataset is that normally packed files usually have WRITE attributes, so without this change the DP-WR experiment would be uninformative; we therefore arbitrarily remove the WRITE attribute from 30% of the files. Looking at the DP-WR column in Table 5, DP-WR detects only 70% of the files per packer for the 18 packers since we intentionally remove the WRITE attribute from 30% of the files. In the case of exe32pack, however, we need not perform the removal process since it has no WRITE attribute in the PE header. Accordingly, as shown in the DP-WR column of Table 5, the detection rate of exe32pack by DP-WR is trivially 0%. In the case of Not-Packed, the detection rate is 0% since normal files have no WRITE attribute. In summary, checking the presence of the WRITE attribute is very simple, and DP-WR works correctly if the packed file has the WRITE attribute, but it does not work well if the WRITE attribute does not exist or is maliciously modified.
Third, the DP-ENT column shows the entropy-based detection rates. As shown in the table, DP-ENT achieves a 100% detection rate for 9 packers, including NSPack, UPX, MEW, and BeRoEXEPacker, but it incurs some false negatives for 10 packers, including ASPack, Yoda's Protector, Packman, and MPRESS, for which the detection rate is lower than 100%. In particular, for MPRESS, Petite, JDpack, eXpressor, and Neolite, it shows a low detection rate below 24.6% because, owing to the characteristics of these packers, the entropy values of their packed files fall within the entropy distribution of normal files. DP-ENT also has a false-positive rate of 3.84% for Not-Packed. In summary, DP-ENT works very accurately for most packers, but it may incur false negatives for a few specific packers or false positives for some normal files. To overcome this limitation, we use DP-ENT together with other detection techniques. We can see the improvement through the experimental results of Tables 5 and 6. In the tables, when using only DP-ENT, the detection accuracy is low for some packers (especially MPRESS), but when using it together with the other three techniques, the accuracy is close to 100%.
Fourth, the proposed hybrid approach, which integrates the four techniques, detects 100% of the files for all packers except NSPack and shows no false positives. The reason for this high detection rate is that the proposed method has all the advantages of DP-SIG, DP-WR, and DP-ENT as well as DP-NEP. In the case of NSPack, however, the detection rate is merely 70.0%; this is because, as mentioned above, we forcibly remove the WRITE attribute from 30% of the files. According to the results of Table 5, the detection rate differs according to each detection technique, and some techniques
Table 5: Comparison of packing detection rates (%) of three techniques and the proposed hybrid approach.

Packer             DP-SIG   DP-WR   DP-ENT   Hybrid approach
ASPack             100      70.0    91.5     100
NSPack             53.8     70.0    100      70.0
MPRESS             100      70.0    24.6     100
UPX                100      70.0    100      100
Yoda's Protector   100      70.0    96.9     100
MEW                100      70.0    100      100
BeRoEXEPacker      100      70.0    100      100
Packman            100      70.0    93.8     100
RLPack             100      70.0    99.3     100
PECompact          100      70.0    100      100
Petite             100      70.0    0.00     100
JDpack             100      70.0    56.2     100
Molebox            100      70.0    100      100
eXpressor          100      70.0    0.00     100
Yoda's Crypter     100      70.0    91.5     100
FSG                100      70.0    100      100
exe32pack          100      0.00    100      100
WinUpack           100      70.0    100      100
Neolite            100      70.0    16.1     100
Average            95.0     66.3    73.9     98.4
Not-Packed         0.00     0.00    3.84     0.00
incur false positives or false negatives; on the other hand, the proposed hybrid approach shows the highest detection rate of 98.4% and no false positives.
Table 6 shows the detection accuracy of combinations of two or three detection techniques and of the proposed hybrid approach. First, DP-SIG ENT, a combination of the signature and entropy tests, detects 100% of the packed files but detects 18 files incorrectly, showing a false-positive rate of 0.73%. Second, DP-ENT WR, a combination of the entropy and WRITE tests, shows a low detection rate of 92.0% because the entropy value of MPRESS lies within the distribution of normal files. Third, DP-SIG ENT WR, a combination of the signature, entropy, and WRITE tests, and our hybrid approach both show 98.4% detection accuracy and a 1.60% false-negative rate. This is because DP-SIG ENT WR and the hybrid approach go through the same detection procedure except for DP-NEP. The false-negative rate of 1.60% is due to the files having no WRITE attributes. According to the results of Tables 5 and 6, we believe that the proposed hybrid approach achieves high detection accuracy by combining the advantages of the existing and new detection techniques.
5.4. Experimental Results on Unpacking and Verification Phases. In this section, we evaluate the unpacking and verification phases by performing the unpacking of Algorithm 2 and calculating the restoration accuracy of the actually restored files with Algorithm 3. However, the unpacking and verification phases take a lot of time, so we experiment with 40 files randomly selected from the dataset. The unpacking phase consists of static unpacking and dynamic unpacking. In the experiment, we exclude static unpacking since it uses an existing library as is, and we focus only on the dynamic unpacking of Algorithm 2 since it tries to restore the original file without any information about packers.
Table 7 shows the restoration accuracy of the proposed method for the .text and .data sections for each of the 19 packers. We cannot directly unpack files packed with Yoda's Protector and PECompact since they use Anti-Debug techniques. To overcome this problem, we unpack these packers by bypassing three Anti-Debugging API calls: GetCurrentProcessID(), BlockInput(), and IsDebuggerPresent() [41]. That is, we run Algorithm 2 with the bypassing techniques for Yoda's Protector and PECompact and include the experimental results in Table 7. In general, only the .text and .data sections of the original file are packed [42], so in the experiment we consider only the restoration accuracy of these sections of the restored file. In Table 7, among the 19 packers, 11 packers, including UPX, NSPack, MEW, RLPack, and BeRoEXEPacker, show 100% restoration accuracy for both the .text and .data sections. On the other hand, 8 packers, including ASPack, Packman, and MPRESS, show less than 100% restoration accuracy because of the manner in which the packers handle the .reloc section. More specifically, if there is a .reloc section in the original file, the unpacking relocates the .text and .data sections, changing all the data values in each section. Therefore, the restoration accuracy depends on whether there is a .reloc section in the original file.
Table 8 shows the restoration results in detail according to the presence of the .reloc section. As shown in the table, the restoration accuracy is 100% in the absence of the .reloc section, while it ranges from 0.00% to 99.8%, and is very low in most cases, in the presence of the .reloc section. However, the low restoration accuracy is due to the nature of the packer, and it does not mean that unpacking does not work correctly. In other words, even if the unpacking is successful, the data values may change due to the packer characteristics, and the restoration accuracy is then calculated to be low. In summary, we can use the proposed verification algorithm to calculate the restoration accuracy of unpacked files, which quantitatively measures the accuracy of the actual restoration.
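One plausible way to see why relocation lowers the measured accuracy even when unpacking succeeds is the following Python sketch. It assumes that the change resembles standard PE base relocation, in which a delta is added to each 32-bit absolute address listed in the .reloc section; the paper does not spell out the exact mechanism, so the function names, offsets, and bytes below are purely illustrative.

```python
import struct
from typing import Iterable


def apply_relocations(section: bytes, offsets: Iterable[int], delta: int) -> bytes:
    """Add delta to the little-endian 32-bit value stored at each offset,
    mimicking how base relocation patches absolute addresses."""
    patched = bytearray(section)
    for off in offsets:
        (value,) = struct.unpack_from("<I", patched, off)
        struct.pack_into("<I", patched, off, (value + delta) & 0xFFFFFFFF)
    return bytes(patched)


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Length of the longest common prefix of two byte strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Made-up code bytes: push 0x00401000; ret. The pushed address is relocatable.
original = bytes.fromhex("6800104000c3")
relocated = apply_relocations(original, offsets=[1], delta=0x10000)
matched = common_prefix_len(original, relocated)
print(relocated.hex(), matched, 100 * matched / len(original))
```

Because a prefix-based comparison stops at the first patched address, a single relocated value near the start of a section is enough to drive the computed accuracy toward 0% although the rest of the section is intact.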
In addition to the .text, .data, and .reloc sections, we further consider the PE header, which includes the IAT information. In general, the IAT is located in the header of the PE file, and if we actually need to run the unpacked file, we must also restore this header information including the IAT. In fact, the studies of [3, 43] restore the header including the IAT and produce the actual whole executable file. On the other hand, most studies, including this paper, do not restore the actual whole executable file itself but restore the .text, .data, and .reloc sections, which are the major parts of the executable file. This is because the purpose of unpacking is not to execute the restored file but to identify and extract malicious code included in the restored file. Therefore, in this paper, we do not need to restore the header required for the actual execution, and accordingly a comparison of IAT information is also not necessary. Instead, we restore and compare the .text, .data, and .reloc sections, which are more likely to hide malicious code. Restoring the PE header is beyond the scope of this study; we leave it as a separate research topic.
6. Conclusions

In this paper, we analyzed recent studies of packing detection and unpacking techniques and derived four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. We then proposed an all-in-one unpacking system to address these four observations. First, no phase integration was a difficulty in which analysts had to perform the packing detection, unpacking, and verification phases separately, and we solved this problem by presenting an all-in-one
Table 6: Comparison of packing detection accuracy (%) of two or three combinations and of the hybrid approach.

Metrics               DP-SIG ENT   DP-ENT WR   DP-SIG ENT WR   Hybrid approach
Detection accuracy    95.0         92.0        98.4            98.4
False-negative rate   5.00         8.00        1.60            1.60
False-positive rate   0.73         0.00        0.00            0.00
Table 7: Restoration accuracy (%) of text and data sections of each packer.

Packer             text    data
UPX                100     100
NSPack             100     100
MEW                100     100
RLPack             100     100
BeRoEXE            100     100
Neolite            100     100
FSG                100     100
eXpressor          100     100
Molebox            100     100
Petite             100     100
JDpack             100     100
ASPack             84.3    84.3
Yoda's protector   84.0    84.0
exe32pack          80.8    100
MPRESS             78.8    75.3
Yoda's crypter     74.5    74.5
PECompact          74.0    74.0
WinUpack           68.0    68.0
Packman            55.0    48.5
Average            89.2    90.2
unpacking system that integrates these three phases in a single framework. Second, no detection combination meant that there had been no attempt to combine various packing detection methods, and we solved this problem by presenting a hybrid approach that combines existing and new packing detection methods. Third, no real-restoration meant that there had been little discussion about restoring the actual executable file, and we solved this problem by actually restoring an original image of the packed file. Fourth, no unpacking verification meant that there had been no explicit work to measure the restoration accuracy of unpacked files, and we solved this problem by proposing a verification algorithm that quantitatively evaluates the restoration accuracy.
We constructed a dataset of 2,600 PE files and experimentally evaluated the proposed system using the dataset. First, we performed all the phases of the all-in-one unpacking system sequentially and confirmed that its overall working mechanism worked well. Next, through the comparison of the hybrid packing detection approach and existing detection techniques, we showed that the detection accuracy of the proposed method was the highest at 98.4% on average, with no false positives. For this, we also determined an entropy range of packing, Packing-Range, through extensive experiments on 19 representative packers. Finally, in the unpacking and verification phases, we confirmed that unpacking was successful for all files except the files packed with Yoda's Protector. We also showed that the restoration accuracy of unpacked files was nearly 100% except for a few packers.
As future research, we will focus on unpacking files packed with Anti-VM, Anti-Debug, and Anti-Reverse engineering techniques. We will also study how to compute the restoration accuracy of files having .reloc sections.
Data Availability
All the data files used in the experiments are available at https://github.com/chesvectain/PackingData.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partly supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (no. 2017-0-00158, Development of Cyber Threat Intelligence (CTI) analysis and information sharing technology for national cyber incident response) and the Korea Electric Power Corporation (no. R18XA05).
References
[1] M. Morgenstern, “AV-TEST Security report 2017/18,” 2019, https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2017-2018.pdf.
[2] T. Ban, R. Isawa, S. Guo, D. Inoue, and K. Nakao, “Efficient malware packer identification using support vector machines with spectrum kernel,” in Proceedings of the Eighth Asia Joint Conference on Information Security, pp. 69–76, Seoul, South Korea, July 2013.
[3] B. Cheng, M. Jiang, J. Fu et al., “Towards paving the way for large-scale Windows malware analysis: generic binary unpacking with orders-of-magnitude performance boost,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 395–411, Toronto, Canada, October 2018.
[4] A. Gupta and M. S. Arya, “Hashing based encryption and anti-debugger support for packing multiple files into single executable,” International Journal of Advanced Research in Computer Science, vol. 9, no. 1, pp. 914–920, 2018.
[5] Y. Kawakoya, M. Iwamura, and M. Itoh, “Memory behavior-based automatic malware unpacking in stealth debugging environment,” in Proceedings of the 5th International Conference on Malicious and Unwanted Software, pp. 39–46, Lorraine, France, October 2010.
[6] G. Liang, J. Pang, Z. Shan, R. Yang, and Y. Chen, “Automatic benchmark generation framework for malware detection,” Security and Communication Networks, vol. 2018, Article ID 4947695, 8 pages, 2018.
[7] A. Pektas and T. Acarman, “A dynamic malware analyzer against virtual machine aware malicious software,” Security and Communication Networks, vol. 7, no. 12, pp. 2245–2257, 2014.
[8] I. Santos, J. Nieves, and P. G. Bringas, “Semi-supervised learning for unknown malware detection,” in International Symposium on Distributed Computing and Artificial Intelligence, A. Abraham, J. M. Corchado, S. R. Gonzalez, and J. F. De Paz Santana, Eds., pp. 415–422, Springer, Berlin, Germany, 2011.
[9] X. Ugarte-Pedrero, I. Santos, B. Sanz, C. Laorden, and P. G. Bringas, “Countering entropy measure attacks on packed software detection,” in Proceedings of the 9th International Conference on Consumer Communications and Networking, pp. 164–168, Las Vegas, NV, USA, January 2012.
[10] H. Won, S.-P. Kim, S. Lee, M.-J. Choi, and Y.-S. Moon, “Secure principal component analysis in multiple distributed nodes,” Security and Communication Networks, vol. 9, no. 14, pp. 2348–2358, 2016.
[11] S. Cesare, Y. Xiang, and W. Zhou, “Malwise–an effective and efficient classification system for packed and polymorphic
Table 8: Restoration accuracy (%) according to the presence of the reloc section.

Packer             text (reloc present)   text (reloc absent)   data (reloc present)   data (reloc absent)
ASPack             0.00                   100                   0.00                   100
Packman            99.8                   100                   88.1                   100
MPRESS             0.00                   100                   0.00                   100
Yoda's crypter     0.00                   100                   0.04                   100
PECompact          0.00                   100                   0.00                   100
Yoda's Protector   0.00                   100                   0.05                   100
exe32pack          0.00                   100                   100                    100
WinUpack           0.00                   100                   0.00                   100
malware,” IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193–1206, 2013.
[12] C. Malin, E. Casey, and J. Aquilina, Malware Forensics: Investigating and Analyzing Malicious Code, Syngress, Burlington, MA, USA, 2008.
[13] W. Yan, Z. Zheng, and A. Nirwan, “Revealing packed malware,” IEEE Security and Privacy, vol. 6, no. 5, pp. 65–69, 2008.
[14] M. Bat-Erdene, T. Kim, H. Park, and H. Lee, “Packer detection for multi-layer executables using entropy analysis,” Entropy, vol. 19, no. 3, p. 125, 2017.
[15] Y.-S. Choi, I.-K. Kim, J.-T. Oh, and J.-C. Ryou, “PE file header analysis-based packed PE file detection technique (PHAD),” in Proceedings of International Symposium on Computer Science and Its Applications, pp. 28–31, Hobart, Australia, October 2008.
[16] S. Han and S. Lee, “Packed PE file detection for malware forensics,” The KIPS Transactions: Part C, vol. 16C, no. 5, pp. 555–562, 2009, in Korean.
[17] R. Lyda and J. Hamrock, “Using entropy analysis to find encrypted and packed malware,” IEEE Security and Privacy, vol. 5, no. 2, pp. 40–45, 2007.
[18] R. Perdisci, A. Lanzi, and W. Lee, “Classification of packed executables for accurate computer virus detection,” Pattern Recognition Letters, vol. 29, no. 14, pp. 1941–1946, 2008.
[19] A. Rohit, A. Singh, H. Pareek, and U. Edara, “A heuristics-based static analysis approach for detecting packed PE binaries,” International Journal of Security and Its Applications, vol. 7, no. 5, pp. 257–268, 2013.
[20] G. Amato, “PEframe,” 2019, https://github.com/guelfoweb/peframe.
[21] G. Jeong, E. Choo, J. Lee, M. Bat-Erdene, and H. Lee, “Generic unpacking using entropy analysis,” Journal of Advanced Information Technology and Convergence, vol. 7, no. 1, pp. 232–238, 2009.
[22] V. DeMarines, “Obfuscation–how to do it and how to crack it,” Network Security, vol. 2008, no. 7, pp. 4–7, 2008.
[23] H. Neil, “PEiD,” 2019, https://github.com/wolfram77web/app-peid.
[24] O. Yuschuk, “OllyDbg,” 2019, http://www.ollydbg.de/download.html.
[25] N. Waisman, “Immunity Debugger,” 2019, https://www.immunityinc.com/products/debugger/index.html.
[26] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee, “PolyUnpack: automating the hidden-code extraction of unpack-executing malware,” in Proceedings of the 22nd Annual Computer Security Applications Conference, pp. 289–300, Miami Beach, FL, USA, December 2006.
[27] M. Bat-Erdene, Y. Park, H. Li, H. Lee, and M.-S. Choi, “Entropy analysis to classify unknown packing algorithms for malware detection,” International Journal of Information Security, vol. 16, no. 3, pp. 227–248, 2017.
[28] S. Cesare and X. Yang, “Classification of malware using structured control flow,” in Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing, pp. 61–70, Brisbane, Australia, January 2010.
[29] L. Martignoni, M. Christodorescu, and S. Jha, “OmniUnpack: fast, generic, and safe unpacking of malware,” in Proceedings of the 23rd Annual Computer Security Applications Conference, pp. 431–441, Miami Beach, FL, USA, December 2007.
[30] M. G. Kang, P. Poosankam, and H. Yin, “Renovo: a hidden code extractor for packed executables,” in Proceedings of the 5th ACM Workshop on Recurring Malcode, pp. 46–53, Alexandria, VA, USA, November 2007.
[31] S. Mariani, L. Fontana, F. Gritti, and S. D'Alessio, “PinDemonium: a DBI-based generic unpacker for windows executables,” in Proceedings of the Black Hat 2016, Las Vegas, NV, USA, July 2016.
[32] G. Bonfante, J. Fernandez, J.-Y. Marion, B. Rouxel, F. Sabatier, and A. Thierry, “CoDisasm: medium scale concatic disassembly of self-modifying binaries with overlapping instructions,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 745–756, Denver, CO, USA, October 2016.
[33] M. Oberhumer, L. Molnar, and J. Reiser, “UPX,” 2019, https://upx.github.io.
[34] A. Swinnen and A. Mesbahi, “One packer to rule them all: empirical identification, comparison and circumvention of current antivirus detection techniques,” in Proceedings of the Black Hat 2014, Las Vegas, NV, USA, August 2014.
[35] P. Bania, “Generic unpacking of self-modifying, aggressive, packed binary programs,” 2009, https://arxiv.org/abs/0905.4581.
[36] H. Neil, “BobSoft Signature,” 2019, http://woodmann.com/BobSoft/Files/Other/UserDB.zip.
[37] M. Russinovich, “Microsoft Sysinternals Suite,” 2019, https://download.sysinternals.com/files/SysinternalsSuite.zip.
[38] R. Rivest, “The MD5 message-digest algorithm,” RFC 1321, IETF, Fremont, CA, USA, 1992.
[39] S. Tatham, “Putty,” 2019, https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html.
[40] L. Pascoe, “MD5,” 2019, http://www.md5summer.org/about.html.
[41] P. Ferrie, “Anti-unpacker tricks - part one,” 2019, https://www.virusbulletin.com/virusbulletin/2008/12/anti-unpacker-tricks-part-one.
[42] T.-Y. Wang and C.-H. Wu, “Detection of packed executables using support vector machines,” in Proceedings of the 10th International Conference on Machine Learning and Cybernetics, pp. 717–722, Guilin, China, July 2011.
[43] D. Korczynski, “RePEconstruct: reconstructing binaries with self-modifying code and Import address table destruction,” in Proceedings of the 11th International Conference on Malicious and Unwanted Software, pp. 31–38, Fajardo, Puerto Rico, October 2016.
dynamic unpacking. Static unpacking is straightforward, since we simply apply the unpacking library found for each packer, so this section focuses on explaining in detail how dynamic unpacking works. Our dynamic unpacking method is also known as the "execute-after-write" method. More precisely, the proposed method first restores the original file in memory by finding the OEP of the packed file and then analyzes (i.e., executes) the restored original file in memory. To avoid infecting real machines, we perform the unpacking and verification phases in debugging mode.
The OEP is the address of the first instruction executed after the packed file is unpacked, and it represents the entry point of the original program. In other words, since the code following the OEP is the starting code of the original program, finding the OEP is the most important issue in dynamic unpacking. To find the OEP, we use the entropy change-based analysis proposed by Bat-Erdene et al. [27]. When a packed file is executed, the packed data is uncompressed and written to memory, and at this time data might be changed or added. Because of this data change, the entropy value changes continuously during the unpacking process. If unpacking is successful, the entropy value stabilizes, the original file is restored, and the IP (instruction pointer) moves to another section of the restored original file for its execution. Thus, we can find the OEP by examining whether the IP moves to another section when there is no change in the entropy value of each section. (If the heuristic of investigating changes of IP sections is known, someone may create a malicious packing method that evades the heuristic. For this evasion, however, the OEP must be reached without a change of IP sections, and packing a PE file in such a way might not be easy. In our actual analysis, we did not find such cases in the 19 packers we covered. Even though such a case hardly exists, it might occur in the worst case. We leave an advanced approach that works against such evasion as further study.) To accomplish this, we need to execute the target process instruction by instruction and monitor the entropy change; however, measuring the entropy value after every instruction is very time-consuming and inefficient. We note here that the OEP generally appears after a branch instruction, so we calculate the entropy value only when the instruction is JMP, Conditional JMP (CJMP), or RETN. We also note that loop iterations during program execution are independent of the OEP, so we store and reuse the entropy values measured after JMP, CJMP, and RETN instructions to avoid duplicated calculations across iterations and reduce the total analysis time.
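The entropy measurement and the reuse of values at branch instructions can be illustrated with a minimal Python sketch. It assumes byte-level Shannon entropy (0 to 8 bits per byte), which the text does not spell out, and the names shannon_entropy, EntropyCache, and read_sections are hypothetical helpers for illustration, not part of the authors' implementation.

```python
import math
from collections import Counter
from typing import Callable, Dict


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    # Sum p * log2(1/p) over the byte values that actually occur.
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())


class EntropyCache:
    """Reuses per-section entropies already measured at a branch address."""

    def __init__(self) -> None:
        self._cache: Dict[int, Dict[str, float]] = {}
        self.computations = 0  # how many times entropy was really computed

    def entropies_at(
        self, branch_addr: int, read_sections: Callable[[], Dict[str, bytes]]
    ) -> Dict[str, float]:
        # Assumption: a branch revisited inside a loop does not need a
        # fresh measurement, so the stored values are returned as-is.
        if branch_addr not in self._cache:
            self.computations += 1
            self._cache[branch_addr] = {
                name: shannon_entropy(data)
                for name, data in read_sections().items()
            }
        return self._cache[branch_addr]
```

In a real debugger script, read_sections would read the current memory image of each section; here it is only a callable supplied by the caller.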
Algorithm 2 shows the proposed dynamic unpacking algorithm based on entropy changes; it returns the OEP address for a given PE file. First, it computes the initial entropy of each section (Line 1). Then, it repeats the following steps until the process ends (Lines 2 to 18). If the current IP address is greater than the last address of the current section, it returns Not-Found (Line 4). This is because the IP has moved to the next section without a JMP-family instruction, so the OEP cannot be found in this section. Otherwise, if the instruction at IP is JMP, CJMP, or RETN, it executes the file up to IP and stores the address of the next instruction in DstAddr (Lines 5 to 7). If IP exists in IP-History or DstAddr exists in DstAddr-History, the instruction or address belongs to an iteration already seen, so we move to the next IP (Line 8). Otherwise, it stores IP and DstAddr in their respective histories for the next iteration (Line 9). At this time, if DstAddr is out of the range of the file, it returns Not-Found because the OEP cannot be found within the file (Line 10). If DstAddr is in the normal range, it calculates the entropy of each section again and compares the current entropy with the previous entropy to check whether it is stable (Lines 11 to 13). If the entropy is stable and DstAddr is in a different section, unpacking has succeeded and the IP is being moved to another section by a JMP-family instruction. Thus, it
[Code-Lock vxx]
signature = 47 8B C2 05 1E 00 52 8B D0 B8 02 3D CD 21 8B D8 5A
ep_only = true
[CodeCrypt v014b]
signature = E9 C5 02 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F
ep_only = true
[CodeCrypt v015b]
signature = E9 31 03 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F
ep_only = true
Figure 2: Examples of packer signatures stored in the signature database.
Input: A PE file
Output: Packed or Not-Packed // packing detection result
begin
(1) if No EP Section then return Packed // DP-NEP
(2) else if Packer Signature Found then return Packed // DP-SIG
(3) else if "WRITE" enabled and EP Section Entropy ∈ Packing-Range then
(4)     return Packed // DP-WR, DP-ENT
(5) else return Not-Packed
(6) end-if
end
ALGORITHM 1: Packing detection.
stops the execution and returns DstAddr as the OEP (Lines 13 and 14).
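The control flow of Algorithm 2 can be illustrated offline with a minimal Python sketch that replays a recorded list of branch events instead of driving a debugger. Everything here is an assumption made for illustration: the Section and BranchEvent records, the find_oep name, and the eps tolerance used to decide that the entropy is "stable" (the text does not give a numeric criterion). The check of Line 4 (IP running past the end of its section) is omitted because the replayed trace contains only branch instructions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Section:
    name: str
    start: int  # first address of the section
    end: int    # one past the last address of the section


@dataclass(frozen=True)
class BranchEvent:
    ip: int                      # address of a JMP, CJMP, or RETN
    dst: int                     # address executed right after the branch
    entropies: Dict[str, float]  # per-section entropy measured at this point


def section_of(addr: int, sections: List[Section]) -> Optional[str]:
    for sec in sections:
        if sec.start <= addr < sec.end:
            return sec.name
    return None  # the address lies outside the file image


def find_oep(
    initial: Dict[str, float],
    events: Iterable[BranchEvent],
    sections: List[Section],
    eps: float = 1e-3,  # hypothetical stability tolerance
) -> Optional[int]:
    prev = dict(initial)
    ip_history, dst_history = set(), set()
    for ev in events:
        # Line 8: skip branches already seen in an earlier loop iteration.
        if ev.ip in ip_history or ev.dst in dst_history:
            continue
        ip_history.add(ev.ip)    # Line 9
        dst_history.add(ev.dst)
        dst_section = section_of(ev.dst, sections)
        if dst_section is None:  # Line 10: destination outside the file
            return None
        # Lines 12-13: compare with the previously measured entropies.
        stable = all(abs(ev.entropies[n] - prev[n]) <= eps for n in prev)
        prev = dict(ev.entropies)
        if stable and dst_section != section_of(ev.ip, sections):
            return ev.dst        # Line 14: candidate OEP
    return None                  # Line 19: Not-Found
```

Returning None plays the role of Not-Found; a live implementation would produce the events by single-stepping the process rather than reading them from a list.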
4.3. Verification Phase. In this section, we present an algorithm for measuring the restoration accuracy by comparing the restored file with its original file. To execute the packed file, the proposed system uses dynamic unpacking and finds the OEP during its execution. If the system finds the OEP, we assume that the restoration of the original file is successful and regard the resulting file as the restored file. In the verification phase, we quantitatively measure the restoration accuracy by comparing this restored file with its original file.
Figure 3 shows an example of comparing some bytecodes of the original file with those of the restored file. For the comparison, we first pack the pspasswd.exe file provided by Microsoft [37] and then unpack it again by dynamic unpacking. The figure shows that the bytecodes of the four sections of the original file, .text, .rdata, .data, and .rsrc, appear identically in the UPX0, UPX1, and .rsrc sections of the restored file. That is, even if the name and location of each section are different, the same bytecodes exist in the restored file. However, the restored file contains padding and garbage values, and some bytecodes may be at locations different from those in the original file. Thus, to compare the original and the restored files, we need to search for where the original bytecodes are located in the restored file. Since manually comparing the bytecodes of the original and restored files takes a very long time, in this paper we propose an efficient search algorithm using a hash function. We use the well-known MD5 [38] as the hash function of the search algorithm. As a message-digest algorithm, MD5 is widely used for checking the integrity of long data.
Algorithm 3 shows the restoration accuracy calculation algorithm. Its basic principle is to check whether each section of the original file exists in the restored file. We use the MD5 hash function to speed up the scan. The inputs to the algorithm are an original file A and a restored file B, and the output is restorationAccuracy, a list storing the restoration accuracy values of all sections. First, we remove garbage and padding values from the restored file (Line 1). For each section Ai of A, we obtain the length curLen of the section and calculate a hash value H1 by applying the MD5 hash function (Lines 4 to 7). We then calculate another hash value H2 from the restored file B by considering only a window of length curLen (Line 9). If H1 equals H2 (Line 10), B contains Ai, so we stop the comparison and calculate the restoration accuracy of the section (Line 14). However, if H1 and H2 are not equal, we perform the hash value comparison again after decrementing the length curLen of Ai (Lines 6 to 13). We repeat this process to calculate the restoration accuracy for section Ai (Line 14). After completing the accuracy calculation for all sections, the algorithm returns the list restorationAccuracy (Line 16).
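The search described above can be sketched in Python with the standard hashlib module. This is a minimal illustration under stated assumptions, not the authors' code: it follows the prose reading in which curLen is decremented only after every window of the restored file has failed to match, it omits the garbage and padding removal of Line 1 because the text does not specify how that step is done, and the function names are hypothetical. It rehashes every window, so it is only practical for small inputs.

```python
import hashlib
from typing import Dict


def md5_digest(data: bytes) -> bytes:
    return hashlib.md5(data).digest()


def section_restoration_accuracy(section: bytes, restored: bytes) -> float:
    """Percentage of the section's leading bytes found contiguously in restored."""
    total_len = len(section)
    cur_len = total_len
    while cur_len > 0:
        h1 = md5_digest(section[:cur_len])  # hash a prefix of the section
        # Slide a window of cur_len bytes over the restored file.
        for j in range(len(restored) - cur_len + 1):
            if md5_digest(restored[j:j + cur_len]) == h1:
                return cur_len / total_len * 100.0
        cur_len -= 1  # no window matched: retry with a shorter prefix
    return 0.0


def restoration_accuracy(sections: Dict[str, bytes], restored: bytes) -> Dict[str, float]:
    return {
        name: section_restoration_accuracy(data, restored)
        for name, data in sections.items()
    }
```

For example, a section whose first half appears in the restored file while the rest differs is reported as 50% restored.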
5. Performance Evaluation
5.1. Experimental Environment. In this section, we present the results of an experimental evaluation to show the superiority of the all-in-one unpacking system. We conduct the experiments on a virtual server as well as a physical server. The reasons for experimenting in two server environments are as follows. First, unpacking results may differ between the virtual server and the physical server if a packed file has anti-unpacking features such as Anti-VM or Anti-Debug. Second, the dynamic unpacking of malware
Input: A PE file
Output: OEP // the start address of the unpacked original file
begin
(1) Measure the entropy value of each section
(2) while the process has not reached the end do
(3)     IP ← the current instruction pointer
(4)     if IP's address > current section's last address then return Not-Found
(5)     else if IP is JMP or CJMP or RETN then
(6)         Execute instructions until IP
(7)         DstAddr ← the next instruction address
(8)         if IP ∉ IP-History and DstAddr ∉ DstAddr-History then
(9)             Store IP and DstAddr into each history
(10)            if DstAddr is out of the file then return Not-Found
(11)            else
(12)                Re-calculate the entropy value of each section
(13)                if Entropy change is stable and DstAddr is in the different section then
(14)                    return DstAddr
(15)                end-if
(16)            end-if
(17)        end-if
(18) end-while
(19) return Not-Found
end
ALGORITHM 2: Dynamic unpacking algorithm.
files may infect the physical server. Thus, we perform unpacking first on the virtual server and then perform the experiment on the physical server only when there is no abnormality. The hardware specifications of the experimental platform are as follows. The virtual server is an Intel Core i7-6700 3.40 GHz with 8 GB RAM, and the physical server is an Intel Core i7-6700 3.40 GHz with 32 GB RAM. The target operating system is 32-bit Windows 7, and we use PEVIEW to view the structure and content of 32-bit PE files. As the debugger, we use Immunity Debugger 1.85 [25] to implement the dynamic unpacking phase of Algorithm 2. To implement the all-in-one unpacking system, including the
Figure 3: Comparison of sample bytecodes of the original and restored files. (a) Sections of the original file (.text, .rdata, .data, and .rsrc). (b) Sections of the restored file (UPX0, UPX1, UPX1, and .rsrc).
Input: A: an original file; B: a restored file
Output: restorationAccuracy[1 : n] // a list of restoration accuracy values for n sections
begin
(1) Remove garbage and padding values from B
(2) A ← {A1, A2, ..., An}, where Ai is the i-th section of A
(3) for each section Ai ∈ A do
(4)     totalLen ← Ai.length
(5)     curLen ← totalLen
(6)     while curLen > 0 do
(7)         H1 ← hash(Ai[1 : curLen]) // hash a part of section Ai
(8)         for j ← 1 to (B.length − curLen + 1) do
(9)             H2 ← hash(B[j : j + curLen − 1]) // hash a part of B
(10)            if H1 = H2 then break
(11)            else curLen ← curLen − 1
(12)        end-for
(13)    end-while
(14)    restorationAccuracy[i] ← (curLen/totalLen) × 100
(15) end-for
(16) return restorationAccuracy
end
ALGORITHM 3: Restoration rate calculation algorithm.
dynamic unpacking algorithm, we run Python scripts on top of Immunity Debugger.
In the experiment, we use a total of 130 PE files, of which 90 files are provided by Microsoft [37] and 40 files are randomly collected from the Putty [39] and MD5 [40] sites. We pack each of the 130 executable files with each of 19 packers and construct a dataset consisting of a total of 2600 files, including the 130 original files (130 × 19 = 2470 packed files plus the 130 originals). The 19 packers used are UPX, ASPack, NSPack, MPRESS, Yoda's Protector, MEW, Packman, RLPack, BeRoEXEPacker, PECompact, Petite, JDpack, Molebox, eXpressor, Yoda's Crypter, FSG, exe32pack, WinUpack, and Neolite. The reason for constructing the dataset by packing normal PE files ourselves is that we can quantitatively measure the packing detection accuracy, the false-positive rate, and the false-negative rate only if we know in advance which experimental files are packed.
In this paper, we conduct three experiments. The first experiment concerns the integration feature of the all-in-one system and verifies that the system supports all three phases of packing detection, unpacking, and verification in an integrated framework. The second experiment concerns the detection phase, where we first set Packing-Range, an entropy range for judging packing, and then evaluate the packing detection accuracy using the range. The third experiment concerns the unpacking and verification phases, where we verify whether the system unpacks the PE file correctly by calculating the restoration accuracy.
5.2. Experimental Results on Overall Working Mechanism. In this section, we confirm through experiments that the integration mechanism of the all-in-one unpacking system works correctly. Figure 4 shows operation screenshots of each phase of the all-in-one system. First, Figure 4(a) shows an example screenshot of the detection phase. To confirm that all four cases of packing detection work correctly, we continue executing the remaining cases even if packing has already been detected by a previous case. In Figure 4(a), we first examine DP-NEP of Algorithm 1. We detect the EP section in DP-NEP, so we proceed to DP-SIG (the first two lines). By the DP-SIG test, we detect the UPX packer, so we conclude that the file is packed (the second two lines). Even though detection is complete, we proceed to the next DP-ENT and DP-WR tests. The entropy value is 7.93, which is contained in Packing-Range, and the attribute value of the EP section is "0xe0000040", which includes the WRITE attribute. Thus, we conclude that the file is also packed by the DP-ENT and DP-WR tests (the last three lines). According to Figure 4(a), we can confirm that the DP-NEP, DP-SIG, DP-ENT, and DP-WR tests of the detection phase all work correctly.
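To see why the attribute value 0xe0000040 includes the WRITE attribute, the value can be decoded against the section characteristic flags defined by the PE format. The sketch below is only an illustration: the flag constants are the standard PE values, while the helper names decode_characteristics and is_writable are hypothetical.

```python
from typing import List

# Section characteristic flags from the PE format (subset).
SECTION_FLAGS = {
    "IMAGE_SCN_CNT_CODE": 0x00000020,
    "IMAGE_SCN_CNT_INITIALIZED_DATA": 0x00000040,
    "IMAGE_SCN_CNT_UNINITIALIZED_DATA": 0x00000080,
    "IMAGE_SCN_MEM_EXECUTE": 0x20000000,
    "IMAGE_SCN_MEM_READ": 0x40000000,
    "IMAGE_SCN_MEM_WRITE": 0x80000000,
}


def decode_characteristics(value: int) -> List[str]:
    """Names of the known flags that are set in a Characteristics value."""
    return [name for name, bit in SECTION_FLAGS.items() if value & bit]


def is_writable(value: int) -> bool:
    """True if the section is mapped writable (the DP-WR condition)."""
    return bool(value & SECTION_FLAGS["IMAGE_SCN_MEM_WRITE"])


print(decode_characteristics(0xE0000040))
print(is_writable(0xE0000040))
```

The value decodes to initialized data plus execute, read, and write permissions. By contrast, a typical code section value such as 0x60000020 (code, execute, read) has no WRITE bit.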
Figures 4(b) and 4(c) show example screenshots of the unpacking phase. If an unpacking library exists for the packer identified in the detection phase, we perform static unpacking as shown in Figure 4(b); if not, we perform dynamic unpacking as shown in Figure 4(c). Figure 4(b) shows the result of static unpacking of a UPX-packed file by the UPX tool [33]. In the figure, the file size increases from 285,760 bytes to 731,200 bytes by unpacking, which means that the compression ratio is 39.08%. Figure 4(c) shows the result of the entropy-based dynamic unpacking. In the figure, the system calculates the entropy of each section whenever IP encounters a JMP or CJMP instruction, and it returns the current DstAddr as the OEP if the entropy change is stable and DstAddr is in a different section.
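The compression ratio quoted for Figure 4(b) can be reproduced with a short calculation, assuming it is defined as the packed size divided by the unpacked size (the definition is not stated explicitly in the text, and the function name is hypothetical).

```python
def compression_ratio_percent(packed_size: int, unpacked_size: int) -> float:
    """Packed size as a percentage of the unpacked size."""
    return packed_size / unpacked_size * 100.0


ratio = compression_ratio_percent(285_760, 731_200)
print(f"{ratio:.2f}%")  # prints "39.08%"
```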
Figure 4(d) shows an example screenshot of the verification phase. In this example, we measure the restoration accuracy of the restored file for the .text and .data sections of the original file. The length of the original .text section is 18,439, and the length of the unpacked file is 872,555. In the hash value comparison, since the portion corresponding to the original .text section exists in the unpacked file, the restoration accuracy becomes 100%. Similarly, we also calculate the restoration accuracy of the .data section as 100%.
5.3. Experimental Results on Detection Phase. In this section, we describe two experimental results of the detection phase. The first experiment sets Packing-Range, and the second measures the detection accuracy of the proposed detection algorithm. In the first experiment, we experimentally determine Packing-Range used in DP-ENT of Algorithm 1. The previous study by Lyda and Hamrock [17] uses the entropy value of the whole file and sets the entropy range of the packed file to [6.677, 6.926]. In contrast, Han and Lee [16] focus on the EP section only and compute the entropy value for the EP section instead of the whole file. Through experiments, they set the entropy threshold to 6.85 and judge a file to be packed if its entropy is over 6.85.
We also determine the entropy range Packing-Range through experiments on various packers. Figure 5 shows the entropy distribution of the EP section for different packers. Figure 5(a) shows the average distribution of the original and 19 packed files. Figures 5(b)–5(m) show the more detailed entropy distribution for each packer. According to Figures 5(b)–5(m), the entropy distribution of the packed file differs depending on the packer type. In particular, as shown in Figures 5(d), 5(e), and 5(k), some packed files have smaller entropy values than the original file, and thus we cannot simply set an upper threshold value as the packing criterion as in the previous study [16]. However, we note that while the entropy of packed files is distributed over a wide range as shown in Figures 5(c)–5(m), the entropy of original files is distributed only in a specific range as shown in Figure 5(b). More specifically, the entropy value of the normal file is between 6.00 and 6.85, as shown in Figure 5(b). Based on these experimental results, we thus set Packing-Range to [0, 6.00) or [6.85, ∞) of the EP section. According to [9, 29], the entropy method has the disadvantages of low accuracy and easy evasion. To overcome this point, we use the three other detection techniques of DP-NEP, DP-SIG, and DP-WR together with DP-ENT in a hybrid manner. Using this hybrid approach, we can further improve the detection accuracy.
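Combining this Packing-Range with the other three tests in the order of Algorithm 1 can be sketched as follows. The function takes the outcome of each test as a plain argument, so how the EP section, signature match, WRITE attribute, and entropy are obtained is left out; the names detect_packing and in_packing_range are hypothetical, and the sketch is an illustration of the decision order rather than the authors' implementation.

```python
from typing import Tuple

LOW_BOUND = 6.00   # entropies below this are treated as packed
HIGH_BOUND = 6.85  # entropies at or above this are treated as packed


def in_packing_range(ep_entropy: float) -> bool:
    """Packing-Range = [0, 6.00) or [6.85, inf) for the EP section."""
    return ep_entropy < LOW_BOUND or ep_entropy >= HIGH_BOUND


def detect_packing(
    has_ep_section: bool,
    signature_found: bool,
    ep_writable: bool,
    ep_entropy: float,
) -> Tuple[str, str]:
    """Returns (verdict, test that decided it), following Algorithm 1."""
    if not has_ep_section:
        return ("Packed", "DP-NEP")
    if signature_found:
        return ("Packed", "DP-SIG")
    if ep_writable and in_packing_range(ep_entropy):
        return ("Packed", "DP-WR, DP-ENT")
    return ("Not-Packed", "")
```

With the values reported for Figure 4(a) but no signature match, detect_packing(True, False, True, 7.93) yields a Packed verdict from the combined WRITE and entropy test, whereas a writable EP section with entropy 6.5 is judged Not-Packed.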
In the second experiment, we evaluate the detection accuracy and the false-negative/false-positive rates. Table 5 compares the three detection techniques of the detection phase with the proposed hybrid approach. Since EP sections
Figure 4: Operation screenshots of the detection, unpacking, and verification phases. (a) Detection phase. (b) Unpacking phase (static unpacking). (c) Unpacking phase (dynamic unpacking). (d) Verification phase.
Figure 5: Entropy distributions of the EP section for original and representative packed files (x-axis: entropy of EP section, from 0 to 8; y-axis: number of files). (a) Average of original and 19 packers, with the series "Not packed (b)" and "Summation of (c) to (m)". (b) Original (not packed). (c) UPX. (d) ASPack. (e) NSPack. (f) MPRESS. (g) Yoda's Protector. (h) RLPack. (i) BeroEXE. (j) MEW. (k) PACKMAN. (l) WinUpack. (m) exe32pack.
are mandatory in all PE files, we can detect EP sections unless they are intentionally concealed. All files of the dataset used in the experiment have EP sections, so the detection rate by DP-NEP is trivially 0%. In other words, no packing is detected due to the absence of the EP section, so we exclude the experimental result of DP-NEP from Table 5.
We now explain the results of Table 5 column by column. First, we explain the DP-SIG column. For DP-SIG, Not-Packed shows a detection rate of 0% since no packer signature exists in normal files. That is, no packing is detected in normal files, which means that DP-SIG produces no false positives for Not-Packed files. On the other hand, for the packed files, DP-SIG detects 100% of the files for all packers except NSPack. The detection rate of NSPack is merely 5.38% because NSPack signatures often exist in sections other than the EP section. In summary, DP-SIG shows a detection accuracy of 95.0% on average.
Second, for the experiment on the detection rate of DP-WR, we intentionally remove the WRITE attribute from 39 files (30% of the packed files) per packer. We perform this removal process for the 18 packers other than exe32pack. The reason for changing the dataset is that packed files normally have the WRITE attribute, so we arbitrarily remove the WRITE attribute from 30% of the files for the DP-WR experiment. Looking at the DP-WR column in Table 5, DP-WR detects only 70% of the files per packer for the 18 packers since we intentionally remove the WRITE attribute from 30% of the files. In the case of exe32pack, however, we need not perform the removal process since it has no WRITE attribute in the PE header. Accordingly, as shown in the DP-WR column of Table 5, the detection rate of exe32pack by DP-WR is trivially 0%. In the case of Not-Packed, the detection rate is 0% since normal files have no WRITE attribute. In summary, checking the presence of the WRITE attribute is very simple, and DP-WR works correctly if the packed file has the WRITE attribute, but it does not work well if the WRITE attribute does not exist or has been maliciously modified.
Third, the DP-ENT column shows the entropy-based detection rates. As shown in the table, DP-ENT achieves a 100% detection rate for 9 packers, including NSPack, UPX, MEW, and BeRoEXEPacker, but it incurs some false negatives for 10 packers, including ASPack, Yoda's Protector, Packman, and MPRESS, for which the detection rate is lower than 100%. In particular, for MPRESS, Petite, JDpack, eXpressor, and Neolite, it shows a low detection rate of 24.6% or less because the EP-section entropy produced by these packers falls within the entropy distribution of normal files. DP-ENT also has a false-positive rate of 3.84% for Not-Packed. In summary, DP-ENT works very accurately for most packers, but it may incur false negatives for a few specific packers or false positives for some normal files. To overcome this limitation, we use DP-ENT together with other detection techniques. We can see the improvement in the experimental results of Tables 5 and 6. In the tables, when using only DP-ENT, the detection accuracy is low for some packers (especially MPRESS), but when using it together with the other three techniques, the accuracy is close to 100%.
Fourth, the proposed hybrid approach of integrating the four techniques detects 100% of the files for all packers except NSPack and shows no false positives. The reason for this high detection rate is that the proposed method has all the advantages of DP-SIG, DP-WR, and DP-ENT, as well as DP-NEP. In the case of NSPack, however, the detection rate is merely 70.0%; this is because, as mentioned above, we forcibly removed the WRITE attribute from 30% of the files. According to the results of Table 5, the detection rate differs for each detection technique, and some techniques
Table 5: Comparison of packing detection rates of three techniques and the proposed hybrid approach.
Packer              DP-SIG (%)   DP-WR (%)   DP-ENT (%)   Hybrid approach (%)
ASPack              100          70.0        91.5         100
NSPack              5.38         70.0        100          70.0
MPRESS              100          70.0        24.6         100
UPX                 100          70.0        100          100
Yoda's Protector    100          70.0        96.9         100
MEW                 100          70.0        100          100
BeRoEXEPacker       100          70.0        100          100
Packman             100          70.0        93.8         100
RLPack              100          70.0        99.3         100
PECompact           100          70.0        100          100
Petite              100          70.0        0.00         100
JDpack              100          70.0        5.62         100
Molebox             100          70.0        100          100
eXpressor           100          70.0        0.00         100
Yoda's Crypter      100          70.0        91.5         100
FSG                 100          70.0        100          100
exe32pack           100          0.00        100          100
WinUpack            100          70.0        100          100
Neolite             100          70.0        1.61         100
Average             95.0         66.3        73.9         98.4
Not-Packed          0.00         0.00        3.84         0.00
incur false positives or false negatives; on the other hand, the proposed hybrid approach shows the highest detection rate of 98.4% and no false positives.
Table 6 shows the detection accuracy of combinations of two or three detection techniques and of the proposed hybrid approach. First, DP-SIG ENT, a combination of the signature and entropy tests, detects 100% of the packed files but detects 18 files incorrectly, showing a false-positive rate of 0.73%. Second, DP-ENT WR, a combination of the entropy and WRITE tests, shows a low detection rate of 92.0% because the entropy value of MPRESS is within the distribution of normal files. Third, DP-SIG ENT WR, a combination of the signature, entropy, and WRITE tests, and our hybrid approach both show 98.4% detection accuracy and a 1.60% false-negative rate. This is because DP-SIG ENT WR and the hybrid approach go through the same detection procedure except for DP-NEP. The false-negative rate of 1.60% is due to the files having no WRITE attribute. According to the results of Tables 5 and 6, we believe that the proposed hybrid approach achieves high detection accuracy by combining the advantages of the existing and new detection techniques.
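The averages reported for the hybrid approach and for DP-WR can be checked with a short calculation. The sketch assumes that the Average row is the unweighted mean of the per-packer detection rates over the 19 packers, which the text does not state explicitly; the function name mean_rate is hypothetical.

```python
from typing import List


def mean_rate(rates: List[float]) -> float:
    """Unweighted mean of per-packer detection rates (in percent)."""
    return sum(rates) / len(rates)


# Hybrid approach: 18 packers at 100% and NSPack at 70.0%.
hybrid = [100.0] * 18 + [70.0]
# DP-WR: 18 packers at 70.0% and exe32pack at 0%.
dp_wr = [70.0] * 18 + [0.0]

print(round(mean_rate(hybrid), 1))          # 98.4
print(round(100.0 - mean_rate(hybrid), 1))  # 1.6 (false-negative rate)
print(round(mean_rate(dp_wr), 1))           # 66.3
```

Under this assumption, the 98.4% detection accuracy and the 1.60% false-negative rate of the hybrid approach both follow from the single 70.0% entry for NSPack.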
5.4. Experimental Results on Unpacking and Verification Phases. In this section, we verify the unpacking and verification phases by performing the unpacking of Algorithm 2 and calculating the restoration accuracy of the restored files with Algorithm 3. However, the unpacking and verification phases take a lot of time, so we experiment with 40 randomly selected files of the dataset. The unpacking phase consists of static unpacking and dynamic unpacking. In the experiment, we exclude static unpacking since it uses the existing library as it is, and we focus only on the dynamic unpacking of Algorithm 2 since it tries to restore the original file without any information about the packers.
Table 7 shows the restoration accuracy of the proposed method for the .text and .data sections for each of the 19 packers. We cannot directly unpack Yoda's Protector and PECompact since they use anti-debugging techniques. To overcome this problem, we unpack files from these packers by bypassing three anti-debugging API calls: GetCurrentProcessId(), BlockInput(), and IsDebuggerPresent() [41]. (That is, we run Algorithm 2 with the bypassing techniques for Yoda's Protector and PECompact and include the experimental results in Table 7.) In general, since only the .text and .data sections of the original file are packed [42], in the experiment we consider only the restoration accuracy of the .text and .data sections of the restored file. In Table 7, among the 19 packers, 11 packers, including UPX, NSPack, MEW, RLPack, and BeRoEXEPacker, show 100% restoration accuracy for both the .text and .data sections. On the other hand, 8 packers, including ASPack, Packman, and MPRESS, show less than 100% restoration accuracy because of the manner in which the packers handle the .reloc section. More specifically, if there is a .reloc section in the original file, the unpacking relocates the .text and .data sections, which changes the data values in each section. Therefore, the restoration accuracy depends on whether there is a .reloc section in the original file.
Table 8 shows the restoration results in detail according to the presence of the .reloc section. As shown in the table, the restoration accuracy is 100% in the absence of the .reloc section, while it ranges widely from 0.00% to 99.8% in the presence of the .reloc section. However, the low restoration accuracy is due to the nature of the packer, and it does not mean that unpacking does not work correctly. In other words, even if the unpacking is successful, the data values may change due to the packer characteristics, and the restoration accuracy is then calculated to be low. In summary, we can use the proposed verification algorithm to calculate the restoration accuracy of unpacked files, which quantitatively measures the accuracy of the actual restoration.
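A small Python sketch can illustrate why relocation lowers the measured accuracy even when unpacking succeeds. It assumes 32-bit little-endian absolute addresses that are patched by adding a load-address delta, in the manner of PE base relocations, and it uses a plain matching-prefix comparison as a stand-in for the hash-based search of Algorithm 3. The function names, offsets, and delta are hypothetical values chosen for illustration.

```python
import struct
from typing import Iterable


def apply_relocations(section: bytes, offsets: Iterable[int], delta: int) -> bytes:
    """Adds delta to each 32-bit little-endian address stored at the given offsets."""
    out = bytearray(section)
    for off in offsets:
        (value,) = struct.unpack_from("<I", out, off)
        struct.pack_into("<I", out, off, (value + delta) & 0xFFFFFFFF)
    return bytes(out)


def matching_prefix_percent(original: bytes, restored: bytes) -> float:
    """Share of the original covered by the longest common prefix, in percent."""
    if not original:
        return 0.0
    matched = 0
    for a, b in zip(original, restored):
        if a != b:
            break
        matched += 1
    return matched / len(original) * 100.0


# 8 filler bytes, one absolute address (0x00401000), then 4 more filler bytes.
original = b"\x90" * 8 + struct.pack("<I", 0x00401000) + b"\x90" * 4
relocated = apply_relocations(original, [8], 0x10000)
print(matching_prefix_percent(original, relocated))  # 62.5
```

A single patched address is enough to cut the match short, so the position of the first relocated address in a section largely determines the reported accuracy.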
In addition to the .text, .data, and .reloc sections, we further consider the PE header, which includes the IAT information. In general, the IAT is located in the header of the PE file, and if we actually need to run the unpacked file, we must also restore this header information, including the IAT. In fact, the studies of [3, 43] restore the header including the IAT and produce the actual whole executable file. On the other hand, most studies, including this paper, do not restore the actual whole executable file itself but restore the .text, .data, and .reloc sections, which are the major parts of the executable file. This is because the purpose of unpacking is not to execute the restored file but to identify and extract malicious code included in the restored file. Therefore, in this paper, we do not need to restore the header required for actual execution, and accordingly, comparison of IAT information is also not necessary. Instead, we restore and compare the .text, .data, and .reloc sections, which are more likely to hide malicious code. Restoring the PE header is beyond the scope of this study; we leave it as a separate research topic.
6. Conclusions
In this paper, we analyzed recent studies of packing detection and unpacking techniques and identified four important drawbacks: no phase integration, no detection combination, no real-restoration, and no unpacking verification. We then proposed an all-in-one unpacking system to resolve these four drawbacks. First, no phase integration meant that analysts had to perform the packing detection, unpacking, and verification phases separately, and we solved this problem by presenting an all-in-one
Table 6: Comparison of packing detection accuracy of two or three combinations and of the hybrid approach.

Metrics               DP-SIG + ENT (%)   DP-ENT + WR (%)   DP-SIG + ENT + WR (%)   Hybrid approach (%)
Detection accuracy    95.0               92.0              98.4                    98.4
False-negative rate   5.00               8.00              1.60                    1.60
False-positive rate   0.73               0.00              0.00                    0.00
Security and Communication Networks 13
Table 7: Restoration accuracy of .text and .data sections of each packer.

Packer             .text (%)   .data (%)
UPX                100         100
NSPack             100         100
MEW                100         100
RLPack             100         100
BeRoEXE            100         100
Neolite            100         100
FSG                100         100
eXpressor          100         100
Molebox            100         100
Petite             100         100
JDpack             100         100
ASPack             84.3        84.3
Yoda's protector   84.0        84.0
exe32pack          80.8        100
MPRESS             78.8        75.3
Yoda's crypter     74.5        74.5
PECompact          74.0        74.0
WinUpack           68.0        68.0
Packman            55.0        48.5
Average            89.2        90.2
unpacking system that integrates these three phases in a single framework. Second, no detection combination meant that there had been no attempt to combine various packing detection methods, and we solved this problem by presenting a hybrid approach that combines existing and new packing detection methods. Third, no real-restoration meant that there had been little discussion about restoring the actual executable file, and we solved this problem by actually restoring an original image of the packed file. Fourth, no unpacking verification meant that there had been no explicit work to measure the restoration accuracy of unpacked files, and we solved this problem by proposing a verification algorithm that quantitatively evaluates the restoration accuracy.
We constructed a dataset of 2,600 PE files and experimentally evaluated the proposed system using the dataset. First, we performed all the phases of the all-in-one unpacking system sequentially and confirmed that its overall mechanism worked well. Next, through the comparison of the hybrid packing detection approach and existing detection techniques, we showed that the detection accuracy of the proposed method was the highest at 98.4% on average with no false positives. For this, we also determined an entropy range of packing, Packing-Range, through extensive experiments on 19 representative packers. Finally, in the unpacking and verification phases, we confirmed that unpacking was successful for all files except the files packed with Yoda's Protector. We also showed that the restoration accuracy of unpacked files was nearly 100% except for a few packers.
As future research, we will focus on unpacking files packed with Anti-VM, Anti-Debug, and Anti-Reverse engineering techniques. We will also study how to compute the restoration accuracy of files having .reloc sections.
Data Availability
All the data files used in the experiments are available at https://github.com/chesvectain/PackingData.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partly supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (no. 2017-0-00158, Development of Cyber Threat Intelligence (CTI) analysis and information sharing technology for national cyber incident response) and the Korea Electric Power Corporation (no. R18XA05).
References
[1] M. Morgenstern, "AV-TEST Security report 2017/18," 2019, https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2017-2018.pdf.
[2] T. Ban, R. Isawa, S. Guo, D. Inoue, and K. Nakao, "Efficient malware packer identification using support vector machines with spectrum kernel," in Proceedings of the Eighth Asia Joint Conference on Information Security, pp. 69–76, Seoul, South Korea, July 2013.
[3] B. Cheng, M. Jiang, J. Fu et al., "Towards paving the way for large-scale Windows malware analysis: generic binary unpacking with orders-of-magnitude performance boost," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 395–411, Toronto, Canada, October 2018.
[4] A. Gupta and M. S. Arya, "Hashing based encryption and anti-debugger support for packing multiple files into single executable," International Journal of Advanced Research in Computer Science, vol. 9, no. 1, pp. 914–920, 2018.
[5] Y. Kawakoya, M. Iwamura, and M. Itoh, "Memory behavior-based automatic malware unpacking in stealth debugging environment," in Proceedings of the 5th International Conference on Malicious and Unwanted Software, pp. 39–46, Lorraine, France, October 2010.
[6] G. Liang, J. Pang, Z. Shan, R. Yang, and Y. Chen, "Automatic benchmark generation framework for malware detection," Security and Communication Networks, vol. 2018, Article ID 4947695, 8 pages, 2018.
[7] A. Pektas and T. Acarman, "A dynamic malware analyzer against virtual machine aware malicious software," Security and Communication Networks, vol. 7, no. 12, pp. 2245–2257, 2014.
[8] I. Santos, J. Nieves, and P. G. Bringas, "Semi-supervised learning for unknown malware detection," in International Symposium on Distributed Computing and Artificial Intelligence, A. Abraham, J. M. Corchado, S. R. Gonzalez, and J. F. De Paz Santana, Eds., pp. 415–422, Springer, Berlin, Germany, 2011.
[9] X. Ugarte-Pedrero, I. Santos, B. Sanz, C. Laorden, and P. G. Bringas, "Countering entropy measure attacks on packed software detection," in Proceedings of the 9th International Conference on Consumer Communications and Networking, pp. 164–168, Las Vegas, NV, USA, January 2012.
[10] H. Won, S.-P. Kim, S. Lee, M.-J. Choi, and Y.-S. Moon, "Secure principal component analysis in multiple distributed nodes," Security and Communication Networks, vol. 9, no. 14, pp. 2348–2358, 2016.
[11] S. Cesare, Y. Xiang, and W. Zhou, "Malwise–an effective and efficient classification system for packed and polymorphic malware," IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193–1206, 2013.

Table 8: Restoration accuracy according to the presence of the .reloc section.

Packer             .text, with .reloc (%)   .text, without .reloc (%)   .data, with .reloc (%)   .data, without .reloc (%)
ASPack             0.00                     100                         0.00                     100
Packman            99.8                     100                         88.1                     100
MPRESS             0.00                     100                         0.00                     100
Yoda's crypter     0.00                     100                         0.04                     100
PECompact          0.00                     100                         0.00                     100
Yoda's Protector   0.00                     100                         0.05                     100
exe32pack          0.00                     100                         100                      100
WinUpack           0.00                     100                         0.00                     100

[12] C. Malin, E. Casey, and J. Aquilina, Malware Forensics: Investigating and Analyzing Malicious Code, Syngress, Burlington, MA, USA, 2008.
[13] W. Yan, Z. Zheng, and A. Nirwan, "Revealing packed malware," IEEE Security and Privacy, vol. 6, no. 5, pp. 65–69, 2008.
[14] M. Bat-Erdene, T. Kim, H. Park, and H. Lee, "Packer detection for multi-layer executables using entropy analysis," Entropy, vol. 19, no. 3, p. 125, 2017.
[15] Y.-S. Choi, I.-K. Kim, J.-T. Oh, and J.-C. Ryou, "PE file header analysis-based packed PE file detection technique (PHAD)," in Proceedings of International Symposium on Computer Science and Its Applications, pp. 28–31, Hobart, Australia, October 2008.
[16] S. Han and S. Lee, "Packed PE file detection for malware forensics," The KIPS Transactions: Part C, vol. 16C, no. 5, pp. 555–562, 2009, in Korean.
[17] R. Lyda and J. Hamrock, "Using entropy analysis to find encrypted and packed malware," IEEE Security and Privacy, vol. 5, no. 2, pp. 40–45, 2007.
[18] R. Perdisci, A. Lanzi, and W. Lee, "Classification of packed executables for accurate computer virus detection," Pattern Recognition Letters, vol. 29, no. 14, pp. 1941–1946, 2008.
[19] A. Rohit, A. Singh, H. Pareek, and U. Edara, "A heuristics-based static analysis approach for detecting packed PE binaries," International Journal of Security and Its Applications, vol. 7, no. 5, pp. 257–268, 2013.
[20] G. Amato, "PEframe," 2019, https://github.com/guelfoweb/peframe.
[21] G. Jeong, E. Choo, J. Lee, M. Bat-Erdene, and H. Lee, "Generic unpacking using entropy analysis," Journal of Advanced Information Technology and Convergence, vol. 7, no. 1, pp. 232–238, 2009.
[22] V. DeMarines, "Obfuscation–how to do it and how to crack it," Network Security, vol. 2008, no. 7, pp. 4–7, 2008.
[23] H. Neil, "PEiD," 2019, https://github.com/wolfram77web/app-peid.
[24] O. Yuschuk, "OllyDbg," 2019, http://www.ollydbg.de/download.html.
[25] N. Waisman, "Immunity Debugger," 2019, https://www.immunityinc.com/products/debugger/index.html.
[26] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee, "PolyUnpack: automating the hidden-code extraction of unpack-executing malware," in Proceedings of the 22nd Annual Computer Security Applications Conference, pp. 289–300, Miami Beach, FL, USA, December 2006.
[27] M. Bat-Erdene, Y. Park, H. Li, H. Lee, and M.-S. Choi, "Entropy analysis to classify unknown packing algorithms for malware detection," International Journal of Information Security, vol. 16, no. 3, pp. 227–248, 2017.
[28] S. Cesare and X. Yang, "Classification of malware using structured control flow," in Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing, pp. 61–70, Brisbane, Australia, January 2010.
[29] L. Martignoni, M. Christodorescu, and S. Jha, "OmniUnpack: fast, generic, and safe unpacking of malware," in Proceedings of the 23rd Annual Computer Security Applications Conference, pp. 431–441, Miami Beach, FL, USA, December 2007.
[30] M. G. Kang, P. Poosankam, and H. Yin, "Renovo: a hidden code extractor for packed executables," in Proceedings of the 5th ACM Workshop on Recurring Malcode, pp. 46–53, Alexandria, VA, USA, November 2007.
[31] S. Mariani, L. Fontana, F. Gritti, and S. D'Alessio, "PinDemonium: a DBI-based generic unpacker for windows executables," in Proceedings of the Black Hat 2016, Las Vegas, NV, USA, July 2016.
[32] G. Bonfante, J. Fernandez, J.-Y. Marion, B. Rouxel, F. Sabatier, and A. Thierry, "CoDisasm: medium scale concatic disassembly of self-modifying binaries with overlapping instructions," in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 745–756, Denver, CO, USA, October 2016.
[33] M. Oberhumer, L. Molnar, and J. Reiser, "UPX," 2019, https://upx.github.io.
[34] A. Swinnen and A. Mesbahi, "One packer to rule them all: empirical identification, comparison, and circumvention of current antivirus detection techniques," in Proceedings of the Black Hat 2014, Las Vegas, NV, USA, August 2014.
[35] P. Bania, "Generic unpacking of self-modifying, aggressive, packed binary programs," 2009, https://arxiv.org/abs/0905.4581.
[36] H. Neil, "BobSoft Signature," 2019, http://woodmann.com/BobSoft/Files/Other/UserDB.zip.
[37] M. Russinovich, "Microsoft Sysinternals Suite," 2019, https://download.sysinternals.com/files/SysinternalsSuite.zip.
[38] R. Rivest, "The MD5 message-digest algorithm," RFC 1321, IETF, Fremont, CA, USA, 1992.
[39] S. Tatham, "Putty," 2019, https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html.
[40] L. Pascoe, "MD5," 2019, http://www.md5summer.org/about.html.
[41] P. Ferrie, "Anti-unpacker tricks - part one," 2019, https://www.virusbulletin.com/virusbulletin/2008/12/anti-unpacker-tricks-part-one.
[42] T.-Y. Wang and C.-H. Wu, "Detection of packed executables using support vector machines," in Proceedings of the 10th International Conference on Machine Learning and Cybernetics, pp. 717–722, Guilin, China, July 2011.
[43] D. Korczynski, "RePEconstruct: reconstructing binaries with self-modifying code and Import address table destruction," in Proceedings of the 11th International Conference on Malicious and Unwanted Software, pp. 31–38, Fajardo, Puerto Rico, October 2016.
stops the execution and returns DstAddr as OEP (Lines 13 and 14).
4.3. Verification Phase. In this section, we present an algorithm for measuring the restoration accuracy by comparing the restored file with its original file. To execute the packed file, the proposed system uses dynamic unpacking and finds the OEP during its execution. If the system finds the OEP, we assume the restoration of the original file is successful and regard that file as the restored file. In the verification phase, we quantitatively measure the restoration accuracy by comparing this restored file with its original file.
Figure 3 shows an example of comparing some bytecodes of the original file with those of the restored file. For the comparison, we first pack the pspasswd.exe file provided by Microsoft [37] and then unpack it again by dynamic unpacking. The figure shows that the bytecodes of the four sections of the original file, .text, .rdata, .data, and .rsrc, appear identically in the UPX0, UPX1, and .rsrc sections of the restored file. That is, even if the name and location of each section are different, the same bytecodes exist in the restored file. However, the restored file contains padding and garbage values, and some bytecodes may be at locations different from those in the original file. Thus, in order to compare the original and the restored files, we need to search where the original bytecodes are located in the restored file. Since manually comparing the bytecodes of the original and restored files takes a very long time, in this paper, we propose an efficient search algorithm using a hash function. We use the well-known MD5 [38] as the hash function of the search algorithm. As a message-digest algorithm, MD5 is widely used in checking the integrity of long data.
Algorithm 3 shows the restoration accuracy calculation algorithm. Its basic principle is to check whether each section of the original file exists in the restored file. We here use the MD5 hash function to speed up the scan. The inputs to the algorithm are an original file A and a restored file B, and the output is restorationAccuracy, a list storing the restoration accuracy values of all sections. First, we remove garbage and padding values from the restored file (Line 1). For each section Ai of A, we obtain the length curLen of the section and calculate a hash value H1 by applying the MD5 hash function (Lines 4 to 7). We then calculate another hash value H2 from the restored file B by considering only a part of length curLen (Line 9). If H1 equals H2 (Line 10), it means that B contains the first curLen bytes of Ai, so we stop the comparison and calculate the restoration accuracy of the section (Line 14). However, if H1 and H2 are not equal, we perform the hash value comparison again by decrementing the length curLen of Ai (Lines 6 to 13). We repeat this process to calculate the restoration accuracy for the section Ai (Line 14). After completing the accuracy calculation for all sections, the algorithm returns the list restorationAccuracy (Line 16).
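The search described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names (section_accuracy, restoration_accuracy) and the toy byte strings are hypothetical, the garbage and padding removal of Line 1 is assumed to have been done beforehand, and the loop follows the intent of the description, shrinking curLen only after every offset of B has been tried.

```python
import hashlib
from typing import Dict


def md5(data: bytes) -> bytes:
    """Return the MD5 digest of a byte string."""
    return hashlib.md5(data).digest()


def section_accuracy(section: bytes, restored: bytes) -> float:
    """Percentage of the section (longest prefix) found contiguously in restored."""
    total_len = len(section)
    if total_len == 0:
        return 0.0
    # Try the whole section first, then progressively shorter prefixes.
    for cur_len in range(total_len, 0, -1):
        h1 = md5(section[:cur_len])
        # Slide a window of cur_len bytes over the restored file.
        for j in range(len(restored) - cur_len + 1):
            if md5(restored[j:j + cur_len]) == h1:
                return cur_len / total_len * 100.0
    return 0.0


def restoration_accuracy(original: Dict[str, bytes], restored: bytes) -> Dict[str, float]:
    """Map each original section name to its restoration accuracy in percent."""
    return {name: section_accuracy(data, restored) for name, data in original.items()}


# Hypothetical toy data: two tiny "sections" and a restored image with padding.
original_sections = {
    ".text": b"\x55\x8b\xec\x8b\x4d",
    ".data": b"\xd0\x41\x41\x00",
}
restored_image = b"\x00\x00" + b"\x55\x8b\xec\x8b\x4d" + b"\xcc" + b"\xd0\x41\x41\xff"
result = restoration_accuracy(original_sections, restored_image)
```

With these toy inputs, the .text bytes are found in full, whereas only the first three of the four .data bytes are found, so the sketch reports 100% and 75%. Rehashing every window for every candidate length is expensive, so this form is only suitable for small inputs.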
5. Performance Evaluation
5.1. Experimental Environment. In this section, we present the results of experimental evaluation to show the superiority of the all-in-one unpacking system. We conduct the experiments in a virtual server as well as a physical server. The reasons for experimenting in two server environments are as follows. First, unpacking results may differ between the virtual server and the physical server if a packed file has anti-unpacking features such as Anti-VM or Anti-Debug. Second, the dynamic unpacking of malware
Input: a PE file
Output: OEP, the start address of the unpacked original file
begin
(1)  Measure the entropy value of each section
(2)  while the process has not reached the end do
(3)    IP ⟵ the current instruction pointer
(4)    if IP's address > current section's last address then return Not-Found
(5)    else if IP is JMP or CJMP or RETN then
(6)      Execute instructions until IP
(7)      DstAddr ⟵ the next instruction address
(8)      if IP ∉ IP-History and DstAddr ∉ DstAddr-History then
(9)        Store IP and DstAddr into each history
(10)       if DstAddr is out of the file then return Not-Found
(11)       else
(12)         Re-calculate the entropy value of each section
(13)         if Entropy change is stable and DstAddr is in the different section then
(14)           return DstAddr
(15)         end-if
(16)       end-if
(17)     end-if
(18) end-while
(19) return Not-Found
end
Algorithm 2: Dynamic unpacking algorithm.
files may infect the physical server. Thus, we perform unpacking first in the virtual server and then perform the experiment in the physical server only when there is no abnormality. The hardware specifications of the experimental platform are as follows. The virtual server is an Intel Core i7-6700 3.40 GHz with 8 GB RAM, and the physical server is an Intel Core i7-6700 3.40 GHz with 32 GB RAM. The target operating system is 32-bit Windows 7, and we use PEVIEW to view the structure and content of 32-bit PE files. As the debugger, we use Immunity Debugger 1.85 [25] to implement the dynamic unpacking phase of Algorithm 2. To implement the all-in-one unpacking system including the
Figure 3: Comparison of sample bytecodes of the original and restored files. (a) Sections of the original file (.text, .rdata, .data, and .rsrc). (b) Sections of the restored file (UPX0, UPX1, UPX1, and .rsrc).
Input: A, an original file; B, a restored file
Output: restorationAccuracy[1..n], a list of restoration accuracy values for n sections
begin
(1)  Remove garbage and padding values from B
(2)  A ⟵ {A1, A2, ..., An}, where Ai is the i-th section of A
(3)  for each section Ai ∈ A do
(4)    totalLen ⟵ Ai.length
(5)    curLen ⟵ totalLen
(6)    while curLen > 0 do
(7)      H1 ⟵ hash(Ai[1 : curLen])  // hash a part of section Ai
(8)      for j ⟵ 1 to (B.length − curLen + 1) do
(9)        H2 ⟵ hash(B[j : j + curLen − 1])  // hash a part of B
(10)       if H1 = H2 then break
(11)       else curLen ⟵ curLen − 1
(12)     end-for
(13)   end-while
(14)   restorationAccuracy[i] ⟵ (curLen / totalLen) × 100
(15) end-for
(16) return restorationAccuracy
end
Algorithm 3: Restoration rate calculation algorithm.
dynamic unpacking algorithm, we run Python scripts on top of Immunity Debugger.
In the experiment, we use a total of 130 PE files, where 90 files are provided by Microsoft [37] and 40 files are randomly collected from the putty [39] and MD5 [40] sites. We pack each of the 130 executable files using 19 packers and construct a dataset consisting of a total of 2,600 files: 2,470 packed files and the original 130 files. The 19 packers used are UPX, ASPack, NSPack, MPRESS, Yoda's Protector, MEW, Packman, RLPack, BeRoEXEPacker, PECompact, Petite, JDpack, Molebox, eXpressor, Yoda's Crypter, FSG, exe32pack, WinUpack, and Neolite. The reason for constructing the dataset by packing normal PE files directly is that we can quantitatively measure the packing detection accuracy, the false-positive rate, and the false-negative rate only if we know in advance which experimental files are packed.
In this paper, we conduct three experiments. The first experiment is for the integration feature of the all-in-one system, which verifies that the system supports all three phases of packing detection, unpacking, and verification in an integrated framework. The second experiment is for the detection phase, where we first set Packing-Range, an entropy range for judging the packing, and then evaluate the packing detection accuracy using the range. The third experiment is for the unpacking and verification phases, where we verify whether the system unpacks the PE file correctly by calculating the restoration accuracy.
5.2. Experimental Results on Overall Working Mechanism. In this section, we confirm through experiments that the integration mechanism of the all-in-one unpacking system works correctly. Figure 4 shows the operation screenshot of each phase of the all-in-one system. First, Figure 4(a) shows an example screenshot of the detection phase. To confirm that all four cases of packing detection work correctly, even if we detect the packing by the previous case(s), we continue executing the remaining cases. In Figure 4(a), we first examine DP-NEP of Algorithm 1. We detect the EP section in DP-NEP, so we proceed to DP-SIG (the first two lines). By the DP-SIG test, we detect the UPX packer, so we conclude that the file is packed (the second two lines). Even though detection is completed, we proceed to the next DP-ENT and DP-WR tests. The entropy value is 7.93, which is contained in Packing-Range, and the attribute value of the EP section is "0xe0000040," which includes the WRITE attribute. Thus, we conclude that the file is also packed by the DP-ENT and DP-WR tests (the third three lines). According to Figure 4(a), we can confirm that the DP-NEP, DP-SIG, DP-ENT, and DP-WR tests of the detection phase all work correctly.
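To make the DP-WR observation concrete, the following short Python sketch decodes the attribute value 0xe0000040 reported above. The flag constants are the section characteristics defined by the PE/COFF format; the helper name has_write_attribute is hypothetical and is not part of the proposed system.

```python
# Section characteristic flags as defined by the PE/COFF format.
IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_CNT_INITIALIZED_DATA = 0x00000040
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000


def has_write_attribute(characteristics: int) -> bool:
    """Return True if the section is marked writable (the WRITE attribute)."""
    return bool(characteristics & IMAGE_SCN_MEM_WRITE)


# The EP section value from Figure 4(a): execute | read | write | initialized data.
packed_ep = 0xE0000040
# A typical non-writable code section: code | execute | read.
plain_code = IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ

packed_is_writable = has_write_attribute(packed_ep)
plain_is_writable = has_write_attribute(plain_code)
```

A writable and executable entry-point section is what allows an unpacking stub to overwrite code at run time, which is why the WRITE flag serves as a packing indicator here.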
Figures 4(b) and 4(c) show example screenshots of the unpacking phase. If an unpacking library exists for the packer detected in the detection phase, we perform static unpacking as shown in Figure 4(b); if not, we perform dynamic unpacking as shown in Figure 4(c). Figure 4(b) shows the result of static unpacking by the UPX tool [33] for the UPX-packed file. In the figure, the file size increases from 285,760 bytes to 731,200 bytes by unpacking, which means that the compression ratio is 39.08% (285,760/731,200). Figure 4(c) shows the result of the entropy-based dynamic unpacking. In the figure, the system calculates the entropy of each section whenever IP encounters a JMP or CJMP instruction; it returns the current DstAddr as the OEP if the entropy change is stable and DstAddr is in a different section.
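The OEP decision rule of the dynamic unpacking phase can be illustrated offline on a recorded branch trace. The Python sketch below is only a hedged illustration: it does not drive a debugger, the names BranchEvent, section_of, and find_oep are hypothetical, and "stable" is assumed to mean that no section entropy changed by more than a small tolerance eps since the previous measurement, which is our assumption rather than a definition given in the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class BranchEvent:
    ip: int                      # address of the JMP/CJMP/RETN instruction
    dst: int                     # branch target (DstAddr)
    entropies: Dict[str, float]  # section entropies re-measured at this branch


def section_of(addr: int, layout: Dict[str, Tuple[int, int]]) -> Optional[str]:
    """Return the name of the section containing addr, or None if outside the file."""
    for name, (start, end) in layout.items():
        if start <= addr < end:
            return name
    return None


def find_oep(trace: List[BranchEvent], layout: Dict[str, Tuple[int, int]],
             initial: Dict[str, float], eps: float = 0.01) -> Optional[int]:
    """Return the first branch target reached with stable entropy in another section."""
    seen_ip, seen_dst = set(), set()
    previous = initial
    for ev in trace:
        # Only branches whose source and target are both new are examined.
        if ev.ip in seen_ip or ev.dst in seen_dst:
            continue
        seen_ip.add(ev.ip)
        seen_dst.add(ev.dst)
        dst_section = section_of(ev.dst, layout)
        if dst_section is None:
            return None  # target lies outside the file: Not-Found
        stable = all(abs(ev.entropies[s] - previous[s]) <= eps for s in layout)
        previous = ev.entropies
        if stable and dst_section != section_of(ev.ip, layout):
            return ev.dst
    return None


# Hypothetical trace: UPX1 holds the stub, UPX0 is filled while unpacking.
layout = {"UPX0": (0x401000, 0x410000), "UPX1": (0x410000, 0x420000)}
initial = {"UPX0": 0.0, "UPX1": 7.8}
trace = [
    BranchEvent(0x41F000, 0x41F010, {"UPX0": 3.2, "UPX1": 7.8}),
    BranchEvent(0x41F020, 0x401500, {"UPX0": 6.1, "UPX1": 7.8}),
    BranchEvent(0x41F040, 0x41F050, {"UPX0": 6.4, "UPX1": 7.8}),
    BranchEvent(0x41F060, 0x401234, {"UPX0": 6.4, "UPX1": 7.8}),
]
oep = find_oep(trace, layout, initial)
```

In this toy trace, the second branch already crosses into UPX0 but is rejected because the entropy of UPX0 is still rising; only the last branch satisfies both conditions, so the sketch selects 0x401234.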
Figure 4(d) shows an example screenshot of the verification phase. In this example, we measure the restoration accuracy of the restored file with respect to the .text and .data sections of the original file. The length of the original .text section is 18,439, and the length of the unpacked file is 872,555. In the hash value comparison, since the portion corresponding to the original .text section exists in the unpacked file, the restoration accuracy becomes 100%. Similarly, we also calculate the restoration accuracy of the .data section as 100%.
5.3. Experimental Results on Detection Phase. In this section, we describe two experimental results of the detection phase. The first experiment is to set Packing-Range, and the second is to measure the detection accuracy of the proposed detection algorithm. In the first experiment, we experimentally determine Packing-Range used in DP-ENT of Algorithm 1. The previous study by Lyda and Hamrock [17] uses the entropy value of the whole file and sets the entropy range of the packed file to [6.677, 6.926]. In contrast, Han and Lee [16] focus on the EP section only and compute the entropy value for the EP section instead of the whole file. Through the experiment, they set the entropy threshold to 6.85 and judge a file as packed if its entropy is over 6.85.
We also determine the entropy range, Packing-Range, through experiments on various packers. Figure 5 shows the entropy distribution of the EP section for different packers. Figure 5(a) shows the average distribution of the original and 19 packed files. Figures 5(b)–5(m) show the more detailed entropy distribution for each packer. According to Figures 5(b)–5(m), the entropy distribution of the packed file differs depending on the packer type. In particular, as shown in Figures 5(d), 5(e), and 5(k), some packed files have smaller entropy values than the original file, and thus, we cannot simply set an upper threshold value as a packed criterion as in the previous study [16]. However, we note that while the entropy of packed files is distributed over a wide range as shown in Figures 5(c)–5(m), the entropy of original files is distributed only in a specific range as shown in Figure 5(b). More specifically, the entropy value of the normal file is between 6.00 and 6.85 as shown in Figure 5(b). Based on these experimental results, we thus set Packing-Range to [0, 6.00) or [6.85, ∞) of the EP section. According to [9, 29], the entropy method has the disadvantages of low accuracy and easy evasion. To overcome this point, we use three other detection techniques, DP-NEP, DP-SIG, and DP-WR, together with DP-ENT in a hybrid manner. Using this hybrid approach, we can further improve the detection accuracy.
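As an illustration of the DP-ENT test, the following Python sketch computes the byte-level Shannon entropy of a buffer and checks it against Packing-Range. It is a minimal sketch under assumptions: the function names shannon_entropy and in_packing_range are hypothetical, and extracting the EP section bytes from a PE file is assumed to happen elsewhere.

```python
import math
from collections import Counter

# Entropy interval observed for normal (not packed) EP sections in Figure 5(b).
NORMAL_LOW, NORMAL_HIGH = 6.00, 6.85


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def in_packing_range(entropy: float) -> bool:
    """True if entropy lies in Packing-Range, i.e., [0, 6.00) or [6.85, inf)."""
    return entropy < NORMAL_LOW or entropy >= NORMAL_HIGH


uniform = bytes(range(256))   # every byte value once: maximal entropy
constant = b"\x00" * 64       # a single repeated value: zero entropy
uniform_entropy = shannon_entropy(uniform)
constant_entropy = shannon_entropy(constant)
```

For example, the value 7.93 from Figure 4(a) falls in the upper part of Packing-Range, while a very low entropy falls in the lower part, which is how this range also flags packers whose EP sections have smaller entropy than normal files.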
In the second experiment, we evaluate the detection accuracy and the false-negative/false-positive rates. Table 5 compares the three detection techniques of the detection phase with the proposed hybrid approach. Since EP sections
Figure 4: Operation screenshots of detection, unpacking, and verification phases. (a) Detection phase. (b) Unpacking phase (static unpacking). (c) Unpacking phase (dynamic unpacking). (d) Verification phase.
Figure 5: Entropy distributions of the EP section for original and representative packed files (x-axis: entropy of EP section, from 0 to 8; y-axis: number of files). (a) Average of original and 19 packers (series: not packed (b); summation of (c) to (m)). (b) Original (not packed). (c) UPX. (d) ASPack. (e) NSPack. (f) MPRESS. (g) Yoda's Protector. (h) RLPack. (i) BeroEXE. (j) MEW. (k) PACKMAN. (l) WinUpack. (m) exe32pack.
are mandatory in all PE files, we can detect EP sections unless they are intentionally concealed. All files of the dataset used in the experiment also have EP sections, and the detection rate by DP-NEP is trivially 0%. In other words, there is no packing detection case due to the absence of the EP section, so we exclude the experimental result of DP-NEP from Table 5.
We now explain the results of Table 5 column by column. First, we explain the column of DP-SIG. In the case of DP-SIG, Not-Packed shows a detection rate of 0% since no packer signature exists in normal files. That is, no packing detection occurs in normal files, which means that DP-SIG performs an accurate detection for Not-Packed files. On the other hand, for the packed files, DP-SIG detects 100% of the files for all packers except NSPack. The detection rate of NSPack is merely 53.8% because of the nature of NSPack that the signatures often exist in sections other than the EP section. In summary, DP-SIG shows 95.0% detection accuracy on average.
Second, for the experiment on the detection rate of DP-WR, we intentionally remove the WRITE attribute from 39 files (30% of the packed files) per packer. We perform this removal process for 18 packers, excluding exe32pack. The reason for changing the dataset is that normally packed files usually have WRITE attributes, so we arbitrarily remove the WRITE attribute from 30% of the files for the DP-WR experiment. Looking at the DP-WR column in Table 5, DP-WR detects only 70% of the files per packer for the 18 packers since we intentionally remove the WRITE attribute from 30% of the files. In the case of exe32pack, however, we need not perform the removal process since it has no WRITE attribute in the PE header. Accordingly, as shown in the DP-WR column of Table 5, the detection rate of exe32pack by DP-WR is trivially 0%. In the case of Not-Packed, the detection rate is 0% since normal files have no WRITE attribute. In summary, checking the presence of the WRITE attribute is very simple, and DP-WR works correctly if the packed file has the WRITE attribute, but it does not work well if the WRITE attribute does not exist or is maliciously modified.
Third, the column of DP-ENT shows the entropy-based detection rates. As shown in the table, DP-ENT achieves a 100% detection rate for 9 packers including NSPack, UPX, MEW, and BeRoEXEPacker, but it incurs some false negatives for 10 packers including ASPack, Yoda's Protector, Packman, and MPRESS, for which the detection rate is lower than 100%. In particular, for MPRESS, Petite, JDpack, eXpressor, and Neolite, it shows a low detection rate below 24.6% because the entropy values produced by these packers fall within the entropy distribution of normal files. DP-ENT also has a false-positive rate of 3.84% for Not-Packed. In summary, DP-ENT works very accurately for most packers, but it may incur false negatives for a few specific packers or false positives for some normal files. To overcome this limitation, we use DP-ENT together with other detection techniques. We can see the improvement through the experimental results of Tables 5 and 6. In the tables, when using only DP-ENT, the detection accuracy is low for some packers (especially MPRESS), but when using it together with the other three techniques, the accuracy is close to 100%.
Fourth, the proposed hybrid approach of integrating four techniques detects 100% of the files for all packers except NSPack and shows no false positives. The reason for this high detection rate is that the proposed method has all the advantages of DP-SIG, DP-WR, and DP-ENT as well as DP-NEP. In the case of NSPack, however, the detection rate is merely 70.0%; this is because, as mentioned above, we forcibly remove the WRITE attribute from 30% of the files. According to the results of Table 5, the detection rate differs according to each detection technique, and some techniques
Table 5: Comparison of packing detection rates of three techniques and the proposed hybrid approach.

Packer             DP-SIG (%)   DP-WR (%)   DP-ENT (%)   Hybrid approach (%)
ASPack             100          70.0        91.5         100
NSPack             5.38         70.0        100          70.0
MPRESS             100          70.0        24.6         100
UPX                100          70.0        100          100
Yoda's Protector   100          70.0        96.9         100
MEW                100          70.0        100          100
BeRoEXEPacker      100          70.0        100          100
Packman            100          70.0        93.8         100
RLPack             100          70.0        99.3         100
PECompact          100          70.0        100          100
Petite             100          70.0        0.00         100
JDpack             100          70.0        5.62         100
Molebox            100          70.0        100          100
eXpressor          100          70.0        0.00         100
Yoda's Crypter     100          70.0        91.5         100
FSG                100          70.0        100          100
exe32pack          100          0.00        100          100
WinUpack           100          70.0        100          100
Neolite            100          70.0        1.61         100
Average            95.0         66.3        73.9         98.4
Not-Packed         0.00         0.00        3.84         0.00
12 Security and Communication Networks
incur false positives or false negatives. The proposed hybrid approach, on the other hand, shows the highest detection rate, 98.4%, with no false positives.
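As a quick check of two averages reported in Table 5, the sketch below recomputes the DP-WR and hybrid column means from the per-packer rates (18 packers at one value and one packer at another). The helper name column_average is hypothetical and is used only for this illustration.

```python
def column_average(rates):
    """Mean detection rate (%) over the packers in one column of Table 5."""
    return sum(rates) / len(rates)


# DP-WR: 18 packers at 70.0% and exe32pack at 0.00%.
dp_wr = [70.0] * 18 + [0.0]
# Hybrid approach: 18 packers at 100% and NSPack at 70.0%.
hybrid = [100.0] * 18 + [70.0]

print(round(column_average(dp_wr), 1))   # 66.3
print(round(column_average(hybrid), 1))  # 98.4
```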
Table 6 shows the detection accuracy of combinations of two or three detection techniques and of the proposed hybrid approach. First, DP-SIG ENT, a combination of the signature and entropy tests, detects 100% of the packed files but misclassifies 18 files, giving a false-positive rate of 0.73%. Second, DP-ENT WR, a combination of the entropy and WRITE tests, shows a low detection rate of 92.0% because the entropy values of MPRESS fall within the distribution of normal files. Third, DP-SIG ENT WR, a combination of the signature, entropy, and WRITE tests, and our hybrid approach both show 98.4% detection accuracy and a 1.60% false-negative rate. This is because DP-SIG ENT WR and the hybrid approach go through the same detection procedure except for DP-NEP. The false-negative rate of 1.60% is due to the files having no WRITE attribute. According to the results of Tables 5 and 6, we believe that the proposed hybrid approach achieves high detection accuracy by combining the advantages of the existing and new detection techniques.
5.4. Experimental Results on Unpacking and Verification Phases. In this section, we evaluate the unpacking and verification phases by performing the unpacking of Algorithm 2 and calculating the restoration accuracy of the restored files with Algorithm 3. Because the unpacking and verification phases take a long time, we experiment with 40 files randomly selected from the dataset. The unpacking phase consists of static unpacking and dynamic unpacking. In the experiment, we exclude static unpacking, since it uses an existing library as is, and focus only on the dynamic unpacking of Algorithm 2, since it tries to restore the original file without any information about the packer.
Table 7 shows the restoration accuracy of the proposed method for the .text and .data sections for each of the 19 packers. We cannot directly unpack Yoda's Protector and PECompact since they use Anti-Debug techniques. To overcome this problem, we unpack files from these packers by bypassing three Anti-Debugging API calls: GetCurrentProcessId(), BlockInput(), and IsDebuggerPresent() [41]. That is, we run Algorithm 2 with the bypassing techniques for Yoda's Protector and PECompact and include the experimental results in Table 7. In general, only the .text and .data sections of the original file are packed [42], so in the experiment we consider only the restoration accuracy of the .text and .data sections of the restored file. In Table 7, among the 19 packers, 11 packers, including UPX, NSPack, MEW, RLPack, and BeRoEXEPacker, show 100% restoration accuracy for both the .text and .data sections. On the other hand, 8 packers, including ASPack, Packman, and MPRESS, show less than 100% restoration accuracy because of the manner in which the packers handle the .reloc section. More specifically, if there is a .reloc section in the original file, the unpacking relocates the .text and .data sections, changing the data values in each section. Therefore, the restoration accuracy depends on whether there is a .reloc section in the original file.
Table 8 shows the restoration results in detail according to the presence of the .reloc section. As shown in the table, the restoration accuracy is 100% in the absence of the .reloc section, while it is very low, in the range of 0.00% to 9.98%, in the presence of the .reloc section (the only exception being the .data section of exe32pack). However, the low restoration accuracy is due to the nature of the packer, and it does not mean that unpacking works incorrectly. In other words, even if the unpacking is successful, the data values may change due to the packer characteristics, and the calculated restoration accuracy is then low. In summary, the proposed verification algorithm calculates the restoration accuracy of unpacked files and thus quantitatively measures the accuracy of the actual restoration.
In addition to the .text, .data, and .reloc sections, we further consider the PE header, which includes the IAT information. In general, the IAT is located in the header of the PE file, and if we actually need to run the unpacked file, we must also restore this header information, including the IAT. In fact, the studies in [3, 43] restore the header including the IAT and produce the actual whole executable file. On the other hand, most studies, including this paper, do not restore the whole executable file itself but restore the .text, .data, and .reloc sections, which are the major parts of the executable file. This is because the purpose of unpacking is not to execute the restored file but to identify and extract the malicious code included in it. Therefore, in this paper, we do not need to restore the header required for actual execution, and accordingly, a comparison of IAT information is also not necessary. Instead, we restore and compare the .text, .data, and .reloc sections, which are more likely to hide malicious code. Restoring the PE header is beyond the scope of this study; we leave it as a separate research topic.
6. Conclusions
In this paper, we analyzed recent studies of packing detection and unpacking techniques and derived four important drawbacks: no phase integration, no detection combination, no real-restoration, and no unpacking verification. We then proposed an all-in-one unpacking system to resolve these four drawbacks. First, no phase integration meant that analysts had to perform the packing detection, unpacking, and verification phases separately, and we solved this problem by presenting an all-in-one
Table 6: Comparison of packing detection accuracy of two or three combinations and of the hybrid approach.

Metrics               DP-SIG ENT (%)   DP-ENT WR (%)   DP-SIG ENT WR (%)   Hybrid approach (%)
Detection accuracy    95.0             92.0            98.4                98.4
False-negative rate   5.00             8.00            1.60                1.60
False-positive rate   0.73             0.00            0.00                0.00
Table 7: Restoration accuracy of .text and .data sections of each packer.

Packer             .text (%)   .data (%)
UPX                100         100
NSPack             100         100
MEW                100         100
RLPack             100         100
BeRoEXE            100         100
Neolite            100         100
FSG                100         100
eXpressor          100         100
Molebox            100         100
Petite             100         100
JDpack             100         100
ASPack             84.3        84.3
Yoda's protector   84.0        84.0
exe32pack          80.8        100
MPRESS             78.8        75.3
Yoda's crypter     74.5        74.5
PECompact          74.0        74.0
WinUpack           68.0        68.0
Packman            55.0        48.5
Average            89.2        90.2
unpacking system that integrates these three phases in a single framework. Second, no detection combination meant that there had been no attempt to combine various packing detection methods, and we solved this problem by presenting a hybrid approach that combines existing and new packing detection methods. Third, no real-restoration meant that there had been little discussion about restoring the actual executable file, and we solved this problem by actually restoring an original image of the packed file. Fourth, no unpacking verification meant that there had been no explicit work to measure the restoration accuracy of unpacked files, and we solved this problem by proposing a verification algorithm that quantitatively evaluates the restoration accuracy.
We constructed a dataset of 2600 PE files and experimentally evaluated the proposed system using the dataset. First, we performed all the phases of the all-in-one unpacking system sequentially and confirmed that its overall working mechanism worked well. Next, by comparing the hybrid packing detection approach with existing detection techniques, we showed that the detection accuracy of the proposed method was the highest, at 98.4% on average, with no false positives. For this, we also determined an entropy range of packing, Packing-Range, through extensive experiments on 19 representative packers. Finally, in the unpacking and verification phases, we confirmed that unpacking was successful for all files except the files packed with Yoda's Protector. We also showed that the restoration accuracy of unpacked files was nearly 100% except for a few packers.
As future research, we will focus on unpacking files packed with Anti-VM, Anti-Debug, and Anti-Reverse engineering techniques. We will also study how to compute the restoration accuracy of files that have .reloc sections.
Data Availability
All the data files used in the experiments are available at https://github.com/chesvectain/PackingData.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partly supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (no. 2017-0-00158, Development of Cyber Threat Intelligence (CTI) analysis and information sharing technology for national cyber incident response) and the Korea Electric Power Corporation (no. R18XA05).
References
[1] M. Morgenstern, "AV-TEST Security report 2017/18," 2019, https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2017-2018.pdf.
[2] T. Ban, R. Isawa, S. Guo, D. Inoue, and K. Nakao, "Efficient malware packer identification using support vector machines with spectrum kernel," in Proceedings of the Eighth Asia Joint Conference on Information Security, pp. 69–76, Seoul, South Korea, July 2013.
[3] B. Cheng, M. Jiang, J. Fu et al., "Towards paving the way for large-scale Windows malware analysis: generic binary unpacking with orders-of-magnitude performance boost," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 395–411, Toronto, Canada, October 2018.
[4] A. Gupta and M. S. Arya, "Hashing based encryption and anti-debugger support for packing multiple files into single executable," International Journal of Advanced Research in Computer Science, vol. 9, no. 1, pp. 914–920, 2018.
[5] Y. Kawakoya, M. Iwamura, and M. Itoh, "Memory behavior-based automatic malware unpacking in stealth debugging environment," in Proceedings of the 5th International Conference on Malicious and Unwanted Software, pp. 39–46, Lorraine, France, October 2010.
[6] G. Liang, J. Pang, Z. Shan, R. Yang, and Y. Chen, "Automatic benchmark generation framework for malware detection," Security and Communication Networks, vol. 2018, Article ID 4947695, 8 pages, 2018.
[7] A. Pektas and T. Acarman, "A dynamic malware analyzer against virtual machine aware malicious software," Security and Communication Networks, vol. 7, no. 12, pp. 2245–2257, 2014.
[8] I. Santos, J. Nieves, and P. G. Bringas, "Semi-supervised learning for unknown malware detection," in International Symposium on Distributed Computing and Artificial Intelligence, A. Abraham, J. M. Corchado, S. R. Gonzalez, and J. F. De Paz Santana, Eds., pp. 415–422, Springer, Berlin, Germany, 2011.
[9] X. Ugarte-Pedrero, I. Santos, B. Sanz, C. Laorden, and P. G. Bringas, "Countering entropy measure attacks on packed software detection," in Proceedings of the 9th International Conference on Consumer Communications and Networking, pp. 164–168, Las Vegas, NV, USA, January 2012.
[10] H. Won, S.-P. Kim, S. Lee, M.-J. Choi, and Y.-S. Moon, "Secure principal component analysis in multiple distributed nodes," Security and Communication Networks, vol. 9, no. 14, pp. 2348–2358, 2016.
[11] S. Cesare, Y. Xiang, and W. Zhou, "Malwise–an effective and efficient classification system for packed and polymorphic
Table 8: Restoration accuracy (%) according to the presence of the .reloc section.

Packer             .text, with .reloc   .text, without .reloc   .data, with .reloc   .data, without .reloc
ASPack             0.00                 100                     0.00                 100
Packman            9.98                 100                     8.81                 100
MPRESS             0.00                 100                     0.00                 100
Yoda's crypter     0.00                 100                     0.04                 100
PECompact          0.00                 100                     0.00                 100
Yoda's Protector   0.00                 100                     0.05                 100
exe32pack          0.00                 100                     100                  100
WinUpack           0.00                 100                     0.00                 100
malware," IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193–1206, 2013.
[12] C. Malin, E. Casey, and J. Aquilina, Malware Forensics: Investigating and Analyzing Malicious Code, Syngress, Burlington, MA, USA, 2008.
[13] W. Yan, Z. Zheng, and A. Nirwan, "Revealing packed malware," IEEE Security and Privacy, vol. 6, no. 5, pp. 65–69, 2008.
[14] M. Bat-Erdene, T. Kim, H. Park, and H. Lee, "Packer detection for multi-layer executables using entropy analysis," Entropy, vol. 19, no. 3, p. 125, 2017.
[15] Y.-S. Choi, I.-K. Kim, J.-T. Oh, and J.-C. Ryou, "PE file header analysis-based packed PE file detection technique (PHAD)," in Proceedings of International Symposium on Computer Science and Its Applications, pp. 28–31, Hobart, Australia, October 2008.
[16] S. Han and S. Lee, "Packed PE file detection for malware forensics," The KIPS Transactions: Part C, vol. 16C, no. 5, pp. 555–562, 2009, in Korean.
[17] R. Lyda and J. Hamrock, "Using entropy analysis to find encrypted and packed malware," IEEE Security and Privacy, vol. 5, no. 2, pp. 40–45, 2007.
[18] R. Perdisci, A. Lanzi, and W. Lee, "Classification of packed executables for accurate computer virus detection," Pattern Recognition Letters, vol. 29, no. 14, pp. 1941–1946, 2008.
[19] A. Rohit, A. Singh, H. Pareek, and U. Edara, "A heuristics-based static analysis approach for detecting packed PE binaries," International Journal of Security and Its Applications, vol. 7, no. 5, pp. 257–268, 2013.
[20] G. Amato, "PEframe," 2019, https://github.com/guelfoweb/peframe.
[21] G. Jeong, E. Choo, J. Lee, M. Bat-Erdene, and H. Lee, "Generic unpacking using entropy analysis," Journal of Advanced Information Technology and Convergence, vol. 7, no. 1, pp. 232–238, 2009.
[22] V. DeMarines, "Obfuscation–how to do it and how to crack it," Network Security, vol. 2008, no. 7, pp. 4–7, 2008.
[23] H. Neil, "PEiD," 2019, https://github.com/wolfram77web/app-peid.
[24] O. Yuschuk, "OllyDbg," 2019, http://www.ollydbg.de/download.html.
[25] N. Waisman, "Immunity Debugger," 2019, https://www.immunityinc.com/products/debugger/index.html.
[26] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee, "PolyUnpack: automating the hidden-code extraction of unpack-executing malware," in Proceedings of the 22nd Annual Computer Security Applications Conference, pp. 289–300, Miami Beach, FL, USA, December 2006.
[27] M. Bat-Erdene, Y. Park, H. Li, H. Lee, and M.-S. Choi, "Entropy analysis to classify unknown packing algorithms for malware detection," International Journal of Information Security, vol. 16, no. 3, pp. 227–248, 2017.
[28] S. Cesare and X. Yang, "Classification of malware using structured control flow," in Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing, pp. 61–70, Brisbane, Australia, January 2010.
[29] L. Martignoni, M. Christodorescu, and S. Jha, "OmniUnpack: fast, generic, and safe unpacking of malware," in Proceedings of the 23rd Annual Computer Security Applications Conference, pp. 431–441, Miami Beach, FL, USA, December 2007.
[30] M. G. Kang, P. Poosankam, and H. Yin, "Renovo: a hidden code extractor for packed executables," in Proceedings of the 5th ACM Workshop on Recurring Malcode, pp. 46–53, Alexandria, VA, USA, November 2007.
[31] S. Mariani, L. Fontana, F. Gritti, and S. D'Alessio, "PinDemonium: a DBI-based generic unpacker for windows executables," in Proceedings of the Black Hat 2016, Las Vegas, NV, USA, July 2016.
[32] G. Bonfante, J. Fernandez, J.-Y. Marion, B. Rouxel, F. Sabatier, and A. Thierry, "CoDisasm: medium scale concatic disassembly of self-modifying binaries with overlapping instructions," in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 745–756, Denver, CO, USA, October 2016.
[33] M. Oberhumer, L. Molnar, and J. Reiser, "UPX," 2019, https://upx.github.io.
[34] A. Swinnen and A. Mesbahi, "One packer to rule them all: empirical identification, comparison and circumvention of current antivirus detection techniques," in Proceedings of the Black Hat 2014, Las Vegas, NV, USA, August 2014.
[35] P. Bania, "Generic unpacking of self-modifying, aggressive, packed binary programs," 2009, https://arxiv.org/abs/0905.4581.
[36] H. Neil, "BobSoft Signature," 2019, http://woodmann.com/BobSoft/Files/Other/UserDB.zip.
[37] M. Russinovich, "Microsoft Sysinternals Suite," 2019, https://download.sysinternals.com/files/SysinternalsSuite.zip.
[38] R. Rivest, "The MD5 message-digest algorithm," RFC 1321, IETF, Fremont, CA, USA, 1992.
[39] S. Tatham, "Putty," 2019, https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html.
[40] L. Pascoe, "MD5," 2019, http://www.md5summer.org/about.html.
[41] P. Ferrie, "Anti-unpacker tricks - part one," 2019, https://www.virusbulletin.com/virusbulletin/2008/12/anti-unpacker-tricks-part-one.
[42] T.-Y. Wang and C.-H. Wu, "Detection of packed executables using support vector machines," in Proceedings of the 10th International Conference on Machine Learning and Cybernetics, pp. 717–722, Guilin, China, July 2011.
[43] D. Korczynski, "RePEconstruct: reconstructing binaries with self-modifying code and Import address table destruction," in Proceedings of the 11th International Conference on Malicious and Unwanted Software, pp. 31–38, Fajardo, Puerto Rico, October 2016.
files may infect the physical server. Thus, we perform unpacking first in the virtual server and then perform the experiment in the physical server only when there is no abnormality. The hardware specifications of the experimental platform are as follows. The virtual server has an Intel Core i7-6700 3.40 GHz CPU with 8 GB RAM, and the physical server has an Intel Core i7-6700 3.40 GHz CPU with 32 GB RAM. The target operating system is 32-bit Windows 7, and we use PEVIEW to view the structure and content of 32-bit PE files. As the debugger, we use Immunity Debugger 1.85 [25] to implement the dynamic unpacking phase of Algorithm 2. To implement the all-in-one unpacking system including the
Figure 3: Comparison of sample bytecodes of the original and restored files. (a) Sections of the original file (.text, .rdata, .data, and .rsrc). (b) Sections of the restored file (UPX0, UPX1, UPX1, and .rsrc).
Input: A, an original file; B, a restored file
Output: restorationAccuracy[1..n], a list of restoration accuracy values for n sections
begin
(1) Remove garbage and padding values from B;
(2) A ← {A1, A2, ..., An}, where Ai is the i-th section of A;
(3) for each section Ai ∈ A do
(4)   totalLen ← Ai.length;
(5)   curLen ← totalLen;
(6)   while curLen > 0 do
(7)     H1 ← hash(Ai[1 : curLen]); // hash a part of section Ai
(8)     for j ← 1 to (B.length − curLen + 1) do
(9)       H2 ← hash(B[j : j + curLen − 1]); // hash a part of B
(10)      if H1 = H2 then break;
(11)    end-for
(12)    if H1 = H2 then break; else curLen ← curLen − 1;
(13)  end-while
(14)  restorationAccuracy[i] ← (curLen / totalLen) × 100;
(15) end-for
(16) return restorationAccuracy;
end
Algorithm 3: Restoration rate calculation algorithm.
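The core loop of Algorithm 3 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it compares byte strings directly with Python's `in` operator instead of comparing hash values (the two give the same match decision), it omits step (1), the removal of garbage and padding from B, and the function name restoration_accuracy and the sample bytes are hypothetical.

```python
def restoration_accuracy(sections, restored):
    """For each original section, return the percentage of its leading bytes
    that occur as one contiguous run somewhere in the restored image."""
    accuracy = []
    for section in sections:
        total_len = len(section)
        cur_len = total_len
        # Shrink the prefix one byte at a time until it occurs in the
        # restored image (direct comparison stands in for hash comparison).
        while cur_len > 0 and section[:cur_len] not in restored:
            cur_len -= 1
        accuracy.append(100.0 * cur_len / total_len if total_len else 0.0)
    return accuracy


# Hypothetical sample data: the first section is fully restored, while only
# the first two of the four bytes of the second section are found.
text = bytes.fromhex("558BEC8B4D0C")
data = bytes.fromhex("90FF0100")
restored = bytes.fromhex("0000") + text + bytes.fromhex("90FF9999")
print(restoration_accuracy([text, data], restored))  # [100.0, 50.0]
```

The linear shrink mirrors the algorithm's structure; for large sections it is slow, and a binary search over the prefix length would work because a prefix that occurs in B implies that every shorter prefix occurs too.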
dynamic unpacking algorithm, we run Python scripts on top of Immunity Debugger.
In the experiment, we use a total of 130 PE files, where 90 files are provided by Microsoft [37] and 40 files are randomly collected from the putty [39] and MD5 [40] sites. We pack each of the 130 executable files with each of the 19 packers and construct a dataset of 2600 files in total: 2470 packed files and the 130 original files. The 19 packers used are UPX, ASPack, NSPack, MPRESS, Yoda's Protector, MEW, Packman, RLPack, BeRoEXEPacker, PECompact, Petite, JDpack, Molebox, eXpressor, Yoda's Crypter, FSG, exe32pack, WinUpack, and Neolite. The reason for building the dataset by packing normal PE files ourselves is that we can quantitatively measure the packing detection accuracy, the false-positive rate, and the false-negative rate only if we know in advance which experimental files are packed.
In this paper, we conduct three experiments. The first experiment is for the integration feature of the all-in-one system; it verifies that the system supports all three phases of packing detection, unpacking, and verification in an integrated framework. The second experiment is for the detection phase, where we first set Packing-Range, an entropy range for judging the packing, and then evaluate the packing detection accuracy using the range. The third experiment is for the unpacking and verification phases, where we verify whether the system unpacks the PE file correctly by calculating the restoration accuracy.
5.2. Experimental Results on Overall Working Mechanism. In this section, we confirm through experiments that the integration mechanism of the all-in-one unpacking system works correctly. Figure 4 shows operation screenshots of each phase of the all-in-one system. First, Figure 4(a) shows an example screenshot of the detection phase. To confirm that all four cases of packing detection work correctly, we continue executing the remaining cases even if an earlier case has already detected the packing. In Figure 4(a), we first examine DP-NEP of Algorithm 1. We detect the EP section in DP-NEP, so we proceed to DP-SIG (the first two lines). By the DP-SIG test, we detect the UPX packer, so we conclude that the file is packed (the next two lines). Even though detection is completed, we proceed to the DP-ENT and DP-WR tests. The entropy value is 7.93, which is contained in Packing-Range, and the attribute value of the EP section is "0xe0000040", which includes the WRITE attribute. Thus, we conclude that the file is also packed according to the DP-ENT and DP-WR tests (the last three lines). According to Figure 4(a), we can confirm that the DP-NEP, DP-SIG, DP-ENT, and DP-WR tests of the detection phase all work correctly.
Figures 4(b) and 4(c) show example screenshots of the unpacking phase. If an unpacking library exists for the packer found in the detection phase, we perform static unpacking, as shown in Figure 4(b); if not, we perform dynamic unpacking, as shown in Figure 4(c). Figure 4(b) shows the result of static unpacking by the UPX tool [33] for a UPX-packed file. In the figure, the file size increases from 285,760 bytes to 731,200 bytes by unpacking, which means that the compression ratio is 39.08% (285,760/731,200). Figure 4(c) shows the result of the entropy-based dynamic unpacking. In the figure, the system calculates the entropy of each section; when the IP encounters a JMP or CJMP instruction, it returns the current DstAddr as the OEP if the entropy change is stable and DstAddr is in a different section.
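The OEP decision rule described above can be sketched as a small predicate. This is a hedged illustration only: the paper does not define "stable" numerically here, so the sketch assumes that the entropy is stable when the last two measurements differ by at most a tolerance, and the names is_oep, entropy_history, and tolerance are hypothetical.

```python
def is_oep(entropy_history, src_section, dst_section, tolerance=0.01):
    """Decide whether the target of a JMP/CJMP should be reported as the OEP.

    Assumption for this sketch: the entropy change counts as stable when the
    last two entropy measurements differ by at most `tolerance`.
    """
    if len(entropy_history) < 2:
        return False
    stable = abs(entropy_history[-1] - entropy_history[-2]) <= tolerance
    # The jump must also leave the section that holds the unpacking stub.
    return stable and src_section != dst_section


# Entropy has settled and the jump crosses from UPX1 into UPX0.
print(is_oep([7.10, 7.90, 7.90], "UPX1", "UPX0"))
# Entropy is still changing, so the jump target is not reported.
print(is_oep([7.10, 7.90], "UPX1", "UPX0"))
```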
Figure 4(d) shows an example screenshot of the verification phase. In this example, we measure the restoration accuracy of the restored file with respect to the .text and .data sections of the original file. The length of the original .text section is 18,439, and the length of the unpacked file is 872,555. In the hash value comparison, since the portion corresponding to the original .text section exists in the unpacked file, the restoration accuracy is 100%. Similarly, we calculate the restoration accuracy of the .data section as 100%.
5.3. Experimental Results on Detection Phase. In this section, we describe two experimental results of the detection phase. The first experiment sets Packing-Range, and the second measures the detection accuracy of the proposed detection algorithm. In the first experiment, we experimentally determine Packing-Range, which is used in DP-ENT of Algorithm 1. The previous study by Lyda and Hamrock [17] uses the entropy value of the whole file and sets the entropy range of the packed file to [6.677, 6.926]. In contrast, Han and Lee [16] focus on the EP section only and compute the entropy value of the EP section instead of the whole file. Through their experiment, they set the entropy threshold to 6.85 and judge a file to be packed if its entropy is over 6.85.
We also determine the entropy range, Packing-Range, through experiments on various packers. Figure 5 shows the entropy distribution of the EP section for different packers. Figure 5(a) shows the average distribution of the original files and the files packed by the 19 packers. Figures 5(b)–5(m) show the more detailed entropy distribution for the original files and for each packer. According to Figures 5(c)–5(m), the entropy distribution of the packed file differs depending on the packer type. In particular, as shown in Figures 5(d), 5(e), and 5(k), some packed files have smaller entropy values than the original file, and thus we cannot simply set an upper threshold value as the packing criterion as in the previous study [16]. However, we note that while the entropy of packed files is distributed over a wide range, as shown in Figures 5(c)–5(m), the entropy of original files is distributed only in a specific range, as shown in Figure 5(b). More specifically, the entropy value of the normal file is between 6.00 and 6.85, as shown in Figure 5(b). Based on these experimental results, we set Packing-Range to [0, 6.00) or [6.85, ∞) for the entropy of the EP section. According to [9, 29], the entropy method has the disadvantages of low accuracy and being easy to evade. To overcome these disadvantages, we use the three other detection techniques, DP-NEP, DP-SIG, and DP-WR, together with DP-ENT in a hybrid manner. Using this hybrid approach, we can further improve the detection accuracy.
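The Packing-Range test can be sketched as follows: compute the byte-level Shannon entropy of the EP section and check whether it falls outside the range observed for normal files. This is a minimal sketch under stated assumptions; the function names shannon_entropy and in_packing_range are hypothetical, and locating the EP section within the PE file is left out.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(data).values())


def in_packing_range(entropy: float) -> bool:
    """True if the EP-section entropy lies in [0, 6.00) or [6.85, inf)."""
    return entropy < 6.00 or entropy >= 6.85


uniform = bytes(range(256)) * 4     # every byte value equally likely
print(shannon_entropy(uniform))     # 8.0
print(in_packing_range(7.93))       # True
print(in_packing_range(6.40))       # False
```

The value 7.93 is the EP-section entropy reported for the UPX-packed example in the detection-phase screenshot; 6.40 is an arbitrary value inside the normal-file range.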
In the second experiment, we evaluate the detection accuracy and the false-negative and false-positive rates. Table 5 compares the three detection techniques of the detection phase with the proposed hybrid approach. Since EP sections
Figure 4: Operation screenshots of the detection, unpacking, and verification phases. (a) Detection phase. (b) Unpacking phase (static unpacking). (c) Unpacking phase (dynamic unpacking). (d) Verification phase.
Figure 5: Entropy distributions of the EP section for original and representative packed files (x-axis: entropy of the EP section, from 0 to 8; y-axis: number of files). (a) Average of original and 19 packers, plotted as "Not packed (b)" and "Summation of (c) to (m)". (b) Original (not packed). (c) UPX. (d) ASPack. (e) NSPack. (f) MPRESS. (g) Yoda's Protector. (h) RLPack. (i) BeroEXE. (j) MEW. (k) PACKMAN. (l) WinUpack. (m) exe32pack.
are mandatory in all PE files, we can detect EP sections unless they are intentionally concealed. All files of the dataset used in the experiment also have EP sections, so the detection rate by DP-NEP is trivially 0%. In other words, no file is detected as packed due to the absence of the EP section, so we exclude the experimental result of DP-NEP from Table 5.
We now explain the results of Table 5 column by column. First, we explain the DP-SIG column. In the case of DP-SIG, Not-Packed shows a detection rate of 0% since no packer signature exists in normal files. That is, no packing detection occurs in normal files, which means that DP-SIG classifies Not-Packed files accurately. For the packed files, on the other hand, DP-SIG detects 100% of the files for all packers except NSPack. The detection rate of NSPack is merely 5.38% because NSPack signatures often exist in sections other than the EP section. In summary, DP-SIG shows 95.0% detection accuracy on average.
Second, for the experiment on the detection rate of DP-WR, we intentionally remove the WRITE attribute from 39 files (30% of the packed files) per packer. We perform this removal process for 18 packers, excluding exe32pack. The reason for changing the dataset is that normally packed files usually have the WRITE attribute, so we arbitrarily remove the WRITE attribute from 30% of the files for the DP-WR experiment. Looking at the DP-WR column in Table 5, DP-WR detects only 70% of the files per packer for these 18 packers since we intentionally removed the WRITE attribute from 30% of the files. In the case of exe32pack, however, we need not perform the removal process since it has no WRITE attribute in the PE header. Accordingly, as shown in the DP-WR column of Table 5, the detection rate of exe32pack by DP-WR is trivially 0%. In the case of Not-Packed, the detection rate is 0% since normal files have no WRITE attribute. In summary, checking the presence of the WRITE attribute is very simple, and DP-WR works correctly if the packed file has the WRITE attribute, but it does not work well if the WRITE attribute does not exist or has been maliciously modified.
Third, the DP-ENT column shows the entropy-based detection rates. As shown in the table, DP-ENT achieves a 100% detection rate for 9 packers, including NSPack, UPX, MEW, and BeRoEXEPacker, but it incurs some false negatives for 10 packers, including ASPack, Yoda's Protector, Packman, and MPRESS, for which the detection rate is lower than 100%. In particular, for MPRESS, Petite, JDpack, eXpressor, and Neolite, it shows low detection rates (24.6% or below for all of these except JDpack, which reaches 56.2% in Table 5) because, as a characteristic of these packers, the entropy values of their packed files fall within the entropy distribution of normal files. DP-ENT also has a false-positive rate of 3.84% for Not-Packed files. In summary, DP-ENT works very accurately for most packers, but it may incur false negatives for a few specific packers or false positives for some normal files. To overcome this limitation, we use DP-ENT together with other detection techniques. We can see the improvement through the experimental results of Tables 5 and 6. In the tables, when using only DP-ENT, the detection accuracy is low for some packers (especially MPRESS), but when using it together with the other three techniques, the accuracy is close to 100%.
Fourth, the proposed hybrid approach, which integrates the four techniques, detects 100% of the files for all packers except NSPack and shows no false positives. The reason for this high detection rate is that the proposed method has all the advantages of DP-SIG, DP-WR, and DP-ENT as well as DP-NEP. In the case of NSPack, however, the detection rate is merely 70.0%; this is because, as mentioned above, we forcibly removed the WRITE attribute from 30% of the files. According to the results of Table 5, the detection rate differs for each detection technique, and some techniques incur false positives or false negatives; on the other hand, the proposed hybrid approach shows the highest detection rate of 98.4% and no false positives.

Table 5: Comparison of packing detection rates (%) of three techniques and the proposed hybrid approach.

Packer            DP-SIG  DP-WR  DP-ENT  Hybrid approach
ASPack            100     70.0   91.5    100
NSPack            53.8    70.0   100     70.0
MPRESS            100     70.0   24.6    100
UPX               100     70.0   100     100
Yoda's Protector  100     70.0   96.9    100
MEW               100     70.0   100     100
BeRoEXEPacker     100     70.0   100     100
Packman           100     70.0   93.8    100
RLPack            100     70.0   99.3    100
PECompact         100     70.0   100     100
Petite            100     70.0   0.00    100
JDpack            100     70.0   56.2    100
Molebox           100     70.0   100     100
eXpressor         100     70.0   0.00    100
Yoda's Crypter    100     70.0   91.5    100
FSG               100     70.0   100     100
exe32pack         100     0.00   100     100
WinUpack          100     70.0   100     100
Neolite           100     70.0   16.1    100
Average           95.0    66.3   73.9    98.4
Not-Packed        0.00    0.00   3.84    0.00
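The averages of the DP-WR and hybrid columns follow directly from the per-packer rates in Table 5. The short check below only restates that arithmetic; the list and variable names are ours.

```python
# Per-packer detection rates (%) as listed in Table 5 for the 19 packers.
dp_wr = [70.0] * 18 + [0.0]     # exe32pack is the only packer at 0%
hybrid = [100.0] * 18 + [70.0]  # NSPack is the only packer below 100%


def mean(values):
    """Arithmetic mean of a non-empty list of rates."""
    return sum(values) / len(values)


dp_wr_avg = round(mean(dp_wr), 1)    # 66.3
hybrid_avg = round(mean(hybrid), 1)  # 98.4
```

Both results agree with the Average row of Table 5.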
Table 6 shows the detection accuracy of combinations of two or three detection techniques and of the proposed hybrid approach. First, DP-SIG ENT, a combination of signature and entropy tests, detects 100% of the packed files but detects 18 files incorrectly, showing a false-positive rate of 0.73%. Second, DP-ENT WR, a combination of entropy and WRITE tests, shows a low detection rate of 92.0% because the entropy value of MPRESS is in the distribution of normal files. Third, DP-SIG ENT WR, a combination of signature, entropy, and WRITE tests, and our hybrid approach both show 98.4% detection accuracy and a 1.60% false-negative rate. This is because DP-SIG ENT WR and the hybrid approach go through the same detection procedure except for DP-NEP. The false-negative rate of 1.60% is due to the files having no WRITE attribute. According to the results of Tables 5 and 6, we believe that the proposed hybrid approach achieves high detection accuracy by combining the advantages of the existing and new detection techniques.
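In every column of Table 6, the detection accuracy and the false-negative rate sum to 100%, which is what one would expect if both are computed over the packed files. The sketch below assumes these usual definitions; the function name and the counts are hypothetical, and the exact denominators used in the experiments are not restated here.

```python
def detection_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Rates in percent, assuming accuracy and FN rate are taken over packed files
    and the FP rate over not-packed files (an assumption, not the paper's text)."""
    packed = tp + fn
    normal = fp + tn
    return {
        "detection_accuracy": 100.0 * tp / packed,
        "false_negative_rate": 100.0 * fn / packed,
        "false_positive_rate": 100.0 * fp / normal,
    }


# Hypothetical counts, for illustration only.
metrics = detection_metrics(tp=90, fn=10, fp=5, tn=95)
```

With these definitions, the accuracy and false-negative rate are complementary by construction, as observed in Table 6.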
5.4. Experimental Results on Unpacking and Verification Phases. In this section, we evaluate the unpacking and verification phases by performing the unpacking of Algorithm 2 and calculating the restoration accuracy of the files through the actual restoration of Algorithm 3. However, the unpacking and verification phases take a lot of time, so we experiment with 40 randomly selected files of the dataset. The unpacking phase consists of static unpacking and dynamic unpacking. In the experiment, we exclude static unpacking since it uses an existing library as is, and we focus only on the dynamic unpacking of Algorithm 2 since it tries to restore the original file without any information about the packer.
Table 7 shows the restoration accuracy of the proposed method for the text and data sections for each of the 19 packers. We cannot directly unpack files packed with Yoda's Protector and PECompact since these packers use Anti-Debug techniques. To overcome this problem, we unpack these files by bypassing three Anti-Debugging API calls: GetCurrentProcessID(), BlockInput(), and IsDebuggerPresent() [41]. (That is, we run Algorithm 2 with the bypassing techniques for Yoda's Protector and PECompact and include the experimental result in Table 7.) In general, since only the text and data sections of the original file are packed [42], in the experiment we consider only the restoration accuracy of the text and data sections of the restored file. In Table 7, among the 19 packers, 11 packers, including UPX, NSPack, MEW, RLPack, and BeRoEXEPacker, show 100% restoration accuracy for both text and data sections. On the other hand, 8 packers, including ASPack, Packman, and MPRESS, show less than 100% restoration accuracy because of the manner in which the packers handle the reloc section. More specifically, if there is a reloc section in the original file, the unpacking relocates the text and data sections, changing all the data values in each section. Therefore, the restoration accuracy depends on whether there is a reloc section in the original file.
Table 8 shows the restoration results in detail according to the presence of the reloc section. As shown in the table, the restoration accuracy is 100% in the absence of the reloc section, while it is generally much lower, ranging from 0.00% to 99.8%, in the presence of the reloc section. However, the low restoration accuracy is due to the nature of the packer, and it does not mean that unpacking fails. In other words, even if the unpacking is successful, the data values may change because of the packer's characteristics, and the restoration accuracy is then calculated to be low. In summary, we can use the proposed verification algorithm to calculate the restoration accuracy of unpacked files, which quantitatively measures the accuracy of the actual restoration.
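The effect of relocation on a byte-level comparison can be reproduced with a toy example. The sketch below is not Algorithm 3: the block-wise comparison, the function names, the 4-byte block size, and all sample values are assumptions chosen only to show how patched addresses lower a measured match ratio even though the code itself is restored.

```python
import struct


def apply_relocations(section: bytes, offsets, delta: int) -> bytes:
    """Add delta to each 32-bit little-endian address stored at the given offsets."""
    patched = bytearray(section)
    for off in offsets:
        (value,) = struct.unpack_from("<I", patched, off)
        struct.pack_into("<I", patched, off, (value + delta) & 0xFFFFFFFF)
    return bytes(patched)


def block_match_ratio(original: bytes, restored: bytes, block: int = 4) -> float:
    """Percentage of fixed-size blocks that are byte-identical at the same offset."""
    total = (len(original) + block - 1) // block
    same = sum(
        original[i:i + block] == restored[i:i + block]
        for i in range(0, len(original), block)
    )
    return 100.0 * same / total


# Four 32-bit words; the first and third are treated as absolute addresses.
original = struct.pack("<4I", 0x00401000, 0x11223344, 0x00402000, 0x55667788)
relocated = apply_relocations(original, offsets=[0, 8], delta=0x10000)

ratio_without_reloc = block_match_ratio(original, original)  # 100.0
ratio_with_reloc = block_match_ratio(original, relocated)    # 50.0
```

The more addresses a section contains, the further the ratio drops, which is in line with the low values reported in the presence of the reloc section.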
In addition to the text, data, and reloc sections, we further consider the PE header, which includes the IAT information. In general, the IAT is located in the header of the PE file, and if we actually need to run the unpacked file, we must also restore this header information, including the IAT. In fact, the studies of [3, 43] restore the header, including the IAT, and produce the actual whole executable file. On the other hand, most studies, including this paper, do not restore the actual whole executable file itself but restore the text, data, and reloc sections, which are the major parts of the executable file. This is because the purpose of unpacking is not to execute the restored file but to identify and extract malicious code included in the restored file. Therefore, in this paper, we do not need to restore the header required for actual execution, and accordingly a comparison of IAT information is also not necessary. Instead, we restore and compare the text, data, and reloc sections, which are more likely to hide malicious code. Restoring the PE header is beyond the scope of this study; we leave it as a separate research topic.
Table 6: Comparison of packing detection accuracy (%) of two or three combinations and of the hybrid approach.

Metric               DP-SIG ENT  DP-ENT WR  DP-SIG ENT WR  Hybrid approach
Detection accuracy   95.0        92.0       98.4           98.4
False-negative rate  5.00        8.00       1.60           1.60
False-positive rate  0.73        0.00       0.00           0.00

Table 7: Restoration accuracy (%) of text and data sections of each packer.

Packer            text   data
UPX               100    100
NSPack            100    100
MEW               100    100
RLPack            100    100
BeRoEXE           100    100
Neolite           100    100
FSG               100    100
eXpressor         100    100
Molebox           100    100
Petite            100    100
JDpack            100    100
ASPack            84.3   84.3
Yoda's Protector  84.0   84.0
exe32pack         80.8   100
MPRESS            78.8   75.3
Yoda's Crypter    74.5   74.5
PECompact         74.0   74.0
WinUpack          68.0   68.0
Packman           55.0   48.5
Average           89.2   90.2

6. Conclusions

In this paper, we analyzed recent studies of packing detection and unpacking techniques and derived four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. We then proposed an all-in-one unpacking system to address these four observations. First, no phase integration meant that analysts had to perform the packing detection, unpacking, and verification phases separately, and we solved this problem by presenting an all-in-one unpacking system that integrates these three phases in a single framework. Second, no detection combination meant that there had been no attempt to combine various packing detection methods, and we solved this problem by presenting a hybrid approach that combines existing and new packing detection methods. Third, no real-restoration meant that there had been little discussion about restoring the actual executable file, and we solved this problem by actually restoring an original image of the packed file. Fourth, no unpacking verification meant that there had been no explicit work to measure the restoration accuracy of unpacked files, and we solved this problem by proposing a verification algorithm that quantitatively evaluates the restoration accuracy.
We constructed a dataset of 2600 PE files and experimentally evaluated the proposed system using the dataset. First, we performed all the phases of the all-in-one unpacking system sequentially and confirmed that its overall working mechanism worked well. Next, through the comparison of the hybrid packing detection approach and existing detection techniques, we showed that the detection accuracy of the proposed method was the highest, at 98.4% on average, with no false positives. For this, we also determined an entropy range of packing, Packing-Range, through extensive experiments on 19 representative packers. Finally, in the unpacking and verification phases, we confirmed that unpacking was successful for all files except the files packed with Yoda's Protector. We also showed that the restoration accuracy of unpacked files was nearly 100% except for a few packers.
As future research, we will focus on unpacking files packed with Anti-VM, Anti-Debug, and Anti-Reverse engineering techniques. We will also study how to compute the restoration accuracy of files having reloc sections.
Data Availability
All the data files used in the experiments are available at https://github.com/chesvectain/PackingData.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partly supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (no. 2017-0-00158, Development of Cyber Threat Intelligence (CTI) analysis and information sharing technology for national cyber incident response) and the Korea Electric Power Corporation (no. R18XA05).
References
[1] M. Morgenstern, “AV-TEST Security report 2017/18,” 2019, https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2017-2018.pdf.
[2] T. Ban, R. Isawa, S. Guo, D. Inoue, and K. Nakao, “Efficient malware packer identification using support vector machines with spectrum kernel,” in Proceedings of the Eighth Asia Joint Conference on Information Security, pp. 69–76, Seoul, South Korea, July 2013.
[3] B. Cheng, M. Jiang, J. Fu et al., “Towards paving the way for large-scale Windows malware analysis: generic binary unpacking with orders-of-magnitude performance boost,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 395–411, Toronto, Canada, October 2018.
[4] A. Gupta and M. S. Arya, “Hashing based encryption and anti-debugger support for packing multiple files into single executable,” International Journal of Advanced Research in Computer Science, vol. 9, no. 1, pp. 914–920, 2018.
[5] Y. Kawakoya, M. Iwamura, and M. Itoh, “Memory behavior-based automatic malware unpacking in stealth debugging environment,” in Proceedings of the 5th International Conference on Malicious and Unwanted Software, pp. 39–46, Lorraine, France, October 2010.
[6] G. Liang, J. Pang, Z. Shan, R. Yang, and Y. Chen, “Automatic benchmark generation framework for malware detection,” Security and Communication Networks, vol. 2018, Article ID 4947695, 8 pages, 2018.
[7] A. Pektas and T. Acarman, “A dynamic malware analyzer against virtual machine aware malicious software,” Security and Communication Networks, vol. 7, no. 12, pp. 2245–2257, 2014.
[8] I. Santos, J. Nieves, and P. G. Bringas, “Semi-supervised learning for unknown malware detection,” in International Symposium on Distributed Computing and Artificial Intelligence, A. Abraham, J. M. Corchado, S. R. Gonzalez, and J. F. De Paz Santana, Eds., pp. 415–422, Springer, Berlin, Germany, 2011.
[9] X. Ugarte-Pedrero, I. Santos, B. Sanz, C. Laorden, and P. G. Bringas, “Countering entropy measure attacks on packed software detection,” in Proceedings of the 9th International Conference on Consumer Communications and Networking, pp. 164–168, Las Vegas, NV, USA, January 2012.
[10] H. Won, S.-P. Kim, S. Lee, M.-J. Choi, and Y.-S. Moon, “Secure principal component analysis in multiple distributed nodes,” Security and Communication Networks, vol. 9, no. 14, pp. 2348–2358, 2016.
Table 8: Restoration accuracy (%) according to the presence of the reloc section.

Packer            text (with reloc)  text (without reloc)  data (with reloc)  data (without reloc)
ASPack            0.00               100                   0.00               100
Packman           99.8               100                   88.1               100
MPRESS            0.00               100                   0.00               100
Yoda's Crypter    0.00               100                   0.04               100
PECompact         0.00               100                   0.00               100
Yoda's Protector  0.00               100                   0.05               100
exe32pack         0.00               100                   100                100
WinUpack          0.00               100                   0.00               100

[11] S. Cesare, Y. Xiang, and W. Zhou, “Malwise–an effective and efficient classification system for packed and polymorphic malware,” IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193–1206, 2013.
[12] C. Malin, E. Casey, and J. Aquilina, Malware Forensics: Investigating and Analyzing Malicious Code, Syngress, Burlington, MA, USA, 2008.
[13] W. Yan, Z. Zheng, and A. Nirwan, “Revealing packed malware,” IEEE Security and Privacy, vol. 6, no. 5, pp. 65–69, 2008.
[14] M. Bat-Erdene, T. Kim, H. Park, and H. Lee, “Packer detection for multi-layer executables using entropy analysis,” Entropy, vol. 19, no. 3, p. 125, 2017.
[15] Y.-S. Choi, I.-K. Kim, J.-T. Oh, and J.-C. Ryou, “PE file header analysis-based packed PE file detection technique (PHAD),” in Proceedings of International Symposium on Computer Science and Its Applications, pp. 28–31, Hobart, Australia, October 2008.
[16] S. Han and S. Lee, “Packed PE file detection for malware forensics,” The KIPS Transactions: Part C, vol. 16C, no. 5, pp. 555–562, 2009, in Korean.
[17] R. Lyda and J. Hamrock, “Using entropy analysis to find encrypted and packed malware,” IEEE Security and Privacy, vol. 5, no. 2, pp. 40–45, 2007.
[18] R. Perdisci, A. Lanzi, and W. Lee, “Classification of packed executables for accurate computer virus detection,” Pattern Recognition Letters, vol. 29, no. 14, pp. 1941–1946, 2008.
[19] A. Rohit, A. Singh, H. Pareek, and U. Edara, “A heuristics-based static analysis approach for detecting packed PE binaries,” International Journal of Security and Its Applications, vol. 7, no. 5, pp. 257–268, 2013.
[20] G. Amato, “PEframe,” 2019, https://github.com/guelfoweb/peframe.
[21] G. Jeong, E. Choo, J. Lee, M. Bat-Erdene, and H. Lee, “Generic unpacking using entropy analysis,” Journal of Advanced Information Technology and Convergence, vol. 7, no. 1, pp. 232–238, 2009.
[22] V. DeMarines, “Obfuscation–how to do it and how to crack it,” Network Security, vol. 2008, no. 7, pp. 4–7, 2008.
[23] H. Neil, “PEiD,” 2019, https://github.com/wolfram77web/app-peid.
[24] O. Yuschuk, “OllyDbg,” 2019, http://www.ollydbg.de/download.html.
[25] N. Waisman, “Immunity Debugger,” 2019, https://www.immunityinc.com/products/debugger/index.html.
[26] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee, “PolyUnpack: automating the hidden-code extraction of unpack-executing malware,” in Proceedings of the 22nd Annual Computer Security Applications Conference, pp. 289–300, Miami Beach, FL, USA, December 2006.
[27] M. Bat-Erdene, Y. Park, H. Li, H. Lee, and M.-S. Choi, “Entropy analysis to classify unknown packing algorithms for malware detection,” International Journal of Information Security, vol. 16, no. 3, pp. 227–248, 2017.
[28] S. Cesare and X. Yang, “Classification of malware using structured control flow,” in Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing, pp. 61–70, Brisbane, Australia, January 2010.
[29] L. Martignoni, M. Christodorescu, and S. Jha, “OmniUnpack: fast, generic, and safe unpacking of malware,” in Proceedings of the 23rd Annual Computer Security Applications Conference, pp. 431–441, Miami Beach, FL, USA, December 2007.
[30] M. G. Kang, P. Poosankam, and H. Yin, “Renovo: a hidden code extractor for packed executables,” in Proceedings of the 5th ACM Workshop on Recurring Malcode, pp. 46–53, Alexandria, VA, USA, November 2007.
[31] S. Mariani, L. Fontana, F. Gritti, and S. D'Alessio, “PinDemonium: a DBI-based generic unpacker for windows executables,” in Proceedings of the Black Hat 2016, Las Vegas, NV, USA, July 2016.
[32] G. Bonfante, J. Fernandez, J.-Y. Marion, B. Rouxel, F. Sabatier, and A. Thierry, “CoDisasm: medium scale concatic disassembly of self-modifying binaries with overlapping instructions,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 745–756, Denver, CO, USA, October 2016.
[33] M. Oberhumer, L. Molnar, and J. Reiser, “UPX,” 2019, https://upx.github.io.
[34] A. Swinnen and A. Mesbahi, “One packer to rule them all: empirical identification, comparison, and circumvention of current antivirus detection techniques,” in Proceedings of the Black Hat 2014, Las Vegas, NV, USA, August 2014.
[35] P. Bania, “Generic unpacking of self-modifying, aggressive, packed binary programs,” 2009, https://arxiv.org/abs/0905.4581.
[36] H. Neil, “BobSoft Signature,” 2019, http://woodmann.com/BobSoft/Files/Other/UserDB.zip.
[37] M. Russinovich, “Microsoft Sysinternals Suite,” 2019, https://download.sysinternals.com/files/SysinternalsSuite.zip.
[38] R. Rivest, “The MD5 message-digest algorithm,” RFC 1321, IETF, Fremont, CA, USA, 1992.
[39] S. Tatham, “Putty,” 2019, https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html.
[40] L. Pascoe, “MD5,” 2019, http://www.md5summer.org/about.html.
[41] P. Ferrie, “Anti-unpacker tricks - part one,” 2019, https://www.virusbulletin.com/virusbulletin/2008/12/anti-unpacker-tricks-part-one.
[42] T.-Y. Wang and C.-H. Wu, “Detection of packed executables using support vector machines,” in Proceedings of the 10th International Conference on Machine Learning and Cybernetics, pp. 717–722, Guilin, China, July 2011.
[43] D. Korczynski, “RePEconstruct: reconstructing binaries with self-modifying code and Import address table destruction,” in Proceedings of the 11th International Conference on Malicious and Unwanted Software, pp. 31–38, Fajardo, Puerto Rico, October 2016.
dynamic unpacking algorithm, we run Python scripts on top of Immunity Debugger.
In the experiment, we use a total of 130 PE files, where 90 files are provided by Microsoft [37] and 40 files are randomly collected from the putty [39] and MD5 [40] sites. We pack the 130 executable files using each of the 19 packers and construct a dataset consisting of a total of 2600 files, that is, 2470 packed files plus the 130 original files. The 19 packers used are UPX, ASPack, NSPack, MPRESS, Yoda's Protector, MEW, Packman, RLPack, BeRoEXEPacker, PECompact, Petite, JDpack, Molebox, eXpressor, Yoda's Crypter, FSG, exe32pack, WinUpack, and Neolite. The reason for configuring the dataset by packing the normal PE files directly is that we can quantitatively measure the packing detection accuracy, the false-positive rate, and the false-negative rate only if we know in advance which experimental files are packed.
In this paper, we conduct three experiments. The first experiment is for the integration feature of the all-in-one system, which verifies that the system supports all three phases of packing detection, unpacking, and verification in an integrated framework. The second experiment is for the detection phase, where we first set Packing-Range, an entropy range for judging the packing, and then evaluate the packing detection accuracy using the range. The third experiment is for the unpacking and verification phases, where we verify whether the system unpacks the PE file correctly by calculating the restoration accuracy.
5.2. Experimental Results on Overall Working Mechanism. In this section, we confirm through experiments that the integration mechanism of the all-in-one unpacking system works correctly. Figure 4 shows operation screenshots of each phase of the all-in-one system. First, Figure 4(a) shows an example screenshot of the detection phase. To confirm that all four cases of packing detection work correctly, we continue executing the remaining cases even if packing has already been detected by a previous case. In Figure 4(a), we first examine DP-NEP of Algorithm 1. We detect the EP section in DP-NEP, so we proceed to DP-SIG (the first two lines). By the DP-SIG test, we detect the UPX packer, so we conclude that the file is packed (the second two lines). Even though detection is completed, we proceed to the next DP-ENT and DP-WR tests. The entropy value is 7.93, which is contained in Packing-Range, and the attribute value of the EP section is “0xe0000040”, which includes the WRITE attribute. Thus, we conclude that the file is also packed by the DP-ENT and DP-WR tests (the third three lines). According to Figure 4(a), we can confirm that the DP-NEP, DP-SIG, DP-ENT, and DP-WR tests of the detection phase all work correctly.
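The attribute value 0xe0000040 can be decoded with the section characteristic flags defined in the PE format. The flag values below are the standard ones; the helper name `has_write_attribute` is ours and serves only as an illustration of the DP-WR test.

```python
# Section characteristic flags from the PE/COFF format.
IMAGE_SCN_CNT_INITIALIZED_DATA = 0x00000040
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000


def has_write_attribute(characteristics: int) -> bool:
    """True if the section is marked writable."""
    return bool(characteristics & IMAGE_SCN_MEM_WRITE)


ep_characteristics = 0xE0000040  # value shown for the EP section in Figure 4(a)
writable = has_write_attribute(ep_characteristics)

# The value decomposes into EXECUTE | READ | WRITE | INITIALIZED_DATA.
fully_decoded = ep_characteristics == (
    IMAGE_SCN_MEM_EXECUTE
    | IMAGE_SCN_MEM_READ
    | IMAGE_SCN_MEM_WRITE
    | IMAGE_SCN_CNT_INITIALIZED_DATA
)
```

A section that is both executable and writable is what allows a packer stub to write the restored code into memory and then run it.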
Figures 4(b) and 4(c) show example screenshots of the unpacking phase. If an unpacking library exists for the packer identified in the detection phase, we perform static unpacking as shown in Figure 4(b); if not, we perform dynamic unpacking as shown in Figure 4(c). Figure 4(b) shows the result of static unpacking of a UPX-packed file by the UPX tool [33]. In the figure, the file size increases from 285,760 bytes to 731,200 bytes by unpacking, which means that the compression ratio is 39.08%. Figure 4(c) shows the result of the entropy-based dynamic unpacking. In the figure, the system calculates the entropy of each section; if the IP encounters a JMP or CJMP instruction, the system returns the current DstAddr as the OEP if the entropy change is stable and DstAddr is in a different section.
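The OEP rule described for Figure 4(c) can be sketched over a recorded trace of branch events. Everything in this sketch is an assumption made for illustration: the `BranchEvent` record, the single entropy value per event, the meaning of "stable" as a change of at most `epsilon` between consecutive measurements, and the sample addresses and section names. It is not Algorithm 2 itself.

```python
from typing import Iterable, NamedTuple, Optional


class BranchEvent(NamedTuple):
    dst_addr: int      # destination of the JMP/CJMP instruction
    src_section: str   # section holding the branch instruction
    dst_section: str   # section holding the destination
    entropy: float     # entropy measured when the branch is reached


def find_oep(events: Iterable[BranchEvent], epsilon: float = 0.01) -> Optional[int]:
    """Return the first cross-section branch target reached while entropy is stable."""
    previous = None
    for ev in events:
        stable = previous is not None and abs(ev.entropy - previous) <= epsilon
        if stable and ev.dst_section != ev.src_section:
            return ev.dst_addr
        previous = ev.entropy
    return None


# Hypothetical trace: entropy rises while the stub unpacks, then levels off.
trace = [
    BranchEvent(0x409010, "UPX1", "UPX1", 5.10),
    BranchEvent(0x409080, "UPX1", "UPX1", 6.40),
    BranchEvent(0x409100, "UPX1", "UPX1", 6.90),
    BranchEvent(0x401000, "UPX1", "UPX0", 6.90),
]
oep = find_oep(trace)
```

In this trace, only the last branch both leaves the current section and occurs after the entropy has stopped changing, so its destination is reported as the OEP.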
Figure 4(d) shows an example screenshot of the verification phase. In this example, we measure the restoration accuracy of the restored file with respect to the text and data sections of the original file. The length of the original text section is 18,439, and the length of the unpacked file is 872,555. In the hash value comparison, since the portion corresponding to the original text section exists in the unpacked file, the restoration accuracy becomes 100%. Similarly, we also calculate the restoration accuracy of the data section as 100%.
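One simple way to picture the hash value comparison is to hash the original section and compare it against every window of the same length in the unpacked image. This is a minimal sketch under our own assumptions, not Algorithm 3: the function name, the use of MD5 for the comparison, and the brute-force window scan are illustrative choices, and the sample bytes are made up.

```python
import hashlib


def section_restored(original_section: bytes, unpacked_image: bytes) -> bool:
    """True if some window of the unpacked image has the same MD5 as the original section."""
    target = hashlib.md5(original_section).digest()
    n = len(original_section)
    for start in range(len(unpacked_image) - n + 1):
        if hashlib.md5(unpacked_image[start:start + n]).digest() == target:
            return True
    return False


# Made-up data: the original section is embedded inside a larger unpacked image.
original_text = b"original-text-section"
unpacked = b"\x00" * 8 + original_text + b"\xff" * 8

found = section_restored(original_text, unpacked)
missing = section_restored(b"not-present", unpacked)
```

The brute-force scan is quadratic and only suitable for small inputs; it is used here to keep the example short.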
5.3. Experimental Results on Detection Phase. In this section, we describe two experimental results of the detection phase. The first experiment is to set Packing-Range, and the second is to measure the detection accuracy of the proposed detection algorithm. In the first experiment, we experimentally determine the Packing-Range used in DP-ENT of Algorithm 1. The previous study by Lyda and Hamrock [17] uses the entropy value of the whole file and sets the entropy range of the packed file to [6.677, 6.926]. In contrast, Han and Lee [16] focus on the EP section only and compute the entropy value for the EP section instead of the whole file. Through the experiment, they set the entropy threshold to 6.85 and judge a file to be packed if its entropy is over 6.85.
We also determine the entropy range, Packing-Range, through experiments on various packers. Figure 5 shows the entropy distribution of the EP section for different packers. Figure 5(a) shows the average distribution of the original and 19 packed files. Figures 5(b)–5(m) show the more detailed entropy distribution for each packer. According to Figures 5(b)–5(m), the entropy distribution of the packed file differs depending on the packer type. In particular, as shown in Figures 5(d), 5(e), and 5(k), some packed files have smaller entropy values than the original file, and thus we cannot simply set an upper threshold value as a packing criterion as in the previous study [16]. However, we note that while the entropy of packed files is distributed over a wide range, as shown in Figures 5(c)–5(m), the entropy of original files is distributed only in a specific range, as shown in Figure 5(b). More specifically, the entropy value of the normal file is between 6.00 and 6.85, as shown in Figure 5(b). Based on these experimental results, we thus set Packing-Range to [0, 6.00) or [6.85, ∞] of the EP section. According to [9, 29], the entropy method has the disadvantages of low accuracy and easy evasion. To overcome this point, we use three other detection techniques, DP-NEP, DP-SIG, and DP-WR, together with DP-ENT in a hybrid manner. Using this hybrid approach, we can further improve the detection accuracy.
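The Packing-Range test can be sketched with a byte-level Shannon entropy, which ranges from 0 to 8 bits per byte. The function names are ours, and reading the EP section bytes from a real PE file is left out; only the entropy computation and the range check with the bounds 6.00 and 6.85 stated above are shown.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def in_packing_range(entropy: float, low: float = 6.00, high: float = 6.85) -> bool:
    """True if entropy lies in [0, low) or [high, inf), i.e. outside the normal-file band."""
    return entropy < low or entropy >= high


uniform = bytes(range(256))  # every byte value exactly once
constant = b"\x00" * 256     # a single repeated value

entropy_uniform = shannon_entropy(uniform)    # 8.0
entropy_constant = shannon_entropy(constant)  # 0.0
```

For example, an EP section with entropy 7.93 falls in Packing-Range, while a value of 6.5 lies in the band observed for normal files and would not be flagged by DP-ENT alone.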
In the second experiment we evaluate the detectionaccuracy and false-negativefalse-positive rates Table 5compares the three detection techniques of the detectionphase with the proposed hybrid approach Since EP sections
Security and Communication Networks 9
(a) (b)
(c) (d)
Figure 4 Operation screenshots of detection unpacking and verification phases (a) Detection phase (b) Unpacking phase (staticunpacking) (c) Unpacking phase (dynamic unpacking) (d) Verification phase
0
50
100
150
200
250
300
350
0 05 1 15 2 25 3 35 4 45 5 55 6 65 7 75 8Entropy of EP section
Not packed (b)Summation of (c) to (m)
Num
ber o
f file
s
(a)
0
20
40
60
80
100
120
140
0 05 1 15 2 25 3 35 4 45 5 55 6 65 7 75 8Entropy of EP section
Num
ber o
f file
s
(b)
0
20
40
60
80
100
120
140
0 05 1 15 2 25 3 35 4 45 5 55 6 65 7 75 8Entropy of EP section
Num
ber o
f file
s
(c)
0
20
40
60
80
100
120
140
0 05 1 15 2 25 3 35 4 45 5 55 6 65 7 75 8Entropy of EP section
Num
ber o
f file
s
(d)
Figure 5 Continued
10 Security and Communication Networks
Num
ber o
f file
s0
20
40
60
80
100
120
140
0 05 1 15 2 25 3 35 4 45 5 55 6 65 7 75 8Entropy of EP section
(e)
0
20
40
60
80
100
120
140
0 05 1 15 2 25 3 35 4 45 5 55 6 65 7 75 8Entropy of EP section
Num
ber o
f file
s
(f )
0
20
40
60
80
100
120
140
0 05 1 15 2 25 3 35 4 45 5 55 6 65 7 75 8Entropy of EP section
Num
ber o
f file
s
(g)
Num
ber o
f file
s
0
20
40
60
80
100
120
140
0 05 1 15 2 25 3 35 4 45 5 55 6 65 7 75 8Entropy of EP section
(h)
0
20
40
60
80
100
120
140
0 05 1 15 2 25 3 35 4 45 5 55 6 65 7 75 8Entropy of EP section
Num
ber o
f file
s
(i)
0
20
40
60
80
100
120
140
0 05 1 15 2 25 3 35 4 45 5 55 6 65 7 75 8Entropy of EP section
Num
ber o
f file
s
(j)
Num
ber o
f file
s
0
20
40
60
80
100
120
140
0 05 1 15 2 25 3 35 4 45 5 55 6 65 7 75 8Entropy of EP section
(k)
0
20
40
60
80
100
120
140
0 05 1 15 2 25 3 35 4 45 5 55 6 65 7 75 8Entropy of EP section
Num
ber o
f file
s
(l)
0
20
40
60
80
100
120
140
0 05 1 15 2 25 3 35 4 45 5 55 6 65 7 75 8Entropy of EP section
Num
ber o
f file
s
(m)
Figure 5 Entropy distributions of the EP section for original and representative packed files (a) Average of original and 19 packers(b) Original (not packed) (c) UPX (d) ASPack (e) NSPack (f ) MPRESS (g) Yodarsquos Protector (h) RLPack (i) BeroEXE (j) MEW(k) PACKMAN (l) WinUpack (m) exe32pack
Security and Communication Networks 11
are mandatory in all PE files we can detect EP sectionsunless intentionally concealed All files of the dataset used inthe experiment also have EP sections and the detection rateby DP-NEP is trivially 0 In other words there is nopacking detection case due to absence of the EP section sowe exclude the experimental result of DP-NEP from Table 5
We now explain the results of Table 5 by each columnFirst we explain the column of DP-SIG In the case of DP-SIG Not-Packed shows a detection rate of 0 since nosignature exists in normal files is is because packingdetection does not occur in normal files which means thatDP-SIG performs an accurate detection for Not-Packed filesOn the other hand for the packed files DP-SIG detects 100of packers except NSPack e detection rate of NSPack ismerely 538 because of the nature of NSPack that thesignatures often exist in different sections rather than the EPsection In summary DP-SIG shows 950 of detectionaccuracy on average
Second for the experiment on the detection rate of DP-WR we intentionally remove the WRITE attribute from 39files (30 of packed files) per packer We perform thisremoval process in 18 packers except exe32pack e reasonfor changing the dataset is that normal packed files usuallyhaveWRITE attributes so we arbitrarily remove theWRITEattribute from 30 of files for the DP-WR experimentLooking at the DP-WR column in Table 5 DP-WR detectsonly 70 files per packer for 18 packers since we in-tentionally remove the WRITE attribute from 30 files Incase of exe32pack however we need not perform the re-moval process since it has no WRITE attribute in the PEheader And accordingly as shown in the column DP-WR ofTable 5 the detection rate of exe32pack by DP-WR istrivially 0 In the case of Not-Packed the detection rate is0 since normal files have noWRITE attribute In summary
checking the presence of theWRITE attribute is very simpleand DP-WR works correctly if the packed file has theWRITE attribute but it does not work well if the WRITEattribute does not exist or maliciously modified
ird the column of DP-ENT shows the result of en-tropy-based detection rates As shown in the table it showsthe 100 detection rate for 9 packers including NSPackUPX MEW and BeRoEXEPacker but it incurs some falsenegatives for 10 packers including ASPack Yodarsquos ProtectorPackman and MPRESS by showing the detection rate lowerthan 100 In particular for MPRESS Petite JDpackeXpressor and Neolite it shows a low detection rate below246 because their entropy value in the packer charac-teristic is in entropy distribution of the normal file DP-ENTalso has a false-positive rate of 384 for Not-Packed Insummary DP-ENT works very accurately for most packersbut it may incur false negatives for a few specific packers orfalse positives for some normal files To overcome thislimitation we use DP-ENT together with other detectiontechniques We can see the improvement through the ex-perimental results of Tables 5 and 6 In the tables when usingonly DP-ENT the detection accuracy is low in some packers(especially MPRESS) but when using it together with theother three techniques the accuracy is close to 100
Fourth, the proposed hybrid approach, which integrates the four techniques, detects 100% of the files for all packers except NSPack and shows no false positives. The reason for this high detection rate is that the proposed method has all the advantages of DP-SIG, DP-WR, and DP-ENT, as well as DP-NEP. In the case of NSPack, however, the detection rate is merely 70.0%; this is because, as mentioned above, we forcibly remove the WRITE attribute from 30% of the files. According to the results of Table 5, the detection rate differs for each detection technique, and some techniques incur false positives or false negatives; on the other hand, the proposed hybrid approach shows the highest detection rate of 98.4% and no false positives.

Table 5: Comparison of packing detection rates of three techniques and the proposed hybrid approach.

| Packer | DP-SIG (%) | DP-WR (%) | DP-ENT (%) | Hybrid approach (%) |
|---|---|---|---|---|
| ASPack | 100 | 70.0 | 91.5 | 100 |
| NSPack | 53.8 | 70.0 | 100 | 70.0 |
| MPRESS | 100 | 70.0 | 24.6 | 100 |
| UPX | 100 | 70.0 | 100 | 100 |
| Yoda's Protector | 100 | 70.0 | 96.9 | 100 |
| MEW | 100 | 70.0 | 100 | 100 |
| BeRoEXEPacker | 100 | 70.0 | 100 | 100 |
| Packman | 100 | 70.0 | 93.8 | 100 |
| RLPack | 100 | 70.0 | 99.3 | 100 |
| PECompact | 100 | 70.0 | 100 | 100 |
| Petite | 100 | 70.0 | 0.00 | 100 |
| JDpack | 100 | 70.0 | 56.2 | 100 |
| Molebox | 100 | 70.0 | 100 | 100 |
| eXpressor | 100 | 70.0 | 0.00 | 100 |
| Yoda's Crypter | 100 | 70.0 | 91.5 | 100 |
| FSG | 100 | 70.0 | 100 | 100 |
| exe32pack | 100 | 0.00 | 100 | 100 |
| WinUpack | 100 | 70.0 | 100 | 100 |
| Neolite | 100 | 70.0 | 16.1 | 100 |
| Average | 95.0 | 66.3 | 73.9 | 98.4 |
| Not-Packed | 0.00 | 0.00 | 3.84 | 0.00 |
Table 6 shows the detection accuracy of combinations of two or three detection techniques and of the proposed hybrid approach. First, DP-SIG ENT, a combination of the signature and entropy tests, detects 100% of the packed files but detects 18 files incorrectly, showing a false-positive rate of 0.73%. Second, DP-ENT WR, a combination of the entropy and WRITE tests, shows a low detection rate of 92.0% because the entropy value of MPRESS lies within the distribution of normal files. Third, DP-SIG ENT WR, a combination of the signature, entropy, and WRITE tests, and our hybrid approach both show 98.4% detection accuracy and a 1.60% false-negative rate. This is because DP-SIG ENT WR and the hybrid approach go through the same detection procedure except for DP-NEP. The false-negative rate of 1.60% is due to the files having no WRITE attributes. According to the results of Tables 5 and 6, we believe that the proposed hybrid approach is an excellent method for achieving high detection accuracy by combining the advantages of the existing and new detection techniques.
5.4. Experimental Results on Unpacking and Verification Phases. In this section, we verify the unpacking and verification phases by performing the unpacking of Algorithm 2 and calculating the restoration accuracy of the files through the actual restoration of Algorithm 3. However, the unpacking and verification phases take a lot of time, so we experiment with randomly selected 40 files of the dataset. The unpacking phase consists of static unpacking and dynamic unpacking. In the experiment, we exclude static unpacking since it uses the existing library as it is, and we focus only on the dynamic unpacking of Algorithm 2 since it tries to restore the original file without any information about the packers.
Table 7 shows the restoration accuracy of the proposed method for the .text and .data sections for each of the 19 packers. We cannot directly unpack Yoda's Protector and PECompact since they use Anti-Debug engineering techniques. To overcome this problem, we unpack these packers by bypassing three Anti-Debugging API calls: GetCurrentProcessId(), BlockInput(), and IsDebuggerPresent() [41]. (That is, we run Algorithm 2 with the bypassing techniques for Yoda's Protector and PECompact and include the experimental results in Table 7.) In general, since only the .text and .data sections of the original file are packed [42], in the experiment we consider only the restoration accuracy of these .text and .data sections of the restored file. In Table 7, among the 19 packers, 11 packers, including UPX, NSPack, MEW, RLPack, and BeRoEXEPacker, show 100% restoration accuracy for both the .text and .data sections. On the other hand, 8 packers, including ASPack, Packman, and MPRESS, show less than 100% restoration accuracy because of the manner in which the packers handle the .reloc section. More specifically, if there is a .reloc section in the original file, the unpacking relocates the .text and .data sections, changing all the data values in each section. Therefore, the restoration accuracy depends on whether there is a .reloc section in the original file.
Table 8 shows the restoration results in detail according to the presence of the .reloc section. As shown in the table, the restoration accuracy is 100% in the absence of the .reloc section, while it is very low, in the range of 0.00% to 99.8%, in the presence of the .reloc section. However, this low restoration accuracy is due to the nature of the packer, and it does not mean that unpacking fails to work correctly. In other words, even if the unpacking is successful, the data values may change due to the packer characteristics, and the restoration accuracy is then calculated to be low. In summary, we can use the proposed verification algorithm to calculate the restoration accuracy of unpacked files, which quantitatively measures the accuracy of the actual restoration.
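The idea of a quantitative restoration accuracy can be sketched as a byte-wise comparison between a section of the original file and the same section of the restored file. This is only an assumption about how such a score might be computed; the function name and the exact comparison rule are hypothetical and are not taken from Algorithm 3.

```python
def restoration_accuracy(original: bytes, restored: bytes) -> float:
    """Percentage of byte positions of `original` reproduced in `restored`.

    Assumed rule: bytes are compared position by position; positions missing
    from a shorter `restored` count as mismatches, and extra trailing bytes
    in a longer `restored` are ignored.
    """
    if not original:
        return 100.0
    matches = sum(a == b for a, b in zip(original, restored))
    return 100.0 * matches / len(original)
```

Under such a rule, a section whose address fields were rewritten during relocation would score low even though the unpacked code is functionally the same, which is consistent with the behaviour described for files that contain a .reloc section.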
In addition to the .text, .data, and .reloc sections, we further consider the PE header, which includes the IAT information. In general, the IAT is located in the header of the PE file, and if we actually need to run the unpacked file, we must also restore this header information, including the IAT. In fact, the studies in [3, 43] restore the header, including the IAT, and produce the actual whole executable file. On the other hand, most studies, including this paper, do not restore the actual whole executable file itself but restore the .text, .data, and .reloc sections, which are the major parts of the executable file. This is because the purpose of unpacking is not to execute the restored file but to identify and extract the malicious code included in the restored file. Therefore, in this paper, we do not need to restore the header required for the actual execution, and accordingly, comparison of the IAT information is also not necessary. Instead, we restore and compare the .text, .data, and .reloc sections, which are more likely to hide malicious code. Restoring the PE header is beyond the scope of this study; we leave it as a separate research topic.
Table 6: Comparison of packing detection accuracy of two or three combinations and of the hybrid approach.

| Metrics | DP-SIG ENT (%) | DP-ENT WR (%) | DP-SIG ENT WR (%) | Hybrid approach (%) |
|---|---|---|---|---|
| Detection accuracy | 95.0 | 92.0 | 98.4 | 98.4 |
| False-negative rate | 5.00 | 8.00 | 1.60 | 1.60 |
| False-positive rate | 0.73 | 0.00 | 0.00 | 0.00 |

Table 7: Restoration accuracy of .text and .data sections of each packer.

| Packer | .text (%) | .data (%) |
|---|---|---|
| UPX | 100 | 100 |
| NSPack | 100 | 100 |
| MEW | 100 | 100 |
| RLPack | 100 | 100 |
| BeRoEXE | 100 | 100 |
| Neolite | 100 | 100 |
| FSG | 100 | 100 |
| eXpressor | 100 | 100 |
| Molebox | 100 | 100 |
| Petite | 100 | 100 |
| JDpack | 100 | 100 |
| ASPack | 84.3 | 84.3 |
| Yoda's Protector | 84.0 | 84.0 |
| exe32pack | 80.8 | 100 |
| MPRESS | 78.8 | 75.3 |
| Yoda's Crypter | 74.5 | 74.5 |
| PECompact | 74.0 | 74.0 |
| WinUpack | 68.0 | 68.0 |
| Packman | 55.0 | 48.5 |
| Average | 89.2 | 90.2 |

6. Conclusions

In this paper, we analyzed recent studies of packing detection and unpacking techniques and derived four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. We then proposed an all-in-one unpacking system to address these four observations. First, no phase integration was a difficulty in which analysts had to perform the packing detection, unpacking, and verification phases separately, and we solved this problem by presenting an all-in-one unpacking system that integrates these three phases in a single framework. Second, no detection combination meant that there had been no attempt to combine various packing detection methods, and we solved this problem by presenting a hybrid approach that combines existing and new packing detection methods. Third, no real-restoration arose because there was little discussion about restoring the actual executable file, and we solved this problem by actually restoring an original image of the packed file. Fourth, no unpacking verification meant that there was no explicit work to measure the restoration accuracy of unpacked files, and we solved this problem by proposing a verification algorithm that quantitatively evaluates the restoration accuracy.
We constructed a dataset of 2600 PE files and experimentally evaluated the proposed system using the dataset. First, we performed all the phases of the all-in-one unpacking system sequentially and confirmed that its overall mechanism worked well. Next, through the comparison of the hybrid packing detection approach and existing detection techniques, we showed that the detection accuracy of the proposed method was the highest, at 98.4% on average, with no false positives. For this, we also determined an entropy range of packing, Packing-Range, through extensive experiments on 19 representative packers. Finally, in the unpacking and verification phases, we confirmed that unpacking was successful for all files except the files packed with Yoda's Protector. We also showed that the restoration accuracy of unpacked files was nearly 100% except for a few packers.
As future research, we will focus on unpacking packed files with Anti-VM, Anti-Debug, and Anti-Reverse engineering techniques. We will also study how to compute the restoration accuracy of files having .reloc sections.
Data Availability
All the data files used in the experiments are available at https://github.com/chesvectain/PackingData.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partly supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (no. 2017-0-00158, Development of Cyber Threat Intelligence (CTI) analysis and information sharing technology for national cyber incident response) and the Korea Electric Power Corporation (no. R18XA05).
Table 8: Restoration accuracy according to the presence of the .reloc section.

| Packer | .text, .reloc present (%) | .text, .reloc absent (%) | .data, .reloc present (%) | .data, .reloc absent (%) |
|---|---|---|---|---|
| ASPack | 0.00 | 100 | 0.00 | 100 |
| Packman | 99.8 | 100 | 88.1 | 100 |
| MPRESS | 0.00 | 100 | 0.00 | 100 |
| Yoda's Crypter | 0.00 | 100 | 0.04 | 100 |
| PECompact | 0.00 | 100 | 0.00 | 100 |
| Yoda's Protector | 0.00 | 100 | 0.05 | 100 |
| exe32pack | 0.00 | 100 | 100 | 100 |
| WinUpack | 0.00 | 100 | 0.00 | 100 |

References

[1] M. Morgenstern, "AV-TEST Security report 2017/18," 2019, https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2017-2018.pdf.
[2] T. Ban, R. Isawa, S. Guo, D. Inoue, and K. Nakao, "Efficient malware packer identification using support vector machines with spectrum kernel," in Proceedings of the Eighth Asia Joint Conference on Information Security, pp. 69–76, Seoul, South Korea, July 2013.
[3] B. Cheng, M. Jiang, J. Fu et al., "Towards paving the way for large-scale Windows malware analysis: generic binary unpacking with orders-of-magnitude performance boost," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 395–411, Toronto, Canada, October 2018.
[4] A. Gupta and M. S. Arya, "Hashing based encryption and anti-debugger support for packing multiple files into single executable," International Journal of Advanced Research in Computer Science, vol. 9, no. 1, pp. 914–920, 2018.
[5] Y. Kawakoya, M. Iwamura, and M. Itoh, "Memory behavior-based automatic malware unpacking in stealth debugging environment," in Proceedings of the 5th International Conference on Malicious and Unwanted Software, pp. 39–46, Lorraine, France, October 2010.
[6] G. Liang, J. Pang, Z. Shan, R. Yang, and Y. Chen, "Automatic benchmark generation framework for malware detection," Security and Communication Networks, vol. 2018, Article ID 4947695, 8 pages, 2018.
[7] A. Pektas and T. Acarman, "A dynamic malware analyzer against virtual machine aware malicious software," Security and Communication Networks, vol. 7, no. 12, pp. 2245–2257, 2014.
[8] I. Santos, J. Nieves, and P. G. Bringas, "Semi-supervised learning for unknown malware detection," in International Symposium on Distributed Computing and Artificial Intelligence, A. Abraham, J. M. Corchado, S. R. Gonzalez, and J. F. De Paz Santana, Eds., pp. 415–422, Springer, Berlin, Germany, 2011.
[9] X. Ugarte-Pedrero, I. Santos, B. Sanz, C. Laorden, and P. G. Bringas, "Countering entropy measure attacks on packed software detection," in Proceedings of the 9th International Conference on Consumer Communications and Networking, pp. 164–168, Las Vegas, NV, USA, January 2012.
[10] H. Won, S.-P. Kim, S. Lee, M.-J. Choi, and Y.-S. Moon, "Secure principal component analysis in multiple distributed nodes," Security and Communication Networks, vol. 9, no. 14, pp. 2348–2358, 2016.
[11] S. Cesare, Y. Xiang, and W. Zhou, "Malwise–an effective and efficient classification system for packed and polymorphic malware," IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193–1206, 2013.
[12] C. Malin, E. Casey, and J. Aquilina, Malware Forensics: Investigating and Analyzing Malicious Code, Syngress, Burlington, MA, USA, 2008.
[13] W. Yan, Z. Zheng, and A. Nirwan, "Revealing packed malware," IEEE Security and Privacy, vol. 6, no. 5, pp. 65–69, 2008.
[14] M. Bat-Erdene, T. Kim, H. Park, and H. Lee, "Packer detection for multi-layer executables using entropy analysis," Entropy, vol. 19, no. 3, p. 125, 2017.
[15] Y.-S. Choi, I.-K. Kim, J.-T. Oh, and J.-C. Ryou, "PE file header analysis-based packed PE file detection technique (PHAD)," in Proceedings of International Symposium on Computer Science and Its Applications, pp. 28–31, Hobart, Australia, October 2008.
[16] S. Han and S. Lee, "Packed PE file detection for malware forensics," The KIPS Transactions: Part C, vol. 16C, no. 5, pp. 555–562, 2009, in Korean.
[17] R. Lyda and J. Hamrock, "Using entropy analysis to find encrypted and packed malware," IEEE Security and Privacy, vol. 5, no. 2, pp. 40–45, 2007.
[18] R. Perdisci, A. Lanzi, and W. Lee, "Classification of packed executables for accurate computer virus detection," Pattern Recognition Letters, vol. 29, no. 14, pp. 1941–1946, 2008.
[19] A. Rohit, A. Singh, H. Pareek, and U. Edara, "A heuristics-based static analysis approach for detecting packed PE binaries," International Journal of Security and Its Applications, vol. 7, no. 5, pp. 257–268, 2013.
[20] G. Amato, "PEframe," 2019, https://github.com/guelfoweb/peframe.
[21] G. Jeong, E. Choo, J. Lee, M. Bat-Erdene, and H. Lee, "Generic unpacking using entropy analysis," Journal of Advanced Information Technology and Convergence, vol. 7, no. 1, pp. 232–238, 2009.
[22] V. DeMarines, "Obfuscation–how to do it and how to crack it," Network Security, vol. 2008, no. 7, pp. 4–7, 2008.
[23] H. Neil, "PEiD," 2019, https://github.com/wolfram77web/app-peid.
[24] O. Yuschuk, "OllyDbg," 2019, http://www.ollydbg.de/download.html.
[25] N. Waisman, "Immunity Debugger," 2019, https://www.immunityinc.com/products/debugger/index.html.
[26] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee, "PolyUnpack: Automating the hidden-code extraction of unpack-executing malware," in Proceedings of the 22nd Annual Computer Security Applications Conference, pp. 289–300, Miami Beach, FL, USA, December 2006.
[27] M. Bat-Erdene, Y. Park, H. Li, H. Lee, and M.-S. Choi, "Entropy analysis to classify unknown packing algorithms for malware detection," International Journal of Information Security, vol. 16, no. 3, pp. 227–248, 2017.
[28] S. Cesare and X. Yang, "Classification of malware using structured control flow," in Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing, pp. 61–70, Brisbane, Australia, January 2010.
[29] L. Martignoni, M. Christodorescu, and S. Jha, "OmniUnpack: fast, generic, and safe unpacking of malware," in Proceedings of the 23rd Annual Computer Security Applications Conference, pp. 431–441, Miami Beach, FL, USA, December 2007.
[30] M. G. Kang, P. Poosankam, and H. Yin, "Renovo: a hidden code extractor for packed executables," in Proceedings of the 5th ACM Workshop on Recurring Malcode, pp. 46–53, Alexandria, VA, USA, November 2007.
[31] S. Mariani, L. Fontana, F. Gritti, and S. D'Alessio, "PinDemonium: a DBI-based generic unpacker for windows executables," in Proceedings of the Black Hat 2016, Las Vegas, NV, USA, July 2016.
[32] G. Bonfante, J. Fernandez, J.-Y. Marion, B. Rouxel, F. Sabatier, and A. Thierry, "CoDisasm: medium scale concatic disassembly of self-modifying binaries with overlapping instructions," in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 745–756, Denver, CO, USA, October 2016.
[33] M. Oberhumer, L. Molnar, and J. Reiser, "UPX," 2019, https://upx.github.io.
[34] A. Swinnen and A. Mesbahi, "One packer to rule them all: empirical identification, comparison, and circumvention of current antivirus detection techniques," in Proceedings of the Black Hat 2014, Las Vegas, NV, USA, August 2014.
[35] P. Bania, "Generic unpacking of self-modifying, aggressive, packed binary programs," 2009, https://arxiv.org/abs/0905.4581.
[36] H. Neil, "BobSoft Signature," 2019, http://woodmann.com/BobSoft/Files/Other/UserDB.zip.
[37] M. Russinovich, "Microsoft Sysinternals Suite," 2019, https://download.sysinternals.com/files/SysinternalsSuite.zip.
[38] R. Rivest, "The MD5 message-digest algorithm," RFC 1321, IETF, Fremont, CA, USA, 1992.
[39] S. Tatham, "Putty," 2019, https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html.
[40] L. Pascoe, "MD5," 2019, http://www.md5summer.org/about.html.
[41] P. Ferrie, "Anti-unpacker tricks - part one," 2019, https://www.virusbulletin.com/virusbulletin/2008/12/anti-unpacker-tricks-part-one.
[42] T.-Y. Wang and C.-H. Wu, "Detection of packed executables using support vector machines," in Proceedings of the 10th International Conference on Machine Learning and Cybernetics, pp. 717–722, Guilin, China, July 2011.
[43] D. Korczynski, "RePEconstruct: reconstructing binaries with self-modifying code and Import address table destruction," in Proceedings of the 11th International Conference on Malicious and Unwanted Software, pp. 31–38, Fajardo, Puerto Rico, October 2016.
Figure 4: Operation screenshots of detection, unpacking, and verification phases. (a) Detection phase. (b) Unpacking phase (static unpacking). (c) Unpacking phase (dynamic unpacking). (d) Verification phase.
Figure 5: Entropy distributions of the EP section for original and representative packed files (x-axis: entropy of EP section, from 0 to 8; y-axis: number of files). (a) Average of original and 19 packers, plotted as two series: not packed (b) and summation of (c) to (m). (b) Original (not packed). (c) UPX. (d) ASPack. (e) NSPack. (f) MPRESS. (g) Yoda's Protector. (h) RLPack. (i) BeroEXE. (j) MEW. (k) PACKMAN. (l) WinUpack. (m) exe32pack.
are mandatory in all PE files, we can detect EP sections unless they are intentionally concealed. All files of the dataset used in the experiment also have EP sections, and the detection rate by DP-NEP is trivially 0%. In other words, there is no packing detection case due to the absence of the EP section, so we exclude the experimental result of DP-NEP from Table 5.
We now explain the results of Table 5 column by column. First, we explain the DP-SIG column. In the case of DP-SIG, Not-Packed shows a detection rate of 0% since no signature exists in normal files. This is because packing detection does not occur in normal files, which means that DP-SIG performs an accurate detection for Not-Packed files. On the other hand, for the packed files, DP-SIG detects 100% of the files for all packers except NSPack. The detection rate of NSPack is merely 53.8% because NSPack signatures often exist in sections other than the EP section. In summary, DP-SIG shows 95.0% detection accuracy on average.
Second for the experiment on the detection rate of DP-WR we intentionally remove the WRITE attribute from 39files (30 of packed files) per packer We perform thisremoval process in 18 packers except exe32pack e reasonfor changing the dataset is that normal packed files usuallyhaveWRITE attributes so we arbitrarily remove theWRITEattribute from 30 of files for the DP-WR experimentLooking at the DP-WR column in Table 5 DP-WR detectsonly 70 files per packer for 18 packers since we in-tentionally remove the WRITE attribute from 30 files Incase of exe32pack however we need not perform the re-moval process since it has no WRITE attribute in the PEheader And accordingly as shown in the column DP-WR ofTable 5 the detection rate of exe32pack by DP-WR istrivially 0 In the case of Not-Packed the detection rate is0 since normal files have noWRITE attribute In summary
checking the presence of theWRITE attribute is very simpleand DP-WR works correctly if the packed file has theWRITE attribute but it does not work well if the WRITEattribute does not exist or maliciously modified
ird the column of DP-ENT shows the result of en-tropy-based detection rates As shown in the table it showsthe 100 detection rate for 9 packers including NSPackUPX MEW and BeRoEXEPacker but it incurs some falsenegatives for 10 packers including ASPack Yodarsquos ProtectorPackman and MPRESS by showing the detection rate lowerthan 100 In particular for MPRESS Petite JDpackeXpressor and Neolite it shows a low detection rate below246 because their entropy value in the packer charac-teristic is in entropy distribution of the normal file DP-ENTalso has a false-positive rate of 384 for Not-Packed Insummary DP-ENT works very accurately for most packersbut it may incur false negatives for a few specific packers orfalse positives for some normal files To overcome thislimitation we use DP-ENT together with other detectiontechniques We can see the improvement through the ex-perimental results of Tables 5 and 6 In the tables when usingonly DP-ENT the detection accuracy is low in some packers(especially MPRESS) but when using it together with theother three techniques the accuracy is close to 100
Fourth the proposed hybrid approach of integratingfour techniques detects 100 of all packers except NSPackand shows no false positives e reason for this high de-tection rate is that the proposed method has all the ad-vantages of DP-SIG DP-WR and DP-ENT as well as DP-NEP In case of NSPack however the detection rate ismerely 700 this is because as mentioned above weforcedly remove the WRITE attribute from the 30 fileAccording to the results of Table 5 detection rate differsaccording to each detection technique and some techniques
Table 5 Comparison of packing detection rates of three techniques and the proposed hybrid approach
PackerTechnique
DP-SIG () DP-WR () DP-ENT () Hybrid approach ()ASPack 100 700 915 100NSPack 538 700 100 700MPRESS 100 700 246 100UPX 100 700 100 100Yodarsquos Protector 100 700 969 100MEW 100 700 100 100BeRoEXEPacker 100 700 100 100Packman 100 700 938 100RLPack 100 700 993 100PECompact 100 700 100 100Petite 100 700 000 100JDpack 100 700 562 100Molebox 100 700 100 100eXpressor 100 700 000 100Yodarsquos Crypter 100 700 915 100FSG 100 700 100 100exe32pack 100 000 100 100WinUpack 100 700 100 100Neolite 100 700 161 100Average 950 663 739 984Not-Packed 000 000 384 000
12 Security and Communication Networks
incur false positives or false negatives on the other hand theproposed hybrid shows the highest detection rate of 984and no false positives
Table 6 shows the detection accuracy of two or threecombinations of detection techniques and of the proposedhybrid approach First DP-SIG ENT a combination ofsignature and entropy tests detects 100 of the packed filebut detects 18 files incorrectly showing a false-positive rateof 073 Second DP-ENTWR a combination of entropyand WRITE tests shows a low detection rate of 920because the entropy value of MPRESS is in the distributionof normal file ird DP-SIG ENT WR a combination ofsignature entropy and WRITE tests and our hybrid ap-proach show 984 detection accuracy and 160 false-negative rate is is because DP-SIG ENT WR and thehybrid approach go through the same detection procedureexcept DP-NEP e false-negative rate of 160 is due tothe files having no WRITE attributes According to theresults of Tables 5 and 6 we believe that the proposed hybridapproach is an excellent method for high detection accuracyby combining the advantages of the existing and new de-tection techniques
54 Experimental Results on Unpacking and VerificationPhases In this section we verify the unpacking and veri-fication phases by performing the unpacking of Algorithm 2and calculating the restoration accuracy of the files throughthe actual restoration of Algorithm 3 However unpackingand verification phases take a lot of time so we experimentwith randomly selected 40 files of the dataset eunpacking phase consists of static unpacking and dynamicunpacking In the experiment we exclude static unpackingsince it uses the existing library as it is and focus on onlydynamic unpacking of Algorithm 2 since it tries to restorethe original file without any information of packers
Table 7 shows the restoration accuracy of the proposedmethod for text and data sections for each of the 19 packersWe cannot directly unpack Yodarsquos Protector and PECom-pact since they use Anti-Debug engineering techniques Toovercome this problem we unpack these packers bybypassing three Anti-Debugging API calls GetCurrent-ProcessID() BlockInput() and IsDebuggerPresent() [41]at is we run Algorithm 2 with the bypassing techniquesfor Yodarsquos Protector and PECompact and include its ex-perimental result in Table 7) In general since only text anddata sections of the original file are packed [42] in theexperiment we consider only the restoration accuracy ofthese text and data sections of the restored file In Table 7among 19 packers 11 packers including UPX NSPackMEW RLPack and BeRoEXEPacker show at 100 resto-ration accuracy for both text and data sections On the other
hand 8 packers including ASPack Packman and MPRESSshow less than 100 restoration accuracy because of themanner in which the packers handle the reloc section Morespecifically if there is a reloc section in the original file theunpacking relocates the text and data sections changing allthe data values in each section erefore the restorationaccuracy depends on whether there is a reloc section in theoriginal file
Table 8 shows the restoration results in detail accordingto the presence of the reloc section As shown in the tablethe restoration accuracy is 100 in the absence of the relocsection while it is very low in the range of 000 to 998 inthe presence of the reloc section However the low resto-ration accuracy is due to the nature of the packer and it doesnot mean that unpacking does not work correctly In otherwords even if the unpacking is successful the data valuemay change due to the packer characteristic and the res-toration accuracy is calculated to be low In summary wecan use the proposed verification algorithm to calculate therestoration accuracy of unpacked files which can quanti-tatively measure the accuracy of the actual restoration
In addition to text data and reloc sections we furtherconsider the PE header which includes the IAT informationIn general the IAT is located in the header of the PE file andif we actually need to run the unpacked file we must alsorestore this header information including the IAT In factthe study of [3 43] restores the header including the IATandproduces the actual whole executable file On the other handmost studies including this paper do not restore the actualwhole executable file itself but restore text data and relocsections which are major parts of the executable file is isbecause the purpose of unpacking is not to execute therestored file but to identify and extract malicious codesincluded in the restored file erefore in this paper we donot need to restore the header required for the actual ex-ecution and accordingly comparison of IAT information isalso not necessary Instead we restore and compare textdata and reloc sections that are more likely to hidemalicious code Restoring the PE header is beyond the scopeof this study we leave it as a separate research topic
6 Conclusions
In this paper we analyzed recent studies of packing de-tection and unpacking techniques and derived four im-portant observations no phase integration no detectioncombination no real-restoration and no unpacking verifi-cation We then proposed an all-in-one unpacking system toimprove these four observations First no phase integrationwas a difficulty in which analysts had to perform packingdetection unpacking and verification phases separately andwe solved this problem by presenting an all-in-one
Table 6 Comparison of packing detection accuracy of two or three combinations and of the hybrid approach
Metrics DP-SIG ENT () DP-ENT WR () DP-SIG ENT WR () Hybrid approach ()Detection accuracy 950 920 984 984False-negative rate 500 800 160 160False-positive rate 073 000 000 000
Security and Communication Networks 13
Table 7: Restoration accuracy of .text and .data sections of each packer.

Packer             .text (%)   .data (%)
UPX                100         100
NSPack             100         100
MEW                100         100
RLPack             100         100
BeRoEXE            100         100
Neolite            100         100
FSG                100         100
eXpressor          100         100
Molebox            100         100
Petite             100         100
JDpack             100         100
ASPack             84.3        84.3
Yoda's Protector   84.0        84.0
exe32pack          80.8        100
MPRESS             78.8        75.3
Yoda's Crypter     74.5        74.5
PECompact          74.0        74.0
WinUpack           68.0        68.0
Packman            55.0        48.5
Average            89.2        90.2
unpacking system that integrates these three phases in a single framework. Second, no detection combination refers to the lack of any attempt to combine various packing detection methods, and we solved this problem by presenting a hybrid approach that combines existing and new packing detection methods. Third, no real-restoration refers to the lack of discussion about restoring the actual executable file, and we solved this problem by actually restoring an original image of the packed file. Fourth, no unpacking verification refers to the lack of explicit work on measuring the restoration accuracy of unpacked files, and we solved this problem by proposing a verification algorithm that quantitatively evaluates the restoration accuracy.
We constructed a dataset of 2,600 PE files and experimentally evaluated the proposed system on it. First, we performed all the phases of the all-in-one unpacking system sequentially and confirmed that the overall mechanism worked well. Next, by comparing the hybrid packing detection approach with existing detection techniques, we showed that the detection accuracy of the proposed method was the highest, at 98.4% on average, with no false positives. For this, we also determined an entropy range of packing, Packing-Range, through extensive experiments on 19 representative packers. Finally, in the unpacking and verification phases, we confirmed that unpacking was successful for all files except those packed with Yoda's Protector. We also showed that the restoration accuracy of unpacked files was nearly 100% except for a few packers.
As future research, we will focus on unpacking files packed with Anti-VM, Anti-Debug, and Anti-Reverse engineering techniques. We will also study how to compute the restoration accuracy of files having .reloc sections.
Data Availability
All the data files used in the experiments are available at https://github.com/chesvectain/PackingData.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partly supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (no. 2017-0-00158, Development of Cyber Threat Intelligence (CTI) analysis and information sharing technology for national cyber incident response) and the Korea Electric Power Corporation (no. R18XA05).
References
[1] M. Morgenstern, "AV-TEST Security report 2017/18," 2019, https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2017-2018.pdf.
[2] T. Ban, R. Isawa, S. Guo, D. Inoue, and K. Nakao, "Efficient malware packer identification using support vector machines with spectrum kernel," in Proceedings of the Eighth Asia Joint Conference on Information Security, pp. 69–76, Seoul, South Korea, July 2013.
[3] B. Cheng, M. Jiang, J. Fu et al., "Towards paving the way for large-scale Windows malware analysis: generic binary unpacking with orders-of-magnitude performance boost," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 395–411, Toronto, Canada, October 2018.
[4] A. Gupta and M. S. Arya, "Hashing based encryption and anti-debugger support for packing multiple files into single executable," International Journal of Advanced Research in Computer Science, vol. 9, no. 1, pp. 914–920, 2018.
[5] Y. Kawakoya, M. Iwamura, and M. Itoh, "Memory behavior-based automatic malware unpacking in stealth debugging environment," in Proceedings of the 5th International Conference on Malicious and Unwanted Software, pp. 39–46, Lorraine, France, October 2010.
[6] G. Liang, J. Pang, Z. Shan, R. Yang, and Y. Chen, "Automatic benchmark generation framework for malware detection," Security and Communication Networks, vol. 2018, Article ID 4947695, 8 pages, 2018.
[7] A. Pektas and T. Acarman, "A dynamic malware analyzer against virtual machine aware malicious software," Security and Communication Networks, vol. 7, no. 12, pp. 2245–2257, 2014.
[8] I. Santos, J. Nieves, and P. G. Bringas, "Semi-supervised learning for unknown malware detection," in International Symposium on Distributed Computing and Artificial Intelligence, A. Abraham, J. M. Corchado, S. R. Gonzalez, and J. F. De Paz Santana, Eds., pp. 415–422, Springer, Berlin, Germany, 2011.
[9] X. Ugarte-Pedrero, I. Santos, B. Sanz, C. Laorden, and P. G. Bringas, "Countering entropy measure attacks on packed software detection," in Proceedings of the 9th International Conference on Consumer Communications and Networking, pp. 164–168, Las Vegas, NV, USA, January 2012.
[10] H. Won, S.-P. Kim, S. Lee, M.-J. Choi, and Y.-S. Moon, "Secure principal component analysis in multiple distributed nodes," Security and Communication Networks, vol. 9, no. 14, pp. 2348–2358, 2016.
[11] S. Cesare, Y. Xiang, and W. Zhou, "Malwise–an effective and efficient classification system for packed and polymorphic malware," IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193–1206, 2013.

Table 8: Restoration accuracy according to the presence of the .reloc section.

                   .text (%)                      .data (%)
Packer             With .reloc   Without .reloc   With .reloc   Without .reloc
ASPack             0.00          100              0.00          100
Packman            99.8          100              88.1          100
MPRESS             0.00          100              0.00          100
Yoda's Crypter     0.00          100              0.04          100
PECompact          0.00          100              0.00          100
Yoda's Protector   0.00          100              0.05          100
exe32pack          0.00          100              100           100
WinUpack           0.00          100              0.00          100

[12] C. Malin, E. Casey, and J. Aquilina, Malware Forensics: Investigating and Analyzing Malicious Code, Syngress, Burlington, MA, USA, 2008.
[13] W. Yan, Z. Zheng, and A. Nirwan, "Revealing packed malware," IEEE Security and Privacy, vol. 6, no. 5, pp. 65–69, 2008.
[14] M. Bat-Erdene, T. Kim, H. Park, and H. Lee, "Packer detection for multi-layer executables using entropy analysis," Entropy, vol. 19, no. 3, p. 125, 2017.
[15] Y.-S. Choi, I.-K. Kim, J.-T. Oh, and J.-C. Ryou, "PE file header analysis-based packed PE file detection technique (PHAD)," in Proceedings of International Symposium on Computer Science and Its Applications, pp. 28–31, Hobart, Australia, October 2008.
[16] S. Han and S. Lee, "Packed PE file detection for malware forensics," The KIPS Transactions: Part C, vol. 16C, no. 5, pp. 555–562, 2009, in Korean.
[17] R. Lyda and J. Hamrock, "Using entropy analysis to find encrypted and packed malware," IEEE Security and Privacy, vol. 5, no. 2, pp. 40–45, 2007.
[18] R. Perdisci, A. Lanzi, and W. Lee, "Classification of packed executables for accurate computer virus detection," Pattern Recognition Letters, vol. 29, no. 14, pp. 1941–1946, 2008.
[19] A. Rohit, A. Singh, H. Pareek, and U. Edara, "A heuristics-based static analysis approach for detecting packed PE binaries," International Journal of Security and Its Applications, vol. 7, no. 5, pp. 257–268, 2013.
[20] G. Amato, "PEframe," 2019, https://github.com/guelfoweb/peframe.
[21] G. Jeong, E. Choo, J. Lee, M. Bat-Erdene, and H. Lee, "Generic unpacking using entropy analysis," Journal of Advanced Information Technology and Convergence, vol. 7, no. 1, pp. 232–238, 2009.
[22] V. DeMarines, "Obfuscation–how to do it and how to crack it," Network Security, vol. 2008, no. 7, pp. 4–7, 2008.
[23] H. Neil, "PEiD," 2019, https://github.com/wolfram77web/app-peid.
[24] O. Yuschuk, "OllyDbg," 2019, http://www.ollydbg.de/download.html.
[25] N. Waisman, "Immunity Debugger," 2019, https://www.immunityinc.com/products/debugger/index.html.
[26] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee, "PolyUnpack: Automating the hidden-code extraction of unpack-executing malware," in Proceedings of the 22nd Annual Computer Security Applications Conference, pp. 289–300, Miami Beach, FL, USA, December 2006.
[27] M. Bat-Erdene, Y. Park, H. Li, H. Lee, and M.-S. Choi, "Entropy analysis to classify unknown packing algorithms for malware detection," International Journal of Information Security, vol. 16, no. 3, pp. 227–248, 2017.
[28] S. Cesare and X. Yang, "Classification of malware using structured control flow," in Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing, pp. 61–70, Brisbane, Australia, January 2010.
[29] L. Martignoni, M. Christodorescu, and S. Jha, "OmniUnpack: fast, generic, and safe unpacking of malware," in Proceedings of the 23rd Annual Computer Security Applications Conference, pp. 431–441, Miami Beach, FL, USA, December 2007.
[30] M. G. Kang, P. Poosankam, and H. Yin, "Renovo: a hidden code extractor for packed executables," in Proceedings of the 5th ACM Workshop on Recurring Malcode, pp. 46–53, Alexandria, VA, USA, November 2007.
[31] S. Mariani, L. Fontana, F. Gritti, and S. D'Alessio, "PinDemonium: a DBI-based generic unpacker for windows executables," in Proceedings of the Black Hat 2016, Las Vegas, NV, USA, July 2016.
[32] G. Bonfante, J. Fernandez, J.-Y. Marion, B. Rouxel, F. Sabatier, and A. Thierry, "CoDisasm: medium scale concatic disassembly of self-modifying binaries with overlapping instructions," in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 745–756, Denver, CO, USA, October 2016.
[33] M. Oberhumer, L. Molnar, and J. Reiser, "UPX," 2019, https://upx.github.io.
[34] A. Swinnen and A. Mesbahi, "One packer to rule them all: empirical identification, comparison, and circumvention of current antivirus detection techniques," in Proceedings of the Black Hat 2014, Las Vegas, NV, USA, August 2014.
[35] P. Bania, "Generic unpacking of self-modifying, aggressive, packed binary programs," 2009, https://arxiv.org/abs/0905.4581.
[36] H. Neil, "BobSoft Signature," 2019, http://woodmann.com/BobSoft/Files/Other/UserDB.zip.
[37] M. Russinovich, "Microsoft Sysinternals Suite," 2019, https://download.sysinternals.com/files/SysinternalsSuite.zip.
[38] R. Rivest, "The MD5 message-digest algorithm," RFC 1321, IETF, Fremont, CA, USA, 1992.
[39] S. Tatham, "Putty," 2019, https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html.
[40] L. Pascoe, "MD5," 2019, http://www.md5summer.org/about.html.
[41] P. Ferrie, "Anti-unpacker tricks - part one," 2019, https://www.virusbulletin.com/virusbulletin/2008/12/anti-unpacker-tricks-part-one.
[42] T.-Y. Wang and C.-H. Wu, "Detection of packed executables using support vector machines," in Proceedings of the 10th International Conference on Machine Learning and Cybernetics, pp. 717–722, Guilin, China, July 2011.
[43] D. Korczynski, "RePEconstruct: reconstructing binaries with self-modifying code and Import address table destruction," in Proceedings of the 11th International Conference on Malicious and Unwanted Software, pp. 31–38, Fajardo, Puerto Rico, October 2016.
Figure 5: Entropy distributions of the EP section for original and representative packed files (x-axis: entropy of EP section, 0 to 8; y-axis: number of files, 0 to 140). (a) Average of original and 19 packers. (b) Original (not packed). (c) UPX. (d) ASPack. (e) NSPack. (f) MPRESS. (g) Yoda's Protector. (h) RLPack. (i) BeroEXE. (j) MEW. (k) PACKMAN. (l) WinUpack. (m) exe32pack.
are mandatory in all PE files, we can detect EP sections unless they are intentionally concealed. All files of the dataset used in the experiment also have EP sections, so the detection rate of DP-NEP is trivially 0%. In other words, no file is detected as packed because of an absent EP section, so we exclude the experimental results of DP-NEP from Table 5.
We now explain the results of Table 5 column by column. First, we explain the DP-SIG column. For Not-Packed files, DP-SIG shows a detection rate of 0% since no packer signature exists in normal files; that is, DP-SIG never flags a normal file as packed, which means that it classifies Not-Packed files accurately. On the other hand, for the packed files, DP-SIG detects 100% of the files for all packers except NSPack. The detection rate for NSPack is merely 5.38% because NSPack signatures often reside in sections other than the EP section. In summary, DP-SIG shows a detection accuracy of 95.0% on average.
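The NSPack behavior described above can be illustrated with a minimal sketch of wildcard signature matching. The byte pattern, the helper names, and the sample bytes below are hypothetical and chosen only for illustration; they are not taken from the signature database or the implementation used in this paper. The sketch shows why a test restricted to the start of the EP section can miss a signature that a whole-section scan would find.

```python
from typing import List, Optional

def parse_signature(pattern: str) -> List[Optional[int]]:
    """Turn 'AA ?? BB' into [0xAA, None, 0xBB]; None marks a wildcard byte."""
    return [None if tok == "??" else int(tok, 16) for tok in pattern.split()]

def matches_at(data: bytes, offset: int, sig: List[Optional[int]]) -> bool:
    """Return True if the signature matches data starting exactly at offset."""
    if offset < 0 or offset + len(sig) > len(data):
        return False
    return all(b is None or data[offset + i] == b for i, b in enumerate(sig))

def find_in_section(data: bytes, sig: List[Optional[int]]) -> int:
    """Scan the whole buffer; return the first matching offset, or -1."""
    for off in range(len(data) - len(sig) + 1):
        if matches_at(data, off, sig):
            return off
    return -1

# Hypothetical signature with four wildcard bytes (illustration only).
SIG = parse_signature("60 BE ?? ?? ?? ?? 8D BE")

# Hypothetical section contents: two filler bytes, then the pattern.
section = bytes.fromhex("9090" "60BE00104000" "8DBE00F0FFFF")

at_start = matches_at(section, 0, SIG)   # test only at the first byte
anywhere = find_in_section(section, SIG) # scan the whole buffer
print(at_start, anywhere)
```

In this toy input the fixed-position test fails while the scan succeeds, which mirrors the situation in which a signature exists in the file but not where the detector looks for it.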
Second, for the experiment on the detection rate of DP-WR, we intentionally remove the WRITE attribute from 39 files (30% of the packed files) per packer. We perform this removal for the 18 packers other than exe32pack. We change the dataset in this way because packed files normally have the WRITE attribute, so we arbitrarily remove it from 30% of the files for the DP-WR experiment. Looking at the DP-WR column in Table 5, DP-WR detects only 70% of the files per packer for these 18 packers since we intentionally removed the WRITE attribute from the other 30%. For exe32pack, however, we need not perform the removal since its files have no WRITE attribute in the PE header. Accordingly, as shown in the DP-WR column of Table 5, the detection rate of exe32pack by DP-WR is trivially 0%. For Not-Packed files, the detection rate is 0% since normal files have no WRITE attribute. In summary, checking for the presence of the WRITE attribute is very simple, and DP-WR works correctly if the packed file has the WRITE attribute, but it does not work if the WRITE attribute is absent or has been maliciously modified.
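As a minimal sketch of the WRITE-attribute test, the following code parses one 40-byte PE section header and checks the IMAGE_SCN_MEM_WRITE bit of its Characteristics field. The header layout and the flag value follow the PE/COFF format; the function names and the two sample headers are assumptions made for illustration, and the sketch does not reproduce how DP-WR locates the EP section.

```python
import struct

IMAGE_SCN_MEM_WRITE = 0x80000000  # "section can be written to" flag in PE/COFF

# IMAGE_SECTION_HEADER: Name[8], VirtualSize, VirtualAddress, SizeOfRawData,
# PointerToRawData, PointerToRelocations, PointerToLinenumbers (6 x uint32),
# NumberOfRelocations, NumberOfLinenumbers (2 x uint16), Characteristics (uint32).
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")  # 40 bytes, little-endian

def parse_section_header(raw: bytes):
    """Return (section name, characteristics) from one raw section header."""
    fields = SECTION_HEADER.unpack(raw[:SECTION_HEADER.size])
    name = fields[0].rstrip(b"\0").decode("ascii", "replace")
    return name, fields[-1]

def has_write_attribute(characteristics: int) -> bool:
    """True if the section is marked writable."""
    return bool(characteristics & IMAGE_SCN_MEM_WRITE)

def make_header(name: bytes, characteristics: int) -> bytes:
    """Build a synthetic header for the demo (all other fields zero)."""
    return SECTION_HEADER.pack(name, 0, 0, 0, 0, 0, 0, 0, 0, characteristics)

# Hypothetical examples: a read/execute code section and a writable one.
plain = make_header(b".text", 0x60000020)   # CODE | EXECUTE | READ
packed = make_header(b"UPX0", 0xE0000080)   # UNINIT_DATA | EXECUTE | READ | WRITE

for raw in (plain, packed):
    name, ch = parse_section_header(raw)
    print(name, has_write_attribute(ch))
```

Because the test reads a single header bit, clearing that bit (as done for 30% of the files in the experiment) is enough to defeat it, which is why the text treats DP-WR as one input to the hybrid approach rather than a stand-alone detector.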
Third, the DP-ENT column shows the entropy-based detection rates. As shown in the table, DP-ENT achieves a 100% detection rate for 9 packers, including NSPack, UPX, MEW, and BeRoEXEPacker, but it incurs some false negatives for 10 packers, including ASPack, Yoda's Protector, Packman, and MPRESS, for which the detection rate is lower than 100%. In particular, for MPRESS, Petite, JDpack, eXpressor, and Neolite, it shows a low detection rate of at most 24.6% because the entropy values characteristic of these packers fall within the entropy distribution of normal files. DP-ENT also has a false-positive rate of 3.84% for Not-Packed files. In summary, DP-ENT works very accurately for most packers, but it may incur false negatives for a few specific packers or false positives for some normal files. To overcome this limitation, we use DP-ENT together with the other detection techniques. The improvement can be seen in the experimental results of Tables 5 and 6: when only DP-ENT is used, the detection accuracy is low for some packers (especially MPRESS), but when it is used together with the other three techniques, the accuracy is close to 100%.
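For reference, the byte-level Shannon entropy on which an entropy test of this kind relies can be sketched as follows. The result lies between 0 and 8 bits per byte, which matches the x-axis of Figure 5. The range check uses placeholder bounds passed in by the caller, because the measured Packing-Range is not restated in this section; the function names are assumptions for illustration.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

def in_range(entropy: float, low: float, high: float) -> bool:
    """True if the entropy falls inside the closed interval [low, high]."""
    return low <= entropy <= high

constant = b"\x00" * 100        # a single repeated value: no uncertainty
two_values = b"\x00\x01" * 50   # two equally likely values: one bit per byte
uniform = bytes(range(256))     # all 256 values once: the maximum, 8 bits

for sample in (constant, two_values, uniform):
    print(round(shannon_entropy(sample), 3))
```

Compressed or encrypted section contents tend toward the high end of this scale, while packers whose EP sections resemble ordinary code yield values that overlap with normal files, which is the source of the false negatives discussed above.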
Fourth, the proposed hybrid approach, which integrates the four techniques, detects 100% of the files for all packers except NSPack and shows no false positives. The reason for this high detection rate is that the proposed method has all the advantages of DP-SIG, DP-WR, and DP-ENT as well as DP-NEP. For NSPack, however, the detection rate is merely 70.0%; this is because, as mentioned above, we forcibly removed the WRITE attribute from 30% of the files. According to the results of Table 5, the detection rate differs from technique to technique, and some techniques
Table 5: Comparison of packing detection rates of three techniques and the proposed hybrid approach.

Packer             DP-SIG (%)   DP-WR (%)   DP-ENT (%)   Hybrid approach (%)
ASPack             100          70.0        91.5         100
NSPack             5.38         70.0        100          70.0
MPRESS             100          70.0        24.6         100
UPX                100          70.0        100          100
Yoda's Protector   100          70.0        96.9         100
MEW                100          70.0        100          100
BeRoEXEPacker      100          70.0        100          100
Packman            100          70.0        93.8         100
RLPack             100          70.0        99.3         100
PECompact          100          70.0        100          100
Petite             100          70.0        0.00         100
JDpack             100          70.0        5.62         100
Molebox            100          70.0        100          100
eXpressor          100          70.0        0.00         100
Yoda's Crypter     100          70.0        91.5         100
FSG                100          70.0        100          100
exe32pack          100          0.00        100          100
WinUpack           100          70.0        100          100
Neolite            100          70.0        1.61         100
Average            95.0         66.3        73.9         98.4
Not-Packed         0.00         0.00        3.84         0.00
incur false positives or false negatives; on the other hand, the proposed hybrid approach shows the highest detection rate, 98.4%, and no false positives.
Table 6 shows the detection accuracy of combinations of two or three detection techniques and of the proposed hybrid approach. First, DP-SIG + ENT, a combination of the signature and entropy tests, detects 100% of the packed files but classifies 18 files incorrectly, showing a false-positive rate of 0.73%. Second, DP-ENT + WR, a combination of the entropy and WRITE tests, shows a low detection accuracy of 92.0% because the entropy values of MPRESS fall within the distribution of normal files. Third, DP-SIG + ENT + WR, a combination of the signature, entropy, and WRITE tests, and our hybrid approach both show a detection accuracy of 98.4% and a false-negative rate of 1.60%. This is because DP-SIG + ENT + WR and the hybrid approach go through the same detection procedure except for DP-NEP. The false-negative rate of 1.60% is due to the files having no WRITE attribute. According to the results of Tables 5 and 6, we believe that the proposed hybrid approach achieves high detection accuracy by combining the advantages of the existing and new detection techniques.
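The reported averages can be cross-checked with a few lines of arithmetic. The sketch below assumes that each average in Table 5 is the unweighted mean over the 19 packers and that the table entries are percentages with decimal points (for example, 5.38 for NSPack under DP-SIG and 70.0 under DP-WR); both are assumptions about how the table should be read, not statements made by the paper.

```python
def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)

# Per-packer detection rates (%) read from Table 5 under the stated assumptions.
dp_sig = [100.0] * 18 + [5.38]   # 18 packers at 100%, NSPack at 5.38%
dp_wr = [70.0] * 18 + [0.0]      # 18 packers at 70%, exe32pack at 0%
hybrid = [100.0] * 18 + [70.0]   # 18 packers at 100%, NSPack at 70%

avg_sig = round(mean(dp_sig), 1)
avg_wr = round(mean(dp_wr), 1)
avg_hybrid = round(mean(hybrid), 1)
fn_hybrid = round(100.0 - mean(hybrid), 1)  # share of packed files missed

print(avg_sig, avg_wr, avg_hybrid, fn_hybrid)
```

Under these assumptions the computed means agree with the 95.0%, 66.3%, and 98.4% averages, and the complement of the hybrid average agrees with the 1.60% false-negative rate in Table 6.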
5.4. Experimental Results on Unpacking and Verification Phases. In this section, we evaluate the unpacking and verification phases by performing the unpacking of Algorithm 2 and calculating the restoration accuracy of the files through the actual restoration of Algorithm 3. Because the unpacking and verification phases take a lot of time, we experiment with a randomly selected 40% of the files in the dataset. The unpacking phase consists of static unpacking and dynamic unpacking. In the experiment, we exclude static unpacking since it uses an existing library as it is, and we focus only on the dynamic unpacking of Algorithm 2 since it tries to restore the original file without any information about the packer.
Table 7 shows the restoration accuracy of the proposed method for the .text and .data sections for each of the 19 packers. We cannot directly unpack Yoda's Protector and PECompact since they use Anti-Debug techniques. To overcome this problem, we unpack these packers by bypassing three Anti-Debugging API calls: GetCurrentProcessId(), BlockInput(), and IsDebuggerPresent() [41]. That is, we run Algorithm 2 with the bypassing techniques for Yoda's Protector and PECompact and include the experimental results in Table 7. In general, since only the .text and .data sections of the original file are packed [42], in the experiment we consider only the restoration accuracy of the .text and .data sections of the restored file. In Table 7, among the 19 packers, 11 packers, including UPX, NSPack, MEW, RLPack, and BeRoEXEPacker, show 100% restoration accuracy for both the .text and .data sections. On the other hand, 8 packers, including ASPack, Packman, and MPRESS, show less than 100% restoration accuracy because of the manner in which the packers handle the .reloc section. More specifically, if there is a .reloc section in the original file, the unpacking relocates the .text and .data sections, changing all the data values in each section. Therefore, the restoration accuracy depends on whether there is a .reloc section in the original file.
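The effect of relocation on a section comparison can be sketched as follows. This is not Algorithm 3: the byte-wise agreement measure, the function names, the fixup offsets, and the base-address delta are all assumptions made for illustration. The sketch only shows that rewriting addresses at relocation fixups makes a restored section differ from the original even when the unpacked code is functionally the same.

```python
import struct

def restoration_accuracy(original: bytes, restored: bytes) -> float:
    """Percentage of byte positions that agree (assumed measure, not Algorithm 3).

    Positions present in only one of the two buffers count as disagreements.
    """
    total = max(len(original), len(restored))
    if total == 0:
        return 100.0
    same = sum(a == b for a, b in zip(original, restored))
    return 100.0 * same / total

def apply_relocations(section: bytes, fixup_offsets, delta: int) -> bytes:
    """Add delta to each 32-bit little-endian address stored at a fixup offset."""
    out = bytearray(section)
    for off in fixup_offsets:
        (addr,) = struct.unpack_from("<I", out, off)
        struct.pack_into("<I", out, off, (addr + delta) & 0xFFFFFFFF)
    return bytes(out)

# Hypothetical 16-byte section: two absolute addresses and two filler words.
original = struct.pack("<IIII", 0x00401000, 0x90909090, 0x00402000, 0x90909090)

# The image is loaded 0x10000 above its preferred base, so both addresses move.
rebased = apply_relocations(original, fixup_offsets=[0, 8], delta=0x00010000)

print(restoration_accuracy(original, original))  # identical sections
print(restoration_accuracy(original, rebased))   # after relocation fixups
```

In this toy case only the bytes that encode the two addresses change, so the score drops slightly; a section dense with fixups, or a comparison made at a coarser granularity than single bytes, would lose far more.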
Table 8 shows the restoration results in detail according to the presence of the .reloc section. As shown in the table, the restoration accuracy is 100% in the absence of the .reloc section, while in its presence it ranges from 0.00% to 99.8% and is very low for most packers. However, the low restoration accuracy is due to the nature of the packer, and it does not mean that unpacking works incorrectly. In other words, even if the unpacking is successful, the data values may change because of the packer's characteristics, and the restoration accuracy is then calculated to be low. In summary, we can use the proposed verification algorithm to calculate the restoration accuracy of unpacked files, which quantitatively measures the accuracy of the actual restoration.
are mandatory in all PE files, we can detect EP sections unless they are intentionally concealed. All files of the dataset used in the experiment also have EP sections, and the detection rate by DP-NEP is trivially 0%. In other words, no file is detected as packed owing to the absence of the EP section, so we exclude the experimental result of DP-NEP from Table 5.
We now explain the results of Table 5 column by column. First, we explain the DP-SIG column. For DP-SIG, Not-Packed shows a detection rate of 0% since no packer signature exists in normal files; that is, DP-SIG raises no false positives for Not-Packed files. On the other hand, for the packed files, DP-SIG detects 100% of the files of every packer except NSPack. The detection rate of NSPack is merely 5.38% because NSPack signatures often exist in sections other than the EP section. In summary, DP-SIG shows a detection accuracy of 95.0% on average.
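The NSPack result suggests why the scan location matters for a signature test. The following is a minimal sketch of wildcard signature matching at a fixed offset, assuming a PEiD-style textual pattern in which `??` matches any byte. The pattern, the byte strings, and the function names are hypothetical and are not taken from the signature databases used in the experiments.

```python
from typing import List, Optional

def parse_signature(pattern: str) -> List[Optional[int]]:
    """Turn 'AA ?? BB' into [0xAA, None, 0xBB]; None stands for a wildcard byte."""
    return [None if tok == "??" else int(tok, 16) for tok in pattern.split()]

def matches_at(data: bytes, offset: int, signature: List[Optional[int]]) -> bool:
    """Return True if the signature matches the bytes starting at the given offset."""
    window = data[offset:offset + len(signature)]
    if len(window) < len(signature):
        return False  # not enough bytes left to hold the whole signature
    return all(s is None or s == b for s, b in zip(signature, window))

def matches_anywhere(data: bytes, signature: List[Optional[int]]) -> bool:
    """Slower fallback: try every offset instead of only the entry point."""
    return any(matches_at(data, i, signature) for i in range(len(data)))

# Illustrative pattern only (an assumption, not a database entry).
DEMO_SIGNATURE = parse_signature("60 BE ?? ?? ?? ?? 8D BE")

# Hypothetical section bytes: two NOPs followed by the pattern at offset 2.
section = bytes.fromhex("909060BE001040008DBE")

print(matches_at(section, 2, DEMO_SIGNATURE))       # True: signature sits at the checked offset
print(matches_at(section, 0, DEMO_SIGNATURE))       # False: checked offset does not hold it
print(matches_anywhere(section, DEMO_SIGNATURE))    # True: found when every offset is tried
```

A test that looks only at the entry point misses a signature stored elsewhere, which is in line with the low DP-SIG rate reported for NSPack.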
Second, for the experiment on the detection rate of DP-WR, we intentionally remove the WRITE attribute from 39 files (30% of packed files) per packer. We perform this removal process for 18 packers, excluding exe32pack. The reason for changing the dataset is that normally packed files usually have WRITE attributes, so we arbitrarily remove the WRITE attribute from 30% of the files for the DP-WR experiment. Looking at the DP-WR column in Table 5, DP-WR detects only 70% of the files per packer for the 18 packers since we intentionally removed the WRITE attribute from 30% of the files. In the case of exe32pack, however, we need not perform the removal process since it has no WRITE attribute in the PE header. Accordingly, as shown in the DP-WR column of Table 5, the detection rate of exe32pack by DP-WR is trivially 0%. In the case of Not-Packed, the detection rate is 0% since normal files have no WRITE attribute. In summary, checking the presence of the WRITE attribute is very simple, and DP-WR works correctly if the packed file has the WRITE attribute, but it does not work well if the WRITE attribute is absent or has been maliciously modified.
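The WRITE test itself amounts to a single bit check on a section's characteristics field. The sketch below uses the standard PE section flag values; the helper names and the example characteristics values are assumptions for illustration, and reading the field out of a real PE header is left out.

```python
# Standard PE section characteristic flags.
IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

def has_write_attribute(characteristics: int) -> bool:
    """DP-WR style test: is the WRITE bit set in the section characteristics?"""
    return bool(characteristics & IMAGE_SCN_MEM_WRITE)

def clear_write_attribute(characteristics: int) -> int:
    """Mimic the dataset modification: drop the WRITE bit, keep all other bits."""
    return characteristics & ~IMAGE_SCN_MEM_WRITE & 0xFFFFFFFF

# Hypothetical values: a code section that is executable, readable, and writable,
# and an ordinary code section that is only executable and readable.
packed_like = IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE
normal_like = IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ

print(has_write_attribute(packed_like))                         # True
print(has_write_attribute(normal_like))                         # False
print(has_write_attribute(clear_write_attribute(packed_like)))  # False
```

Because one cleared bit is enough to defeat the test, the check is cheap but easy to evade, which matches the 70% rates in the DP-WR column.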
Third, the DP-ENT column shows the entropy-based detection rates. As shown in the table, DP-ENT achieves a 100% detection rate for 9 packers, including NSPack, UPX, MEW, and BeRoEXEPacker, but it incurs some false negatives for 10 packers, including ASPack, Yoda’s Protector, Packman, and MPRESS, whose detection rates are lower than 100%. In particular, for MPRESS, Petite, JDpack, eXpressor, and Neolite, it shows a low detection rate of 24.6% or less because the entropy values characteristic of these packers fall within the entropy distribution of normal files. DP-ENT also has a false-positive rate of 3.84% for Not-Packed. In summary, DP-ENT works very accurately for most packers, but it may incur false negatives for a few specific packers or false positives for some normal files. To overcome this limitation, we use DP-ENT together with other detection techniques. We can see the improvement in the experimental results of Tables 5 and 6. In the tables, when using only DP-ENT, the detection accuracy is low for some packers (especially MPRESS), but when using it together with the other three techniques, the accuracy is close to 100%.
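To make the entropy test concrete, the following sketch computes the byte-level Shannon entropy of a section and checks it against a range. The bounds in `PACKING_RANGE` are placeholder values chosen for illustration only; they are not the Packing-Range determined in this paper, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Tuple

def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    total = len(data)
    # Sum p * log2(1/p) over the observed byte values.
    return sum((c / total) * math.log2(total / c) for c in Counter(data).values())

# Placeholder bounds (assumption); the real range must come from experiments.
PACKING_RANGE: Tuple[float, float] = (6.85, 8.0)

def in_packing_range(data: bytes, packing_range: Tuple[float, float] = PACKING_RANGE) -> bool:
    """DP-ENT style test: does the section entropy fall inside the range?"""
    low, high = packing_range
    return low <= shannon_entropy(data) <= high

uniform = bytes(range(256)) * 4   # every byte value equally likely
constant = b"\x00" * 1024         # a single repeated byte value

print(shannon_entropy(uniform))    # 8.0
print(shannon_entropy(constant))   # 0.0
print(in_packing_range(uniform), in_packing_range(constant))  # True False
```

A packer whose output entropy lies below the lower bound is missed by such a test, and a normal file with compressed resources may exceed it, which corresponds to the false negatives and false positives reported for DP-ENT.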
Fourth, the proposed hybrid approach, which integrates the four techniques, detects 100% of the files of all packers except NSPack and shows no false positives. The reason for this high detection rate is that the proposed method has all the advantages of DP-SIG, DP-WR, and DP-ENT as well as DP-NEP. In the case of NSPack, however, the detection rate is merely 70.0%; this is because, as mentioned above, we forcibly removed the WRITE attribute from 30% of the files. According to the results of Table 5, the detection rate differs by detection technique, and some techniques incur false positives or false negatives; on the other hand, the proposed hybrid approach shows the highest detection rate of 98.4% and no false positives.

Table 5: Comparison of packing detection rates of three techniques and the proposed hybrid approach.

Packer             DP-SIG (%)   DP-WR (%)   DP-ENT (%)   Hybrid approach (%)
ASPack             100          70.0        91.5         100
NSPack             5.38         70.0        100          70.0
MPRESS             100          70.0        24.6         100
UPX                100          70.0        100          100
Yoda’s Protector   100          70.0        96.9         100
MEW                100          70.0        100          100
BeRoEXEPacker      100          70.0        100          100
Packman            100          70.0        93.8         100
RLPack             100          70.0        99.3         100
PECompact          100          70.0        100          100
Petite             100          70.0        0.00         100
JDpack             100          70.0        5.62         100
Molebox            100          70.0        100          100
eXpressor          100          70.0        0.00         100
Yoda’s Crypter     100          70.0        91.5         100
FSG                100          70.0        100          100
exe32pack          100          0.00        100          100
WinUpack           100          70.0        100          100
Neolite            100          70.0        1.61         100
Average            95.0         66.3        73.9         98.4
Not-Packed         0.00         0.00        3.84         0.00
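As a quick arithmetic check, the Average row of Table 5 can be reproduced for the DP-WR and hybrid columns, assuming the average is the unweighted mean over the 19 packers. This reading is an assumption, although it agrees with both columns.

```python
def column_average(rates):
    """Unweighted mean of per-packer detection rates (in percent)."""
    return sum(rates) / len(rates)

# DP-WR: 70.0% for 18 packers and 0% for exe32pack.
dp_wr = [70.0] * 18 + [0.0]
# Hybrid approach: 100% for 18 packers and 70.0% for NSPack.
hybrid = [100.0] * 18 + [70.0]

print(round(column_average(dp_wr), 1))   # 66.3
print(round(column_average(hybrid), 1))  # 98.4
```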
Table 6 shows the detection accuracy of combinations of two or three detection techniques and of the proposed hybrid approach. First, DP-SIG + ENT, a combination of signature and entropy tests, detects 100% of the packed files but detects 18 files incorrectly, showing a false-positive rate of 0.73%. Second, DP-ENT + WR, a combination of entropy and WRITE tests, shows a low detection rate of 92.0% because the entropy value of MPRESS lies within the distribution of normal files. Third, DP-SIG + ENT + WR, a combination of signature, entropy, and WRITE tests, and our hybrid approach both show 98.4% detection accuracy and a 1.60% false-negative rate. This is because DP-SIG + ENT + WR and the hybrid approach go through the same detection procedure except for DP-NEP. The false-negative rate of 1.60% is due to the files having no WRITE attribute. According to the results of Tables 5 and 6, we believe that the proposed hybrid approach achieves high detection accuracy by combining the advantages of the existing and new detection techniques.
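One way to read these results is as a short decision cascade. The sketch below is an assumption inferred from the reported rates, not a reproduction of the detection algorithm of this paper: a missing EP section or a signature hit decides immediately, a missing WRITE attribute then rules the file out, and the entropy test decides the rest. All names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DetectionEvidence:
    has_ep_section: bool            # input to the DP-NEP test
    signature_hit: bool             # result of the DP-SIG test
    has_write_attribute: bool       # result of the DP-WR test
    entropy_in_packing_range: bool  # result of the DP-ENT test

def hybrid_is_packed(e: DetectionEvidence) -> bool:
    """Assumed ordering of the four tests; see the hedges in the text above."""
    if not e.has_ep_section:
        return True    # no EP section is treated as packed
    if e.signature_hit:
        return True    # a known packer signature is decisive
    if not e.has_write_attribute:
        return False   # without WRITE, the entropy test is never reached
    return e.entropy_in_packing_range

# NSPack-like file whose WRITE attribute was removed: missed (false negative).
print(hybrid_is_packed(DetectionEvidence(True, False, False, True)))   # False
# exe32pack-like file: no WRITE attribute, but the signature matches.
print(hybrid_is_packed(DetectionEvidence(True, True, False, True)))    # True
# Normal file with high entropy but no signature and no WRITE attribute.
print(hybrid_is_packed(DetectionEvidence(True, False, False, True)))   # False
```

Under this ordering, a file without a signature hit and without the WRITE attribute is always reported as not packed, which would explain both the absence of false positives and the false negatives attributed to files having no WRITE attribute.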
5.4. Experimental Results on Unpacking and Verification Phases. In this section, we evaluate the unpacking and verification phases by performing the unpacking of Algorithm 2 and calculating the restoration accuracy of the files through the actual restoration of Algorithm 3. However, the unpacking and verification phases take a lot of time, so we experiment with randomly selected 40 files of the dataset. The unpacking phase consists of static unpacking and dynamic unpacking. In the experiment, we exclude static unpacking since it uses the existing library as it is, and we focus only on the dynamic unpacking of Algorithm 2 since it tries to restore the original file without any information about the packers.
Table 7 shows the restoration accuracy of the proposed method for the .text and .data sections for each of the 19 packers. We cannot directly unpack Yoda’s Protector and PECompact since they use anti-debugging techniques. To overcome this problem, we unpack these packers by bypassing three anti-debugging API calls: GetCurrentProcessID(), BlockInput(), and IsDebuggerPresent() [41]. (That is, we run Algorithm 2 with the bypassing techniques for Yoda’s Protector and PECompact and include the experimental results in Table 7.) In general, since only the .text and .data sections of the original file are packed [42], in the experiment we consider only the restoration accuracy of these .text and .data sections of the restored file. In Table 7, among the 19 packers, 11 packers, including UPX, NSPack, MEW, RLPack, and BeRoEXEPacker, show 100% restoration accuracy for both the .text and .data sections. On the other hand, 8 packers, including ASPack, Packman, and MPRESS, show less than 100% restoration accuracy because of the manner in which the packers handle the .reloc section. More specifically, if there is a .reloc section in the original file, the unpacking relocates the .text and .data sections, changing all the data values in each section. Therefore, the restoration accuracy depends on whether there is a .reloc section in the original file.
Table 8 shows the restoration results in detail according to the presence of the .reloc section. As shown in the table, the restoration accuracy is 100% in the absence of the .reloc section, while it is very low, in the range of 0.00% to 9.98%, in the presence of the .reloc section. However, the low restoration accuracy is due to the nature of the packer, and it does not mean that unpacking does not work correctly. In other words, even if the unpacking is successful, the data values may change owing to the packer characteristics, and the restoration accuracy is then calculated to be low. In summary, we can use the proposed verification algorithm to calculate the restoration accuracy of unpacked files, which quantitatively measures the accuracy of the actual restoration.
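The effect of relocation on a comparison-based score can be illustrated with a small sketch. It assumes that restoration accuracy is the percentage of byte positions at which the original and restored sections agree; this metric, the helper names, and the example bytes are assumptions for illustration and do not reproduce Algorithm 3.

```python
import struct
from typing import Iterable

def restoration_accuracy(original: bytes, restored: bytes) -> float:
    """Percentage of byte positions that agree, measured over the longer section."""
    length = max(len(original), len(restored))
    if length == 0:
        return 100.0
    matches = sum(a == b for a, b in zip(original, restored))
    return 100.0 * matches / length

def apply_relocation(section: bytes, offsets: Iterable[int], delta: int) -> bytes:
    """Add delta to each 32-bit little-endian address stored at the given offsets."""
    patched = bytearray(section)
    for off in offsets:
        (value,) = struct.unpack_from("<I", patched, off)
        struct.pack_into("<I", patched, off, (value + delta) & 0xFFFFFFFF)
    return bytes(patched)

# Hypothetical 8-byte section: one absolute address followed by four NOP bytes.
original = struct.pack("<I", 0x00401000) + b"\x90" * 4
relocated = apply_relocation(original, offsets=[0], delta=0x00010000)

print(restoration_accuracy(original, original))   # 100.0
print(restoration_accuracy(original, relocated))  # 87.5
```

Even a correctly unpacked section scores below 100% once its stored addresses have been rebased, which is in line with the observation that a low score in the presence of the .reloc section does not by itself indicate an unpacking failure.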
In addition to the .text, .data, and .reloc sections, we further consider the PE header, which includes the IAT information. In general, the IAT is located in the header of the PE file, and if we actually need to run the unpacked file, we must also restore this header information, including the IAT. In fact, the studies of [3, 43] restore the header including the IAT and produce the actual whole executable file. On the other hand, most studies, including this paper, do not restore the actual whole executable file itself but restore the .text, .data, and .reloc sections, which are major parts of the executable file. This is because the purpose of unpacking is not to execute the restored file but to identify and extract malicious code included in the restored file. Therefore, in this paper, we do not need to restore the header required for actual execution, and accordingly, comparison of IAT information is also not necessary. Instead, we restore and compare the .text, .data, and .reloc sections, which are more likely to hide malicious code. Restoring the PE header is beyond the scope of this study; we leave it as a separate research topic.
Table 6: Comparison of packing detection accuracy of two or three combinations and of the hybrid approach.

Metrics               DP-SIG + ENT (%)   DP-ENT + WR (%)   DP-SIG + ENT + WR (%)   Hybrid approach (%)
Detection accuracy    95.0               92.0              98.4                    98.4
False-negative rate   5.00               8.00              1.60                    1.60
False-positive rate   0.73               0.00              0.00                    0.00

Table 7: Restoration accuracy of .text and .data sections of each packer.

Packer             .text (%)   .data (%)
UPX                100         100
NSPack             100         100
MEW                100         100
RLPack             100         100
BeRoEXE            100         100
Neolite            100         100
FSG                100         100
eXpressor          100         100
Molebox            100         100
Petite             100         100
JDpack             100         100
ASPack             84.3        84.3
Yoda’s protector   84.0        84.0
exe32pack          80.8        100
MPRESS             78.8        75.3
Yoda’s crypter     74.5        74.5
PECompact          74.0        74.0
WinUpack           68.0        68.0
Packman            55.0        48.5
Average            89.2        90.2

6. Conclusions

In this paper, we analyzed recent studies of packing detection and unpacking techniques and derived four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. We then proposed an all-in-one unpacking system to address these four observations. First, no phase integration meant that analysts had to perform the packing detection, unpacking, and verification phases separately, and we solved this problem by presenting an all-in-one unpacking system that integrates these three phases in a single framework. Second, no detection combination meant that there had been no attempt to combine various packing detection methods, and we solved this problem by presenting a hybrid approach that combines existing and new packing detection methods. Third, no real-restoration meant that there had been little discussion about restoring the actual executable file, and we solved this problem by actually restoring an original image of the packed file. Fourth, no unpacking verification meant that there had been no explicit work to measure the restoration accuracy of unpacked files, and we solved this problem by proposing a verification algorithm that quantitatively evaluates the restoration accuracy.
We constructed a dataset of 2600 PE files and experimentally evaluated the proposed system using the dataset. First, we performed all the phases of the all-in-one unpacking system sequentially and confirmed that its overall mechanism worked well. Next, through the comparison of the hybrid packing detection approach and existing detection techniques, we showed that the detection accuracy of the proposed method was the highest at 98.4% on average with no false positives. For this, we also determined an entropy range of packing, Packing-Range, through extensive experiments on 19 representative packers. Finally, in the unpacking and verification phases, we confirmed that unpacking was successful for all files except the files packed with Yoda’s Protector. We also showed that the restoration accuracy of unpacked files was nearly 100% except for a few packers.
As future research, we will focus on unpacking packed files that use Anti-VM, Anti-Debug, and Anti-Reverse engineering techniques. We will also study how to compute the restoration accuracy of files having .reloc sections.
Data Availability
All the data files used in the experiments are available at https://github.com/chesvectain/PackingData.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partly supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (no. 2017-0-00158, Development of Cyber Threat Intelligence (CTI) analysis and information sharing technology for national cyber incident response) and the Korea Electric Power Corporation (no. R18XA05).
Table 8: Restoration accuracy according to the presence of the .reloc section.

Packer             .text, with .reloc (%)   .text, without .reloc (%)   .data, with .reloc (%)   .data, without .reloc (%)
ASPack             0.00                     100                         0.00                     100
Packman            9.98                     100                         8.81                     100
MPRESS             0.00                     100                         0.00                     100
Yoda’s crypter     0.00                     100                         0.04                     100
PECompact          0.00                     100                         0.00                     100
Yoda’s Protector   0.00                     100                         0.05                     100
exe32pack          0.00                     100                         100                      100
WinUpack           0.00                     100                         0.00                     100

References

[1] M. Morgenstern, “AV-TEST Security report 2017/18,” 2019, https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2017-2018.pdf.
[2] T. Ban, R. Isawa, S. Guo, D. Inoue, and K. Nakao, “Efficient malware packer identification using support vector machines with spectrum kernel,” in Proceedings of the Eighth Asia Joint Conference on Information Security, pp. 69–76, Seoul, South Korea, July 2013.
[3] B. Cheng, M. Jiang, J. Fu et al., “Towards paving the way for large-scale Windows malware analysis: generic binary unpacking with orders-of-magnitude performance boost,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 395–411, Toronto, Canada, October 2018.
[4] A. Gupta and M. S. Arya, “Hashing based encryption and anti-debugger support for packing multiple files into single executable,” International Journal of Advanced Research in Computer Science, vol. 9, no. 1, pp. 914–920, 2018.
[5] Y. Kawakoya, M. Iwamura, and M. Itoh, “Memory behavior-based automatic malware unpacking in stealth debugging environment,” in Proceedings of the 5th International Conference on Malicious and Unwanted Software, pp. 39–46, Lorraine, France, October 2010.
[6] G. Liang, J. Pang, Z. Shan, R. Yang, and Y. Chen, “Automatic benchmark generation framework for malware detection,” Security and Communication Networks, vol. 2018, Article ID 4947695, 8 pages, 2018.
[7] A. Pektas and T. Acarman, “A dynamic malware analyzer against virtual machine aware malicious software,” Security and Communication Networks, vol. 7, no. 12, pp. 2245–2257, 2014.
[8] I. Santos, J. Nieves, and P. G. Bringas, “Semi-supervised learning for unknown malware detection,” in International Symposium on Distributed Computing and Artificial Intelligence, A. Abraham, J. M. Corchado, S. R. Gonzalez, and J. F. De Paz Santana, Eds., pp. 415–422, Springer, Berlin, Germany, 2011.
[9] X. Ugarte-Pedrero, I. Santos, B. Sanz, C. Laorden, and P. G. Bringas, “Countering entropy measure attacks on packed software detection,” in Proceedings of the 9th International Conference on Consumer Communications and Networking, pp. 164–168, Las Vegas, NV, USA, January 2012.
[10] H. Won, S.-P. Kim, S. Lee, M.-J. Choi, and Y.-S. Moon, “Secure principal component analysis in multiple distributed nodes,” Security and Communication Networks, vol. 9, no. 14, pp. 2348–2358, 2016.
[11] S. Cesare, Y. Xiang, and W. Zhou, “Malwise–an effective and efficient classification system for packed and polymorphic malware,” IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193–1206, 2013.
[12] C. Malin, E. Casey, and J. Aquilina, Malware Forensics: Investigating and Analyzing Malicious Code, Syngress, Burlington, MA, USA, 2008.
[13] W. Yan, Z. Zheng, and A. Nirwan, “Revealing packed malware,” IEEE Security and Privacy, vol. 6, no. 5, pp. 65–69, 2008.
[14] M. Bat-Erdene, T. Kim, H. Park, and H. Lee, “Packer detection for multi-layer executables using entropy analysis,” Entropy, vol. 19, no. 3, p. 125, 2017.
[15] Y.-S. Choi, I.-K. Kim, J.-T. Oh, and J.-C. Ryou, “PE file header analysis-based packed PE file detection technique (PHAD),” in Proceedings of International Symposium on Computer Science and Its Applications, pp. 28–31, Hobart, Australia, October 2008.
[16] S. Han and S. Lee, “Packed PE file detection for malware forensics,” The KIPS Transactions: Part C, vol. 16C, no. 5, pp. 555–562, 2009, in Korean.
[17] R. Lyda and J. Hamrock, “Using entropy analysis to find encrypted and packed malware,” IEEE Security and Privacy, vol. 5, no. 2, pp. 40–45, 2007.
[18] R. Perdisci, A. Lanzi, and W. Lee, “Classification of packed executables for accurate computer virus detection,” Pattern Recognition Letters, vol. 29, no. 14, pp. 1941–1946, 2008.
[19] A. Rohit, A. Singh, H. Pareek, and U. Edara, “A heuristics-based static analysis approach for detecting packed PE binaries,” International Journal of Security and Its Applications, vol. 7, no. 5, pp. 257–268, 2013.
[20] G. Amato, “PEframe,” 2019, https://github.com/guelfoweb/peframe.
[21] G. Jeong, E. Choo, J. Lee, M. Bat-Erdene, and H. Lee, “Generic unpacking using entropy analysis,” Journal of Advanced Information Technology and Convergence, vol. 7, no. 1, pp. 232–238, 2009.
[22] V. DeMarines, “Obfuscation–how to do it and how to crack it,” Network Security, vol. 2008, no. 7, pp. 4–7, 2008.
[23] H. Neil, “PEiD,” 2019, https://github.com/wolfram77web/app-peid.
[24] O. Yuschuk, “OllyDbg,” 2019, http://www.ollydbg.de/download.html.
[25] N. Waisman, “Immunity Debugger,” 2019, https://www.immunityinc.com/products/debugger/index.html.
[26] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee, “PolyUnpack: automating the hidden-code extraction of unpack-executing malware,” in Proceedings of the 22nd Annual Computer Security Applications Conference, pp. 289–300, Miami Beach, FL, USA, December 2006.
[27] M. Bat-Erdene, Y. Park, H. Li, H. Lee, and M.-S. Choi, “Entropy analysis to classify unknown packing algorithms for malware detection,” International Journal of Information Security, vol. 16, no. 3, pp. 227–248, 2017.
[28] S. Cesare and X. Yang, “Classification of malware using structured control flow,” in Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing, pp. 61–70, Brisbane, Australia, January 2010.
[29] L. Martignoni, M. Christodorescu, and S. Jha, “OmniUnpack: fast, generic, and safe unpacking of malware,” in Proceedings of the 23rd Annual Computer Security Applications Conference, pp. 431–441, Miami Beach, FL, USA, December 2007.
[30] M. G. Kang, P. Poosankam, and H. Yin, “Renovo: a hidden code extractor for packed executables,” in Proceedings of the 5th ACM Workshop on Recurring Malcode, pp. 46–53, Alexandria, VA, USA, November 2007.
[31] S. Mariani, L. Fontana, F. Gritti, and S. D’Alessio, “PinDemonium: a DBI-based generic unpacker for windows executables,” in Proceedings of the Black Hat 2016, Las Vegas, NV, USA, July 2016.
[32] G. Bonfante, J. Fernandez, J.-Y. Marion, B. Rouxel, F. Sabatier, and A. Thierry, “CoDisasm: medium scale concatic disassembly of self-modifying binaries with overlapping instructions,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 745–756, Denver, CO, USA, October 2016.
[33] M. Oberhumer, L. Molnar, and J. Reiser, “UPX,” 2019, https://upx.github.io.
[34] A. Swinnen and A. Mesbahi, “One packer to rule them all: empirical identification, comparison, and circumvention of current antivirus detection techniques,” in Proceedings of the Black Hat 2014, Las Vegas, NV, USA, August 2014.
[35] P. Bania, “Generic unpacking of self-modifying, aggressive, packed binary programs,” 2009, https://arxiv.org/abs/0905.4581.
[36] H. Neil, “BobSoft Signature,” 2019, http://woodmann.com/BobSoft/Files/Other/UserDB.zip.
[37] M. Russinovich, “Microsoft Sysinternals Suite,” 2019, https://download.sysinternals.com/files/SysinternalsSuite.zip.
[38] R. Rivest, “The MD5 message-digest algorithm,” RFC 1321, IETF, Fremont, CA, USA, 1992.
[39] S. Tatham, “Putty,” 2019, https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html.
[40] L. Pascoe, “MD5,” 2019, http://www.md5summer.org/about.html.
[41] P. Ferrie, “Anti-unpacker tricks - part one,” 2019, https://www.virusbulletin.com/virusbulletin/2008/12/anti-unpacker-tricks-part-one.
[42] T.-Y. Wang and C.-H. Wu, “Detection of packed executables using support vector machines,” in Proceedings of the 10th International Conference on Machine Learning and Cybernetics, pp. 717–722, Guilin, China, July 2011.
[43] D. Korczynski, “RePEconstruct: reconstructing binaries with self-modifying code and Import address table destruction,” in Proceedings of the 11th International Conference on Malicious and Unwanted Software, pp. 31–38, Fajardo, Puerto Rico, October 2016.
16 Security and Communication Networks
International Journal of
AerospaceEngineeringHindawiwwwhindawicom Volume 2018
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Active and Passive Electronic Components
VLSI Design
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Shock and Vibration
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawiwwwhindawicom
Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Control Scienceand Engineering
Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
SensorsJournal of
Hindawiwwwhindawicom Volume 2018
International Journal of
RotatingMachinery
Hindawiwwwhindawicom Volume 2018
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Navigation and Observation
International Journal of
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
Submit your manuscripts atwwwhindawicom
incur false positives or false negatives on the other hand theproposed hybrid shows the highest detection rate of 984and no false positives
Table 6 shows the detection accuracy of two or threecombinations of detection techniques and of the proposedhybrid approach First DP-SIG ENT a combination ofsignature and entropy tests detects 100 of the packed filebut detects 18 files incorrectly showing a false-positive rateof 073 Second DP-ENTWR a combination of entropyand WRITE tests shows a low detection rate of 920because the entropy value of MPRESS is in the distributionof normal file ird DP-SIG ENT WR a combination ofsignature entropy and WRITE tests and our hybrid ap-proach show 984 detection accuracy and 160 false-negative rate is is because DP-SIG ENT WR and thehybrid approach go through the same detection procedureexcept DP-NEP e false-negative rate of 160 is due tothe files having no WRITE attributes According to theresults of Tables 5 and 6 we believe that the proposed hybridapproach is an excellent method for high detection accuracyby combining the advantages of the existing and new de-tection techniques
54 Experimental Results on Unpacking and VerificationPhases In this section we verify the unpacking and veri-fication phases by performing the unpacking of Algorithm 2and calculating the restoration accuracy of the files throughthe actual restoration of Algorithm 3 However unpackingand verification phases take a lot of time so we experimentwith randomly selected 40 files of the dataset eunpacking phase consists of static unpacking and dynamicunpacking In the experiment we exclude static unpackingsince it uses the existing library as it is and focus on onlydynamic unpacking of Algorithm 2 since it tries to restorethe original file without any information of packers
Table 7 shows the restoration accuracy of the proposedmethod for text and data sections for each of the 19 packersWe cannot directly unpack Yodarsquos Protector and PECom-pact since they use Anti-Debug engineering techniques Toovercome this problem we unpack these packers bybypassing three Anti-Debugging API calls GetCurrent-ProcessID() BlockInput() and IsDebuggerPresent() [41]at is we run Algorithm 2 with the bypassing techniquesfor Yodarsquos Protector and PECompact and include its ex-perimental result in Table 7) In general since only text anddata sections of the original file are packed [42] in theexperiment we consider only the restoration accuracy ofthese text and data sections of the restored file In Table 7among 19 packers 11 packers including UPX NSPackMEW RLPack and BeRoEXEPacker show at 100 resto-ration accuracy for both text and data sections On the other
hand 8 packers including ASPack Packman and MPRESSshow less than 100 restoration accuracy because of themanner in which the packers handle the reloc section Morespecifically if there is a reloc section in the original file theunpacking relocates the text and data sections changing allthe data values in each section erefore the restorationaccuracy depends on whether there is a reloc section in theoriginal file
Table 8 shows the restoration results in detail accordingto the presence of the reloc section As shown in the tablethe restoration accuracy is 100 in the absence of the relocsection while it is very low in the range of 000 to 998 inthe presence of the reloc section However the low resto-ration accuracy is due to the nature of the packer and it doesnot mean that unpacking does not work correctly In otherwords even if the unpacking is successful the data valuemay change due to the packer characteristic and the res-toration accuracy is calculated to be low In summary wecan use the proposed verification algorithm to calculate therestoration accuracy of unpacked files which can quanti-tatively measure the accuracy of the actual restoration
In addition to text data and reloc sections we furtherconsider the PE header which includes the IAT informationIn general the IAT is located in the header of the PE file andif we actually need to run the unpacked file we must alsorestore this header information including the IAT In factthe study of [3 43] restores the header including the IATandproduces the actual whole executable file On the other handmost studies including this paper do not restore the actualwhole executable file itself but restore text data and relocsections which are major parts of the executable file is isbecause the purpose of unpacking is not to execute therestored file but to identify and extract malicious codesincluded in the restored file erefore in this paper we donot need to restore the header required for the actual ex-ecution and accordingly comparison of IAT information isalso not necessary Instead we restore and compare textdata and reloc sections that are more likely to hidemalicious code Restoring the PE header is beyond the scopeof this study we leave it as a separate research topic
6 Conclusions
In this paper we analyzed recent studies of packing de-tection and unpacking techniques and derived four im-portant observations no phase integration no detectioncombination no real-restoration and no unpacking verifi-cation We then proposed an all-in-one unpacking system toimprove these four observations First no phase integrationwas a difficulty in which analysts had to perform packingdetection unpacking and verification phases separately andwe solved this problem by presenting an all-in-one
Table 6 Comparison of packing detection accuracy of two or three combinations and of the hybrid approach
Metrics DP-SIG ENT () DP-ENT WR () DP-SIG ENT WR () Hybrid approach ()Detection accuracy 950 920 984 984False-negative rate 500 800 160 160False-positive rate 073 000 000 000
Security and Communication Networks 13
Table 7: Restoration accuracy of .text and .data sections of each packer.

| Packer | .text (%) | .data (%) |
|---|---|---|
| UPX | 100 | 100 |
| NSPack | 100 | 100 |
| MEW | 100 | 100 |
| RLPack | 100 | 100 |
| BeRoEXE | 100 | 100 |
| Neolite | 100 | 100 |
| FSG | 100 | 100 |
| eXpressor | 100 | 100 |
| Molebox | 100 | 100 |
| Petite | 100 | 100 |
| JDpack | 100 | 100 |
| ASPack | 84.3 | 84.3 |
| Yoda's protector | 84.0 | 84.0 |
| exe32pack | 80.8 | 100 |
| MPRESS | 78.8 | 75.3 |
| Yoda's crypter | 74.5 | 74.5 |
| PECompact | 74.0 | 74.0 |
| WinUpack | 68.0 | 68.0 |
| Packman | 55.0 | 48.5 |
| Average | 89.2 | 90.2 |
unpacking system that integrates these three phases in a single framework. Second, no detection combination meant that there had been no attempt to combine various packing detection methods, and we solved this problem by presenting a hybrid approach that combines existing and new packing detection methods. Third, no real-restoration meant that there had been little discussion about restoring the actual executable file, and we solved this problem by actually restoring an original image of the packed file. Fourth, no unpacking verification meant that there had been no explicit work to measure the restoration accuracy of unpacked files, and we solved this problem by proposing a verification algorithm that quantitatively evaluates the restoration accuracy.
We constructed a dataset of 2600 PE files and experimentally evaluated the proposed system using this dataset. First, we performed all the phases of the all-in-one unpacking system sequentially and confirmed that the overall mechanism worked well. Next, by comparing the hybrid packing detection approach with existing detection techniques, we showed that the detection accuracy of the proposed method was the highest, at 98.4% on average, with no false positives. For this, we also determined an entropy range of packing, Packing-Range, through extensive experiments on 19 representative packers. Finally, in the unpacking and verification phases, we confirmed that unpacking was successful for all files except those packed with Yoda's Protector. We also showed that the restoration accuracy of unpacked files was nearly 100%, except for a few packers.
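The entropy range mentioned above can be illustrated with a short sketch. It computes the Shannon entropy of a byte sequence and checks whether the value falls inside a range. The bounds used here are placeholders, since the experimentally determined Packing-Range is not restated in this section, and the names `shannon_entropy`, `in_packing_range`, and `PACKING_RANGE` are hypothetical.

```python
import math
from collections import Counter

# Placeholder bounds (assumption): the paper derives its Packing-Range
# experimentally, and the actual values are not restated in this section.
PACKING_RANGE = (6.8, 8.0)


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for empty input, at most 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def in_packing_range(data: bytes, bounds=PACKING_RANGE) -> bool:
    """True if the entropy of `data` lies inside the closed interval `bounds`."""
    low, high = bounds
    return low <= shannon_entropy(data) <= high


if __name__ == "__main__":
    uniform = bytes(range(256)) * 16   # every byte value equally frequent
    padding = b"\x00" * 4096           # a single repeated byte value
    print(shannon_entropy(uniform), in_packing_range(uniform))
    print(shannon_entropy(padding), in_packing_range(padding))
```

Compressed or encrypted content tends toward the upper end of the 0 to 8 scale, which is why a range test of this kind can serve as one signal in a combined detector; on its own it cannot tell a packer from any other source of high-entropy data.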
As future research, we will focus on unpacking files packed with Anti-VM, Anti-Debug, and Anti-Reverse engineering techniques. We will also study how to compute the restoration accuracy of files having .reloc sections.
Data Availability
All the data files used in the experiments are available at https://github.com/chesvectain/PackingData.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partly supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (no. 2017-0-00158, Development of Cyber Threat Intelligence (CTI) analysis and information sharing technology for national cyber incident response) and the Korea Electric Power Corporation (no. R18XA05).
References
[1] M. Morgenstern, “AV-TEST Security report 2017/18,” 2019, https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2017-2018.pdf.
[2] T. Ban, R. Isawa, S. Guo, D. Inoue, and K. Nakao, “Efficient malware packer identification using support vector machines with spectrum kernel,” in Proceedings of the Eighth Asia Joint Conference on Information Security, pp. 69–76, Seoul, South Korea, July 2013.
[3] B. Cheng, M. Jiang, J. Fu et al., “Towards paving the way for large-scale Windows malware analysis: generic binary unpacking with orders-of-magnitude performance boost,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 395–411, Toronto, Canada, October 2018.
[4] A. Gupta and M. S. Arya, “Hashing based encryption and anti-debugger support for packing multiple files into single executable,” International Journal of Advanced Research in Computer Science, vol. 9, no. 1, pp. 914–920, 2018.
[5] Y. Kawakoya, M. Iwamura, and M. Itoh, “Memory behavior-based automatic malware unpacking in stealth debugging environment,” in Proceedings of the 5th International Conference on Malicious and Unwanted Software, pp. 39–46, Lorraine, France, October 2010.
[6] G. Liang, J. Pang, Z. Shan, R. Yang, and Y. Chen, “Automatic benchmark generation framework for malware detection,” Security and Communication Networks, vol. 2018, Article ID 4947695, 8 pages, 2018.
[7] A. Pektas and T. Acarman, “A dynamic malware analyzer against virtual machine aware malicious software,” Security and Communication Networks, vol. 7, no. 12, pp. 2245–2257, 2014.
[8] I. Santos, J. Nieves, and P. G. Bringas, “Semi-supervised learning for unknown malware detection,” in International Symposium on Distributed Computing and Artificial Intelligence, A. Abraham, J. M. Corchado, S. R. Gonzalez, and J. F. De Paz Santana, Eds., pp. 415–422, Springer, Berlin, Germany, 2011.
[9] X. Ugarte-Pedrero, I. Santos, B. Sanz, C. Laorden, and P. G. Bringas, “Countering entropy measure attacks on packed software detection,” in Proceedings of the 9th International Conference on Consumer Communications and Networking, pp. 164–168, Las Vegas, NV, USA, January 2012.
[10] H. Won, S.-P. Kim, S. Lee, M.-J. Choi, and Y.-S. Moon, “Secure principal component analysis in multiple distributed nodes,” Security and Communication Networks, vol. 9, no. 14, pp. 2348–2358, 2016.
[11] S. Cesare, Y. Xiang, and W. Zhou, “Malwise–an effective and efficient classification system for packed and polymorphic
Table 8: Restoration accuracy according to the presence of the .reloc section.

| Packer | .text, .reloc present (%) | .text, .reloc absent (×) (%) | .data, .reloc present (%) | .data, .reloc absent (×) (%) |
|---|---|---|---|---|
| ASPack | 0.00 | 100 | 0.00 | 100 |
| Packman | 99.8 | 100 | 88.1 | 100 |
| Mpres | 0.00 | 100 | 0.00 | 100 |
| Yoda's crypter | 0.00 | 100 | 0.04 | 100 |
| PECompact | 0.00 | 100 | 0.00 | 100 |
| Yoda's Protector | 0.00 | 100 | 0.05 | 100 |
| exe32pack | 0.00 | 100 | 100 | 100 |
| WinUpack | 0.00 | 100 | 0.00 | 100 |
malware,” IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193–1206, 2013.
[12] C. Malin, E. Casey, and J. Aquilina, Malware Forensics: Investigating and Analyzing Malicious Code, Syngress, Burlington, MA, USA, 2008.
[13] W. Yan, Z. Zheng, and A. Nirwan, “Revealing packed malware,” IEEE Security and Privacy, vol. 6, no. 5, pp. 65–69, 2008.
[14] M. Bat-Erdene, T. Kim, H. Park, and H. Lee, “Packer detection for multi-layer executables using entropy analysis,” Entropy, vol. 19, no. 3, p. 125, 2017.
[15] Y.-S. Choi, I.-K. Kim, J.-T. Oh, and J.-C. Ryou, “PE file header analysis-based packed PE file detection technique (PHAD),” in Proceedings of International Symposium on Computer Science and Its Applications, pp. 28–31, Hobart, Australia, October 2008.
[16] S. Han and S. Lee, “Packed PE file detection for malware forensics,” The KIPS Transactions: Part C, vol. 16C, no. 5, pp. 555–562, 2009, in Korean.
[17] R. Lyda and J. Hamrock, “Using entropy analysis to find encrypted and packed malware,” IEEE Security and Privacy, vol. 5, no. 2, pp. 40–45, 2007.
[18] R. Perdisci, A. Lanzi, and W. Lee, “Classification of packed executables for accurate computer virus detection,” Pattern Recognition Letters, vol. 29, no. 14, pp. 1941–1946, 2008.
[19] A. Rohit, A. Singh, H. Pareek, and U. Edara, “A heuristics-based static analysis approach for detecting packed PE binaries,” International Journal of Security and Its Applications, vol. 7, no. 5, pp. 257–268, 2013.
[20] G. Amato, “PEframe,” 2019, https://github.com/guelfoweb/peframe.
[21] G. Jeong, E. Choo, J. Lee, M. Bat-Erdene, and H. Lee, “Generic unpacking using entropy analysis,” Journal of Advanced Information Technology and Convergence, vol. 7, no. 1, pp. 232–238, 2009.
[22] V. DeMarines, “Obfuscation–how to do it and how to crack it,” Network Security, vol. 2008, no. 7, pp. 4–7, 2008.
[23] H. Neil, “PEiD,” 2019, https://github.com/wolfram77web/app-peid.
[24] O. Yuschuk, “OllyDbg,” 2019, http://www.ollydbg.de/download.html.
[25] N. Waisman, “Immunity Debugger,” 2019, https://www.immunityinc.com/products/debugger/index.html.
[26] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee, “PolyUnpack: Automating the hidden-code extraction of unpack-executing malware,” in Proceedings of the 22nd Annual Computer Security Applications Conference, pp. 289–300, Miami Beach, FL, USA, December 2006.
[27] M. Bat-Erdene, Y. Park, H. Li, H. Lee, and M.-S. Choi, “Entropy analysis to classify unknown packing algorithms for malware detection,” International Journal of Information Security, vol. 16, no. 3, pp. 227–248, 2017.
[28] S. Cesare and X. Yang, “Classification of malware using structured control flow,” in Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing, pp. 61–70, Brisbane, Australia, January 2010.
[29] L. Martignoni, M. Christodorescu, and S. Jha, “OmniUnpack: fast, generic, and safe unpacking of malware,” in Proceedings of the 23rd Annual Computer Security Applications Conference, pp. 431–441, Miami Beach, FL, USA, December 2007.
[30] M. G. Kang, P. Poosankam, and H. Yin, “Renovo: a hidden code extractor for packed executables,” in Proceedings of the 5th ACM Workshop on Recurring Malcode, pp. 46–53, Alexandria, VA, USA, November 2007.
[31] S. Mariani, L. Fontana, F. Gritti, and S. D'Alessio, “PinDemonium: a DBI-based generic unpacker for windows executables,” in Proceedings of the Black Hat 2016, Las Vegas, NV, USA, July 2016.
[32] G. Bonfante, J. Fernandez, J.-Y. Marion, B. Rouxel, F. Sabatier, and A. Thierry, “CoDisasm: medium scale concatic disassembly of self-modifying binaries with overlapping instructions,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 745–756, Denver, CO, USA, October 2016.
[33] M. Oberhumer, L. Molnar, and J. Reiser, “UPX,” 2019, https://upx.github.io.
[34] A. Swinnen and A. Mesbahi, “One packer to rule them all: empirical identification, comparison and circumvention of current antivirus detection techniques,” in Proceedings of the Black Hat 2014, Las Vegas, NV, USA, August 2014.
[35] P. Bania, “Generic unpacking of self-modifying, aggressive, packed binary programs,” 2009, https://arxiv.org/abs/0905.4581.
[36] H. Neil, “BobSoft Signature,” 2019, http://woodmann.com/BobSoft/Files/Other/UserDB.zip.
[37] M. Russinovich, “Microsoft Sysinternals Suite,” 2019, https://download.sysinternals.com/files/SysinternalsSuite.zip.
[38] R. Rivest, “The MD5 message-digest algorithm,” RFC 1321, IETF, Fremont, CA, USA, 1992.
[39] S. Tatham, “Putty,” 2019, https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html.
[40] L. Pascoe, “MD5,” 2019, http://www.md5summer.org/about.html.
[41] P. Ferrie, “Anti-unpacker tricks - part one,” 2019, https://www.virusbulletin.com/virusbulletin/2008/12/anti-unpacker-tricks-part-one.
[42] T.-Y. Wang and C.-H. Wu, “Detection of packed executables using support vector machines,” in Proceedings of the 10th International Conference on Machine Learning and Cybernetics, pp. 717–722, Guilin, China, July 2011.
[43] D. Korczynski, “RePEconstruct: reconstructing binaries with self-modifying code and Import address table destruction,” in Proceedings of the 11th International Conference on Malicious and Unwanted Software, pp. 31–38, Fajardo, Puerto Rico, October 2016.
Tabl
e7
Restorationaccuracy
oftext
anddatasections
ofeach
packer
Section
UPX ()
NSP
ack
()
MEW ()
RLPa
ck(
)Be
RoEX
E(
)Neolite
()
FSG
()
eXpressor
()
Molebox
()
Petite
()
JDpack
()
ASP
ack
()
Yodarsquos
protector
()
exe32p
ack
()
MPR
ESS
()
Yodarsquos
crypter
()
PECom
pact
()
WinUpack
()
Packman
()
Average
()
text
100
100
100
100
100
100
100
100
100
100
100
843
840
808
788
745
740
680
550
892
data
100
100
100
100
100
100
100
100
100
100
100
843
840
100
753
745
740
680
485
902
14 Security and Communication Networks
unpacking system of integrating these three phases in anintegrated framework Second no detection combinationwas due to that there was no attempt to combine variouspacking detection methods and we solved this problem bypresenting a hybrid approach of combining existing and newpacking detection methods ird no real-restoration camefrom that there was little discussion about restoring theactual executable file and we solved this problem by actuallyrestoring an original image of the packed file Fourth nounpacking verification was due to that there was no explicitwork to measure the restoration accuracy of unpacked filesand we solved this problem by proposing a verificationalgorithm of quantitatively evaluating the restorationaccuracy
We constructed a dataset of 2600 PE files and experi-mentally evaluated the proposed system using the datasetFirst we performed all the phases of the all-in-oneunpacking system sequentially and confirmed that its overallworking mechanism worked well Next through the com-parison of the hybrid packing detection approach andexisting detection techniques we showed that the detectionaccuracy of the proposed method was the highest at 984on average with no false positives For this we also de-termined an entropy range of packing Packing-Rangethrough extensive experiments on 19 representative packersFinally in the unpacking and verification phases we con-firmed that unpacking was successful for all files except thefiles packed with Yodarsquos Protector We also showed that therestoration accuracy of unpacked files was nearly 100except for a few packers
As the future research we will focus on unpackingpacked files with Anti-VM Anti-Debug and Anti-Reverseengineering techniques We will also study how to computethe restoration accuracy of the file having reloc sections
Data Availability
All the data files used in the experiments are available athttpsgithubcomchesvectainPackingData
Conflicts of Interest
e authors declare that they have no conflicts of interest
Acknowledgments
is work was partly supported by the Institute of In-formation and Communications Technology Planning andEvaluation (IITP) grant funded by the Korea government(MSIT) (no 2017-0-00158 Development of Cyber reat
Intelligence (CTI) analysis and information sharing tech-nology for national cyber incident response) and the KoreaElectric Power Corporation (no R18XA05)
References
[1] M Morgenstern ldquoAV-TEST Security report 201718rdquo 2019httpswwwav-testorgfileadminpdfsecurity_reportAV-TEST_Security_Report_2017-2018pdf
[2] T Ban R Isawa S Guo D Inoue and K Nakao ldquoEfficientmalware packer identification using support vector machineswith spectrum kernelrdquo in Proceedings of the Eighth Asia JointConference on Information Security pp 69ndash76 Seoul SouthKorea July 2013
[3] B Cheng M Jiang J Fu et al ldquoTowards paving the way forlarge-scale Windows malware analysis generic binaryunpacking with orders-of-magnitude performance boostrdquo inProceedings of the 2018 ACM SIGSAC Conference on Com-puter and Communications Security pp 395ndash411 TorontoCanada October 2018
[4] A Gupta andM S Arya ldquoHashing based encryption and anti-debugger support for packing multiple files into single exe-cutablerdquo International Journal of Advanced Research inComputer Science vol 9 no 1 pp 914ndash920 2018
[5] Y Kawakoya M Iwamura and M Itoh ldquoMemory behavior-based automatic malware unpacking in stealth debuggingenvironmentrdquo in Proceedings of the 5th International Con-ference on Malicious and Unwanted Software pp 39ndash46Lorraine France October 2010
[6] G Liang J Pang Z Shan R Yang and Y Chen ldquoAutomaticbenchmark generation framework for malware detectionrdquoSecurity and Communication Networks vol 2018 Article ID4947695 8 pages 2018
[7] A Pektas and T Acarman ldquoA dynamic malware analyzeragainst virtual machine aware malicious softwarerdquo Securityand Communication Networks vol 7 no 12 pp 2245ndash22572014
[8] I Santos J Nieves and P G Bringas ldquoSemi-supervisedlearning for unknown malware detectionrdquo in InternationalSymposium on Distributed Computing and Artificial In-telligence A Abraham J M Corchado S R Gonzalez andJ F De Paz Santana Eds pp 415ndash422 Springer BerlinGermany 2011
[9] X Ugarte-Pedrero I Santos B Sanz C Laorden andP G Bringas ldquoCountering entropy measure attacks onpacked software detectionrdquo in Proceedings of the 9th In-ternational Conference on Consumer Communications andNetworking pp 164ndash168 Las Vegas NV USA January 2012
[10] H Won S-P Kim S Lee M-J Choi and Y-S MoonldquoSecure principal component analysis in multiple distributednodesrdquo Security and Communication Networks vol 9 no 14pp 2348ndash2358 2016
[11] S Cesare Y Xiang and W Zhou ldquoMalwisendashan effective andefficient classification system for packed and polymorphic
Table 8 Restoration accuracy according to the presence of the reloc section
SectionASPack () Packman
() Mpres () Yodarsquoscrypter ()
PECompact()
YodarsquosProtector
()
exe32pack()
WinUpack()
times times times times times times times times
text 000 100 998 100 000 100 000 100 000 100 000 100 000 100 000 100
data 000 100 881 100 000 100 004 100 000 100 005 100 100 100 000 100
Security and Communication Networks 15
malwarerdquo IEEE Transactions on Computers vol 62 no 6pp 1193ndash1206 2013
[12] C Malin E Casey and J Aquilina Malware Forensics In-vestigating and Analyzing Malicious Code Syngress Bur-lington MA USA 2008
[13] W Yan Z Zheng and A Nirwan ldquoRevealing packed mal-warerdquo IEEE Security and Privacy vol 6 no 5 pp 65ndash692008
[14] M Bat-Erdene T Kim H Park andH Lee ldquoPacker detectionfor multi-layer executables using entropy analysisrdquo Entropyvol 19 no 3 p 125 2017
[15] Y-S Choi I-K Kim J-T Oh and J-C Ryou ldquoPE file headeranalysis-based packed PE file detection technique (PHAD)rdquoin Proceedings of International Symposium on ComputerScience and Its Applications pp 28ndash31 Hobart AustraliaOctober 2008
[16] S Han and S Lee ldquoPacked PE file detection for malwareforensicsrdquo =e KIPS Transactions PartC vol 16C no 5pp 555ndash562 2009 in Korean
[17] R Lyda and J Hamrock ldquoUsing entropy analysis to findencrypted and packed malwarerdquo IEEE Security and Privacyvol 5 no 2 pp 40ndash45 2007
[18] R Perdisci A Lanzi and W Lee ldquoClassification of packedexecutables for accurate computer virus detectionrdquo PatternRecognition Letters vol 29 no 14 pp 1941ndash1946 2008
[19] A Rohit A Singh H Pareek and U Edara ldquoA heuristics-based static analysis approach for detecting packed PE bi-nariesrdquo International Journal of Security and Its Applicationsvol 7 no 5 pp 257ndash268 2013
[20] G Amato ldquoPEframerdquo 2019 httpsgithubcomguelfowebpeframe
[21] G Jeong E Choo J Lee M Bat-Erdene and H Lee ldquoGenericunpacking using entropy analysisrdquo Journal of Advanced In-formation Technology and Convergence vol 7 no 1pp 232ndash238 2009
[22] V DeMarines ldquoObfuscationndashhow to do it and how to crackitrdquo Network Security vol 2008 no 7 pp 4ndash7 2008
[23] H Neil ldquoPEiDrdquo 2019 httpsgithubcomwolfram77webapp-peid
[24] O Yuschuk ldquoOllyDbgrdquo 2019 httpwwwollydbgdedownloadhtml
[25] N Waisman ldquoImmunity Debuggerrdquo 2019 httpswwwimmunityinccomproductsdebuggerindexhtml
[26] P Royal M Halpin D Dagon R Edmonds and W LeeldquoPolyUnpack Automating the hidden-code exetraction ofunpack-executing malwarerdquo in Proceedings of the 22nd An-nual Computer Security Applications Conference pp 289ndash300Miami Beach FL USA December 2006
[27] M Bat-Erdene Y Park H Li H Lee and M-S ChoildquoEntropy analysis to classify unknown packing algorithms formalware detectionrdquo International Journal of InformationSecurity vol 16 no 3 pp 227ndash248 2017
[28] S Cesare and X Yang ldquoClassification of malware usingstructured control flowrdquo in Proceedings of the 8th Austral-asian Symposium on Parallel and Distributed Computingpp 61ndash70 Brisbane Australia January 2010
[29] L Martignoni M Christodorescu and S Jha ldquoOmniUnpackfast generic and safe unpacking of malwarerdquo in Proceedingsof the 23rd Annual Computer Security Applications Confer-ence pp 431ndash441 Miami Beach FL USA December 2007
[30] M G Kang P Poosankam and H Yin ldquoRenovo a hiddencode extractor for packed executablesrdquo in Proceedings of the5th ACM Workshop on Recurring Malcode pp 46ndash53Alexandria VA USA November 2007
[31] S Mariani L Fontana F Gritti and S DrsquoAlessio ldquoPinDe-monium a DBI-based generic unpacker for windows exe-cutablesrdquo in Proceedings of the Black Hat 2016 Las Vegas NVUSA July 2016
[32] G Bonfante J Fernandez J-Y Marion B Rouxel F Sabatierand A ierry ldquoCoDisasm medium scale concatic disas-sembly of self-modifying binaries with overlapping in-structionsrdquo in Proceedings of the 22nd ACM SIGSACConference on Computer and Communications Securitypp 745ndash756 Denver CO USA October 2016
[33] M Oberhumer L Molnar and J Reiser ldquoUPXrdquo 2019 httpsupxgithubio
[34] A Swinnen and A Mesbahi ldquoOne packer to rule them allempirical identification comparison and circumvention ofcurrent antivirus detection techniquesrdquo in Proceedings of theBlack Hat 2014 Las Vegas NV USA August 2014
[35] P Bania ldquoGeneric unpacking of self-modifying aggressivepacked binary programsrdquo 2009 httpsarxivorgabs09054581
[36] H Neil ldquoBobSoft Signaturerdquo 2019 httpwoodmanncomBobSoftFilesOtherUserDBzip
[37] M Russinovich ldquoMicrosoft Sysinternals Suiterdquo 2019 httpsdownloadsysinternalscomfilesSysinternalsSuitezip
[38] R Rivest ldquoe MD5 message-digest algorithmrdquo RFC 1321IETF Fremont CA USA 1992
[39] S Tatham ldquoPuttyrdquo 2019 httpswwwchiarkgreenendorguk~sgtathamputtylatesthtml
[40] L Pascoe ldquoMD5rdquo 2019 httpwwwmd5summerorgabouthtml[41] P Ferrie ldquoAnti-unpacker tricks - part onerdquo 2019 https
wwwvirusbulletincomvirusbulletin200812anti-unpacker-tricks-part-one
[42] T-Y Wang and C-H Wu ldquoDetection of packed executablesusing support vector machinesrdquo in Proceedings of the 10thInternational Conference on Machine Learning and Cyber-netics pp 717ndash722 Guilin China July 2011
[43] D Korczynski ldquoRePEconstruct reconstructing binaries withself-modifying code and Import address table destructionrdquo inProceedings of the 11th International Conference on Maliciousand Unwanted Software pp 31ndash38 Fajardo Puerto RicoOctober 2016
16 Security and Communication Networks
International Journal of
AerospaceEngineeringHindawiwwwhindawicom Volume 2018
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Active and Passive Electronic Components
VLSI Design
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Shock and Vibration
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawiwwwhindawicom
Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Control Scienceand Engineering
Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
SensorsJournal of
Hindawiwwwhindawicom Volume 2018
International Journal of
RotatingMachinery
Hindawiwwwhindawicom Volume 2018
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Navigation and Observation
International Journal of
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
Submit your manuscripts atwwwhindawicom
unpacking system of integrating these three phases in anintegrated framework Second no detection combinationwas due to that there was no attempt to combine variouspacking detection methods and we solved this problem bypresenting a hybrid approach of combining existing and newpacking detection methods ird no real-restoration camefrom that there was little discussion about restoring theactual executable file and we solved this problem by actuallyrestoring an original image of the packed file Fourth nounpacking verification was due to that there was no explicitwork to measure the restoration accuracy of unpacked filesand we solved this problem by proposing a verificationalgorithm of quantitatively evaluating the restorationaccuracy
We constructed a dataset of 2600 PE files and experi-mentally evaluated the proposed system using the datasetFirst we performed all the phases of the all-in-oneunpacking system sequentially and confirmed that its overallworking mechanism worked well Next through the com-parison of the hybrid packing detection approach andexisting detection techniques we showed that the detectionaccuracy of the proposed method was the highest at 984on average with no false positives For this we also de-termined an entropy range of packing Packing-Rangethrough extensive experiments on 19 representative packersFinally in the unpacking and verification phases we con-firmed that unpacking was successful for all files except thefiles packed with Yodarsquos Protector We also showed that therestoration accuracy of unpacked files was nearly 100except for a few packers
As the future research we will focus on unpackingpacked files with Anti-VM Anti-Debug and Anti-Reverseengineering techniques We will also study how to computethe restoration accuracy of the file having reloc sections
Data Availability
All the data files used in the experiments are available athttpsgithubcomchesvectainPackingData
Conflicts of Interest
e authors declare that they have no conflicts of interest
Acknowledgments
is work was partly supported by the Institute of In-formation and Communications Technology Planning andEvaluation (IITP) grant funded by the Korea government(MSIT) (no 2017-0-00158 Development of Cyber reat
Intelligence (CTI) analysis and information sharing tech-nology for national cyber incident response) and the KoreaElectric Power Corporation (no R18XA05)
References
[1] M Morgenstern ldquoAV-TEST Security report 201718rdquo 2019httpswwwav-testorgfileadminpdfsecurity_reportAV-TEST_Security_Report_2017-2018pdf
[2] T Ban R Isawa S Guo D Inoue and K Nakao ldquoEfficientmalware packer identification using support vector machineswith spectrum kernelrdquo in Proceedings of the Eighth Asia JointConference on Information Security pp 69ndash76 Seoul SouthKorea July 2013
[3] B Cheng M Jiang J Fu et al ldquoTowards paving the way forlarge-scale Windows malware analysis generic binaryunpacking with orders-of-magnitude performance boostrdquo inProceedings of the 2018 ACM SIGSAC Conference on Com-puter and Communications Security pp 395ndash411 TorontoCanada October 2018
[4] A Gupta andM S Arya ldquoHashing based encryption and anti-debugger support for packing multiple files into single exe-cutablerdquo International Journal of Advanced Research inComputer Science vol 9 no 1 pp 914ndash920 2018
[5] Y Kawakoya M Iwamura and M Itoh ldquoMemory behavior-based automatic malware unpacking in stealth debuggingenvironmentrdquo in Proceedings of the 5th International Con-ference on Malicious and Unwanted Software pp 39ndash46Lorraine France October 2010
[6] G Liang J Pang Z Shan R Yang and Y Chen ldquoAutomaticbenchmark generation framework for malware detectionrdquoSecurity and Communication Networks vol 2018 Article ID4947695 8 pages 2018
[7] A Pektas and T Acarman ldquoA dynamic malware analyzeragainst virtual machine aware malicious softwarerdquo Securityand Communication Networks vol 7 no 12 pp 2245ndash22572014
[8] I Santos J Nieves and P G Bringas ldquoSemi-supervisedlearning for unknown malware detectionrdquo in InternationalSymposium on Distributed Computing and Artificial In-telligence A Abraham J M Corchado S R Gonzalez andJ F De Paz Santana Eds pp 415ndash422 Springer BerlinGermany 2011
[9] X Ugarte-Pedrero I Santos B Sanz C Laorden andP G Bringas ldquoCountering entropy measure attacks onpacked software detectionrdquo in Proceedings of the 9th In-ternational Conference on Consumer Communications andNetworking pp 164ndash168 Las Vegas NV USA January 2012
[10] H Won S-P Kim S Lee M-J Choi and Y-S MoonldquoSecure principal component analysis in multiple distributednodesrdquo Security and Communication Networks vol 9 no 14pp 2348ndash2358 2016
[11] S Cesare Y Xiang and W Zhou ldquoMalwisendashan effective andefficient classification system for packed and polymorphic
Table 8 Restoration accuracy according to the presence of the reloc section
SectionASPack () Packman
() Mpres () Yodarsquoscrypter ()
PECompact()
YodarsquosProtector
()
exe32pack()
WinUpack()
times times times times times times times times
text 000 100 998 100 000 100 000 100 000 100 000 100 000 100 000 100
data 000 100 881 100 000 100 004 100 000 100 005 100 100 100 000 100
Security and Communication Networks 15
malwarerdquo IEEE Transactions on Computers vol 62 no 6pp 1193ndash1206 2013
[12] C Malin E Casey and J Aquilina Malware Forensics In-vestigating and Analyzing Malicious Code Syngress Bur-lington MA USA 2008
[13] W Yan Z Zheng and A Nirwan ldquoRevealing packed mal-warerdquo IEEE Security and Privacy vol 6 no 5 pp 65ndash692008
[14] M Bat-Erdene T Kim H Park andH Lee ldquoPacker detectionfor multi-layer executables using entropy analysisrdquo Entropyvol 19 no 3 p 125 2017
[15] Y-S Choi I-K Kim J-T Oh and J-C Ryou ldquoPE file headeranalysis-based packed PE file detection technique (PHAD)rdquoin Proceedings of International Symposium on ComputerScience and Its Applications pp 28ndash31 Hobart AustraliaOctober 2008
[16] S Han and S Lee ldquoPacked PE file detection for malwareforensicsrdquo =e KIPS Transactions PartC vol 16C no 5pp 555ndash562 2009 in Korean
[17] R Lyda and J Hamrock ldquoUsing entropy analysis to findencrypted and packed malwarerdquo IEEE Security and Privacyvol 5 no 2 pp 40ndash45 2007
[18] R Perdisci A Lanzi and W Lee ldquoClassification of packedexecutables for accurate computer virus detectionrdquo PatternRecognition Letters vol 29 no 14 pp 1941ndash1946 2008
[19] A Rohit A Singh H Pareek and U Edara ldquoA heuristics-based static analysis approach for detecting packed PE bi-nariesrdquo International Journal of Security and Its Applicationsvol 7 no 5 pp 257ndash268 2013
[20] G Amato ldquoPEframerdquo 2019 httpsgithubcomguelfowebpeframe
[21] G Jeong E Choo J Lee M Bat-Erdene and H Lee ldquoGenericunpacking using entropy analysisrdquo Journal of Advanced In-formation Technology and Convergence vol 7 no 1pp 232ndash238 2009
[22] V DeMarines ldquoObfuscationndashhow to do it and how to crackitrdquo Network Security vol 2008 no 7 pp 4ndash7 2008
[23] H Neil ldquoPEiDrdquo 2019 httpsgithubcomwolfram77webapp-peid
[24] O Yuschuk ldquoOllyDbgrdquo 2019 httpwwwollydbgdedownloadhtml
[25] N Waisman ldquoImmunity Debuggerrdquo 2019 httpswwwimmunityinccomproductsdebuggerindexhtml
[26] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee, “PolyUnpack: automating the hidden-code extraction of unpack-executing malware,” in Proceedings of the 22nd Annual Computer Security Applications Conference, pp. 289–300, Miami Beach, FL, USA, December 2006.
[27] M. Bat-Erdene, Y. Park, H. Li, H. Lee, and M.-S. Choi, “Entropy analysis to classify unknown packing algorithms for malware detection,” International Journal of Information Security, vol. 16, no. 3, pp. 227–248, 2017.
[28] S. Cesare and X. Yang, “Classification of malware using structured control flow,” in Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing, pp. 61–70, Brisbane, Australia, January 2010.
[29] L. Martignoni, M. Christodorescu, and S. Jha, “OmniUnpack: fast, generic, and safe unpacking of malware,” in Proceedings of the 23rd Annual Computer Security Applications Conference, pp. 431–441, Miami Beach, FL, USA, December 2007.
[30] M. G. Kang, P. Poosankam, and H. Yin, “Renovo: a hidden code extractor for packed executables,” in Proceedings of the 5th ACM Workshop on Recurring Malcode, pp. 46–53, Alexandria, VA, USA, November 2007.
[31] S. Mariani, L. Fontana, F. Gritti, and S. D’Alessio, “PinDemonium: a DBI-based generic unpacker for windows executables,” in Proceedings of the Black Hat 2016, Las Vegas, NV, USA, July 2016.
[32] G. Bonfante, J. Fernandez, J.-Y. Marion, B. Rouxel, F. Sabatier, and A. Thierry, “CoDisasm: medium scale concatic disassembly of self-modifying binaries with overlapping instructions,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 745–756, Denver, CO, USA, October 2016.
[33] M. Oberhumer, L. Molnar, and J. Reiser, “UPX,” 2019, https://upx.github.io.
[34] A. Swinnen and A. Mesbahi, “One packer to rule them all: empirical identification, comparison and circumvention of current antivirus detection techniques,” in Proceedings of the Black Hat 2014, Las Vegas, NV, USA, August 2014.
[35] P. Bania, “Generic unpacking of self-modifying, aggressive, packed binary programs,” 2009, https://arxiv.org/abs/0905.4581.
[36] H. Neil, “BobSoft Signature,” 2019, http://woodmann.com/BobSoft/Files/Other/UserDB.zip.
[37] M. Russinovich, “Microsoft Sysinternals Suite,” 2019, https://download.sysinternals.com/files/SysinternalsSuite.zip.
[38] R. Rivest, “The MD5 message-digest algorithm,” RFC 1321, IETF, Fremont, CA, USA, 1992.
[39] S. Tatham, “Putty,” 2019, https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html.
[40] L. Pascoe, “MD5,” 2019, http://www.md5summer.org/about.html.
[41] P. Ferrie, “Anti-unpacker tricks - part one,” 2019, https://www.virusbulletin.com/virusbulletin/2008/12/anti-unpacker-tricks-part-one.
[42] T.-Y. Wang and C.-H. Wu, “Detection of packed executables using support vector machines,” in Proceedings of the 10th International Conference on Machine Learning and Cybernetics, pp. 717–722, Guilin, China, July 2011.
[43] D. Korczynski, “RePEconstruct: reconstructing binaries with self-modifying code and Import address table destruction,” in Proceedings of the 11th International Conference on Malicious and Unwanted Software, pp. 31–38, Fajardo, Puerto Rico, October 2016.
16 Security and Communication Networks