
Web Application Forensics: Taxonomy and Trends


Web Application Forensics

Taxonomy and Trends

term paper

Krassen Deltchev

[email protected]

5. September 2011

Ruhr-University of Bochum

Department of Electrical Engineering and Information Technology

Chair of Network and Data Security

Horst Görtz Institute

First examiner: Prof. Jörg Schwenk

Second Examiner and Supervisor: M.Sc. Dominik Birk


Contents

List of Figures
List of Tables
Abbreviations
Abstract
1. Introduction
   1.1. What is Web Application Forensics?
   1.2. Limitations of this paper
   1.3. Reference works
2. Intruder profiles and Web Attacking Scenarios
   2.1. Intruder profiling
   2.2. Current Web Attacking scenarios
   2.3. New Trends in Web Attacking deployment and preventions
3. Web Application Forensics
   3.1. Examples of Webapp Forensics techniques
   3.2. WebMail Forensics
   3.3. Supportive Forensics
4. Webapp Forensics tools
   4.1. Requirements for Webapp forensics tools
   4.2. Proprietary tools
   4.3. Open Source tools
5. Future work
6. Conclusion
Appendixes
   Appendix A: Application Flow Analysis; WAFO victim environment preparedness
   Appendix B: Proprietary WAFO tools; Open Source WAFO tools; Results of the tool's comparison
List of links
Bibliography


List of Figures

Figure 1: General Digital Forensics Classification, WAFO allocation
Figure 2: Web attacking scenario taxonomic construction
Figure 3: Digital Forensics: General taxonomy
Figure 4: WAFO phases, in Jess Garcia [1]
Figure 5: Extraneous White Space on Request Line, in [3]
Figure 6: Google Dorks example, in [3]
Figure 7: Malicious queries at Google search by spammers, in [3]
Figure 8: Faked Referrer URL by spammers, in [3]
Figure 9: RFI, pulling c99 shell, in [3]
Figure 10: Simple Classic SQLIA, in [3]
Figure 11: NBO evidence in Webapp log, in [3]
Figure 12: HTML representation of spam-mail (e-mail spoofing)
Figure 13: E-mail header snippet of the spam-mail in Figure 12
Figure 14: Spam-assassin sanitized malicious HTML redirection, from example Figure 12
Figure 15: Main PyFlag data flow, as [L26]
Figure 16: Improving the Testing process of Web Application Scanners, Rafal Los [10]
Figure 17: Flow based Threat Analysis, Example, Rafal Los [10]
Figure 18: Forensics Readiness, in Jess Garcia [13]
Figure 19: MS LogParser general flow, as [L16]
Figure 20: LogParser scripting example, as [L17]
Figure 21: Splunk licenses' features
Figure 22: Splunk, Windows Management Instrumentation and MSA (ISA) queries, at WWW
Figure 23: PyFlag: load preset and log file output, at WWW
Figure 24: apache-scalp or Scalp! log file output (XSS query), as [L25]

List of Tables

Table 1: Abbreviations
Table 2: A proposal for general taxonomic approach, considering the complete WAFO description
Table 3: Example of possible Webapp attacking scenario
Table 4: Standard vs. Intelligent Web intruder
Table 5: Web Application Forensics Overview, in [15]
Table 6: A general Taxonomy of the Forensics evidence, in [1]
Table 7: Common Players in Layer 7 Communication, in Jess Garcia [1]
Table 8: Traditional vs. Reactive forensics Approaches, in [13]
Table 9: Functional vs. Security testing, Rafal Los [10]
Table 10: Standards & Specifications of EFBs, Rafal Los [10]
Table 11: Basic EFD Concepts [10]
Table 12: Definition of Execution Flow Action and Action Types, Rafal Los [10]
Table 13: TRR completion on LogParser, Splunk, PyFlag, Scalp!
Table 14: List of links


Abbreviations

Anti-Virus AV

Application-Flow Analysis AFA

Business-to-Business B2B

Cloud-computing CC

Cloud(-computing) Forensics CCFO

Digital Forensics DFO

Digital Image Forensics DIFO

Execution-Flow-Based approach EFB

Incident Response IR

Microsoft MS

Network Forensics NFO

Non-persistent XSS NP-XSS

NULL-Byte-Injection NBI

Operating System(s) OS(es)

Operating System(s) forensics OSFO

Persistent (stored) XSS P-XSS

Proof of Concept PoC

Regular Expression RegEx

Relational Database Management System RDBMS

Remote File Inclusion RFI

SQL Injection Attacks SQLIA

Tool's requirements rules TRR

Web Application Firewall(s) WAF(s)

Web Application Forensics WAFO

Web Application Scanner WAS

Web Attacking Scenario(s) WASC

Web Services Forensics WSFO

Table 1: Abbreviations


Abstract

The topic of Web Application Forensics is challenging. There are not enough references discussing this subject, especially in the scientific communities. The term 'Web Application Forensics' is often misunderstood and conflated with IDS/IPS defensive security approaches.

Another issue is to discern Web Application Forensics, short Webapp Forensics, from Network Forensics and Web Services Forensics, and in general to allocate it within the Digital/Computer Forensics classification.

Nowadays, Web platforms are growing vastly, not to mention the so-called Web 2.0 hype. Furthermore, business Web Applications push the limits of common security knowledge and call for a rapid inventory of the current security best practices and approaches. The questions concerning the automation of defensive and investigative security methods are becoming undeniably important.

In this paper we address the questions concerning taxonomic approaches to Webapp Forensics, discuss trends related to this topic, and debate the matter of automation tools for Webapp forensics.

Keywords: Web Application Security, WebMail Security, Web Application Forensics, WebMail Forensics, Header Inspection, Plan Cache Inspection, Forensic Tools, Forensics Taxonomy, Forensics Trends


1. Introduction

In [1], Jess Garcia gives a definition of the term 'Forensics Readiness':

“Forensics Readiness is the “art” of Maximizing an Environment's Ability to collect Credible Digital Evidence”. We should keep this statement in mind throughout the exposition of the paper. It points out several important aspects. Foremost, forensics relies on the maximal collection of digital evidence. If the observed environment1 is not well prepared for forensic investigation, discovering the root cause of the attack on the system can be complicated, inefficient in time, and may even fail to yield an appropriate remediation of the problem.

Another essential aspect of forensics, according to Jess Garcia, is that the forensic investigation is an art.

It follows that defining fixed best practices for the proper deployment of forensic work is of limited value. An intelligent intruder will always find drawbacks in such best-practice scenarios and try to exploit them to accomplish new attacks, complete them successfully, and remain concealed.

This raises the question: how can we suggest a taxonomy of forensic work if we are aware a priori of the risks such recipes include?

We shall propose several general intruder strategies and a profiling of the modern Web attacker in this paper, taking care not to hurt the universal validity of the statements we discuss. In some cases we shall give examples and paradigms through references, though only for the sake of good illustration of the statements in the current thesis.

Let us describe the matters concerning Webapp Forensics more precisely in the next section.

1.1. What is Web Application Forensics?

Web Application Forensics (WAFO) is the post mortem investigation of a compromised Web Application (Webapp) system. WAFO especially considers attacks on Layer 7 of the ISO/OSI model. In contrast, capturing and filtering internet protocols on the fly is not a concern of Webapp forensics; more precisely, such issues are generally the focus of Network Forensics (NFO). Nevertheless, examining the log files of such automated tools (IDS/IPS/traffic filters/WAF etc.) supports the correct deployment of the Webapp forensic investigation.

As stated above, NFO examines such issues concretely; that is why we should like to discern Webapp Forensics from it, keeping in mind the supportive function which Network forensic tools can supply to WAFO.

Consequently, we should like to allocate WAFO specifically within the Digital Forensics (DFO) structure, because some main topics in DFO do not implicitly refer to Layer 7 of the ISO/OSI Model. These include: memory investigations, Operating Systems Forensics investigations, secure data recovery on the physical storage of OSes etc. Nevertheless, DFO also considers investigations of image manipulations [L1], [L2], which in some cases can be very supportive for the proper deployment of WAFO.

At last, we should categorize WAFO as a sub-class of Cloud Forensics (CCFO) [2].

1 We assume that the reader understands the abstraction of the Webapp as a WAFO environment.


Cloud Forensics is a relatively new term in the security communities. Historically, the evolution of Web Applications led, in phases, to Cloud Computing (CC). Given the complexity of the Web applications, platforms and services presented by the CC, CCFO covers larger investigation areas than WAFO. As an example, WAFO does not explicitly observe fraud on Web Services. Web Services are covered by Web Services Forensics (WSFO), another sub-class of CCFO, which should be categorically discerned from WAFO, please read further.

Let us illustrate the DFO taxonomic structure in the next Figure:

Figure 1: General Digital Forensics Classification, WAFO allocation

Following this short introduction of the different Computer Forensics categories, let us explicitly designate the limitations of the paper. This supports a better understanding of the paper's exposition and explains the absence of examples covering different exotic attacking scenarios.

1.2. Limitations of this paper

This term paper discusses Web Application Forensics, which excludes topics such as on-the-fly packet capturing and packet inspection of sensitive data over (security) internet protocols. Once again, it does not cover attacks, or attacking scenarios, on layers lower than Layer 7 of the ISO/OSI Model. For the interested reader, a very good correlation of Layer 7 attacks and those below, concerning Web Application Security and Forensics, can be found in [3]. In distinction to Web Services Forensics [5] and CCFO [2], the presented paper covers only a small topic, concerning the following varieties of Web Applications exposed to fraud:

• RIA (AJAX, RoR2, Flash, Silverlight et al.),

• static Web Applications,

• dynamic Web Applications and Web Content (.asp(x), .php, .do etc.),

• other Web implementations (like different CMSes), excluding research on fraud concerning Web Services Security or CC implementations; the focus is explicitly on Web Applications.

2 RoR: Ruby on Rails, http://rubyonrails.org/

Due to the limited scope of this term paper, the reader will find only a couple of illustrating examples, which do not pretend to cover the whole variety of illustrative scenarios of Web attacking techniques and Web Application Forensics approaches.

For the interested reader, attacks on Layer 7 are introduced, and some of them discussed in detail, in [4].

Furthermore, we should clarify how the references in this paper are organized, to ensure their proper uniformity, as follows. General knowledge is referenced by footnotes at the appropriate position. Scientifically approved works are indexed at the end of the paper in the Bibliography, as usual. Works that are not scientifically approved, such as video tutorials, live video snapshots of conferences, blogs etc., are indexed in the List of links after the Appendix of this paper.

We apply this strict division of reference sources out of respect for the scientific security communities. In addition, let us introduce some of the interesting related works dedicated to the topic of WAFO.

1.3. Reference works

An extensive approach covering the different aspects of Web Application Forensics is given in the book “Detecting Malice” [3] by Robert Hansen3. The interested reader can find much more than just WAFO discussions in this book, including examples of attacks below Layer 7 correlated to WAFO investigations, and many paradigms derived from real-life WAFO investigations.

The unprepared reader should notice that the topics in the book discussing WAFO tools are limited. The author points out that every WAFO investigation should be considered unique, especially in its tactical accomplishment; therefore favoring top automated tools should be assumed to be inappropriate, please read further.

Another interesting approach is given by the SANS Institute as a Practical Assignment, covering three notable topics: penetration testing of a compromised Linux system, a post mortem WAFO on the observed environment, and a discussion of the legal aspects of the forensic investigation [6]. Although this tutorial, in its Version 1.4, no longer relies on an up-to-date example, it illustrates very important basics concerning WAFO and can still be used as fundamental reading for further research on the WAFO topic.

BSI4, Germany, describes in the section on forensic toolkits of the “Leitfaden IT-Forensik” [7], Version 1.0, September 2010, different forensic tools for automated analysis, many of which implicitly concern WAFO. The toolkits are compared by the following aspects:

• analysis of log data,

3 http://www.sectheory.com/bio.htm
4 https://www.bsi.bund.de/EN/Home/home_node.html


• tests, concerning time consistency,

• tests, concerning syntax consistency,

• tests, concerning semantic consistency,

• log-data reduction,

• log-data correlation, i.e. integrating and combining different log-data sources into a consistent timeline, and combining events into super-events,

• detection of timing correlations (MAC timings) between events.

These approaches can be related to WAFO log file analysis, which designates them as reasonable supportive WAFO investigation methods.
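To make the log-data correlation aspect more tangible, the following minimal sketch (our own illustration, not taken from [7]) merges two hypothetical log sources into one consistently ordered timeline; the file names, the ISO-8601 timestamp format and the source tags are assumptions for the example.

    # Sketch: correlate two hypothetical log sources into one consistent timeline.
    # Assumes both logs start each line with an ISO-8601 timestamp, e.g.
    # "2011-08-30T14:05:12 GET /index.php?id=1 ..."
    from datetime import datetime

    def read_log(path, source_tag):
        events = []
        with open(path) as handle:
            for line in handle:
                stamp, _, message = line.strip().partition(" ")
                events.append((datetime.fromisoformat(stamp), source_tag, message))
        return events

    # Hypothetical input files: a web server log and a WAF log.
    timeline = read_log("webserver.log", "httpd") + read_log("waf.log", "waf")
    timeline.sort(key=lambda event: event[0])          # consistent timeline

    for stamp, source, message in timeline:
        print(stamp.isoformat(), source, message)      # correlated super-view

The same pattern extends to combining related entries into super-events once the sources share one time axis.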

Another tutorial giving a basic overview, which should also be considered fundamental for WAFO research, is “Web Application Forensics: The Uncharted Territory”, presented in [8]. Although the paper was published in 2002, it should not be hastily categorized as obsolete.

Other papers, articles and presentations concerning specific WAFO aspects complete the group of related references for the Web Application Forensics research in this term paper. These are referenced in the appropriate paragraphs of the paper's exposition and are not discussed individually in this section.

Let us describe the structure of the term paper. Chapter 2 gives a taxonomic illustration of the topics designating intruder profiling and modern Web Attacking Scenarios. Chapter 3 deliberates WAFO investigation methods and techniques in more detail and continues the discussion on the significance of a possible WAFO taxonomy. Chapter 4 illustrates tools supportive of WAFO investigations. An important section outlines the requirements for WAFO toolkits, pointing out the reasonable aspects for determining tools either as relevant or as inappropriate for adequate WAFO investigations. Two major groups of tools are designated: proprietary toolkits and Open Source solutions. Chapter 5 represents the final discussion of the paper's thesis and suggestions for future work on behalf of the topics discussed in the former chapters. Chapter 6 deliberates the conclusion on the proposed thesis. The Appendix provides additional information (tables, diagrams, screenshots and code snippets) on specific topics discussed in the exposition part of the paper.

Let us proceed with the description of the Web Attacking Scenarios and (Web) intruder profiles.


2. Intruder profiles and Web Attacking Scenarios

In the introductory part of this thesis it was outlined that the scientifically approved research concerning Web Application Forensics in the security and scientific communities should still be considered insufficient and not well established. That is why an appropriate categorization of the different forensic fields and the correct allocation of WAFO within the Digital Forensics hierarchy were appointed as required in the former chapter, which satisfies one of the objectives of the current paper.

For all that, this classification does not present a complete fundamental basis for further academic research on WAFO. Therefore, we extend the abstract model concerning WAFO by introducing two other fundamentals: the profile of the modern Web intruder, and the methodologies, as abstract schemata, by which current Cyber (Web) attacks are accomplished.

Thus, we should follow the proposed schema for describing the aspects of WAFO completely, see the following Table 2:

1. represent the Digital Forensics hierarchy and
2. allocate the field of interest, concerning WAFO,
3. explain the Security Model WAFO is observing, by:
   • designating the intruder,
   • describing the victim environment (Webapps),
   • specifying the fraudulent methods;
4. demonstrate the WAFO tasks, supporting the security remediation plan

Table 2: A proposal for general taxonomic approach, considering the complete WAFO description

In this way of thoughts, we should stress that the intruders' attacks on existing Web Applications and other Web implementations nowadays are highly sophisticated. Such Web attacks are rapidly adaptive in their variations and alternations, and in some cases precarious to sanitize effectively. Examples of such attacks, like CSRF, Compounded SQLIA and Compounded CSRF, are described in [4]. A good representative of this group is the famous Samy worm, which is still wrongly considered to be a pure XSS attack. Another confusing example is the third wave of XSS attacks, DOM-based XSS (DOMXSS) [20]. The fact that DOMXSS attacks cannot be detected by IDS/IPS or WAF systems if the payload is obfuscated in a URL parameter that the Web Application server does not record in its log file (typically only the primary URL prefix is logged) should be designated as ominous. If the nature of such attacking scenarios is fundamentally mistaken, it is only a matter of time until derivatives of these attacks succeed in their further fraudulent activities on the Web.
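As a minimal illustration of this logging gap, the following sketch (ours; the URL is hypothetical) uses Python's urllib.parse to show that a DOMXSS payload carried in the URL fragment never becomes part of the request line the Web server records:

    # Sketch: a DOMXSS-style payload hidden in the URL fragment never reaches
    # the server, so it cannot appear in the Web server log.
    from urllib.parse import urlsplit

    url = "http://victim.example/page.html#default=<script>alert(1)</script>"
    parts = urlsplit(url)

    request_line = "GET %s HTTP/1.1" % (parts.path + ("?" + parts.query if parts.query else ""))
    print(request_line)    # what the server sees and logs: "GET /page.html HTTP/1.1"
    print(parts.fragment)  # where the payload actually lives, client-side only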

Sanitizing a Web application compromised via CSRF is very difficult. It requires immense reverse-engineering and source-code rectification effort within reasonable bounds of time and efficiency. The more general problem is that Web Applications are per se not stealthy5.

5 Exceptions to this could be Intranet Webapps, which designate another class of Webapps in terms of the paper's definitions, where extensive intruder effort is a pre-requirement for breaking the Intranet security; they are not discussed here as relevant.


Thus, hardening a Webapp is not equivalent to hardening a local host. In other words, known preventive techniques like security-through-obscurity can be applied to secured Intranet Web applications, admin Web interfaces, non-public FTP servers etc., but not to commercial B2B Webapps, on-line banking, social network Web sites, on-line magazines, WebMail applications and others. These last-mentioned applications are meant to be used from all over the world per definition; they exist because of the huge amount of their users and customers per se. That is why securing such Web constructs is more complex and intensive. Of course, there are basic and advanced authentication techniques applied to Web implementations, though these do not make the Webapp stealthy for intruders; they merely apply so-called user restriction to the sensitive parts of the Web implementation.

In this way of thoughts, pointing out extreme cases of Web fraud like child pornography and personal image offending issues is only the tip of the iceberg of examples of Web crime. The problem is that nowadays identity theft and speculation with sensitive personal data should no longer be categorized as exotic examples of existing Cyber crimes6 over the internet on Web platforms. Such crimes occur every day. Social networks, social and health insurance companies strive for a more impressive Web representation. E-commerce platforms for daily monetary transactions are undeniable nowadays. We should no longer consider Web 2.0 a hype; we should keep in mind that the former dynamic e-commerce Web representations have become sophisticated RIA Web platforms. Such Webapps serve the better marketing representation of the business logic of the firms, whose profit nowadays depends on complexity, rapidly changing dynamic adaptation and more user-friendly features for satisfying the Web customer at any time. These aspects explain the huge interest of intruders in compromising Web applications, and furthermore Web Services as well. There is no deterministic way to predict Web Attacking Scenarios, or the amount of damage they cause every day.

In [3], Robert Hansen compares the intensity of Web attacks and the amount of damage they cause to those of computer viruses. Neither of these security topics should lose the attention of the security communities for a long period of time. Moreover, as already stated, their remediation cannot be ascertained in a straightforward manner. As we know, there is no default approach for proper sanitization against computer viruses. The same statement is applicable to Webapp attacking scenarios. Rather, it is a matter of extensive 24/7/365 deployment of proper security hardening techniques and strategies, and their adaptive improvement. Knowing your friends is good, knowing your enemies is crucial. After this conclusive explanation of the motivation of the paper, let us proceed in this way of thoughts with the representation of modern Web fraud in detail.

2.1. Intruder profiling

Two general categories should be designated in this section: the standard intruder profile, and the profile of the intelligent intruder performing serious Cyber crime, in short the intelligent intruder profile. We use the adjective 'intelligent' for the second intruder profile deliberately, respecting the fact that if we, as representatives of the security communities, claim to possess knowledge and know-how concerning the proper deployment of our duties, this kind of intruder possesses it too, and much more.

6 http://www.justice.gov/criminal/cybercrime/


There are also fuzzy definitions of intruders, which designate states in between the profiles mentioned above. In fact, these profiles are very agile in their representation. For example, a 'former' intelligent intruder is better categorized as a latent one, and a motivated standard attacker should not be disrespected: this violator could fulfill the requirements of the intelligent intruder category at any time with sufficient likelihood.

In the category of the standard intruder we count: script kiddies and hacker wannabes, 'fans' of YouTube or other video platforms capturing knowledge and know-how from easy how-to video tutorials, badly configured robots and spiders, and any other kind of poorly educated, insufficiently motivated, even insufficiently skilled everyday violators. Specific for this group of intruders is the lack of personal knowledge and know-how and the utilization of well-known attacking techniques and scenarios established on the Web. Such violators are ignorant of, and disrespect, the noise7 they produce while trying to accomplish the attacks. These features explain the deduction that a standard attacking scenario can, with greater likelihood, be sanitized with standard prevention and hardening techniques (best practices). In cases of successfully deployed attack(s) based on such standard scenarios, the investigation and detection approaches can, with greater likelihood, also be considered standard.

For all that, there are cases which represent attacking scenarios designated as shadow scenarios. It is not important whether these are accomplished successfully or not at the specific time of the attack's deployment; their purpose is to cover the deployment of the real attacking scenario. That is why we should rather consider whether such cases are intelligent intruders' attacks.

The group of intelligent intruders comprises: 'former' ethical hackers; pen testers; security professionals who have changed sides, disrespecting their duties; and intelligently set up automated tools for Web intrusion, such as Web scanners, Web crawlers, robots, spiders etc.

The most notable feature describing these representatives is the possession of their own profound, independent knowledge and know-how, along with patience, accuracy in the accomplishment of the attacking scenario deployment, and the drive to learn and assimilate new know-how.

Interesting examples related to this profile are given in [3]. We should mention some of them. Intelligent hackers are recruited by law firms to achieve a Proof of Concept (PoC) on a targeted Web implementation. If the PoC is positive, this could alter the outcome of the legal case, as the PoC can be used as decisive juristic evidence, in most situations on behalf of the law firm recruiting the hacker. Such intruders' attacks are difficult to detect in time.

Furthermore, there are other cases where the damage of the accomplished attack is the decisive alarm, raised only after havoc has already occurred. As already stated, the sanitization of the compromised Web Application(s) after such successful attacks is in some cases unfeasible and more often requires sophisticated methods. Examples of these are CSRF-compromised Webapps, like the PDP GMail CSRF attack8, see also [4]. Therefore, the proper deployment of Web Application Forensics investigations constitutes a reasonable supportive part of the accurate sanitization of the compromised Webapp.

Let us mention several examples of modern Web Attacking Scenarios in the next section of Chapter 2.

7 We should emphasize here the communication complexity and the amount of false positive attempts by the violator(s) in their strive to complete the intended Web attacking scenario(s). This should not be confused with the utilization of attacking techniques where producing communication noise is the core of the attacking strategy, like different DDoS implementations: Fast Fluxing SQLIA, DDoS via XSS, DDoS via XSS with CSRF etc.

8 http://www.gnucitizen.org/blog/google-gmail-e-mail-hijack-technique/


2.2. Current Web Attacking scenarios

In May 2009, Joe McCray9 concluded in his presentation [9] on 'Advanced SQL Injection' at LayerOne10 that Classic SQLIA should no longer be categorized as a trend, or as conventional.

In [4], Classic SQLIA are discussed as part of the SQLIA taxonomy current up to 2010. Despite this, their categorization by Joe McCray should be respected as reasonable. This controversy is present in many of the current Web attacking vectors. To achieve a complete taxonomic approach pertaining to a concrete Webapp attacking vector, many obsolete representations of the attacking sub-classes have to be illustrated, reflecting the real Web environment. The above-mentioned Classic SQLIA illustrate obsolete, and moreover unfeasible, attacking techniques, considering properly employed modern defensive methods. The main reason for this is that Web platforms are changing vastly, not only in their development aspects, but also in the attacking and security hardening scenarios applied to them. Most likely, an intelligent intruder will not use obsolete techniques, because of the expected presence of Web Application security protection. Detecting the deployment of obsolete attacking scenarios on a modern Web construct could therefore be classified as an investigation of the standard intruder's profile. Nevertheless, this conclusion should not be underestimated, as previously discussed, see shadow scenarios.
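As a brief illustration of why properly employed modern defences render Classic SQLIA unfeasible, the following sketch (our own, using an in-memory SQLite database as a stand-in for the production RDBMS) contrasts naive query concatenation with a parameterized query:

    # Sketch: classic tautology SQLIA works only against naive string building;
    # a parameterized query treats the same input as plain data.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('admin', 'secret')")

    tainted = "' or 1=1 --"

    # Vulnerable: the tautology bypasses the password check.
    vulnerable = "SELECT * FROM users WHERE name='admin' AND password='%s'" % tainted
    print(conn.execute(vulnerable).fetchall())            # returns the admin row

    # Hardened: the same input simply fails to match any password.
    hardened = "SELECT * FROM users WHERE name='admin' AND password=?"
    print(conn.execute(hardened, (tainted,)).fetchall())  # returns []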

Let us give some interesting examples of currently successfully accomplished Web attacks.

In July 2009, a dynamic CSRF attack was accomplished against the Web platform of Newsweek [4], [L4].

The tool called MonkeyFist11, utilized for this first completely automated CSRF attack, is a small Python-based web server configured via XML. The victim site had already been hardened by protecting the generation of its dynamic elements with security tokens12 and strong session IDs. For all that, this new attacking technique achieved positive results, which leaves open questions concerning the impact of the 'Sea Surfing' sleeping giant.

Another recent attack is the SQLIA against the British Navy website [L5] in November 2010, which was meant only as a PoC by a Romanian hacker, showing that Web Application Security can be broken even on such highly hardened Web implementations.

In April 2011, a different mass infection by SQLIA was detected. 28,000 Web sites were compromised; even several Apple iTunes Store index sites were infected. The SQLIA injects a reference to a PHP script which redirects the user to a cross-origin phishing site pretending to deliver on-line Anti-Virus (AV) protection. The attack is known in the security communities as the LizaMoon mass SQLIA13 [L6].
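In the aftermath of such mass injections, a common supportive IR/forensics step is to search the application's database for injected script references. The following sketch is our own generic illustration, not an analysis of the actual LizaMoon payload; the database file, table, columns and pattern are assumptions:

    # Sketch: look for injected <script src=...> references inside text columns,
    # a typical trace left by mass SQL injection campaigns.
    import re
    import sqlite3

    conn = sqlite3.connect("webapp_copy.db")   # work on a forensic copy, never live data
    pattern = re.compile(r"<script\s+src=[\"']?https?://", re.IGNORECASE)

    for (rowid, title, body) in conn.execute("SELECT rowid, title, body FROM articles"):
        for value in (title, body):
            if isinstance(value, str) and pattern.search(value):
                print("suspicious row:", rowid, value[:80])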

The list of such notable Web attacking incidents could be continued, but we do not enumerate it further in this paper. The interested reader should refer to:

• The Web Hacking Incidents Database14

• OWASP Top Ten Project15

9 http://www.linkedin.com/in/joemccray
10 LayerOne, IT security conference, http://layerone.info
11 http://www.neohaxor.org/2009/08/12/monkeyfist-fu-the-intro/
12 The anti-CSRF token was originally suggested by Thomas Schreiber in 2004: www.securenet.de/papers/Session_Riding.pdf
13 http://blogs.mcafee.com/mcafee-labs/lizamoon-the-latest-sql-injection-attack
14 http://projects.webappsec.org/w/page/13246995/Web-Hacking-Incident-Database
15 http://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project


At the end of this chapter, let us deliberate some interesting trends concerning the current Web attacks.

2.3. New Trends in Web Attacking deployment and preventions

Discussing the deployment of Web attacks, we should consider a more realistic approach to categorizing Web attacking vectors. As mentioned above, there are two general profiles of Web intruders. Keeping in mind the differences in the attacks' deployment and their level of sophistication, it is more appropriate to discuss the accomplishment of Web Attacking Scenarios, rather than the deployment of individual Web Application attacks. In such attacking scenarios, which represent the fundamental construct, the Web attacks are denoted as execution techniques in a given attacking setting. This allows us to define single-layer attacks, multi-layer attacks, and special attacking sequences as specific implementations in the realization of the Web Attacking Scenario. Such scenarios can adequately illustrate the intention of the different profiles of Web intruders. In distinction to the intelligent Web intruder, the standard intruder tries to accomplish a simple attacking scenario, reduced to the utilization of a particular Web attacking technique. That Web attacking scenario represents a simple deployment construct: try a well-established attacking procedure(s) and wait for result(s), no matter what.

As mentioned above, the intelligent intruder utilizes more sophisticated scenarios. Some of them may be planned and sequentially accomplished over a long period of time, until the expected result(s) are achieved. There are cases in which the intelligent attacker gains enough feedback from the victim application and thus intentionally reduces the attacking scenario to the deployment of one, or a compact set, of attacking techniques, which makes the scenario resemble the standard intruder's scenario. Nevertheless, important aspects like the utilization of non-standard attacking techniques and less noise in the attacking environment clearly discern the one profile from the other. These conclusions are extended in the chapters concerning the more detailed representation of WAFO.

Let us illustrate the Web Attacking Scenario construction in the next Figure:


Figure 2: Web attacking scenario taxonomic construction


The proposed construct is extended in the next Table 3, which denotes an example of a possible Web attacking scenario:

Example: Attack on a well-known CMS [inject a c99 shell on the CMS, as a paradigm]

Scenario:
• What is the particular goal: PoC, ID theft, destroying personal image etc.
• determine the CMS version,
• determine the technical implementation type: concurrent attacking, or sequential attacking of specific Webapp modules,
• localize the modules to be compromised: Web front-end, RDBMS, WebMail interface, news feeder etc.,
• if the CMS version is obsolete:
   • find published exploits (at best 0days16) and utilize them to gather feedback from the victim environment,
   • keep the scanning noise as low as possible,
• if the version is up-to-date, utilize:
   • blind application scanning techniques with noise reduction and wait for positive feedback,
   • analyze the results and proceed with further, more specific attacking techniques,
• if successful, utilize a refinement of the attack and, if of interest, wait for the CMS admins' reaction: this gives feedback on sanitization response time, efforts, utilized hardening techniques etc.,
• if not successful:
   • audit the gathered feedback,
   • wait for newly published 0day exploits,
   • develop a 0day(s) independently,
• utilize a scenario sequence execution loop till achieving the goal, with respect to:
   • (communication) attacking noise,
   • and... try to stay concealed.

Technique(s) (these should be ordered, or reordered, according to the attacking scenario):
• XSS: NP-XSS17, P-XSS
• SQLIA: error response, timing SQLIA, ...
• CSRF, CSFU, particular 0day(s)
• ... common well-established ones, like sniffing for open admin debugging console access on port 1099

Procedures (these should be ordered, or reordered, as appropriate):
• NP-XSS (see the sketch below):
   • detect dynamic modules on the Webapp,
   • find variables to be compromised,
   • craft the malicious GET request and taint the input value of the variable to be exploited,
   • gather feedback,
   • repeat the procedure till the expected results are achieved,
   • spread the malicious link to as many 'Confused Deputies' [4] as possible.
• Error response SQLIA: Step 1, Step 2, ..., Step n;
• ...

Table 3: Example of possible Webapp attacking scenario

16 http://netsecurity.about.com/od/newsandeditorial1/a/aazeroday.htm
17 NP-XSS denotes non-persistent XSS; P-XSS abbreviates persistent XSS
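To make the NP-XSS procedure row of Table 3 more tangible, the following sketch reproduces the 'craft the malicious GET request and taint the input value' step against a hypothetical test installation; the URL, parameter name and marker payload are assumptions, and such probes belong only in a lab or an authorized penetration test:

    # Sketch: craft a tainted GET request for one reflected (NP-XSS) probe and
    # check whether the marker is echoed back unencoded.
    from urllib.parse import quote
    from urllib.request import urlopen

    base = "http://testlab.example/search.php"   # hypothetical test installation
    parameter = "q"                              # variable to be tainted
    marker = "<xsstest-4711>"                    # harmless, easy-to-grep marker

    url = "%s?%s=%s" % (base, parameter, quote(marker))
    response = urlopen(url).read().decode("utf-8", errors="replace")

    if marker in response:
        print("parameter reflects input unencoded - candidate for NP-XSS")
    else:
        print("no unencoded reflection observed")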


How this relates to the proposed profiles of modern Web intruders is illustrated in Table 4:

Profile: Standard Intruder vs. Intelligent Intruder

• Attacking Scenario execution:
   • Standard Intruder: static; remains on the level of published and well-established 'Web attacks'.
   • Intelligent Intruder: highly dynamically adaptive18.
• Techniques:
   • Standard Intruder: static (as a comment: ... better watch it on YouTube19, see [4]).
   • Intelligent Intruder: could remain static, but preferably the Cyber criminal adapts them according to the successful completion of the Attacking Scenario.
• Procedures:
   • Standard Intruder: static ("... just copy and paste"), 0day with less likelihood.
   • Intelligent Intruder: could be static, but preferably the intruder seeks a 0day(s).

Table 4: Standard vs. Intelligent Web intruder

Another important aspect, respecting the prevention and sanitization of successfully deployed Web Application Attacking Scenarios, is illustrated by Rafal Los20 in his presentation at OWASP AppSecDC in October 2010 [10]. The main topic of his research concerns the Execution-Flow-Based approach as a supportive technique for Web Application security (pen-)testing. The utilization of Web Application Scanners (WAS) is impressive in supporting the pen-testing job of the security professional/ethical hacker and, not to forget, of the intelligent intruder [11], [4]. Indeed, WAS can effectively map the attacking surface of the Webapp intended to be compromised. Still, open questions remain, like: do WAS provide full Webapp function- and data-flow coverage, which would yield greater feedback for a complete, detailed security audit of the Web construct? Most pen-testers/ethical hackers do not know exactly which functions of the Webapp should be tested. If they do not know the functional structure and the data flow of the Web Application exactly, how can they ensure appropriate and complete functional coverage during the pen-testing of the Webapp?

The job of the pen-tester is to reveal exploits and drawbacks in the realization of a Web Application before the intelligent intruder does. Consequently, the next question appears: what are the objective parameters that designate the pen-testing job as completed and well done?

As Rafal Los states, the pen-testing of Webapps utilizing WAS should nowadays still be regarded as “point'n'scan web application security”. The security researcher suggests in his presentation that a more reasonable Webapp hardening approach is the combination of application function-/data-flow analysis with the consequent security scanning of the observed Web implementation. A valuable comparison between Rafal Los' indicated approach and the common security testing of Webapp(s), outlining the drawbacks of the latter, is given in Table 9, Appendix A.

18 Respecting the current level of sanitization know-how, the produced attacking noise, the reactions of the security professionals sanitizing the particular Webapp, and the specific goal for compromising the victim Webapp.

19 The author of the paper does not intend to be offensive to YouTube; nevertheless, the facts are: this on-line video platform is well established and popular, and there are tons of videos hosted on it concerning Classic SQLIA derivatives, XSS derivatives etc., which can easily be found and utilized by script kiddies, hacker wannabes ...

20 http://preachsecurity.blogspot.com/


Let us summarize these drawbacks as follows. The current Webapp pen-testing approaches via scanning tools do not deliver adequate functional coverage of modern, dynamic and highly sophisticated Web Applications. Furthermore, the business logic of the Webapp(s) is often underestimated as a requirement for proper pen-testing. Complete coverage of the functional mapping of the Web Application can still not be assured. If the application execution flow is not explicitly known, the questions regarding the completeness and validity of the results for the tested data remain open.

Therefore, Rafal Los suggests the utilization of Application-Flow Analysis (AFA) in the preparation phase, prior to the deployment of the specific Web Application scanning. This combination of the two approaches should deliver better results than blind point'n'scan examinations. An explanation of this approach is illustrated in Figures 16, 17 and Tables 10, 11, 12, given in Appendix A. For more information, please refer to [10], or consider studying the snapshot of the live presentation [L7].
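A minimal sketch of the underlying idea follows, under the assumption of a hypothetical, hand-maintained function-flow map of the Webapp; it is not Rafal Los' tooling, but simply contrasts the endpoints a scanner actually exercised with the endpoints the application-flow analysis says exist:

    # Sketch: combine application-flow analysis (AFA) with scan results to judge
    # how much of the Webapp's functional surface the scanner really covered.

    # Hypothetical AFA result: functional map derived from design docs / crawling.
    flow_map = {
        "/login.php":        ["POST user, password"],
        "/search.php":       ["GET q"],
        "/account/edit.php": ["POST email, address"],
        "/admin/export.do":  ["GET format"],
    }

    # Hypothetical scanner report: endpoints the point'n'scan run actually hit.
    scanned = {"/login.php", "/search.php"}

    uncovered = sorted(set(flow_map) - scanned)
    coverage = 100.0 * len(scanned & set(flow_map)) / len(flow_map)

    print("functional coverage: %.0f%%" % coverage)
    print("never exercised by the scanner:", uncovered)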

We should designate these statements as highly applicable to the better utilization of WAFO as well. The lack of complete and precise knowledge of the functional structure and data flow of the forensically observed Webapp will definitely detain the proper and accurate implementation of WAFO. We should keep these conclusions in mind and extend them in the following chapters of the paper.

Let us proceed with a more detailed representation of Web Application Forensics.


3. Web Application Forensics

The main task of this chapter is to proceed further with the taxonomic description of WAFO by describing the victim environment, i.e. to designate in detail the Web application in its production environment. This is specifically utilized to: explain how Webapp forensics is applied to this environment; determine the main aspects concerning WAFO; establish these statements via particular examples; and outline collaborative techniques which extend the proper WAFO investigation. See again Table 2.

We proposed in the former chapters that utilizing WAFO on the basis of best practices alone should not be considered reasonable. Presuming this, we should further emphasize explicitly that trial-and-error approaches and conclusions relying on personal experience and high-level skills cannot be approved as sufficient requirements for proper WAFO deployment.

On the one hand we discover a high abundance of information concerning the previously discussed complexity aspects of RIA Webapps; on the other, the impulse for applying appropriate WAFO to these highly sophisticated applications is immense.

Once again, this confirms the need for a proper taxonomy: not best practices presenting a recipe-like shaping of the Web Application Forensics investigation, but categorizations approved to be universally valid and compact in their representation. Let us conclude the illustration of the Webapp forensics categorization and extend the taxonomic aspects described heretofore.

Respecting the post mortem strategies, after an intruder's attack has been successfully accomplished and damage is present, we specify two general approaches for Webapp sanitization: Incident Response (IR) and Web Application Forensics. In a word, the differences between them can be outlined as follows. The remediation scenario applied to the compromised application, focused on regaining the implementation's complete functionality, is the main concern of Incident Response. In distinction to this, the forensic investigation focuses on gathering the maximum collection of evidence, which is relevant for the IR utilization and can be submitted to a court of jurisdiction, if required.

Let us demonstrate the complete overview of the Digital Forensics structure and point out the dependencies between IR and Computer Forensics, as well as the dependencies between WAFO and the other forensics fields. This is illustrated in Figure 3:

Figure 3: Digital Forensics: General taxonomy


For the interested reader, please refer to [12], where IR and forensics approaches are compared in detail. A more general representation of the topics of IR and forensics can be found in [1], [13], [14].

In this way of thoughts, we derive and specify the following fundamental questions (*) concerning WAFO:

1. how can we describe an environment as ready for forensic investigations,

2. what evidence should we look for,

3. where is this evidence located, and

4. how can we extract the payload from the raw forensic evidence data, concerning its proper application in the further steps of IR?

Let us designate the general procedure in the implementation of WAFO, shown in Figure 4:

Figure 4: WAFO phases, in Jess Garcia [1]


This illustrates, with respect to universal validity, the following steps in the WAFO deployment:

• Seizure: the problem is designated,

• Preliminary Analysis: preparation for the specific WAFO investigation,

• Investigation/Analysis loop: analyzing the collected evidence and proceeding in this manner until the collection of evidence is maximal and complete.

In this way of thoughts, we should underscore the standard tasks WAFO utilizes, as in [15]:

1. Understand the “normal” flow of the application

2. Capture application and server configuration files

3. Review log files:
   • Web Server
   • Application Server
   • Database Server
   • Application

4. Identify potential anomalies:
   • Malicious input from client
   • Breaks in normal web access trends
   • Unusual referrers
   • Mid-session changes to cookie values

5. Determine a remediation plan

Table 5: Web Application Forensics Overview, in [15]
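A minimal sketch of tasks 3 and 4 follows, assuming log records already parsed into Python dictionaries (a real investigation would first parse the Web, application and database server logs); the field names, the example records and the referrer whitelist are assumptions:

    # Sketch: flag two anomaly classes from Table 5 in pre-parsed log entries:
    # unusual referrers and mid-session changes to cookie values.

    entries = [  # hypothetical, already-parsed log records
        {"session": "abc1", "referer": "http://shop.example/cart", "cookie": "role=user"},
        {"session": "abc1", "referer": "http://evil.example/x",    "cookie": "role=admin"},
    ]

    trusted_referrers = ("http://shop.example/",)   # assumption: the site's own pages
    last_cookie = {}

    for entry in entries:
        if entry["referer"] and not entry["referer"].startswith(trusted_referrers):
            print("unusual referrer:", entry["referer"])
        previous = last_cookie.get(entry["session"])
        if previous is not None and previous != entry["cookie"]:
            print("mid-session cookie change in session", entry["session"])
        last_cookie[entry["session"]] = entry["cookie"]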

Let us categorize the evidence, as an answer to the second fundamental question, see (2,*), in Table 6:


Digital Forensics evidence:

• Human Testimony
• Environmental
• Network traffic
• Network Devices
• Host: Operating Systems, Databases, Applications
• Peripherals
• External Storage
• Mobile Devices
• ... ANYTHING!

Table 6: A general Taxonomy of the Forensics evidence, in [1]

To specify the sources of the different forensic evidence, see (3,*), we should clarify the 'Players', as Jess Garcia does in [1], contributing to the Layer 7 communication, as follows in Table 7:

Type of 'Players' ... and their implementation in the Web traffic:

• Common: Network Traffic, Operating Systems
• Client Side: (Web) Browsers
• Server Side: Web Servers, Application Servers, Database Servers

Table 7: Common Players in Layer 7 Communication, in Jess Garcia [1]

A reasonable WAFO should present an inspection/analysis of all the evidence these 'Players' produce, which consists of: inspecting the network traffic logs (including logs of supportive applications such as NIDS, IDS, IPS), analysis of the host OS logs (incl. HIPS, HIDS, event logs etc.), header and cookie inspection of the users' browsers, inspection of the server logs belonging to the Web Application architecture, cache inspection etc. As proposed in the former Chapter 2, this is not a simple task, especially when the Webapp is highly process-driven (e.g. AJAX, Silverlight, Flash etc.). It requires additional application-flow analysis, which presupposes explicit knowledge of the functional and data-flow map of the Webapp. The human factor should not be underestimated in this regard. Finally, there is also the important matter of the legal aspects related to the deployment of the WAFO investigation, which the security professional must be aware of and must maintain during the Web Application Forensics process. We do not discuss this matter in detail; the interested reader will find more information concerning this topic in [16] and also, as already proposed, in [7]. With respect to the fourth fundamental question, see (4,*), focusing on the evidence payload extraction, we discuss this in more detail in Section 3.1 of this chapter.

To conclude this discussion, let us address the first fundamental question, pointing out the Forensics Readiness concerns, see (1,*).


An environment which is not prepared for forensic investigation in an appropriate manner, i.e. where:

• application logging is not present or not adequately adjusted,

• no supportive forensic tools are applied to the WAFO environment (IDS/IPS etc.),

• users are not well trained for forensics collaboration;

could detain the Web Application Forensics investigation to the point that the evidence collection is considerably incomplete and WAFO cannot be applied to the environment at all [1]. That is why the matter of Forensics Readiness should be approved as fundamental in the taxonomy of WAFO, concerning the Preliminary Analysis phase of the Web Application Forensics deployment.
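By contrast, a minimal sketch of what 'adequately adjusted' application logging could look like for a hypothetical Python/WSGI Webapp follows; the middleware, log file name and selected fields are our assumptions, the point being simply to record, per request, the fields a later WAFO investigation will ask for:

    # Sketch: WSGI middleware that records per-request fields useful for WAFO
    # (method, full query string, referrer, user agent, client address).
    import logging

    logging.basicConfig(filename="webapp_forensics.log", level=logging.INFO,
                        format="%(asctime)s %(message)s")

    def forensic_logging(app):
        def wrapped(environ, start_response):
            logging.info("%s %s?%s ref=%r ua=%r client=%s",
                         environ.get("REQUEST_METHOD"),
                         environ.get("PATH_INFO"),
                         environ.get("QUERY_STRING", ""),
                         environ.get("HTTP_REFERER", ""),
                         environ.get("HTTP_USER_AGENT", ""),
                         environ.get("REMOTE_ADDR"))
            return app(environ, start_response)
        return wrapped

    # usage (hypothetical): application = forensic_logging(application)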

An illustrative example of Forensics Readiness can be found in [13], referenced in Appendix A, Figure 18. Having specified the general taxonomy respecting the WAFO victim environment, let us proceed with further examples designating the deployment of different Web Application Forensics techniques. On the one hand, they illustrate the paper's exposition in a more vivid manner; on the other, they address the reasonable question of how WAFO payload data is gained from evidence in practice.

3.1. Examples of Webapp Forensics techniques

In this Section we describe different cases of WAFO deployment, concerning Client Side and Server Side forensic analysis, on given real-life examples, organized as follows: main topic, possible attacks, illustration of the WAFO technique.

Extraneous White Space on the Request Line

This example is discussed in [3], which provides evidence of anomalies in HTTP requests stored in the Webapp server log. Extra whitespace between the requested URL and the protocol should be considered suspicious. The next Figure illustrates a poorly constructed robot, which obviously intends to accomplish a remote file inclusion:

Figure 5: Extraneous White Space on Request Line, in [3]
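A minimal sketch, assuming a common/combined-format Apache log in which the request line is quoted (the file name access.log is hypothetical), which flags request lines containing more than the single expected space between the URL and the protocol, could look as follows:

import re

# Matches the quoted request line of a common/combined log entry, e.g. "GET /index.php HTTP/1.1"
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<rest>[^"]+)"')

def suspicious_whitespace(log_line):
    m = REQUEST_RE.search(log_line)
    if not m:
        return False
    rest = m.group("rest")
    # A benign request line holds exactly one space: between the URL and the protocol token.
    return rest.count(" ") > 1 or "  " in rest

with open("access.log") as fh:  # hypothetical log file
    for line in fh:
        if suspicious_whitespace(line):
            print(line.rstrip())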

Google Dorks

Exploiting the Google search capabilities may be illustrated with the next search query [3]:


intitle:”Index of” master.passwd

The produced evidence appears in the server logs as follows:

Figure 6: Google Dorks example, in [3]

The author of the book [3] states that such requests are still rather untargeted, in the sense that the target is not explicitly specified in the search query and the resulting requests are chaotic. Nevertheless, they should not be underestimated. In this respect, the next example follows, produced by spammers utilizing the Google search engine for the same purpose:

Figure 7: Malicious queries at Google search by spammers, in [3]

Faking a Referring URL

A great21 job of faking Referrer URL22 credentials is done by spammers. In the next example the faked part of the URL is contained in the anchor identifier, which is unique for accessing different parts of the displayed Web page content. Such GET requests should not be accepted as log file entries produced by valid clicks on the Web page, because the Web server delivers the whole Web page and does not deal explicitly with its fragments; thus such a log entry should be determined as malicious and, once again to be mentioned, not produced by regular Web surfing activity:

Figure 8: faked Referrer URL by spammers, in [3]
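Since browsers strip the fragment ('#…') from a URL before sending the Referer header, a logged referrer that still contains an anchor is a strong hint of a fabricated entry. A minimal sketch, assuming the Apache combined log format and a hypothetical access.log, could look as follows:

import re

# Apache Combined Log Format: ... "request" status bytes "referer" "user-agent"
COMBINED_RE = re.compile(r'"(?P<request>[^"]*)" \d{3} \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"')

def forged_referer(log_line):
    m = COMBINED_RE.search(log_line)
    if not m:
        return False
    # Browsers never transmit the fragment part of a URL, so a '#' inside the
    # logged referrer cannot stem from a regular surfing click.
    return "#" in m.group("referer")

with open("access.log") as fh:  # hypothetical log file
    hits = [line.rstrip() for line in fh if forged_referer(line)]
print(len(hits), "entries with an anchor inside the Referer value")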

Remote File Inclusion

A good example of Common Request URL Attacks can be illustrated by the next Remote File Inclusion (RFI)23 attempt stored in the Web Server log:

Figure 9: RFI, pulling c99 shell, in [3]

The attempt to pull the well-known c99 shell onto the running machine by means of a GET request is obvious. The c99 shell is classified as a malicious PHP backdoor. There is a great likelihood that Web intruders try to inject and execute such code on Open Source PHP Webapps, like different PHP-based CMSes or PHP forums. In most cases RFIs are deployed to extend the pool of compromised machines and to support the operation of botnets.

21 'great job' in the sense of discussing the algorithmic approach as security professionals, and by no means as favouring the malicious intentions of the cyber criminal

22 RFC 1738

23 http://projects.webappsec.org/w/page/13246955/Remote-File-Inclusion


Another reason for RFI is the attempt to execute code on the compromised machine and to gain access to sensitive data on it.

A simple Classic SQLIA

The following general example illustrates the utilization of a SQLIA [4] against a PHP Webapp by means of a malicious GET request:

Figure 10: Simple Classic SQLIA, in [3]

The intruder tries to compromise the 'admin' account of the Webapp, utilizing a Tautologies Classic SQLIA: ' password= ' or 1=1 -- '. The apostrophe, the white spaces and the equals sign are ASCII characters which are substituted in the GET request by their URL-encoded representations %27, %20 and %3D.

NULL-Byte-Injection

A NULL-Byte-Injection (NBI)24 can also be accomplished by means of a GET request, as:

Figure 11: NBI evidence in Webapp log, in [3]

In the same manner as in the former example, the NUL ASCII character is URL-encoded here as %00. The attack tries to compromise the Perl login.cgi script, utilizing the NBI to open the sensitive .cgi file.
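Both of the preceding examples can be traced in the logs by URL-decoding the request before matching it. A minimal sketch, assuming hypothetical request strings of the shape shown above, could look as follows:

import re
from urllib.parse import unquote

# Tautology such as ' or 1=1 -- , visible only after decoding %27, %20 and %3D
TAUTOLOGY_RE = re.compile(r"'\s*or\s+1\s*=\s*1", re.IGNORECASE)

def classify_request(raw_query):
    decoded = unquote(raw_query)  # %27 -> ', %20 -> space, %3D -> =, %00 -> NUL
    findings = []
    if TAUTOLOGY_RE.search(decoded):
        findings.append("classic SQLIA tautology")
    if "\x00" in decoded:
        findings.append("null-byte injection")
    return findings

# Hypothetical request strings shaped like the log entries discussed above:
samples = [
    "/login.php?user=admin&password=%27%20or%201%3D1%20--",
    "/cgi-bin/login.cgi?file=secret.cgi%00.html",
]
for sample in samples:
    print(sample, "->", classify_request(sample))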

The provided examples illustrate different header inspection cases as part of Server Side Forensics.

This list can be extended by further paradigms related to investigation techniques for the user's client Browser, such as Browser Session-Restore Forensics [17] and cookie inspection. However, we do not consider further illustrations of WAFO techniques in this Section, with respect to the marginal boundaries of the term paper. The interested reader should refer to [3] and [15] for more information. Let us proceed with an example concerning WebMail Forensics.

3.2. WebMail Forensics

Web based Mail (WebMail) represents a separate construct within a Web Application. Furthermore, many firms deploy Web based mail services, like Yahoo, Amazon etc. Moreover, WebMail denotes another data input source of a Webapp; therefore the strive for compromising Web based Mail implementations still matters. The next Figure 12 illustrates a faked (spam) e-mail:

Figure 12: HTML representation of spam-mail (e-mail spoofing)

24 http://projects.webappsec.org/w/page/13246949/Null-Byte-Injection


This designates the last case study in the examples exposition. The spam mail should be considered representative of one of the most utilized attacking techniques concerning WebMail: e-mail spoofing. Accordingly, we illustrate a fragment of the mail header, see Figure 13:

Figure 13: e-mail header snippet of the spam-mail in Figure 12


Furthermore, a different supportive attacking technique is e-mail sniffing, which is not discussed in this paper; the reader concerned may refer to [18], [19]. The author of the paper received the illustrated spam mail in January 201125. Let us demonstrate a WebMail header inspection on the given example, as already shown in Figure 13, which explains the e-mail spoofing attempt. On the one hand, inspecting the Received header, the domain appears to be valid and to belong to facebook.com26; on the other hand, the Return-Path header as well as the X-Envelope-Sender header reveal a totally different sender. The domain specified there appears to belong to a home building company in the US. Moreover, there is another, very similar domain: 'cedarhomes.com.au'. Inspecting next the Sender header, the sender name appears to be a common name in Australia27. The correlation of the evidence is illustrative; more importantly, the e-mail spoofing attempt is identified.
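Such a header correlation can also be sketched programmatically with Python's standard email module; the addresses below are placeholders modelled on the spoofed mail, not the original ones:

from email import message_from_string
from email.utils import parseaddr

def sender_domains(raw_headers):
    msg = message_from_string(raw_headers)
    domains = {}
    for field in ("From", "Sender", "Return-Path", "X-Envelope-Sender"):
        value = msg.get(field)
        if value:
            address = parseaddr(value)[1]
            domains[field] = address.rsplit("@", 1)[-1].lower()
    return domains

# Placeholder headers modelled on Figure 13 (not the original addresses):
raw = """From: Facebook <notification@facebookmail.example>
Return-Path: <sales@cedarhomes.example>
X-Envelope-Sender: sales@cedarhomes.example

"""
domains = sender_domains(raw)
if len(set(domains.values())) > 1:
    print("Domain mismatch across sender headers (possible e-mail spoofing):", domains)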

A different crucial matter also concerns the discussed spam mail. A more detailed investigation of the HTML content of the spam e-mail, provoked by the suspicious appearance of the hyperlink 'here' in Figure 12 (second row from the bottom of the HTML mask: '…, please click here to unsubscribe.'), reveals the following dangerous HTML tag content, see the next Figure:

Figure 14: Spam-assassin sanitized malicious HTML redirection, from example Figure 12

It appears that the spam mail is intelligently devised, as the intruder is not actually interested only in spamming e-mail accounts. With greater likelihood, a receiver who does not use social platforms, or simply dislikes receiving such e-mails, will click on the unsubscribe link, which leads him to a malicious site. Modern versions of the Mozilla Firefox Browser can detect the compromised and malicious domain 'promelectroncert.kiev.ua' and warn the Browser user right on time, as appropriate. This interesting example illustrates the argumentation explaining why WebMail Forensics matters.

Thus, we conclude this Section and proceed to the last part of this Chapter 3, concerning collaborative approaches from the other Forensics investigation fields which support WAFO.

3.3. Supportive Forensics

In this Section we briefly discuss the supporting part of Network, Digital Image and (OS-)Database Forensics, which extend the evidence collection for the WAFO investigation. The presence of log data derived from IDS/IPS prevention systems supports a more precise detection of the intruders' activities on the Webapp and of the IP provenance. The amount of noise over the network that the intruder produces is, as described formerly, sufficient to determine the violator's profile properly. In some cases, Forensics investigations of digital images uploaded to a compromised Web Application can lead to the successful detection of the intruders' origins.

25 At this point the author of the paper would like to express his gratitude to the Rechenzentrum at Ruhr-University of Bochum for the successful sanitization of the spam mail, utilizing SpamAssassin right on time, http://www.rz.ruhr-uni-bochum.de/ , http://spamassassin.apache.org/

26 http://www.mtgsy.net/dns/utilities.php

27 http://search.ancestry.com.au


This denotes once again the reasonable suggestion of an extensive correlation of the different payload as forensic evidence, which reduces false positives in the results and, consequently, achieves a more precise detection of the attacks.

A very interesting example is pointed out in [3], page 285, concerning the Sharm el Sheikh case study.

Finally, we should also mention the notable case in which WAFO is hindered by a lack of sufficient Database log data. Roots of such issues can be: the proper utilization of concealing techniques which a Web intruder applies to cover the attack's traces, malfunctions in the Database engine, a lack of proper WAFO Readiness (the logging capabilities of the RDBMS are not adequately configured) etc. In such cases the successful WAFO examination of a compromised RDBMS serving as Back-End to a Webapp is constitutively doubtful. Nevertheless, if the RDBMS Application Server has not been restarted since the time prior to the execution of the Attacking Scenario, there is a reasonable chance to extract important forensic evidence from the RDBMS plan cache. This essential approach is discussed in detail in [16].
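Assuming the Back-End is a Microsoft SQL Server (the platform for which the plan-cache technique is commonly described), a minimal sketch of such an extraction via pyodbc could look as follows; the connection string, credentials and the table of interest are assumptions:

import pyodbc  # assumes an ODBC driver for SQL Server is installed

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=dbhost;DATABASE=webappdb;"
    "UID=forensics;PWD=secret"  # hypothetical connection data
)
sql = """
SELECT qs.creation_time, qs.last_execution_time, qs.execution_count, st.text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE '%users%'  -- hypothetical table of interest
ORDER BY qs.last_execution_time DESC
"""
for row in conn.cursor().execute(sql):
    print(row.creation_time, row.execution_count, row.text[:120])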

The techniques for the deployment of WAFO discussed in this Chapter should be considered manual techniques. If the observed environment is compact and the amount of relevant evidence can be examined by a human in acceptable time and with acceptable effort, expanding the collection of such forensic techniques is undeniably fundamental and relevant.

For all that, there are many cases concerning modern Webapps in which the observation of the log files exceeds human abilities, e.g. when the logs produced by Web Scanners amount to a couple of Gigabytes [L8].

Another example is a WAFO investigation that has to be accomplished rapidly.

In such cases the questions concerning the utilization of automated tools, enhancing the deployment of Webapp forensics, become undoubtedly significant.

Let us introduce such tools, respecting WAFO automation techniques, in the next Chapter 4.


4. Webapp Forensics tools

In [13], Jess Garcia proposes a categorization of the Forensics approaches, separating them into two classes: Traditional forensics methods and Reactive forensics methods. A good illustration of the main parameters designating the two classes is given in the next Table, derived from [13]:

Traditional Forensics Approaches:

• Slow

• Manual

• More accurate (if done properly)

• More forensically sound

• Older evidence

Reactive Forensics Approaches:

• Faster

• Manual/Automated

• Risk of false positives/negatives

• Less forensically sound (?)

• Fresher evidence

Table 8: Traditional vs. Reactive forensics Approaches, in [13]

According to the examples in Chapter 3, we should clarify that their detection can be established only by a well trained security professional within an acceptable amount of time. Manually deployed WAFO investigations can be considered very precise, with little tolerance for errors, though only if applied appropriately. As mentioned above, the complexity of current Web Attacking Scenarios makes the investigation process unacceptable with respect to the time aspect. Business Webapps do not tolerate down-time, which is nevertheless required so that the Webapp image can be processed for a reasonable WAFO. This designates the dualistic nature of the Web Application Forensics investigation: slow and precise versus faster and error prone.

On the one hand WAFO has to be deployed individually for every single case of a compromised Webapp; on the other hand the utilization of new techniques, such as the employment of automated tools in the WAFO investigation, gains without a doubt new ('fresher') forensic evidence. This is very important concerning the maximal Forensics evidence collection, as already proposed. In this line of thought, the utilization of new automated techniques in WAFO is only acceptable after proper training prior to their implementation in a production environment. It is crucial to know the particular features of the automated tool to be utilized, the reactions of the Webapp environment once the tool is applied to it, the level of transparency between the raw log file data and the tool's feedback as evidence payload, etc. Let us illustrate some of the fundamental requirement parameters that qualify WAFO automated tools as appropriate for their enforcement in the Forensics investigation process.

4.1. Requirements for Webapp forensics tools

An essential categorization of the requirements for WAFO automated tools is given by Robert Hansen in [L9]. We designate them as tool's requirements rules (TRR), as follows:

1. an automated tool candidate for WAFO should be able to parse log files in different formats

2. it should be able to take two independent and differently formatted logs and combine them

29

Page 29: Web Application Forensics: Taxonomy and Trends

4.Webapp Forensics tools

3. the WAFO tool must be able to normalize by time (a small combining and normalization sketch follows this list)

4. it should be able to handle big log files in the range of GiB

5. it should allow utilization of regular expressions and binary logic on any observed parameter in the log file

6. the tool should be able to narrow down to a subset of logical culprits

7. the automated tool should allow implementation of white-lists

8. it should allow a probable culprits' list construction, on which basis the security investigator should be able to pivot against

9. it should be also able to maintain a list of suspicious requests, which should indicate a potential compromise

10. the WAFO tool should utilize decoding of URL data, so that it can be searched more easily in a readable format
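As a small illustration of TRR 1-3, the following sketch combines a hypothetical Apache combined log and an IIS W3C log into one list normalized by time; time zone offsets are ignored for brevity:

from datetime import datetime

def parse_apache(line):
    # Combined-format timestamp, e.g. [10/Jan/2011:13:55:36 +0100]; the offset is ignored here
    stamp = line.split("[", 1)[1].split("]", 1)[0].split()[0]
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S"), "apache", line.rstrip()

def parse_iis_w3c(line):
    # W3C extended format starts with: 2011-01-10 13:55:36 ...
    date, time, _rest = line.split(" ", 2)
    return datetime.strptime(date + " " + time, "%Y-%m-%d %H:%M:%S"), "iis", line.rstrip()

def merged(apache_path, iis_path):
    events = []
    with open(apache_path) as apache_log:
        events += [parse_apache(l) for l in apache_log if l.strip()]
    with open(iis_path) as iis_log:
        events += [parse_iis_w3c(l) for l in iis_log if l.strip() and not l.startswith("#")]
    return sorted(events)  # normalization by the parsed timestamp (TRR 3)

for stamp, source, raw in merged("access.log", "ex110110.log"):  # hypothetical file names
    print(stamp.isoformat(), source, raw[:80])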

As we will see in the further Sections of this Chapter, full compliance with the requirements enumerated heretofore is still unfeasible.

Let us present a short explanation of them, which defines them as an appropriate constitutive basis.

Whether or not a specific tool implements all of these requirements, they support a more appropriate categorization of its abilities and of its areas of use. As current Webapps require, with reasonable likelihood, more than one different Web Server (for example), parsing the different log formats may not be an easy task. This is a fundamental reason for deciding whether it is more appropriate to utilize specialized tools, related to a specific log file format, or to seek further for an application with a wide variety of supported log data formats. Two sufficient candidates are the Microsoft IIS log file format and the Apache Web server log format28. In this line of thought, the important concern is how to combine the raw data of such concurrently running, different Web Servers, in order to achieve a better correlation of the evidence provided by the proper extraction of the payload from their log data.

Furthermore, to outline coincidences, we should consider a proper investigation of the time stamps.

A normalization on time is crucial.

The matter of the amount of log files collected nowadays has been discussed heretofore and is clearly sufficient.

The aspects explaining the utilization of Regular Expressions should be designated as crucial, too. To illustrate this, consider the differences between implementations of Regular Expressions on a black-list basis and those on a white-list basis, which introduces a further parameter into the requirements list. The white-listing approach concerns cases in which the traced payload has to have a well-defined form; if the observed input string differs from this limited form, it is outlined as suspicious. An example are Regular Expressions (RegEx) for filtering tampered data of Webapp input fields, such as a login-ID of e-mail type.
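A minimal sketch of such a white-list RegEx for an e-mail-type login-ID (the pattern is a simplified assumption, not a complete e-mail grammar) could look as follows:

import re

# White-list: a login-ID of e-mail type must match this single, well-defined shape;
# everything else in the logged parameter is treated as suspicious.
EMAIL_WHITELIST = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def is_suspicious_login_id(value):
    return EMAIL_WHITELIST.fullmatch(value) is None

print(is_suspicious_login_id("alice@example.com"))  # False: fits the white-list
print(is_suspicious_login_id("admin' or 1=1 --"))   # True: rejected without any attack signature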

On the contrary, black-listing specifies which kind of construct is wrong and suspicious by default. Such filters can be eluded in a simple manner by altering the injected code appropriately, so that the RegEx will, with great likelihood, fail to detect it. It is a very controversial task to define a black-list RegEx which covers a whole class of malicious strings and remains precise ('fresh'). Furthermore, it is a challenge to implement a forensics tool with a minimal and compact collection of malicious signatures which remains universally valid. Probability analysis, supporting the timely detection of malicious signatures, is a further challenging topic.

28 Statistics on the utilization of the different Web Servers can be found at: http://news.netcraft.com/

Moreover, it would be very useful if the tool were extensible by the forensics investigator, i.e. if the security professional were allowed to refresh and update the list of RegExes detecting malicious payload manually. The examples in Sections 3.1 and 3.2 already illustrate the importance of proper URL decoding, which requires no further discussion.
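As a small illustration of why black-lists are fragile and why TRR 10 matters, the following sketch shows a naive tautology signature missing a URL-encoded variant of the payload until the log entry is decoded first:

import re
from urllib.parse import unquote

# A naive black-list signature for the tautology from Section 3.1.
BLACKLIST = re.compile(r"or\s+1\s*=\s*1", re.IGNORECASE)

raw = "password=%27%20oR%20%31%3D%31%20--"  # URL-encoded variant: the digits are encoded as %31

print(bool(BLACKLIST.search(raw)))           # False: the raw log entry slips past the filter
print(bool(BLACKLIST.search(unquote(raw))))  # True: decoding first (TRR 10) recovers the pattern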

These conclusions advocate the statement that TRR 1 up to TRR 10 are relevant and fundamentally important for a proper WAFO.

Let us present a couple of interesting examples of particular WAFO automated tool candidates in the next Sections 4.2 and 4.3. As the tools' requirements basis is already specified, we classify the tools in general into Open Source and proprietary ones and describe an appropriate tuple of each accordingly.

4.2. Proprietary tools

As we discuss Business related Webapps as a sufficient criterion, we describe at first the Business-to-Business implementations of WAFO automated tools. Current representatives in this class are: EnCase [L10], FTK [L11], Microsoft LogParser [L12], Splunk [L13] etc. According to the WAFO tools requirements, the author of the paper considers the following favourites in this category, see below.

Microsoft LogParser

This forensics tool was developed by Gabriele Giuseppini29. A brief history of MS LogParser is given in [L15], [L16]. The application can be obtained and utilized for free, see [L12], though, according to [L14], Microsoft rather designates it as "skunkware" and is reluctant to give official support for it. The current version of the tool is LogParser 2.2, released in 2005. An unofficial support site concerning the tool can be found at www.logparser.com30.

The parser comprises in general the following three main units: an input engine, a SQL-like query engine core and an output engine. A good illustration of the tool's structure is given in [L16], see Appendix B, Figure 19. MS LogParser supports many autonomous input file formats: IIS log files (Netmon capture logs), Event log files, text files (W3C, CSV, TSV, XML etc.), Windows Registry databases, SQL Server databases, MS ISA Server log files, MS Exchange log files, SMTP protocol log files, extended W3C log files (like Firewall log files) etc. Another achievement of the tool is that it can search for specific files in the observed file system and also for specific Active Directory objects. Furthermore, the input engine can combine the payload of the different input file formats, which allows a consolidated parsing and data correlation; thus TRR 1 and TRR 2 are satisfied. Acceptable input data types are INTEGER, STRING, TIMESTAMP, REAL and NULL, which satisfies TRR 3.

29 http://nl.linkedin.com/in/gabrielegiuseppini

30 Unluckily, at the present moment this site seems to be down.


According to [L17], parsing of the input data is achieved in efficient time, which designates another positive feature of the tool. Once the data is supplied to the core engine, the Forensics examiner can parse it utilizing SQL-like queries. By default this is done by means of a standard command line console, explicitly explained in [21]. Before illustrating this with an example, let us mention that there are unofficial front-ends providing more user-friendly GUIs, like simpleLPview0031. However, as the domain logparser.com seems to be down during the paper's development phase, the author of the paper is not able to run tests on this GUI front-end. For the reader concerned, the GUI versions of MS LogParser are not limited to that front-end. Developers can extend the MS LogParser UI via COM objects, see [L15], which enables the Forensics professional to extend the tool's abilities by programming custom input format plug-ins. Let us illustrate the MS LogParser syntax, see [L15]:

C:\Logs>logparser "SELECT * INTO EventLogsTable FROM System" -i:EVT -o:SQL -database:LogsDatabase -iCheckpoint:MyCheckpoint.lpc

The example above represents a SQL-like query, where the input file format specified by -i concerns the MS Event logs; the output format is SQL, which means the results are stored in a database and can be filtered further as appropriate. An important option is -iCheckpoint, which designates the ability to set a checkpoint on the log files and thus achieve incremental parsing of the observed log data; this increases the efficiency when parsing large log files and satisfies, to some extent, TRR 4. The next example demonstrates, see [L15]:

C:\>logparser "SELECT ComputerName, TimeGenerated AS LogonTime, STRCAT(STRCAT(EXTRACT_TOKEN (Strings, 1, '|'), '\\'), EXTRACT_TOKEN(Strings, 0, '|')) AS Username FROM \\SERVER01 \Security WHERE EventID IN (552; 528) AND EventCategoryName = 'Logon/Logoff'" -i:EVT

a simple string manipulation, which could be extended by RegExes and satisfies TRR 5 and 7.

Further interesting paradigms can be found in [15], [L15], [L16], [L17].

Another notable aspect of MS LogParser is its ability to execute automated tasks. One approach is to write batch jobs for the tool and to create system scheduler entries for their automated execution, please consider [L14]. Furthermore, the examiner can utilize Windows scripting on MS LogParser, as in [L17]; Appendix B, Figure 20 illustrates this. The standard implementation scenario is given as follows, see [L17] (a small scripting sketch follows the list):

• register the LogParser.dll

• create the Logparser object

• define and configure the Input format object

• define and configure the Output format object

• specify the LogParser query

• execute the query and obtain the payload
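A sketch of this scripting scenario, using the LogParser COM interface from Python via pywin32 instead of VBScript, could look as follows; the ProgIDs follow the LogParser 2.2 documentation, while the log path and output file are assumptions:

import win32com.client  # pywin32; assumes LogParser.dll has been registered on the machine

# Step 2: create the LogParser object
log_query = win32com.client.Dispatch("MSUtil.LogQuery")
# Steps 3 and 4: define and configure the input and output format objects
input_format = win32com.client.Dispatch("MSUtil.LogQuery.IISW3CInputFormat")
output_format = win32com.client.Dispatch("MSUtil.LogQuery.CSVOutputFormat")
# Step 5: specify the LogParser query (log path and output file are hypothetical)
query = ("SELECT date, time, c-ip, cs-uri-stem, sc-status "
         "INTO suspicious.csv FROM C:\\inetpub\\logs\\ex*.log "
         "WHERE sc-status >= 500")
# Step 6: execute the query; ExecuteBatch writes the extracted payload to the INTO target
log_query.ExecuteBatch(query, input_format, output_format)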

This brief introduction of MS LogParser demonstrates its mightiness without a doubt. However, we should consider the tool as appropriate only for MS Windows based environments, such as .asp, .aspx and .mspx Web applications.

31 http://www.logparser.com/simpleLPview00.zip

An open question remains regarding the proper examination of Silverlight implementations. Another possible issue could be the iCheckpoint option configuring the incremental parsing jobs: locating the .lpc configuration file(s) could easily lead an intruder to the log files related to the forensics jobs, which could then be exploited straight away.

Splunk

This tool is developed and maintained by Splunk Inc.32. Its current stable release is 4.2.2, 2011. Although the professional version of the tool is high priced, there is a trial version, limited to 30 days and to a bounded amount of parsed log data of up to 500 MB, which can be employed for free. Furthermore, there is community support for Splunk in the form of a mailing list and a Community Wiki, hosted on the Splunk Inc. domain. Official support regarding the Splunk documentation, version releases and FAQ/case studies is presented at the tool's website, which requires a free registration.

Another advantage of Splunk is the on-the-fly official/community IRC support. A further interesting feature are the video tutorials uploaded by users and official professionals, demonstrating specific usage scenarios and case studies.

The tool has wide OS support: Windows, Linux, Solaris, Mac OS, FreeBSD, AIX and HP-UX. Splunk can be considered a rather hardware consuming application33. It was tested on an Intel Pentium T7700 machine with 3 GB of RAM under Windows XP Professional SP3 and Ubuntu Linux 10.04 Lucid Lynx. In both cases the setup ran flawlessly with little additional installation effort on the user's side. After successful installation Splunk registers a new user on the host OS, which can be deactivated. The tool is a Python based application. It brings along a Web server, an OpenSSL server and an OpenLDAP server, which interact with the different parsers for the input data. The configuration of the different Splunk elements is implemented via XML, which allows them to be adjusted in a user-friendly way. Splunk has even broader input format support than MS LogParser, which designates the tool not only as OS independent, but also as an input format all-rounder. An interesting combination of Splunk with Nagios is discussed at [L18]. A screenshot of the officially advertised features of the tool is given in Appendix B, Figure 21. These aspects relate to TRR 1, 2, 3, 4 and 5; TRR 7, 9 and 10 have to be tested more extensively in particular.

The user interacts with Splunk via a common Web browser. The different Splunk elements are organized on a dashboard, which can be reordered and arranged in a user-friendly manner.

Let us describe the main Splunk units in more detail. Their description is based on [L19], which concerns Splunk version 3.2.6. Although Splunk has been completely rewritten after version 4.0, the main business logic units remain.

In general, the idea behind this tool is not only to parse different log file formats and support different network protocols, but also to index the parsed data. Thus the tool acts as a valuable search engine, like those largely known nowadays on the Internet. This allows the user to accomplish more user-friendly and precise searches on specific criteria. Indeed, the query responses from the tail dashboard are significantly fast. Intuitively, we designate the first Splunk unit: the index engine.

32 http://www.splunk.com/

33 http://www.splunk.com/base/Documentation/latest/installation/SystemRequirements


It supports SNMP and syslog as well. Consequently, the second unit is the search core engine. One can include different search operators on specific criteria, like Boolean, nested, quoted and wildcard searches, which respects, as already stated, TRR 5 and 7. The third unit is the alert engine, which to some extent satisfies TRR 9. The notifications can be sent via RSS, e-mail, SNMP, or even particular Web hyperlinks. In addition to this, the fourth unit implements the reporting ability of Splunk, TRR 2 and 3. On a specifically prepared dashboard the user/forensic examiner can not only obtain detailed results on the parsed payload in text format, but also derived information in the form of interactive charts and graphs, and of specifically formatted tables according to the auditing jobs. These are well illustrated in Appendix B, Figure 22. An interesting example describes the reporting ability of Splunk to detect JavaScript onerror entries by means of a user-developed json script, see [L22].

The fifth and last unit is the sharing engine of Splunk. It reflects the strive for collaborative work of the users by means of this tool, where know-how exchange is encouraged. Another motivation for this unit is a distributed Splunk environment, where more than a single instance of Splunk serves the specific network. Further abilities of the forensic tool should be mentioned: scaling with the observed network and security of the parsed data.

This last feature deserves a more detailed discussion. An open question remains, as already noted for MS LogParser, whether the tool itself is hardened enough, considering the fact that the large payload data is not only indexed, but also represented in a user-friendly way. As Splunk is without a doubt an interface to every log file and protocol on the observed network, it is highly likely that this bonding point attracts compromise attempts. If an attacker succeeds in this matter, he gets every detail related to the observed network represented in a user-friendly format, which relieves the intruder of collecting valuable payload data and minimizes his/her penetration efforts. As the Splunk front-end is a Web browser, the reader concerned will intuitively notice that CSRF [4] and CSFU [L20] could be respectable candidates for such attacking scenarios, especially combined with DOM based XSS attacks [20], [L21], which can trigger the malicious events in the Browser engine. If such scenarios can be realized, then Splunk could turn into a favourite jump-start platform for exploiting secured networks, instead of being utilized as an appropriate forensic investigation tool. This designates an essential aspect concerning the future work on WAFO. We do not extend this discussion further, as it goes beyond the boundaries of the present paper.

Let's introduce the selected Open Source WAFO tools, as mentioned above.

4.3. Open Source tools

At first let's describe PyFlag.

PyFlag

As with the previously described tool, there is a team behind the PyFlag development: Dr. Michael Cohen, David Collett and Gavin Jackson. The tool's name is an abbreviation of 'Python based Forensic and Log Analysis GUI'. PyFlag is another Python implementation of a forensic investigation tool, which uses the common Web browser as front-end for the user. The current version of the tool is pyflag-0.87-pre1, 2008. The tool is hosted at SourceForge34 and, as an Open Source application, can be obtained for free under the GPL. The support site is www.pyflag.net. This domain also hosts the PyFlag Wiki with presentations of the tool and video tutorials. A further advantage is the predefined forensics image for examination, also hosted on the support site; this image can be employed for training purposes in forensic investigation.

The general structure of the tool can be described as follows. The Python application brings along a Web server for displaying the parsing output; the collected input data is stored in a MySQL server, which allows the tool to operate with a large number of log file lines, respecting TRR 4. The IO Source engine designates the interface to the forensic images, which enables the tool to operate with a large range of input file types, like Splunk.

Once the observed image is loaded by the Loader engine into the Virtual File System, different scanners can be utilized to gain the forensically relevant payload from the raw data. For the reader concerned, please refer to [L26]. The main PyFlag data flow is illustrated in the next Figure 15:

Figure 15: Main PyFlag data flow, as [L26]

PyFlag is natively written to support Unix-like OSes. A Windows based port, PyFlagWindows35, is currently presented on the support Web site. This makes the tool OS independent as well. The PyFlag developers state that the tool is not only a forensic investigation tool, but rather a rich development framework. The tool can be used in two modes: either as a Python shell, called PyFlash, or via the user-friendly Web GUI. The installation process requires some user input; in more detail, common installation routines are demanded, like unpacking the archive to a destination on the host OS, configuring the source via ./configure on Linux systems, checking for dependency issues and running make install.

The first start of the tool requires the forensics investigator to configure the MySQL administrative account and the upload directory. This location is crucial for the forensic images to be observed. In general PyFlag represents a Web Application forensics tool (log files), a Network forensics tool (capture images via pcap) and an OS forensic investigation tool. As denoted in the introduction of the paper, we concentrate only on the log file analysis with PyFlag, leaving aside its other features concerning NFO and OSFO (Operating System Forensics). The authors of the tool encourage forensics investigators to correlate the different evidence from WAFO, NFO and OSFO, as already proposed before.

34 http://sourceforge.net/

35 http://www.pyflag.net/cgi-bin/moin.cgi/PyFlagWindows


In more detail, PyFlag supports a variety of different and independent input file formats, like the IIS log format, Apache log files, iptables and syslog formats, respecting TRR 1, 2 and 3. The tool also supports different levels of format customization, e.g. Apache logs can be parsed with the default format or with one customized by the security professional.

Let us explain this. After the installation is completely set up, the user can work in the Browser-GUI PyFlag environment. For analyzing a specific log file, PyFlag presents presets, which are templates allowing to parse a collection of log files of a specific class, e.g. the IIS log file format. The preset controls the driver for parsing the specific log as appropriate. The standard routine for setting up an IIS log file analysis is described in [22], as follows:

• Select “create Log Preset” from the PyFlag “Log Analysis”- Menu

• Select the “pyflag_iis_standard_log” file to test the preset against

• Select “IIS” as the log driver and utilize the parsing

A more extensive introduction to the WAFO utilization of the tool was presented at Linux.conf.au 2008; please consider watching the presentation video [L23]. After the tool starts to collect payload data from the input source, the Forensics investigator can either employ pre-defined queries and thus minimize the parsing time on-the-fly, or wait for the complete data collection. The data noise in the obtained collection can also be reduced via white-listing, as in TRR 7. Moreover, after the data is collected, the examiner can apply index searching via natural-language-like queries, comparable to Splunk. These features explain the efficient searching of PyFlag. Another interesting aspect of the tool is the implementation of GeoIP36 (Apache). It can either be obtained from the Debian repository, which provides a smaller GeoIP collection, or downloaded from the GeoIP website as a complete collection. GeoIP allows to parse the IPs and timestamps and to correlate them with the origin location of the GET/POST requests in the log file. This respects TRR 3. The tool can also store the collected evidence payload in output formats like .csv, which explains its utilization as a front-end to other tools applied in the investigation. An illustration of the PyFlag Web GUI is given in Appendix B, Figure 23. To conclude the tool's description, we should mention once more the open question of a possible compromise of the Web GUI, as explained in the Splunk presentation. A well known attack concerning HTTP (Parameter) Pollution on ModSecurity37 was presented by Luca Carettoni38 in 2009, where the IDS is exploited by an XSS instead of utilizing an image upload to the system. As mentioned above, this advocates the fact that the tool should be revised for such kinds of exploits and especially rechecked for possible DOM based XSS exploits concerning its own source.
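The GeoIP correlation described above can also be sketched outside PyFlag, e.g. with the pygeoip bindings for the MaxMind databases; the log file and the locally downloaded GeoIP.dat are assumptions:

import re
import pygeoip  # assumes the pygeoip package and a locally downloaded MaxMind GeoIP.dat

geo = pygeoip.GeoIP("GeoIP.dat")
LOG_RE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[(?P<stamp>[^\]]+)\] "(?P<request>[^"]*)"')

with open("access.log") as fh:  # hypothetical Apache log
    for line in fh:
        m = LOG_RE.match(line)
        if not m:
            continue
        country = geo.country_name_by_addr(m.group("ip")) or "unknown"
        # Correlate origin country, timestamp and request, as PyFlag's GeoIP scanner does
        print(country, m.group("stamp"), m.group("request"))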

Apache-scalp or Scalp!

This tool can be considered an explicit WAFO investigation tool. Scalp! is developed by Romain Gaucher and the project is hosted on code.google.com. Its current version is 0.4 rev. 28, 2008. The tool is the only one of those described above which definitely deploys RegExes. It is a Python script which can be run in the Python console on the common OSes, which makes it OS independent. The tool is published under the Apache License 2.0 and is specified for parsing especially Apache log files, which restricts its usability to this class of log files and does not respect TRR 1 and 2. It has been tested only on log files of a couple of MiB, which further disrespects TRR 4.

36 http://www.maxmind.com/app/mod_geoip

37 http://www.modsecurity.org/

38 http://www.linkedin.com/in/lucacarettoni


The tool's developer states that it is possible to release a C++ based version of the tool, allowing more efficient parsing of the log data. This is a main topic of the Scalp! future work.

For all that, a great advantage of the tool is the utilization of the PHP-IDS39 RegEx filter, which is nowadays perhaps one of the most powerful implementations for detecting the fingerprints of Web attacking vectors. That is why choosing this tool is very reasonable: it is the only one which explicitly states parsing log files against known Web 2.0 security culprits like SQLIA, XSS, CSRF, LFI, DoS, Spam etc. The usage of the tool is as straightforward as for every other common command line pen-testing tool. Let us illustrate some of the running modes of Scalp!, as in [L24]:

• exhaustive mode: the tool will test the complete log file, or a log file snippet and will not break on the first payload pattern,

• tough mode40: parses the log data by means of the PHP-IDS RegEx, which reduces false positives in the output, respecting TRR 5, 7 and 9,

• period mode: valuable for parsing the supplied log file within a time-frame limitation, which satisfies TRR 3,

• sample mode: the tool parses only an a priori specified sample of the log file and ignores the rest of the input data, respecting TRR 7,

• attack mode: a very important mode, in which the forensics investigator can parse the log file against known Web attacking vectors, which in these cases satisfies TRR 6.

The output formats of Scalp! are TEXT, XML and HTML, see Appendix B, Figure 24.

Let's provide an example of the apache-scalp usage, as [L24]:

$./scalp-0.4.py -l /var/log/httpd_log -f ./default_filter.xml -o ./scalp-output --html

We should emphasize here that the leading symbol '$' is typical for user-driven Unix-like terminal command line consoles; it is not part of the proper Scalp! syntax.

The -l option specifies the Apache Web Server log file, the -f parameter specifies the inclusion of the filters file, in this case an up-to-date PHP-IDS RegEx41, the -o option specifies the output file and --html denotes its type, illustrated in Figure 24.

As a conclusion to this paragraph, we should point out that Scalp! is a specific WAFO tool, perhaps even a limited one, though it is pretty useful for parsing log files of the Apache Web Server class against known Web 2.0 culprits. This tool could be an excellent addition to Splunk or PyFlag, as both of them can parse huge log files efficiently and prepare appropriate output for apache-scalp, so the Forensics investigator can parse it further, more specifically, by means of the mighty PHP-IDS RegEx. This combination of the tools is recommended for a proper WAFO on the Apache Web Server, as sketched below.
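A minimal sketch of such a combination, pre-filtering a large Apache log to a time window before handing the slice to Scalp!, could look as follows; file names and the chosen month are assumptions, and Scalp's own period mode is an alternative to the crude filter:

import subprocess

# Keep only the January 2011 entries of a large Apache log, then hand the slice to Scalp!
with open("/var/log/httpd_log") as source, open("jan2011_slice.log", "w") as slice_file:
    for line in source:
        if "/Jan/2011:" in line:  # crude time-frame filter on the combined-format timestamp
            slice_file.write(line)

subprocess.call([
    "python", "scalp-0.4.py",
    "-l", "jan2011_slice.log",
    "-f", "default_filter.xml",
    "-o", "scalp-output", "--html",
])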

Let's summarize once again the TRR completion on the presented tools, see Table 13, Appendix B.

39 https://phpids.org/

40 It is not straightforward whether this satisfies TRR 6, TRR 8 and TRR 10 completely.

41 https://dev.itratos.de/projects/php-ids/repository/raw/trunk/lib/IDS/default_filter.xml


5. Future work

In the paper's exposition several aspects are outlined which require future work.

Let us summarize them in a more systematic way in the current Chapter. We follow a simple schema to achieve this: conclude what has been done so far, specify what has not been done, and suggest what else should be done in the future.

The complexity of current Web applications has been denoted above and is obvious. Consequently, the sophistication of current Web attacking scenarios has been described as well. The conclusion that best practices cannot be regarded as sufficient approaches for conducting a proper utilization of Web application security, or of Webapp Forensics, is reasonable. This advocates the fact that an adequate WAFO taxonomy is highly required.

The proposals and schemata for classifying the different WAFO aspects, and WAFO in general, are unambiguously compact. First of all, they are related to the topic without a doubt. The requirements that the classifications be fundamental and universally valid are respected; there are neither tautologies nor redundancies in their construction. Furthermore, a classification of the requirement rules, in terms of the proper choice of WAFO automated tools, is also designated. The TRRs enumerated in Section 4.1 are fundamental and specify a compact collection, as required.

For all that, a proof that the proposed WAFO taxonomy is complete has not been demonstrated. This aspect represents a great opportunity for future work in the field of WAFO in general.

Another important matter is the automation of Webapp Forensics. Once again, an estimation of whether the TRRs enumerate a complete group of sufficient requirements has not been presented, and at this stage their completeness should not be considered approved. TRR 6 and TRR 8 are essential and propose future work on them. As Table 13, Appendix B, shows, there is no deterministic conclusion whether the tools observed in this paper satisfy all of the rules mentioned above or not.

We denoted in the exposition part of the paper that several modern Web attacks, and the Attacking Scenarios constructed from them, are in a debatable number of cases unfeasible to fingerprint as a stable, static, abstract construct. Representatives of this group, once again, are CSRF and CSFU, especially combined with XSS attacks like DOMXSS. This stresses the question whether we are able to enumerate every single 'logical culprit' and pivot against it, as in TRR 6. Additionally, the 'probable culprits' (TRR 8) represent the next great challenge for security professionals.

In this line of thought, a different issue which is a reasonable candidate for WAFO future work is the implementation of strong RegEx filters in the WAFO tools. From the group of WAFO scanners observed in this paper, only Scalp! exclusively uses a strong RegEx collection, provided and regularly maintained by the PHP-IDS project. We should stress that this cannot be confirmed for the other tools; of course, such filters can be applied by the Forensics examiner according to the particular case, though the approach presented by Scalp! is more reliable. In a word, the decision making after observing the obtained log file data still belongs to the Forensics investigator.

As Table 13 concretely illustrates, there is currently no available automated tool which can perform an Application Flow Analysis on a particularly observed RIA Web application.


If we accept the intruders' environment as non-deterministic, then we should strive for a precise image of the victim environment, i.e. the Web application. This conclusion advocates the next task for WAFO future work: the evaluation of Application Flow Analysis, as already proposed in Section 2.3. Whether future implementations of fully automated WAFO tools capable of decision making are possible or not, the more important issue is to restrain the “point'n'scan web application security” and instead to consider an AFA: a complete function- and data-flow mapping of the observed victim environment, combined with a consequent, precise application scan for properly detecting the real-world culprits which compromise modern Web applications.

At last, let us specify the above-mentioned 'decision making' problem concerning WAFO automated tools. As designated above, the Forensics professional still has to deploy the following eventful tasks for a proper application of WAFO tools: set up the tool, update the filtering rules, trace the Forensics image with them, and make the proper decisions. This supports the statement that WAFO tools deliver a semi-automated Forensics investigation, see the previous Chapter 4.

If we would like to suggest fully automated WAFO tool solutions, this means we would like to solve an assignment problem (an optimization problem). In more detail, the WAFO tool should be able to perform iterative tasks in the input data filtering process. Further, it should be able to reorder the up-to-date scanning filters, producing fewer false positives and a better detection of attacks. In some cases, filters have to be optimized and may require restructuring, or even specification from scratch, to meet the TRRs completely. Even so, as already stated, new attacking techniques establish their existence on the Web again and again; therefore the tools should in some cases be able to construct entirely new detection filters. For all that, the open questions remain: how should rules be outlined as obsolete, and what should be the comparative criterion from which 'fresher' filters derive, if a fully automated execution without human interaction is to be implemented.

An important criterion should be the evaluation of the impact of successfully accomplished Web attacking technique(s). On the basis of that, the tool should be able to re-adjust and renew the filtering collection employed in the automated part of the WAFO investigation.


6. Conclusion

The current term paper represents an overview of the complex topic of Web Application Forensics.

The main goal denotes a systematic approach to Webapp Forensics. This is achieved by proposing categorizations which aim to preserve universal validity and to remain fundamental, with respect to the aspects of compactness and completeness.

As a consequence, the following particular tasks are accomplished: localization of WAFO within the construct describing Digital Forensics in general; designation of the WAFO security model, explained by defining the profiles of the current Web Application intruders and a classification of the present Web Attacking Scenarios. Thus, a complete illustration of the intruders' environment is proposed.

Furthermore, a description of the victim environment is suggested by means of the discussion of the aspects: WAFO deployment phases, WAFO general task representation, WAFO evidence taxonomy and WAFO 'Players' classification. The matters pertaining to the questions of automating the Web Application Forensics investigation are also deliberated. A fundamental and compact collection of requirements, related to the automated tools and their proper implementation in the WAFO investigation, is proposed. As stated in Chapter 5, the question of the approval of this requirements list remains open.

The second objective, concerning modern aspects of Web Application Forensics, is also covered by the discussion of the Web Application penetration testing trends and their valuable application to WAFO in the last Section of Chapter 2 and at the end of Chapter 4.

A more practical approach is also presented, as WAFO illustrative paradigms and case studies are enumerated by examples.

Thus, the thesis of the current paper is covered, at least for the present time of the term paper's development.

The abstractions and categorizations in this paper should be regarded as challenging in the future, considering their redefinition and accuracy aspects, and their evaluation with respect to compactness and completeness.


Appendixes

Appendix A

Application Flow Analysis

Q/A Team: functions known; application understood; rely on functional specifications; coverage known; highlight key business logic.

Infosecurity Team: functions unknown; application unknown; rely on crawlers + experience + luck; coverage unknown; highlight “found” functionality.

Table 9: Functional vs. Security testing, Rafal Los [10]

EFD Execution Flow Diagram – Functional paths through the application logic

ADM Application Data Mapping – Mapping data requirements against functional paths

Table 10: Standards & Specifications of EFBs, Rafal Los [10]

Graph(s) of flows through the application

Nodes represent application states

Edges represent different actions

Paths between nodes represent state changes

A set of paths is a flow

Table 11: Basic EFD Concepts [10]

Execution Flow Action: something that causes a change in state; a human-, server- or browser-driven event.

Action Types: Direct, Supplemental, Indirect.

Table 12: Definition of Execution Flow Action and Action Types, Rafal Los [10]


Figure 16: Improving the Testing process of Web Application Scanners, Rafal Los [10]

Figure 17: Flow based Threat Analysis, Example, Rafal Los [10]


WAFO victim environment preparedness


Figure 18: Forensics Readiness, in Jess Garcia [13]


Appendix B

Proprietary WAFO tools

MS LogParser


Figure 19: MS LogParser general flow, as [L16]

Figure 20: LogParser-scripting example, as [L17]


Splunk


Figure 21: Splunk licenses' features



Figure 22: Splunk, Windows Management Instrumentation and MSA( ISA) queries, at WWW


Open Source WAFO tools

PyFlag

apache-scalp or Scalp!


Figure 23: PyFlag- load preset and log file output, at WWW

Figure 24: apache-scalp or Scalp! log file output( XSS query), as [L25]


Results of the tool's comparison

TRR completion on the presented tools:

TRR    MS LogParser    Splunk    PyFlag    Scalp!
1      + (42)          + (43)    + (44)    X (45)
2      +               +         +         X
3      +               +         +         +
4      +               +         +         X
5      +               +         +         +
6      ?               +         +         +
7      ?               ?         ?         +
8      ?               ?         ?         ?
9      ?               +         ?         +
10     ?               +         ?         ?

Table 13: TRR completion on LogParser, Splunk, PyFlag, Scalp!

Legend:
+ denotes that the feature is definitely present
X denotes that the feature is not present
? denotes that the feature is not explicitly officially stated, which requires future research to approve these aspects (features)

42 Supported input formats: IIS log files (Netmon capture logs), Event log files, text files (W3C, CSV, TSV, XML etc.), Windows Registry databases, SQL Server databases, MS ISA Server log files, MS Exchange log files, SMTP protocol log files, extended W3C log files (like Firewall log files) etc.

43 Supported input formats: all possible, WAFO related, input data types

44 Supported input formats: all possible, WAFO related, input data types

45 Supported input formats: Apache log files


List of links

L1 LayerOne 2006, Andrew Immerman, Digital Forensics: http://www.youtube.com/watch?v=N8whBp2cp6A

L2 NIST Colloquium Series, Digital Forensics:http://www.youtube.com/watch?v=9DKJ6gP5lJY

L3 Cloud Computing: http://csrc.nist.gov/groups/SNS/cloud-computing/

L4 'MonkeyFist' Launches Dynamic CSRF Web Attacks:http://www.darkreading.com/insider-threat/167801100/security/application-security/218900214/index.html

L5 Hacker Hits British Navy Website with SQL Injection Attack: http://www.whitehatsec.com/home/news/10newsarchives/110810eweek.html

L6 LizaMoon Mass SQLIA:http://www.eweek.com/c/a/Security/LizaMoon-Mass-SQL-Injection-Attack-Points-to-Rogue-AV-Site-852537/

L7 Rafal Los, Into the Rabbit Hole: Execution Flow-based Web Application Testing:http://www.youtube.com/watch?v=JJ_DdgRlmb4&feature=related

L8 Jeremiah Grossman, Our infrastructure -- Assessing Over 2,000 websites:http://jeremiahgrossman.blogspot.com/2010/09/our-infrastructure-assessing-over-2000.html

L9 Robert Hansen, Web Server Log Forensics App Wanted:http://ha.ckers.org/blog/20100613/web-server-log-forensics-app-wanted/

L10 EnCase:http://www.guidancesoftware.com/

L11 FTK:http://accessdata.com/products/forensic-investigation/ftk

L12 Microsoft LogParser:http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=24659

L13 Splunk:www.splunk.com

L14 Steve Bunting, Log Parser (Microsoft), The "Swiss Army Knife" for Intrusion Investigators and Computer Forensics Examiners:http://www.stevebunting.org/udpd4n6/forensics/logparser.htm


L15 Gabriele Giuseppini, Professor Windows, May 2005: How Log Parser 2.2 Works: http://technet.microsoft.com/en-us/library/bb878032.aspx

L16 Marc Grote, Using the Logparser Utility to Analyze Exchange/IIS Logs:http://www.msexchange.org/tutorials/using-logparser-utility-analyze-exchangeiis-logs.html?printversion

L17 The Scripting Guys, Tales from the Script, January 2005: http://www.microsoft.com/germany/technet/datenbank/articles/600634.mspx

L18 Splunk: Next time your beeper goes off, turn to Splunk: http://www.nagios.org/products/enterprisesolutions/splunk

L19 Bill Varhol, What the Splunk?:http://www.ethicalhacker.net/content/view/206/2/

L20 Petko D. Petkov, Cross-site File Upload Attacks:http://www.gnucitizen.org/blog/cross-site-file-upload-attacks/

L21 Mario Heiderich, DOMXSS - Angriffe aus dem Nirgendwo:http://it-republik.de/php/artikel/DOMXSS---Angriffe-aus-dem-Nirgendwo-3565.html?print=0

L22 Carl Yestrau, JavaScript Error Logging with Splunk:http://www.splunk.com/view/SP-CAAACJK

L23 Gavin Jackson, Incident Response using PyFlag - the Forensic and Log Analysis GUI:http://mirror.linux.org.au/linux.conf.au/2008/Thu/mel8-099a.ogg

L24 Romain Gaucher, Apache-scalp: How it works: http://code.google.com/p/apache-scalp/

L25 Scalp: Logfile-Analyzer findet Web-Angriffe:http://www.linux-magazin.de/NEWS/Scalp-Logfile-Analyzer-findet-Web-Angriffe

L26 PyFlag manual:http://pyflag.sourceforge.net/Documentation/manual/index.html

Table 14: List of links


Bibliography

[1] Jess Garcia, Web Forensics, 2006
[2] Dominik Birk, Forensic Identification and Validation of Computational Structures in Distributed Environments, 2010
[3] Robert Hansen, Detecting Malice, 2009
[4] Krassen Deltchev, New Web 2.0 Attacks, 2010
[5] Anoop Singhal, Murat Gunestas, Duminda Wijesekara, Forensics Web Services (FWS), 2010
[6] Kevin Miller, Fight crime. Unravel incidents... one byte at a time., 2003
[7] BSI, Leitfaden "IT-Forensik", 2010
[8] Ory Segal, Web Application Forensics: The Uncharted Territory, 2002
[9] Joe McCray, Advanced SQL Injection, 2009
[10] Rafal Los, Into the Rabbit Hole: Execution Flow-based Web Application Testing, 2010
[11] Larry Suto, Analyzing the Effectiveness and Coverage of Web Application Security Scanners, 2009
[12] Felix C. Freiling, Bastian Schwittay, A Common Process Model for Incident Response and Computer Forensics, 2007
[13] Jess Garcia, Proactive & Reactive Forensics, 2005
[14] Lance Müller, User Panel: Forensics & Incident Response – It's important to have options!, 2008
[15] Rohyt Belani, Chuck Willis, Web Application Incident Response & Forensics: A Whole New Ball Game!, 2007
[16] Edgar Weippl, Database Forensics, 2009
[17] Harry Parsonage, Web Browser Session Restore Forensics, 2010
[18] William L. Farwell, Email forensics, 2002
[19] Thomas Akin, WebMail Forensics, 2003
[20] Shreeraj Shah, Hacking Browser's DOM - Exploiting Ajax and RIA, 2010
[21] Gabriele Giuseppini et al., Microsoft Log Parser Toolkit: A Complete Toolkit for Microsoft's Undocumented Log Analysis Tool, 2005
[22] Dr. Michael Cohen et al., PyFlag, Forensic and Log Analysis GUI, 2006
