
ISeCure
The ISC Int'l Journal of Information Security

July 2014, Volume 6, Number 2 (pp. 155–167)

http://www.isecure-journal.org

A Hybrid Approach for Database Intrusion Detection at Transaction and Inter-transaction Levels

Mostafa Doroudian 1, and Hamid Reza Shahriari 1,∗
1 Computer Engineering and Information Technology Department, Amirkabir University of Technology, Tehran, Iran

ARTICLE INFO

Article history:
Received: 19 March 2014
Revised: 13 November 2014
Accepted: 16 November 2014
Published Online: 24 November 2014

Keywords: Intrusion Detection, Database Security, State Machine, Inter-Transaction Dependency, Inter-Transaction Sequence.

ABSTRACT

Nowadays, information plays an important role in organizations, and sensitive information is often stored in databases. Traditional mechanisms such as encryption, access control, and authentication cannot provide a high level of confidence. Therefore, Intrusion Detection Systems for databases are necessary. In this paper, we propose an intrusion detection system for detecting attacks at both the database transaction level and the inter-transaction level (user task level). For this purpose, we propose a detection method at the transaction level that is based on describing the expected transactions within the database applications. Then, at the inter-transaction level, we propose a detection method that is based on anomaly detection and uses data mining to find dependency and sequence rules. The main advantage of this system, in comparison with previous database intrusion detection systems, is that it can detect malicious behaviors at both the transaction and inter-transaction levels. It also gains the advantages of a hybrid method, combining specification-based detection and anomaly detection, to minimize both false positive and false negative alarms. To evaluate the accuracy of the proposed system, several experiments were conducted. The experimental results demonstrate that the true positive rate (recall) is higher than 80% and the false positive rate is lower than 10% across different data sets when appropriate ranges are chosen for the support and confidence thresholds. These results show the high accuracy and effectiveness of the proposed system.

© 2014 ISC. All rights reserved.

∗ Corresponding author.
Email addresses: [email protected] (M. Doroudian), [email protected] (H.R. Shahriari).
ISSN: 2008-2045 © 2014 ISC. All rights reserved.

1 Introduction

Many organizations employ a DBMS as the main technology of data management in order to store and access their data. This information is often considered a valuable asset in organizations. Hence, the data in databases must be protected from unauthorized access and changes. Traditional database security mechanisms such as encryption, access control, and authentication are not sufficient for protecting sensitive information against innovative security attacks [1]. Intrusion detection systems (IDSs) in the database can be used for detecting attacks whenever traditional prevention mechanisms are infeasible or may be bypassed.

A user task in this paper refers to a set of transactions that are always submitted to the DBMS together to achieve a certain goal; the time gap between two tasks of one user is much longer than the gap between two transactions within a user task. Here, database attacks refer to malicious behaviors that may appear in the requests issued to database systems. It is possible that all of the transactions in a user task are legitimate while anomalies exist in the relations among the transactions. We can divide the attacks detectable by database IDSs into four categories: SQL query level attacks, transaction level attacks, inter-transaction level (user task level) attacks, and SQL injection attacks.

The existing database intrusion detection methods are often based on the anomaly detection approach, and anomaly-based methods are usually prone to generating a relatively large number of false alarms. In addition, most of these methods focus on detecting database attacks at the SQL query or transaction level and cannot detect anomalies among transactions at the user task level.

In this paper, we propose an intrusion detection system for databases at the transaction level and the user task level. At the transaction level, the proposed system uses a detection method that is based on specifying the expected transactions within the database applications of the organization. Issued transactions that differ from the existing specifications are considered malicious transactions. At the user task level, we present a detection method that is based on anomaly detection. This method creates a normal behavior profile by mining the dependency and sequence relation rules among transactions. Differences between new user tasks and the existing dependency and sequence rules are recognized as abnormal events.

The main advantage of our approach is that we use a hybrid method at the transaction and user task levels, combining specification-based detection and anomaly detection. Thus, we gain the advantages of both to minimize false positive and false negative detection alarms.

The rest of the paper is organized as follows. Section 2 describes past efforts related to database IDSs. In Section 3, we explain our proposed approach. Section 4 presents the evaluation results of the presented approach. Section 5 offers the conclusions of the paper.

2 Related Work

There exist a few research works that concentrate on database IDSs. These systems are used to detect malicious behaviors at one of three levels: SQL query, transaction, and user task. In addition, SQL injection attacks can be detected by some of these systems.

One of the early works in this field is a database misuse detection system called DEMIDS [1]. It uses a data mining algorithm to capture the working scopes of users. The algorithm extracts frequent itemsets as working scopes using the notion of distance measures. The frequent itemsets are stored in user profiles to describe the typical behaviors of users. The techniques offered in [2, 3] can be used for detecting query level attacks. They profile normal SQL queries for a database role by applying a Naïve Bayes clustering approach to the training log file. Anomalous user behaviors are identified by comparing the SQL queries submitted by users with the normal SQL queries stored in role profiles.

Most previous research in this field concentrates on modeling the normal behaviors of users at the transaction level to detect malicious activities against the database system. The method used in [4] applies the same approach as [2, 3] (mentioned above) for detecting attacks at the transaction level. Hu et al. [5] propose an intrusion detection method based on mining the dependencies among data items at the transaction level using static analysis of the application source code. These dependencies are represented as sets of read and write operations for each data item. Another intrusion detection method presented by Hu et al. [6] uses a sequential pattern mining technique on the training database log to identify frequent sequences among data items at the transaction level. Transactions that do not comply with the data dependencies are recognized as malicious transactions. The method proposed in [7] is similar to [6], with the difference that it modifies an existing sequential mining algorithm to make it security sensitive by introducing weights for each attribute based on its sensitivity group. Lee et al. [8] describe an intrusion detection method for real-time database systems using time signatures. In this method, temporal data objects are tagged with time signatures of update latency ranges that are unknown to intruders.

Intrusions at the inter-transaction level have received less attention from existing database IDSs. Hu et al. [9] present an algorithm for identifying frequent data dependencies among data items at the inter-transaction level in order to detect user task level attacks. In this approach, the transactions in the training database log are first clustered into user tasks using a proposed algorithm. Then, the data dependencies at the user task level are discovered using a proposed algorithm. These dependencies are represented as inter-transaction dependency rules and inter-transaction sequence rules for each data item.

There are also systems that have been presented specifically for detecting SQL injection attacks. For instance, the technique used by DIDAFIT [10] tries to characterize legitimate accesses by fingerprinting their constitutive SQL queries in order to identify anomalous accesses to the database system. This technique also imposes a constraint on the ordering of the submitted SQL queries. The system proposed in [11] is a framework based on anomaly detection that creates a fingerprint of an application program based on its SQL queries. It then applies association rule mining techniques to the fingerprints to extract useful rules that represent the normal behaviors of database applications.

3 Proposed Approach

Our proposed approach is an intrusion detection mechanism that can be employed for detecting malicious requests issued to a database system by a set of database applications within an organization, although the mechanism can also be used for a single application instead of a group of database applications at the organization granularity level. Here, the assumption is that all the considered applications use a finite set of transactions for issuing requests to the database system and that an identifier is assigned to each transaction. Also, the transactions in a user task can be submitted from different applications.

To identify malicious behaviors in the transactions submitted to the database, the proposed IDS first identifies the existing transactions within the organization's applications by analyzing the source code of the applications. Then, the system specifies these transactions through a formal specification method. In the detection phase, the new transactions in the current database log are recognized as legitimate if they match the corresponding stored specifications.

In the training phase, the proposed system discovers the dependency and sequence rules among transactions in the user tasks through a data mining algorithm in order to identify abnormal relations among transactions in performing a user task. For this purpose, the system uses a training database log that contains the normal database requests of the considered applications during a certain period of time. In the detection phase, if the relations among the transactions that appear in a new user task match the stored rules, this user task is considered normal.

3.1 System Architecture

The proposed system acts as a security mechanism that is independent of the database system operations and is placed beside the DBMS to enhance its security. To that end, in the detection phase, the proposed system uses the current database log file to analyze the requests submitted by the database applications in order to detect malicious behaviors in transactions and in the relations among them.

The architecture of the proposed system consists of two modules: the Analysis and Generation module and the Detection module. The system architecture is shown in Figure 1.

The Analysis and Generation module consists of two sub-modules: Specification Creation and Rule Generation. The Specification Creation sub-module first identifies the transactions in the organization's applications using the Static Analyzer; the identified transactions are then specified as state machine specifications by the Specification Creator. The Rule Preprocessor in the Rule Generation sub-module clusters the transactions in the training database log into user tasks and sends them to the Rule Generator, which generates the dependency and sequence rules through a data mining algorithm.

For every new transaction and new user task in the database log, the Detection Preprocessor extracts the required information and sends it to the Specification Matcher and the Rule Matcher, respectively. The Specification Matcher matches new transactions against the existing specifications based on their information. The Rule Matcher examines the relations among transactions in the performed user tasks based on the stored rules. If the transactions in a new user task and the relations among them are recognized as legitimate, the user task is considered normal and stored in the Normal UTs storage; otherwise, the system raises an alarm. The Rule Generator periodically uses the user tasks in the Normal UTs storage to update the existing rules.

3.2 Analysis and Generation Module

As mentioned earlier, the Specification Creation sub-module of the Analysis and Generation module specifies the identified transactions as state machines, whereas the Rule Generation sub-module discovers the dependency and sequence rules that describe the relations among the identified transactions and then stores them.

Figure 1. Architecture of proposed system

3.2.1 Static Analyzer

The Static Analyzer analyzes the source code of the database applications in the organization and places the information extracted from the identified transactions into a file named Specification. The analyzer places the derived information in the Specification file as follows:

• The “Tb” expression is inserted at the beginning of each transaction and the “Te” expression is placed at the end of it, separated from the previous expression by a space character.
• It inserts the transaction identifier (Tr) after the “Tb” expression at the beginning of each transaction.
• The SQL queries in each transaction are placed into the file with a “?” symbol between adjacent SQL queries.

Furthermore, some methods are considered for representing special cases such as a conditional choice among several SQL queries and a loop over a sequence of SQL queries.
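As an illustration of the layout rules above, the following minimal Python sketch serializes one transaction into a Specification file entry. The function name format_transaction and the exact spacing around the markers are assumptions made for illustration only; the notation for conditional choices and loops is not covered.

```python
def format_transaction(tr_id, queries):
    """Serialize one transaction as: Tb <Tr> <q1> ? <q2> ... Te

    The precise whitespace is an assumption; only the Tb/Te markers,
    the identifier position, and the '?' separator come from the text.
    """
    body = " ? ".join(queries)
    return f"Tb {tr_id} {body} Te"


entry = format_transaction(
    "T1",
    ["Select Name from Employee", "Update Employee set Name = 'Joe'"],
)
print(entry)
```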

3.2.2 Extracting Transaction Information

The Specification Preprocessor extracts the required information from the SQL queries within the Specification file and then transfers it to the Specification Creator in the form of tuples. This component treats each SQL query containing a where clause as two queries. The first query is equivalent to the projection clause of the intended query, which consists of an SQL command such as Select, Insert, Update, or Delete on a set of attributes within the database relations. The second query is equivalent to a Select query on the relation attributes appearing in the where clause. The reason for this conversion is that the attributes in the where clause are read before the equality operator is applied. In addition, the command related to the where clause is processed first.

This component creates a tuple for each SQL query and puts the information derived from the query in it. Each tuple consists of the following nine fields:

(Beg Tr, Cmd, Count, Att [ ], WC, End Tr, Tr ID, Additional Fields, Tuple ID)

The Beg Tr field is a binary (bit) field that contains 1 when the intended SQL query is at the beginning of the transaction. The Cmd field shows the involved SQL command and takes the values ‘S’, ‘I’, ‘U’, and ‘D’ when the SQL command is Select, Insert, Update, and Delete, respectively. The Count field corresponds to the number of database relation attributes included in the SQL command. The Att [ ] vector contains the numbers of the attributes included in the SQL command, in order. The value of the binary WC field is 1 if the SQL command is derived from a where clause. The End Tr field is a binary field that contains 1 when the SQL query is at the end of the transaction. The Tr ID field corresponds to the identifier of the transaction containing the expected SQL command. The Additional Fields are used in particular cases such as a conditional choice among SQL commands and a loop of SQL queries. Furthermore, the sequence number of the tuple is stored in the Tuple ID field. This component processes transactions with the same identifier only once.
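To make the tuple layout concrete, the following Python sketch builds the nine-field tuples for one transaction. It is a minimal illustration under stated assumptions: the class QueryTuple, the function to_tuples, and the input format (commands whose attribute numbers are already resolved) are hypothetical and not part of the proposed system. A query with a where clause yields two tuples, with the where-clause Select placed first.

```python
from dataclasses import dataclass, field


@dataclass
class QueryTuple:
    beg_tr: int      # 1 if this is the first SQL command of the transaction
    cmd: str         # 'S', 'I', 'U' or 'D'
    count: int       # number of attributes in the command
    att: list        # attribute numbers, in order
    wc: int          # 1 if the command is derived from a where clause
    end_tr: int      # 1 if this is the last SQL command of the transaction
    tr_id: str       # identifier of the enclosing transaction
    additional: dict = field(default_factory=dict)  # conditions/loops (unused here)
    tuple_id: int = 0  # sequence number of the tuple


def to_tuples(tr_id, queries):
    """queries: list of (cmd, attrs, where_attrs); the input format is an assumption."""
    parts = []
    for cmd, attrs, where_attrs in queries:
        if where_attrs:
            parts.append(("S", where_attrs, 1))  # where-clause read comes first
        parts.append((cmd, attrs, 0))
    last = len(parts) - 1
    return [
        QueryTuple(int(i == 0), c, len(a), list(a), w, int(i == last), tr_id,
                   tuple_id=i + 1)
        for i, (c, a, w) in enumerate(parts)
    ]


# A Select on attributes 1, 2, 3 followed by an Update of 1, 2 filtered on attribute 4.
tuples = to_tuples("T1", [("S", [1, 2, 3], []), ("U", [1, 2], [4])])
for t in tuples:
    print(t)
```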

3.2.3 Transaction Specification

As mentioned earlier, the Specification Creator creates a state machine specification for each expected transaction, based on the information derived from the SQL queries within the intended transaction. Here, we use a type of state machine called Extended Finite State Automata (EFSA) [12]. In general, an EFSA is a tuple (Σ, Q, s, f, V, D, δ), where:

• Σ is the event alphabet of the EFSA, where each event is characterized by an event name along with the event arguments.
• Q is a finite set of states of the EFSA.
• s ∈ Q is the start state of the EFSA.
• f ∈ Q is the final state.
• V is a finite tuple (v1, ..., vn) of state variables.
• D is a finite tuple (D1, ..., Dn) of domain values for the variables.
• δ : Q × D × Σ → (Q, D) denotes the transition relation between states.

The transition relation is characterized by rules of the form:

event(x1, ..., xn) | condition → action

where event is an event name and the variables x1, ..., xn represent the arguments of this event. The condition is a condition expression; if it is satisfied, the actions defined by the action expression are performed. An example of a single state transition is depicted in Figure 2.

Figure 2. Example of single state transition

This component creates the state transitions according to the incoming tuples from the Preprocessor, which are associated with the expected SQL commands. The condition expression of each transition checks whether the arguments of the issued event match the values stored in the state variables. In the action expression, the state variables are set to the expected values of the next SQL query.

Here, the state machine uses two groups of state variables. The first group, called query variables, contains information about the involved SQL queries. The second group, called transaction variables, includes information about the transactions and about the relations among SQL queries. Moreover, the state machine applies two types of events, named query and abort. The query event has two arguments (q, TrID), where q contains the information about the issued SQL query, corresponding to the tuple fields created by the Preprocessor, and TrID is a string corresponding to the identifier of the issued transaction. The abort event has the TrID argument and indicates the occurrence of a rollback command within the intended transaction.

The transaction variables are CurrSt, CurrTr, and NextSt. The CurrTr variable contains the identifier of the current transaction and is initialized in the first transition of each state machine. The CurrSt variable corresponds to the current state number, and the NextSt variable contains the numbers of the states that are reachable from the current state.

For example, consider the following transaction:

T1 : Select Name, Family, Number from Employee

Update Table1 set Name = “Joe”, Family = “Petre”

Figure 3 shows the state machine for the above transaction, where the numbers 1, 2, 3, and 4 are assigned to the attributes Name, Family, Number, and Tel, respectively.
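The matching behavior of such a specification can be sketched in Python as follows. This is a simplified illustration, not the state machine of Figure 3: the class LinearSpec is hypothetical, it models only a linear chain of states without conditional branches, loops, or abort events, and each query is reduced to a (command, attribute numbers) pair. The curr_st field plays the role of the CurrSt variable.

```python
class LinearSpec:
    """Linear chain of states; each transition expects one (cmd, attrs) pair."""

    def __init__(self, tr_id, expected):
        self.tr_id = tr_id          # role of CurrTr
        self.expected = expected    # expected query per transition
        self.curr_st = 0            # role of CurrSt

    def query(self, q, tr_id):
        """Handle a query event; return False when the condition fails."""
        if tr_id != self.tr_id or self.curr_st >= len(self.expected):
            return False
        if q != self.expected[self.curr_st]:
            return False            # condition not satisfied: no transition
        self.curr_st += 1           # action: move to the next state
        return True

    def accepted(self):
        """True once the final state has been reached."""
        return self.curr_st == len(self.expected)


# T1: Select on attributes 1, 2, 3, then Update on attributes 1, 2.
T1_SPEC = [("S", (1, 2, 3)), ("U", (1, 2))]

legit = LinearSpec("T1", T1_SPEC)
legit_ok = all(legit.query(q, "T1") for q in T1_SPEC) and legit.accepted()

tampered = LinearSpec("T1", T1_SPEC)
first_ok = tampered.query(("S", (1, 2, 3)), "T1")
second_ok = tampered.query(("D", (1,)), "T1")  # unexpected Delete is rejected
```

A transaction is treated as legitimate only if every query event is accepted and the final state is reached; any rejected event would correspond to an alarm.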

3.2.4 Clustering Transactions into User Tasks

The Rule Preprocessor component uses a training audit log file to extract information about the transactions in the database applications. For this purpose, the component first extracts information such as the start time, end time, and identifier from each transaction and stores it in three separate vectors. These three vectors are then used for clustering the transactions into user tasks with the clustering method in [9]. As mentioned earlier, a user task is a set of transactions that are submitted to the DBMS together to perform a given task of one user. Generally, the time gap between two adjacent tasks of one user is much larger than the gap between two adjacent transactions inside one user task. The clustering algorithm uses this observation to assign transactions to different user tasks. After that, the component sends the sequence of transaction identifiers of each identified user task separately to the Rule Generator.
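The time-gap observation can be illustrated with the short Python sketch below. It does not reproduce the clustering algorithm of [9]; it assumes a single user, transactions sorted by start time, and a fixed threshold max_gap, all of which are simplifications introduced here for illustration.

```python
def cluster_user_tasks(transactions, max_gap):
    """Split one user's transactions into user tasks.

    transactions: list of (start_time, end_time, tr_id), sorted by start_time.
    max_gap: hypothetical threshold; a larger gap starts a new user task.
    """
    tasks, current, last_end = [], [], None
    for start, end, tr_id in transactions:
        if current and start - last_end > max_gap:
            tasks.append(current)   # gap is too long: close the current task
            current = []
        current.append(tr_id)
        last_end = end
    if current:
        tasks.append(current)
    return tasks


log = [(0, 1, "T1"), (2, 3, "T2"), (100, 101, "T3"), (102, 103, "T1")]
user_tasks = cluster_user_tasks(log, max_gap=10)
print(user_tasks)
```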

3.2.5 Inter-transaction Rules Generation

The Rule Generator is used for discovering the inter-transaction dependency rules and inter-transaction sequence rules, which are defined as follows.

Definition 1: An Inter-Transaction Dependency is a set of transactions that are closely related to each other, so that they are often issued together in a database user task. Formally, it is a set of the form {T1, T2, ..., Tn}, n ≥ 2, where T1, ..., Tn are instance transactions belonging to the set T, which consists of the identifiers of all the transactions identified in the database applications of the organization. The order of elements in an inter-transaction dependency is not important. The support value of an inter-transaction dependency is the percentage of the identified user tasks within the training database log that contain this dependency among the intended transactions. An inter-transaction dependency is considered a frequent dependency if its calculated support value is above the predefined support threshold. All the frequent inter-transaction dependencies are stored in the set ITD to be used for generating the inter-transaction dependency rules. The general idea of the Apriori algorithm [13] can be used for discovering the frequent inter-transaction dependencies.

Figure 3. Example of state machine specification for transaction T1

Algorithm 1 presents the procedure for generating frequent inter-transaction dependencies. The input to this algorithm is the sequences of transaction identifiers in the identified user tasks and the identified transaction set T, along with the support threshold s. The maximum number of transactions in a user task is also given as input to this algorithm. In this algorithm, the symbol ‘‖’ denotes the concatenation operator for two sequences.
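The level-wise procedure of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: dependencies are modeled as frozensets because their order is not important, support is computed as the fraction of user tasks containing all transactions of a set, and the step that deletes specifications of transactions with zero support is omitted. The function names are hypothetical.

```python
def support(itemset, user_tasks):
    """Fraction of user tasks that contain every transaction in itemset."""
    return sum(1 for ut in user_tasks if itemset <= set(ut)) / len(user_tasks)


def frequent_dependencies(user_tasks, transactions, s, n):
    """Return all frequent dependencies of size 2..n (the set ITD)."""
    # L1: frequent single transactions.
    levels = [{frozenset([t]) for t in transactions
               if support(frozenset([t]), user_tasks) > s}]
    while levels[-1] and len(levels) < n:
        # Candidates: extend each frequent dependency by one frequent transaction.
        candidates = {itd | ft for itd in levels[-1] for ft in levels[0]
                      if not ft <= itd}
        levels.append({c for c in candidates if support(c, user_tasks) > s})
    return set().union(*levels[1:])


uts = [["T1", "T2", "T3"], ["T1", "T2"], ["T1", "T2", "T4"], ["T3", "T4"]]
itd = frequent_dependencies(uts, ["T1", "T2", "T3", "T4"], s=0.5, n=4)
print(itd)
```

In this toy log only T1 and T2 exceed the 0.5 support threshold individually, so the only frequent dependency of size two or more is {T1, T2}.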

Definition 2: An Inter-Transaction Dependency Rule with support value s and confidence value c is defined as a rule of the form:

{T1, T2, ..., Tj} → {Tj+1, Tj+2, ..., Tn} [s, c], n ≥ 2

where T1, ..., Tn are instance transactions belonging to the set T, which consists of all the specified transactions in the organization. This rule states that if transactions T1, ..., Tj are submitted to the database system through a user task, the transactions Tj+1, ..., Tn are most probably issued to the database within this user task as well. The rules are generated from the frequent inter-transaction dependencies defined above. The support value s of a rule is the percentage of the identified user tasks within the training database log that contain the dependency from which the rule is generated. The confidence value c of a rule is the percentage of the identified user tasks containing the antecedent of the rule that also contain its consequent. To generate the inter-transaction dependency rules, we first extract a set C ITDR of candidate rules from the frequent dependencies in the set ITD and calculate the confidence value of each of them. Algorithm 2 contains the steps for generating inter-transaction dependency rules. The sequences of transaction identifiers within the identified user tasks and the identified transaction set T are given as input to the algorithm.
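A minimal Python sketch of this rule generation step is given below. It assumes the standard association rule measures (confidence computed as the support of the whole dependency divided by the support of the antecedent) and a strict comparison against the confidence threshold; the function names and the rule representation as (antecedent, consequent, support, confidence) are hypothetical and do not reproduce Algorithm 2 step by step.

```python
from itertools import combinations


def support(itemset, user_tasks):
    """Fraction of user tasks that contain every transaction in itemset."""
    return sum(1 for ut in user_tasks if itemset <= set(ut)) / len(user_tasks)


def dependency_rules(itd, user_tasks, c):
    """Generate rules antecedent -> consequent from frequent dependencies."""
    rules = []
    for dep in itd:
        s_dep = support(dep, user_tasks)
        # Every non-empty proper subset of the dependency is a candidate antecedent.
        for k in range(1, len(dep)):
            for items in combinations(sorted(dep), k):
                ante = frozenset(items)
                conf = s_dep / support(ante, user_tasks)
                if conf > c:
                    rules.append((ante, dep - ante, s_dep, conf))
    return rules


uts = [["T1", "T2", "T3"], ["T1", "T2"], ["T1", "T2", "T4"], ["T3", "T4"]]
rules = dependency_rules({frozenset({"T1", "T2"})}, uts, c=0.9)
for ante, cons, s, conf in rules:
    print(sorted(ante), "->", sorted(cons), [s, conf])
```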

Definition 3: An Inter-Transaction Sequence consists of transactions that are often accessed in sequence within a user task of the database application. Formally, an inter-transaction sequence is represented as < T1, T2, ..., Tn >, n ≥ 2, where T1, ..., Tn are instance transactions belonging to the set T, which contains the specified transactions. If the support value of an inter-transaction sequence is above the predefined support threshold, the sequence is selected as a frequent sequence and stored in the set ITS to be used for generating the inter-transaction sequence rules. We can use the sequential pattern mining algorithm AprioriAll in [14] for discovering the frequent inter-transaction sequences.
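The difference between a dependency and a sequence lies in how support is counted: for a sequence, the order of transactions matters. The Python sketch below illustrates this under the assumption that a user task supports a sequence when the sequence occurs in it as an ordered, not necessarily contiguous, subsequence, as in sequential pattern mining. The function names are hypothetical, and the candidate generation of AprioriAll is not shown.

```python
def contains_sequence(user_task, seq):
    """True if seq occurs in user_task in order (gaps are allowed)."""
    it = iter(user_task)
    # 'in' on an iterator consumes it, so each match must follow the previous one.
    return all(t in it for t in seq)


def sequence_support(seq, user_tasks):
    """Fraction of user tasks that contain seq as an ordered subsequence."""
    return sum(contains_sequence(ut, seq) for ut in user_tasks) / len(user_tasks)


uts = [["T1", "T2"], ["T2", "T1"], ["T1", "T3", "T2"], ["T3"]]
print(sequence_support(["T1", "T2"], uts))
```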

Definition 4 : An Inter-Transaction SequenceRule is a rule with one of two forms:

< T1, T2, . . . , Tj > −→ < Tj+1, Tj+2, . . . , Tn > [s, c],n ≥ 2

< T1, T2, . . . , Tj >←− < Tj+1, Tj+2, . . . , Tn > [s, c],n ≥ 2

Where T1, . . ., Tn are instance transactions belong-ing to the set T that consists of the identifier of allthe existing transactions in the organization. The firstrule determines that

If transactions T1, . . ., Tj are issued to the databasesystem with the mentioned order among them within auser task, the transaction sequence < Tj+1, . . . , Tn >is accessed within this user task afterward, with sup-port text s and confidence c. The second rule repre-sents that if transactions Tj+1, . . . , Tn are submittedto the database system as a sequence through a usertask, the transactions T1, . . ., Tj are accessed in se-

ISeCure

Page 7: A Hybrid Approach for Database Intrusion Detection at ... › article_39158_f18c739d9... · a detection method that is based on anomaly detection and uses data mining to nd dependency

July 2014, Volume 6, Number 2 (pp. 155–167) 161

Algorithm 1 Inter-Transaction Dependency Generation Algorithm

Input: The set UT s consisting of sequences of transaction identifiers existing in legitimate identified user tasks,identified transaction set T , support threshold s, maximum number of transactions in user task with inputargument n.Output: The frequent inter-transaction dependency set ITD.

1: Initialize inter-transaction dependency set ITD, i.e., ITD = {}2: for j = 1,...,n do3: Lj = {}, cj = {}4: end for5: for each transaction Ti ∈ T do6: if Support(Ti) > s then7: L1 = L1 ∪ {Ti}8: else9: if Support(Ti)=0 then

10: Delete stored specification for transaction Ti11: end if12: end if13: end for14: i =215: for each itd ∈ Li−1 and frequent transaction ft ∈ L1 do16: ci = ci ∪ {itd||ft}17: end for18: for each itdc ∈ ci do19: if Support(itdc) > s then20: Li = Li ∪ {itdc}21: end if22: end for23: if Li 6= {} then24: i = i+ 125: Jump to step 1526: else27: for j=2,..., i-1 do28: for each itd ∈ Lj do29: ITD = ITD ∪ {itd}30: end for31: end for32: end if

quence before them within this user task, with supports and confidence c. The inter-transaction sequencerules are generated using the same approach of gen-erating the inter-transaction dependency rules withthe difference that the frequent inter-transaction se-quences are used for generating them. The generatedinter-transaction sequence rules are placed into theset ITSR.

Algorithm 3 illustrates the generation of inter-transaction sequence rules. The input to this algorithm is the set of sequences of transaction identifiers in the identified user tasks and the identified transaction set T, along with the confidence threshold c. The generated dependency rule set ITDR is also given as input so that it can be modified based on the generated sequence rules. In this algorithm, ITDR_{T1,T2,...,Tj} denotes the set of inter-transaction dependency rules whose antecedents consist of the transactions T1, T2, ..., Tj. Likewise, ITSR_{T1,T2,...,Tj} denotes the set of inter-transaction sequence rules whose antecedent is the transaction sequence <T1, T2, ..., Tj>.
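The confidence test at the core of Algorithm 3 can be sketched as follows. The sketch assumes that the support of a sequence is the fraction of user tasks containing it as a (not necessarily contiguous) subsequence, in the spirit of sequential pattern mining; the function names and the tuple encoding of rules are hypothetical, and the later pruning and grouping steps of Algorithm 3 are left out.

```python
from typing import List, Sequence, Tuple

Rule = Tuple[Tuple[str, ...], str, Tuple[str, ...]]


def is_subsequence(pattern: Sequence[str], task: Sequence[str]) -> bool:
    """True if pattern occurs in task in order (gaps allowed)."""
    it = iter(task)
    return all(t in it for t in pattern)


def seq_support(pattern: Sequence[str], user_tasks: Sequence[Sequence[str]]) -> float:
    """Assumed definition: fraction of user tasks containing pattern as a subsequence."""
    if not user_tasks:
        return 0.0
    return sum(is_subsequence(pattern, ut) for ut in user_tasks) / len(user_tasks)


def sequence_rules(its: Sequence[str],
                   user_tasks: Sequence[Sequence[str]],
                   c: float) -> List[Rule]:
    """Split a frequent sequence at every position j and keep rules above confidence c."""
    whole = seq_support(its, user_tasks)
    rules: List[Rule] = []
    if whole == 0:
        return rules  # not a frequent sequence; nothing to generate
    for j in range(1, len(its)):
        a, b = tuple(its[:j]), tuple(its[j:])
        if whole / seq_support(a, user_tasks) > c:
            rules.append((a, "->", b))  # A is followed by B
        if whole / seq_support(b, user_tasks) > c:
            rules.append((a, "<-", b))  # B is preceded by A
    return rules
```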

3.3 Detection Module

The Detection module identifies malicious transactions and abnormal user tasks based on the existing transaction specifications and the existing inter-transaction rules, respectively. For this purpose, the Detection module examines the SQL queries issued by database applications in the form of transactions and user tasks stored in the database log file. In this module, the Detection Preprocessor extracts the information of each newly issued transaction immediately after observing it in the log file and sends this information to the Specification Matcher as tuples of the form introduced for the Specification Preprocessor component (Section 3.2.2). Also, after identifying any user task consisting of legitimate transactions, the component forwards its sequence of transaction identifiers to the Rule Matcher.

Algorithm 2 Inter-Transaction Dependency Rule Generation Algorithm

Input: The set UTs consisting of the sequences of transaction identifiers in legitimate identified user tasks, identified transaction set T, confidence threshold c.
Output: Inter-transaction dependency rules set ITDR.

 1: Initialize inter-transaction dependency set ITD, i.e., ITD = {}
 2: Initialize inter-transaction dependency rules set ITDR, i.e., ITDR = {}
 3: Generate frequent inter-transaction dependencies using the frequent itemset mining algorithm Apriori [13] from the set UTs and store them into the set ITD.
 4: for each frequent inter-transaction dependency itd ∈ ITD do
 5:     for each set d ⊆ itd do
 6:         if Support(itd) / Support(d) > c then
 7:             ITDR = ITDR ∪ {d → itd − d}
 8:         end if
 9:     end for
10: end for
11: for each rule r ∈ ITDR do
12:     if there exists a rule r* ∈ ITDR whose antecedent is equal to r's antecedent and r's consequence is a subset of its consequence then
13:         ITDR = ITDR − {r}
14:     end if
15: end for
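Steps 4 to 15 of Algorithm 2 can be sketched as below, under stated assumptions: support is taken as the fraction of user tasks containing all transactions of a set, only non-empty proper subsets are tried as antecedents, and the pruning step is read as removing a rule when another rule with the same antecedent has a strictly larger consequence. The names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Sequence, Set, Tuple

DepRule = Tuple[FrozenSet[str], FrozenSet[str]]  # (antecedent, consequence)


def support(itemset: FrozenSet[str], user_tasks: Sequence[Sequence[str]]) -> float:
    """Assumed definition: fraction of user tasks containing every transaction in itemset."""
    if not user_tasks:
        return 0.0
    return sum(1 for ut in user_tasks if itemset <= set(ut)) / len(user_tasks)


def dependency_rules(itd: Iterable[FrozenSet[str]],
                     user_tasks: Sequence[Sequence[str]],
                     c: float) -> Set[DepRule]:
    rules: Set[DepRule] = set()
    for dep in itd:
        sup_dep = support(dep, user_tasks)
        for k in range(1, len(dep)):  # non-empty proper subsets only
            for ante in map(frozenset, combinations(sorted(dep), k)):
                sup_ante = support(ante, user_tasks)
                if sup_ante > 0 and sup_dep / sup_ante > c:
                    rules.add((ante, dep - ante))
    # Pruning: drop a rule subsumed by one with the same antecedent
    # and a strictly larger consequence
    return {(a, q) for (a, q) in rules
            if not any(a == a2 and q < q2 for (a2, q2) in rules)}
```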

The Specification Matcher matches new transactions against the existing specifications corresponding to them. To do so, the component uses the transaction information received from the Preprocessor as tuples and the stored specification, which is a state machine corresponding to the transaction identifier. For each SQL query within the transaction, a state transition must be made in the corresponding state machine. The transaction is considered a legitimate user transaction if the state machine is in its final state at the end of the matching operation.
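The matching operation can be illustrated with a small state machine. The sketch below is only an assumption about how such a specification might be encoded: each query is abstracted to a hypothetical (SQL command, table) pair, whereas the actual tuple form is the one defined in Section 3.2.2, and the class and attribute names are invented for illustration.

```python
from typing import Dict, Iterable, Tuple

Query = Tuple[str, str]  # hypothetical abstraction: (SQL command, table name)


class TransactionSpec:
    """A transaction specification as a deterministic state machine."""

    def __init__(self, start: int, final: int,
                 transitions: Dict[Tuple[int, Query], int]) -> None:
        self.start = start
        self.final = final
        self.transitions = transitions

    def accepts(self, queries: Iterable[Query]) -> bool:
        """Make one transition per query; legitimate only if we end in the final state."""
        state = self.start
        for q in queries:
            nxt = self.transitions.get((state, q))
            if nxt is None:
                return False  # no matching transition: the transaction deviates
            state = nxt
        return state == self.final
```

For example, a specification with transitions `{(0, ("SELECT", "accounts")): 1, (1, ("UPDATE", "accounts")): 2}` and final state 2 accepts a SELECT followed by an UPDATE on `accounts`, and rejects a transaction that stops after the SELECT or issues any other query.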

The Rule Matcher matches the user tasks against the existing inter-transaction rules. For this, the component uses the sequence of transaction identifiers within each new user task. To perform the matching operation, the component first selects the rules whose antecedents appear in the user task. Then, the selected rules are grouped into sets based on their antecedents so that each set contains the rules with the same antecedent. If the user task sequence satisfies at least one rule from each set, the user task is considered to be a normal user task.
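The grouping and matching criterion described in this paragraph can be sketched as follows. The sketch is restricted to dependency rules, encodes a rule as a hypothetical (antecedent set, consequence set) pair, and treats a rule as satisfied when all transactions of its consequence occur in the user task; sequence rules would additionally need an order check.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Sequence, Tuple

DepRule = Tuple[FrozenSet[str], FrozenSet[str]]  # (antecedent, consequence)


def is_normal(user_task: Sequence[str], rules: Iterable[DepRule]) -> bool:
    seen = set(user_task)
    # Select rules whose antecedent appears in the task, grouped by antecedent
    groups: Dict[FrozenSet[str], List[FrozenSet[str]]] = defaultdict(list)
    for antecedent, consequence in rules:
        if antecedent <= seen:
            groups[antecedent].append(consequence)
    # Normal if at least one rule of every selected group is satisfied
    return all(any(cons <= seen for cons in conss) for conss in groups.values())
```

A user task that triggers no rule at all is reported as normal by this sketch, since there is no selected group to violate.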

Generally, the detection process consists of the following three steps:

• Step 1: the Detection Preprocessor performs the following actions, in order:
    ◦ Extract transaction information after observing each newly issued transaction in the current log file.
    ◦ Send this information to the Specification Matcher.
    ◦ Forward the sequence of transaction identifiers to the Rule Matcher after identifying any user task consisting of legitimate transactions.

• Step 2: the Specification Matcher matches each newly issued transaction against the corresponding specification and raises an alarm upon observing an inconsistency.

• Step 3: the Rule Matcher matches the user tasks against the existing inter-transaction rules.

The algorithm for detecting malicious behaviors at the user task level is shown in Algorithm 4. The sequence of transaction identifiers in each new user task is given as input to the algorithm.

Algorithm 3 Inter-transaction Sequence Rule Generation Algorithm

Input: The set UTs consisting of the sequences of transaction identifiers in the legitimate user tasks, generated inter-transaction dependency rules set ITDR, identified transaction set T, confidence threshold c.
Output: Inter-transaction sequence rules set ITSR, modified inter-transaction dependency rules set ITDR.

 1: Initialize frequent inter-transaction sequences set ITS, i.e., ITS = {}
 2: Initialize inter-transaction sequence rules set ITSR, i.e., ITSR = {}
 3: Generate frequent inter-transaction sequences using the sequential pattern mining algorithm AprioriAll [14] from the set UTs and store them into the set ITS.
 4: for each sequence its = <T1, T2, ..., Tn> ∈ ITS do
 5:     for j = 1, ..., n − 1 do
 6:         for each two subsequences A = <T1, T2, ..., Tj>, B = <Tj+1, Tj+2, ..., Tn> do
 7:             if (Support(<A, B>) / Support(A)) > c then
 8:                 ITSR = ITSR ∪ {A → B}
 9:             end if
10:             if (Support(<A, B>) / Support(B)) > c then
11:                 ITSR = ITSR ∪ {A ← B}
12:             end if
13:         end for
14:     end for
15: end for
16: for each rule r ∈ ITDR do
17:     if there exists a rule r* ∈ ITSR whose antecedent and consequence are equal to r's antecedent and r's consequence, respectively, then
18:         ITDR = ITDR − {r}
19:     end if
20: end for
21: for each rule r ∈ ITSR do
22:     if there exists a rule r* ∈ ITSR whose antecedent is equal to r's antecedent and r's consequence is a subsequence of its consequence then
23:         ITSR = ITSR − {r}
24:     end if
25: end for
26: for each rule r = <T1, T2, ..., Tj> → <Tj+1, Tj+2, ..., Tn> ∈ ITSR or rule r = <T1, T2, ..., Tj> ← <Tj+1, Tj+2, ..., Tn> ∈ ITSR do
27:     ITSR_{T1,T2,...,Tj} = ITSR_{T1,T2,...,Tj} ∪ {r}
28: end for
29: for each rule r = <T1, T2, ..., Tj> → <Tj+1, Tj+2, ..., Tn> ∈ ITDR do
30:     ITDR_{T1,T2,...,Tj} = ITDR_{T1,T2,...,Tj} ∪ {r}
31: end for
32: for each set s = ITSR_{T1,T2,...,Tn} ⊆ ITSR do
33:     Assign an identifier to the set s
34: end for
35: for each set s = ITDR_{T1,T2,...,Tn} ⊆ ITDR do
36:     Assign an identifier to the set s
37: end for

Algorithm 4 Detection Algorithm at User Task Level

Input: The set UT consisting of the sequence of transaction identifiers in the new user task ut, inter-transaction sequence rules set ITSR and the identifiers assigned to the subsets of ITSR, inter-transaction dependency rules set ITDR and the identifiers assigned to the subsets of ITDR.
Output: Status of user task ut ("user task ut is normal" or "user task ut is abnormal").

 1: Initialize candidate rules set CR, i.e., CR = {}
 2: for each set ITDR_{T1,T2,...,Ti} ⊆ ITDR with the identifier k do
 3:     if the transactions T1, T2, ..., Ti appear in the transaction sequence of user task ut then
 4:         CR = CR ∪ {k}
 5:     end if
 6: end for
 7: for each set ITSR_{T1,T2,...,Ti} ⊆ ITSR with the identifier k do
 8:     if the transaction sequence <T1, T2, ..., Ti> appears in the transaction sequence of user task ut then
 9:         CR = CR ∪ {k}
10:     end if
11: end for
12: for each set R ⊆ ITDR or set R ⊆ ITSR with the identifier k ∈ CR do
13:     if there exists a rule r ∈ R that is not satisfied by ut then
14:         Consider ut as an abnormal user task and raise an alarm
15:     else
16:         Consider ut as a normal user task and store ut into the normal UTs storage
17:     end if
18: end for

4 Evaluations and Results

We analyze our proposed method both experimentally and qualitatively. In the first part, we present the results of several experiments conducted to evaluate the accuracy of the proposed method. In the second part, we compare our method with previous methods in qualitative terms.

4.1 Experimental Evaluation

Several experiments have been conducted to evaluate the effectiveness of the proposed database intrusion detection system. We used the C# programming language with .NET Framework 4.5 for a preliminary implementation of the proposed system. A training database log file, containing 106 normal user tasks with an average of 5 legitimate transactions per user task, was used in the Analysis and Generation module. To generate this database log, we considered an example organization with 6 database applications. These applications were written in C# (.NET Framework) and Java and were connected to Microsoft SQL Server 2012 database systems. The database applications contained 45 different types of user tasks and 10 different types of transactions. The transactions are issued by different applications in the form of user tasks (at least 4 transactions per user task), and identical transactions share the same transaction identifier. The user tasks in this log were generated randomly based on real scenarios among the database applications in the example organization.

In addition, we used four different test data sets as database logs in the Detection module. These data sets consist of a number of normal user tasks along with several malicious user tasks that contained either malicious transactions or abnormal relations among transactions. The numbers of user tasks in these data sets were 500, 1000, 10^4, and 10^5, respectively, and the percentages of anomalous user tasks were 5%, 10%, 15%, and 20%. The malicious user tasks in these data sets were very close to the normal user tasks, with only one anomalous relation among transactions in each malicious user task; therefore, the conducted evaluations represent the accuracy of the proposed system in the worst case.

To evaluate the system, we analyze the experimental results using the standard measures of True Positive Rate (Recall), False Positive Rate, and Precision. These measures are defined as follows:

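\[ \text{True Positive Rate} = \frac{\#\text{TruePositive}}{\#\text{TruePositive} + \#\text{FalseNegative}} \]

\[ \text{False Positive Rate} = \frac{\#\text{FalsePositive}}{\#\text{FalsePositive} + \#\text{TrueNegative}} \]

\[ \text{Precision} = \frac{\#\text{TruePositive}}{\#\text{TruePositive} + \#\text{FalsePositive}} \]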

Here, #TruePositive is the number of intended attacks that the proposed system detects, and #TrueNegative is the number of genuine events that the proposed system considers genuine. #FalsePositive and #FalseNegative are the number of false alarms raised by the system and the number of malicious events identified as genuine, respectively.
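The three measures follow directly from the four counts. The helper below is a small sketch; the function name is hypothetical, and the counts in the usage example are invented for illustration and are not results from the experiments.

```python
from typing import Dict


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute true positive rate (recall), false positive rate, and precision."""
    return {
        "true_positive_rate": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
        "precision": tp / (tp + fp),
    }


# Hypothetical counts for a data set of 1000 user tasks, 10% of them anomalous
metrics = detection_metrics(tp=85, fp=45, tn=855, fn=15)
```

With these illustrative counts, the true positive rate is 85/100, the false positive rate is 45/900, and the precision is 85/130.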

In the experiments, the support threshold was first set to 0.1, and the true positive rate, false positive rate, and precision were calculated for confidence thresholds from 0.5 to 1. Then, the confidence threshold was fixed at 0.9, and the same measures were calculated for support thresholds from 0.05 to 0.5.


Figure 4 illustrates the relationship between the confidence threshold of rules and the true positive rate of the proposed system on the four test data sets, and Figure 5 illustrates the relationship between the confidence threshold of rules and the false positive rate. Figure 6 shows precision as a function of the confidence threshold of rules. From Figures 4, 5, and 6, it can be seen that the false positive rate and precision are very sensitive to variations in confidence, while the true positive rate is much less affected by these variations.

Figure 7 illustrates the relationship between the support threshold of rules and the true positive rate of the proposed system on the four data sets, and Figure 8 illustrates the relationship between the support threshold of rules and the false positive rate. Figure 9 shows precision as a function of the support threshold of rules. From Figures 7, 8, and 9, it can be observed that the true positive rate and precision are very sensitive to changes in support, whereas the false positive rate is much less affected by these changes.

From Figures 4, 5, 7, and 8, it can be seen that the true positive rate is higher than the false positive rate at every corresponding point; that is, in all cases the proposed system is far more likely to raise an alarm for a malicious user task than for a normal one. This implies that the proposed system has high accuracy and effectiveness in detecting malicious database behaviors at both the transaction and user task levels.

From the results in Figures 4 to 9, it can be concluded that the ideal range of the confidence threshold is [0.83, 1] and the desired range of the support threshold is from 0.05 to 0.115.

4.2 Qualitative Analysis

For the qualitative comparison, we consider the methods in [1, 2, 6, 9, 10], which are among the most-cited database IDS methods and follow different approaches.

Profile granularity can be divided into four levels: user, role, application, and organization. Hu et al. [6, 9] keep profiles at the organization level, whereas Chung et al. [1] and Bertino et al. [2] represent their profiles at the user and role levels, respectively. The normal profiles of the method in [10] are captured at the application level. Unlike these methods, our proposed method can be used at both the application and organization levels and can be easily extended to the user or role level. It is important that the profile granularity level can be chosen according to the type of organization, and our method is scalable enough to be used at different levels accordingly.

Figure 4. Relation between true positive rate and confidence threshold of rules for four data sets

Figure 5. Relation between false positive rate and confidence threshold of rules for four data sets

Our proposed method is capable of identifying malicious database behaviors at the transaction and inter-transaction levels as well as many types of SQL injection attacks, whereas previous methods are mostly used for identifying attacks at only one of these levels. For instance, Chung et al. [1], Bertino et al. [2], and Lee et al. [10] propose methods to identify attacks at the SQL query level. Hu et al. [6] concentrate on transaction-level attacks, whereas the method proposed by Hu et al. [9] deals only with anomalous behaviors in relations among transactions.

Furthermore, as mentioned earlier, our method is based on specifications at the query and transaction levels, which describe transactional features formally and precisely. As a result, our method produces fewer false alarms than other methods that use a frequent attribute set (Chung et al. [1]), tuple (Bertino et al. [2]), attribute dependency (Hu et al. [6, 9]), or query fingerprint (Lee et al. [10]) to represent a transaction.


Figure 6. Relation between precision and confidence threshold of rules for four data sets

Figure 7. Relation between true positive rate and support threshold of rules for four data sets

5 Conclusions

In this paper, we proposed a novel IDS to detect malicious behaviors at both the transaction and inter-transaction (user task) levels in database systems. For this purpose, a specification-based method was first offered at the transaction level, by which the expected transactions of the existing database applications in the organization are specified. These specifications are used to detect malicious transactions, since malicious transactions cannot be matched to any of the specifications. Then, at the user task level, an anomaly-based method was presented, which uses data mining to discover dependency and sequence rules describing the relations among the identified transactions. The discovered rules are used to detect malicious relations among new transactions at the user task level.

The advantage of the proposed system is that it can detect malicious behaviors at both the transaction and inter-transaction levels. The system uses a hybrid method (specification-based detection and anomaly detection) to decrease both false positive and false negative alarms. In addition, it can be employed at both the application and organization granularity levels. The experimental results show that the system is highly accurate when the confidence and support thresholds are chosen within the desired ranges.

Figure 8. Relation between false positive rate and support threshold of rules for four data sets

Figure 9. Relation between precision and support threshold of rules for four data sets

References

[1] C.Y. Chung, M. Gertz, and K. Levitt. DEMIS: A Misuse Detection System for Database Systems. In Third Annual IFIP TC-11 WG 11.5 Working Conference on Integrity and Internal Control in Information Systems, pages 159–178, 1999.

[2] E. Bertino, A. Kamra, and E. Terzi. Intrusion Detection in RBAC-administered Databases. In Proceedings of the 21st Annual Computer Security Applications Conference (ACSAC), pages 170–182, 2005.

[3] A. Kamra, E. Bertino, and E. Terzi. Detecting Anomalous Access Patterns in Relational Databases. The International Journal on Very Large Data Bases, 17(5):1063–1077, 2008.

[4] U.P. Rao, G.J. Sahani, and D.R. Patel. Machine Learning Proposed Approach for Detecting Database Intrusions in RBAC Enabled Databases. In The International Conference on Computing Communication and Networking Technologies (ICCCNT), pages 1–4, 2010.

[5] Y. Hu and B. Panda. Identification of Malicious Transactions in Database Systems. In Proceedings of the International Database Engineering and Applications Symposium (IDEAS '03), pages 329–335, 2003.

[6] Y. Hu and B. Panda. A Data Mining Approach for Database Intrusion Detection. In Proceedings of the ACM Symposium on Applied Computing, pages 711–716, 2004.

[7] A. Srivastava, S. Sural, and A.K. Majumdar. Weighted Intra-transactional Rule Mining for Database Intrusion Detection. In Proceedings of the Pacific-Asia Knowledge Discovery and Data (PAKDD), pages 611–620, 2006.

[8] V.C.S. Lee, J.A. Stankovic, and S.H. Son. Intrusion Detection in Real-time Database Systems Via Time Signatures. In Proceedings of the 6th IEEE Real Time Technology and Application Symposium (RTAS), pages 124–133, 2000.

[9] Y. Hu and B. Panda. Mining Inter-transaction Data Dependencies for Database Intrusion Detection. In Innovations and Advances in Computer Sciences and Engineering, pages 67–72, 2010.

[10] W.L. Low, J. Lee, and P. Teoh. DIDAFIT: Detecting Intrusions in Databases Through Fingerprint Transactions. In Proceedings of the 4th International Conference on Enterprise Information Systems, pages 264–269, 2002.

[11] E. Bertino, A. Kamra, and J.P. Early. Profiling Database Application to Detect SQL Injection Attacks. In IEEE International Performance Computing and Communications Conference (IPCCC), pages 449–458, 2007.

[12] R. Sekar, A. Gupta, and J. Frullo. Specification-based Anomaly Detection: A New Approach for Detecting Network Intrusions. In Proceedings of the 9th ACM Conference on Computer and Communications Security, pages 265–274, 2002.

[13] R. Agrawal, T. Imieliński, and A. Swami. Mining Association Rules Between Sets of Items in Large Databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, pages 207–216, 1993.

[14] R. Agrawal and R. Srikant. Mining Sequential Patterns. In Proceedings of the 11th International Conference on Data Engineering, pages 3–14, 1995.

Mostafa Doroudian received the B.S. degree in computer engineering (software) from Mazandaran University of Science and Technology, Mazandaran, Iran, in 2010, and the M.S. degree in information technology engineering (information security) from Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran, in 2012. His research interests include database security, intrusion detection, network security, computer forensics, and alert correlation systems.

Hamid Reza Shahriari is currently an assistant professor at the Department of Computer Engineering and Information Technology at Amirkabir University of Technology, Tehran, Iran. He received his Ph.D. in computer engineering from Sharif University of Technology in 2007. His research interests include information security, especially vulnerability analysis, security in e-commerce, trust and reputation models, and database security.
