Bribes EGovREPORT

  • Business Intelligence Using Data Mining: Bribe Payments For Land Registrations

    Submitted By: Hussain Boltwala 61210213

    Karthik Vemparala 61210505

    Naveen Kumar HS 61210144

    Salman Siddiqui 61210626

    Smita Chakravorty 61210558

  • BIDM Bribing Behaviour for e-Governance Services

    INTRODUCTION
    PROBLEM STATEMENT
    DATA PREPARATION AND VISUALIZATION
    THE PREDICTION METHOD
    CLASSIFICATION TREES
    K-NEAREST NEIGHBOUR
    NAÏVE BAYES
    CONCLUSION & FURTHER ANALYSIS

    Introduction

    The project is based on data collected over a period of time from customers who have used e-Governance services for the land registration process. The resulting framework will be useful for intermediaries, who can target customers based on demographic criteria. These intermediaries can charge a fee that is typically lower than the bribe paid and provide a convenient, fast service to the people most susceptible to paying bribes. This is similar to the freelance notaries outside court houses who charge customers a fee for guiding them through a legal process. The framework will also provide insights into customer behaviour and the effectiveness of e-Governance initiatives.

    This project also analyses the relationship between customers who paid bribes and differentiating factors such as age, level of education and place that contribute significantly to the payment of bribes. Our analysis is based on Land Registration transactions carried out in Delhi, Haryana and Gujarat. Data was collected via a handwritten survey, with the people availing the service being interviewed. This has resulted in a lot of misclassified data, and the group has endeavoured to clean and interpret as many data points as possible to ensure that a robust model is obtained.

    Problem Statement

    Predict whether a person availing the e-Governance Service will pay a bribe of over INR 100.
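
    The problem statement amounts to a binary target. A minimal Python sketch of that encoding follows, assuming the bribe amount is recorded in INR and that class 1 denotes a bribe of over INR 100 (the 1/0 class coding matches the classes in the scoring reports of this document). The function name and the sample amounts are hypothetical and are not survey data.

```python
BRIBE_THRESHOLD_INR = 100  # threshold taken from the problem statement


def bribe_over_threshold(bribe_amount_inr, threshold=BRIBE_THRESHOLD_INR):
    """Return 1 if the bribe paid is strictly over the threshold, else 0."""
    return 1 if bribe_amount_inr > threshold else 0


# Hypothetical bribe amounts in INR, for illustration only (not survey data).
sample_amounts = [0, 50, 100, 150, 2000]
labels = [bribe_over_threshold(amount) for amount in sample_amounts]
print(labels)
```

    Note that a bribe of exactly INR 100 falls in class 0 under this reading of "over INR 100".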

    Data Preparation and Visualization

    In order to better understand the key predictors of susceptibility to bribing behaviour, different metrics were analysed against whether a bribe was paid (categorical) and the amount of bribe paid (numerical). Some insights are presented below:

    Income bracket codes:
    1 Below Rs. 500
    2 Rs. 500-1000
    3 Rs. 1000-2999
    4 Rs. 3000-4999
    5 Rs. 5000-6999
    6 Rs. 7000-9999
    7 More than Rs. 10,000

    The amount of bribe paid by people in the higher income brackets (Rs. 7000-9999 and more than Rs. 10,000) is higher in both Delhi and Haryana. This could indicate that affluent people are generally targeted by officials.

    Gujarat appeared to have the weakest bribing culture, whereas Delhi and Haryana fared badly on most markers. This is also depicted in the bar chart below, which shows the number of people who paid bribes (code 1, in pink) vs. those who did not (code 2, in blue). Gujarat has the largest number of non-bribe payers.

    From the above plot, we see that most people in Delhi and Haryana have paid bribes between Rs 100-200.

    In Delhi, the number of people who paid a bribe was higher among those who availed the services offered by the TCC less frequently. In Haryana, however, there is no information on service-availing frequency and bribing pattern. The total amount of bribe paid also increases when the services are availed less frequently, as seen from the bar graph below.

    Frequency of availing the service (codes):
    1 Once in 3 Months
    2 Once in 6 Months
    3 Once in a Year
    4 Less than once a year
    5 Others

    In Delhi, more people paid a bribe on their first trip, and this number decreased as the number of trips to the TCC increased. Haryana does not follow any discernible pattern.

    It may be that people who frequented the office at least once every 3 months and made more than one trip paid very little in bribes. This may indicate that people who have a high level of familiarity (and perhaps have built relationships with officials) do not pay much to get their work done. Alternatively, they may simply be unable or unwilling to pay a bribe and hence have to make more trips to avail the same services.

    If we look at box plots of the age of an individual against whether s/he has paid a bribe greater than Rs. 100, we do not see any discernible pattern.

    But if we plot the bribe amount and classify it by age bracket, we find that elderly people mostly end up paying bribes of less than Rs. 200.

    Education level codes:
    1 Illiterate
    2 Literate without Education
    3 Below Primary
    4 Primary
    5 Middle
    6 Matric/Secondary
    7 Higher Secondary/Intermediate
    8 Non-Technical Diploma
    9 Technical Diploma
    10 Graduate & Above
    11 Others

    The median amount of bribe paid remains between Rs. 100 and Rs. 150 across education levels, the only exception being individuals who are literate without education. The amount of bribe paid by this group is higher.
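
    A comparison of medians by education level can be computed along the following lines. This is a minimal Python sketch: the records are hypothetical placeholders keyed by the education codes listed above, not survey data, and all variable names are assumptions.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (education_code, bribe_amount_inr) pairs; NOT survey data.
records = [(2, 300), (2, 500), (6, 100), (6, 150), (6, 120), (10, 100), (10, 140)]

# Group the bribe amounts by education code.
amounts_by_level = defaultdict(list)
for education_code, amount in records:
    amounts_by_level[education_code].append(amount)

# Median bribe amount per education code, in ascending code order.
median_by_level = {
    code: median(amounts) for code, amounts in sorted(amounts_by_level.items())
}
print(median_by_level)
```

    The median is used rather than the mean because a few very large bribes would otherwise dominate a group's summary.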

    The above plot indicates that people in semi-urban areas generally paid much more in bribes than those in either rural or urban areas.

    It came as no surprise that larger pockets of land attracted relatively higher bribes.

    Distance from the Land Registry office did not seem to play any significant role in the bribing patterns. Wage loss, however, did: the higher the loss of wages, the higher the bribe amount.

    Whilst the total cost of availing the service was seen as an important aspect, it was ultimately ruled out because it is the total amount paid by the user, including the land registry charges.

    Surprisingly, the amounts of bribes were not closely tied to satisfaction levels, with Delhi and Haryana reporting the most data. This could indicate that bribing is considered a part of any government transaction and has no bearing on overall perceptions of satisfaction.

    From a service provider's perspective, most of the bribes given were under Rs. 100. This segment is not considered the target market, and only people who would pay over Rs. 100 are considered in this study.

    Also, most bribes were paid in order to expedite the process, so it was logical to look at predictors that would cause the individual to spend more time at the land registry office.

    [Figure: Total Bribes Paid]

    The Prediction Method

    Classification Trees

    Since there are a lot of variables, we decided to run a classification tree to find the most relevant predictor variables. These are: wage loss, service charges, wait time, total payment, age, level of education, occupation, mode of travel, number of trips made to the TCC, travel time, and reason for bribe payment (this is largely to expedite the process).

    Certain predictors above are not relevant for a prediction model. For example, the reason for bribe payment will not apply, as it will not be available at the time of prediction. Also, a person who has already paid a bribe may not want to avail the services of an intermediary. However, a person who has previously tried to avail the services but had a long wait time might be more inclined to use the services of a broker.
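
    To illustrate how a classification tree ranks predictors, the sketch below scores candidate thresholds on a single numeric predictor by weighted Gini impurity. This is an assumption made for illustration: the report does not state which impurity measure or software settings were used, and the wait times and class labels are hypothetical, not survey data.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def best_split(values, labels):
    """Return (threshold, weighted_impurity) of the best split 'value <= threshold'.

    Candidate thresholds are midpoints between adjacent distinct values.
    If no split improves on the unsplit node, threshold is None.
    """
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, gini(labels))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Hypothetical wait times (minutes) and "bribe > 100" labels; NOT survey data.
wait_time = [10, 20, 30, 60, 90, 120]
paid_over_100 = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(wait_time, paid_over_100)
print(threshold, impurity)
```

    A predictor whose best split leaves lower weighted impurity separates the two classes better, so the tree places it closer to the root.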

    [Figure: classification tree, shown as a full tree and a pruned tree. Split variables visible in the tree: travel_mode, serv_charge, wait_time, total_payment, expedite_proc, Occupation, wage_loss.]

    K-Nearest Neighbour

    Running k-NN with the above predictor variables, we get an error rate of 12% on the validation data and 11% on the test data.

    Variables

    # Input variables: 11
    Input variables: Age, Lev_Education, Occupation, travel_mode, no_of_trip, travel_time, wait_time, wage_loss, serv_charge, expedite_proc, total_payment
    Output variable: Bribe > 100

    Training Data scoring - Summary Report (for k=1)

    Cutoff probability: 0.5

                    Predicted Class
    Actual Class    1       0
    1               168     0
    0               0       932

    Class      # Cases    # Errors    % Error
    1          168        0           0.00
    0          932        0           0.00
    Overall    1100       0           0.00
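
    The zero training error for k=1 is expected rather than a sign of a good model: each training record is its own nearest neighbour (distance 0), so it is always assigned its own class, provided no two records with identical predictors carry different classes. A minimal pure-Python sketch of this, using hypothetical two-predictor records (not survey data) and Euclidean distance as an assumed metric:

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=1):
    """Predict the majority class among the k training records nearest to query."""
    neighbours = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    votes = Counter(y for _, y in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical (age, wait_time) records and "bribe > 100" labels; NOT survey data.
train_X = [(25, 30), (40, 120), (60, 15), (35, 90)]
train_y = [0, 1, 0, 1]

# Scoring the training data with k=1: every record finds itself at distance 0.
training_errors = sum(
    knn_predict(train_X, train_y, x, k=1) != y for x, y in zip(train_X, train_y)
)
print(training_errors)

# Scoring an unseen record with k=3 uses a genuine majority vote.
new_prediction = knn_predict(train_X, train_y, (38, 100), k=3)
print(new_prediction)
```

    For this reason the validation and test error rates, not the training error, are the figures to judge the k-NN model by.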

    Validation Data scoring - Summary Report (for k=1)

    Cutoff probability: 0.5

                    Predicted Class
    Actual Class    1       0
    1               80      35
    0               47      498

    Class      # Cases    # Errors    % Error
    1          115        35          30.43
    0          545        47          8.62
    Overall    660        82          12.42
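
    The error percentages in the validation report follow directly from the confusion matrix, reading rows as actual classes and columns as predicted classes (a reading that is consistent with the per-class error counts). A short Python check of that arithmetic:

```python
# Validation confusion matrix from the report, keyed as (actual, predicted).
confusion = {(1, 1): 80, (1, 0): 35, (0, 1): 47, (0, 0): 498}


def error_report(matrix):
    """Return {class: (cases, errors, pct_error)} plus an 'Overall' row."""
    report = {}
    total_cases = 0
    total_errors = 0
    for actual in (1, 0):
        cases = sum(n for (a, _), n in matrix.items() if a == actual)
        errors = sum(
            n for (a, p), n in matrix.items() if a == actual and p != actual
        )
        report[actual] = (cases, errors, round(100 * errors / cases, 2))
        total_cases += cases
        total_errors += errors
    report["Overall"] = (
        total_cases,
        total_errors,
        round(100 * total_errors / total_cases, 2),
    )
    return report


report = error_report(confusion)
for label, row in report.items():
    print(label, row)
```

    The class 1 error (30.43%) is far higher than the class 0 error (8.62%), so the 12.42% overall rate mostly reflects the majority class (bribe not over Rs. 100).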