

Technology Vertical Lead Generation

¹K. S. Hemapriya, ²Anshul Saxena, ³Dr. N. Sangeetha

¹Student, Department of Management, Kumaraguru College of Technology, Coimbatore 641049, Tamil Nadu, India

²Assistant Professor, Department of Management, Kumaraguru College of Technology, Coimbatore 641049, Tamil Nadu, India

³Senior Associate Professor, Department of Mechanical Engineering, Kumaraguru College of Technology, Saravanampatty, Coimbatore: 49, Tamil Nadu, India. [email protected]

Abstract. This study builds a model that generates leads from major project-posting portals such as Freelancer, based on the technology verticals of the company. Reaching potential target customers matters more than selling to everyone in the market, so identifying the right project to bid on is essential. The study also identifies areas for skill set enhancement by finding the skills that are frequently demanded along with the key skills of the technical team, which increases the team's scope for winning more projects.

Keywords: Competition, lead generation, model, market, skill set, services, target customer.

1. INTRODUCTION

Lead generation, the creation of customer interest in or enquiry into the services of a business, is becoming challenging in IT because of high competition in the market and rapid advances in technology and innovation. Lead generation helps a company draw customers' interest towards its products or services, or find prospective projects. Reaching potential target customers matters more than selling to everyone in the market. Analysing market trends and having employees with the right skills helps a company offer additional services to existing clients and win new ones. This study aims to create a model that generates leads for the technical team at an IT services company, based on its technology verticals, by collecting project data from major project-posting portals such as Freelancer. The study also identifies areas of skill set enhancement for the team, which increases its chances of winning more projects.

2. REVIEW OF LITERATURE.

International Journal of Pure and Applied Mathematics, Volume 119 No. 17 2018, 2687-2697. ISSN: 1314-3395 (on-line version). url: http://www.acadpubl.eu/hub/. Special Issue: http://www.acadpubl.eu/hub/


1. Ku Chun Kit and Dr. David Rossiter¹, in their paper "Business Lead Qualification by Online Information Scraping", built an automated lead qualifier based on online information scraping. They collected real sales data from a partnering company and classified the companies into two groups depending on whether or not a deal had been made with each company. The system retrieved each company's website URL and scraped information from the company websites and social network profile pages. The information was then cleaned and used to train three models. The predictions generated by the models were combined using an algorithm to collectively qualify new business leads.

2. Jeffrey Kohl Wilkins and Jack Marshall Zoken², in their invention "Internet-enabled lead generation", describe a method of generating intender leads in a distributed computer system, which includes the steps of identifying purchase indicators and extracting prospect identifiers from them. Purchase indicators are pieces of data that represent a potential future purchase by a prospect. For example, an online classified advertisement selling an automobile is a purchase indicator for a potential future purchase of a new car by the seller of the old car. The prospect identifier, such as a telephone number or email address, uniquely identifies the prospect likely to make the future purchase. Preferably, the method also includes the steps of obtaining full contact information for the prospect from a profile database, applying a predictive model to the prospects to select intender leads, and transferring the intender leads to an interested party, such as a direct marketing service or sales force. An intender lead is a lead for a person intending to purchase a particular product or service within a given time period. Only some of the prospects are actual intenders.

¹ Kit, K. C., & Rossiter, D. (2017). Business Lead Qualification by Online Information Scraping.

² Wilkins, J. K., & Zoken, J. M. (2005). U.S. Patent No. 6,868,389. Washington, DC: U.S. Patent and Trademark Office.

3. Richard Baron Penman, Timothy Baldwin and David Martinez³, in their paper "Web Scraping Made Simple with SiteScraper", present their tool SiteScraper, which automatically learns XPath-based patterns to identify where a user-defined list of strings occurs in a given set of web pages. To train it, SiteScraper is given a small set of example URLs from a website and the strings that the user wishes to scrape from each. These are used to generate an XPath query describing where to find the desired strings, which can then be applied to scrape them from any web page with a similar structure. Importantly, the user interacts with SiteScraper at the level of content, not mark-up, so no specialist knowledge is required, and if the structure of a website changes but the content stays constant, SiteScraper can automatically retrain its model without human intervention.

³ Penman, R. B., Baldwin, T., & Martinez, D. (2009). Web scraping made simple with sitescraper.

4. John J. Salerno and Douglas M. Boulware⁴, in their invention "Method and apparatus for improved web scraping", enable the parser component of a web search engine to adapt to frequent changes in web page format at web sites. The parser "learns", from a set of defined HTTP links, how to find and parse web pages returned from a search engine query. The invention intelligently locates the tokens/strings that will correctly extract the attributes associated with each returned item. The invention may operate either automatically or in a user-assisted fashion.

5. Dr. N. Fathima Thabassum⁵, in her "A Study on The Freelancing Remote Job Websites", studied the various freelancing websites available online, how they work, the realities of the online job market and the services the sites offer. She also studied the top freelancing websites and made a comparative study of them on various aspects.

⁴ Salerno, J. J., & Boulware, D. M. (2006). Method and apparatus for improved web scraping. U.S. Patent No. 7,072,890. Washington, DC: U.S. Patent and Trademark Office.

⁵ Thabassum, N. F. (2013). A Study on The Freelancing Remote Job Websites. International Journal of Business Research and Management, 4, 42-50.

6. Mr. Himanshu Kunwar⁶, in his model "Logistic Regresssion in R" on Kaggle, predicts the purchase of products advertised on social media based on various factors.

7. Mr. Salem Marafi⁷, in his model, performs Market Basket Analysis with R on the groceries dataset [10] (S. Arunadevi and Vijeta Iyer, 2017). He applies the Apriori algorithm for the analysis.

3. RESEARCH METHODOLOGY

CRISP-DM - Cross-Industry Standard Process for Data Mining

3.1. BUSINESS UNDERSTANDING

3.1.1. Business Objectives

● To generate project leads for the technology verticals in the company.

● To identify the technologies commonly used in the market along with the company's existing technologies.

3.1.2. Determine data mining goals

● To generate leads based on the skill set of the team.

● To apply logistic regression to train a model and predict the acceptance of a project.

● To analyse the commonly occurring skill sets in the project database using the Apriori algorithm.

⁶ Himanshu, Kunwar. (2018, January 05). Logistic Regresssion in R. Retrieved April 09, 2018, from https://www.kaggle.com/suncor/social-adv

⁷ Market Basket Analysis with R. Retrieved April 11, 2018, from http://www.salemmarafi.com/code/market-basket-analysis-with-r/

3.1.3. Project plan

● Identify the technology verticals in the team.

● Collect project data from project websites such as Freelancer.com.

● Manipulate and prepare the data as input for the model.

● Build a model that, trained on the training data, predicts the acceptance of projects in the test data.

● Find the skills related to the technology verticals.

3.1.4. Business success criteria

● Successful lead conversion and project confirmation.

● An efficient model for predicting leads.

3.2. DATA UNDERSTANDING

3.2.1. Initial Data: The appropriate fields for data collection are chosen, and data is scraped from job portals such as Freelancer, Upwork and Guru into separate tables.

3.2.2. Data Description: Project posts are extracted from the major job-posting websites. Each post has fields such as title, description, skill set required, bid, price, days left, location and ID.

3.2.3. Data Quality: The quality of the data has to be checked. Missing values and fields were identified and replaced with NAs. Outliers in the bid value were identified and removed. Derived fields such as continent were derived from country and location.
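The quality checks in 3.2.3 can be sketched as follows. This is only an illustration, not the study's code: the field names and sample records are hypothetical, and the 1.5 × IQR rule is an assumed outlier criterion, since the paper does not state which criterion was used.

```python
import statistics

# Hypothetical scraped records; field names are assumptions for illustration.
projects = [
    {"id": 1, "title": "Java API", "bid": 120.0, "country": "India"},
    {"id": 2, "title": "", "bid": 150.0, "country": "Germany"},
    {"id": 3, "title": "MySQL tuning", "bid": None, "country": ""},
    {"id": 4, "title": "Apache setup", "bid": 135.0, "country": "Brazil"},
    {"id": 5, "title": "MongoDB import", "bid": 140.0, "country": "Canada"},
    {"id": 6, "title": "Suspicious post", "bid": 90000.0, "country": "India"},
]

NA = None  # marker used for missing values


def fill_missing(record):
    """Replace empty strings with the NA marker so gaps are explicit."""
    return {k: (NA if v in ("", None) else v) for k, v in record.items()}


def remove_bid_outliers(records, k=1.5):
    """Drop records whose bid lies outside [Q1 - k*IQR, Q3 + k*IQR].

    Records with a missing bid are kept; they are not outliers.
    """
    bids = [r["bid"] for r in records if r["bid"] is not NA]
    q1, _, q3 = statistics.quantiles(bids, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in records if r["bid"] is NA or low <= r["bid"] <= high]


cleaned = remove_bid_outliers([fill_missing(p) for p in projects])
print([p["id"] for p in cleaned])
```

On this sample the record with the 90000 bid is dropped, while the record with a missing bid is kept and flagged with the NA marker.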

3.3. DATA PREPARATION

3.3.1. Data set description:

The fields common to the three websites include:

1. Title of the project

2. Description of the project

3. Skill set required

4. Bid: Average bid by other freelancers

5. Days Left: Active days of the project

6. Location: Location of the project employer

7. ID: Project ID

8. Verified: Whether the project employer is verified or not.

3.3.2. Data selection: Major project websites are chosen, and data is collected from the websites that permit data scraping.

3.3.3. Data Cleansing: Redundant and fake projects are removed and outliers are eliminated.

3.3.4. Integrate data: Data from all three websites has to be merged and integrated for analysis.

3.3.5. Construct data: The data is normalised (the multiple skill requirements given for each project are normalised), and derived data such as continent is extracted from country.
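One plausible reading of the construction step in 3.3.5 is sketched below: the multi-valued skill field is split into one row per project and skill, and the continent is looked up from the country. The field names and the lookup table are hypothetical; the paper does not describe how either step was implemented.

```python
# Hypothetical lookup table; a real one would cover every country in the data.
COUNTRY_TO_CONTINENT = {
    "India": "Asia",
    "Germany": "Europe",
    "Brazil": "South America",
}


def normalise_skills(project):
    """Return one row per (project, skill) from a comma-separated skill field."""
    skills = [s.strip() for s in project["skillset"].split(",") if s.strip()]
    return [{"id": project["id"], "skill": s} for s in skills]


def add_continent(project):
    """Derive the continent from the country; unknown countries map to None."""
    enriched = dict(project)
    enriched["continent"] = COUNTRY_TO_CONTINENT.get(project["country"])
    return enriched


project = {"id": 101, "skillset": "Java, MySQL ,Apache", "country": "India"}
rows = normalise_skills(project)
print(rows)
print(add_continent(project)["continent"])
```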



3.3.6. Format data: The fields are converted to common formats (for example, all bid values are converted to dollars).
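The currency conversion mentioned in 3.3.6 could look like the sketch below. The exchange rates are made-up constants used only for illustration, and the function name is hypothetical; the paper does not say which rates or tools were used.

```python
# Hypothetical, fixed exchange rates to USD, for illustration only.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.25, "INR": 0.015}


def bid_to_usd(amount, currency):
    """Convert a bid amount to US dollars, rounded to cents."""
    rate = RATES_TO_USD.get(currency)
    if rate is None:
        # Fail loudly instead of silently keeping a mixed-currency column.
        raise ValueError(f"no exchange rate for {currency!r}")
    return round(amount * rate, 2)


print(bid_to_usd(100, "EUR"))
print(bid_to_usd(2000, "INR"))
```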

3.4. MODELLING

3.4.1. Modelling Technique: The model that generates leads based on the skill set of the team works by matching the team's skills against the skill set field of each project [11] (Irfan Ahmed Mohammed Saleem and Dr. S. Jaisankar, 2018). To predict the acceptance of the leads, multiple logistic regression is used, which takes multiple factors (skills, average bid value, verified) to decide the acceptance value, as these three fields are the primary factors considered before bidding on a project.

3.4.2. Test design

● Data is sampled from the master data by selecting an equal number of projects from each category. The sample is split into training data (70%) and testing data (30%). The acceptance values for the training data are obtained from a team member. Based on the values in the training data, the acceptance value is predicted for the test data.
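The test design above can be sketched as follows. The category names, the sample size per category and the random seed are assumptions made for illustration; only the equal-per-category sampling and the 70/30 split come from the text.

```python
import random


def sample_equal_per_category(projects, per_category, rng):
    """Pick the same number of projects from every category."""
    by_category = {}
    for p in projects:
        by_category.setdefault(p["category"], []).append(p)
    sample = []
    for category in sorted(by_category):
        sample.extend(rng.sample(by_category[category], per_category))
    return sample


def train_test_split(rows, train_fraction, rng):
    """Shuffle a copy of the rows and cut it at the given fraction."""
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rng = random.Random(42)  # fixed seed so the split is reproducible
# Hypothetical master data: 25 projects in each of four categories.
master = [{"id": i, "category": c}
          for i, c in enumerate(["Java", "MySQL", "MongoDB", "Apache"] * 25)]
sample = sample_equal_per_category(master, per_category=10, rng=rng)
train, test = train_test_split(sample, train_fraction=0.7, rng=rng)
print(len(train), len(test))
```

With 10 projects from each of four categories, the split yields 28 training and 12 test records.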

3.4.3. Model

● The model predicts the acceptance of a project based on the skill set of the team.

● The skill set specialisation of the team, the average bid value and the verified flag are the independent factors used to build the model (as the team mostly selects projects based on these factors). Based on the coefficient of correlation, the factors with high correlation (Java, MySQL, MongoDB, Apache, average bid value) are considered to determine the acceptance of a project, and the remaining factors are eliminated.
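The correlation-based elimination of factors can be illustrated with the sketch below. It assumes the Pearson coefficient and an arbitrary cut-off of 0.6; the paper names neither, and the data values are invented for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sx * sy)


def select_features(columns, target, threshold):
    """Keep the columns whose |correlation| with the target meets the threshold."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return kept, scores


# Hypothetical sample: 1 = skill known / project accepted, 0 = otherwise.
accepted = [1, 1, 1, 0, 0, 0, 1, 0]
columns = {
    "java": [1, 1, 1, 0, 0, 0, 1, 0],
    "php": [1, 0, 1, 0, 1, 0, 0, 1],
    "avg_bid": [50, 60, 55, 200, 220, 210, 65, 190],
}
kept, scores = select_features(columns, accepted, threshold=0.6)
print(kept)
```

The absolute value is used so that a strongly negative factor, such as a bid value that is high for rejected projects, is also retained.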

3.4.4. Model Assessment:

● The model is assessed using the Mean Absolute Error (MAE). The accuracy under this measure is found to be 81.67% for this model.

● The model is also assessed using the Receiver Operating Characteristic (ROC) curve, which has 89.48% of the area under the curve.
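For 0/1 predictions, the MAE equals the fraction of misclassified records, so one way to arrive at an accuracy figure from it is 1 - MAE. The sketch below shows that calculation on made-up labels; whether the reported 81.67% was derived exactly this way is an assumption, as the paper does not give the formula.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical acceptance labels (1 = accept, 0 = reject) for ten test projects.
actual = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]

mae = mean_absolute_error(actual, predicted)
accuracy = 1 - mae  # for 0/1 labels, MAE is the misclassification rate
print(f"MAE = {mae:.2f}, accuracy = {accuracy:.0%}")
```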

3.5. EVALUATION

The model is assessed by periodically checking the conversion of leads. The data collected periodically is assessed based on its conversion rate, and a process review is made.



3.6. DEPLOYMENT

3.6.1. Deployment plan: Deployment is done by registering the company as a freelancer on the freelancing websites, assigning a team to bid for projects, and creating a bid writer for writing bids.

3.6.2. Monitoring and maintenance plan: The data is periodically scraped from the websites to check whether there is a need to upgrade the technical team or to add a technology vertical.

4. MODEL

4.1. Model to extract the project leads according to the skill set of the team.

A model is created in Python which matches the skill set of the team members with the skill set required for each project and segregates the projects that match. All projects (45580) on the Freelancer website on Feb 26th 2018 were scraped, and the model is run on the project data collected.

    import csv

    import pymysql

    # Connect to the local MySQL database holding the team skills and the scraped projects.
    con = pymysql.connect(host="127.0.0.1", user="root", password="", db="skillset",
                          autocommit=True, charset="utf8")
    cursor = con.cursor()

    # Fetch all team skills as one comma-separated string and split it into a list.
    cursor.execute("select group_concat(skills) from UnionSkills")
    skills = []
    for (joined,) in cursor.fetchall():
        if joined:
            skills.extend(s.strip() for s in joined.split(",") if s.strip())
    print(skills)

    for skill in skills:
        # Parameterised LIKE query: projects whose skillset field mentions this skill.
        cursor.execute("select * from projecttablefreelancer where skillset like %s",
                       ("%" + skill + "%",))
        rows = cursor.fetchall()
        print(skill, len(rows))
        if rows:
            # Append the matching projects to one CSV file per skill,
            # tagging every row with the skill that matched.
            with open("skills_" + skill + ".csv", "a", newline="") as f:
                writer = csv.writer(f, delimiter=",")
                data_rows = []
                for data in rows:
                    d = [str(i).replace("\n", "").strip() for i in data]
                    d.append(skill)
                    data_rows.append(d)
                try:
                    writer.writerows(data_rows)
                except Exception as e:
                    print(e)
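Note that a SQL pattern such as LIKE '%Java%' matches any skill set string that contains "Java", which includes "JavaScript". The sketch below shows an alternative that matches whole skill names only. It works on in-memory records rather than the database, and the function names and sample data are hypothetical, not part of the study's model.

```python
def parse_skills(field):
    """Split a comma-separated skill field into a set of lower-cased skills."""
    return {s.strip().lower() for s in field.split(",") if s.strip()}


def matching_projects(projects, team_skills):
    """Map each team skill to the ids of projects that list exactly that skill."""
    wanted = {s.lower() for s in team_skills}
    matches = {skill: [] for skill in sorted(wanted)}
    for project in projects:
        for skill in parse_skills(project["skillset"]) & wanted:
            matches[skill].append(project["id"])
    return matches


projects = [
    {"id": 1, "skillset": "Java, MySQL"},
    {"id": 2, "skillset": "JavaScript, Vue.js"},
    {"id": 3, "skillset": "Apache, Java"},
]
print(matching_projects(projects, ["Java", "MySQL"]))
# → {'java': [1, 3], 'mysql': [1]}
```

Project 2 is not reported as a Java lead here, whereas a substring match on "Java" would include it.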

4.2. Model to predict the acceptance of projects from data sampled from the leads generated.

The segregated projects are transformed into fields based on the skill sets known to the technology team member (known = 1, not known = 0), the average bid value and the verified flag. The acceptance value for the sampled data is obtained from the technical team member whose skill set is used to build the model. The data is split into 70% training data and 30% test data; a model is built on the training data and used to predict the acceptance for the 30% test data.

4.3.1. Logistic regression

Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more independent variables. This study covers the case of a binary dependent variable, that is, where the output can take only two values, "0" and "1".
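To make the idea concrete, the sketch below fits a logistic regression by plain batch gradient descent on a tiny invented data set. The feature names (knows_java, verified), the learning rate and the number of epochs are assumptions for illustration; it is not the model fitted in this study.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, x):
    """Probability of acceptance; weights[0] is the intercept."""
    z = weights[0] + sum(w * xj for w, xj in zip(weights[1:], x))
    return sigmoid(z)


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit the weights by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = predict_proba(weights, x) - y
            grad[0] += error
            for j, xj in enumerate(x, start=1):
                grad[j] += error * xj
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


# Hypothetical training rows: [knows_java, verified]; label 1 = project accepted.
rows = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0], [1, 0], [0, 1]]
labels = [1, 1, 0, 0, 1, 0, 1, 0]

weights = fit_logistic(rows, labels)
print(round(predict_proba(weights, [1, 1]), 3))  # project needing Java
print(round(predict_proba(weights, [0, 1]), 3))  # project not needing Java
```

A predicted probability above 0.5 would be read as "accept" and below 0.5 as "reject"; the 0.5 cut-off is itself a choice and can be varied.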

4.3.2. Model

Interpretation

The significance value is below 0.05 for the Java, MySQL, NoSQL (Couch and Mongo), Apache, bid value in $ and Verified fields; hence they are considered as factors which affect acceptance.

4.3.3. Prediction and accuracy of the model

ROC Curve

The ROC curve is a fundamental tool for diagnostic test evaluation. In a Receiver Operating Characteristic (ROC) curve, the true positive rate (sensitivity) is plotted as a function of the false positive rate (100 - specificity) for different cut-off points. Each point on the ROC curve represents a sensitivity/specificity pair corresponding to a particular decision threshold. A test with perfect discrimination (no overlap in the two distributions) has a ROC curve that passes through the upper left corner (100% sensitivity, 100% specificity). Therefore, the closer the ROC curve is to the upper left corner, the higher the overall accuracy of the test.
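The construction of the curve can be sketched as below: every distinct predicted score is used as a cut-off, the false and true positive rates are computed at each, and the area under the curve is obtained with the trapezoidal rule. The labels and scores are invented for illustration and are unrelated to the study's data.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per cut-off."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # Everything scored at or above the cut-off is predicted positive.
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical acceptance labels and predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
print(points)
print(round(auc(points), 4))
```

For this sample, 8 of the 9 accepted/rejected pairs are ranked correctly, so the area is 8/9.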



The model is assessed using the Mean Absolute Error (MAE). The accuracy under this measure is found to be 81.67% for this model.

Interpretation:

The model is built using the training data, and the acceptance value is predicted for the test data. The accuracy under Mean Absolute Error is found to be 81.67%, and the area under the ROC curve is found to be 89.48%.

4.4. Frequently occurring skill sets along with the skills that have higher significance for the acceptance value in the predictive model.

4.4.1. Apriori algorithm.

Apriori is an algorithm for frequent item set mining and association rule learning over transactional databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. The frequent item sets determined by Apriori can be used to determine association rules, which highlight general trends in the database.

Association rule analysis is a technique to uncover how items are associated with each other. There are three common ways to measure association:

Support indicates how popular a skill set is, measured by the proportion of projects in which it appears.

Confidence indicates how likely skill Y is to occur when skill X occurs, expressed as {X -> Y}.

Lift indicates how likely skill Y is to occur when skill X occurs, while controlling for how popular skill Y is.
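The level-wise search and the three measures can be sketched in a few lines. This is a minimal, unoptimised illustration on invented project skill sets, with an assumed minimum support of 0.4; it is not the implementation used in the study. Lift is computed here as confidence divided by the support of Y.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    items = {item for t in transactions for item in t}
    current = [frozenset([i]) for i in sorted(items)]
    frequent = {}
    while current:
        level = {c: support(c, transactions) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        keys = list(level)
        size = len(keys[0]) + 1 if keys else 0
        candidates = {a | b for a, b in combinations(keys, 2) if len(a | b) == size}
        # Prune step: every k-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, size - 1))]
    return frequent


def rule_metrics(x, y, transactions):
    """Support, confidence and lift for the rule {X -> Y}."""
    s_xy = support(x | y, transactions)
    confidence = s_xy / support(x, transactions)
    lift = confidence / support(y, transactions)
    return s_xy, confidence, lift


# Hypothetical projects, each described by the set of skills it demands.
transactions = [
    frozenset({"java", "mysql", "apache"}),
    frozenset({"java", "mysql"}),
    frozenset({"java", "big data"}),
    frozenset({"php", "mysql"}),
    frozenset({"java", "mysql", "mongodb"}),
]
frequent = apriori(transactions, min_support=0.4)
s, c, l = rule_metrics(frozenset({"java"}), frozenset({"mysql"}), transactions)
print(sorted(sorted(k) for k in frequent))
print(round(s, 2), round(c, 2), round(l, 4))
```

A lift above 1 suggests that Y is demanded more often when X is demanded than it is overall; a lift below 1, as in this toy sample, suggests the opposite.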

In the association rule graph, larger circles imply higher support, while red circles imply higher lift.

Association rules can be created for the skills that have higher significance in the model (Java, Apache, MySQL and MongoDB), and the other skill sets that frequently occur in projects along with these skill sets are found. Association rules can also be created for PHP, which is the most demanded language in the project database; this helps identify related skills frequently demanded along with the key skills of the team.

4.4.2. Most frequently found skills along with Java in the project database.

Graph

Figure 4.15. Association rule graph - Java

Association rules

Table 4.7. Association rule - Java

Inference

● The most popular skill set pattern is JavaScript and Vue.js.

● Another popular combination is Big Data and Java.

● If the skill Mathematics is demanded, MATLAB and Mathematica are likely to be demanded as well.

5. FINDINGS

1. A model to predict the acceptance of projects for a member of the technology team was built, and its accuracy was found to be 89.4% (ROC curve).

2. Association rules were found for the skill sets that have high significance for acceptance. They show the most frequently occurring combinations of skill sets demanded in projects, along with those skill sets which influence the acceptance of a project.



5.1. Future work and enhancement.

1. A dynamic application can be built that analyses the skill sets of all projects according to the technology vertical and dynamically suggests the projects that have a higher chance of acceptance.

5.2. CONCLUSION.

Thus a model which generates leads according to the skill set of the team was created, a market analysis of skill sets by location, frequency and average bid value was carried out, and insights were found.

6. BIBLIOGRAPHY

[1] Kit, K. C., & Rossiter, D. (2017). Business Lead Qualification by Online Information Scraping.

[2] Kit, K. C., & Rossiter, D. (2017). Business Lead Qualification by Online Information Scraping.

[3] Wilkins, J. K., & Zoken, J. M. (2005). U.S. Patent No. 6,868,389. Washington, DC: U.S. Patent and Trademark Office.

[4] Penman, R. B., Baldwin, T., & Martinez, D. (2009). Web scraping made simple with sitescraper.

[5] Salerno, J. J., & Boulware, D. M. (2006). U.S. Patent No. 7,072,890. Washington, DC: U.S. Patent and Trademark Office.

[6] Thabassum, N. F. (2013). A Study on The Freelancing Remote Job Websites. International Journal of Business Research and Management, 4, 42-50.

[7] Himanshu, Kunwar. (2018, January 05). Logistic Regresssion in R. Retrieved April 09, 2018, from https://www.kaggle.com/suncor/social-adv

[8] Hendra Herviawan, M. (2017, December 25). Customer Segmentation using RFM Analysis (R). Retrieved April 09, 2018, from https://www.kaggle.com/hendraherviawan/customer-segmentation-using-rfm-analysis-r/notebook

[9] https://public.tableau.com/views/TableauSuperstoreRFMAnalysis_0/TableauSuperstoreRFMAnalysis.

[10] S. Arunadevi and Vijeta Iyer (2017), A Study on M/M/C Queue Model under Monte Carlo simulation in Traffic Model, International Journal of Pure and Applied Mathematics, Vol. 116, no. 12, pp. 199-207.

[11] Irfan Ahmed Mohammed Saleem, Dr. S. Jaisankar (2018), A Study On Kaizen Based Soft-Computing In Electric Vehicle Manufacturing Processes, International Journal Of Innovations In Scientific And Engineering Research, Vol 5 Issue 5, 31-39.

[12] https://www.kdnuggets.com/2016/04/association-rules-apriori-algorithm-tutorial.html

[13] http://www.salemmarafi.com/code/market-basket-analysis-with-r/

[14] http://www.citationmachine.net/items/707381222/copy?copy-full-bib=true

