Knowledge Base Analysis and Content Management using Text Analytics
Internship Organization:
School:
Vrushali Sawant
Student ID: 800900838
Presentation Flow
• Introduction
• Approach and Methods
• Results
• Discussions and Conclusions
9/2/2016 Vrushali Sawant 2
Introduction
• About Me
• Project Problem Statement
• Internship Objectives
Project Problem Statement
• Create a simple, intuitive search interface over the team’s shared content on Intranet web servers, the SVN code repository and shared disks, using NLP, Machine Learning and Text Analytics to extract the data and provide an insightful view of the team’s Knowledge Management content
Internship Objectives
• This internship project focuses on addressing the following knowledge management issues currently faced by my team:
• Knowing what is available
• Helping users find relevant content
• Addressing outdated content
• Responding to requests for more interactive content
Approach and Methods
• Project Process Flow
• Data Description
• Tools and Techniques
• Project Timeline
Project Process Flow
Prepare Data
• Data sources: Files on Intranet web servers, the code repository and shared folders
• Extraction: File crawling, web scraping
• Integration and Cleaning: Data transformation and integration
Model
• Text Analytics, NLP and Machine Learning: Generate text topic models and taxonomies
Business Rules
• Refine the model using business expertise from the team
UI and Report
• Visual Analytics dashboard generation
Data Description
Data sources
SHARED FOLDER
• Folder name and file names
• File location on shared drive
• File contents
• Date information
INTRANET FILES
• Lunch and learn topic
• Date information
• Link to lunch and learn page
CODE REPOSITORY
• Folder and file names
• File location
• Data extracted from code files
• Purpose of code
• Date information
• Author
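The slides do not show the Base SAS extraction code. As an illustration only, the Python sketch below shows the file-crawling idea for the shared-folder source: walk the directory tree and emit one record per file with the attributes listed above. The record keys and the set of extensions read as plain text are assumptions, not the project's actual schema.

```python
import os
from datetime import datetime, timezone

# Assumption: only these extensions are read as plain text; other files
# contribute metadata only.
TEXT_EXTENSIONS = (".txt", ".sas", ".csv")


def crawl_shared_folder(root):
    """Walk a shared-folder tree and return one metadata record per file.

    Fields mirror the SHARED FOLDER attributes on this slide: folder name,
    file name, file location, file contents and date information.
    """
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            modified = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
            content = ""
            if name.lower().endswith(TEXT_EXTENSIONS):
                # errors="replace" keeps the crawl going on badly encoded files
                with open(path, encoding="utf-8", errors="replace") as fh:
                    content = fh.read()
            records.append({
                "source": "shared_folder",
                "folder_name": os.path.basename(dirpath),
                "file_name": name,
                "file_location": path,
                "content": content,
                "last_modified": modified.date().isoformat(),
            })
    return records
```

Calling `crawl_shared_folder("/path/to/shared/drive")` returns a list of dictionaries that can be integrated with the Intranet and code repository records.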
Tools and Techniques
Data extraction tools
• Base SAS
• SAS Information Retrieval Studio
Data modeling and UI
• SAS Contextual Analytics
• SAS Visual Analytics
Techniques
• Text mining
• Machine Learning
• Natural Language Processing
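A common text-mining building block is TF-IDF term weighting, which scores a term highly when it is frequent in one document but rare across the collection. The sketch below is a plain-Python illustration of that technique; it is not necessarily what SAS Contextual Analytics computes internally, and the simple lower-case tokenizer is an assumption.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = count of the term in the document / number of tokens in the document
    idf = log(N / number of documents containing the term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

A term that appears in every document (for example "sas" in a SAS team's knowledge base) gets weight 0, so it does not dominate topics or search ranking.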
Project Timeline
8/10/2016 Vrushali Sawant 11
RESEARCH (23rd May - 27th May)
• Research the existing tools used for text analytics
• Look for other software/techniques that can be used for content categorization
DESIGN (31st May - 6th Jun)
• Design the knowledge management system
• Finalize the tools to be used for the project
DEVELOP AND TEST (7th Jun - 9th Aug)
• Extract data from all data sources
• Integrate the data into a single SAS data set or CSV file
• Perform text analytics/content categorization
• Create the user interface using Visual Analytics
USER ACCEPTANCE (10th Aug - 23rd Aug)
• Generate use cases
• Perform user acceptance testing
• Go live
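The integration step merges records from the three sources into one table. The Python sketch below illustrates the idea with the standard `csv` module; the common column list and the minimal cleaning rules (blank-filling, whitespace stripping, dropping repeated locations) are assumptions, since the slides do not list the columns of the project's SAS data set.

```python
import csv

# Hypothetical common schema for the integrated data set.
FIELDS = ["source", "title", "location", "last_modified", "author"]


def integrate(*sources):
    """Merge record lists from several sources into one list of rows.

    Cleaning is deliberately minimal: missing or None fields become "",
    values are stripped of surrounding whitespace, and a record whose
    location was already seen is dropped as a duplicate.
    """
    rows, seen = [], set()
    for records in sources:
        for record in records:
            row = {}
            for field in FIELDS:
                value = record.get(field)
                row[field] = "" if value is None else str(value).strip()
            if row["location"] and row["location"] in seen:
                continue  # duplicate of a file or page already integrated
            seen.add(row["location"])
            rows.append(row)
    return rows


def write_csv(rows, handle):
    """Write the integrated rows, with a header line, to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```

To write a file, pass a handle opened with `open("knowledge_base.csv", "w", newline="")`; the file name here is only an example.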
Achieved Objectives
• Extracted and integrated data from the shared folders, Intranet files and code repository
• Built a text topic model on the integrated data set
• Built taxonomies for document categorization
• Generated an integrated UI for knowledge search and analysis
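The slides do not show the taxonomy rules themselves. One simple way to express a taxonomy is a keyword list per category, with a document assigned to every category whose keywords it contains. The sketch below illustrates that keyword-rule approach; the categories and keywords are hypothetical, and SAS Contextual Analytics supports richer rule types than this.

```python
import re

# Hypothetical taxonomy: each category maps to keywords that signal it.
TAXONOMY = {
    "Data Extraction": {"crawl", "scrape", "extract", "import"},
    "Modeling": {"model", "topic", "cluster", "regression"},
    "Reporting": {"dashboard", "report", "visual"},
}


def categorize(text, taxonomy=TAXONOMY):
    """Return, sorted by name, every category with at least one keyword in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(category for category, keywords in taxonomy.items() if words & keywords)
```

A document can land in several categories, which matches how a shared file often serves more than one purpose; a document with no matching keywords gets an empty list and can be reviewed by the team.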
Text Topic Model
Text Topic Model generated using SAS Contextual Analytics
Integrated UI: Discovering Themes
Select a topic from the drop-down list on the Knowledge Search Dashboard
The dashboard updates after a particular topic is selected
Integrated UI: Search String
Enter a search string in the search box on the Knowledge Search Dashboard
The dashboard updates with the results for “Search String”
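The dashboard's search is provided by SAS Visual Analytics. To illustrate the underlying idea only, the sketch below runs a naive AND keyword search over the integrated records: a record matches when every query term appears, case-insensitively, in any of the searched fields. The field names are assumptions, and matching is by substring rather than by the tokenized, ranked search a real engine would use.

```python
def search(records, query, fields=("title", "content")):
    """Return records containing every query term (case-insensitive) in the given fields."""
    terms = query.lower().split()
    hits = []
    for record in records:
        # Concatenate the searchable fields into one lower-case string
        text = " ".join(str(record.get(field, "")) for field in fields).lower()
        if all(term in text for term in terms):
            hits.append(record)
    return hits
```

Word order in the query does not matter, so "macro sas" and "sas macro" return the same records.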
Integrated UI: Uncovering Outdated Content
Click the Document Repository report on the Knowledge Search Dashboard
This opens the “Document Repository Report”
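The slides do not state how the Document Repository Report decides that content is outdated. Assuming an age threshold on the extracted date information, the check reduces to comparing each record's last-modified date with a cutoff, as in the sketch below. The `last_modified` key, the ISO date format and the 365-day default are all assumptions.

```python
from datetime import date, timedelta


def find_outdated(records, today, max_age_days=365):
    """Return records whose last_modified date is more than max_age_days before today.

    last_modified is expected as an ISO date string (YYYY-MM-DD).
    """
    cutoff = today - timedelta(days=max_age_days)
    return [r for r in records if date.fromisoformat(r["last_modified"]) < cutoff]
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the check reproducible when reports are regenerated.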
Integrated UI: Discovering Relationships between Documents
Select “Topic Term map”
Network diagram of topic terms
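A term network like this needs edges between related terms. One simple way to obtain them, shown below as an illustration and not necessarily how SAS builds the Topic Term map, is document co-occurrence: two vocabulary terms are linked when they appear in the same document, and the edge weight is the number of such documents. The whitespace tokenizer and the caller-supplied vocabulary are assumptions.

```python
from collections import Counter
from itertools import combinations


def term_cooccurrence(documents, vocabulary):
    """Count, for each pair of vocabulary terms, how many documents contain both.

    Each (term_a, term_b) pair with a non-zero count is an edge of the network;
    the count is the edge weight. Pairs are stored in alphabetical order.
    """
    edges = Counter()
    for doc in documents:
        present = sorted(vocabulary & set(doc.lower().split()))
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges
```

Sorting the terms before pairing ensures that ("macro", "sas") and ("sas", "macro") are counted as the same edge.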
Integrated UI: Help Documentation
Click the “Help Documentation” link on the Knowledge Search Dashboard
An information window pops up on the Knowledge Search Dashboard
Integrated UI: Before and After
“The Before” of Team’s Knowledge Search Solution
“The After” of Team’s Knowledge Search Solution
Conclusions from Internship Experiences
• Business value of the text mining project to the team:
• Shorten the team’s onboarding time
• Reuse existing work and avoid rework
• Improve the quality of projects
• Value of the internship to me:
• Integrated development of professional and technical skills
• Opportunity to solve a real-world problem using analytics
• Work with industry experts in analytics