Scalable Computational Approach to Understanding IT Innovations
Ping Wang
August 7, 2010
OCIS, RM, OMT, TIM Professional Development Workshop
Making the Most of Digital Text Data: Opportunities, Challenges, and Best Practices
My Research Questions
- What makes an IT innovation popular?
- What impact do popular ITs have on organizations?
Counts Tell Stories, But Not Enough to Capture Richness of Digital Text Data
[Figure: counts of trade press articles, academic papers, and patents. Legend: TM = Treemaps, CT = Cone Trees, HT = Hyperbolic Trees.]
Shneiderman, B., Wang, P., Qu, Y., and Dunne, C. 2010. "Analyzing Trends in Science & Technology Innovation," Human-Computer Interaction Lab (HCIL) 27th Annual Symposium, University of Maryland, College Park, MD.
The BIG Picture
[Diagram. Labels: Social Structure; Social Cognition; IT Innovation; Entity; Event; Relation; Sentiment; Values; Sensemaking; Popularity; Adoption/Sales; Policy.]
We Have Lots of ITs, But …
SOA, Cloud Computing, BPO, Semantic Web, Portable Personality, RFID, Tera-architectures, Business Intelligence, Mashup, Ajax, Web2.0, DRM, Ultramobile Devices, Distributed Encryption, Chatbots, Thin Provisioning, CRM, VoIP, SaaS, OSS, Application Quality Dashboards, Identity Management, SCM
… Little and Dated Understanding
1993 1998
Digital Text Data
- Downloaded full-text articles published in 1998-2007 from six magazines:
  - ComputerWorld & InformationWeek
  - BusinessWeek & The Economist
  - Newsweek & US News and World Report
- Extracted ~220,000 paragraphs containing 50 IT innovations.
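The slides do not say how paragraphs were matched to innovations. As a minimal sketch, assuming simple case-insensitive term matching, the extraction step might look like the following. The `TERMS` map, its surface forms, and all function names are hypothetical illustrations, not the project's actual term list or code.

```python
import re

# Hypothetical label -> surface-form map; the real term list is not given in the slides.
TERMS = {
    "ERP": ["enterprise resource planning", "ERP"],
    "SCM": ["supply chain management", "supply-chain management", "SCM"],
    "Grpware": ["groupware"],
}

def compile_patterns(terms):
    """Build one case-insensitive regex per label, matching whole terms only."""
    patterns = {}
    for label, forms in terms.items():
        # Longer forms first so a full name is preferred over its abbreviation.
        alternatives = "|".join(
            re.escape(form) for form in sorted(forms, key=len, reverse=True)
        )
        patterns[label] = re.compile(
            r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE
        )
    return patterns

def split_paragraphs(article_text):
    """Treat blank lines as paragraph boundaries (an assumption about the input)."""
    return [p.strip() for p in re.split(r"\n\s*\n", article_text) if p.strip()]

def tag_paragraphs(article_text, patterns):
    """Return (paragraph, labels) for each paragraph mentioning an innovation."""
    tagged = []
    for paragraph in split_paragraphs(article_text):
        labels = {lab for lab, pat in patterns.items() if pat.search(paragraph)}
        if labels:
            tagged.append((paragraph, labels))
    return tagged
```

Short abbreviations such as ASP or AI would produce false matches under case-insensitive matching, so a real pipeline would likely need case-sensitive rules or manual checks for them.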
IT Innovations Included in Analysis
(Label = IT innovation)
YouTube = YouTube
Linux = Linux
Wikipedia = Wikipedia
KM = Knowledge management
Wiki = Wiki
iPod = iPod
WiFi = Wi-Fi
iPhone = iPhone
WebServ = Web services
IM = Instant messaging
Web2 = Web 2.0
Grpware = Groupware
VPN = Virtual private network
GPS = Global positioning system
Virtualization = Virtualization
ExpertSys = Expert system
UtiComp = Utility computing
ERP = Enterprise resource planning
TabletPC = Tablet PC
EDI = Electronic data interchange
Telecommute = Telecommuting
eCom = Electronic commerce
SOA = Service oriented architecture
eBiz = Electronic business
SocNet = Social networking
DW = Data warehouse
SFA = Salesforce automation
DecisionSS = Decision support system
SCM = Supply chain management
DSL = Digital subscriber line
SmartCard = Smart card
DLearn = Distance learning
RFID = Radio frequency identification
DigiCam = Digital camera
PDA = Personal digital assistant
CRM = Customer relationship management
Outsource = Outsourcing
CloudCom = Cloud computing
OSS = Open source software
BizProReen = Business process reengineering
OLAP = Online analytical processing
Bluetooth = Bluetooth
NeuralNet = Neural net
Blog = Blog
MySpace = MySpace
BI = Business intelligence
MP3 = MP3 player
ASP = Application service provider
Multimedia = Multimedia
AI = Artificial intelligence
Co-Occurrence of IT Innovations
“Over the past few years, we have seen the ERP vendors, led by SAP, move into different business areas,” says Byron Miller, an analyst with the Giga Information Group. “The competitive advantage of just having ERP has diminished. The next big thing beyond ERP is supply-chain management.”
Links between groupware and ERP applications speed users' access from within a groupware application to key business data, such as purchase orders, inventory, customer histories, and other supply-chain information.
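In the two example paragraphs, ERP appears alongside supply-chain management in one and alongside groupware in the other. A minimal sketch of turning such paragraphs into co-occurrence counts follows, assuming each paragraph has already been reduced to the set of innovation labels it mentions. The function name and the toy data are hypothetical, and the third label set is invented for illustration.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(paragraph_labels):
    """Count, for each unordered pair of labels, the paragraphs mentioning both."""
    pair_counts = Counter()
    for labels in paragraph_labels:
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(set(labels)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

# Toy input: one label set per paragraph.
paragraph_labels = [
    {"ERP", "SCM"},      # first example paragraph
    {"Grpware", "ERP"},  # second example paragraph
    {"iPod", "MP3"},     # invented for illustration
]
counts = cooccurrence_counts(paragraph_labels)
```

Counting at the paragraph level rather than the article level is one plausible reading of the slides, which describe paragraphs as the extracted unit.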
[Figure: dendrogram, Hierarchical Clustering (co-occurrence), with eight clusters marked (Cluster 1 to Cluster 8). Leaf labels as extracted: VPN, DLearn, DSL, Telecommute, SmartCard, PDA, Multimedia, WiFi, IM, DigiCam, GPS, TabletPC, Bluetooth, MP3, RFID, Web2.0, Wiki, Wikipedia, YouTube, SocNet, MySpace, Blog, iPhone, iPod, Linux, WebServ, SOA, UtiComp, CloudCom, Virtualization, OSS, ExpertSys, NeuralNet, AI, OLAP, DecisionSS, Outsource, eCom, ERP, CRM, eBiz, ASP, EDI, Grpware, SFA, SCM, DW, KM, BI, BizProReen.]
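The slides name hierarchical clustering on co-occurrence but give neither the distance measure nor the linkage rule. The sketch below assumes a Jaccard distance computed from paragraph counts and average linkage. Both are illustrative choices, and the frequencies and co-occurrence counts are made up.

```python
def jaccard_distance(cooc, freq, a, b):
    """1 - |both| / |either|, computed from paragraph counts."""
    both = cooc.get((a, b), 0) + cooc.get((b, a), 0)
    either = freq[a] + freq[b] - both
    return 1.0 - both / either if either else 1.0

def average_linkage(items, dist, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[item] for item in items]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(x, y) for x in clusters[i] for y in clusters[j]]
                # Average pairwise distance between the two clusters.
                d = sum(dist(x, y) for x, y in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Made-up paragraph frequencies and co-occurrence counts.
freq = {"ERP": 10, "SCM": 8, "CRM": 9, "iPod": 7, "MP3": 6}
cooc = {
    ("ERP", "SCM"): 6, ("CRM", "ERP"): 5, ("CRM", "SCM"): 4,
    ("MP3", "iPod"): 5, ("ERP", "iPod"): 1,
}
clusters = average_linkage(
    list(freq), lambda a, b: jaccard_distance(cooc, freq, a, b), n_clusters=2
)
```

On this toy input the enterprise labels (ERP, SCM, CRM) end up in one cluster and the consumer labels (iPod, MP3) in the other. The nested loops are fine for 50 innovations but would not scale to much larger label sets.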
Kullback-Leibler (KL) Divergence
- KL divergence measures the difference between two probability distributions P and Q:
$$D_{KL}(P \| Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}$$
- The KL divergence matrix is symmetrized by averaging the divergence values in each direction:
$$\frac{D_{KL}(P \| Q) + D_{KL}(Q \| P)}{2} = \frac{\sum_i P(i) \log\big(P(i)/Q(i)\big) + \sum_i Q(i) \log\big(Q(i)/P(i)\big)}{2}$$
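A minimal sketch of the symmetrized KL divergence defined on this slide follows. The slides do not say which distributions are compared. The sketch assumes each innovation is represented by a word distribution over a shared vocabulary, with additive smoothing so that no probability is zero. The smoothing, the function names, and the example numbers are all assumptions.

```python
import math
from collections import Counter

def smoothed_distribution(tokens, vocabulary, alpha=1.0):
    """Word distribution with additive smoothing, so every log below is defined."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i P(i) * log(P(i) / Q(i)); requires Q(i) > 0 where P(i) > 0."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)

def symmetrized_kl(p, q):
    """Average of the divergence taken in each direction."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))

# Two toy distributions over the same two-word vocabulary.
p = {"a": 0.5, "b": 0.5}
q = {"a": 0.9, "b": 0.1}
d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
d_sym = symmetrized_kl(p, q)
```

Plain KL divergence is asymmetric, so `d_pq` and `d_qp` differ here. Averaging the two directions gives a symmetric matrix, which is what a clustering routine taking pairwise distances needs.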
[Figure: dendrogram, Hierarchical Clustering (KL Divergence), with five clusters marked (Cluster 1 to Cluster 5). Leaf labels as extracted: OLAP, UtiComp, SCM, CAD, EDI, Virtualization, SFA, DW, Grpware, AI, BI, SOA, KM, ERP, ASP, CRM, eBiz, WebServ, Outsource, eCom, Linux, DLearn, ATM, RFID, VPN, DSL, SmartCard, Telecommute, TabletPC, GPS, PDA, WiFi, Multimedia, iPod, iPhone, DigiCam, MP3, Web2.0, Blog, SocNet, Wiki, Wikipedia, OSS, IM, MySpace, YouTube, Bluetooth.]
Benefits of This Approach
- Scalable
  - More IT concepts to study: monitor and understand popularity
  - More data sources: represent reality by pooling data; compare to examine segments of communities
- Dynamic
  - Multiple periods
  - Reveal what exactly is diffusing
  - Visualize species and speciation of innovations
The BIG Picture
[Diagram. Labels: Social Structure; Social Cognition; IT Innovation; Entity; Event; Relation; Sentiment; Values; Sensemaking; Popularity; Adoption/Sales; Policy.]
Detecting Entities and Sentiments
- Adapt existing tools to our domain
- Develop our own tools
- Crowdsource evaluation
Question: Do MTurk annotators confirm the intuitions of the expert annotators who designed the tool?
Answer: Yes…quickly and cheaply!
Sayeed, A., Meyer, T., Nguyen, H., Weinberg, A., and Buzek, O. 2010. Crowdsourcing the Evaluation of a Domain-Adapted Named-Entity Recognition System, in Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics, Los Angeles, CA, pp. 345–348.
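One way to check whether crowd annotators confirm expert intuitions is to aggregate the workers' judgments per item and measure agreement with the expert labels. The sketch below assumes majority voting and Cohen's kappa. It is not the procedure reported by Sayeed et al., and the item names, labels, and judgments are invented.

```python
from collections import Counter

def majority_vote(judgments):
    """Most common label among workers; returns None on a tie."""
    ranked = Counter(judgments).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]

def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equally long label sequences."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    return (observed - expected) / (1 - expected) if expected < 1 else 1.0

# Invented judgments: three workers per item, one expert label per item.
worker_judgments = {
    "item1": ["entity", "entity", "not"],
    "item2": ["not", "not", "not"],
    "item3": ["entity", "not", "entity"],
    "item4": ["not", "entity", "not"],
}
expert_labels = {"item1": "entity", "item2": "not", "item3": "entity", "item4": "entity"}

items = sorted(worker_judgments)
crowd = [majority_vote(worker_judgments[i]) for i in items]
expert = [expert_labels[i] for i in items]
kappa = cohen_kappa(crowd, expert)
```

Using an odd number of workers per item avoids ties under majority voting for a two-label task. Items that still tie would need a rule of their own, such as collecting another judgment.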
Takeaways
- A computational/automated approach is scalable, but needs significant adaptation & development.
- An effective solution comes from combining the best of what humans and computers can offer.
- Crowdsourcing is a cheap and quick evaluation method, but the design must strive to be straightforward.
Thanks from Teams PopIT & STICK
Thanks to National Science Foundation for grants IIS-0729459 and SBE-0915645
http://terpconnect.umd.edu/~pwang/ [email protected]
STICK: Science & Technology Innovation Concept Knowledge-base
PopIT: Scalable Computational Analysis of the Diffusion of Technological Concepts
Making the Most of Digital Text Data: Opportunities, Challenges, & Best Practices
Emmanuelle Vaast, Bonnie Nardi, Eleanor Wynn, Cathy Urquhart, Ping Wang
Digital Text Data Are …
- ubiquitous
- relatively easy to obtain
- often reliable
- diverse
- situated in rich contexts – not easy to capture and analyze
- voluminous
- shrouded in ethical uncertainty
How to Make the Most of It?
- How to ethically collect digital text data?
- How to efficiently collect digital text data?
- What theories are particularly congenial with digital text data?
- What analytical methods are especially effective for what type of data?
- …?