Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Integrating economic statistics in monitoring the 2030 AgendaIntegrating economic statistics in monitoring the 2030 Agenda
Plenary session 1
Emerging techniques in using big data
Integrating economic statistics in monitoring the 2030 Agenda
Memorable quotes from yesterday:
• “it’s my camera”
• “glad to see the room is packed with Statisticians”
• “timely, accurate, and comprehensive”
• need to think about “the future of economic statistics”
• “regional cooperation”
• “revisit the trade-offs”
• “don’t break the time-series!”
• “communication with users”
Integrating economic statistics in monitoring the 2030 Agenda
Sample survey of APES 2019 participants:
“what is the best opportunity and the biggest challenge for your country to use big data in monitoring the 2030 Agenda?”
Category of points made:
Integrating economic statistics in monitoring the 2030 Agenda
Reference papersPaper title Authors Panel member
Lessons for effective Public-Private
Partnerships (PPPs) from the use of mobile
phone data in Indonesian tourism statistics
Titi Kanti Lestari, Siim Esko
(BPS Indonesia and
Positium)
Titi and Siim
Subjective Happiness Index based on Twitter
in Indonesia
Asita Sekar Asri, Siti
Mariyah
(BPS Indonesia – District
Office in Sulawesi)
Sita
Measuring Commuting Statistics in Indonesia
Using Mobile Positioning Data
Amanda Pratama Putra,
Ignatius Aditya Setyadi,
Siim Esko, Titi Kanti Lestari
(BPS Indonesia and
Positium)
Ignatius, Titi
and Siim
Development of a Big Data Analysis System
(Case Study: Unemployment Statistics)
Maftukhatul Qomariyah
Virati, Rachmi Agustiyani,
Siti Mariyah, Setia Pramana
(BPS Statistic, District
Office in Kalimantan)
Vira
Integrating economic statistics in monitoring the 2030 Agenda
Integrating economic statistics in monitoring the 2030 Agenda
Part III – SDG Data Sources and Gaps
Integrating economic statistics in monitoring the 2030 Agenda
TRUST HOTSPOT
TRUST HOTSPOT
Integrating economic statistics in monitoring the 2030 Agenda
Discussion session (1): responding to user needs
• Policy-making needs (SDGs) in Indonesia driving big data developments in:
• MPD in commuting statistics
• MPD in tourism statistics
• Social media analysis for well-being/happiness
• Analysis system for unemployment statistics
• How do we plan/design methodologies and approaches for big data that meet user needs?
Integrating economic statistics in monitoring the 2030 Agenda
Methodology
Figure 1. Usual Environment (Saluveer, E)
Integrating economic statistics in monitoring the 2030 Agenda
Figure 2. Commuters algorithm
GS
BP
M 5
.0
GSBPM new areas for NSO in terms of big data
Sim
ilar –
NSO
has
exp
erti
se
Sim
ilar –
NSO
has
ex
per
tise
Dif
fere
nt –
Exp
erti
se li
es
in e
xter
nal
par
ties
Ver
y d
iffe
ren
t –
Exp
erti
se is
ext
ern
al
Ver
y d
iffe
ren
t –
MN
O
exp
erti
se
Ver
y d
iffe
ren
t –
Exp
erti
se is
ex
tern
al
Rel
ativ
ely
sim
ilar
–N
SO h
as
exp
erti
se
Rel
ativ
ely
sim
ilar –
NSO
has
ex
per
tise
Sim
ilar –
NSO
has
exp
erti
se
Sim
ilar –
NSO
has
ex
per
tise
Conceptually – NSO; Technically - external
GSBPM for cross-border tourism processing
BP
S –
Stat
isti
cs In
do
nes
ia BP
S –
ou
tpu
ts
and
va
riab
les
Posi
tiu
m –
Co
llect
ion
an
d p
roce
ssin
g d
esig
n
Posi
tiu
m -
Bu
ild c
olle
ctio
n a
nd
pro
cess
ing
Telk
om
sel -
Co
llect
Telk
om
sel -
Pro
cess
BP
S w
ith
Po
siti
um
–
Wei
ght
and
ag
greg
ate
BP
S
BP
S
BP
S
Positium – QA and metadata tools
User Needs in the 2nd and 3rd Level Government
Time
Publication
Once every several year
Coverage
Analysis
Till 2nd level government
Autonomy
Data
Sometimes manipulated (not by
BPS)
Subjectivity
Real Case
In every region
DataWhat, How, Why?
Wu et al. (2015) & Mitchell et al. (2013)
Literature
Python 2.7 – Tweepy –Streaming API – JSON
Collecting data
NLP – LabMT DictionaryAnalysis
Dataset
⚫ April till July, 2M overall, 67.589 tourism tweets and 3.800 health tweets.
SDGs
Integrating economic statistics in monitoring the 2030 Agenda
Integrating economic statistics in monitoring the 2030 Agenda
User Activity in the System
Working System Time Estimation Total Proposed System Time Estimation Total
Opening news site 30 seconds 30 seconds
Web scraping news site* @1 second30 news x 1 second = 30
seconds
Open the news link @30 seconds30 news X 30 seconds =
900 seconds
Copy Paste news into
excel@30 seconds
30 news X 30 seconds =
900 seconds
Compiling news 300 seconds 300 secondsDownload news that has
been collected*@5 seconds
6 sites X 5 seconds = 30
seconds
Skimming News to search
related news@120 seconds
30 news X 120 seconds =
360 secondsSearching news* 2 seconds 2 seconds
Opening jobs vacancy site 30 seconds 30 seconds
Web scraping job vacancy
site*@1 second
30 job vacancy x 1 second
= 30 seconds
Open the job vacancy link @30 seconds30 job vacancy X 30
seconds = 900 seconds
Copy Paste job vacancy
into excel@30 seconds
30 job vacancy X 30
seconds = 900 seconds
Compiling job vacancy 300 seconds 300 secondsDownload job vacancy
that has been collected*5 seconds 5 seconds
Opening & Searching
Index Google Trend site60 seconds 60 seconds
Monitoring Index Google
Trend*5 seconds 5 seconds
Opening & Searching
Tweet and Trending
Topic
60 seconds 60 seconds Monitoring Tweet* 5 seconds 5 seconds
Total Activity : 11Total : 3840 seconds ~
64 minutesTotal Activity : 7
Total : 107 seconds ~
1.783 minutes
Integrating economic statistics in monitoring the 2030 Agenda
Discussion session (1): responding to user needs
• Policy-making needs (SDGs) in Indonesia driving big data developments in:
• MPD in commuting statistics
• MPD in tourism statistics
• Social media analysis for well-being/happiness
• Analysis system for unemployment statistics
• How do we plan/design methodologies and approaches for big data that meet user needs?
Integrating economic statistics in monitoring the 2030 Agenda
Discussion session (2): data quality issues with new methodologies
Integrating economic statistics in monitoring the 2030 Agenda
Data Science Journal (May 2015): The Challenges of Data Quality and Data Quality Assessment in the Big Data; Era Li Cai and Yangyong Zhu, Fudan University, Shanghai, China
Integrating economic statistics in monitoring the 2030 Agenda
Data Science Journal (May 2015): The Challenges of Data Quality and Data Quality Assessment in the Big Data; Era Li Cai and Yangyong Zhu, Fudan University, Shanghai, China
Integrating economic statistics in monitoring the 2030 Agenda
Discussion session (2) - data quality issues with new methodologies
Methodology Main quality gains Main quality challenges
Happiness Index via Twitter
Timeliness; data re-use Processing; biases; cultural/language
Big Data Analysis System for unemployment statistics
Timeliness; accessibility Processing
Commuting statistics using MPD
Timeliness; relevance Reliability/ validity/ accessibility long term
Preprocessing
Tweet
URL + Symbol Removal
Formalizing
Cleaning after formalizing
Stemming
Stop Word Removal
Tweet
Emoji Extraction
Translating emoji to string
Cleaning text Final
Tweet
Life Satisfaction
Affections
Eudaimonia
Personally
Social
Processing
Final Tweet
Tokenization
Matching with labMT dictionary
The scoring :
T = containing N words in each areaf = frequency of ith wordh = happiness value of that wordpi = normalized frequency
2.068.232 tweet clean
SHI vs Happiness Index in Indonesia
highest
highest
lowest
lowest
67.5275.86
BPS >> 2017SHI >> 2018
??
?
Festival
Daily Ranting
Demak Regency 5135 5.4622
Lombok Timur Regency 4719 5.4542
Karang Asem Regency 2821 5.4018
Cilegon City 2873 5.7582
Bangka Regency 2444 5.7164
Kapuas Regency 8593 5.6888
highest
lowest
SHI By 3rd Level Government
Integrating economic statistics in monitoring the 2030 Agenda
Questions and comments?
Integrating economic statistics in monitoring the 2030 Agenda
<h1 class="title-large"> Balikpapan international airport sees 40% drop in passengers this Idul Fitri</h1>
<span class="day">Fri, June 7, 2019</span>
<p>The Sultan Aji Muhammad Sulaiman International Airport in Balikpapan, East Kalimantan saw a 40 percent drop in passengers this Idul Fitri holiday as travelers choose to depart from the newly developed Aji PangeranTumenggung Pranoto in Samarinda.</p>
Integrating economic statistics in monitoring the 2030 Agenda
Questions and comments?
Integrating economic statistics in monitoring the 2030 Agenda
Figure 4 Commuting Pattern from Household Survey and MPD
Survey MPD
Integrating economic statistics in monitoring the 2030 Agenda
Figure 5 Correlation of Household Survey and MPD
Integrating economic statistics in monitoring the 2030 Agenda
Questions and comments?
Integrating economic statistics in monitoring the 2030 Agenda
Discussion session (3): forming effective partnerships
Integrating economic statistics in monitoring the 2030 Agenda
Admin data providers
Official Statisticians
Data Scientists
Policymakers (SDGs)
Other?
Big data providers
Public consent/
trust
Eg Lawmakers, software providers, psychologists etc
The partnership web for big data in official statistics
PPP may be needed
Integrating economic statistics in monitoring the 2030 Agenda
Wider international perspectives
• UNESCAP workshop on big data, Jakarta June 2019
• GWG on big data emerging tools
• UNDP international conference on big data, Kigali, May 2019
• ISI WSC, Kuala Lumpur, August 2019
Integrating economic statistics in monitoring the 2030 Agenda
You draw the conclusions
On post-it notes, for each discussion topic give:
• your main conclusions from the debate and/or
• where regional cooperation is needed
• matching big data opportunities to user needs/SDGs
• managing big data quality effectively in official statistics
• forming effective partnerships for using big data