Drone Emprit: Concepts and Technology


Ismail Fahmi, PhD
Drone Emprit
Media Kernels Indonesia
Ismail.fahmi@gmail.com

IT CAMP – BIG DATA & DATA MINING
Onno Center, Situ Gintung - Jakarta

1 October 2017

• 1992 – 1997: Bachelor's degree (S1), Electrical Engineering, ITB
• 2003 – 2004: Master's degree (S2), Computational Linguistics, University of Groningen, the Netherlands
• 2004 – 2009: Doctorate (S3), Computational Linguistics, University of Groningen, the Netherlands

• 2000 – 2003: Initiator of IndonesiaDLN (the first Digital Library Network in Indonesia)
• Developed the Ganesha Digital Library (GDL)
• Founded the Knowledge Management Research Group (KMRG) at ITB
• Built the ITB Digital Library

• 2009 – present: Engineer at Weborama, a big data company (Paris/Amsterdam)
• 2012 – present: Co-Founder of Awesometrics, a media monitoring and analytics company
• 2014 – present: Founder of PT. Media Kernels Indonesia, a natural language processing company
• 2015 – present: Consultant to the National Library (Perpustakaan Nasional), initiator of Indonesia OneSearch
• 2017 – present: Permanent lecturer, Master of Informatics Engineering, Universitas Islam Indonesia

Agenda

SESSION 1
• Concepts
  • About Drone Emprit
  • Data, the new gold mine
  • Architecture & Features
• Technology
  • Crawler
    • Twitter
    • Facebook
    • Online News
  • Indexing
    • Sharding
    • Replication
  • Analytics
    • Sentiment Analysis
    • Opinion Analysis
    • Term Extraction
    • Clustering
    • Social Network Analysis
  • Visualization

SESSION 2
• Case Studies
  • Analysis of the West Java regional election (Pilkada Jawa Barat)
  • Analysis of the pro and contra PKI debate
  • Reading media agenda setting
• Demo
  • Creating a new monitoring topic
  • Reading the analysis results
  • Editing sentiment
  • Social Network Analysis

About Drone Emprit

Media Kernels a.k.a. Drone Emprit

• A system for monitoring and analyzing online and social media, built on big data technology.
• Developed since 2009 in Amsterdam, the Netherlands, by Indonesians, through Media Kernels Netherlands B.V.
• In use in Indonesia since 2012.
• Based on Artificial Intelligence (Machine Learning) and Natural Language Processing (NLP) technology.
• Known as 'Drone Emprit' in various TV and national media coverage.

Drone Emprit

2-8 January 2017

TEMPO

Topic: Hoax farms on social media

Media Kernels:
• Reported under the name 'Drone Emprit'.
• Presented a Social Network Analysis (SNA) map showing where a hoax originates, how it spreads, who the main influencers are, and which groups they belong to.
• Issues analyzed included the 10 Million Chinese Workers hoax and Aleppo (ISIS).

TEMPO cover story (Laporan Utama), 2-8 January 2017

12 January 2017

KANTOR STAF PRESIDEN (Executive Office of the President)

Case: A hoax attacking the government about 10 million illegal Chinese workers.

Media Kernels:
• Presented two case studies: the 10 million illegal Chinese workers hoax, and negative sentiment toward the anti-hoax movement.
• Showed the resonance timeline of the issue and a conversation map built with the SNA feature.
• Showed that government communication was not very effective, and what could be done to improve it.

Public relations FGD for all ministries and agencies at the Executive Office of the President (KSP)

22 March 2017

MATA NAJWA

Case: Virus Dusta (the "virus of lies", alias hoax)

Speakers:
• Stanley (Press Council)
• Johan Budi (Presidential Special Staff)
• Boy Rafli (Polri Public Relations)
• Ismail Fahmi (MK)
• Septiaji & Khairul Anshar (Masy. Anti Hoax, the anti-hoax community)

Media Kernels:
• Presented an analysis of the 10 million illegal Chinese workers hoax.
• The TNI Commander vs PKI hoax.

MATA NAJWA LIVE 'VIRUS DUSTA'

Data is New Gold

6 May 2017

Data Collection: Gold = Expensive

Free Data

Twitter Analysis: World Economic Forum 2016

https://medium.com/@swainjo/wef16-davos-twitter-sna-analysis-4c38cf4bc46d

Architecture

MK Big Data Architecture

[Diagram: from Data to Insight. Components shown:]
• Sources: News Crawler, Twitter Crawler, Twitter Streaming, FB Page Crawler, Google Custom Search, other sources
• Data Pipeline, Data Ingest Management & Queue
• Realtime Job Processing, Scheduled Job Processing
• SOLR Indexer 1, SOLR Indexer 2, SOLR Indexer 3, SOLR Indexer 4
• Database Framework, Hadoop Framework, Map Reduce
• Sentiment Analysis, Other Processings
• Data & Workflow Management
• Access, Visualization, Analytics UI
• Physical Hardware

System Architecture

[Diagram: data sources, collectors, processing, and clients]

Sources and collectors:
• Social Media (Twitter, Facebook): Search + JSON, collected by the Search + Account Crawler
• Online News (Detik (ID), Reuters (EN), etc.): RSS + HTML, collected by the RSS + HTML Crawler
• Online News (Gatra (ID), Bloomberg (EN), etc.): HTML, collected by an HTML Crawler
• Forums (Kaskus, Detik Forum, etc.): HTML, collected by an HTML Crawler
• Twitter Stream: JSON, received by the PUSH JSON Subscriber
• Print (Kompas, Warta Ekonomi, etc.): TEXT, handled by the Converter

Processing and storage: Redis Queue, Cache Manager, Keywords + Accounts Filters, Backtrack Filters, Sentiment Analysis, Sentiment Models, Analyses, Mentions Storage (with deletes), Projects Storage, Index Servers (SOLR Nodes, Shard 1 to Shard N)

Client(s): Control Room Screens, smart phones, tablets, desktops

Media Kernels Features

[Feature map. Feature groups: DASHBOARD, NEWS PORTAL, ANALYTICS, TOPICS, INFLUENCERS, SNA, DEMOGRAPHY, MENTIONS, COMPARE, REPORTING, ADMIN, OPINION ANALYSIS]

Features shown, in slide order: Trends, Comparison, Topic Map, Latest News, Media, News Sites, Page Ranks, Sentiment Analysis, PF-Chart, Engagement, Exposure, Retweets, Replies, Most Shared URLs, Most Shared Videos, Topic Map, Word Cloud, Impact, Engagement, Reach, Most Engaged, Followers, Influencer Network, Topic Network, PR-Values, Reach, Hashtags Posts, Bubble Map, Twitter User Map, User Locations, Edit Sentiments, Training & Learning, Backtracking, Compare SNA, Compare Projects, Popularity vs Favorability, Background Jobs, Upload Report, Download Report, User Management, Project Management, Client Management, Source Management, Label and Training, Opinion Chart, Insight Explorer

News Crawler

Online News

And hundreds of non-mainstream media outlets

Crawling Online News

Crawler, Index Server

Web Crawler Tools

http://bigdata-madesimple.com/top-50-open-source-web-crawlers-for-data-mining/

Example: Scrapy.org

Web Crawler: Drone Emprit

Built in-house, powered by:

Anatomy: Metadata and Full Text

Keep: date, title, article body, author, image URL

Discard: ads, headline lists, comments.
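The keep/discard rule above can be sketched with Python's standard `html.parser`. This is a minimal illustration, not the Drone Emprit crawler: the class names in `DISCARD_CLASSES`, the tags used for each field, and the sample page are all assumptions, and real news sites need per-site rules. It also assumes well-formed HTML with closed tags.

```python
from html.parser import HTMLParser

# Hypothetical class names for the blocks to discard (ads, headline lists, comments).
DISCARD_CLASSES = {"ads", "headline-list", "comments"}
VOID_TAGS = {"meta", "img", "br", "hr", "input", "link"}  # tags without a closing tag


class ArticleExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.record = {"date": None, "title": None, "author": None,
                       "image_url": None, "body": []}
        self._skip_depth = 0      # > 0 while inside a discarded block
        self._in_article = False
        self._capture = None      # "title" or "body" while collecting text
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in VOID_TAGS:
            if self._skip_depth == 0 and tag == "meta":
                if a.get("name") == "author":
                    self.record["author"] = a.get("content")
                elif a.get("property") == "og:image":
                    self.record["image_url"] = a.get("content")
            return
        if self._skip_depth:
            self._skip_depth += 1          # nested tag inside a discarded block
            return
        if set((a.get("class") or "").split()) & DISCARD_CLASSES:
            self._skip_depth = 1
        elif tag == "article":
            self._in_article = True
        elif tag == "time":
            self.record["date"] = a.get("datetime")
        elif tag == "h1":
            self._capture = "title"
        elif tag == "p" and self._in_article:
            self._capture = "body"

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._skip_depth:
            self._skip_depth -= 1
            return
        if tag == "article":
            self._in_article = False
        elif tag in ("h1", "p") and self._capture:
            text = " ".join("".join(self._buffer).split())
            if self._capture == "title":
                self.record["title"] = text
            elif text:
                self.record["body"].append(text)
            self._capture, self._buffer = None, []

    def handle_data(self, data):
        if self._capture and not self._skip_depth:
            self._buffer.append(data)


def extract_article(html):
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    return parser.record


# Invented sample page, for illustration only.
SAMPLE = """
<html><head>
<meta name="author" content="Reporter A">
<meta property="og:image" content="http://example.com/photo.jpg">
</head><body>
<div class="headline-list"><p>Other headline</p></div>
<article>
<h1>Sample headline</h1>
<time datetime="2017-10-01">1 October 2017</time>
<p>First paragraph.</p>
<div class="ads"><p>Buy now!</p></div>
<p>Second <b>paragraph</b>.</p>
</article>
<div class="comments"><p>First!</p></div>
</body></html>
"""

if __name__ == "__main__":
    print(extract_article(SAMPLE))
```

The resulting record holds the metadata fields and the body paragraphs as separate values, which is the shape an indexer would typically expect.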

Twitter API

API: search/tweets

Example: Free Twitter Search

• History: 7 days
• Start search
• 100% results

API: Realtime (Sample)

• Random sample of all statuses
• Less than 10%

API: Realtime (Filter)

• Filtered statuses out of all statuses
• ~100%
• POST statuses/filter
• Filter: max 400 keywords

API: > 400 keywords?

• All statuses are filtered by several servers, each with its own IP address (Server IP Addr 1, Server IP Addr 2, ..., Server IP Addr n).
• Each server filters at most 400 keywords.
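The idea of spreading more than 400 keywords over several servers can be sketched as follows. The 400-keyword limit comes from the slide; the function names and the server addresses (taken from the documentation-only range 192.0.2.0/24) are hypothetical, and the sketch only plans the split without opening any stream.

```python
def split_keywords(keywords, max_per_stream=400):
    """Deduplicate keywords and split them into batches of at most max_per_stream."""
    cleaned = (k.strip().lower() for k in keywords if k.strip())
    unique = list(dict.fromkeys(cleaned))  # keeps first-seen order
    return [unique[i:i + max_per_stream]
            for i in range(0, len(unique), max_per_stream)]


def assign_to_servers(keywords, servers, max_per_stream=400):
    """Map each server (one filter stream per IP address) to one keyword batch."""
    batches = split_keywords(keywords, max_per_stream)
    if len(batches) > len(servers):
        raise ValueError(
            f"{len(batches)} batches need {len(batches)} servers, "
            f"only {len(servers)} available")
    return dict(zip(servers, batches))


if __name__ == "__main__":
    keywords = [f"keyword{i}" for i in range(1000)]   # invented keywords
    servers = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]  # hypothetical addresses
    for server, batch in assign_to_servers(keywords, servers).items():
        print(server, len(batch))
```

Deduplicating before the split avoids spending part of a server's quota on repeated keywords.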

Twitter API Tools

Net::Twitter

Twitter API: Drone Emprit

• Net::Twitter
• AnyEvent::Twitter::Stream

Facebook API

FB API (v1): Public Search

April 2014 → stopped by Facebook

FB API (v2): Searching

FB API (v2): Object

https://graph.facebook.com/$object_id/$type?fields=id,parent_id,from,to,type,status_type,story,message,link,likes.summary(true),shares,comments.order(reverse_chronological).summary(true),created_time,updated_time&order=reverse_chronological&access_token=$access_token&limit=$limit&until=$last_timestamp

$object_id = FB Page ID, etc.

$type = [feed, comment, ...]
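The URL template above can be assembled programmatically. The sketch below only builds the request URL from the fields listed on the slide and sends nothing. The function name and default values are assumptions, and the endpoint shape reflects the API as presented in 2017, so current Graph API versions may differ.

```python
from urllib.parse import urlencode

GRAPH_URL = "https://graph.facebook.com"

# Field list copied from the slide.
FIELDS = [
    "id", "parent_id", "from", "to", "type", "status_type", "story",
    "message", "link", "likes.summary(true)", "shares",
    "comments.order(reverse_chronological).summary(true)",
    "created_time", "updated_time",
]


def build_object_url(object_id, edge_type, access_token, limit=25, until=None):
    """Build the Graph API URL for one object edge, e.g. a page's feed."""
    params = {
        "fields": ",".join(FIELDS),
        "order": "reverse_chronological",
        "access_token": access_token,
        "limit": limit,
    }
    if until is not None:
        params["until"] = until  # timestamp of the oldest item already fetched
    # Keep commas and parentheses unescaped so the URL matches the slide's form.
    return f"{GRAPH_URL}/{object_id}/{edge_type}?{urlencode(params, safe=',()')}"


if __name__ == "__main__":
    # "12345" and "TOKEN" are placeholders, not real identifiers.
    print(build_object_url("12345", "feed", "TOKEN", limit=10, until=1506816000))
```

Passing the last seen timestamp as `until` on each call is one way to page backwards through a feed, as the `$last_timestamp` parameter on the slide suggests.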

FB API Tools

Facebook::Graph

fb 0.4.0

FB API: Drone Emprit

Built in-house, powered by: WWW::Curl

Question: Perl or Python?

Of course!

Why Perl?

Perl is what helped mankind after the fall to earth, and is of course more 'nyunah' (in keeping with the sunnah).

Python is what tempted Adam and Eve, who were then sent down from paradise.

Search Engine/Indexing

Full Text Indexing

Data Sources, Search Engine

Full Text Search Engines

Search Engine: Drone Emprit

Simple - Powerful - Robust - Scalable

Solr Server Configuration

Sharding

Replication

Analytics

Analytics: Server Configuration

Slave, Analysis Results, Analysis Processes

Analytics Engine

• Search by Keywords
• News, Twits, Statuses, etc.
• Sentiment Analysis
• Opinion Analysis
• Term Extraction
• Segmentation
• Quote Extraction
• Named Entity Recognition
• Search Results

Paragraph Segmentation

NEWS ARTICLES, MENTIONS

Sentiment Analysis

Sentiment Analysis

MENTIONS: positive, negative, or neutral?

• For Setya Novanto: positive?
• For KPK: negative?
• For Judge Cepi Iskandar: neutral?

Sentiment Analysis Techniques

http://www.sciencedirect.com/science/article/pii/S2090447914000550

Evaluation

• A "one model for all" approach cannot assign the correct label for every subject.
• A lexicon-based approach depends on whether the words appear in the sentiment dictionary, and cannot assign the correct label for different subjects.
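The limitation of a lexicon-based approach can be seen in a toy example. The lexicon and the mention below are invented for illustration. The point is only that a word-counting scorer returns one label for a text, whichever subject the analyst cares about.

```python
import re

# Hypothetical toy lexicon: word -> polarity score.
LEXICON = {"wins": 1, "victory": 1, "loses": -1, "defeat": -1}


def lexicon_label(text):
    """Sum the polarity of known words and map the total to a label."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(LEXICON.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Invented mention; the scorer has no notion of which subject is being asked about.
mention = "Setya Novanto wins the pretrial hearing and KPK loses"
labels = {subject: lexicon_label(mention)
          for subject in ("Setya Novanto", "KPK", "Judge Cepi Iskandar")}

if __name__ == "__main__":
    print(labels)  # the same label for every subject
```

A human reader would label this mention differently for each subject, which is the gap that per-subject models are meant to close.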

Sentiment Analysis Tools

https://breakthroughanalysis.com/2012/01/08/what-are-the-most-powerful-open-source-sentiment-analysis-tools/

Text Mining Module

Sentiment Analysis: Drone Emprit

Adaptive Multiple Models

Training Data

DOI: 10.1109/ICMLA.2015.22

81,000

Opinion Analysis

Kapolri (National Police Chief): Opinion Analysis

With the Polri Public Relations Division (DivHumas Polri) on Kompas Petang

MK Opinion Analysis Features

Analysis of Statistics

Reading the Voice, Not the Noise

Analysis Affected by Noise

Unfortunately, it was this 'noise'-based analysis that went viral.

Opinion Analysis Techniques

Drone Emprit: Regular Expression (Opinion Analysis)

Quote Extraction

QUOTE, QUOTE HOLDER

Quote Extraction: Drone Emprit

Pattern matching with regular expressions
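Pattern matching for quotes and quote holders might look like the sketch below. The pattern, the list of speech verbs, and the sample sentences are assumptions for illustration, not the rules used in Drone Emprit. It only covers the common form of a quoted sentence followed by a speech verb and a capitalized name.

```python
import re

# Hypothetical list of speech verbs (Indonesian and English).
SPEECH_VERBS = r"(?:kata|ujar|ucap|tutur|said)"

QUOTE_RE = re.compile(
    r'["“](?P<quote>[^"”]+?)[,.!?]?["”],?\s+'        # the quoted text
    + SPEECH_VERBS +
    r'\s+(?P<holder>[A-Z]\w*(?:\s+[A-Z]\w*)*)'        # capitalized name(s)
)


def extract_quotes(text):
    """Return (quote, quote holder) pairs found in the text."""
    return [(m.group("quote").strip(), m.group("holder"))
            for m in QUOTE_RE.finditer(text)]


# Invented sentences with invented speakers.
SAMPLE = ('"Harga beras stabil," kata Budi Santoso di Jakarta. '
          '"Prices are stable, for now," said Siti Aminah.')

if __name__ == "__main__":
    for quote, holder in extract_quotes(SAMPLE):
        print(f"{holder}: {quote}")
```

The holder group stops at the first word that is not capitalized, so a trailing phrase such as "di Jakarta" is left out of the name.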

Named Entity Recognition

LOCATION, PERSON, ORGANIZATION

NER Tools

NER: Drone Emprit

NER Example

Clustering

Clustering Types

Clustering Tools

http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm

Topic Map: Document Clustering

Social Network Analysis

SNA: Social Network Analysis

• SNA maps the relations among people, organizations, topics, locations, and other information entities.
• A node (point) in the network represents a person, organization, location, or information entity.
• A line connecting two points represents the relation between them.

Betweenness Centrality

Betweenness centrality: a measure of centrality based on how often a node lies on the shortest paths between other nodes.

• Highest betweenness centrality (8 connections)
• Lowest betweenness centrality (4 connections)
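Betweenness centrality can be computed with Brandes' algorithm. The sketch below is a plain-Python version for a small undirected, unweighted graph. The example graph and node names are invented, and a production system would more likely rely on a graph library.

```python
from collections import deque


def betweenness_centrality(graph):
    """Brandes' algorithm. graph maps each node to an iterable of neighbours."""
    centrality = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:                 # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:      # v is on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                centrality[w] += delta[w]
    # Each undirected pair was counted from both ends.
    return {v: c / 2 for v, c in centrality.items()}


# Invented example: node "m" bridges the groups {a, b} and {c, d}.
GRAPH = {
    "a": ["b", "m"], "b": ["a", "m"],
    "c": ["d", "m"], "d": ["c", "m"],
    "m": ["a", "b", "c", "d"],
}

if __name__ == "__main__":
    print(betweenness_centrality(GRAPH))
```

In this graph every shortest path between the two groups passes through `m`, so `m` is the only node with a non-zero score.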

Anatomy of a Tweet

Retweet Relations

Link Functions: Retweet / Mention
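Retweet and mention links can be turned into network edges as sketched below. The regular expressions, the function name, and the sample tweets are assumptions for illustration. With the Twitter API the structured fields of a status (such as `retweeted_status` and `entities.user_mentions`) are a more reliable source than the text itself.

```python
import re
from collections import Counter

RETWEET_RE = re.compile(r"^RT @(\w+):")   # classic "RT @user: ..." prefix
MENTION_RE = re.compile(r"@(\w+)")


def build_edges(tweets):
    """tweets: iterable of (author, text).

    Returns two Counters keyed by (source, target): retweet edges and mention edges.
    """
    retweets, mentions = Counter(), Counter()
    for author, text in tweets:
        rt = RETWEET_RE.match(text)
        if rt:
            retweets[(author, rt.group(1))] += 1
            continue  # design choice: a retweet is not also counted as a mention
        for target in MENTION_RE.findall(text):
            if target != author:
                mentions[(author, target)] += 1
    return retweets, mentions


# Invented tweets from invented accounts.
TWEETS = [
    ("alice", "RT @bob: data is the new gold"),
    ("carol", "RT @bob: data is the new gold"),
    ("dave", "@alice @bob what do you think?"),
]

if __name__ == "__main__":
    rt_edges, mention_edges = build_edges(TWEETS)
    print(rt_edges)
    print(mention_edges)
```

Keeping the two edge sets separate makes it possible to draw a retweet network and a mention network from the same collection of tweets.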

Retweet Network

Mention Network

Information Arbitrage

Information arbitrage: translate information across groups

Visualization

User Dashboard

Analysis Results, Slave

Visualization Tools

D3js.org

Drone Emprit is Hiring

System Administrator & Programmer

Thank You

Ismail Fahmi, PhD
Drone Emprit
PT Media Kernels Indonesia
Email: ismail.fahmi@gmail.com
Mobile: 0812 8908 3894