VoIP Data
IIIT Allahabad
Margaret H. Dunham
Department of Computer Science and Engineering
Southern Methodist University
Dallas, Texas 75275, USA
mhd@lyle.smu.edu
Support provided by Fulbright Grant and IIIT Allahabad
VoIP Advantages
• Travel
• Cost reduction
• Additional features: voice messages, call forwarding, logs, caller ID, …
• Integration of business tools
• Common network infrastructure
Telephone-VoIP Steps
• An Analog Telephone Adapter (ATA) converts the analog phone call to a digital signal.
• The signal is sent over the internet as data packets.
• At the receiving end, the packets are converted back to an analog signal.
VoIP Codec
• Software on a server or ATA that converts the voice signal into digital data.
• COmpressor – DECompressor
• COder – DECoder
• Sample (8000, 24000, 32000 times per second)
• Sort
• Compress
• Packetize
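The sample and packetize steps can be sketched as follows. This is only an illustration: the 20 ms packet duration, the one-byte-per-sample encoding, and the function name `packetize` are assumptions not taken from the slides, and the sort and compress steps are omitted.

```python
def packetize(samples, sample_rate=8000, packet_ms=20):
    """Split a stream of 1-byte samples into fixed-duration packets.

    sample_rate is in samples per second; packet_ms is an assumed
    packet duration in milliseconds.
    """
    samples_per_packet = sample_rate * packet_ms // 1000
    return [samples[i:i + samples_per_packet]
            for i in range(0, len(samples), samples_per_packet)]


# One second of silence sampled 8000 times per second, one byte per sample.
one_second = bytes(8000)
packets = packetize(one_second)
print(len(packets), len(packets[0]))
```

At 8000 samples per second, each assumed 20 ms packet carries 160 samples, so one second of audio yields 50 packets.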
Protocols
• SIP (Session Initiation Protocol)
  • Signaling to set up and tear down sessions.
• SDP (Session Description Protocol)
  • Describes the call.
• RTP (Real-time Transport Protocol)
  • Exchanges data/voice packets.
  • Media transport to transmit packets.
SIP
• Setup
• Connect
• Disconnect
• Syntax similar to HTTP
• Bind to IP address using SIP registration
• URLs for address format: mhd@lyle.smu.edu
• Independent of application or data types
• Uses RTP and SDP
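To illustrate the HTTP-like text syntax, here is a minimal sketch that builds and parses the start line of a SIP request. The helper names and the caller address `caller@example.com` are hypothetical. The message is deliberately incomplete: a real INVITE also needs headers such as Via, Call-ID, and Max-Forwards, and usually carries an SDP body.

```python
def build_request(method, uri, headers, body=""):
    """Assemble a SIP-style text message: start line, headers, blank line, body."""
    lines = [f"{method} {uri} SIP/2.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_start_line(message):
    """Return (method, request URI, protocol version) from the first line."""
    method, uri, version = message.split("\r\n", 1)[0].split(" ")
    return method, uri, version


# Illustrative only: not a complete, valid INVITE.
msg = build_request("INVITE", "sip:mhd@lyle.smu.edu", {
    "To": "<sip:mhd@lyle.smu.edu>",
    "From": "<sip:caller@example.com>",   # hypothetical caller
    "CSeq": "1 INVITE",
})
print(parse_start_line(msg))
```

As with HTTP, the first line names the method and target, and header lines follow as `Name: value` pairs terminated by a blank line.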
SIP Overview
http://www.voipmechanic.com/sip-basics.htm
VoIP Data
• Any of this digital data could be saved and analyzed.
• Typically only statistical/summary information about the calls is saved.
• These Call Detail Records (CDRs) are used for billing and analysis.
Call Detail Record
• Log of VoIP usage
• May be by account
• Typical attributes:
  • Source
  • Destination
  • Duration of call
  • Amount billed
  • Total usage time in billing period
  • Remaining time in billing period
  • Total charge in billing period
• The format of the CDR varies among VoIP providers or programs. Some programs allow CDRs to be configured by the user.
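Since CDR formats vary, the following is only a sketch of how per-account totals might be derived from a CSV log. The column names (`source`, `destination`, `duration_sec`, `amount_billed`) and the sample rows are invented for illustration and do not correspond to any particular provider's format.

```python
import csv
import io
from collections import defaultdict

# Made-up sample records; real CDR layouts differ by provider.
SAMPLE_CDR = """source,destination,duration_sec,amount_billed
2145550101,2145550102,120,0.10
2145550101,9725550103,60,0.05
2145550102,2145550101,30,0.02
"""


def summarize(cdr_csv):
    """Total usage time and total charge per source account."""
    totals = defaultdict(lambda: {"seconds": 0, "charge": 0.0})
    for row in csv.DictReader(io.StringIO(cdr_csv)):
        account = totals[row["source"]]
        account["seconds"] += int(row["duration_sec"])
        account["charge"] += float(row["amount_billed"])
    return dict(totals)


summary = summarize(SAMPLE_CDR)
for source, total in summary.items():
    print(source, total["seconds"], round(total["charge"], 2))
```

Attributes such as "total usage time in billing period" would be running totals of this kind, restricted to the records of one billing period.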
CDR Generation [3]
• Usually created through a special Authentication, Authorization, and Accounting (AAA) server.
• May also be created by the logging capabilities at a gateway or router using syslog server software.
• Normally in plain CSV format.
• Normally uses UDP, so the underlying data packets are not sequenced and may be lost. (Redundancy of servers can help.)
• Timestamps between routers can be synchronized using the Network Time Protocol (NTP).
• A CDR is generated for both the forward and return leg of a call.
• http://www.cisco.com/en/US/tech/tk1077/technologies_tech_note09186a0080094e72.shtml
Example: CISCO CDR Data
• VoIP traffic in their Richardson, Texas facility from Mon Sep 22 12:17:32 2003 to Mon Nov 17 11:29:11 2003.
• Over 1.5 million call trials were logged.
• 272,646 connected calls
• 66 attributes including source, destination, starting time, duration, routing/switching, device, etc.
• Application: Anomaly Detection (Classification)
• Goal: Find unusual call patterns based on type and time of call.
• Technique: New data structure, new classification algorithm, new visualization technique
• Sample of raw csv data: http://lyle.smu.edu/~mhd/iiit/start.csv
CISCO Preprocessing
• Remove all attributes other than source, destination, starting time, and duration from the logs.
• Count the connected calls and discard unconnected calls.
• The total number of connected calls was 272,646.
• 5 phone classes: internal, local, national, international, unknown.
• 25 link classes (source class + destination class)
• Data is aggregated into 15 minute time intervals.
• The total number of time points is 5422 and the total number of attributes is 26.
• Add two attributes, namely, type of day (workday or weekend) and time of the day, to the processed data. This step gives a spatio-temporal cube in the model space.
• http://www.engr.smu.edu/~mhd/7331f08/CISCOEMM.xls
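The aggregation step can be sketched as follows, assuming each connected call has already been reduced to a start time, a source class, and a destination class. The function names, the sample calls, and the exact output layout are assumptions; the slides do not specify how the 26 attributes are arranged.

```python
from collections import Counter
from datetime import datetime

PHONE_CLASSES = ["internal", "local", "national", "international", "unknown"]
# 5 source classes x 5 destination classes = 25 link classes.
LINK_CLASSES = [(s, d) for s in PHONE_CLASSES for d in PHONE_CLASSES]


def interval_start(ts, minutes=15):
    """Round a timestamp down to the start of its 15 minute interval."""
    return ts.replace(minute=ts.minute - ts.minute % minutes,
                      second=0, microsecond=0)


def aggregate(calls):
    """calls: iterable of (start_time, source_class, destination_class).

    Returns {interval: (counts per link class, type of day, time of day)}.
    """
    counts = {}
    for start, src, dst in calls:
        counts.setdefault(interval_start(start), Counter())[(src, dst)] += 1
    rows = {}
    for t, c in sorted(counts.items()):
        day_type = "weekend" if t.weekday() >= 5 else "workday"
        rows[t] = ([c[link] for link in LINK_CLASSES], day_type,
                   t.strftime("%H:%M"))
    return rows


# Made-up calls for illustration.
calls = [
    (datetime(2003, 9, 22, 12, 17, 32), "internal", "local"),
    (datetime(2003, 9, 22, 12, 20, 0), "internal", "local"),
    (datetime(2003, 9, 27, 9, 0, 0), "local", "national"),
]
rows = aggregate(calls)
for t, (vector, day_type, time_of_day) in rows.items():
    print(t, sum(vector), day_type, time_of_day)
```

Each interval becomes one time point whose vector holds the call count for every link class, with the type of day and time of day attached.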
CISCO Data Visualization
http://www.lyle.smu.edu/~mhd/7331f11/CiscoEMM.png
Spatiotemporal Stream Data
• Records may arrive at a rapid rate
• High volume (possibly infinite) of continuous data
• Concept drifts: Data distribution changes on the fly
• Data does not necessarily fit any distribution pattern
• Multidimensional
• Temporal
• Spatial
• Data are collected in discrete time intervals
• Data are in structured format, <a1, a2, …>
• Data hold an approximation of the Markov property
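Spatiotemporal Environment
• Events arriving in a stream
• At any time, t, we can view the state of the problem as represented by a vector of n numeric values:

$$V_t = \langle S_{1t}, S_{2t}, \ldots, S_{nt} \rangle$$

Over time points 1 through q, the vectors form the columns of a table (time runs left to right):

$$
\begin{array}{c|cccc}
 & V_1 & V_2 & \cdots & V_q \\ \hline
S_1 & S_{11} & S_{12} & \cdots & S_{1q} \\
S_2 & S_{21} & S_{22} & \cdots & S_{2q} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
S_n & S_{n1} & S_{n2} & \cdots & S_{nq}
\end{array}
$$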
Data Stream Modeling
• Single pass: Each record is examined at most once
• Bounded storage: Limited memory for storing synopsis
• Real-time: Per record processing time must be low
• Summarization (synopsis) of data
• Use data NOT SAMPLE
• Temporal and Spatial
• Dynamic
• Continuous (infinite stream)
• Learn
• Forget
• Sublinear growth rate - Clustering
MM
A first order Markov Chain is a finite or countably infinite sequence of events $\{E_1, E_2, \ldots\}$ over discrete time points, where $P_{ij} = P(E_j \mid E_i)$, and at any time the future behavior of the process is based solely on the current state.

A Markov Model (MM) is a graph with m vertices or states, S, and directed arcs, A, such that:
• $S = \{N_1, N_2, \ldots, N_m\}$, and
• $A = \{L_{ij} \mid i \in \{1, 2, \ldots, m\},\ j \in \{1, 2, \ldots, m\}\}$, and each arc $L_{ij} = \langle N_i, N_j \rangle$ is labeled with a transition probability $P_{ij} = P(N_j \mid N_i)$.
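As a small worked example of the arc labels, the transition probabilities Pij can be estimated from an observed state sequence by counting transitions. The function name and the sample sequence below are made up for illustration.

```python
from collections import Counter, defaultdict


def transition_probabilities(states):
    """Estimate P(next state | current state) from consecutive pairs."""
    counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    return {i: {j: n / sum(row.values()) for j, n in row.items()}
            for i, row in counts.items()}


# Made-up sequence of visited states.
P = transition_probabilities(["N1", "N2", "N1", "N3", "N1", "N2"])
for i, row in P.items():
    print(i, row)
```

Here N1 is left three times, twice toward N2 and once toward N3, so the arcs out of N1 are labeled 2/3 and 1/3.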
Extensible Markov Model (EMM)
• Time Varying Discrete First Order Markov Model
• Nodes are clusters of real world states.
• Learning continues during application phase.
• Learning:
  • Transition probabilities between nodes
  • Node labels (centroid/medoid of cluster)
• Nodes are added and removed as data arrives
EMM Creation
Input vectors:
<18,10,3,3,1,0,0>
<17,10,2,3,1,0,0>
<16,9,2,3,1,0,0>
<14,8,2,3,1,0,0>
<14,8,2,3,0,0,0>
<18,10,3,3,1,1,0>
[Figure: EMM graph snapshots with nodes N1, N2, N3 and arcs labeled with transition probabilities (1/1, 2/2, 1/2, 1/3, 2/3).]
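The incremental construction can be sketched roughly as below. This is not the authors' algorithm: it assumes nearest-centroid clustering with Euclidean distance, an arbitrary distance threshold of 2.0, and node labels that stay fixed at the first vector of each cluster. The class and method names are hypothetical, and the resulting graph need not match the one in the slide's figure.

```python
import math
from collections import Counter, defaultdict


class EMM:
    """Sketch of an extensible Markov model built from a vector stream."""

    def __init__(self, threshold):
        self.threshold = threshold          # assumed clustering radius
        self.centroids = []                 # node i is labeled by centroids[i]
        self.node_counts = Counter()        # number of visits per node
        self.links = defaultdict(Counter)   # links[i][j] = transitions i -> j
        self.current = None                 # node of the previous input

    def _nearest(self, v):
        best, best_d = None, math.inf
        for i, c in enumerate(self.centroids):
            d = math.dist(v, c)
            if d < best_d:
                best, best_d = i, d
        return best, best_d

    def learn(self, v):
        """Assign v to a node (creating one if needed) and update counts."""
        node, d = self._nearest(v)
        if node is None or d > self.threshold:
            self.centroids.append(tuple(v))
            node = len(self.centroids) - 1
        self.node_counts[node] += 1
        if self.current is not None:
            self.links[self.current][node] += 1
        self.current = node
        return node

    def transition_probability(self, i, j):
        total = sum(self.links[i].values())
        return self.links[i][j] / total if total else 0.0


vectors = [
    (18, 10, 3, 3, 1, 0, 0), (17, 10, 2, 3, 1, 0, 0), (16, 9, 2, 3, 1, 0, 0),
    (14, 8, 2, 3, 1, 0, 0), (14, 8, 2, 3, 0, 0, 0), (18, 10, 3, 3, 1, 1, 0),
]
emm = EMM(threshold=2.0)
nodes = [emm.learn(v) for v in vectors]
print(nodes, len(emm.centroids))
```

Transition probabilities are kept as counts and divided on demand, which makes each update a constant-time increment once the nearest node is found.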
EMMRare
• The EMMRare algorithm indicates whether the current input event is rare. Using a threshold occurrence percentage, the input event is determined to be rare if either of the following occurs:
  • The frequency of the node at time t+1 is below this threshold.
  • The updated transition probability of the MC transition from the node at time t to the node at time t+1 is below the threshold.
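The two conditions above can be sketched as a single check. This sketch assumes node frequency means the node's share of all visits and that the counts passed in have already been updated with the current event; the function name, the count dictionaries, and the 5% threshold are illustrative assumptions.

```python
def is_rare(node_counts, link_counts, prev_node, next_node, threshold):
    """Return True if the event landing on next_node is rare.

    node_counts: {node: visits}; link_counts: {node: {node: transitions}}.
    Both are assumed to already include the current event.
    """
    total_visits = sum(node_counts.values())
    node_freq = (node_counts.get(next_node, 0) / total_visits
                 if total_visits else 0.0)
    outgoing = link_counts.get(prev_node, {})
    out_total = sum(outgoing.values())
    trans_prob = outgoing.get(next_node, 0) / out_total if out_total else 0.0
    return node_freq < threshold or trans_prob < threshold


# Made-up counts for illustration.
node_counts = {"N1": 90, "N2": 9, "N3": 1}
link_counts = {
    "N1": {"N1": 80, "N2": 9, "N3": 1},
    "N2": {"N1": 1, "N2": 99},
}
print(is_rare(node_counts, link_counts, "N1", "N3", 0.05))  # rarely visited node
print(is_rare(node_counts, link_counts, "N2", "N1", 0.05))  # rare transition
print(is_rare(node_counts, link_counts, "N1", "N1", 0.05))  # common event
```

The second call shows why both tests are needed: N1 is a frequently visited node, yet arriving there from N2 is unusual.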
References
1. VoIP Mechanic, “What is VoIP?, a tutorial.” http://www.voipmechanic.com/what-is-voip.htm .
2. Yu Meng, Margaret Dunham, Marco Marchetti, and Jie Huang, “Rare Event Detection in a Spatiotemporal Environment,” Proceedings of the IEEE Conference on Granular Computing, May 2006, pp 629-634.
3. Cisco, “CDR Logging Configuration with Syslog Servers and Cisco IOS Gateways,” Document ID: 14068, February 24, 2006, http://www.cisco.com/en/US/tech/tk1077/technologies_tech_note09186a0080094e72.shtml .
4. Cisco, “Voice Over IP – Per Call Bandwidth Consumption,” Document ID: 7934, February 2, 2008, http://www.cisco.com/en/US/tech/tk652/tk698/technologies_tech_note09186a0080094ae2.shtml .
5. “VoIPThink”, http://www.en.voipforo.com , Accessed February 1, 2012.
6. Jie Huang, Yu Meng, and Margaret H. Dunham, “Extensible Markov Model,” Proceedings IEEE ICDM Conference, November 2004, pp 371-374.
7. Yu Meng and Margaret H. Dunham, “Efficient Mining of Emerging Events in a Dynamic Spatiotemporal,” Proceedings of the IEEE PAKDD Conference, April 2006, Singapore. (Also in Lecture Notes in Computer Science, Vol 3918, 2006, Springer Berlin/Heidelberg, pp 750-754.)
8. Yu Meng and Margaret H. Dunham, “Mining Developing Trends of Dynamic Spatiotemporal Data Streams,” Journal of Computers, Vol 1, No 3, June 2006, pp 43-50.
9. Yu Meng and Margaret H. Dunham, “Efficient Mining of Emerging Events in a Dynamic Spatiotemporal,” Proceedings of the IEEE PAKDD Conference, April 2006, Singapore. (Also in Lecture Notes in Computer Science, Vol 3918, 2006, Springer Berlin/Heidelberg, pp 750-754.) (Extended version submitted to Journal of Computers.)
10. Yu Meng, Margaret Dunham, Marco Marchetti, and Jie Huang, “Rare Event Detection in a Spatiotemporal Environment,” Proceedings of the IEEE Conference on Granular Computing, May 2006, pp 629-634.