SCAN: a Scalable, Adaptive, Secure and Network-aware
Content Distribution Network
Yan Chen
CS Department, Northwestern University
Motivation
• The Internet has evolved to become a commercial infrastructure for service delivery
– Web delivery, VoIP, streaming media …
• Challenges for Internet-scale services
– Scalability: 600M users, 35M Web sites, 2.1Tb/s
– Efficiency: bandwidth, storage, management
– Agility: dynamic clients/network/servers
– Security: proliferating attacks/viruses/worms
• E.g., content delivery - Content Distribution Network (CDN)
– Web delivery
– Grid computing
How CDN Works
Challenges for CDN
• Content Location
– Find nearby replicas with good DoS attack resilience
– Dynamic, scalable semantic search
• Replica Deployment
– Dynamics, efficiency
– Client QoS (latency, coherence) and server capacity constraints
• Replica Management
– Replica index state maintenance scalability
• Adaptation to Network Congestion/Failures
– Overlay monitoring scalability and accuracy
• Security
– Proactive anomaly/intrusion detection on high-speed network
SCAN: Scalable Content Access Network
• Provision: Dynamic Replication + Update Multicast Tree Building
• Replica Management: (Incremental) Content Clustering
• Network End-to-End Distance Monitoring (latency & loss rate)
• DHT-based Replica Location: Network DoS Attack Resilient & Semantic Search Support
• Proactive Anomaly/Intrusion Detection on High-speed Network
Replica Location (security)
• Existing Work and Problems
– Centralized, Replicated and Distributed Directory Services
– No security benchmarking: which one has the best DoS attack resilience?
• Solution
– Proposed the first simulation-based network DoS resilience benchmark
– Applied it to compare the three directory services
– DHT-based Distributed Directory Services have the best resilience in practice
• Publication
– 3rd Int. Conf. on Info. and Comm. Security (ICICS), 2001
Replica Location (semantic search)
• Existing Work and Problems
– Mostly keyword/title-based search
– Emerging semantic search systems, but static and unscalable
• Solution
– Apply DHT to distribute the indices
– Use “concept indexing” to incrementally grow the semantic space => incrementally add new concepts & documents
– Group the indices based on semantic locality => semantic routing, better query accuracy and efficiency
Replica Placement & Coherence Support
• Existing Work and Problems
– Static placement
– Dynamic but inefficient placement
– No coherence support
• Solution
– Dynamically place a close-to-optimal number of replicas under client QoS (latency) and server capacity constraints
– Self-organize replicas into a scalable application-level multicast tree for disseminating updates
– Uses overlay network topology only
• Publication
– IPTPS 2002, Pervasive Computing 2002
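To make the placement constraints concrete, here is a minimal greedy sketch. It is not the algorithm from the cited papers: it is a plain set-cover-style heuristic, and the names `place_replicas`, `latency`, `capacity` and `max_latency` are hypothetical. It keeps adding replicas until every client has one within the latency bound and no server exceeds its capacity.

```python
def place_replicas(latency, capacity, max_latency):
    """Greedy sketch (an assumption, not the SCAN algorithm).

    latency[c][s] -- hypothetical client-to-server latency in ms
    capacity[s]   -- maximum number of clients server s may serve
    Returns {server: [clients assigned to its replica]}.
    """
    uncovered = set(latency)
    assignment = {}
    while uncovered:
        best_server, best_clients = None, []
        for s in capacity:
            if s in assignment:
                continue  # this server already hosts a replica
            # Uncovered clients this server can reach within the bound,
            # truncated to the server's capacity.
            eligible = sorted(
                c for c in uncovered
                if latency[c].get(s, float("inf")) <= max_latency
            )[:capacity[s]]
            if len(eligible) > len(best_clients):
                best_server, best_clients = s, eligible
        if best_server is None:
            raise ValueError("constraints cannot be met for %s" % sorted(uncovered))
        assignment[best_server] = best_clients
        uncovered -= set(best_clients)
    return assignment


# Toy input: three clients, two candidate servers, 50 ms latency bound.
latency = {
    "c1": {"s1": 10, "s2": 80},
    "c2": {"s1": 20, "s2": 30},
    "c3": {"s1": 90, "s2": 15},
}
capacity = {"s1": 2, "s2": 2}
print(place_replicas(latency, capacity, max_latency=50))
```

On the toy input a single replica cannot satisfy all three clients within 50 ms, so the heuristic places two. A real system would also have to handle clients and servers joining and leaving, which this sketch ignores.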
Replica Management
• Existing Work and Problems
– Cooperative access for good efficiency requires maintaining replica indices
– Per-Website replication: scalable, but poor performance
– Per-URL replication: good performance, but unscalable
• Solution
– Clustering-based replication reduces the overhead significantly without sacrificing much performance
– Proposed a unique online Web object popularity prediction scheme based on hyperlink structures
– Online incremental clustering and replication to push replicas before they are accessed
• Publication
– ICNP 2002, IEEE J-SAC 2003
Adaptation to Network Congestion/Failures
• Existing Work and Problems
– Latency estimation systems are scalable, but cannot monitor congestion/failures, which requires n² measurements for n end hosts
• Solution
– Tomography-based Overlay Monitoring (TOM): selectively monitor a basis set of O(n log n) paths to infer the loss rates of other paths
– Works in real time, adapts to topology changes, has good load balancing and tolerates topology errors
– Built an adaptive overlay streaming media system on top of TOM
– Root-cause diagnosis in progress
• Publication
– Modeling: SIGCOMM IMC 2003 (extended abstract)
– Full version under submission
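The basis-set idea can be illustrated with a little linear algebra. The sketch below is an assumption-laden toy, not the TOM implementation: it treats each path as a 0/1 vector over links, uses the fact that log(1 - loss) adds up along a path, picks a maximal linearly independent set of paths by Gaussian elimination, and expresses every other path as a combination of those. The function names and the star topology are hypothetical.

```python
import math
from fractions import Fraction


def find_basis(paths, n_links):
    """paths: list of sets of link ids. Returns (basis, coeffs), where basis
    lists the path indices to monitor and coeffs[i] maps basis index -> weight
    such that row_i == sum(weight * row_b)."""
    reduced = []            # (pivot column, reduced row, combo of original rows)
    basis, coeffs = [], {}
    for i, links in enumerate(paths):
        row = [Fraction(int(l in links)) for l in range(n_links)]
        combo = {}          # reduced row == row_i + sum(combo[j] * row_j)
        for pivot, vec, vcombo in reduced:
            if row[pivot] != 0:
                f = row[pivot] / vec[pivot]
                row = [a - f * b for a, b in zip(row, vec)]
                for j, w in vcombo.items():
                    combo[j] = combo.get(j, 0) - f * w
        pivot = next((k for k, a in enumerate(row) if a != 0), None)
        if pivot is None:   # dependent path: inferable, no need to monitor
            coeffs[i] = {j: -w for j, w in combo.items() if w != 0}
        else:               # independent path: add it to the monitored basis
            combo[i] = Fraction(1)
            reduced.append((pivot, row, combo))
            basis.append(i)
            coeffs[i] = {i: Fraction(1)}
    return basis, coeffs


def infer_loss(path_coeffs, measured_loss):
    """Combine measured basis-path loss rates in the log(1 - loss) domain."""
    log_ok = sum(float(w) * math.log(1 - measured_loss[j])
                 for j, w in path_coeffs.items())
    return 1 - math.exp(log_ok)


# Toy star topology: 4 hosts, each on its own access link (ids 0..3),
# so the path between hosts a and b uses links {a, b}.
paths = [{0, 1}, {0, 2}, {0, 3}, {1, 2}, {1, 3}, {2, 3}]
basis, coeffs = find_basis(paths, n_links=4)
print(basis)  # 4 of the 6 paths need to be monitored
```

In this toy case four monitored paths determine the loss rates of all six. The O(n log n) figure on the slide is a claim about real overlay topologies and is not something this sketch demonstrates.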
Proactive Anomaly/Intrusion Detection on High-speed Network
• Existing Work and Problems
– Anomaly/intrusion (A/I) detection requires flow-level traffic monitoring, which is unscalable for high-speed networks
– Most IDSs are signature-based and only detect known attacks
• Solution
– Leverage the “K-ary sketch”, a compact probabilistic summary of flow-level traffic with constant update/query cost and linearity
– Use statistical methods, such as Hidden Markov Models (HMM) and time series analysis, for proactive detection
– Profile characteristics of new apps to reduce false positives
• Publication
– K-ary sketch: SIGCOMM IMC 2003
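The properties listed for the K-ary sketch (compact summary, constant update/query cost, linearity) can be seen in a small sketch. This is a simplified illustration rather than the published implementation: the class name, the SHA-256-based hashing and the default sizes are assumptions. Each of `depth` rows hashes a flow key into one of `width` buckets; a query takes the median of the per-row unbiased estimates.

```python
import hashlib
from statistics import median


class KarySketch:
    """Toy k-ary sketch: depth hash rows, width (K) buckets per row."""

    def __init__(self, depth=5, width=1024):
        self.depth, self.width = depth, width
        self.table = [[0.0] * width for _ in range(depth)]

    def _bucket(self, row, key):
        # Assumption: SHA-256 salted with the row index stands in for the
        # independent hash functions a real implementation would use.
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def update(self, key, value):
        # Cost is O(depth), independent of the number of flows seen.
        for i in range(self.depth):
            self.table[i][self._bucket(i, key)] += value

    def estimate(self, key):
        # Per-row estimate (bucket - total/K) / (1 - 1/K), then the median.
        total = sum(self.table[0])
        k = self.width
        return median(
            (self.table[i][self._bucket(i, key)] - total / k) / (1 - 1 / k)
            for i in range(self.depth)
        )

    def combine(self, other, a=1.0, b=1.0):
        # Linearity: the sketch of a*X + b*Y is a*sketch(X) + b*sketch(Y).
        out = KarySketch(self.depth, self.width)
        out.table = [[a * x + b * y for x, y in zip(r1, r2)]
                     for r1, r2 in zip(self.table, other.table)]
        return out


# Hypothetical per-interval byte counts keyed by source address.
prev, cur = KarySketch(), KarySketch()
prev.update("10.0.0.1", 100); prev.update("10.0.0.2", 100)
cur.update("10.0.0.1", 500); cur.update("10.0.0.2", 100)
change = cur.combine(prev, 1.0, -1.0)   # sketch of the traffic change
print(change.estimate("10.0.0.1"))
```

Because sketches combine linearly, a forecast built from past intervals can be subtracted from the current sketch, and keys with a large estimated difference can be flagged for further inspection without keeping per-flow state.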