National Institute of Statistical Sciences
Affiliates
Workshop on
Modeling and Analysis of Network Data
March 9-10, 2001
NISS
Headquarters
Research Triangle Park, NC
Abstracts
Session I: Network Measurement and Monitoring
Global DNS Nameserver Performance
Analysis using NeTraMet
Nevil Brownlee, CAIDA
There are many ways in which network measurements can be carried out, active (sending test packets) or passive (watching actual traffic), using archived data (trace files) or near-realtime. NeTraMet is a system for measuring traffic flows on the Internet. It is an open-source implementation of the IETF's RTFM (Realtime Traffic Flow Measurement) architecture.
The Domain Name System can be viewed as having three levels: host, local nameserver, global nameserver. Enterprise networks have local nameservers, which query the global nameservers to resolve subdomains of the top-level domains. Most local nameservers run BIND, which distributes its queries across all of the global nameservers.
We use NetTraMet meters to collect data on DNS requests and we responses. For a chosen set of local nameservers we collect 5-minute samples of request counts, lost request percentage (no matching response), and median response time for DNS queries of all the global nameservers.
Strip charts of these metrics allow one to see performance variations in individual servers (affecting a single server), and in the Internet paths to them (affecting several servers). They provide a useful tool for monitoring a site's global connectivity.
Detailed plots of these metrics, together with total traffic rate on a site's Internet connection(s), are also discussed. These can yield some understanding of the way variations in traffic rate and/or request rate can produce corresponding variations in response time and loss percentage.
PackMime: An Internet Traffic Generator
Jin Cao, William S. Cleveland, Dong Lin, and Don X. Sun
Bell Labs, Lucent Technologies
Overview: PackMime, a closed-loop IP traffic model, generates
synthetic packet traffic that mimes the behavior of live aggregate Internet
packet traffic. The PackMime architecture consists of two parts: (1) statistical
models that stochastically generate the aggregate request traffic for TCP
connections for each of a collection of clouds of hosts
and each application protocol; and (2) a simulated network carrying packets
resulting from the generated connections. The traffic for each cloud enters
and exits the network at some node of the network. Actual TCP implementations
from the BSD kernel are used to deliver the packets across the simulated network
and provide TCP feedback.
Statistical Models: The models generate connection request
variables (as time sequences) for each cloud and for each application protocol
whose traffic is carried on the simulated network. For HTTP 1.0, the connection
variables are request times, client request file sizes, downloaded server
file sizes, round-trip times between the clients and the network, round-trip
times between the servers and the network, server delays, and bottleneck link
speeds (used not to model bottlenecks but rather to create packet jitter).
All of these variables exhibit long-range
dependence that decreases to independence as the traffic request load increases.
The inter-arrival times have a Weibull distribution with an upper tail that
is longer than the exponential but that tends to the exponential as the load
increases. For each variable, a very simple fractional sum-difference (FSD)
time series model-class, together with
transformations of the variables, model the long-range dependence. The FSD
model parameters and the Weibull shape and scale parameters change with the
load, which is measured in new connections/second, and simple functions of
load model the parameter changes.
Input: To run PackMime, one needs to specify the network, the clouds, and the application request loads for each cloud.
Empirical Study: The basis of PackMime is extensive analysis
of packet header databases collected on live Internet wires. There are two
types of analysis. (1) The statistical models have been built and validated
based
on analyses of the databases. (2) PackMime performance in generating synthetic
packet traffic has been evaluated based on analyses of the the live packet
traffic of the databases. For this latter task we have developed a suite of
statistical tools for analyzing packet processes: packet inter-arrival times
and packet sizes. The tools are applied to the live header databases and to
similar databases of headers from the PackMime synthetic traffic, and the
results are compared. In particular, PackMime reproduces our finding that
Internet packet arrivals on a link tend to Poisson and packet sizes tend to
independence as the request load increases.
S-Net: Empirical analysis is carried out using S-Net, an Internet traffic measurement and analysis system that begins with packet header collection on network wires, and ends with data analysis on a cluster of linux PCs running S, a language and system for organizing, visualizing, and analyzing data. S-Net allows detailed analysis of packet header information, achieved in part by extensive visualization. We have studied two header databases collected by our Bell Lab team and our Helios team, an NGI project based at the NCNI network (about a terabyte of compressed headers); and we have studied databases collected by a Harvard team and by the NLANR team.
PackMime Applications: PackMime has been used to study performance issues on a single link carrying HTTP 1.0 traffic. A PackMime queueing analysis shows that as the traffic load increases, queueing tends to that of packet traffic with Poisson arrivals and independent service times. We have used PackMime to provision the link speed of the wire that connects our Bell Labs research network to the rest of the Internet through studying the HTTP 1.0 transfer time distribution and packet loss as the link speed changes. This replaces the over-provision method that has been used historically on the link, which seems today unattractive because it has become less acceptable to waste money.
Status: We believe we have amply demonstrated that the PackMime approach succeeds. But (1) we need to expand the scope, and (2) study further certain model features.
For more information see http://cm.bell-labs.com/stat/InternetTraffic.
Bandwidth and Congestion
Measurement via Remote Monitoring
Sam Weerahandi, Telcordia Technologies
The paper deals with the problem of estimating the physical bandwidth and the used bandwidth of a link in an IP network, particularly the Internet. We carry out the bandwidth estimation based on active measurements taken from a remote monitor. This type of estimation is useful in situations where one needs to know the Quality of Service (QoS) of a leased network or a third party network that affects the QoS of own customers and it is not possible to obtain passive measurements from devices of such networks. Like the well known tool PathChar, our method also determines the bandwidth between any two nodes in such a network by taking various delay measurements from the remote monitor to the end points of the link. Unlike PathChar, however, we do not require estimation of bandwidth for all hops leading to the link of interest, thus minimizing the traffic injected for the purpose of estimation. Moreover, in addition to the physical bandwidth, we also estimate the delay contribution of the hop and used bandwidth near real-time.
Delay data are measured as the size of the packet vary over a reasonable range. The data obtained in this manner is then statistically analyzed to estimate both the physical bandwidth and the used bandwidth. A model appropriate for bandwidth estimation will be presented along with an overview of derivations. Problem of estimating parameters of bandwidth models will also be discussed. Some results comparing the performance of bandwidth estimation using our methodology and that of PathChar will be presented and additional capabilities of our solution will be discussed.
Internet Topology
Andre Broido and kc claffy, CAIDA
We describe fundamentals of analyzing Internet graphs at various layers, and
problems in gathering and analyzing
Internet routing and topology data in graph form. Using skitter topology data
and Oregon Route Views BGP table data, we present IP graphs, their connected
components, and the combinatorial core of Internet topology. We discuss techniques
for assessing which portions of the global Internet are characterized by the
highest degree of `connectivity', both in central/backbone and access/delivery
components of the topology. We describe
several combinatorial approaches, including:
Our analysis introduces several new concepts for graph-theoretic routing and topology analysis:
We will also present tables showing the top 50 `most connected' nodes, at various
levels of Internet node granularity. We cover distributions of AS path and prefix
lengths, use of prepending, and outdegree vs indegree of ASes. Time permitting,
some newer results regarding diffusion on Internet graphs and nodes' geodesic
load will
be discussed.
Background material is available at:
Session II: Network Performance Control
Scheduling Your Network
Connections
Mor Harchol-Balter, Carnegie Mellon University
This talk proposes a method for improving the performance of Web servers servicing static HTTP requests. The idea is to give preference to those requests which are quick, or have small remaining processing requirements, in accordance with the SRPT (Shortest-Remaining-Processing-Time) scheduling policy.
The implementation is at the kernel level and involves controlling the order in which socket buffers are drained into the network. Experiments use the Linux operating system and the Apache and Flash web servers.
Empirical results indicate that SRPT-based scheduling of connections yields significant reductions in mean response time, mean slowdown, and variance in response time at the Web server. Most importantly, and counter to intuition, the large requests are not at all (or hardly) penalized by SRPT-based scheduling.
To explain the above results, we present some of our new theorems on SRPT scheduling under workloads with a heavy-tailed property. These theorems motivate our implementation work.
Duality Model of TCP and
Queue Management Algorithms
Steven Low, California Institute of Technology
We describe a duality model of congestion control and apply it
to TCP and active queue management schemes.
Congestion control is the interaction of source rates with certain congestion
measures in the network. The basic idea is to regard source rates as primal
variables and congestion measures as dual variables, and congestion control
as a Lagrangian method that iterates on source rates and congestion measures
to maximize aggregate source utility subject to capacity constraint. In TCP,
the primal iteration is carried out by source algorithms such as Reno or Vegas,
and the dual iteration is carried out by queue management such as DropTail,
RED or REM. We present these algorithms and their generalizations, derive their
utility functions, and study their interaction.
On Dynamic Control
of Stochastic Networks:
Fluid and Higher Order Approximations
Ruth J. Williams, University of California, San Diego
This talk will give a partial survey and present some recent developments
concerning approximate models for stochastic networks, where the networks allow
for flexible scheduling of resources through dynamic (state-dependent) alternate
routing and sequencing. Fluid and higher order approximations to network dynamic
control
problems will be discussed. In particular, connections between these levels
of approximation will be described. Situations in which the network is congested
or heavily loaded will receive particular emphasis. Some examples that illustrate
resource pooling effects will be given.
Session III: Internet Traffic Models
Statistical Resource
Sharing in the Presence of Heavy Tails
Predrag Jelenkovic, Columbia University
Statistical sharing of network bottleneck resources, e.g., bandwidth, processing power and buffer space, represents a common way of increasing the capacity of network transmission elements. This sharing allows for temporary high demands of a subset of users to be accommodated due to lower short-term needs of the remaining clients. While the benefits of sharing were well understood for exponential traffic models, the impact of heavy tails and long-range dependence has just been uncovered. Our results quantify these benefits in the case of baseline On-Off transmission demands. In addition, the explicit nature of the results explains the tradeoff between bandwidth / processing power and buffer space that might be a useful guideline for future network designs.
On Understanding
Large-Scale TCP/IP Networks:
A Lesson in Internet Traffic Modeling
Walter Willinger, AT&T Labs Research
Recent empirical discoveries concerning various scaling properties of the temporal dynamics of Internet traffic or of some of the topological features associated with the physical structure of the Internet have resulted in a number of proposed models or "explanations" of these ``emergent'' phenomena. In this talk, I will discuss a simple framework for identifying those explanations that are relevant for the networking application at hand and can provide a basis for developing a useful, consistent, and verifiable theory of large-scale TCP/IP networks. In turn, I will also show why some proposed models and ``explanations'' are irrelevant, at least as far as the networking context is concerned.
Multiplicative Modeling
of Network Traffic
Rolf Riedi, Rice University
Recent studies have made it clear that network traffic is `multifractal' in nature. Using moments of (possibly) all orders multifractal analysis does justice to the positive, non-Gaussian marginals of traffic traces and goes beyond the second order approximations of linear processes.
Multifractal structures are naturally tied to multiplicative frameworks. Indeed, a simple stochastic traffic model based on the Haar wavelet demonstrates the effectiveness of a multiplicative approach and its superiority over the typical additive schemes, both in multifractal statistics capturing `burstiness' as well as in network relevant metrics. Due to the inherent tree structure, wavelet based models are not second order stationary. To overcome this disadvantage this talk explores novel, stationary multifractal processes and the statistical challenges they present.
View/Download PDF Version of Program and Abstracts
