README.txt

Author:  Murali R. Krishnan (MuraliK)
Created: Jan 6, 1997

Revisions:
    Date            By         Comments
-----------------  --------   -------------------------------------------

Summary:
  This file describes the files in the directory svcs\infocomm\atq
  and details related to ISATQ - Internet Services Async Thread Queue module

File            Owner       Description

README.txt      MuraliK     This file.
abw.hxx         MuraliK     Bandwidth throttler declarations
abw.cxx         MuraliK     Bandwidth throttler for ATQ
acache.cxx      MuraliK     Alloc Cache module
atqbmon.cxx     MCourage    Listen backlog monitor
atqbmon.hxx     MCourage    Listen backlog monitor header
atqcport.cxx    JohnsonA    Fake Completion port for Win95
atqcport.hxx    JohnsonA    Fake Completion port for Win95 header

atqendp.cxx     MuraliK     Atq Endpoint manager
atqmain.cxx     MuraliK     Exposed ATQ entrypoints
atqprocs.hxx    MuraliK     Internal Function Prototypes
atqsupp.cxx     MuraliK     Atq Support Functions - timeout, thread pool, etc.
atqtypes.hxx    MuraliK     Atq Internal Types

atqxmit.cxx     JohnsonA    Internal routines for TransmitFile()
auxctrs.hxx     MuraliK     Auxiliary counters - for internal analysis
dbgutil.h       MuraliK     Debug support definitions
dllmain.cxx     MuraliK     Dll Entry points
isatq.def       MuraliK     .def file

isatq.hxx       MuraliK     pre-compiled header file
isatq.rc        MuraliK     Resource file
sched.cxx       MuraliK     IIS Scheduler - internal thread pool for scheduling
sched.hxx       MuraliK     Scheduler data structures

timeout.cxx     MuraliK     ATQ Contexts Timeout Logic
timer.cxx       MuraliK     Time measurement support code
xmitnt.cxx      JohnsonA    obsolete file - replaced by atqxmit.cxx

----------------------------------------------------------------------

Implementation Details

Contents:

ATQ based Bandwidth Throttle
  Author: MuraliK
  Date:   25-May-1995

Goal:
  Given a specified bandwidth to be used as a threshold, the ATQ module
  shall throttle traffic gracefully. CPU impact should be minimal, and
  minor excursions above the specified threshold are allowed. Performance
  in the fast case (no throttle) should stay high, with as little work as
  possible in the critical path.

Given:
  M -- an administrator-specified bandwidth which should not be exceeded
  in most cases. (Assumed to be specified through a special API interface
  added to the ATQ module.)

Solution:
  Various solutions are possible, depending on the measurements and
  metrics chosen. Whenever two solutions are possible, we pick the
  simpler one to avoid complexity and performance impact. (Remember to
  K.I.S.S.)

Sub Problems:
1) Determination of Existing Usage:
  Determining existing usage exactly in real time is computationally
  intensive. We resort to approximate measures whenever possible.
  The idea is: Estimated Bandwidth = (TotalBytesSent / PeriodOfObservation).

 solution a)
  Use a separate thread for sampling and calculating the bandwidth.
  Whenever an IO operation completes (we return from
  GetQueuedCompletionStatus()), increment TotalBytesSent for the period
  under consideration. The sampling thread wakes up at regular intervals
  and calculates the bandwidth in effect at that time. The solution also
  uses histogramming to smooth out sudden variations in the bandwidth.
  This solution:
   + is good, since it limits the complexity of calculating bandwidth
   - ignores simultaneous completion of IOs => sudden spikes are possible
   - ignores the time the actual IO took to complete (results could be
     misleading)
   - requires a separate sampling thread for bandwidth calculation

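  Solution a) can be sketched roughly as follows. This is only an
  illustration under stated assumptions: the class and member names are
  hypothetical (not taken from the ATQ sources), "histogramming" is
  interpreted here as averaging the most recent few samples, and the
  sampling thread itself is left out (it would simply call Sample() once
  per interval).

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sampler for solution a); names are illustrative only.
class BandwidthSampler {
public:
    // Called on every I/O completion, i.e. after
    // GetQueuedCompletionStatus() returns in a worker thread.
    void OnIoCompletion(std::uint32_t bytes) {
        bytesThisPeriod_.fetch_add(bytes, std::memory_order_relaxed);
    }

    // Called by the sampling thread once per interval. Returns the
    // smoothed estimate in bytes/second: the mean of the last few
    // per-interval values (TotalBytesSent / PeriodOfObservation).
    double Sample(double intervalSeconds) {
        const std::uint64_t bytes =
            bytesThisPeriod_.exchange(0, std::memory_order_relaxed);
        history_[next_] = static_cast<double>(bytes) / intervalSeconds;
        next_ = (next_ + 1) % kSlots;
        if (filled_ < kSlots) ++filled_;

        double sum = 0.0;
        for (std::size_t i = 0; i < filled_; ++i) sum += history_[i];
        return sum / static_cast<double>(filled_);
    }

private:
    static constexpr std::size_t kSlots = 4;  // assumed history depth
    std::atomic<std::uint64_t> bytesThisPeriod_{0};
    std::array<double, kSlots> history_{};
    std::size_t next_ = 0;
    std::size_t filled_ = 0;
};
```

  Only the byte counter is shared between worker threads and the sampling
  thread; the history is touched by the sampling thread alone, which keeps
  the completion path down to a single atomic add.
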
 solution b)
  This solution uses a running approximation of the time taken to
  complete an IO of standard size, viz. a 1 KB transfer. Initially we
  start with an approximation of 0 bytes sent/second (reasonable, since we
  have just started). When an IO completes, the time taken for the
  transfer is calculated from the count of bytes sent and the time elapsed
  from inception to end of the IO. We then take a simple average of the
  existing approximation and the newly calculated time. This gives the
  next approximation of bandwidth/time taken. Successive calculations
  refine the measurement of effective usage. (Note that this
  simplification relieves us from worrying about concurrency in IO
  processing.) With concurrent transfers, the time measured for a
  particular transfer is larger than the time that transfer alone would
  have taken. Hence, the solution makes conservative estimates based on
  this measured value.

   + no separate thread for sampling
   + simple interface & function to calculate bandwidth
   + avoids the unusual spikes seen in the above solution

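  A minimal sketch of the running approximation in solution b) follows.
  The names are hypothetical, and one detail is an assumption: the text
  starts from "0 bytes sent/second", which has no finite time-per-KB
  equivalent, so this sketch reports 0 until the first completion and
  adopts the first measurement as is.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical estimator for solution b); names are illustrative only.
class RunningBandwidthEstimate {
public:
    // bytes: bytes transferred by the completed I/O.
    // elapsedSeconds: time from submission of the I/O to its completion.
    void OnIoCompletion(std::uint64_t bytes, double elapsedSeconds) {
        if (bytes == 0 || elapsedSeconds <= 0.0) return;  // nothing to learn

        // Normalize the measurement to the standard 1 KB transfer.
        const double secondsPerKb =
            elapsedSeconds * 1024.0 / static_cast<double>(bytes);

        if (!haveSample_) {
            secondsPerKb_ = secondsPerKb;  // assumption: first sample as is
            haveSample_ = true;
        } else {
            // Simple average of the existing approximation and the new one.
            secondsPerKb_ = (secondsPerKb_ + secondsPerKb) / 2.0;
        }
    }

    // Estimated bandwidth B in bytes/second; 0 before any completion.
    double BytesPerSecond() const {
        return haveSample_ ? 1024.0 / secondsPerKb_ : 0.0;
    }

private:
    bool haveSample_ = false;
    double secondsPerKb_ = 0.0;
};
```

  Averaging with weight 1/2 means older measurements decay geometrically,
  so the estimate follows recent transfers without keeping any history.
  Calls from multiple completion threads would need to be serialized or
  made atomic; that is left out of the sketch.
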
2) Determination of Action to be Performed:
  The operations allowed in the ATQ module are Read, Write and
  TransmitFile. When a new operation is submitted, we need to evaluate
  whether it is safe (allow), marginally safe (block) or unsafe (reject)
  to perform. Evaluating "safety" is tricky and involves knowledge about
  the operations, the buffers used, the CPU overhead of operation setup,
  and the estimated and specified bandwidths.
  Let M and B be the specified and estimated bandwidths respectively. Let
  R, W and T stand for the operations Read, Write and TransmitFile. In
  addition, the suffixes s and b denote small and big transfers. The
  definitions of small and big are arbitrary and should be fixed
  empirically. Please refer to the following table for the actions to be
  performed.

 Action Table:
------------------------------------------------------------------------------
          \  Action |
 Bandwidth \  to be |   Allow           Block           Reject
 comparison \  Done |
------------------------------------------------------------------------------
  M > B                 R,W,T             -               -

  M ~= B                W, T              R               -
  (approx. equal)                (reduces future traffic)

  M < B                 Ws, Ts          Wb, Tb            R
                                 (reject on LongQueue)

------------------------------------------------------------------------------

Rationale:
 case M > B:
  The services are not yet hitting the specified limit, so it is
  acceptable to allow all operations to proceed without any blocking.

 case M ~= B: (i.e. -delta <= (M - B) <= +delta)
  [Note: We use an approximation, since exact equality is costly to
  calculate.]
  At this juncture, network usage is at the brink of the specified
  bandwidth. It is good to take some steps to reduce future traffic.
  Servers operate on a serve-a-request basis -- they receive requests from
  clients and act upon them. It is hence worthwhile to limit the number of
  requests submitted to the active queue banging on the network. By
  delaying the Read, processing of requests is delayed artificially, which
  delays the load on the network. By the time the delayed reads proceed,
  hopefully the network has eased up and the server will stabilise. As for
  Write and TransmitFile, a certain amount of CPU processing has already
  been done, so it is worthwhile to perform them rather than delay and
  queue them, wasting that CPU work.

  Another possibility is: Do Nothing. In most cases the load may be coming
  down, in which case the bandwidth utilized will naturally drop. On the
  other hand, allowing reads to proceed may generate further Write and
  Transmit load. Because of this potential danger, we don't adopt this
  solution.

 case M < B:
  The bandwidth utilization has exceeded the specified limit. This is an
  important case that deserves regulation. Heavy gains are achieved by
  reducing reads and delaying Wb and Tb. Better yet, reads can be
  rejected, indicating that the server or the network is busy. In most
  cases, when the server issues a read it is at the starting point of
  processing a future request from the client (the exception is an FTP
  server doing regular control reads). Hence, there is no harm in
  rejecting the read request entirely. In addition, blocking Wb and Tb
  delays their impact on the bandwidth and brings utilization down faster
  than rejecting Reads alone. We don't want to reject Wb or Tb, simply
  because the amount of CPU work already done for them may be too high. By
  blocking them, most of that CPU work does not go to waste.

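  The full action table and its rationale can be expressed as a small
  decision function. The sketch below is illustrative only: the type and
  function names, the delta parameter and the isBigTransfer flag are
  assumptions made for the example, and the "(reject on LongQueue)" note
  from the table is not modelled.

```cpp
#include <cassert>

enum class Op { Read, Write, TransmitFile };
enum class Action { Allow, Block, Reject };
enum class Zone { Below, Near, Above };  // M > B, M ~= B, M < B

// Compare the specified bandwidth M with the estimate B. Anything within
// +/- delta of M counts as "approximately equal".
Zone Classify(double specifiedM, double estimatedB, double delta) {
    const double diff = specifiedM - estimatedB;
    if (diff > delta) return Zone::Below;   // M > B
    if (diff < -delta) return Zone::Above;  // M < B
    return Zone::Near;                      // M ~= B
}

// Look up the action for one operation. isBigTransfer distinguishes
// Wb/Tb from Ws/Ts; it is ignored for reads.
Action Decide(Zone zone, Op op, bool isBigTransfer) {
    switch (zone) {
    case Zone::Below:
        return Action::Allow;  // R, W, T all allowed
    case Zone::Near:
        // Delay reads to reduce future traffic; let W and T finish.
        return op == Op::Read ? Action::Block : Action::Allow;
    case Zone::Above:
        if (op == Op::Read) return Action::Reject;
        return isBigTransfer ? Action::Block : Action::Allow;
    }
    return Action::Allow;  // not reached; keeps compilers quiet
}
```

  For example, with M = 100, B = 120 and delta = 5 the zone is M < B, so a
  read is rejected, a small write is allowed and a big TransmitFile is
  blocked. The threshold between small and big is deliberately left to the
  caller, since the text says it should be fixed empirically.
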
Implementation:
  To be continued later.

The action table is simplified as shown below to keep the implementation
simple.

 Action Table:
------------------------------------------------------------------------------
          \  Action |
 Bandwidth \  to be |   Allow           Block           Reject
 comparison \  Done |
------------------------------------------------------------------------------
  M > B                 R,W,T             -               -

  M ~= B                W, T              R               -
  (approx. equal)                (reduces future traffic)

M < B W, T R
------------------------------------------------------------------------------

Status and Entry point Modifications:

  We keep three global variables, one for each of the operations Read,
  Write and XmitFile. The value of each variable indicates whether the
  operation is allowed, blocked or rejected. The entry points
  AtqReadFile(), AtqWriteFile() and AtqXmitFile() are modified to check
  the status and take the appropriate action. If the operation is allowed,
  it proceeds normally. If the operation is blocked, we store the context
  in a blocked list, together with the entry-point parameters required to
  restart the operation. If the status indicates rejection, the operation
  is rejected. All three global variables are read without any
  synchronization primitives around them. This can lead to minor
  inconsistencies, which is acceptable, and it improves performance since
  no synchronization primitive needs to be accessed. (This assertion
  depends on the SMP implementation and needs to be verified. Verification
  is deferred for the current implementation.)
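
  The status check at an entry point might look roughly like the sketch
  below. Everything named here is hypothetical (the real AtqReadFile()
  signature, context type and blocked-list layout are not shown in this
  file). Two further assumptions: relaxed std::atomic loads stand in for
  the unsynchronized reads described above, since that is the
  well-defined C++ way to express them, and the blocked list is guarded
  by a mutex, which the text does not specify.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <mutex>

enum class OpStatus { Allow, Block, Reject };
enum class SubmitResult { Started, Queued, Rejected };

// Hypothetical record of a blocked call: the context plus the
// parameters needed to restart the operation later.
struct BlockedRequest {
    int contextId;
    unsigned bytes;
};

// One status variable per operation, updated by the throttler.
std::atomic<OpStatus> g_readStatus{OpStatus::Allow};
std::atomic<OpStatus> g_writeStatus{OpStatus::Allow};
std::atomic<OpStatus> g_xmitStatus{OpStatus::Allow};

std::mutex g_blockedLock;
std::deque<BlockedRequest> g_blockedList;

// Shared check used by each entry point in this sketch.
SubmitResult Submit(const std::atomic<OpStatus>& status,
                    int contextId, unsigned bytes) {
    // Relaxed load: no lock or fence on the fast path. A slightly stale
    // value is acceptable, as discussed in the text.
    switch (status.load(std::memory_order_relaxed)) {
    case OpStatus::Allow:
        return SubmitResult::Started;  // real code would issue the I/O here
    case OpStatus::Block: {
        std::lock_guard<std::mutex> guard(g_blockedLock);
        g_blockedList.push_back({contextId, bytes});
        return SubmitResult::Queued;
    }
    case OpStatus::Reject:
        return SubmitResult::Rejected;
    }
    return SubmitResult::Rejected;  // not reached
}

// Stand-in for the read entry point.
SubmitResult SubmitRead(int contextId, unsigned bytes) {
    return Submit(g_readStatus, contextId, bytes);
}
```

  The lock is taken only on the blocked path, so the allowed (fast) case
  costs one relaxed load. Whether that load is really free on a given SMP
  machine is the same open question the text raises.
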