NT4/stat at master - NT4

History
Microsoft ce47f986d8 First commit		2020-09-30 17:12:29 +02:00
..
i386	First commit	2020-09-30 17:12:29 +02:00
mips	First commit	2020-09-30 17:12:29 +02:00
win32/src	First commit	2020-09-30 17:12:29 +02:00
makefile	First commit	2020-09-30 17:12:29 +02:00
readme	First commit	2020-09-30 17:12:29 +02:00
sources	First commit	2020-09-30 17:12:29 +02:00
stat.c	First commit	2020-09-30 17:12:29 +02:00
statw32.def	First commit	2020-09-30 17:12:29 +02:00
readme

This directory and its subdirectory contain source and 
binary files for the statitics support packages that can
be run across multiple platforms.  

The directory is organized as follows:

1) .\ ->  
          a) stat.c       (single c source)

          b) makefile.rst (common for windows, OS2 16, and OS2386)  

          c) makefile     (for Windows NT)

          d) sources      (sources file for Windows NT)

          e) The header file, teststat.h, required for building 
             the dlls is under ..\inc\.
  
2) .\win  (FOR WINDOWS)

          a) .\src      (contains the remaining .asm file, and the 
                         module def file)

          b) .\bin      (the binary statwin.dll file)

3) .\WIN32  (FOR WIN32 APPS)

          a) .\src      (contains the .def file and an i386 sub-dir.
                         and a mips subdir, each containing an asm file) 

          b) .\bin      (the binary file)

4) .\os2286 (FOR 16 bit OS/2 Cruiser and Sloop apps)

          a) .\src      (the module def file)

          b) .\bin      (the binary stat286.dll file)
         
5) .\os2386 (FOR 32 bit OS/2 Cruiser apps)

          a) .\src      (the module def file)

          b) .\bin      (the binary stat386.dll file)

**********************************************************************

To build an application that uses a statistics DLL:
--------------------------------------------------

To use one of the above binaries, please read the USAGE NOTES at the
end of this document.  Please copy the teststat.h file from this
directory to the directory where you are building your application.
Copy the relevant .dll to your libpath.

It is essential that you define the type of system you are building
your application for, since the header file uses some special types
that are dependent on the system.  While compiling your application,
add the following flag: -DXXX where XXX stands for one of: 

          WIN       - for Windows applications
          OS2286    - for 16 bit OS/2 applications
          OS2386    - for 32 bit OS/2 applications
          WIN32     - for Win32 applications.
**********************************************************************

To build one of the dlls:
------------------------

If building a Windows, OS2 16 or OS/2 32 bit dll:
--------------------------------------------------------

a)   Copy the stat.c file found under this directory
     and teststat.h from ..\inc\. to a local directory.  
     Copy the .asm file from win\src if building for WIN.

b)   Also copy the "makefile.rst" from here to the same directory.  

c)   From ???\src copy the remaining files to the local directory, 
     where ??? represents win, os2286 or os2386.

d)   Edit "makefile.rst" to define the system that you are making the
     dll for.  Eg. if you are making the dll for windows, remove
     the comment sign (#) from the line "WIN=TRUE" in the makefile
     and ensure that the other system defines (OS2286 and OS2386)
     are commented out.  

f)   Type "nmake -f makefile.rst" and the dll will be created for
     you.   (Ensure that your development environment is set up for 
     the right system).

If building the Win32 dll:
-------------------------

a)   Copy stat.c, makefile and sources files found under this 
     directory and teststat.h from ..\inc\. to a local directory.  

b)   tc the win32\src directory to your local directory.  This 
     will create an i386 (or mips) sub-directory containing an asm
     file on your local machine.

c)   From the directory where you have your sources file, type
     "build -xxx statw32" from the command line, where xxx represents
     your target system.  It is 386 by default.

d)   A binary file "statw32.dll" will be created along with the
     .obj file under .\xxx\obj where xxx is your target system.
     It is i386 by default.

In case you have any questions, or if you run into any problems,
contact vaidy (936-7812).

*****************************************************************

USAGE NOTES
-----------

This  is   the  user's  guide  to  using  TestStat.dll,  the
statistical package.  In case of questions contact vaidy (936-
7812).

This document  describes the use of  each  of  the  functions
available through  this module  and then demonstrates the use
of these routines through an example.

This module provides basic statistical routines which can be 
used to compute average, min, max, standard deviation, and 
statistical convergence.  Statistical convergence of the 
average is determined by the number of test iterations required 
for the average to converge to a "stable" value.  The number of 
iterations required is computed on the fly as the data is 
collected, so that the caller is informed when enough data is 
collected (the average is stable).  Stable averages obtained in 
this way can be compared to other stable averages obtained under
different experimental conditions with known confidence levels.  
Notes in this document describe the meanings of stability and 
confidence more formally.

In addition to the functionlity described above, this module
also provides routines that generate normally distributed 
random numbers.  Three routines are provided that return
random numbers within a specified boundary, a set of uniformly
distributed random numbers within the range 0 to 1 and a normally
distributed set of numbers around a mean, which satisfy a given
mean and standard deviation.
 
1)  TestStatOpen:
    -------------

Description:  Allocates an instance data array for the data set 
and other global data structures required by the high level 
functions.

    USHORT FAR PASCAL
    TestStatOpen (
        USHORT usMinIterations, 
        USHORT usMaxIterations
        );

    usMinIterations  - The minimum number of iterations
        that the calling application has to run before
        the convergence algorithm may be used.  

    usMaxIterations - The maximum number of iterations
        that the test program may run.  The maximum
        acceptable value is 64K.  An internal data array
        of usMaxIterations of ULONGs is allocated.  The
        caller should bear this in mind when setting this
        parameter.

Remarks:   This routine  should be  called before  the first
call to TestStatInit.  If usMinIterations is zero, an error 
code is returned.  If usMinIterations is greater than
usMaxIterations an error code is returned.  If usMinIterations 
is equal to usMaxIterations, TestStatConverge will return TRUE 
after that many iterations.  This function frees the caller 
from the responsibility of allocating any data storage or 
book-keeping.  

Return Value:   0 if the call succeeded.
                An error code indicating a failure.  The 
                error code may be one of:

                    STAT_ERROR_ILLEGAL_MIN_ITER
                    STAT_ERROR_ILLEGAL_MAX_ITER
                    STAT_ERROR_ALLOC_FAILED

See also:       TestStatInit, TestStatConverge, TestStatValues,
                TestStatClose.

2)  TestStatInit:
    -------------

Description: Initializes variables required by the convergence 
and statistics routines.

    VOID FAR PASCAL
    TestStatInit (
        VOID
        );

Remarks:  This routine should be called before the first
call to TestStatConverge and after each call to 
TestStatValues, if you want to converge on a new set of 
data. 

Return Value:   None.

See also:       TestStatOpen, TestStatClose, TestStatConverge, 
                TestStatValues.


3)  TestStatConverge:
    -----------------

Description:  Automatically computes number of iterations
required for 95% confidence in data obtained.  

    BOOL FAR PASCAL 
    TestStatConverge (
        ULONG ulNewDataPoint,
        );

    ulNewDataPoint  -  The  data  point  obtained  for  the
        current iteration.

Remarks:  This routine should be called for each
iteration of the test.  The first call to this routine
should be preceded by a call to TestStatInit.  The test
program should check for the return value and should stop
the test as soon as a TRUE is returned.  

In making tests of significance, sometimes errors will be 
encountered in the results concerning an hypothesis tested.  
The hypothesis is that the difference between the actual 
mean in one experiment and the actual mean in a second 
experiment is less than a specified value.  This 
difference is expressed as a percentage of the first
experiment's mean.  We call this difference, the "precision"
of the comparison.

If the assumption is true and the results of the tests leads one 
to believe that it is false, the condition is described as a
TYPE I error.  If the assumption is false and the test results 
show that the two means are within the prescribed difference, the
condition is described as a TYPE II error.

The probability of TYPE I error is set by the significance level 
of the test.  Choosing a small probability of one type of error, 
increases the probability of the other type.  

The routines in this module operate on the following set of
assumed parameters:

     95% confidence that if the means differ by less than 5% they 
     are really the same, and 85% confidence that if the means 
     differ by more than 5% that they are really different.
 
The algorithm in this module uses these assumptions to determine
the number of iterations needed to achieve these levels of
confidence.

The reason for emphasizing TYPE II error is that a TYPE I error
indicates that the means differ, when in fact, they are the same.
If they differ, we will usually explore why, and in doing so, will
discover that they are not really different after all.  If, on the 
other hand, we get a TYPE II error, then this means that the
results show no difference, whereas the means really are 
different.  This is to be avoided since if means don't differ
from one run to the next, we are unlikely to look further into
the problem.

When additional iterations are forced by a high usMinIterations, 
then the resulting precision will usually be less than 5%.  
Conversely, when usMaxIterations are reached without converging, 
then the precision will be greater than 5%.  The precision returned
by TestStatValues will indicate how meaningful the comparisons of 
two means will be.

Return Value: FALSE if further iterations are required for
              the test to converge or usMinIterations has
              not been reached.

              TRUE if already converged or maximum limit 
              on iterations has been reached.

See also:     TestStatOpen, TestStatInit, TestStatValues, 
              TestStatClose.

4)  TestStatValues:
    ----------------

Description:   Automatically computes  a  number  of  useful
statistical  values for a given set of data.

    VOID FAR PASCAL 
    TestStatValues (
        PSZ pszOutputString,
        USHORT usOutlierFactor,
        PULONG * pulDataArray,
        PUSHORT pcusElementsInArray,
        PUSHORT pcusDiscardedValues,
        );

     pszOutputString - A pointer to a string buffer to which
         output data may be returned.  The minimum size of
         the buffer should be 81 bytes.  The string will
         be a NULL terminated ascii string.

     usOutlierFactor - Factor that defines the range of
         acceptable data values.  A value of zero will ignore
         this factor and all data will be considered
         valid.
 
     pulDataArray - A pointer to the data array.  If the outlier
         factor has been chosen, this array has as many elements
         as there were good data points in the data set.  Else,
         all the data points are contained in the data array.

     pcusElementsInArray - The number of elements in the array
         pointed to by puDataArray.

     pcusDiscardedValues - pointer to the number of data
         points discarded based upon the outlier factor.

Remarks:  This routine should be called only once for
each test, normally after TestStatConverge has returned 
TRUE.  Any call to this should be followed by a call to
TestStatInit before the next call to TestStatConverge, 
if you want to converge on a new set of data.  The Outlier 
factor decides the range of acceptable values in the data set.  
The format of the returned string will be (as in C):

     "%4u %10lu %10lu %10lu %6u %5u %10lu %4u %2u ".

These represent the mode number, mean, minimum, maximum, 
the number of iterations completed, the precision, the standard 
deviation, number of points discarded, and, the outlier factor 
from the data set.  The mode number will always be zero.  

The precision will be 5% in case test results converged before
the limit on the maximum iterations is reached.  Otherwise,
it returns the precision of the results gathered.

The precision value in this case assumes that the Type I error
and Type II error probabilities are 85% and 95% respectively.

The outlier factor determines along with the standard deviation
any abnormal data points.  Any data point that does not satisfy:

 [Mean - (SDev * OF)]  < Data Point < [Mean + (SDev * OF)],

where SDev is the standard deviation computed with good data
points and  OF is  the outlier  factor, is  left out  in the
statistics  computation.     The   standard   deviation   is
recomputed and  this process  is repeated until there are no
abnormal data  entries in  the data  set.    The  number  of
outliers that were discarded is also returned to the calling
program.   To ignore  the outlier factor and this process of
elimination, the  outlier factor  may be  set to  zero.  
Otherwise, the outlier factor should be at least 2 in order 
for the results to be meaningful.

Return Value: None

See also:     TestStatOpen, TestStatClose, TestStatInit, 
              TestStatConverge

5)  TestStatClose:
    -------------

Description:  Deallocates instance data structures and
all memory allocated by TestStatOpen and TestStatInit.

     VOID FAR PASCAL
     TestStatClose (
         VOID
         );

Remarks:   This routine  should be  called after  the last
call to TestStatValues.   A call to this must be followed by a 
call to TestStatOpen and TestStatInit, in that order, before
the application calls TestStatConverge and TestStatValues.

Return Value: None

See also:     TestStatOpen, TestStatInit, TestStatConverge, 
              TestStatValues.

------------------------------------------------------------

Usage of Statistical routines for convergence and values:  TestApp
------------------------------------------------------------------

#define MIN_ITERATION   3
#define MAX_ITERATION 200
#define OUTLIER_FACTOR  4

Body of test application
{
USHORT usMinIteration = MIN_ITERATION;
USHORT usMaxIteration = MAX_ITERATION;
ULONG  ulDataForCurrentIter;
ULONG  far *pulDataArray;  // make sure you have the "far" for 16 bit.
      
char   chOutputBufferForString [81];
USHORT usOutlierFactor = OUTLIER_FACTOR;
USHORT cusDiscardedValues;
USHORT cusElementsInArray;
:
:
     
    if (!TestStatOpen (usMinIteration,
                       usMaxIteration)) {
        // Data Array could not be allocated.
        // Cannot do convergence/statistics routines;
        // Check parameters to call;
    }

    do { // for each test or if need to run convergence again 
        TestStatInit ()

        // Initialize test variables;
        :
        do {   // convergence loop; do until a
               // TRUE is returned 
            //  Start the timer;
            //  Test operation;
            //  Stop the timer;
            ulDataForCurrentIter = // get the elapsed time for
                                   // operation;
        } while (!TestStatConverge (ulDataForCurrentIter)); 
        //  the data set has converged.  Call the Statistics
        //  routine for the values and output data 

        TestStatValues (OutputBufferForString,
                        usOutlierFactor,
                        &pulDataArray,
                        &cusDiscardedValues,
                        &cusElementsInArray,
                        );

        // the OutputBufferForString array has all the data.
        // iDiscardedValues has the number of discarded values
        // 
    } while (//more tests or need to converge on new data set )
    :
    :
    TestStatClose();
    :
}

-------------------------------------------------------------------

Random Number Generation Routines:

6)  TestStatUniRand:
    ---------------

Description:  Returns a number within the range of 0 to 1 based on
a starting seed.

     double FAR PASCAL 
     TestStatUniRand (
          VOID
          );

Remarks:  This routine returns a set of uniformly distributed numbers
between 0 and 1, on being, called repeatedly.  TestStatUniRand makes
use of the multiplicative congruential algorithm discussed in Knuth,
Vol. II, Chapter 3.  A starting seed is chosen along with a
multiplier and a modulus values.  The seed for the next iteration is
computed from these values as follows:

                   Temp Value  =  X   *  A, where,

X is the current seed value and A is the multiplier.  The remainder
of the division of this value by the modulus identifier is
determined.  This will be the seed for the next iteration.  This
value is divided by the modulus value to obtain a normalized value
(that lies between 0 and 1).  This normalized value is returned to
the caller.  

Through experiments, Sullivans, W. L has determined that a good set of
values is returned by selecting one of the 9 following values as
starting seeds:

     32347753, 52142147, 52142123, 53214215, 23521425, 42321479,
     20302541, 32524125, 42152159.
     
TestStatUniRand uses 32347753 as the starting seed.  A good set of
values, mentioned above, implies that for the given seed, it takes
a very large number of iterations, before the set of returned values
is repeated.  The following values have been chosen for the
multiplier and the modulus by M.C. Pike and I.D. Hill (reference):

          Multiplier - 3125
          Modulus id - 67108864

Return Value:  A double float between 0 and 1.

See also:      TestStatShortRand, TestStatRand, TestStatNormDist.

7)  TestStatShortRand:
    -----------------

Description:  Returns a number within the range of 0 to 65535 based on
a starting seed.

     USHORT FAR PASCAL 
     TestStatShortRand (
          VOID
          );

Remarks:  This routine returns a set of uniformly distributed numbers
between 0 and 65535, on being, called repeatedly.  TestStatShortRand makes
use of the multiplicative congruential algorithm discussed in Knuth,
Vol. II, Chapter 3.  A starting seed is chosen along with a
multiplier and a modulus values.  The seed for the next iteration is
computed from these values as follows:

                   Temp Value  =  X   *  A, where,

X is the current seed value and A is the multiplier.  The remainder
of the division of this value by the modulus identifier is
determined.  This will be the seed for the next iteration.  This
value is multiplied by 65535 and divided by the modulus value 
to obtain a value between 0 and 65535.  This value is returned to
the caller.  

Through experiments, Sullivans, W. L has determined that a good set of
values is returned by selecting one of the 9 following values as
starting seeds:

     32347753, 52142147, 52142123, 53214215, 23521425, 42321479,
     20302541, 32524125, 42152159.
     
TestStatShortRand uses 32347753 as the starting seed.  A good set of
values, mentioned above, implies that for the given seed, it takes
a very large number of iterations, before the set of returned values
is repeated.  The following values have been chosen for the
multiplier and the modulus by M.C. Pike and I.D. Hill (reference):

          Multiplier - 3125
          Modulus id - 67108864

Return Value:  A USHORT between 0 and 65535.

See also:      TestStatUniRand, TestStatRand, TestStatNormDist.


8)  TestStatRand:
    ------------

Description:  Returns a uniformly distributed random number within
a specified range.  

     ULONG FAR PASCAL
     TestStatRand (
         ULONG ulLower,
         ULONG ulUpper
     );

     ulLower - Specifies the lower boundary of the desired random
         number.  Should be atleast 1 in value.  

     ulUpper - Specifies the upper boundary of the desired random
         number.  May not exceed 67108863.

Remarks:  TestStatRand calls TestStatNorm for obtaining a normalized
random number.  The value obtained from TestStatNorm is then
multiplied by the range (i.e. the difference between ulUpper and
ulLower).  The computed value is then added to the lower limit
and the resulting number is returned.  It should be noted that both 
ulLower and ulUpper are included in the range of returned random
numbers. 

Return Value:  A random number within the specified range.

See Also:      TestStatShortRand, TestStatUniRand, TestStatNormDist.

9)  TestStatNormDist:
    ----------------

Description:  With every call, returns a number that forms a set
of points whose mean is approximately the input mean and whose
standard deviation is nearly equal to the input standard deviation.
A normally distributed set of points is generated.

    LONG FAR PASCAL
    TestStatNormDist (
        ULONG  ulMean,
        USHORT usSDev
    );

Remarks:  This routine uses a formula discussed in 'Random Number
Generation and Testing', IBM Data Processing Techniques, C20-8011
and 'Tuning an Operating System for General Purpose Use', Russell
P. Blake, Online Conferences (info. to be filled in).

     TestStatNormDist makes use of TestStatShortRand to get a set of
uniformly distributed numbers.  It generates a point around the
input mean using the following formula:

                                14
                                 _
     lRetVal <-  Mean + ( -7 +  >_ TestStatShortRand ()) * Std. Dev  

                               i=1

     The set of points generated with several calls to this routine
will be uniformly distributed with a mean of about the input mean
and a standard deviation of approximately the input standard
deviation.  The returned value may be negative, too, depending
upon the values returned by TestStatShortRand and the input standard
deviation!

Return Value:  A long integer.

See also:      TestStatShortRand, TestStatUniRand, TestStatRand.