Home | History | Annotate | Download | only in cros_utils
      1 # We did not author this file nor mantain it. Skip linting it.
      2 #pylint: skip-file
      3 # Copyright (c) 1999-2008 Gary Strangman; All Rights Reserved.
      4 #
      5 # Permission is hereby granted, free of charge, to any person obtaining a copy
      6 # of this software and associated documentation files (the "Software"), to deal
      7 # in the Software without restriction, including without limitation the rights
      8 # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
      9 # copies of the Software, and to permit persons to whom the Software is
     10 # furnished to do so, subject to the following conditions:
     11 #
     12 # The above copyright notice and this permission notice shall be included in
     13 # all copies or substantial portions of the Software.
     14 #
     15 # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
     16 # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
     17 # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
     18 # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
     19 # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
     20 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
     21 # THE SOFTWARE.
     22 #
     23 # Comments and/or additions are welcome (send e-mail to:
     24 # strang (at] nmr.mgh.harvard.edu).
     25 #
     26 """stats.py module
     27 
     28 (Requires pstat.py module.)
     29 
     30 #################################################
     31 #######  Written by:  Gary Strangman  ###########
     32 #######  Last modified:  Oct 31, 2008 ###########
     33 #################################################
     34 
     35 A collection of basic statistical functions for python.  The function
     36 names appear below.
     37 
     38 IMPORTANT:  There are really *3* sets of functions.  The first set has an 'l'
     39 prefix, which can be used with list or tuple arguments.  The second set has
     40 an 'a' prefix, which can accept NumPy array arguments.  These latter
     41 functions are defined only when NumPy is available on the system.  The third
     42 type has NO prefix (i.e., has the name that appears below).  Functions of
     43 this set are members of a "Dispatch" class, c/o David Ascher.  This class
     44 allows different functions to be called depending on the type of the passed
     45 arguments.  Thus, stats.mean is a member of the Dispatch class and
     46 stats.mean(range(20)) will call stats.lmean(range(20)) while
     47 stats.mean(Numeric.arange(20)) will call stats.amean(Numeric.arange(20)).
     48 This is a handy way to keep consistent function names when different
     49 argument types require different functions to be called.  Having
     50 implementated the Dispatch class, however, means that to get info on
     51 a given function, you must use the REAL function name ... that is
     52 "print stats.lmean.__doc__" or "print stats.amean.__doc__" work fine,
     53 while "print stats.mean.__doc__" will print the doc for the Dispatch
     54 class.  NUMPY FUNCTIONS ('a' prefix) generally have more argument options
     55 but should otherwise be consistent with the corresponding list functions.
     56 
     57 Disclaimers:  The function list is obviously incomplete and, worse, the
     58 functions are not optimized.  All functions have been tested (some more
     59 so than others), but they are far from bulletproof.  Thus, as with any
     60 free software, no warranty or guarantee is expressed or implied. :-)  A
     61 few extra functions that don't appear in the list below can be found by
     62 interested treasure-hunters.  These functions don't necessarily have
     63 both list and array versions but were deemed useful
     64 
     65 CENTRAL TENDENCY:  geometricmean
     66                    harmonicmean
     67                    mean
     68                    median
     69                    medianscore
     70                    mode
     71 
     72 MOMENTS:  moment
     73           variation
     74           skew
     75           kurtosis
     76           skewtest   (for Numpy arrays only)
     77           kurtosistest (for Numpy arrays only)
     78           normaltest (for Numpy arrays only)
     79 
     80 ALTERED VERSIONS:  tmean  (for Numpy arrays only)
     81                    tvar   (for Numpy arrays only)
     82                    tmin   (for Numpy arrays only)
     83                    tmax   (for Numpy arrays only)
     84                    tstdev (for Numpy arrays only)
     85                    tsem   (for Numpy arrays only)
     86                    describe
     87 
     88 FREQUENCY STATS:  itemfreq
     89                   scoreatpercentile
     90                   percentileofscore
     91                   histogram
     92                   cumfreq
     93                   relfreq
     94 
     95 VARIABILITY:  obrientransform
     96               samplevar
     97               samplestdev
     98               signaltonoise (for Numpy arrays only)
     99               var
    100               stdev
    101               sterr
    102               sem
    103               z
    104               zs
    105               zmap (for Numpy arrays only)
    106 
    107 TRIMMING FCNS:  threshold (for Numpy arrays only)
    108                 trimboth
    109                 trim1
    110                 round (round all vals to 'n' decimals; Numpy only)
    111 
    112 CORRELATION FCNS:  covariance  (for Numpy arrays only)
    113                    correlation (for Numpy arrays only)
    114                    paired
    115                    pearsonr
    116                    spearmanr
    117                    pointbiserialr
    118                    kendalltau
    119                    linregress
    120 
    121 INFERENTIAL STATS:  ttest_1samp
    122                     ttest_ind
    123                     ttest_rel
    124                     chisquare
    125                     ks_2samp
    126                     mannwhitneyu
    127                     ranksums
    128                     wilcoxont
    129                     kruskalwallish
    130                     friedmanchisquare
    131 
    132 PROBABILITY CALCS:  chisqprob
    133                     erfcc
    134                     zprob
    135                     ksprob
    136                     fprob
    137                     betacf
    138                     gammln
    139                     betai
    140 
    141 ANOVA FUNCTIONS:  F_oneway
    142                   F_value
    143 
    144 SUPPORT FUNCTIONS:  writecc
    145                     incr
    146                     sign  (for Numpy arrays only)
    147                     sum
    148                     cumsum
    149                     ss
    150                     summult
    151                     sumdiffsquared
    152                     square_of_sums
    153                     shellsort
    154                     rankdata
    155                     outputpairedstats
    156                     findwithin
    157 """
    158 ## CHANGE LOG:
    159 ## ===========
    160 ## 09-07-21 ... added capability for getting the 'proportion' out of l/amannwhitneyu (but comment-disabled)
    161 ## 08-10-31 ... fixed import LinearAlgebra bug before glm fcns
    162 ## 07-11-26 ... conversion for numpy started
    163 ## 07-05-16 ... added Lin's Concordance Correlation Coefficient (alincc) and acov
    164 ## 05-08-21 ... added "Dice's coefficient"
    165 ## 04-10-26 ... added ap2t(), an ugly fcn for converting p-vals to T-vals
    166 ## 04-04-03 ... added amasslinregress() function to do regression on N-D arrays
    167 ## 03-01-03 ... CHANGED VERSION TO 0.6
    168 ##              fixed atsem() to properly handle limits=None case
    169 ##              improved histogram and median functions (estbinwidth) and
    170 ##                   fixed atvar() function (wrong answers for neg numbers?!?)
    171 ## 02-11-19 ... fixed attest_ind and attest_rel for div-by-zero Overflows
    172 ## 02-05-10 ... fixed lchisqprob indentation (failed when df=even)
    173 ## 00-12-28 ... removed aanova() to separate module, fixed licensing to
    174 ##                   match Python License, fixed doc string & imports
    175 ## 00-04-13 ... pulled all "global" statements, except from aanova()
    176 ##              added/fixed lots of documentation, removed io.py dependency
    177 ##              changed to version 0.5
    178 ## 99-11-13 ... added asign() function
    179 ## 99-11-01 ... changed version to 0.4 ... enough incremental changes now
    180 ## 99-10-25 ... added acovariance and acorrelation functions
    181 ## 99-10-10 ... fixed askew/akurtosis to avoid divide-by-zero errors
    182 ##              added aglm function (crude, but will be improved)
    183 ## 99-10-04 ... upgraded acumsum, ass, asummult, asamplevar, avar, etc. to
    184 ##                   all handle lists of 'dimension's and keepdims
    185 ##              REMOVED ar0, ar2, ar3, ar4 and replaced them with around
    186 ##              reinserted fixes for abetai to avoid math overflows
    187 ## 99-09-05 ... rewrote achisqprob/aerfcc/aksprob/afprob/abetacf/abetai to
    188 ##                   handle multi-dimensional arrays (whew!)
    189 ## 99-08-30 ... fixed l/amoment, l/askew, l/akurtosis per D'Agostino (1990)
    190 ##              added anormaltest per same reference
    191 ##              re-wrote azprob to calc arrays of probs all at once
    192 ## 99-08-22 ... edited attest_ind printing section so arrays could be rounded
    193 ## 99-08-19 ... fixed amean and aharmonicmean for non-error(!) overflow on
    194 ##                   short/byte arrays (mean of #s btw 100-300 = -150??)
    195 ## 99-08-09 ... fixed asum so that the None case works for Byte arrays
    196 ## 99-08-08 ... fixed 7/3 'improvement' to handle t-calcs on N-D arrays
    197 ## 99-07-03 ... improved attest_ind, attest_rel (zero-division errortrap)
    198 ## 99-06-24 ... fixed bug(?) in attest_ind (n1=a.shape[0])
    199 ## 04/11/99 ... added asignaltonoise, athreshold functions, changed all
    200 ##                   max/min in array section to N.maximum/N.minimum,
    201 ##                   fixed square_of_sums to prevent integer overflow
    202 ## 04/10/99 ... !!! Changed function name ... sumsquared ==> square_of_sums
    203 ## 03/18/99 ... Added ar0, ar2, ar3 and ar4 rounding functions
    204 ## 02/28/99 ... Fixed aobrientransform to return an array rather than a list
    205 ## 01/15/99 ... Essentially ceased updating list-versions of functions (!!!)
    206 ## 01/13/99 ... CHANGED TO VERSION 0.3
    207 ##              fixed bug in a/lmannwhitneyu p-value calculation
    208 ## 12/31/98 ... fixed variable-name bug in ldescribe
    209 ## 12/19/98 ... fixed bug in findwithin (fcns needed pstat. prefix)
    210 ## 12/16/98 ... changed amedianscore to return float (not array) for 1 score
    211 ## 12/14/98 ... added atmin and atmax functions
    212 ##              removed umath from import line (not needed)
    213 ##              l/ageometricmean modified to reduce chance of overflows (take
    214 ##                   nth root first, then multiply)
    215 ## 12/07/98 ... added __version__variable (now 0.2)
    216 ##              removed all 'stats.' from anova() fcn
    217 ## 12/06/98 ... changed those functions (except shellsort) that altered
    218 ##                   arguments in-place ... cumsum, ranksort, ...
    219 ##              updated (and fixed some) doc-strings
    220 ## 12/01/98 ... added anova() function (requires NumPy)
    221 ##              incorporated Dispatch class
    222 ## 11/12/98 ... added functionality to amean, aharmonicmean, ageometricmean
    223 ##              added 'asum' function (added functionality to N.add.reduce)
    224 ##              fixed both moment and amoment (two errors)
    225 ##              changed name of skewness and askewness to skew and askew
    226 ##              fixed (a)histogram (which sometimes counted points <lowerlimit)
    227 
    228 import pstat  # required 3rd party module
    229 import math, string, copy  # required python modules
    230 from types import *
    231 
    232 __version__ = 0.6
    233 
    234 ############# DISPATCH CODE ##############
    235 
    236 
    237 class Dispatch:
    238   """
    239 The Dispatch class, care of David Ascher, allows different functions to
    240 be called depending on the argument types.  This way, there can be one
    241 function name regardless of the argument type.  To access function doc
    242 in stats.py module, prefix the function with an 'l' or 'a' for list or
    243 array arguments, respectively.  That is, print stats.lmean.__doc__ or
    244 print stats.amean.__doc__ or whatever.
    245 """
    246 
    247   def __init__(self, *tuples):
    248     self._dispatch = {}
    249     for func, types in tuples:
    250       for t in types:
    251         if t in self._dispatch.keys():
    252           raise ValueError, "can't have two dispatches on " + str(t)
    253         self._dispatch[t] = func
    254     self._types = self._dispatch.keys()
    255 
    256   def __call__(self, arg1, *args, **kw):
    257     if type(arg1) not in self._types:
    258       raise TypeError, "don't know how to dispatch %s arguments" % type(arg1)
    259     return apply(self._dispatch[type(arg1)], (arg1,) + args, kw)
    260 
    261 ##########################################################################
    262 ########################   LIST-BASED FUNCTIONS   ########################
    263 ##########################################################################
    264 
    265 ### Define these regardless
    266 
    267 ####################################
    268 #######  CENTRAL TENDENCY  #########
    269 ####################################
    270 
    271 
    272 def lgeometricmean(inlist):
    273   """
    274 Calculates the geometric mean of the values in the passed list.
    275 That is:  n-th root of (x1 * x2 * ... * xn).  Assumes a '1D' list.
    276 
    277 Usage:   lgeometricmean(inlist)
    278 """
    279   mult = 1.0
    280   one_over_n = 1.0 / len(inlist)
    281   for item in inlist:
    282     mult = mult * pow(item, one_over_n)
    283   return mult
    284 
    285 
    286 def lharmonicmean(inlist):
    287   """
    288 Calculates the harmonic mean of the values in the passed list.
    289 That is:  n / (1/x1 + 1/x2 + ... + 1/xn).  Assumes a '1D' list.
    290 
    291 Usage:   lharmonicmean(inlist)
    292 """
    293   sum = 0
    294   for item in inlist:
    295     sum = sum + 1.0 / item
    296   return len(inlist) / sum
    297 
    298 
    299 def lmean(inlist):
    300   """
    301 Returns the arithematic mean of the values in the passed list.
    302 Assumes a '1D' list, but will function on the 1st dim of an array(!).
    303 
    304 Usage:   lmean(inlist)
    305 """
    306   sum = 0
    307   for item in inlist:
    308     sum = sum + item
    309   return sum / float(len(inlist))
    310 
    311 
    312 def lmedian(inlist, numbins=1000):
    313   """
    314 Returns the computed median value of a list of numbers, given the
    315 number of bins to use for the histogram (more bins brings the computed value
    316 closer to the median score, default number of bins = 1000).  See G.W.
    317 Heiman's Basic Stats (1st Edition), or CRC Probability & Statistics.
    318 
    319 Usage:   lmedian (inlist, numbins=1000)
    320 """
    321   (hist, smallest, binsize, extras) = histogram(
    322       inlist, numbins, [min(inlist), max(inlist)])  # make histog
    323   cumhist = cumsum(hist)  # make cumulative histogram
    324   for i in range(len(cumhist)):  # get 1st(!) index holding 50%ile score
    325     if cumhist[i] >= len(inlist) / 2.0:
    326       cfbin = i
    327       break
    328   LRL = smallest + binsize * cfbin  # get lower read limit of that bin
    329   cfbelow = cumhist[cfbin - 1]
    330   freq = float(hist[cfbin])  # frequency IN the 50%ile bin
    331   median = LRL + (
    332       (len(inlist) / 2.0 - cfbelow) / float(freq)) * binsize  # median formula
    333   return median
    334 
    335 
    336 def lmedianscore(inlist):
    337   """
    338 Returns the 'middle' score of the passed list.  If there is an even
    339 number of scores, the mean of the 2 middle scores is returned.
    340 
    341 Usage:   lmedianscore(inlist)
    342 """
    343 
    344   newlist = copy.deepcopy(inlist)
    345   newlist.sort()
    346   if len(newlist) % 2 == 0:  # if even number of scores, average middle 2
    347     index = len(newlist) / 2  # integer division correct
    348     median = float(newlist[index] + newlist[index - 1]) / 2
    349   else:
    350     index = len(newlist) / 2  # int divsion gives mid value when count from 0
    351     median = newlist[index]
    352   return median
    353 
    354 
    355 def lmode(inlist):
    356   """
    357 Returns a list of the modal (most common) score(s) in the passed
    358 list.  If there is more than one such score, all are returned.  The
    359 bin-count for the mode(s) is also returned.
    360 
    361 Usage:   lmode(inlist)
    362 Returns: bin-count for mode(s), a list of modal value(s)
    363 """
    364 
    365   scores = pstat.unique(inlist)
    366   scores.sort()
    367   freq = []
    368   for item in scores:
    369     freq.append(inlist.count(item))
    370   maxfreq = max(freq)
    371   mode = []
    372   stillmore = 1
    373   while stillmore:
    374     try:
    375       indx = freq.index(maxfreq)
    376       mode.append(scores[indx])
    377       del freq[indx]
    378       del scores[indx]
    379     except ValueError:
    380       stillmore = 0
    381   return maxfreq, mode
    382 
    383 ####################################
    384 ############  MOMENTS  #############
    385 ####################################
    386 
    387 
    388 def lmoment(inlist, moment=1):
    389   """
    390 Calculates the nth moment about the mean for a sample (defaults to
    391 the 1st moment).  Used to calculate coefficients of skewness and kurtosis.
    392 
    393 Usage:   lmoment(inlist,moment=1)
    394 Returns: appropriate moment (r) from ... 1/n * SUM((inlist(i)-mean)**r)
    395 """
    396   if moment == 1:
    397     return 0.0
    398   else:
    399     mn = mean(inlist)
    400     n = len(inlist)
    401     s = 0
    402     for x in inlist:
    403       s = s + (x - mn)**moment
    404     return s / float(n)
    405 
    406 
    407 def lvariation(inlist):
    408   """
    409 Returns the coefficient of variation, as defined in CRC Standard
    410 Probability and Statistics, p.6.
    411 
    412 Usage:   lvariation(inlist)
    413 """
    414   return 100.0 * samplestdev(inlist) / float(mean(inlist))
    415 
    416 
    417 def lskew(inlist):
    418   """
    419 Returns the skewness of a distribution, as defined in Numerical
    420 Recipies (alternate defn in CRC Standard Probability and Statistics, p.6.)
    421 
    422 Usage:   lskew(inlist)
    423 """
    424   return moment(inlist, 3) / pow(moment(inlist, 2), 1.5)
    425 
    426 
    427 def lkurtosis(inlist):
    428   """
    429 Returns the kurtosis of a distribution, as defined in Numerical
    430 Recipies (alternate defn in CRC Standard Probability and Statistics, p.6.)
    431 
    432 Usage:   lkurtosis(inlist)
    433 """
    434   return moment(inlist, 4) / pow(moment(inlist, 2), 2.0)
    435 
    436 
    437 def ldescribe(inlist):
    438   """
    439 Returns some descriptive statistics of the passed list (assumed to be 1D).
    440 
    441 Usage:   ldescribe(inlist)
    442 Returns: n, mean, standard deviation, skew, kurtosis
    443 """
    444   n = len(inlist)
    445   mm = (min(inlist), max(inlist))
    446   m = mean(inlist)
    447   sd = stdev(inlist)
    448   sk = skew(inlist)
    449   kurt = kurtosis(inlist)
    450   return n, mm, m, sd, sk, kurt
    451 
    452 ####################################
    453 #######  FREQUENCY STATS  ##########
    454 ####################################
    455 
    456 
    457 def litemfreq(inlist):
    458   """
    459 Returns a list of pairs.  Each pair consists of one of the scores in inlist
    460 and it's frequency count.  Assumes a 1D list is passed.
    461 
    462 Usage:   litemfreq(inlist)
    463 Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies)
    464 """
    465   scores = pstat.unique(inlist)
    466   scores.sort()
    467   freq = []
    468   for item in scores:
    469     freq.append(inlist.count(item))
    470   return pstat.abut(scores, freq)
    471 
    472 
    473 def lscoreatpercentile(inlist, percent):
    474   """
    475 Returns the score at a given percentile relative to the distribution
    476 given by inlist.
    477 
    478 Usage:   lscoreatpercentile(inlist,percent)
    479 """
    480   if percent > 1:
    481     print '\nDividing percent>1 by 100 in lscoreatpercentile().\n'
    482     percent = percent / 100.0
    483   targetcf = percent * len(inlist)
    484   h, lrl, binsize, extras = histogram(inlist)
    485   cumhist = cumsum(copy.deepcopy(h))
    486   for i in range(len(cumhist)):
    487     if cumhist[i] >= targetcf:
    488       break
    489   score = binsize * (
    490       (targetcf - cumhist[i - 1]) / float(h[i])) + (lrl + binsize * i)
    491   return score
    492 
    493 
    494 def lpercentileofscore(inlist, score, histbins=10, defaultlimits=None):
    495   """
    496 Returns the percentile value of a score relative to the distribution
    497 given by inlist.  Formula depends on the values used to histogram the data(!).
    498 
    499 Usage:   lpercentileofscore(inlist,score,histbins=10,defaultlimits=None)
    500 """
    501 
    502   h, lrl, binsize, extras = histogram(inlist, histbins, defaultlimits)
    503   cumhist = cumsum(copy.deepcopy(h))
    504   i = int((score - lrl) / float(binsize))
    505   pct = (cumhist[i - 1] + (
    506       (score -
    507        (lrl + binsize * i)) / float(binsize)) * h[i]) / float(len(inlist)) * 100
    508   return pct
    509 
    510 
    511 def lhistogram(inlist, numbins=10, defaultreallimits=None, printextras=0):
    512   """
    513 Returns (i) a list of histogram bin counts, (ii) the smallest value
    514 of the histogram binning, and (iii) the bin width (the last 2 are not
    515 necessarily integers).  Default number of bins is 10.  If no sequence object
    516 is given for defaultreallimits, the routine picks (usually non-pretty) bins
    517 spanning all the numbers in the inlist.
    518 
    519 Usage:   lhistogram (inlist, numbins=10,
    520 defaultreallimits=None,suppressoutput=0)
    521 Returns: list of bin values, lowerreallimit, binsize, extrapoints
    522 """
    523   if (defaultreallimits <> None):
    524     if type(defaultreallimits) not in [ListType, TupleType] or len(
    525         defaultreallimits) == 1:  # only one limit given, assumed to be lower one & upper is calc'd
    526       lowerreallimit = defaultreallimits
    527       upperreallimit = 1.000001 * max(inlist)
    528     else:  # assume both limits given
    529       lowerreallimit = defaultreallimits[0]
    530       upperreallimit = defaultreallimits[1]
    531     binsize = (upperreallimit - lowerreallimit) / float(numbins)
    532   else:  # no limits given for histogram, both must be calc'd
    533     estbinwidth = (max(inlist) -
    534                    min(inlist)) / float(numbins) + 1e-6  #1=>cover all
    535     binsize = ((max(inlist) - min(inlist) + estbinwidth)) / float(numbins)
    536     lowerreallimit = min(inlist) - binsize / 2  #lower real limit,1st bin
    537   bins = [0] * (numbins)
    538   extrapoints = 0
    539   for num in inlist:
    540     try:
    541       if (num - lowerreallimit) < 0:
    542         extrapoints = extrapoints + 1
    543       else:
    544         bintoincrement = int((num - lowerreallimit) / float(binsize))
    545         bins[bintoincrement] = bins[bintoincrement] + 1
    546     except:
    547       extrapoints = extrapoints + 1
    548   if (extrapoints > 0 and printextras == 1):
    549     print '\nPoints outside given histogram range =', extrapoints
    550   return (bins, lowerreallimit, binsize, extrapoints)
    551 
    552 
    553 def lcumfreq(inlist, numbins=10, defaultreallimits=None):
    554   """
    555 Returns a cumulative frequency histogram, using the histogram function.
    556 
    557 Usage:   lcumfreq(inlist,numbins=10,defaultreallimits=None)
    558 Returns: list of cumfreq bin values, lowerreallimit, binsize, extrapoints
    559 """
    560   h, l, b, e = histogram(inlist, numbins, defaultreallimits)
    561   cumhist = cumsum(copy.deepcopy(h))
    562   return cumhist, l, b, e
    563 
    564 
    565 def lrelfreq(inlist, numbins=10, defaultreallimits=None):
    566   """
    567 Returns a relative frequency histogram, using the histogram function.
    568 
    569 Usage:   lrelfreq(inlist,numbins=10,defaultreallimits=None)
    570 Returns: list of cumfreq bin values, lowerreallimit, binsize, extrapoints
    571 """
    572   h, l, b, e = histogram(inlist, numbins, defaultreallimits)
    573   for i in range(len(h)):
    574     h[i] = h[i] / float(len(inlist))
    575   return h, l, b, e
    576 
    577 ####################################
    578 #####  VARIABILITY FUNCTIONS  ######
    579 ####################################
    580 
    581 
    582 def lobrientransform(*args):
    583   """
    584 Computes a transform on input data (any number of columns).  Used to
    585 test for homogeneity of variance prior to running one-way stats.  From
    586 Maxwell and Delaney, p.112.
    587 
    588 Usage:   lobrientransform(*args)
    589 Returns: transformed data for use in an ANOVA
    590 """
    591   TINY = 1e-10
    592   k = len(args)
    593   n = [0.0] * k
    594   v = [0.0] * k
    595   m = [0.0] * k
    596   nargs = []
    597   for i in range(k):
    598     nargs.append(copy.deepcopy(args[i]))
    599     n[i] = float(len(nargs[i]))
    600     v[i] = var(nargs[i])
    601     m[i] = mean(nargs[i])
    602   for j in range(k):
    603     for i in range(n[j]):
    604       t1 = (n[j] - 1.5) * n[j] * (nargs[j][i] - m[j])**2
    605       t2 = 0.5 * v[j] * (n[j] - 1.0)
    606       t3 = (n[j] - 1.0) * (n[j] - 2.0)
    607       nargs[j][i] = (t1 - t2) / float(t3)
    608   check = 1
    609   for j in range(k):
    610     if v[j] - mean(nargs[j]) > TINY:
    611       check = 0
    612   if check <> 1:
    613     raise ValueError, 'Problem in obrientransform.'
    614   else:
    615     return nargs
    616 
    617 
    618 def lsamplevar(inlist):
    619   """
    620 Returns the variance of the values in the passed list using
    621 N for the denominator (i.e., DESCRIBES the sample variance only).
    622 
    623 Usage:   lsamplevar(inlist)
    624 """
    625   n = len(inlist)
    626   mn = mean(inlist)
    627   deviations = []
    628   for item in inlist:
    629     deviations.append(item - mn)
    630   return ss(deviations) / float(n)
    631 
    632 
    633 def lsamplestdev(inlist):
    634   """
    635 Returns the standard deviation of the values in the passed list using
    636 N for the denominator (i.e., DESCRIBES the sample stdev only).
    637 
    638 Usage:   lsamplestdev(inlist)
    639 """
    640   return math.sqrt(samplevar(inlist))
    641 
    642 
    643 def lcov(x, y, keepdims=0):
    644   """
    645 Returns the estimated covariance of the values in the passed
    646 array (i.e., N-1).  Dimension can equal None (ravel array first), an
    647 integer (the dimension over which to operate), or a sequence (operate
    648 over multiple dimensions).  Set keepdims=1 to return an array with the
    649 same number of dimensions as inarray.
    650 
    651 Usage:   lcov(x,y,keepdims=0)
    652 """
    653 
    654   n = len(x)
    655   xmn = mean(x)
    656   ymn = mean(y)
    657   xdeviations = [0] * len(x)
    658   ydeviations = [0] * len(y)
    659   for i in range(len(x)):
    660     xdeviations[i] = x[i] - xmn
    661     ydeviations[i] = y[i] - ymn
    662   ss = 0.0
    663   for i in range(len(xdeviations)):
    664     ss = ss + xdeviations[i] * ydeviations[i]
    665   return ss / float(n - 1)
    666 
    667 
    668 def lvar(inlist):
    669   """
    670 Returns the variance of the values in the passed list using N-1
    671 for the denominator (i.e., for estimating population variance).
    672 
    673 Usage:   lvar(inlist)
    674 """
    675   n = len(inlist)
    676   mn = mean(inlist)
    677   deviations = [0] * len(inlist)
    678   for i in range(len(inlist)):
    679     deviations[i] = inlist[i] - mn
    680   return ss(deviations) / float(n - 1)
    681 
    682 
    683 def lstdev(inlist):
    684   """
    685 Returns the standard deviation of the values in the passed list
    686 using N-1 in the denominator (i.e., to estimate population stdev).
    687 
    688 Usage:   lstdev(inlist)
    689 """
    690   return math.sqrt(var(inlist))
    691 
    692 
    693 def lsterr(inlist):
    694   """
    695 Returns the standard error of the values in the passed list using N-1
    696 in the denominator (i.e., to estimate population standard error).
    697 
    698 Usage:   lsterr(inlist)
    699 """
    700   return stdev(inlist) / float(math.sqrt(len(inlist)))
    701 
    702 
    703 def lsem(inlist):
    704   """
    705 Returns the estimated standard error of the mean (sx-bar) of the
    706 values in the passed list.  sem = stdev / sqrt(n)
    707 
    708 Usage:   lsem(inlist)
    709 """
    710   sd = stdev(inlist)
    711   n = len(inlist)
    712   return sd / math.sqrt(n)
    713 
    714 
    715 def lz(inlist, score):
    716   """
    717 Returns the z-score for a given input score, given that score and the
    718 list from which that score came.  Not appropriate for population calculations.
    719 
    720 Usage:   lz(inlist, score)
    721 """
    722   z = (score - mean(inlist)) / samplestdev(inlist)
    723   return z
    724 
    725 
    726 def lzs(inlist):
    727   """
    728 Returns a list of z-scores, one for each score in the passed list.
    729 
    730 Usage:   lzs(inlist)
    731 """
    732   zscores = []
    733   for item in inlist:
    734     zscores.append(z(inlist, item))
    735   return zscores
    736 
    737 ####################################
    738 #######  TRIMMING FUNCTIONS  #######
    739 ####################################
    740 
    741 
    742 def ltrimboth(l, proportiontocut):
    743   """
    744 Slices off the passed proportion of items from BOTH ends of the passed
    745 list (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND 'rightmost'
    746 10% of scores.  Assumes list is sorted by magnitude.  Slices off LESS if
    747 proportion results in a non-integer slice index (i.e., conservatively
    748 slices off proportiontocut).
    749 
    750 Usage:   ltrimboth (l,proportiontocut)
    751 Returns: trimmed version of list l
    752 """
    753   lowercut = int(proportiontocut * len(l))
    754   uppercut = len(l) - lowercut
    755   return l[lowercut:uppercut]
    756 
    757 
    758 def ltrim1(l, proportiontocut, tail='right'):
    759   """
    760 Slices off the passed proportion of items from ONE end of the passed
    761 list (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost'
    762 10% of scores).  Slices off LESS if proportion results in a non-integer
    763 slice index (i.e., conservatively slices off proportiontocut).
    764 
    765 Usage:   ltrim1 (l,proportiontocut,tail='right')  or set tail='left'
    766 Returns: trimmed version of list l
    767 """
    768   if tail == 'right':
    769     lowercut = 0
    770     uppercut = len(l) - int(proportiontocut * len(l))
    771   elif tail == 'left':
    772     lowercut = int(proportiontocut * len(l))
    773     uppercut = len(l)
    774   return l[lowercut:uppercut]
    775 
    776 ####################################
    777 #####  CORRELATION FUNCTIONS  ######
    778 ####################################
    779 
    780 
    781 def lpaired(x, y):
    782   """
    783 Interactively determines the type of data and then runs the
    784 appropriated statistic for paired group data.
    785 
    786 Usage:   lpaired(x,y)
    787 Returns: appropriate statistic name, value, and probability
    788 """
    789   samples = ''
    790   while samples not in ['i', 'r', 'I', 'R', 'c', 'C']:
    791     print '\nIndependent or related samples, or correlation (i,r,c): ',
    792     samples = raw_input()
    793 
    794   if samples in ['i', 'I', 'r', 'R']:
    795     print '\nComparing variances ...',
    796     # USE O'BRIEN'S TEST FOR HOMOGENEITY OF VARIANCE, Maxwell & delaney, p.112
    797     r = obrientransform(x, y)
    798     f, p = F_oneway(pstat.colex(r, 0), pstat.colex(r, 1))
    799     if p < 0.05:
    800       vartype = 'unequal, p=' + str(round(p, 4))
    801     else:
    802       vartype = 'equal'
    803     print vartype
    804     if samples in ['i', 'I']:
    805       if vartype[0] == 'e':
    806         t, p = ttest_ind(x, y, 0)
    807         print '\nIndependent samples t-test:  ', round(t, 4), round(p, 4)
    808       else:
    809         if len(x) > 20 or len(y) > 20:
    810           z, p = ranksums(x, y)
    811           print '\nRank Sums test (NONparametric, n>20):  ', round(z, 4), round(
    812               p, 4)
    813         else:
    814           u, p = mannwhitneyu(x, y)
    815           print '\nMann-Whitney U-test (NONparametric, ns<20):  ', round(
    816               u, 4), round(p, 4)
    817 
    818     else:  # RELATED SAMPLES
    819       if vartype[0] == 'e':
    820         t, p = ttest_rel(x, y, 0)
    821         print '\nRelated samples t-test:  ', round(t, 4), round(p, 4)
    822       else:
    823         t, p = ranksums(x, y)
    824         print '\nWilcoxon T-test (NONparametric):  ', round(t, 4), round(p, 4)
    825   else:  # CORRELATION ANALYSIS
    826     corrtype = ''
    827     while corrtype not in ['c', 'C', 'r', 'R', 'd', 'D']:
    828       print '\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ',
    829       corrtype = raw_input()
    830     if corrtype in ['c', 'C']:
    831       m, b, r, p, see = linregress(x, y)
    832       print '\nLinear regression for continuous variables ...'
    833       lol = [['Slope', 'Intercept', 'r', 'Prob', 'SEestimate'],
    834              [round(m, 4), round(b, 4), round(r, 4), round(p, 4), round(see, 4)]
    835             ]
    836       pstat.printcc(lol)
    837     elif corrtype in ['r', 'R']:
    838       r, p = spearmanr(x, y)
    839       print '\nCorrelation for ranked variables ...'
    840       print "Spearman's r: ", round(r, 4), round(p, 4)
    841     else:  # DICHOTOMOUS
    842       r, p = pointbiserialr(x, y)
    843       print '\nAssuming x contains a dichotomous variable ...'
    844       print 'Point Biserial r: ', round(r, 4), round(p, 4)
    845   print '\n\n'
    846   return None
    847 
    848 
    849 def lpearsonr(x, y):
    850   """
    851 Calculates a Pearson correlation coefficient and the associated
    852 probability value.  Taken from Heiman's Basic Statistics for the Behav.
    853 Sci (2nd), p.195.
    854 
    855 Usage:   lpearsonr(x,y)      where x and y are equal-length lists
    856 Returns: Pearson's r value, two-tailed p-value
    857 """
    858   TINY = 1.0e-30
    859   if len(x) <> len(y):
    860     raise ValueError, 'Input values not paired in pearsonr.  Aborting.'
    861   n = len(x)
    862   x = map(float, x)
    863   y = map(float, y)
    864   xmean = mean(x)
    865   ymean = mean(y)
    866   r_num = n * (summult(x, y)) - sum(x) * sum(y)
    867   r_den = math.sqrt((n * ss(x) - square_of_sums(x)) *
    868                     (n * ss(y) - square_of_sums(y)))
    869   r = (r_num / r_den)  # denominator already a float
    870   df = n - 2
    871   t = r * math.sqrt(df / ((1.0 - r + TINY) * (1.0 + r + TINY)))
    872   prob = betai(0.5 * df, 0.5, df / float(df + t * t))
    873   return r, prob
    874 
    875 
    876 def llincc(x, y):
    877   """
    878 Calculates Lin's concordance correlation coefficient.
    879 
    880 Usage:   alincc(x,y)    where x, y are equal-length arrays
    881 Returns: Lin's CC
    882 """
    883   covar = lcov(x, y) * (len(x) - 1) / float(len(x))  # correct denom to n
    884   xvar = lvar(x) * (len(x) - 1) / float(len(x))  # correct denom to n
    885   yvar = lvar(y) * (len(y) - 1) / float(len(y))  # correct denom to n
    886   lincc = (2 * covar) / ((xvar + yvar) + ((amean(x) - amean(y))**2))
    887   return lincc
    888 
    889 
    890 def lspearmanr(x, y):
    891   """
    892 Calculates a Spearman rank-order correlation coefficient.  Taken
    893 from Heiman's Basic Statistics for the Behav. Sci (1st), p.192.
    894 
    895 Usage:   lspearmanr(x,y)      where x and y are equal-length lists
    896 Returns: Spearman's r, two-tailed p-value
    897 """
    898   TINY = 1e-30
    899   if len(x) <> len(y):
    900     raise ValueError, 'Input values not paired in spearmanr.  Aborting.'
    901   n = len(x)
    902   rankx = rankdata(x)
    903   ranky = rankdata(y)
    904   dsq = sumdiffsquared(rankx, ranky)
    905   rs = 1 - 6 * dsq / float(n * (n**2 - 1))
    906   t = rs * math.sqrt((n - 2) / ((rs + 1.0) * (1.0 - rs)))
    907   df = n - 2
    908   probrs = betai(0.5 * df, 0.5, df / (df + t * t))  # t already a float
    909   # probability values for rs are from part 2 of the spearman function in
    910   # Numerical Recipies, p.510.  They are close to tables, but not exact. (?)
    911   return rs, probrs
    912 
    913 
    914 def lpointbiserialr(x, y):
    915   """
    916 Calculates a point-biserial correlation coefficient and the associated
    917 probability value.  Taken from Heiman's Basic Statistics for the Behav.
    918 Sci (1st), p.194.
    919 
    920 Usage:   lpointbiserialr(x,y)      where x,y are equal-length lists
    921 Returns: Point-biserial r, two-tailed p-value
    922 """
    923   TINY = 1e-30
    924   if len(x) <> len(y):
    925     raise ValueError, 'INPUT VALUES NOT PAIRED IN pointbiserialr.  ABORTING.'
    926   data = pstat.abut(x, y)
    927   categories = pstat.unique(x)
    928   if len(categories) <> 2:
    929     raise ValueError, 'Exactly 2 categories required for pointbiserialr().'
    930   else:  # there are 2 categories, continue
    931     codemap = pstat.abut(categories, range(2))
    932     recoded = pstat.recode(data, codemap, 0)
    933     x = pstat.linexand(data, 0, categories[0])
    934     y = pstat.linexand(data, 0, categories[1])
    935     xmean = mean(pstat.colex(x, 1))
    936     ymean = mean(pstat.colex(y, 1))
    937     n = len(data)
    938     adjust = math.sqrt((len(x) / float(n)) * (len(y) / float(n)))
    939     rpb = (ymean - xmean) / samplestdev(pstat.colex(data, 1)) * adjust
    940     df = n - 2
    941     t = rpb * math.sqrt(df / ((1.0 - rpb + TINY) * (1.0 + rpb + TINY)))
    942     prob = betai(0.5 * df, 0.5, df / (df + t * t))  # t already a float
    943     return rpb, prob
    944 
    945 
    946 def lkendalltau(x, y):
    947   """
    948 Calculates Kendall's tau ... correlation of ordinal data.  Adapted
    949 from function kendl1 in Numerical Recipies.  Needs good test-routine.@@@
    950 
    951 Usage:   lkendalltau(x,y)
    952 Returns: Kendall's tau, two-tailed p-value
    953 """
    954   n1 = 0
    955   n2 = 0
    956   iss = 0
    957   for j in range(len(x) - 1):
    958     for k in range(j, len(y)):
    959       a1 = x[j] - x[k]
    960       a2 = y[j] - y[k]
    961       aa = a1 * a2
    962       if (aa):  # neither list has a tie
    963         n1 = n1 + 1
    964         n2 = n2 + 1
    965         if aa > 0:
    966           iss = iss + 1
    967         else:
    968           iss = iss - 1
    969       else:
    970         if (a1):
    971           n1 = n1 + 1
    972         else:
    973           n2 = n2 + 1
    974   tau = iss / math.sqrt(n1 * n2)
    975   svar = (4.0 * len(x) + 10.0) / (9.0 * len(x) * (len(x) - 1))
    976   z = tau / math.sqrt(svar)
    977   prob = erfcc(abs(z) / 1.4142136)
    978   return tau, prob
    979 
    980 
    981 def llinregress(x, y):
    982   """
    983 Calculates a regression line on x,y pairs.
    984 
    985 Usage:   llinregress(x,y)      x,y are equal-length lists of x-y coordinates
    986 Returns: slope, intercept, r, two-tailed prob, sterr-of-estimate
    987 """
    988   TINY = 1.0e-20
    989   if len(x) <> len(y):
    990     raise ValueError, 'Input values not paired in linregress.  Aborting.'
    991   n = len(x)
    992   x = map(float, x)
    993   y = map(float, y)
    994   xmean = mean(x)
    995   ymean = mean(y)
    996   r_num = float(n * (summult(x, y)) - sum(x) * sum(y))
    997   r_den = math.sqrt((n * ss(x) - square_of_sums(x)) *
    998                     (n * ss(y) - square_of_sums(y)))
    999   r = r_num / r_den
   1000   z = 0.5 * math.log((1.0 + r + TINY) / (1.0 - r + TINY))
   1001   df = n - 2
   1002   t = r * math.sqrt(df / ((1.0 - r + TINY) * (1.0 + r + TINY)))
   1003   prob = betai(0.5 * df, 0.5, df / (df + t * t))
   1004   slope = r_num / float(n * ss(x) - square_of_sums(x))
   1005   intercept = ymean - slope * xmean
   1006   sterrest = math.sqrt(1 - r * r) * samplestdev(y)
   1007   return slope, intercept, r, prob, sterrest
   1008 
   1009 ####################################
   1010 #####  INFERENTIAL STATISTICS  #####
   1011 ####################################
   1012 
   1013 
   1014 def lttest_1samp(a, popmean, printit=0, name='Sample', writemode='a'):
   1015   """
   1016 Calculates the t-obtained for the independent samples T-test on ONE group
   1017 of scores a, given a population mean.  If printit=1, results are printed
   1018 to the screen.  If printit='filename', the results are output to 'filename'
   1019 using the given writemode (default=append).  Returns t-value, and prob.
   1020 
   1021 Usage:   lttest_1samp(a,popmean,Name='Sample',printit=0,writemode='a')
   1022 Returns: t-value, two-tailed prob
   1023 """
   1024   x = mean(a)
   1025   v = var(a)
   1026   n = len(a)
   1027   df = n - 1
   1028   svar = ((n - 1) * v) / float(df)
   1029   t = (x - popmean) / math.sqrt(svar * (1.0 / n))
   1030   prob = betai(0.5 * df, 0.5, float(df) / (df + t * t))
   1031 
   1032   if printit <> 0:
   1033     statname = 'Single-sample T-test.'
   1034     outputpairedstats(printit, writemode, 'Population', '--', popmean, 0, 0, 0,
   1035                       name, n, x, v, min(a), max(a), statname, t, prob)
   1036   return t, prob
   1037 
   1038 
   1039 def lttest_ind(a, b, printit=0, name1='Samp1', name2='Samp2', writemode='a'):
   1040   """
   1041 Calculates the t-obtained T-test on TWO INDEPENDENT samples of
   1042 scores a, and b.  From Numerical Recipies, p.483.  If printit=1, results
   1043 are printed to the screen.  If printit='filename', the results are output
   1044 to 'filename' using the given writemode (default=append).  Returns t-value,
   1045 and prob.
   1046 
   1047 Usage:   lttest_ind(a,b,printit=0,name1='Samp1',name2='Samp2',writemode='a')
   1048 Returns: t-value, two-tailed prob
   1049 """
   1050   x1 = mean(a)
   1051   x2 = mean(b)
   1052   v1 = stdev(a)**2
   1053   v2 = stdev(b)**2
   1054   n1 = len(a)
   1055   n2 = len(b)
   1056   df = n1 + n2 - 2
   1057   svar = ((n1 - 1) * v1 + (n2 - 1) * v2) / float(df)
   1058   if not svar:
   1059     svar = 1.0e-26
   1060   t = (x1 - x2) / math.sqrt(svar * (1.0 / n1 + 1.0 / n2))
   1061   prob = betai(0.5 * df, 0.5, df / (df + t * t))
   1062 
   1063   if printit <> 0:
   1064     statname = 'Independent samples T-test.'
   1065     outputpairedstats(printit, writemode, name1, n1, x1, v1, min(a), max(a),
   1066                       name2, n2, x2, v2, min(b), max(b), statname, t, prob)
   1067   return t, prob
   1068 
   1069 
   1070 def lttest_rel(a,
   1071                b,
   1072                printit=0,
   1073                name1='Sample1',
   1074                name2='Sample2',
   1075                writemode='a'):
   1076   """
   1077 Calculates the t-obtained T-test on TWO RELATED samples of scores,
   1078 a and b.  From Numerical Recipies, p.483.  If printit=1, results are
   1079 printed to the screen.  If printit='filename', the results are output to
   1080 'filename' using the given writemode (default=append).  Returns t-value,
   1081 and prob.
   1082 
   1083 Usage:   lttest_rel(a,b,printit=0,name1='Sample1',name2='Sample2',writemode='a')
   1084 Returns: t-value, two-tailed prob
   1085 """
   1086   if len(a) <> len(b):
   1087     raise ValueError, 'Unequal length lists in ttest_rel.'
   1088   x1 = mean(a)
   1089   x2 = mean(b)
   1090   v1 = var(a)
   1091   v2 = var(b)
   1092   n = len(a)
   1093   cov = 0
   1094   for i in range(len(a)):
   1095     cov = cov + (a[i] - x1) * (b[i] - x2)
   1096   df = n - 1
   1097   cov = cov / float(df)
   1098   sd = math.sqrt((v1 + v2 - 2.0 * cov) / float(n))
   1099   t = (x1 - x2) / sd
   1100   prob = betai(0.5 * df, 0.5, df / (df + t * t))
   1101 
   1102   if printit <> 0:
   1103     statname = 'Related samples T-test.'
   1104     outputpairedstats(printit, writemode, name1, n, x1, v1, min(a), max(a),
   1105                       name2, n, x2, v2, min(b), max(b), statname, t, prob)
   1106   return t, prob
   1107 
   1108 
   1109 def lchisquare(f_obs, f_exp=None):
   1110   """
   1111 Calculates a one-way chi square for list of observed frequencies and returns
   1112 the result.  If no expected frequencies are given, the total N is assumed to
   1113 be equally distributed across all groups.
   1114 
   1115 Usage:   lchisquare(f_obs, f_exp=None)   f_obs = list of observed cell freq.
   1116 Returns: chisquare-statistic, associated p-value
   1117 """
   1118   k = len(f_obs)  # number of groups
   1119   if f_exp == None:
   1120     f_exp = [sum(f_obs) / float(k)] * len(f_obs)  # create k bins with = freq.
   1121   chisq = 0
   1122   for i in range(len(f_obs)):
   1123     chisq = chisq + (f_obs[i] - f_exp[i])**2 / float(f_exp[i])
   1124   return chisq, chisqprob(chisq, k - 1)
   1125 
   1126 
   1127 def lks_2samp(data1, data2):
   1128   """
   1129 Computes the Kolmogorov-Smirnof statistic on 2 samples.  From
   1130 Numerical Recipies in C, page 493.
   1131 
   1132 Usage:   lks_2samp(data1,data2)   data1&2 are lists of values for 2 conditions
   1133 Returns: KS D-value, associated p-value
   1134 """
   1135   j1 = 0
   1136   j2 = 0
   1137   fn1 = 0.0
   1138   fn2 = 0.0
   1139   n1 = len(data1)
   1140   n2 = len(data2)
   1141   en1 = n1
   1142   en2 = n2
   1143   d = 0.0
   1144   data1.sort()
   1145   data2.sort()
   1146   while j1 < n1 and j2 < n2:
   1147     d1 = data1[j1]
   1148     d2 = data2[j2]
   1149     if d1 <= d2:
   1150       fn1 = (j1) / float(en1)
   1151       j1 = j1 + 1
   1152     if d2 <= d1:
   1153       fn2 = (j2) / float(en2)
   1154       j2 = j2 + 1
   1155     dt = (fn2 - fn1)
   1156     if math.fabs(dt) > math.fabs(d):
   1157       d = dt
   1158   try:
   1159     en = math.sqrt(en1 * en2 / float(en1 + en2))
   1160     prob = ksprob((en + 0.12 + 0.11 / en) * abs(d))
   1161   except:
   1162     prob = 1.0
   1163   return d, prob
   1164 
   1165 
   1166 def lmannwhitneyu(x, y):
   1167   """
   1168 Calculates a Mann-Whitney U statistic on the provided scores and
   1169 returns the result.  Use only when the n in each condition is < 20 and
   1170 you have 2 independent samples of ranks.  NOTE: Mann-Whitney U is
   1171 significant if the u-obtained is LESS THAN or equal to the critical
   1172 value of U found in the tables.  Equivalent to Kruskal-Wallis H with
   1173 just 2 groups.
   1174 
   1175 Usage:   lmannwhitneyu(data)
   1176 Returns: u-statistic, one-tailed p-value (i.e., p(z(U)))
   1177 """
   1178   n1 = len(x)
   1179   n2 = len(y)
   1180   ranked = rankdata(x + y)
   1181   rankx = ranked[0:n1]  # get the x-ranks
   1182   ranky = ranked[n1:]  # the rest are y-ranks
   1183   u1 = n1 * n2 + (n1 * (n1 + 1)) / 2.0 - sum(rankx)  # calc U for x
   1184   u2 = n1 * n2 - u1  # remainder is U for y
   1185   bigu = max(u1, u2)
   1186   smallu = min(u1, u2)
   1187   proportion = bigu / float(n1 * n2)
   1188   T = math.sqrt(tiecorrect(ranked))  # correction factor for tied scores
   1189   if T == 0:
   1190     raise ValueError, 'All numbers are identical in lmannwhitneyu'
   1191   sd = math.sqrt(T * n1 * n2 * (n1 + n2 + 1) / 12.0)
   1192   z = abs((bigu - n1 * n2 / 2.0) / sd)  # normal approximation for prob calc
   1193   return smallu, 1.0 - zprob(z)  #, proportion
   1194 
   1195 
   1196 def ltiecorrect(rankvals):
   1197   """
   1198 Corrects for ties in Mann Whitney U and Kruskal Wallis H tests.  See
   1199 Siegel, S. (1956) Nonparametric Statistics for the Behavioral Sciences.
   1200 New York: McGraw-Hill.  Code adapted from |Stat rankind.c code.
   1201 
   1202 Usage:   ltiecorrect(rankvals)
   1203 Returns: T correction factor for U or H
   1204 """
   1205   sorted, posn = shellsort(rankvals)
   1206   n = len(sorted)
   1207   T = 0.0
   1208   i = 0
   1209   while (i < n - 1):
   1210     if sorted[i] == sorted[i + 1]:
   1211       nties = 1
   1212       while (i < n - 1) and (sorted[i] == sorted[i + 1]):
   1213         nties = nties + 1
   1214         i = i + 1
   1215       T = T + nties**3 - nties
   1216     i = i + 1
   1217   T = T / float(n**3 - n)
   1218   return 1.0 - T
   1219 
   1220 
   1221 def lranksums(x, y):
   1222   """
   1223 Calculates the rank sums statistic on the provided scores and
   1224 returns the result.  Use only when the n in each condition is > 20 and you
   1225 have 2 independent samples of ranks.
   1226 
   1227 Usage:   lranksums(x,y)
   1228 Returns: a z-statistic, two-tailed p-value
   1229 """
   1230   n1 = len(x)
   1231   n2 = len(y)
   1232   alldata = x + y
   1233   ranked = rankdata(alldata)
   1234   x = ranked[:n1]
   1235   y = ranked[n1:]
   1236   s = sum(x)
   1237   expected = n1 * (n1 + n2 + 1) / 2.0
   1238   z = (s - expected) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
   1239   prob = 2 * (1.0 - zprob(abs(z)))
   1240   return z, prob
   1241 
   1242 
   1243 def lwilcoxont(x, y):
   1244   """
   1245 Calculates the Wilcoxon T-test for related samples and returns the
   1246 result.  A non-parametric T-test.
   1247 
   1248 Usage:   lwilcoxont(x,y)
   1249 Returns: a t-statistic, two-tail probability estimate
   1250 """
   1251   if len(x) <> len(y):
   1252     raise ValueError, 'Unequal N in wilcoxont.  Aborting.'
   1253   d = []
   1254   for i in range(len(x)):
   1255     diff = x[i] - y[i]
   1256     if diff <> 0:
   1257       d.append(diff)
   1258   count = len(d)
   1259   absd = map(abs, d)
   1260   absranked = rankdata(absd)
   1261   r_plus = 0.0
   1262   r_minus = 0.0
   1263   for i in range(len(absd)):
   1264     if d[i] < 0:
   1265       r_minus = r_minus + absranked[i]
   1266     else:
   1267       r_plus = r_plus + absranked[i]
   1268   wt = min(r_plus, r_minus)
   1269   mn = count * (count + 1) * 0.25
   1270   se = math.sqrt(count * (count + 1) * (2.0 * count + 1.0) / 24.0)
   1271   z = math.fabs(wt - mn) / se
   1272   prob = 2 * (1.0 - zprob(abs(z)))
   1273   return wt, prob
   1274 
   1275 
   1276 def lkruskalwallish(*args):
   1277   """
   1278 The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more
   1279 groups, requiring at least 5 subjects in each group.  This function
   1280 calculates the Kruskal-Wallis H-test for 3 or more independent samples
   1281 and returns the result.
   1282 
   1283 Usage:   lkruskalwallish(*args)
   1284 Returns: H-statistic (corrected for ties), associated p-value
   1285 """
   1286   args = list(args)
   1287   n = [0] * len(args)
   1288   all = []
   1289   n = map(len, args)
   1290   for i in range(len(args)):
   1291     all = all + args[i]
   1292   ranked = rankdata(all)
   1293   T = tiecorrect(ranked)
   1294   for i in range(len(args)):
   1295     args[i] = ranked[0:n[i]]
   1296     del ranked[0:n[i]]
   1297   rsums = []
   1298   for i in range(len(args)):
   1299     rsums.append(sum(args[i])**2)
   1300     rsums[i] = rsums[i] / float(n[i])
   1301   ssbn = sum(rsums)
   1302   totaln = sum(n)
   1303   h = 12.0 / (totaln * (totaln + 1)) * ssbn - 3 * (totaln + 1)
   1304   df = len(args) - 1
   1305   if T == 0:
   1306     raise ValueError, 'All numbers are identical in lkruskalwallish'
   1307   h = h / float(T)
   1308   return h, chisqprob(h, df)
   1309 
   1310 
   1311 def lfriedmanchisquare(*args):
   1312   """
   1313 Friedman Chi-Square is a non-parametric, one-way within-subjects
   1314 ANOVA.  This function calculates the Friedman Chi-square test for repeated
   1315 measures and returns the result, along with the associated probability
   1316 value.  It assumes 3 or more repeated measures.  Only 3 levels requires a
   1317 minimum of 10 subjects in the study.  Four levels requires 5 subjects per
   1318 level(??).
   1319 
   1320 Usage:   lfriedmanchisquare(*args)
   1321 Returns: chi-square statistic, associated p-value
   1322 """
   1323   k = len(args)
   1324   if k < 3:
   1325     raise ValueError, 'Less than 3 levels.  Friedman test not appropriate.'
   1326   n = len(args[0])
   1327   data = apply(pstat.abut, tuple(args))
   1328   for i in range(len(data)):
   1329     data[i] = rankdata(data[i])
   1330   ssbn = 0
   1331   for i in range(k):
   1332     ssbn = ssbn + sum(args[i])**2
   1333   chisq = 12.0 / (k * n * (k + 1)) * ssbn - 3 * n * (k + 1)
   1334   return chisq, chisqprob(chisq, k - 1)
   1335 
   1336 ####################################
   1337 ####  PROBABILITY CALCULATIONS  ####
   1338 ####################################
   1339 
   1340 
   1341 def lchisqprob(chisq, df):
   1342   """
   1343 Returns the (1-tailed) probability value associated with the provided
   1344 chi-square value and df.  Adapted from chisq.c in Gary Perlman's |Stat.
   1345 
   1346 Usage:   lchisqprob(chisq,df)
   1347 """
   1348   BIG = 20.0
   1349 
   1350   def ex(x):
   1351     BIG = 20.0
   1352     if x < -BIG:
   1353       return 0.0
   1354     else:
   1355       return math.exp(x)
   1356 
   1357   if chisq <= 0 or df < 1:
   1358     return 1.0
   1359   a = 0.5 * chisq
   1360   if df % 2 == 0:
   1361     even = 1
   1362   else:
   1363     even = 0
   1364   if df > 1:
   1365     y = ex(-a)
   1366   if even:
   1367     s = y
   1368   else:
   1369     s = 2.0 * zprob(-math.sqrt(chisq))
   1370   if (df > 2):
   1371     chisq = 0.5 * (df - 1.0)
   1372     if even:
   1373       z = 1.0
   1374     else:
   1375       z = 0.5
   1376     if a > BIG:
   1377       if even:
   1378         e = 0.0
   1379       else:
   1380         e = math.log(math.sqrt(math.pi))
   1381       c = math.log(a)
   1382       while (z <= chisq):
   1383         e = math.log(z) + e
   1384         s = s + ex(c * z - a - e)
   1385         z = z + 1.0
   1386       return s
   1387     else:
   1388       if even:
   1389         e = 1.0
   1390       else:
   1391         e = 1.0 / math.sqrt(math.pi) / math.sqrt(a)
   1392       c = 0.0
   1393       while (z <= chisq):
   1394         e = e * (a / float(z))
   1395         c = c + e
   1396         z = z + 1.0
   1397       return (c * y + s)
   1398   else:
   1399     return s
   1400 
   1401 
   1402 def lerfcc(x):
   1403   """
   1404 Returns the complementary error function erfc(x) with fractional
   1405 error everywhere less than 1.2e-7.  Adapted from Numerical Recipies.
   1406 
   1407 Usage:   lerfcc(x)
   1408 """
   1409   z = abs(x)
   1410   t = 1.0 / (1.0 + 0.5 * z)
   1411   ans = t * math.exp(-z * z - 1.26551223 + t * (1.00002368 + t * (
   1412       0.37409196 + t * (0.09678418 + t * (-0.18628806 + t * (0.27886807 + t * (
   1413           -1.13520398 + t * (1.48851587 + t * (-0.82215223 + t * 0.17087277)))))
   1414                        ))))
   1415   if x >= 0:
   1416     return ans
   1417   else:
   1418     return 2.0 - ans
   1419 
   1420 
   1421 def lzprob(z):
   1422   """
   1423 Returns the area under the normal curve 'to the left of' the given z value.
   1424 Thus,
   1425     for z<0, zprob(z) = 1-tail probability
   1426     for z>0, 1.0-zprob(z) = 1-tail probability
   1427     for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability
   1428 Adapted from z.c in Gary Perlman's |Stat.
   1429 
   1430 Usage:   lzprob(z)
   1431 """
   1432   Z_MAX = 6.0  # maximum meaningful z-value
   1433   if z == 0.0:
   1434     x = 0.0
   1435   else:
   1436     y = 0.5 * math.fabs(z)
   1437     if y >= (Z_MAX * 0.5):
   1438       x = 1.0
   1439     elif (y < 1.0):
   1440       w = y * y
   1441       x = ((
   1442           ((((((0.000124818987 * w - 0.001075204047) * w + 0.005198775019) * w -
   1443               0.019198292004) * w + 0.059054035642) * w - 0.151968751364) * w +
   1444            0.319152932694) * w - 0.531923007300) * w + 0.797884560593) * y * 2.0
   1445     else:
   1446       y = y - 2.0
   1447       x = (((((((
   1448           ((((((-0.000045255659 * y + 0.000152529290) * y - 0.000019538132) * y
   1449               - 0.000676904986) * y + 0.001390604284) * y - 0.000794620820) * y
   1450            - 0.002034254874) * y + 0.006549791214) * y - 0.010557625006) * y +
   1451                0.011630447319) * y - 0.009279453341) * y + 0.005353579108) * y -
   1452             0.002141268741) * y + 0.000535310849) * y + 0.999936657524
   1453   if z > 0.0:
   1454     prob = ((x + 1.0) * 0.5)
   1455   else:
   1456     prob = ((1.0 - x) * 0.5)
   1457   return prob
   1458 
   1459 
   1460 def lksprob(alam):
   1461   """
   1462 Computes a Kolmolgorov-Smirnov t-test significance level.  Adapted from
   1463 Numerical Recipies.
   1464 
   1465 Usage:   lksprob(alam)
   1466 """
   1467   fac = 2.0
   1468   sum = 0.0
   1469   termbf = 0.0
   1470   a2 = -2.0 * alam * alam
   1471   for j in range(1, 201):
   1472     term = fac * math.exp(a2 * j * j)
   1473     sum = sum + term
   1474     if math.fabs(term) <= (0.001 * termbf) or math.fabs(term) < (1.0e-8 * sum):
   1475       return sum
   1476     fac = -fac
   1477     termbf = math.fabs(term)
   1478   return 1.0  # Get here only if fails to converge; was 0.0!!
   1479 
   1480 
   1481 def lfprob(dfnum, dfden, F):
   1482   """
   1483 Returns the (1-tailed) significance level (p-value) of an F
   1484 statistic given the degrees of freedom for the numerator (dfR-dfF) and
   1485 the degrees of freedom for the denominator (dfF).
   1486 
   1487 Usage:   lfprob(dfnum, dfden, F)   where usually dfnum=dfbn, dfden=dfwn
   1488 """
   1489   p = betai(0.5 * dfden, 0.5 * dfnum, dfden / float(dfden + dfnum * F))
   1490   return p
   1491 
   1492 
   1493 def lbetacf(a, b, x):
   1494   """
   1495 This function evaluates the continued fraction form of the incomplete
   1496 Beta function, betai.  (Adapted from: Numerical Recipies in C.)
   1497 
   1498 Usage:   lbetacf(a,b,x)
   1499 """
   1500   ITMAX = 200
   1501   EPS = 3.0e-7
   1502 
   1503   bm = az = am = 1.0
   1504   qab = a + b
   1505   qap = a + 1.0
   1506   qam = a - 1.0
   1507   bz = 1.0 - qab * x / qap
   1508   for i in range(ITMAX + 1):
   1509     em = float(i + 1)
   1510     tem = em + em
   1511     d = em * (b - em) * x / ((qam + tem) * (a + tem))
   1512     ap = az + d * am
   1513     bp = bz + d * bm
   1514     d = -(a + em) * (qab + em) * x / ((qap + tem) * (a + tem))
   1515     app = ap + d * az
   1516     bpp = bp + d * bz
   1517     aold = az
   1518     am = ap / bpp
   1519     bm = bp / bpp
   1520     az = app / bpp
   1521     bz = 1.0
   1522     if (abs(az - aold) < (EPS * abs(az))):
   1523       return az
   1524   print 'a or b too big, or ITMAX too small in Betacf.'
   1525 
   1526 
   1527 def lgammln(xx):
   1528   """
   1529 Returns the gamma function of xx.
   1530     Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt.
   1531 (Adapted from: Numerical Recipies in C.)
   1532 
   1533 Usage:   lgammln(xx)
   1534 """
   1535 
   1536   coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516, 0.120858003e-2,
   1537            -0.536382e-5]
   1538   x = xx - 1.0
   1539   tmp = x + 5.5
   1540   tmp = tmp - (x + 0.5) * math.log(tmp)
   1541   ser = 1.0
   1542   for j in range(len(coeff)):
   1543     x = x + 1
   1544     ser = ser + coeff[j] / x
   1545   return -tmp + math.log(2.50662827465 * ser)
   1546 
   1547 
   1548 def lbetai(a, b, x):
   1549   """
   1550 Returns the incomplete beta function:
   1551 
   1552     I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt)
   1553 
   1554 where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma
   1555 function of a.  The continued fraction formulation is implemented here,
   1556 using the betacf function.  (Adapted from: Numerical Recipies in C.)
   1557 
   1558 Usage:   lbetai(a,b,x)
   1559 """
   1560   if (x < 0.0 or x > 1.0):
   1561     raise ValueError, 'Bad x in lbetai'
   1562   if (x == 0.0 or x == 1.0):
   1563     bt = 0.0
   1564   else:
   1565     bt = math.exp(gammln(a + b) - gammln(a) - gammln(b) + a * math.log(x) + b *
   1566                   math.log(1.0 - x))
   1567   if (x < (a + 1.0) / (a + b + 2.0)):
   1568     return bt * betacf(a, b, x) / float(a)
   1569   else:
   1570     return 1.0 - bt * betacf(b, a, 1.0 - x) / float(b)
   1571 
   1572 ####################################
   1573 #######  ANOVA CALCULATIONS  #######
   1574 ####################################
   1575 
   1576 
   1577 def lF_oneway(*lists):
   1578   """
   1579 Performs a 1-way ANOVA, returning an F-value and probability given
   1580 any number of groups.  From Heiman, pp.394-7.
   1581 
   1582 Usage:   F_oneway(*lists)    where *lists is any number of lists, one per
   1583                                   treatment group
   1584 Returns: F value, one-tailed p-value
   1585 """
   1586   a = len(lists)  # ANOVA on 'a' groups, each in it's own list
   1587   means = [0] * a
   1588   vars = [0] * a
   1589   ns = [0] * a
   1590   alldata = []
   1591   tmp = map(N.array, lists)
   1592   means = map(amean, tmp)
   1593   vars = map(avar, tmp)
   1594   ns = map(len, lists)
   1595   for i in range(len(lists)):
   1596     alldata = alldata + lists[i]
   1597   alldata = N.array(alldata)
   1598   bign = len(alldata)
   1599   sstot = ass(alldata) - (asquare_of_sums(alldata) / float(bign))
   1600   ssbn = 0
   1601   for list in lists:
   1602     ssbn = ssbn + asquare_of_sums(N.array(list)) / float(len(list))
   1603   ssbn = ssbn - (asquare_of_sums(alldata) / float(bign))
   1604   sswn = sstot - ssbn
   1605   dfbn = a - 1
   1606   dfwn = bign - a
   1607   msb = ssbn / float(dfbn)
   1608   msw = sswn / float(dfwn)
   1609   f = msb / msw
   1610   prob = fprob(dfbn, dfwn, f)
   1611   return f, prob
   1612 
   1613 
   1614 def lF_value(ER, EF, dfnum, dfden):
   1615   """
   1616 Returns an F-statistic given the following:
   1617         ER  = error associated with the null hypothesis (the Restricted model)
   1618         EF  = error associated with the alternate hypothesis (the Full model)
   1619         dfR-dfF = degrees of freedom of the numerator
   1620         dfF = degrees of freedom associated with the denominator/Full model
   1621 
   1622 Usage:   lF_value(ER,EF,dfnum,dfden)
   1623 """
   1624   return ((ER - EF) / float(dfnum) / (EF / float(dfden)))
   1625 
   1626 ####################################
   1627 ########  SUPPORT FUNCTIONS  #######
   1628 ####################################
   1629 
   1630 
   1631 def writecc(listoflists, file, writetype='w', extra=2):
   1632   """
   1633 Writes a list of lists to a file in columns, customized by the max
   1634 size of items within the columns (max size of items in col, +2 characters)
   1635 to specified file.  File-overwrite is the default.
   1636 
   1637 Usage:   writecc (listoflists,file,writetype='w',extra=2)
   1638 Returns: None
   1639 """
   1640   if type(listoflists[0]) not in [ListType, TupleType]:
   1641     listoflists = [listoflists]
   1642   outfile = open(file, writetype)
   1643   rowstokill = []
   1644   list2print = copy.deepcopy(listoflists)
   1645   for i in range(len(listoflists)):
   1646     if listoflists[i] == [
   1647         '\n'
   1648     ] or listoflists[i] == '\n' or listoflists[i] == 'dashes':
   1649       rowstokill = rowstokill + [i]
   1650   rowstokill.reverse()
   1651   for row in rowstokill:
   1652     del list2print[row]
   1653   maxsize = [0] * len(list2print[0])
   1654   for col in range(len(list2print[0])):
   1655     items = pstat.colex(list2print, col)
   1656     items = map(pstat.makestr, items)
   1657     maxsize[col] = max(map(len, items)) + extra
   1658   for row in listoflists:
   1659     if row == ['\n'] or row == '\n':
   1660       outfile.write('\n')
   1661     elif row == ['dashes'] or row == 'dashes':
   1662       dashes = [0] * len(maxsize)
   1663       for j in range(len(maxsize)):
   1664         dashes[j] = '-' * (maxsize[j] - 2)
   1665       outfile.write(pstat.lineincustcols(dashes, maxsize))
   1666     else:
   1667       outfile.write(pstat.lineincustcols(row, maxsize))
   1668     outfile.write('\n')
   1669   outfile.close()
   1670   return None
   1671 
   1672 
   1673 def lincr(l, cap):  # to increment a list up to a max-list of 'cap'
   1674   """
   1675 Simulate a counting system from an n-dimensional list.
   1676 
   1677 Usage:   lincr(l,cap)   l=list to increment, cap=max values for each list pos'n
   1678 Returns: next set of values for list l, OR -1 (if overflow)
   1679 """
   1680   l[0] = l[0] + 1  # e.g., [0,0,0] --> [2,4,3] (=cap)
   1681   for i in range(len(l)):
   1682     if l[i] > cap[i] and i < len(l) - 1:  # if carryover AND not done
   1683       l[i] = 0
   1684       l[i + 1] = l[i + 1] + 1
   1685     elif l[i] > cap[i] and i == len(
   1686         l) - 1:  # overflow past last column, must be finished
   1687       l = -1
   1688   return l
   1689 
   1690 
   1691 def lsum(inlist):
   1692   """
   1693 Returns the sum of the items in the passed list.
   1694 
   1695 Usage:   lsum(inlist)
   1696 """
   1697   s = 0
   1698   for item in inlist:
   1699     s = s + item
   1700   return s
   1701 
   1702 
   1703 def lcumsum(inlist):
   1704   """
   1705 Returns a list consisting of the cumulative sum of the items in the
   1706 passed list.
   1707 
   1708 Usage:   lcumsum(inlist)
   1709 """
   1710   newlist = copy.deepcopy(inlist)
   1711   for i in range(1, len(newlist)):
   1712     newlist[i] = newlist[i] + newlist[i - 1]
   1713   return newlist
   1714 
   1715 
   1716 def lss(inlist):
   1717   """
   1718 Squares each value in the passed list, adds up these squares and
   1719 returns the result.
   1720 
   1721 Usage:   lss(inlist)
   1722 """
   1723   ss = 0
   1724   for item in inlist:
   1725     ss = ss + item * item
   1726   return ss
   1727 
   1728 
   1729 def lsummult(list1, list2):
   1730   """
   1731 Multiplies elements in list1 and list2, element by element, and
   1732 returns the sum of all resulting multiplications.  Must provide equal
   1733 length lists.
   1734 
   1735 Usage:   lsummult(list1,list2)
   1736 """
   1737   if len(list1) <> len(list2):
   1738     raise ValueError, 'Lists not equal length in summult.'
   1739   s = 0
   1740   for item1, item2 in pstat.abut(list1, list2):
   1741     s = s + item1 * item2
   1742   return s
   1743 
   1744 
   1745 def lsumdiffsquared(x, y):
   1746   """
   1747 Takes pairwise differences of the values in lists x and y, squares
   1748 these differences, and returns the sum of these squares.
   1749 
   1750 Usage:   lsumdiffsquared(x,y)
   1751 Returns: sum[(x[i]-y[i])**2]
   1752 """
   1753   sds = 0
   1754   for i in range(len(x)):
   1755     sds = sds + (x[i] - y[i])**2
   1756   return sds
   1757 
   1758 
   1759 def lsquare_of_sums(inlist):
   1760   """
   1761 Adds the values in the passed list, squares the sum, and returns
   1762 the result.
   1763 
   1764 Usage:   lsquare_of_sums(inlist)
   1765 Returns: sum(inlist[i])**2
   1766 """
   1767   s = sum(inlist)
   1768   return float(s) * s
   1769 
   1770 
   1771 def lshellsort(inlist):
   1772   """
   1773 Shellsort algorithm.  Sorts a 1D-list.
   1774 
   1775 Usage:   lshellsort(inlist)
   1776 Returns: sorted-inlist, sorting-index-vector (for original list)
   1777 """
   1778   n = len(inlist)
   1779   svec = copy.deepcopy(inlist)
   1780   ivec = range(n)
   1781   gap = n / 2  # integer division needed
   1782   while gap > 0:
   1783     for i in range(gap, n):
   1784       for j in range(i - gap, -1, -gap):
   1785         while j >= 0 and svec[j] > svec[j + gap]:
   1786           temp = svec[j]
   1787           svec[j] = svec[j + gap]
   1788           svec[j + gap] = temp
   1789           itemp = ivec[j]
   1790           ivec[j] = ivec[j + gap]
   1791           ivec[j + gap] = itemp
   1792     gap = gap / 2  # integer division needed
   1793 # svec is now sorted inlist, and ivec has the order svec[i] = vec[ivec[i]]
   1794   return svec, ivec
   1795 
   1796 
   1797 def lrankdata(inlist):
   1798   """
   1799 Ranks the data in inlist, dealing with ties appropritely.  Assumes
   1800 a 1D inlist.  Adapted from Gary Perlman's |Stat ranksort.
   1801 
   1802 Usage:   lrankdata(inlist)
   1803 Returns: a list of length equal to inlist, containing rank scores
   1804 """
   1805   n = len(inlist)
   1806   svec, ivec = shellsort(inlist)
   1807   sumranks = 0
   1808   dupcount = 0
   1809   newlist = [0] * n
   1810   for i in range(n):
   1811     sumranks = sumranks + i
   1812     dupcount = dupcount + 1
   1813     if i == n - 1 or svec[i] <> svec[i + 1]:
   1814       averank = sumranks / float(dupcount) + 1
   1815       for j in range(i - dupcount + 1, i + 1):
   1816         newlist[ivec[j]] = averank
   1817       sumranks = 0
   1818       dupcount = 0
   1819   return newlist
   1820 
   1821 
   1822 def outputpairedstats(fname, writemode, name1, n1, m1, se1, min1, max1, name2,
   1823                       n2, m2, se2, min2, max2, statname, stat, prob):
   1824   """
   1825 Prints or write to a file stats for two groups, using the name, n,
   1826 mean, sterr, min and max for each group, as well as the statistic name,
   1827 its value, and the associated p-value.
   1828 
   1829 Usage:   outputpairedstats(fname,writemode,
   1830                            name1,n1,mean1,stderr1,min1,max1,
   1831                            name2,n2,mean2,stderr2,min2,max2,
   1832                            statname,stat,prob)
   1833 Returns: None
   1834 """
   1835   suffix = ''  # for *s after the p-value
   1836   try:
   1837     x = prob.shape
   1838     prob = prob[0]
   1839   except:
   1840     pass
   1841   if prob < 0.001:
   1842     suffix = '  ***'
   1843   elif prob < 0.01:
   1844     suffix = '  **'
   1845   elif prob < 0.05:
   1846     suffix = '  *'
   1847   title = [['Name', 'N', 'Mean', 'SD', 'Min', 'Max']]
   1848   lofl = title + [[name1, n1, round(m1, 3), round(
   1849       math.sqrt(se1), 3), min1, max1], [name2, n2, round(m2, 3), round(
   1850           math.sqrt(se2), 3), min2, max2]]
   1851   if type(fname) <> StringType or len(fname) == 0:
   1852     print
   1853     print statname
   1854     print
   1855     pstat.printcc(lofl)
   1856     print
   1857     try:
   1858       if stat.shape == ():
   1859         stat = stat[0]
   1860       if prob.shape == ():
   1861         prob = prob[0]
   1862     except:
   1863       pass
   1864     print 'Test statistic = ', round(stat, 3), '   p = ', round(prob, 3), suffix
   1865     print
   1866   else:
   1867     file = open(fname, writemode)
   1868     file.write('\n' + statname + '\n\n')
   1869     file.close()
   1870     writecc(lofl, fname, 'a')
   1871     file = open(fname, 'a')
   1872     try:
   1873       if stat.shape == ():
   1874         stat = stat[0]
   1875       if prob.shape == ():
   1876         prob = prob[0]
   1877     except:
   1878       pass
   1879     file.write(pstat.list2string(['\nTest statistic = ', round(stat, 4),
   1880                                   '   p = ', round(prob, 4), suffix, '\n\n']))
   1881     file.close()
   1882   return None
   1883 
   1884 
   1885 def lfindwithin(data):
   1886   """
   1887 Returns an integer representing a binary vector, where 1=within-
   1888 subject factor, 0=between.  Input equals the entire data 2D list (i.e.,
   1889 column 0=random factor, column -1=measured values (those two are skipped).
   1890 Note: input data is in |Stat format ... a list of lists ("2D list") with
   1891 one row per measured value, first column=subject identifier, last column=
   1892 score, one in-between column per factor (these columns contain level
   1893 designations on each factor).  See also stats.anova.__doc__.
   1894 
   1895 Usage:   lfindwithin(data)     data in |Stat format
   1896 """
   1897 
   1898   numfact = len(data[0]) - 1
   1899   withinvec = 0
   1900   for col in range(1, numfact):
   1901     examplelevel = pstat.unique(pstat.colex(data, col))[0]
   1902     rows = pstat.linexand(data, col, examplelevel)  # get 1 level of this factor
   1903     factsubjs = pstat.unique(pstat.colex(rows, 0))
   1904     allsubjs = pstat.unique(pstat.colex(data, 0))
   1905     if len(factsubjs) == len(allsubjs):  # fewer Ss than scores on this factor?
   1906       withinvec = withinvec + (1 << col)
   1907   return withinvec
   1908 
   1909 #########################################################
   1910 #########################################################
   1911 ####### DISPATCH LISTS AND TUPLES TO ABOVE FCNS #########
   1912 #########################################################
   1913 #########################################################
   1914 
   1915 ## CENTRAL TENDENCY:
   1916 geometricmean = Dispatch((lgeometricmean, (ListType, TupleType)),)
   1917 harmonicmean = Dispatch((lharmonicmean, (ListType, TupleType)),)
   1918 mean = Dispatch((lmean, (ListType, TupleType)),)
   1919 median = Dispatch((lmedian, (ListType, TupleType)),)
   1920 medianscore = Dispatch((lmedianscore, (ListType, TupleType)),)
   1921 mode = Dispatch((lmode, (ListType, TupleType)),)
   1922 
   1923 ## MOMENTS:
   1924 moment = Dispatch((lmoment, (ListType, TupleType)),)
   1925 variation = Dispatch((lvariation, (ListType, TupleType)),)
   1926 skew = Dispatch((lskew, (ListType, TupleType)),)
   1927 kurtosis = Dispatch((lkurtosis, (ListType, TupleType)),)
   1928 describe = Dispatch((ldescribe, (ListType, TupleType)),)
   1929 
   1930 ## FREQUENCY STATISTICS:
   1931 itemfreq = Dispatch((litemfreq, (ListType, TupleType)),)
   1932 scoreatpercentile = Dispatch((lscoreatpercentile, (ListType, TupleType)),)
   1933 percentileofscore = Dispatch((lpercentileofscore, (ListType, TupleType)),)
   1934 histogram = Dispatch((lhistogram, (ListType, TupleType)),)
   1935 cumfreq = Dispatch((lcumfreq, (ListType, TupleType)),)
   1936 relfreq = Dispatch((lrelfreq, (ListType, TupleType)),)
   1937 
   1938 ## VARIABILITY:
   1939 obrientransform = Dispatch((lobrientransform, (ListType, TupleType)),)
   1940 samplevar = Dispatch((lsamplevar, (ListType, TupleType)),)
   1941 samplestdev = Dispatch((lsamplestdev, (ListType, TupleType)),)
   1942 var = Dispatch((lvar, (ListType, TupleType)),)
   1943 stdev = Dispatch((lstdev, (ListType, TupleType)),)
   1944 sterr = Dispatch((lsterr, (ListType, TupleType)),)
   1945 sem = Dispatch((lsem, (ListType, TupleType)),)
   1946 z = Dispatch((lz, (ListType, TupleType)),)
   1947 zs = Dispatch((lzs, (ListType, TupleType)),)
   1948 
   1949 ## TRIMMING FCNS:
   1950 trimboth = Dispatch((ltrimboth, (ListType, TupleType)),)
   1951 trim1 = Dispatch((ltrim1, (ListType, TupleType)),)
   1952 
   1953 ## CORRELATION FCNS:
   1954 paired = Dispatch((lpaired, (ListType, TupleType)),)
   1955 pearsonr = Dispatch((lpearsonr, (ListType, TupleType)),)
   1956 spearmanr = Dispatch((lspearmanr, (ListType, TupleType)),)
   1957 pointbiserialr = Dispatch((lpointbiserialr, (ListType, TupleType)),)
   1958 kendalltau = Dispatch((lkendalltau, (ListType, TupleType)),)
   1959 linregress = Dispatch((llinregress, (ListType, TupleType)),)
   1960 
   1961 ## INFERENTIAL STATS:
   1962 ttest_1samp = Dispatch((lttest_1samp, (ListType, TupleType)),)
   1963 ttest_ind = Dispatch((lttest_ind, (ListType, TupleType)),)
   1964 ttest_rel = Dispatch((lttest_rel, (ListType, TupleType)),)
   1965 chisquare = Dispatch((lchisquare, (ListType, TupleType)),)
   1966 ks_2samp = Dispatch((lks_2samp, (ListType, TupleType)),)
   1967 mannwhitneyu = Dispatch((lmannwhitneyu, (ListType, TupleType)),)
   1968 ranksums = Dispatch((lranksums, (ListType, TupleType)),)
   1969 tiecorrect = Dispatch((ltiecorrect, (ListType, TupleType)),)
   1970 wilcoxont = Dispatch((lwilcoxont, (ListType, TupleType)),)
   1971 kruskalwallish = Dispatch((lkruskalwallish, (ListType, TupleType)),)
   1972 friedmanchisquare = Dispatch((lfriedmanchisquare, (ListType, TupleType)),)
   1973 
   1974 ## PROBABILITY CALCS:
   1975 chisqprob = Dispatch((lchisqprob, (IntType, FloatType)),)
   1976 zprob = Dispatch((lzprob, (IntType, FloatType)),)
   1977 ksprob = Dispatch((lksprob, (IntType, FloatType)),)
   1978 fprob = Dispatch((lfprob, (IntType, FloatType)),)
   1979 betacf = Dispatch((lbetacf, (IntType, FloatType)),)
   1980 betai = Dispatch((lbetai, (IntType, FloatType)),)
   1981 erfcc = Dispatch((lerfcc, (IntType, FloatType)),)
   1982 gammln = Dispatch((lgammln, (IntType, FloatType)),)
   1983 
   1984 ## ANOVA FUNCTIONS:
   1985 F_oneway = Dispatch((lF_oneway, (ListType, TupleType)),)
   1986 F_value = Dispatch((lF_value, (ListType, TupleType)),)
   1987 
   1988 ## SUPPORT FUNCTIONS:
   1989 incr = Dispatch((lincr, (ListType, TupleType)),)
   1990 sum = Dispatch((lsum, (ListType, TupleType)),)
   1991 cumsum = Dispatch((lcumsum, (ListType, TupleType)),)
   1992 ss = Dispatch((lss, (ListType, TupleType)),)
   1993 summult = Dispatch((lsummult, (ListType, TupleType)),)
   1994 square_of_sums = Dispatch((lsquare_of_sums, (ListType, TupleType)),)
   1995 sumdiffsquared = Dispatch((lsumdiffsquared, (ListType, TupleType)),)
   1996 shellsort = Dispatch((lshellsort, (ListType, TupleType)),)
   1997 rankdata = Dispatch((lrankdata, (ListType, TupleType)),)
   1998 findwithin = Dispatch((lfindwithin, (ListType, TupleType)),)
   1999 
   2000 #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
   2001 #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
   2002 #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
   2003 #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
   2004 #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
   2005 #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
   2006 #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
   2007 #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
   2008 #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
   2009 #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
   2010 #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
   2011 #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
   2012 #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
   2013 #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
   2014 #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
   2015 #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
   2016 #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
   2017 #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
   2018 #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
   2019 
   2020 try:  # DEFINE THESE *ONLY* IF NUMERIC IS AVAILABLE
   2021   import numpy as N
   2022   import numpy.linalg as LA
   2023 
   2024   #####################################
   2025   ########  ACENTRAL TENDENCY  ########
   2026   #####################################
   2027 
   2028 
   2029   def ageometricmean(inarray, dimension=None, keepdims=0):
   2030     """
   2031 Calculates the geometric mean of the values in the passed array.
   2032 That is:  n-th root of (x1 * x2 * ... * xn).  Defaults to ALL values in
   2033 the passed array.  Use dimension=None to flatten array first.  REMEMBER: if
   2034 dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and
   2035 if dimension is a sequence, it collapses over all specified dimensions.  If
   2036 keepdims is set to 1, the resulting array will have as many dimensions as
   2037 inarray, with only 1 'level' per dim that was collapsed over.
   2038 
   2039 Usage:   ageometricmean(inarray,dimension=None,keepdims=0)
   2040 Returns: geometric mean computed over dim(s) listed in dimension
   2041 """
   2042     inarray = N.array(inarray, N.float_)
   2043     if dimension == None:
   2044       inarray = N.ravel(inarray)
   2045       size = len(inarray)
   2046       mult = N.power(inarray, 1.0 / size)
   2047       mult = N.multiply.reduce(mult)
   2048     elif type(dimension) in [IntType, FloatType]:
   2049       size = inarray.shape[dimension]
   2050       mult = N.power(inarray, 1.0 / size)
   2051       mult = N.multiply.reduce(mult, dimension)
   2052       if keepdims == 1:
   2053         shp = list(inarray.shape)
   2054         shp[dimension] = 1
   2055         sum = N.reshape(sum, shp)
   2056     else:  # must be a SEQUENCE of dims to average over
   2057       dims = list(dimension)
   2058       dims.sort()
   2059       dims.reverse()
   2060       size = N.array(N.multiply.reduce(N.take(inarray.shape, dims)), N.float_)
   2061       mult = N.power(inarray, 1.0 / size)
   2062       for dim in dims:
   2063         mult = N.multiply.reduce(mult, dim)
   2064       if keepdims == 1:
   2065         shp = list(inarray.shape)
   2066         for dim in dims:
   2067           shp[dim] = 1
   2068         mult = N.reshape(mult, shp)
   2069     return mult
   2070 
   2071   def aharmonicmean(inarray, dimension=None, keepdims=0):
   2072     """
   2073 Calculates the harmonic mean of the values in the passed array.
   2074 That is:  n / (1/x1 + 1/x2 + ... + 1/xn).  Defaults to ALL values in
   2075 the passed array.  Use dimension=None to flatten array first.  REMEMBER: if
   2076 dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and
   2077 if dimension is a sequence, it collapses over all specified dimensions.  If
   2078 keepdims is set to 1, the resulting array will have as many dimensions as
   2079 inarray, with only 1 'level' per dim that was collapsed over.
   2080 
   2081 Usage:   aharmonicmean(inarray,dimension=None,keepdims=0)
   2082 Returns: harmonic mean computed over dim(s) in dimension
   2083 """
   2084     inarray = inarray.astype(N.float_)
   2085     if dimension == None:
   2086       inarray = N.ravel(inarray)
   2087       size = len(inarray)
   2088       s = N.add.reduce(1.0 / inarray)
   2089     elif type(dimension) in [IntType, FloatType]:
   2090       size = float(inarray.shape[dimension])
   2091       s = N.add.reduce(1.0 / inarray, dimension)
   2092       if keepdims == 1:
   2093         shp = list(inarray.shape)
   2094         shp[dimension] = 1
   2095         s = N.reshape(s, shp)
   2096     else:  # must be a SEQUENCE of dims to average over
   2097       dims = list(dimension)
   2098       dims.sort()
   2099       nondims = []
   2100       for i in range(len(inarray.shape)):
   2101         if i not in dims:
   2102           nondims.append(i)
   2103       tinarray = N.transpose(inarray, nondims + dims)  # put keep-dims first
   2104       idx = [0] * len(nondims)
   2105       if idx == []:
   2106         size = len(N.ravel(inarray))
   2107         s = asum(1.0 / inarray)
   2108         if keepdims == 1:
   2109           s = N.reshape([s], N.ones(len(inarray.shape)))
   2110       else:
   2111         idx[0] = -1
   2112         loopcap = N.array(tinarray.shape[0:len(nondims)]) - 1
   2113         s = N.zeros(loopcap + 1, N.float_)
   2114         while incr(idx, loopcap) <> -1:
   2115           s[idx] = asum(1.0 / tinarray[idx])
   2116         size = N.multiply.reduce(N.take(inarray.shape, dims))
   2117         if keepdims == 1:
   2118           shp = list(inarray.shape)
   2119           for dim in dims:
   2120             shp[dim] = 1
   2121           s = N.reshape(s, shp)
   2122     return size / s
   2123 
   2124   def amean(inarray, dimension=None, keepdims=0):
   2125     """
   2126 Calculates the arithmatic mean of the values in the passed array.
   2127 That is:  1/n * (x1 + x2 + ... + xn).  Defaults to ALL values in the
   2128 passed array.  Use dimension=None to flatten array first.  REMEMBER: if
   2129 dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and
   2130 if dimension is a sequence, it collapses over all specified dimensions.  If
   2131 keepdims is set to 1, the resulting array will have as many dimensions as
   2132 inarray, with only 1 'level' per dim that was collapsed over.
   2133 
   2134 Usage:   amean(inarray,dimension=None,keepdims=0)
   2135 Returns: arithematic mean calculated over dim(s) in dimension
   2136 """
   2137     if inarray.dtype in [N.int_, N.short, N.ubyte]:
   2138       inarray = inarray.astype(N.float_)
   2139     if dimension == None:
   2140       inarray = N.ravel(inarray)
   2141       sum = N.add.reduce(inarray)
   2142       denom = float(len(inarray))
   2143     elif type(dimension) in [IntType, FloatType]:
   2144       sum = asum(inarray, dimension)
   2145       denom = float(inarray.shape[dimension])
   2146       if keepdims == 1:
   2147         shp = list(inarray.shape)
   2148         shp[dimension] = 1
   2149         sum = N.reshape(sum, shp)
   2150     else:  # must be a TUPLE of dims to average over
   2151       dims = list(dimension)
   2152       dims.sort()
   2153       dims.reverse()
   2154       sum = inarray * 1.0
   2155       for dim in dims:
   2156         sum = N.add.reduce(sum, dim)
   2157       denom = N.array(N.multiply.reduce(N.take(inarray.shape, dims)), N.float_)
   2158       if keepdims == 1:
   2159         shp = list(inarray.shape)
   2160         for dim in dims:
   2161           shp[dim] = 1
   2162         sum = N.reshape(sum, shp)
   2163     return sum / denom
   2164 
   2165   def amedian(inarray, numbins=1000):
   2166     """
   2167 Calculates the COMPUTED median value of an array of numbers, given the
   2168 number of bins to use for the histogram (more bins approaches finding the
   2169 precise median value of the array; default number of bins = 1000).  From
   2170 G.W. Heiman's Basic Stats, or CRC Probability & Statistics.
   2171 NOTE:  THIS ROUTINE ALWAYS uses the entire passed array (flattens it first).
   2172 
   2173 Usage:   amedian(inarray,numbins=1000)
   2174 Returns: median calculated over ALL values in inarray
   2175 """
   2176     inarray = N.ravel(inarray)
   2177     (hist, smallest, binsize, extras) = ahistogram(inarray, numbins,
   2178                                                    [min(inarray), max(inarray)])
   2179     cumhist = N.cumsum(hist)  # make cumulative histogram
   2180     otherbins = N.greater_equal(cumhist, len(inarray) / 2.0)
   2181     otherbins = list(otherbins)  # list of 0/1s, 1s start at median bin
   2182     cfbin = otherbins.index(1)  # get 1st(!) index holding 50%ile score
   2183     LRL = smallest + binsize * cfbin  # get lower read limit of that bin
   2184     cfbelow = N.add.reduce(hist[0:cfbin])  # cum. freq. below bin
   2185     freq = hist[cfbin]  # frequency IN the 50%ile bin
   2186     median = LRL + (
   2187         (len(inarray) / 2.0 - cfbelow) / float(freq)) * binsize  # MEDIAN
   2188     return median
   2189 
   2190   def amedianscore(inarray, dimension=None):
   2191     """
   2192 Returns the 'middle' score of the passed array.  If there is an even
   2193 number of scores, the mean of the 2 middle scores is returned.  Can function
   2194 with 1D arrays, or on the FIRST dimension of 2D arrays (i.e., dimension can
   2195 be None, to pre-flatten the array, or else dimension must equal 0).
   2196 
   2197 Usage:   amedianscore(inarray,dimension=None)
   2198 Returns: 'middle' score of the array, or the mean of the 2 middle scores
   2199 """
   2200     if dimension == None:
   2201       inarray = N.ravel(inarray)
   2202       dimension = 0
   2203     inarray = N.sort(inarray, dimension)
   2204     if inarray.shape[dimension] % 2 == 0:  # if even number of elements
   2205       indx = inarray.shape[dimension] / 2  # integer division correct
   2206       median = N.asarray(inarray[indx] + inarray[indx - 1]) / 2.0
   2207     else:
   2208       indx = inarray.shape[dimension] / 2  # integer division correct
   2209       median = N.take(inarray, [indx], dimension)
   2210       if median.shape == (1,):
   2211         median = median[0]
   2212     return median
   2213 
   2214   def amode(a, dimension=None):
   2215     """
   2216 Returns an array of the modal (most common) score in the passed array.
   2217 If there is more than one such score, ONLY THE FIRST is returned.
   2218 The bin-count for the modal values is also returned.  Operates on whole
   2219 array (dimension=None), or on a given dimension.
   2220 
   2221 Usage:   amode(a, dimension=None)
   2222 Returns: array of bin-counts for mode(s), array of corresponding modal values
   2223 """
   2224 
   2225     if dimension == None:
   2226       a = N.ravel(a)
   2227       dimension = 0
   2228     scores = pstat.aunique(N.ravel(a))  # get ALL unique values
   2229     testshape = list(a.shape)
   2230     testshape[dimension] = 1
   2231     oldmostfreq = N.zeros(testshape)
   2232     oldcounts = N.zeros(testshape)
   2233     for score in scores:
   2234       template = N.equal(a, score)
   2235       counts = asum(template, dimension, 1)
   2236       mostfrequent = N.where(counts > oldcounts, score, oldmostfreq)
   2237       oldcounts = N.where(counts > oldcounts, counts, oldcounts)
   2238       oldmostfreq = mostfrequent
   2239     return oldcounts, mostfrequent
   2240 
   2241   def atmean(a, limits=None, inclusive=(1, 1)):
   2242     """
   2243 Returns the arithmetic mean of all values in an array, ignoring values
   2244 strictly outside the sequence passed to 'limits'.   Note: either limit
   2245 in the sequence, or the value of limits itself, can be set to None.  The
   2246 inclusive list/tuple determines whether the lower and upper limiting bounds
   2247 (respectively) are open/exclusive (0) or closed/inclusive (1).
   2248 
   2249 Usage:   atmean(a,limits=None,inclusive=(1,1))
   2250 """
   2251     if a.dtype in [N.int_, N.short, N.ubyte]:
   2252       a = a.astype(N.float_)
   2253     if limits == None:
   2254       return mean(a)
   2255     assert type(limits) in [ListType, TupleType, N.ndarray
   2256                            ], 'Wrong type for limits in atmean'
   2257     if inclusive[0]:
   2258       lowerfcn = N.greater_equal
   2259     else:
   2260       lowerfcn = N.greater
   2261     if inclusive[1]:
   2262       upperfcn = N.less_equal
   2263     else:
   2264       upperfcn = N.less
   2265     if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(
   2266         N.ravel(a)):
   2267       raise ValueError, 'No array values within given limits (atmean).'
   2268     elif limits[0] == None and limits[1] <> None:
   2269       mask = upperfcn(a, limits[1])
   2270     elif limits[0] <> None and limits[1] == None:
   2271       mask = lowerfcn(a, limits[0])
   2272     elif limits[0] <> None and limits[1] <> None:
   2273       mask = lowerfcn(a, limits[0]) * upperfcn(a, limits[1])
   2274     s = float(N.add.reduce(N.ravel(a * mask)))
   2275     n = float(N.add.reduce(N.ravel(mask)))
   2276     return s / n
   2277 
   2278   def atvar(a, limits=None, inclusive=(1, 1)):
   2279     """
   2280 Returns the sample variance of values in an array, (i.e., using N-1),
   2281 ignoring values strictly outside the sequence passed to 'limits'.
   2282 Note: either limit in the sequence, or the value of limits itself,
   2283 can be set to None.  The inclusive list/tuple determines whether the lower
   2284 and upper limiting bounds (respectively) are open/exclusive (0) or
   2285 closed/inclusive (1). ASSUMES A FLAT ARRAY (OR ELSE PREFLATTENS).
   2286 
   2287 Usage:   atvar(a,limits=None,inclusive=(1,1))
   2288 """
   2289     a = a.astype(N.float_)
   2290     if limits == None or limits == [None, None]:
   2291       return avar(a)
   2292     assert type(limits) in [ListType, TupleType, N.ndarray
   2293                            ], 'Wrong type for limits in atvar'
   2294     if inclusive[0]:
   2295       lowerfcn = N.greater_equal
   2296     else:
   2297       lowerfcn = N.greater
   2298     if inclusive[1]:
   2299       upperfcn = N.less_equal
   2300     else:
   2301       upperfcn = N.less
   2302     if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(
   2303         N.ravel(a)):
   2304       raise ValueError, 'No array values within given limits (atvar).'
   2305     elif limits[0] == None and limits[1] <> None:
   2306       mask = upperfcn(a, limits[1])
   2307     elif limits[0] <> None and limits[1] == None:
   2308       mask = lowerfcn(a, limits[0])
   2309     elif limits[0] <> None and limits[1] <> None:
   2310       mask = lowerfcn(a, limits[0]) * upperfcn(a, limits[1])
   2311 
   2312     a = N.compress(mask, a)  # squish out excluded values
   2313     return avar(a)
   2314 
   2315   def atmin(a, lowerlimit=None, dimension=None, inclusive=1):
   2316     """
   2317 Returns the minimum value of a, along dimension, including only values less
   2318 than (or equal to, if inclusive=1) lowerlimit.  If the limit is set to None,
   2319 all values in the array are used.
   2320 
   2321 Usage:   atmin(a,lowerlimit=None,dimension=None,inclusive=1)
   2322 """
   2323     if inclusive:
   2324       lowerfcn = N.greater
   2325     else:
   2326       lowerfcn = N.greater_equal
   2327     if dimension == None:
   2328       a = N.ravel(a)
   2329       dimension = 0
   2330     if lowerlimit == None:
   2331       lowerlimit = N.minimum.reduce(N.ravel(a)) - 11
   2332     biggest = N.maximum.reduce(N.ravel(a))
   2333     ta = N.where(lowerfcn(a, lowerlimit), a, biggest)
   2334     return N.minimum.reduce(ta, dimension)
   2335 
   2336   def atmax(a, upperlimit, dimension=None, inclusive=1):
   2337     """
   2338 Returns the maximum value of a, along dimension, including only values greater
   2339 than (or equal to, if inclusive=1) upperlimit.  If the limit is set to None,
   2340 a limit larger than the max value in the array is used.
   2341 
   2342 Usage:   atmax(a,upperlimit,dimension=None,inclusive=1)
   2343 """
   2344     if inclusive:
   2345       upperfcn = N.less
   2346     else:
   2347       upperfcn = N.less_equal
   2348     if dimension == None:
   2349       a = N.ravel(a)
   2350       dimension = 0
   2351     if upperlimit == None:
   2352       upperlimit = N.maximum.reduce(N.ravel(a)) + 1
   2353     smallest = N.minimum.reduce(N.ravel(a))
   2354     ta = N.where(upperfcn(a, upperlimit), a, smallest)
   2355     return N.maximum.reduce(ta, dimension)
   2356 
   2357   def atstdev(a, limits=None, inclusive=(1, 1)):
   2358     """
   2359 Returns the standard deviation of all values in an array, ignoring values
   2360 strictly outside the sequence passed to 'limits'.   Note: either limit
   2361 in the sequence, or the value of limits itself, can be set to None.  The
   2362 inclusive list/tuple determines whether the lower and upper limiting bounds
   2363 (respectively) are open/exclusive (0) or closed/inclusive (1).
   2364 
   2365 Usage:   atstdev(a,limits=None,inclusive=(1,1))
   2366 """
   2367     return N.sqrt(tvar(a, limits, inclusive))
   2368 
   2369   def atsem(a, limits=None, inclusive=(1, 1)):
   2370     """
   2371 Returns the standard error of the mean for the values in an array,
   2372 (i.e., using N for the denominator), ignoring values strictly outside
   2373 the sequence passed to 'limits'.   Note: either limit in the sequence,
   2374 or the value of limits itself, can be set to None.  The inclusive list/tuple
   2375 determines whether the lower and upper limiting bounds (respectively) are
   2376 open/exclusive (0) or closed/inclusive (1).
   2377 
   2378 Usage:   atsem(a,limits=None,inclusive=(1,1))
   2379 """
   2380     sd = tstdev(a, limits, inclusive)
   2381     if limits == None or limits == [None, None]:
   2382       n = float(len(N.ravel(a)))
   2383       limits = [min(a) - 1, max(a) + 1]
   2384     assert type(limits) in [ListType, TupleType, N.ndarray
   2385                            ], 'Wrong type for limits in atsem'
   2386     if inclusive[0]:
   2387       lowerfcn = N.greater_equal
   2388     else:
   2389       lowerfcn = N.greater
   2390     if inclusive[1]:
   2391       upperfcn = N.less_equal
   2392     else:
   2393       upperfcn = N.less
   2394     if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(
   2395         N.ravel(a)):
   2396       raise ValueError, 'No array values within given limits (atsem).'
   2397     elif limits[0] == None and limits[1] <> None:
   2398       mask = upperfcn(a, limits[1])
   2399     elif limits[0] <> None and limits[1] == None:
   2400       mask = lowerfcn(a, limits[0])
   2401     elif limits[0] <> None and limits[1] <> None:
   2402       mask = lowerfcn(a, limits[0]) * upperfcn(a, limits[1])
   2403     term1 = N.add.reduce(N.ravel(a * a * mask))
   2404     n = float(N.add.reduce(N.ravel(mask)))
   2405     return sd / math.sqrt(n)
   2406 
   2407 #####################################
   2408 ############  AMOMENTS  #############
   2409 #####################################
   2410 
   2411   def amoment(a, moment=1, dimension=None):
   2412     """
   2413 Calculates the nth moment about the mean for a sample (defaults to the
   2414 1st moment).  Generally used to calculate coefficients of skewness and
   2415 kurtosis.  Dimension can equal None (ravel array first), an integer
   2416 (the dimension over which to operate), or a sequence (operate over
   2417 multiple dimensions).
   2418 
   2419 Usage:   amoment(a,moment=1,dimension=None)
   2420 Returns: appropriate moment along given dimension
   2421 """
   2422     if dimension == None:
   2423       a = N.ravel(a)
   2424       dimension = 0
   2425     if moment == 1:
   2426       return 0.0
   2427     else:
   2428       mn = amean(a, dimension, 1)  # 1=keepdims
   2429       s = N.power((a - mn), moment)
   2430       return amean(s, dimension)
   2431 
   2432   def avariation(a, dimension=None):
   2433     """
   2434 Returns the coefficient of variation, as defined in CRC Standard
   2435 Probability and Statistics, p.6. Dimension can equal None (ravel array
   2436 first), an integer (the dimension over which to operate), or a
   2437 sequence (operate over multiple dimensions).
   2438 
   2439 Usage:   avariation(a,dimension=None)
   2440 """
   2441     return 100.0 * asamplestdev(a, dimension) / amean(a, dimension)
   2442 
   2443   def askew(a, dimension=None):
   2444     """
   2445 Returns the skewness of a distribution (normal ==> 0.0; >0 means extra
   2446 weight in left tail).  Use askewtest() to see if it's close enough.
   2447 Dimension can equal None (ravel array first), an integer (the
   2448 dimension over which to operate), or a sequence (operate over multiple
   2449 dimensions).
   2450 
   2451 Usage:   askew(a, dimension=None)
   2452 Returns: skew of vals in a along dimension, returning ZERO where all vals equal
   2453 """
   2454     denom = N.power(amoment(a, 2, dimension), 1.5)
   2455     zero = N.equal(denom, 0)
   2456     if type(denom) == N.ndarray and asum(zero) <> 0:
   2457       print 'Number of zeros in askew: ', asum(zero)
   2458     denom = denom + zero  # prevent divide-by-zero
   2459     return N.where(zero, 0, amoment(a, 3, dimension) / denom)
   2460 
   2461   def akurtosis(a, dimension=None):
   2462     """
   2463 Returns the kurtosis of a distribution (normal ==> 3.0; >3 means
   2464 heavier in the tails, and usually more peaked).  Use akurtosistest()
   2465 to see if it's close enough.  Dimension can equal None (ravel array
   2466 first), an integer (the dimension over which to operate), or a
   2467 sequence (operate over multiple dimensions).
   2468 
   2469 Usage:   akurtosis(a,dimension=None)
   2470 Returns: kurtosis of values in a along dimension, and ZERO where all vals equal
   2471 """
   2472     denom = N.power(amoment(a, 2, dimension), 2)
   2473     zero = N.equal(denom, 0)
   2474     if type(denom) == N.ndarray and asum(zero) <> 0:
   2475       print 'Number of zeros in akurtosis: ', asum(zero)
   2476     denom = denom + zero  # prevent divide-by-zero
   2477     return N.where(zero, 0, amoment(a, 4, dimension) / denom)
   2478 
   2479   def adescribe(inarray, dimension=None):
   2480     """
   2481 Returns several descriptive statistics of the passed array.  Dimension
   2482 can equal None (ravel array first), an integer (the dimension over
   2483 which to operate), or a sequence (operate over multiple dimensions).
   2484 
   2485 Usage:   adescribe(inarray,dimension=None)
   2486 Returns: n, (min,max), mean, standard deviation, skew, kurtosis
   2487 """
   2488     if dimension == None:
   2489       inarray = N.ravel(inarray)
   2490       dimension = 0
   2491     n = inarray.shape[dimension]
   2492     mm = (N.minimum.reduce(inarray), N.maximum.reduce(inarray))
   2493     m = amean(inarray, dimension)
   2494     sd = astdev(inarray, dimension)
   2495     skew = askew(inarray, dimension)
   2496     kurt = akurtosis(inarray, dimension)
   2497     return n, mm, m, sd, skew, kurt
   2498 
   2499 #####################################
   2500 ########  NORMALITY TESTS  ##########
   2501 #####################################
   2502 
   2503   def askewtest(a, dimension=None):
   2504     """
   2505 Tests whether the skew is significantly different from a normal
   2506 distribution.  Dimension can equal None (ravel array first), an
   2507 integer (the dimension over which to operate), or a sequence (operate
   2508 over multiple dimensions).
   2509 
   2510 Usage:   askewtest(a,dimension=None)
   2511 Returns: z-score and 2-tail z-probability
   2512 """
   2513     if dimension == None:
   2514       a = N.ravel(a)
   2515       dimension = 0
   2516     b2 = askew(a, dimension)
   2517     n = float(a.shape[dimension])
   2518     y = b2 * N.sqrt(((n + 1) * (n + 3)) / (6.0 * (n - 2)))
   2519     beta2 = (3.0 * (n * n + 27 * n - 70) * (n + 1) *
   2520              (n + 3)) / ((n - 2.0) * (n + 5) * (n + 7) * (n + 9))
   2521     W2 = -1 + N.sqrt(2 * (beta2 - 1))
   2522     delta = 1 / N.sqrt(N.log(N.sqrt(W2)))
   2523     alpha = N.sqrt(2 / (W2 - 1))
   2524     y = N.where(y == 0, 1, y)
   2525     Z = delta * N.log(y / alpha + N.sqrt((y / alpha)**2 + 1))
   2526     return Z, (1.0 - zprob(Z)) * 2
   2527 
   2528   def akurtosistest(a, dimension=None):
   2529     """
   2530 Tests whether a dataset has normal kurtosis (i.e.,
   2531 kurtosis=3(n-1)/(n+1)) Valid only for n>20.  Dimension can equal None
   2532 (ravel array first), an integer (the dimension over which to operate),
   2533 or a sequence (operate over multiple dimensions).
   2534 
   2535 Usage:   akurtosistest(a,dimension=None)
   2536 Returns: z-score and 2-tail z-probability, returns 0 for bad pixels
   2537 """
   2538     if dimension == None:
   2539       a = N.ravel(a)
   2540       dimension = 0
   2541     n = float(a.shape[dimension])
   2542     if n < 20:
   2543       print 'akurtosistest only valid for n>=20 ... continuing anyway, n=', n
   2544     b2 = akurtosis(a, dimension)
   2545     E = 3.0 * (n - 1) / (n + 1)
   2546     varb2 = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) * (n + 1) * (n + 3) *
   2547                                             (n + 5))
   2548     x = (b2 - E) / N.sqrt(varb2)
   2549     sqrtbeta1 = 6.0 * (n * n - 5 * n + 2) / ((n + 7) * (n + 9)) * N.sqrt(
   2550         (6.0 * (n + 3) * (n + 5)) / (n * (n - 2) * (n - 3)))
   2551     A = 6.0 + 8.0 / sqrtbeta1 * (2.0 / sqrtbeta1 +
   2552                                  N.sqrt(1 + 4.0 / (sqrtbeta1**2)))
   2553     term1 = 1 - 2 / (9.0 * A)
   2554     denom = 1 + x * N.sqrt(2 / (A - 4.0))
   2555     denom = N.where(N.less(denom, 0), 99, denom)
   2556     term2 = N.where(
   2557         N.equal(denom, 0), term1, N.power(
   2558             (1 - 2.0 / A) / denom, 1 / 3.0))
   2559     Z = (term1 - term2) / N.sqrt(2 / (9.0 * A))
   2560     Z = N.where(N.equal(denom, 99), 0, Z)
   2561     return Z, (1.0 - zprob(Z)) * 2
   2562 
   2563   def anormaltest(a, dimension=None):
   2564     """
   2565 Tests whether skew and/OR kurtosis of dataset differs from normal
   2566 curve.  Can operate over multiple dimensions.  Dimension can equal
   2567 None (ravel array first), an integer (the dimension over which to
   2568 operate), or a sequence (operate over multiple dimensions).
   2569 
   2570 Usage:   anormaltest(a,dimension=None)
   2571 Returns: z-score and 2-tail probability
   2572 """
   2573     if dimension == None:
   2574       a = N.ravel(a)
   2575       dimension = 0
   2576     s, p = askewtest(a, dimension)
   2577     k, p = akurtosistest(a, dimension)
   2578     k2 = N.power(s, 2) + N.power(k, 2)
   2579     return k2, achisqprob(k2, 2)
   2580 
   2581 #####################################
   2582 ######  AFREQUENCY FUNCTIONS  #######
   2583 #####################################
   2584 
   2585   def aitemfreq(a):
   2586     """
   2587 Returns a 2D array of item frequencies.  Column 1 contains item values,
   2588 column 2 contains their respective counts.  Assumes a 1D array is passed.
   2589 @@@sorting OK?
   2590 
   2591 Usage:   aitemfreq(a)
   2592 Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies)
   2593 """
   2594     scores = pstat.aunique(a)
   2595     scores = N.sort(scores)
   2596     freq = N.zeros(len(scores))
   2597     for i in range(len(scores)):
   2598       freq[i] = N.add.reduce(N.equal(a, scores[i]))
   2599     return N.array(pstat.aabut(scores, freq))
   2600 
   2601   def ascoreatpercentile(inarray, percent):
   2602     """
   2603 Usage:   ascoreatpercentile(inarray,percent)   0<percent<100
   2604 Returns: score at given percentile, relative to inarray distribution
   2605 """
   2606     percent = percent / 100.0
   2607     targetcf = percent * len(inarray)
   2608     h, lrl, binsize, extras = histogram(inarray)
   2609     cumhist = cumsum(h * 1)
   2610     for i in range(len(cumhist)):
   2611       if cumhist[i] >= targetcf:
   2612         break
   2613     score = binsize * (
   2614         (targetcf - cumhist[i - 1]) / float(h[i])) + (lrl + binsize * i)
   2615     return score
   2616 
   2617   def apercentileofscore(inarray, score, histbins=10, defaultlimits=None):
   2618     """
   2619 Note: result of this function depends on the values used to histogram
   2620 the data(!).
   2621 
   2622 Usage:   apercentileofscore(inarray,score,histbins=10,defaultlimits=None)
   2623 Returns: percentile-position of score (0-100) relative to inarray
   2624 """
   2625     h, lrl, binsize, extras = histogram(inarray, histbins, defaultlimits)
   2626     cumhist = cumsum(h * 1)
   2627     i = int((score - lrl) / float(binsize))
   2628     pct = (cumhist[i - 1] + ((score - (lrl + binsize * i)) / float(binsize)) *
   2629            h[i]) / float(len(inarray)) * 100
   2630     return pct
   2631 
   2632   def ahistogram(inarray, numbins=10, defaultlimits=None, printextras=1):
   2633     """
   2634 Returns (i) an array of histogram bin counts, (ii) the smallest value
   2635 of the histogram binning, and (iii) the bin width (the last 2 are not
   2636 necessarily integers).  Default number of bins is 10.  Defaultlimits
   2637 can be None (the routine picks bins spanning all the numbers in the
   2638 inarray) or a 2-sequence (lowerlimit, upperlimit).  Returns all of the
   2639 following: array of bin values, lowerreallimit, binsize, extrapoints.
   2640 
   2641 Usage:   ahistogram(inarray,numbins=10,defaultlimits=None,printextras=1)
   2642 Returns: (array of bin counts, bin-minimum, min-width, #-points-outside-range)
   2643 """
   2644     inarray = N.ravel(inarray)  # flatten any >1D arrays
   2645     if (defaultlimits <> None):
   2646       lowerreallimit = defaultlimits[0]
   2647       upperreallimit = defaultlimits[1]
   2648       binsize = (upperreallimit - lowerreallimit) / float(numbins)
   2649     else:
   2650       Min = N.minimum.reduce(inarray)
   2651       Max = N.maximum.reduce(inarray)
   2652       estbinwidth = float(Max - Min) / float(numbins) + 1e-6
   2653       binsize = (Max - Min + estbinwidth) / float(numbins)
   2654       lowerreallimit = Min - binsize / 2.0  #lower real limit,1st bin
   2655     bins = N.zeros(numbins)
   2656     extrapoints = 0
   2657     for num in inarray:
   2658       try:
   2659         if (num - lowerreallimit) < 0:
   2660           extrapoints = extrapoints + 1
   2661         else:
   2662           bintoincrement = int((num - lowerreallimit) / float(binsize))
   2663           bins[bintoincrement] = bins[bintoincrement] + 1
   2664       except:  # point outside lower/upper limits
   2665         extrapoints = extrapoints + 1
   2666     if (extrapoints > 0 and printextras == 1):
   2667       print '\nPoints outside given histogram range =', extrapoints
   2668     return (bins, lowerreallimit, binsize, extrapoints)
   2669 
   2670   def acumfreq(a, numbins=10, defaultreallimits=None):
   2671     """
   2672 Returns a cumulative frequency histogram, using the histogram function.
   2673 Defaultreallimits can be None (use all data), or a 2-sequence containing
   2674 lower and upper limits on values to include.
   2675 
   2676 Usage:   acumfreq(a,numbins=10,defaultreallimits=None)
   2677 Returns: array of cumfreq bin values, lowerreallimit, binsize, extrapoints
   2678 """
   2679     h, l, b, e = histogram(a, numbins, defaultreallimits)
   2680     cumhist = cumsum(h * 1)
   2681     return cumhist, l, b, e
   2682 
   2683   def arelfreq(a, numbins=10, defaultreallimits=None):
   2684     """
   2685 Returns a relative frequency histogram, using the histogram function.
   2686 Defaultreallimits can be None (use all data), or a 2-sequence containing
   2687 lower and upper limits on values to include.
   2688 
   2689 Usage:   arelfreq(a,numbins=10,defaultreallimits=None)
   2690 Returns: array of cumfreq bin values, lowerreallimit, binsize, extrapoints
   2691 """
   2692     h, l, b, e = histogram(a, numbins, defaultreallimits)
   2693     h = N.array(h / float(a.shape[0]))
   2694     return h, l, b, e
   2695 
   2696 #####################################
   2697 ######  AVARIABILITY FUNCTIONS  #####
   2698 #####################################
   2699 
   2700   def aobrientransform(*args):
   2701     """
   2702 Computes a transform on input data (any number of columns).  Used to
   2703 test for homogeneity of variance prior to running one-way stats.  Each
   2704 array in *args is one level of a factor.  If an F_oneway() run on the
   2705 transformed data and found significant, variances are unequal.   From
   2706 Maxwell and Delaney, p.112.
   2707 
   2708 Usage:   aobrientransform(*args)    *args = 1D arrays, one per level of factor
   2709 Returns: transformed data for use in an ANOVA
   2710 """
   2711     TINY = 1e-10
   2712     k = len(args)
   2713     n = N.zeros(k, N.float_)
   2714     v = N.zeros(k, N.float_)
   2715     m = N.zeros(k, N.float_)
   2716     nargs = []
   2717     for i in range(k):
   2718       nargs.append(args[i].astype(N.float_))
   2719       n[i] = float(len(nargs[i]))
   2720       v[i] = var(nargs[i])
   2721       m[i] = mean(nargs[i])
   2722     for j in range(k):
   2723       for i in range(n[j]):
   2724         t1 = (n[j] - 1.5) * n[j] * (nargs[j][i] - m[j])**2
   2725         t2 = 0.5 * v[j] * (n[j] - 1.0)
   2726         t3 = (n[j] - 1.0) * (n[j] - 2.0)
   2727         nargs[j][i] = (t1 - t2) / float(t3)
   2728     check = 1
   2729     for j in range(k):
   2730       if v[j] - mean(nargs[j]) > TINY:
   2731         check = 0
   2732     if check <> 1:
   2733       raise ValueError, 'Lack of convergence in obrientransform.'
   2734     else:
   2735       return N.array(nargs)
   2736 
   2737   def asamplevar(inarray, dimension=None, keepdims=0):
   2738     """
   2739 Returns the sample standard deviation of the values in the passed
   2740 array (i.e., using N).  Dimension can equal None (ravel array first),
   2741 an integer (the dimension over which to operate), or a sequence
   2742 (operate over multiple dimensions).  Set keepdims=1 to return an array
   2743 with the same number of dimensions as inarray.
   2744 
   2745 Usage:   asamplevar(inarray,dimension=None,keepdims=0)
   2746 """
   2747     if dimension == None:
   2748       inarray = N.ravel(inarray)
   2749       dimension = 0
   2750     if dimension == 1:
   2751       mn = amean(inarray, dimension)[:, N.NewAxis]
   2752     else:
   2753       mn = amean(inarray, dimension, keepdims=1)
   2754     deviations = inarray - mn
   2755     if type(dimension) == ListType:
   2756       n = 1
   2757       for d in dimension:
   2758         n = n * inarray.shape[d]
   2759     else:
   2760       n = inarray.shape[dimension]
   2761     svar = ass(deviations, dimension, keepdims) / float(n)
   2762     return svar
   2763 
   2764   def asamplestdev(inarray, dimension=None, keepdims=0):
   2765     """
   2766 Returns the sample standard deviation of the values in the passed
   2767 array (i.e., using N).  Dimension can equal None (ravel array first),
   2768 an integer (the dimension over which to operate), or a sequence
   2769 (operate over multiple dimensions).  Set keepdims=1 to return an array
   2770 with the same number of dimensions as inarray.
   2771 
   2772 Usage:   asamplestdev(inarray,dimension=None,keepdims=0)
   2773 """
   2774     return N.sqrt(asamplevar(inarray, dimension, keepdims))
   2775 
   2776   def asignaltonoise(instack, dimension=0):
   2777     """
   2778 Calculates signal-to-noise.  Dimension can equal None (ravel array
   2779 first), an integer (the dimension over which to operate), or a
   2780 sequence (operate over multiple dimensions).
   2781 
   2782 Usage:   asignaltonoise(instack,dimension=0):
   2783 Returns: array containing the value of (mean/stdev) along dimension,
   2784          or 0 when stdev=0
   2785 """
   2786     m = mean(instack, dimension)
   2787     sd = stdev(instack, dimension)
   2788     return N.where(sd == 0, 0, m / sd)
   2789 
   2790   def acov(x, y, dimension=None, keepdims=0):
   2791     """
   2792 Returns the estimated covariance of the values in the passed
   2793 array (i.e., N-1).  Dimension can equal None (ravel array first), an
   2794 integer (the dimension over which to operate), or a sequence (operate
   2795 over multiple dimensions).  Set keepdims=1 to return an array with the
   2796 same number of dimensions as inarray.
   2797 
   2798 Usage:   acov(x,y,dimension=None,keepdims=0)
   2799 """
   2800     if dimension == None:
   2801       x = N.ravel(x)
   2802       y = N.ravel(y)
   2803       dimension = 0
   2804     xmn = amean(x, dimension, 1)  # keepdims
   2805     xdeviations = x - xmn
   2806     ymn = amean(y, dimension, 1)  # keepdims
   2807     ydeviations = y - ymn
   2808     if type(dimension) == ListType:
   2809       n = 1
   2810       for d in dimension:
   2811         n = n * x.shape[d]
   2812     else:
   2813       n = x.shape[dimension]
   2814     covar = N.sum(xdeviations * ydeviations) / float(n - 1)
   2815     return covar
   2816 
   2817   def avar(inarray, dimension=None, keepdims=0):
   2818     """
   2819 Returns the estimated population variance of the values in the passed
   2820 array (i.e., N-1).  Dimension can equal None (ravel array first), an
   2821 integer (the dimension over which to operate), or a sequence (operate
   2822 over multiple dimensions).  Set keepdims=1 to return an array with the
   2823 same number of dimensions as inarray.
   2824 
   2825 Usage:   avar(inarray,dimension=None,keepdims=0)
   2826 """
   2827     if dimension == None:
   2828       inarray = N.ravel(inarray)
   2829       dimension = 0
   2830     mn = amean(inarray, dimension, 1)
   2831     deviations = inarray - mn
   2832     if type(dimension) == ListType:
   2833       n = 1
   2834       for d in dimension:
   2835         n = n * inarray.shape[d]
   2836     else:
   2837       n = inarray.shape[dimension]
   2838     var = ass(deviations, dimension, keepdims) / float(n - 1)
   2839     return var
   2840 
   2841   def astdev(inarray, dimension=None, keepdims=0):
   2842     """
   2843 Returns the estimated population standard deviation of the values in
   2844 the passed array (i.e., N-1).  Dimension can equal None (ravel array
   2845 first), an integer (the dimension over which to operate), or a
   2846 sequence (operate over multiple dimensions).  Set keepdims=1 to return
   2847 an array with the same number of dimensions as inarray.
   2848 
   2849 Usage:   astdev(inarray,dimension=None,keepdims=0)
   2850 """
   2851     return N.sqrt(avar(inarray, dimension, keepdims))
   2852 
   2853   def asterr(inarray, dimension=None, keepdims=0):
   2854     """
   2855 Returns the estimated population standard error of the values in the
   2856 passed array (i.e., N-1).  Dimension can equal None (ravel array
   2857 first), an integer (the dimension over which to operate), or a
   2858 sequence (operate over multiple dimensions).  Set keepdims=1 to return
   2859 an array with the same number of dimensions as inarray.
   2860 
   2861 Usage:   asterr(inarray,dimension=None,keepdims=0)
   2862 """
   2863     if dimension == None:
   2864       inarray = N.ravel(inarray)
   2865       dimension = 0
   2866     return astdev(inarray, dimension,
   2867                   keepdims) / float(N.sqrt(inarray.shape[dimension]))
   2868 
   2869   def asem(inarray, dimension=None, keepdims=0):
   2870     """
   2871 Returns the standard error of the mean (i.e., using N) of the values
   2872 in the passed array.  Dimension can equal None (ravel array first), an
   2873 integer (the dimension over which to operate), or a sequence (operate
   2874 over multiple dimensions).  Set keepdims=1 to return an array with the
   2875 same number of dimensions as inarray.
   2876 
   2877 Usage:   asem(inarray,dimension=None, keepdims=0)
   2878 """
   2879     if dimension == None:
   2880       inarray = N.ravel(inarray)
   2881       dimension = 0
   2882     if type(dimension) == ListType:
   2883       n = 1
   2884       for d in dimension:
   2885         n = n * inarray.shape[d]
   2886     else:
   2887       n = inarray.shape[dimension]
   2888     s = asamplestdev(inarray, dimension, keepdims) / N.sqrt(n - 1)
   2889     return s
   2890 
   2891   def az(a, score):
   2892     """
   2893 Returns the z-score of a given input score, given thearray from which
   2894 that score came.  Not appropriate for population calculations, nor for
   2895 arrays > 1D.
   2896 
   2897 Usage:   az(a, score)
   2898 """
   2899     z = (score - amean(a)) / asamplestdev(a)
   2900     return z
   2901 
   2902   def azs(a):
   2903     """
   2904 Returns a 1D array of z-scores, one for each score in the passed array,
   2905 computed relative to the passed array.
   2906 
   2907 Usage:   azs(a)
   2908 """
   2909     zscores = []
   2910     for item in a:
   2911       zscores.append(z(a, item))
   2912     return N.array(zscores)
   2913 
   2914   def azmap(scores, compare, dimension=0):
   2915     """
   2916 Returns an array of z-scores the shape of scores (e.g., [x,y]), compared to
   2917 array passed to compare (e.g., [time,x,y]).  Assumes collapsing over dim 0
   2918 of the compare array.
   2919 
   2920 Usage:   azs(scores, compare, dimension=0)
   2921 """
   2922     mns = amean(compare, dimension)
   2923     sstd = asamplestdev(compare, 0)
   2924     return (scores - mns) / sstd
   2925 
   2926 #####################################
   2927 #######  ATRIMMING FUNCTIONS  #######
   2928 #####################################
   2929 
   2930 ## deleted around() as it's in numpy now
   2931 
   2932   def athreshold(a, threshmin=None, threshmax=None, newval=0):
   2933     """
   2934 Like Numeric.clip() except that values <threshmid or >threshmax are replaced
   2935 by newval instead of by threshmin/threshmax (respectively).
   2936 
   2937 Usage:   athreshold(a,threshmin=None,threshmax=None,newval=0)
   2938 Returns: a, with values <threshmin or >threshmax replaced with newval
   2939 """
   2940     mask = N.zeros(a.shape)
   2941     if threshmin <> None:
   2942       mask = mask + N.where(a < threshmin, 1, 0)
   2943     if threshmax <> None:
   2944       mask = mask + N.where(a > threshmax, 1, 0)
   2945     mask = N.clip(mask, 0, 1)
   2946     return N.where(mask, newval, a)
   2947 
   2948   def atrimboth(a, proportiontocut):
   2949     """
   2950 Slices off the passed proportion of items from BOTH ends of the passed
   2951 array (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND
   2952 'rightmost' 10% of scores.  You must pre-sort the array if you want
   2953 "proper" trimming.  Slices off LESS if proportion results in a
   2954 non-integer slice index (i.e., conservatively slices off
   2955 proportiontocut).
   2956 
   2957 Usage:   atrimboth (a,proportiontocut)
   2958 Returns: trimmed version of array a
   2959 """
   2960     lowercut = int(proportiontocut * len(a))
   2961     uppercut = len(a) - lowercut
   2962     return a[lowercut:uppercut]
   2963 
   2964   def atrim1(a, proportiontocut, tail='right'):
   2965     """
   2966 Slices off the passed proportion of items from ONE end of the passed
   2967 array (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost'
   2968 10% of scores).  Slices off LESS if proportion results in a non-integer
   2969 slice index (i.e., conservatively slices off proportiontocut).
   2970 
   2971 Usage:   atrim1(a,proportiontocut,tail='right')  or set tail='left'
   2972 Returns: trimmed version of array a
   2973 """
   2974     if string.lower(tail) == 'right':
   2975       lowercut = 0
   2976       uppercut = len(a) - int(proportiontocut * len(a))
   2977     elif string.lower(tail) == 'left':
   2978       lowercut = int(proportiontocut * len(a))
   2979       uppercut = len(a)
   2980     return a[lowercut:uppercut]
   2981 
   2982 #####################################
   2983 #####  ACORRELATION FUNCTIONS  ######
   2984 #####################################
   2985 
   2986   def acovariance(X):
   2987     """
   2988 Computes the covariance matrix of a matrix X.  Requires a 2D matrix input.
   2989 
   2990 Usage:   acovariance(X)
   2991 Returns: covariance matrix of X
   2992 """
   2993     if len(X.shape) <> 2:
   2994       raise TypeError, 'acovariance requires 2D matrices'
   2995     n = X.shape[0]
   2996     mX = amean(X, 0)
   2997     return N.dot(N.transpose(X), X) / float(n) - N.multiply.outer(mX, mX)
   2998 
   2999   def acorrelation(X):
   3000     """
   3001 Computes the correlation matrix of a matrix X.  Requires a 2D matrix input.
   3002 
   3003 Usage:   acorrelation(X)
   3004 Returns: correlation matrix of X
   3005 """
   3006     C = acovariance(X)
   3007     V = N.diagonal(C)
   3008     return C / N.sqrt(N.multiply.outer(V, V))
   3009 
   3010   def apaired(x, y):
   3011     """
   3012 Interactively determines the type of data in x and y, and then runs the
   3013 appropriated statistic for paired group data.
   3014 
   3015 Usage:   apaired(x,y)     x,y = the two arrays of values to be compared
   3016 Returns: appropriate statistic name, value, and probability
   3017 """
   3018     samples = ''
   3019     while samples not in ['i', 'r', 'I', 'R', 'c', 'C']:
   3020       print '\nIndependent or related samples, or correlation (i,r,c): ',
   3021       samples = raw_input()
   3022 
   3023     if samples in ['i', 'I', 'r', 'R']:
   3024       print '\nComparing variances ...',
   3025       # USE O'BRIEN'S TEST FOR HOMOGENEITY OF VARIANCE, Maxwell & delaney, p.112
   3026       r = obrientransform(x, y)
   3027       f, p = F_oneway(pstat.colex(r, 0), pstat.colex(r, 1))
   3028       if p < 0.05:
   3029         vartype = 'unequal, p=' + str(round(p, 4))
   3030       else:
   3031         vartype = 'equal'
   3032       print vartype
   3033       if samples in ['i', 'I']:
   3034         if vartype[0] == 'e':
   3035           t, p = ttest_ind(x, y, None, 0)
   3036           print '\nIndependent samples t-test:  ', round(t, 4), round(p, 4)
   3037         else:
   3038           if len(x) > 20 or len(y) > 20:
   3039             z, p = ranksums(x, y)
   3040             print '\nRank Sums test (NONparametric, n>20):  ', round(
   3041                 z, 4), round(p, 4)
   3042           else:
   3043             u, p = mannwhitneyu(x, y)
   3044             print '\nMann-Whitney U-test (NONparametric, ns<20):  ', round(
   3045                 u, 4), round(p, 4)
   3046 
   3047       else:  # RELATED SAMPLES
   3048         if vartype[0] == 'e':
   3049           t, p = ttest_rel(x, y, 0)
   3050           print '\nRelated samples t-test:  ', round(t, 4), round(p, 4)
   3051         else:
   3052           t, p = ranksums(x, y)
   3053           print '\nWilcoxon T-test (NONparametric):  ', round(t, 4), round(p, 4)
   3054     else:  # CORRELATION ANALYSIS
   3055       corrtype = ''
   3056       while corrtype not in ['c', 'C', 'r', 'R', 'd', 'D']:
   3057         print '\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ',
   3058         corrtype = raw_input()
   3059       if corrtype in ['c', 'C']:
   3060         m, b, r, p, see = linregress(x, y)
   3061         print '\nLinear regression for continuous variables ...'
   3062         lol = [
   3063             ['Slope', 'Intercept', 'r', 'Prob', 'SEestimate'],
   3064             [round(m, 4), round(b, 4), round(r, 4), round(p, 4), round(see, 4)]
   3065         ]
   3066         pstat.printcc(lol)
   3067       elif corrtype in ['r', 'R']:
   3068         r, p = spearmanr(x, y)
   3069         print '\nCorrelation for ranked variables ...'
   3070         print "Spearman's r: ", round(r, 4), round(p, 4)
   3071       else:  # DICHOTOMOUS
   3072         r, p = pointbiserialr(x, y)
   3073         print '\nAssuming x contains a dichotomous variable ...'
   3074         print 'Point Biserial r: ', round(r, 4), round(p, 4)
   3075     print '\n\n'
   3076     return None
   3077 
   3078   def dices(x, y):
   3079     """
   3080 Calculates Dice's coefficient ... (2*number of common terms)/(number of terms in
   3081 x +
   3082 number of terms in y). Returns a value between 0 (orthogonal) and 1.
   3083 
   3084 Usage:  dices(x,y)
   3085 """
   3086     import sets
   3087     x = sets.Set(x)
   3088     y = sets.Set(y)
   3089     common = len(x.intersection(y))
   3090     total = float(len(x) + len(y))
   3091     return 2 * common / total
   3092 
   3093   def icc(x, y=None, verbose=0):
   3094     """
   3095 Calculates intraclass correlation coefficients using simple, Type I sums of
   3096 squares.
   3097 If only one variable is passed, assumed it's an Nx2 matrix
   3098 
   3099 Usage:   icc(x,y=None,verbose=0)
   3100 Returns: icc rho, prob ####PROB IS A GUESS BASED ON PEARSON
   3101 """
   3102     TINY = 1.0e-20
   3103     if y:
   3104       all = N.concatenate([x, y], 0)
   3105     else:
   3106       all = x + 0
   3107       x = all[:, 0]
   3108       y = all[:, 1]
   3109     totalss = ass(all - mean(all))
   3110     pairmeans = (x + y) / 2.
   3111     withinss = ass(x - pairmeans) + ass(y - pairmeans)
   3112     withindf = float(len(x))
   3113     betwdf = float(len(x) - 1)
   3114     withinms = withinss / withindf
   3115     betweenms = (totalss - withinss) / betwdf
   3116     rho = (betweenms - withinms) / (withinms + betweenms)
   3117     t = rho * math.sqrt(betwdf / ((1.0 - rho + TINY) * (1.0 + rho + TINY)))
   3118     prob = abetai(0.5 * betwdf, 0.5, betwdf / (betwdf + t * t), verbose)
   3119     return rho, prob
   3120 
   3121   def alincc(x, y):
   3122     """
   3123 Calculates Lin's concordance correlation coefficient.
   3124 
   3125 Usage:   alincc(x,y)    where x, y are equal-length arrays
   3126 Returns: Lin's CC
   3127 """
   3128     x = N.ravel(x)
   3129     y = N.ravel(y)
   3130     covar = acov(x, y) * (len(x) - 1) / float(len(x))  # correct denom to n
   3131     xvar = avar(x) * (len(x) - 1) / float(len(x))  # correct denom to n
   3132     yvar = avar(y) * (len(y) - 1) / float(len(y))  # correct denom to n
   3133     lincc = (2 * covar) / ((xvar + yvar) + ((amean(x) - amean(y))**2))
   3134     return lincc
   3135 
   3136   def apearsonr(x, y, verbose=1):
   3137     """
   3138 Calculates a Pearson correlation coefficient and returns p.  Taken
   3139 from Heiman's Basic Statistics for the Behav. Sci (2nd), p.195.
   3140 
   3141 Usage:   apearsonr(x,y,verbose=1)      where x,y are equal length arrays
   3142 Returns: Pearson's r, two-tailed p-value
   3143 """
   3144     TINY = 1.0e-20
   3145     n = len(x)
   3146     xmean = amean(x)
   3147     ymean = amean(y)
   3148     r_num = n * (N.add.reduce(x * y)) - N.add.reduce(x) * N.add.reduce(y)
   3149     r_den = math.sqrt((n * ass(x) - asquare_of_sums(x)) *
   3150                       (n * ass(y) - asquare_of_sums(y)))
   3151     r = (r_num / r_den)
   3152     df = n - 2
   3153     t = r * math.sqrt(df / ((1.0 - r + TINY) * (1.0 + r + TINY)))
   3154     prob = abetai(0.5 * df, 0.5, df / (df + t * t), verbose)
   3155     return r, prob
   3156 
   3157   def aspearmanr(x, y):
   3158     """
   3159 Calculates a Spearman rank-order correlation coefficient.  Taken
   3160 from Heiman's Basic Statistics for the Behav. Sci (1st), p.192.
   3161 
   3162 Usage:   aspearmanr(x,y)      where x,y are equal-length arrays
   3163 Returns: Spearman's r, two-tailed p-value
   3164 """
   3165     TINY = 1e-30
   3166     n = len(x)
   3167     rankx = rankdata(x)
   3168     ranky = rankdata(y)
   3169     dsq = N.add.reduce((rankx - ranky)**2)
   3170     rs = 1 - 6 * dsq / float(n * (n**2 - 1))
   3171     t = rs * math.sqrt((n - 2) / ((rs + 1.0) * (1.0 - rs)))
   3172     df = n - 2
   3173     probrs = abetai(0.5 * df, 0.5, df / (df + t * t))
   3174     # probability values for rs are from part 2 of the spearman function in
   3175     # Numerical Recipies, p.510.  They close to tables, but not exact.(?)
   3176     return rs, probrs
   3177 
   3178   def apointbiserialr(x, y):
   3179     """
   3180 Calculates a point-biserial correlation coefficient and the associated
   3181 probability value.  Taken from Heiman's Basic Statistics for the Behav.
   3182 Sci (1st), p.194.
   3183 
   3184 Usage:   apointbiserialr(x,y)      where x,y are equal length arrays
   3185 Returns: Point-biserial r, two-tailed p-value
   3186 """
   3187     TINY = 1e-30
   3188     categories = pstat.aunique(x)
   3189     data = pstat.aabut(x, y)
   3190     if len(categories) <> 2:
   3191       raise ValueError, ('Exactly 2 categories required (in x) for '
   3192                          'pointbiserialr().')
   3193     else:  # there are 2 categories, continue
   3194       codemap = pstat.aabut(categories, N.arange(2))
   3195       recoded = pstat.arecode(data, codemap, 0)
   3196       x = pstat.alinexand(data, 0, categories[0])
   3197       y = pstat.alinexand(data, 0, categories[1])
   3198       xmean = amean(pstat.acolex(x, 1))
   3199       ymean = amean(pstat.acolex(y, 1))
   3200       n = len(data)
   3201       adjust = math.sqrt((len(x) / float(n)) * (len(y) / float(n)))
   3202       rpb = (ymean - xmean) / asamplestdev(pstat.acolex(data, 1)) * adjust
   3203       df = n - 2
   3204       t = rpb * math.sqrt(df / ((1.0 - rpb + TINY) * (1.0 + rpb + TINY)))
   3205       prob = abetai(0.5 * df, 0.5, df / (df + t * t))
   3206       return rpb, prob
   3207 
   3208   def akendalltau(x, y):
   3209     """
   3210 Calculates Kendall's tau ... correlation of ordinal data.  Adapted
   3211 from function kendl1 in Numerical Recipies.  Needs good test-cases.@@@
   3212 
   3213 Usage:   akendalltau(x,y)
   3214 Returns: Kendall's tau, two-tailed p-value
   3215 """
   3216     n1 = 0
   3217     n2 = 0
   3218     iss = 0
   3219     for j in range(len(x) - 1):
   3220       for k in range(j, len(y)):
   3221         a1 = x[j] - x[k]
   3222         a2 = y[j] - y[k]
   3223         aa = a1 * a2
   3224         if (aa):  # neither array has a tie
   3225           n1 = n1 + 1
   3226           n2 = n2 + 1
   3227           if aa > 0:
   3228             iss = iss + 1
   3229           else:
   3230             iss = iss - 1
   3231         else:
   3232           if (a1):
   3233             n1 = n1 + 1
   3234           else:
   3235             n2 = n2 + 1
   3236     tau = iss / math.sqrt(n1 * n2)
   3237     svar = (4.0 * len(x) + 10.0) / (9.0 * len(x) * (len(x) - 1))
   3238     z = tau / math.sqrt(svar)
   3239     prob = erfcc(abs(z) / 1.4142136)
   3240     return tau, prob
   3241 
   3242   def alinregress(*args):
   3243     """
   3244 Calculates a regression line on two arrays, x and y, corresponding to x,y
   3245 pairs.  If a single 2D array is passed, alinregress finds dim with 2 levels
   3246 and splits data into x,y pairs along that dim.
   3247 
   3248 Usage:   alinregress(*args)    args=2 equal-length arrays, or one 2D array
   3249 Returns: slope, intercept, r, two-tailed prob, sterr-of-the-estimate, n
   3250 """
   3251     TINY = 1.0e-20
   3252     if len(args) == 1:  # more than 1D array?
   3253       args = args[0]
   3254       if len(args) == 2:
   3255         x = args[0]
   3256         y = args[1]
   3257       else:
   3258         x = args[:, 0]
   3259         y = args[:, 1]
   3260     else:
   3261       x = args[0]
   3262       y = args[1]
   3263     n = len(x)
   3264     xmean = amean(x)
   3265     ymean = amean(y)
   3266     r_num = n * (N.add.reduce(x * y)) - N.add.reduce(x) * N.add.reduce(y)
   3267     r_den = math.sqrt((n * ass(x) - asquare_of_sums(x)) *
   3268                       (n * ass(y) - asquare_of_sums(y)))
   3269     r = r_num / r_den
   3270     z = 0.5 * math.log((1.0 + r + TINY) / (1.0 - r + TINY))
   3271     df = n - 2
   3272     t = r * math.sqrt(df / ((1.0 - r + TINY) * (1.0 + r + TINY)))
   3273     prob = abetai(0.5 * df, 0.5, df / (df + t * t))
   3274     slope = r_num / (float(n) * ass(x) - asquare_of_sums(x))
   3275     intercept = ymean - slope * xmean
   3276     sterrest = math.sqrt(1 - r * r) * asamplestdev(y)
   3277     return slope, intercept, r, prob, sterrest, n
   3278 
   3279   def amasslinregress(*args):
   3280     """
   3281 Calculates a regression line on one 1D array (x) and one N-D array (y).
   3282 
   3283 Returns: slope, intercept, r, two-tailed prob, sterr-of-the-estimate, n
   3284 """
   3285     TINY = 1.0e-20
   3286     if len(args) == 1:  # more than 1D array?
   3287       args = args[0]
   3288       if len(args) == 2:
   3289         x = N.ravel(args[0])
   3290         y = args[1]
   3291       else:
   3292         x = N.ravel(args[:, 0])
   3293         y = args[:, 1]
   3294     else:
   3295       x = args[0]
   3296       y = args[1]
   3297     x = x.astype(N.float_)
   3298     y = y.astype(N.float_)
   3299     n = len(x)
   3300     xmean = amean(x)
   3301     ymean = amean(y, 0)
   3302     shp = N.ones(len(y.shape))
   3303     shp[0] = len(x)
   3304     x.shape = shp
   3305     print x.shape, y.shape
   3306     r_num = n * (N.add.reduce(x * y, 0)) - N.add.reduce(x) * N.add.reduce(y, 0)
   3307     r_den = N.sqrt((n * ass(x) - asquare_of_sums(x)) *
   3308                    (n * ass(y, 0) - asquare_of_sums(y, 0)))
   3309     zerodivproblem = N.equal(r_den, 0)
   3310     r_den = N.where(zerodivproblem, 1, r_den
   3311                    )  # avoid zero-division in 1st place
   3312     r = r_num / r_den  # need to do this nicely for matrix division
   3313     r = N.where(zerodivproblem, 0.0, r)
   3314     z = 0.5 * N.log((1.0 + r + TINY) / (1.0 - r + TINY))
   3315     df = n - 2
   3316     t = r * N.sqrt(df / ((1.0 - r + TINY) * (1.0 + r + TINY)))
   3317     prob = abetai(0.5 * df, 0.5, df / (df + t * t))
   3318 
   3319     ss = float(n) * ass(x) - asquare_of_sums(x)
   3320     s_den = N.where(ss == 0, 1, ss)  # avoid zero-division in 1st place
   3321     slope = r_num / s_den
   3322     intercept = ymean - slope * xmean
   3323     sterrest = N.sqrt(1 - r * r) * asamplestdev(y, 0)
   3324     return slope, intercept, r, prob, sterrest, n
   3325 
   3326 #####################################
   3327 #####  AINFERENTIAL STATISTICS  #####
   3328 #####################################
   3329 
   3330   def attest_1samp(a, popmean, printit=0, name='Sample', writemode='a'):
   3331     """
   3332 Calculates the t-obtained for the independent samples T-test on ONE group
   3333 of scores a, given a population mean.  If printit=1, results are printed
   3334 to the screen.  If printit='filename', the results are output to 'filename'
   3335 using the given writemode (default=append).  Returns t-value, and prob.
   3336 
   3337 Usage:   attest_1samp(a,popmean,Name='Sample',printit=0,writemode='a')
   3338 Returns: t-value, two-tailed prob
   3339 """
   3340     if type(a) != N.ndarray:
   3341       a = N.array(a)
   3342     x = amean(a)
   3343     v = avar(a)
   3344     n = len(a)
   3345     df = n - 1
   3346     svar = ((n - 1) * v) / float(df)
   3347     t = (x - popmean) / math.sqrt(svar * (1.0 / n))
   3348     prob = abetai(0.5 * df, 0.5, df / (df + t * t))
   3349 
   3350     if printit <> 0:
   3351       statname = 'Single-sample T-test.'
   3352       outputpairedstats(printit, writemode, 'Population', '--', popmean, 0, 0,
   3353                         0, name, n, x, v, N.minimum.reduce(N.ravel(a)),
   3354                         N.maximum.reduce(N.ravel(a)), statname, t, prob)
   3355     return t, prob
   3356 
   3357   def attest_ind(a,
   3358                  b,
   3359                  dimension=None,
   3360                  printit=0,
   3361                  name1='Samp1',
   3362                  name2='Samp2',
   3363                  writemode='a'):
   3364     """
   3365 Calculates the t-obtained T-test on TWO INDEPENDENT samples of scores
   3366 a, and b.  From Numerical Recipies, p.483.  If printit=1, results are
   3367 printed to the screen.  If printit='filename', the results are output
   3368 to 'filename' using the given writemode (default=append).  Dimension
   3369 can equal None (ravel array first), or an integer (the dimension over
   3370 which to operate on a and b).
   3371 
   3372 Usage:   attest_ind (a,b,dimension=None,printit=0,
   3373                      Name1='Samp1',Name2='Samp2',writemode='a')
   3374 Returns: t-value, two-tailed p-value
   3375 """
   3376     if dimension == None:
   3377       a = N.ravel(a)
   3378       b = N.ravel(b)
   3379       dimension = 0
   3380     x1 = amean(a, dimension)
   3381     x2 = amean(b, dimension)
   3382     v1 = avar(a, dimension)
   3383     v2 = avar(b, dimension)
   3384     n1 = a.shape[dimension]
   3385     n2 = b.shape[dimension]
   3386     df = n1 + n2 - 2
   3387     svar = ((n1 - 1) * v1 + (n2 - 1) * v2) / float(df)
   3388     zerodivproblem = N.equal(svar, 0)
   3389     svar = N.where(zerodivproblem, 1, svar)  # avoid zero-division in 1st place
   3390     t = (x1 - x2) / N.sqrt(svar *
   3391                            (1.0 / n1 + 1.0 / n2))  # N-D COMPUTATION HERE!!!!!!
   3392     t = N.where(zerodivproblem, 1.0, t)  # replace NaN/wrong t-values with 1.0
   3393     probs = abetai(0.5 * df, 0.5, float(df) / (df + t * t))
   3394 
   3395     if type(t) == N.ndarray:
   3396       probs = N.reshape(probs, t.shape)
   3397     if probs.shape == (1,):
   3398       probs = probs[0]
   3399 
   3400     if printit <> 0:
   3401       if type(t) == N.ndarray:
   3402         t = t[0]
   3403       if type(probs) == N.ndarray:
   3404         probs = probs[0]
   3405       statname = 'Independent samples T-test.'
   3406       outputpairedstats(printit, writemode, name1, n1, x1, v1,
   3407                         N.minimum.reduce(N.ravel(a)),
   3408                         N.maximum.reduce(N.ravel(a)), name2, n2, x2, v2,
   3409                         N.minimum.reduce(N.ravel(b)),
   3410                         N.maximum.reduce(N.ravel(b)), statname, t, probs)
   3411       return
   3412     return t, probs
   3413 
   3414   def ap2t(pval, df):
   3415     """
   3416 Tries to compute a t-value from a p-value (or pval array) and associated df.
   3417 SLOW for large numbers of elements(!) as it re-computes p-values 20 times
   3418 (smaller step-sizes) at which point it decides it's done. Keeps the signs
   3419 of the input array. Returns 1000 (or -1000) if t>100.
   3420 
   3421 Usage:  ap2t(pval,df)
   3422 Returns: an array of t-values with the shape of pval
   3423     """
   3424     pval = N.array(pval)
   3425     signs = N.sign(pval)
   3426     pval = abs(pval)
   3427     t = N.ones(pval.shape, N.float_) * 50
   3428     step = N.ones(pval.shape, N.float_) * 25
   3429     print 'Initial ap2t() prob calc'
   3430     prob = abetai(0.5 * df, 0.5, float(df) / (df + t * t))
   3431     print 'ap2t() iter: ',
   3432     for i in range(10):
   3433       print i, ' ',
   3434       t = N.where(pval < prob, t + step, t - step)
   3435       prob = abetai(0.5 * df, 0.5, float(df) / (df + t * t))
   3436       step = step / 2
   3437     print
   3438     # since this is an ugly hack, we get ugly boundaries
   3439     t = N.where(t > 99.9, 1000, t)  # hit upper-boundary
   3440     t = t + signs
   3441     return t  #, prob, pval
   3442 
   3443   def attest_rel(a,
   3444                  b,
   3445                  dimension=None,
   3446                  printit=0,
   3447                  name1='Samp1',
   3448                  name2='Samp2',
   3449                  writemode='a'):
   3450     """
   3451 Calculates the t-obtained T-test on TWO RELATED samples of scores, a
   3452 and b.  From Numerical Recipies, p.483.  If printit=1, results are
   3453 printed to the screen.  If printit='filename', the results are output
   3454 to 'filename' using the given writemode (default=append).  Dimension
   3455 can equal None (ravel array first), or an integer (the dimension over
   3456 which to operate on a and b).
   3457 
   3458 Usage:   attest_rel(a,b,dimension=None,printit=0,
   3459                     name1='Samp1',name2='Samp2',writemode='a')
   3460 Returns: t-value, two-tailed p-value
   3461 """
   3462     if dimension == None:
   3463       a = N.ravel(a)
   3464       b = N.ravel(b)
   3465       dimension = 0
   3466     if len(a) <> len(b):
   3467       raise ValueError, 'Unequal length arrays.'
   3468     x1 = amean(a, dimension)
   3469     x2 = amean(b, dimension)
   3470     v1 = avar(a, dimension)
   3471     v2 = avar(b, dimension)
   3472     n = a.shape[dimension]
   3473     df = float(n - 1)
   3474     d = (a - b).astype('d')
   3475 
   3476     denom = N.sqrt(
   3477         (n * N.add.reduce(d * d, dimension) - N.add.reduce(d, dimension)**2) /
   3478         df)
   3479     zerodivproblem = N.equal(denom, 0)
   3480     denom = N.where(zerodivproblem, 1, denom
   3481                    )  # avoid zero-division in 1st place
   3482     t = N.add.reduce(d, dimension) / denom  # N-D COMPUTATION HERE!!!!!!
   3483     t = N.where(zerodivproblem, 1.0, t)  # replace NaN/wrong t-values with 1.0
   3484     probs = abetai(0.5 * df, 0.5, float(df) / (df + t * t))
   3485     if type(t) == N.ndarray:
   3486       probs = N.reshape(probs, t.shape)
   3487     if probs.shape == (1,):
   3488       probs = probs[0]
   3489 
   3490     if printit <> 0:
   3491       statname = 'Related samples T-test.'
   3492       outputpairedstats(printit, writemode, name1, n, x1, v1,
   3493                         N.minimum.reduce(N.ravel(a)),
   3494                         N.maximum.reduce(N.ravel(a)), name2, n, x2, v2,
   3495                         N.minimum.reduce(N.ravel(b)),
   3496                         N.maximum.reduce(N.ravel(b)), statname, t, probs)
   3497       return
   3498     return t, probs
   3499 
   3500   def achisquare(f_obs, f_exp=None):
   3501     """
   3502 Calculates a one-way chi square for array of observed frequencies and returns
   3503 the result.  If no expected frequencies are given, the total N is assumed to
   3504 be equally distributed across all groups.
   3505 @@@NOT RIGHT??
   3506 
   3507 Usage:   achisquare(f_obs, f_exp=None)   f_obs = array of observed cell freq.
   3508 Returns: chisquare-statistic, associated p-value
   3509 """
   3510 
   3511     k = len(f_obs)
   3512     if f_exp == None:
   3513       f_exp = N.array([sum(f_obs) / float(k)] * len(f_obs), N.float_)
   3514     f_exp = f_exp.astype(N.float_)
   3515     chisq = N.add.reduce((f_obs - f_exp)**2 / f_exp)
   3516     return chisq, achisqprob(chisq, k - 1)
   3517 
   3518   def aks_2samp(data1, data2):
   3519     """
   3520 Computes the Kolmogorov-Smirnof statistic on 2 samples.  Modified from
   3521 Numerical Recipies in C, page 493.  Returns KS D-value, prob.  Not ufunc-
   3522 like.
   3523 
   3524 Usage:   aks_2samp(data1,data2)  where data1 and data2 are 1D arrays
   3525 Returns: KS D-value, p-value
   3526 """
   3527     j1 = 0  # N.zeros(data1.shape[1:]) TRIED TO MAKE THIS UFUNC-LIKE
   3528     j2 = 0  # N.zeros(data2.shape[1:])
   3529     fn1 = 0.0  # N.zeros(data1.shape[1:],N.float_)
   3530     fn2 = 0.0  # N.zeros(data2.shape[1:],N.float_)
   3531     n1 = data1.shape[0]
   3532     n2 = data2.shape[0]
   3533     en1 = n1 * 1
   3534     en2 = n2 * 1
   3535     d = N.zeros(data1.shape[1:], N.float_)
   3536     data1 = N.sort(data1, 0)
   3537     data2 = N.sort(data2, 0)
   3538     while j1 < n1 and j2 < n2:
   3539       d1 = data1[j1]
   3540       d2 = data2[j2]
   3541       if d1 <= d2:
   3542         fn1 = (j1) / float(en1)
   3543         j1 = j1 + 1
   3544       if d2 <= d1:
   3545         fn2 = (j2) / float(en2)
   3546         j2 = j2 + 1
   3547       dt = (fn2 - fn1)
   3548       if abs(dt) > abs(d):
   3549         d = dt
   3550 #    try:
   3551     en = math.sqrt(en1 * en2 / float(en1 + en2))
   3552     prob = aksprob((en + 0.12 + 0.11 / en) * N.fabs(d))
   3553     #    except:
   3554     #        prob = 1.0
   3555     return d, prob
   3556 
   3557   def amannwhitneyu(x, y):
   3558     """
   3559 Calculates a Mann-Whitney U statistic on the provided scores and
   3560 returns the result.  Use only when the n in each condition is < 20 and
   3561 you have 2 independent samples of ranks.  REMEMBER: Mann-Whitney U is
   3562 significant if the u-obtained is LESS THAN or equal to the critical
   3563 value of U.
   3564 
   3565 Usage:   amannwhitneyu(x,y)     where x,y are arrays of values for 2 conditions
   3566 Returns: u-statistic, one-tailed p-value (i.e., p(z(U)))
   3567 """
   3568     n1 = len(x)
   3569     n2 = len(y)
   3570     ranked = rankdata(N.concatenate((x, y)))
   3571     rankx = ranked[0:n1]  # get the x-ranks
   3572     ranky = ranked[n1:]  # the rest are y-ranks
   3573     u1 = n1 * n2 + (n1 * (n1 + 1)) / 2.0 - sum(rankx)  # calc U for x
   3574     u2 = n1 * n2 - u1  # remainder is U for y
   3575     bigu = max(u1, u2)
   3576     smallu = min(u1, u2)
   3577     proportion = bigu / float(n1 * n2)
   3578     T = math.sqrt(tiecorrect(ranked))  # correction factor for tied scores
   3579     if T == 0:
   3580       raise ValueError, 'All numbers are identical in amannwhitneyu'
   3581     sd = math.sqrt(T * n1 * n2 * (n1 + n2 + 1) / 12.0)
   3582     z = abs((bigu - n1 * n2 / 2.0) / sd)  # normal approximation for prob calc
   3583     return smallu, 1.0 - azprob(z), proportion
   3584 
   3585   def atiecorrect(rankvals):
   3586     """
   3587 Tie-corrector for ties in Mann Whitney U and Kruskal Wallis H tests.
   3588 See Siegel, S. (1956) Nonparametric Statistics for the Behavioral
   3589 Sciences.  New York: McGraw-Hill.  Code adapted from |Stat rankind.c
   3590 code.
   3591 
   3592 Usage:   atiecorrect(rankvals)
   3593 Returns: T correction factor for U or H
   3594 """
   3595     sorted, posn = ashellsort(N.array(rankvals))
   3596     n = len(sorted)
   3597     T = 0.0
   3598     i = 0
   3599     while (i < n - 1):
   3600       if sorted[i] == sorted[i + 1]:
   3601         nties = 1
   3602         while (i < n - 1) and (sorted[i] == sorted[i + 1]):
   3603           nties = nties + 1
   3604           i = i + 1
   3605         T = T + nties**3 - nties
   3606       i = i + 1
   3607     T = T / float(n**3 - n)
   3608     return 1.0 - T
   3609 
   3610   def aranksums(x, y):
   3611     """
   3612 Calculates the rank sums statistic on the provided scores and returns
   3613 the result.
   3614 
   3615 Usage:   aranksums(x,y)     where x,y are arrays of values for 2 conditions
   3616 Returns: z-statistic, two-tailed p-value
   3617 """
   3618     n1 = len(x)
   3619     n2 = len(y)
   3620     alldata = N.concatenate((x, y))
   3621     ranked = arankdata(alldata)
   3622     x = ranked[:n1]
   3623     y = ranked[n1:]
   3624     s = sum(x)
   3625     expected = n1 * (n1 + n2 + 1) / 2.0
   3626     z = (s - expected) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
   3627     prob = 2 * (1.0 - azprob(abs(z)))
   3628     return z, prob
   3629 
   3630   def awilcoxont(x, y):
   3631     """
   3632 Calculates the Wilcoxon T-test for related samples and returns the
   3633 result.  A non-parametric T-test.
   3634 
   3635 Usage:   awilcoxont(x,y)     where x,y are equal-length arrays for 2 conditions
   3636 Returns: t-statistic, two-tailed p-value
   3637 """
   3638     if len(x) <> len(y):
   3639       raise ValueError, 'Unequal N in awilcoxont.  Aborting.'
   3640     d = x - y
   3641     d = N.compress(N.not_equal(d, 0), d)  # Keep all non-zero differences
   3642     count = len(d)
   3643     absd = abs(d)
   3644     absranked = arankdata(absd)
   3645     r_plus = 0.0
   3646     r_minus = 0.0
   3647     for i in range(len(absd)):
   3648       if d[i] < 0:
   3649         r_minus = r_minus + absranked[i]
   3650       else:
   3651         r_plus = r_plus + absranked[i]
   3652     wt = min(r_plus, r_minus)
   3653     mn = count * (count + 1) * 0.25
   3654     se = math.sqrt(count * (count + 1) * (2.0 * count + 1.0) / 24.0)
   3655     z = math.fabs(wt - mn) / se
   3656     z = math.fabs(wt - mn) / se
   3657     prob = 2 * (1.0 - zprob(abs(z)))
   3658     return wt, prob
   3659 
   3660   def akruskalwallish(*args):
   3661     """
   3662 The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more
   3663 groups, requiring at least 5 subjects in each group.  This function
   3664 calculates the Kruskal-Wallis H and associated p-value for 3 or more
   3665 independent samples.
   3666 
   3667 Usage:   akruskalwallish(*args)     args are separate arrays for 3+ conditions
   3668 Returns: H-statistic (corrected for ties), associated p-value
   3669 """
   3670     assert len(args) == 3, 'Need at least 3 groups in stats.akruskalwallish()'
   3671     args = list(args)
   3672     n = [0] * len(args)
   3673     n = map(len, args)
   3674     all = []
   3675     for i in range(len(args)):
   3676       all = all + args[i].tolist()
   3677     ranked = rankdata(all)
   3678     T = tiecorrect(ranked)
   3679     for i in range(len(args)):
   3680       args[i] = ranked[0:n[i]]
   3681       del ranked[0:n[i]]
   3682     rsums = []
   3683     for i in range(len(args)):
   3684       rsums.append(sum(args[i])**2)
   3685       rsums[i] = rsums[i] / float(n[i])
   3686     ssbn = sum(rsums)
   3687     totaln = sum(n)
   3688     h = 12.0 / (totaln * (totaln + 1)) * ssbn - 3 * (totaln + 1)
   3689     df = len(args) - 1
   3690     if T == 0:
   3691       raise ValueError, 'All numbers are identical in akruskalwallish'
   3692     h = h / float(T)
   3693     return h, chisqprob(h, df)
   3694 
   3695   def afriedmanchisquare(*args):
   3696     """
   3697 Friedman Chi-Square is a non-parametric, one-way within-subjects
   3698 ANOVA.  This function calculates the Friedman Chi-square test for
   3699 repeated measures and returns the result, along with the associated
   3700 probability value.  It assumes 3 or more repeated measures.  Only 3
   3701 levels requires a minimum of 10 subjects in the study.  Four levels
   3702 requires 5 subjects per level(??).
   3703 
   3704 Usage:   afriedmanchisquare(*args)   args are separate arrays for 2+ conditions
   3705 Returns: chi-square statistic, associated p-value
   3706 """
   3707     k = len(args)
   3708     if k < 3:
   3709       raise ValueError, ('\nLess than 3 levels.  Friedman test not '
   3710                          'appropriate.\n')
   3711     n = len(args[0])
   3712     data = apply(pstat.aabut, args)
   3713     data = data.astype(N.float_)
   3714     for i in range(len(data)):
   3715       data[i] = arankdata(data[i])
   3716     ssbn = asum(asum(args, 1)**2)
   3717     chisq = 12.0 / (k * n * (k + 1)) * ssbn - 3 * n * (k + 1)
   3718     return chisq, achisqprob(chisq, k - 1)
   3719 
   3720 #####################################
   3721 ####  APROBABILITY CALCULATIONS  ####
   3722 #####################################
   3723 
   3724   def achisqprob(chisq, df):
   3725     """
   3726 Returns the (1-tail) probability value associated with the provided chi-square
   3727 value and df.  Heavily modified from chisq.c in Gary Perlman's |Stat.  Can
   3728 handle multiple dimensions.
   3729 
   3730 Usage:   achisqprob(chisq,df)    chisq=chisquare stat., df=degrees of freedom
   3731 """
   3732     BIG = 200.0
   3733 
   3734     def ex(x):
   3735       BIG = 200.0
   3736       exponents = N.where(N.less(x, -BIG), -BIG, x)
   3737       return N.exp(exponents)
   3738 
   3739     if type(chisq) == N.ndarray:
   3740       arrayflag = 1
   3741     else:
   3742       arrayflag = 0
   3743       chisq = N.array([chisq])
   3744     if df < 1:
   3745       return N.ones(chisq.shape, N.float)
   3746     probs = N.zeros(chisq.shape, N.float_)
   3747     probs = N.where(
   3748         N.less_equal(chisq, 0), 1.0, probs)  # set prob=1 for chisq<0
   3749     a = 0.5 * chisq
   3750     if df > 1:
   3751       y = ex(-a)
   3752     if df % 2 == 0:
   3753       even = 1
   3754       s = y * 1
   3755       s2 = s * 1
   3756     else:
   3757       even = 0
   3758       s = 2.0 * azprob(-N.sqrt(chisq))
   3759       s2 = s * 1
   3760     if (df > 2):
   3761       chisq = 0.5 * (df - 1.0)
   3762       if even:
   3763         z = N.ones(probs.shape, N.float_)
   3764       else:
   3765         z = 0.5 * N.ones(probs.shape, N.float_)
   3766       if even:
   3767         e = N.zeros(probs.shape, N.float_)
   3768       else:
   3769         e = N.log(N.sqrt(N.pi)) * N.ones(probs.shape, N.float_)
   3770       c = N.log(a)
   3771       mask = N.zeros(probs.shape)
   3772       a_big = N.greater(a, BIG)
   3773       a_big_frozen = -1 * N.ones(probs.shape, N.float_)
   3774       totalelements = N.multiply.reduce(N.array(probs.shape))
   3775       while asum(mask) <> totalelements:
   3776         e = N.log(z) + e
   3777         s = s + ex(c * z - a - e)
   3778         z = z + 1.0
   3779         #            print z, e, s
   3780         newmask = N.greater(z, chisq)
   3781         a_big_frozen = N.where(newmask * N.equal(mask, 0) * a_big, s,
   3782                                a_big_frozen)
   3783         mask = N.clip(newmask + mask, 0, 1)
   3784       if even:
   3785         z = N.ones(probs.shape, N.float_)
   3786         e = N.ones(probs.shape, N.float_)
   3787       else:
   3788         z = 0.5 * N.ones(probs.shape, N.float_)
   3789         e = 1.0 / N.sqrt(N.pi) / N.sqrt(a) * N.ones(probs.shape, N.float_)
   3790       c = 0.0
   3791       mask = N.zeros(probs.shape)
   3792       a_notbig_frozen = -1 * N.ones(probs.shape, N.float_)
   3793       while asum(mask) <> totalelements:
   3794         e = e * (a / z.astype(N.float_))
   3795         c = c + e
   3796         z = z + 1.0
   3797         #            print '#2', z, e, c, s, c*y+s2
   3798         newmask = N.greater(z, chisq)
   3799         a_notbig_frozen = N.where(newmask * N.equal(mask, 0) * (1 - a_big),
   3800                                   c * y + s2, a_notbig_frozen)
   3801         mask = N.clip(newmask + mask, 0, 1)
   3802       probs = N.where(
   3803           N.equal(probs, 1), 1, N.where(
   3804               N.greater(a, BIG), a_big_frozen, a_notbig_frozen))
   3805       return probs
   3806     else:
   3807       return s
   3808 
   3809   def aerfcc(x):
   3810     """
   3811 Returns the complementary error function erfc(x) with fractional error
   3812 everywhere less than 1.2e-7.  Adapted from Numerical Recipies.  Can
   3813 handle multiple dimensions.
   3814 
   3815 Usage:   aerfcc(x)
   3816 """
   3817     z = abs(x)
   3818     t = 1.0 / (1.0 + 0.5 * z)
   3819     ans = t * N.exp(-z * z - 1.26551223 + t * (1.00002368 + t * (
   3820         0.37409196 + t * (0.09678418 + t * (-0.18628806 + t * (
   3821             0.27886807 + t * (-1.13520398 + t * (1.48851587 + t * (
   3822                 -0.82215223 + t * 0.17087277)))))))))
   3823     return N.where(N.greater_equal(x, 0), ans, 2.0 - ans)
   3824 
   3825   def azprob(z):
   3826     """
   3827 Returns the area under the normal curve 'to the left of' the given z value.
   3828 Thus,
   3829     for z<0, zprob(z) = 1-tail probability
   3830     for z>0, 1.0-zprob(z) = 1-tail probability
   3831     for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability
   3832 Adapted from z.c in Gary Perlman's |Stat.  Can handle multiple dimensions.
   3833 
   3834 Usage:   azprob(z)    where z is a z-value
   3835 """
   3836 
   3837     def yfunc(y):
   3838       x = (((((((
   3839           ((((((-0.000045255659 * y + 0.000152529290) * y - 0.000019538132) * y
   3840               - 0.000676904986) * y + 0.001390604284) * y - 0.000794620820) * y
   3841            - 0.002034254874) * y + 0.006549791214) * y - 0.010557625006) * y +
   3842                0.011630447319) * y - 0.009279453341) * y + 0.005353579108) * y -
   3843             0.002141268741) * y + 0.000535310849) * y + 0.999936657524
   3844       return x
   3845 
   3846     def wfunc(w):
   3847       x = ((((((((0.000124818987 * w - 0.001075204047) * w + 0.005198775019) * w
   3848                 - 0.019198292004) * w + 0.059054035642) * w - 0.151968751364) *
   3849              w + 0.319152932694) * w - 0.531923007300) * w +
   3850            0.797884560593) * N.sqrt(w) * 2.0
   3851       return x
   3852 
   3853     Z_MAX = 6.0  # maximum meaningful z-value
   3854     x = N.zeros(z.shape, N.float_)  # initialize
   3855     y = 0.5 * N.fabs(z)
   3856     x = N.where(N.less(y, 1.0), wfunc(y * y), yfunc(y - 2.0))  # get x's
   3857     x = N.where(N.greater(y, Z_MAX * 0.5), 1.0, x)  # kill those with big Z
   3858     prob = N.where(N.greater(z, 0), (x + 1) * 0.5, (1 - x) * 0.5)
   3859     return prob
   3860 
   3861   def aksprob(alam):
   3862     """
   3863 Returns the probability value for a K-S statistic computed via ks_2samp.
   3864 Adapted from Numerical Recipies.  Can handle multiple dimensions.
   3865 
   3866 Usage:   aksprob(alam)
   3867 """
   3868     if type(alam) == N.ndarray:
   3869       frozen = -1 * N.ones(alam.shape, N.float64)
   3870       alam = alam.astype(N.float64)
   3871       arrayflag = 1
   3872     else:
   3873       frozen = N.array(-1.)
   3874       alam = N.array(alam, N.float64)
   3875       arrayflag = 1
   3876     mask = N.zeros(alam.shape)
   3877     fac = 2.0 * N.ones(alam.shape, N.float_)
   3878     sum = N.zeros(alam.shape, N.float_)
   3879     termbf = N.zeros(alam.shape, N.float_)
   3880     a2 = N.array(-2.0 * alam * alam, N.float64)
   3881     totalelements = N.multiply.reduce(N.array(mask.shape))
   3882     for j in range(1, 201):
   3883       if asum(mask) == totalelements:
   3884         break
   3885       exponents = (a2 * j * j)
   3886       overflowmask = N.less(exponents, -746)
   3887       frozen = N.where(overflowmask, 0, frozen)
   3888       mask = mask + overflowmask
   3889       term = fac * N.exp(exponents)
   3890       sum = sum + term
   3891       newmask = N.where(
   3892           N.less_equal(
   3893               abs(term), (0.001 * termbf)) + N.less(
   3894                   abs(term), 1.0e-8 * sum), 1, 0)
   3895       frozen = N.where(newmask * N.equal(mask, 0), sum, frozen)
   3896       mask = N.clip(mask + newmask, 0, 1)
   3897       fac = -fac
   3898       termbf = abs(term)
   3899     if arrayflag:
   3900       return N.where(
   3901           N.equal(frozen, -1), 1.0, frozen)  # 1.0 if doesn't converge
   3902     else:
   3903       return N.where(
   3904           N.equal(frozen, -1), 1.0, frozen)[0]  # 1.0 if doesn't converge
   3905 
   3906   def afprob(dfnum, dfden, F):
   3907     """
   3908 Returns the 1-tailed significance level (p-value) of an F statistic
   3909 given the degrees of freedom for the numerator (dfR-dfF) and the degrees
   3910 of freedom for the denominator (dfF).  Can handle multiple dims for F.
   3911 
   3912 Usage:   afprob(dfnum, dfden, F)   where usually dfnum=dfbn, dfden=dfwn
   3913 """
   3914     if type(F) == N.ndarray:
   3915       return abetai(0.5 * dfden, 0.5 * dfnum, dfden / (1.0 * dfden + dfnum * F))
   3916     else:
   3917       return abetai(0.5 * dfden, 0.5 * dfnum, dfden / float(dfden + dfnum * F))
   3918 
   3919   def abetacf(a, b, x, verbose=1):
   3920     """
   3921 Evaluates the continued fraction form of the incomplete Beta function,
   3922 betai.  (Adapted from: Numerical Recipies in C.)  Can handle multiple
   3923 dimensions for x.
   3924 
   3925 Usage:   abetacf(a,b,x,verbose=1)
   3926 """
   3927     ITMAX = 200
   3928     EPS = 3.0e-7
   3929 
   3930     arrayflag = 1
   3931     if type(x) == N.ndarray:
   3932       frozen = N.ones(x.shape,
   3933                       N.float_) * -1  #start out w/ -1s, should replace all
   3934     else:
   3935       arrayflag = 0
   3936       frozen = N.array([-1])
   3937       x = N.array([x])
   3938     mask = N.zeros(x.shape)
   3939     bm = az = am = 1.0
   3940     qab = a + b
   3941     qap = a + 1.0
   3942     qam = a - 1.0
   3943     bz = 1.0 - qab * x / qap
   3944     for i in range(ITMAX + 1):
   3945       if N.sum(N.ravel(N.equal(frozen, -1))) == 0:
   3946         break
   3947       em = float(i + 1)
   3948       tem = em + em
   3949       d = em * (b - em) * x / ((qam + tem) * (a + tem))
   3950       ap = az + d * am
   3951       bp = bz + d * bm
   3952       d = -(a + em) * (qab + em) * x / ((qap + tem) * (a + tem))
   3953       app = ap + d * az
   3954       bpp = bp + d * bz
   3955       aold = az * 1
   3956       am = ap / bpp
   3957       bm = bp / bpp
   3958       az = app / bpp
   3959       bz = 1.0
   3960       newmask = N.less(abs(az - aold), EPS * abs(az))
   3961       frozen = N.where(newmask * N.equal(mask, 0), az, frozen)
   3962       mask = N.clip(mask + newmask, 0, 1)
   3963     noconverge = asum(N.equal(frozen, -1))
   3964     if noconverge <> 0 and verbose:
   3965       print 'a or b too big, or ITMAX too small in Betacf for ', noconverge, ' elements'
   3966     if arrayflag:
   3967       return frozen
   3968     else:
   3969       return frozen[0]
   3970 
   3971   def agammln(xx):
   3972     """
   3973 Returns the gamma function of xx.
   3974     Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt.
   3975 Adapted from: Numerical Recipies in C.  Can handle multiple dims ... but
   3976 probably doesn't normally have to.
   3977 
   3978 Usage:   agammln(xx)
   3979 """
   3980     coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516,
   3981              0.120858003e-2, -0.536382e-5]
   3982     x = xx - 1.0
   3983     tmp = x + 5.5
   3984     tmp = tmp - (x + 0.5) * N.log(tmp)
   3985     ser = 1.0
   3986     for j in range(len(coeff)):
   3987       x = x + 1
   3988       ser = ser + coeff[j] / x
   3989     return -tmp + N.log(2.50662827465 * ser)
   3990 
   3991   def abetai(a, b, x, verbose=1):
   3992     """
   3993 Returns the incomplete beta function:
   3994 
   3995     I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt)
   3996 
   3997 where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma
   3998 function of a.  The continued fraction formulation is implemented
   3999 here, using the betacf function.  (Adapted from: Numerical Recipies in
   4000 C.)  Can handle multiple dimensions.
   4001 
   4002 Usage:   abetai(a,b,x,verbose=1)
   4003 """
   4004     TINY = 1e-15
   4005     if type(a) == N.ndarray:
   4006       if asum(N.less(x, 0) + N.greater(x, 1)) <> 0:
   4007         raise ValueError, 'Bad x in abetai'
   4008     x = N.where(N.equal(x, 0), TINY, x)
   4009     x = N.where(N.equal(x, 1.0), 1 - TINY, x)
   4010 
   4011     bt = N.where(N.equal(x, 0) + N.equal(x, 1), 0, -1)
   4012     exponents = (gammln(a + b) - gammln(a) - gammln(b) + a * N.log(x) + b *
   4013                  N.log(1.0 - x))
   4014     # 746 (below) is the MAX POSSIBLE BEFORE OVERFLOW
   4015     exponents = N.where(N.less(exponents, -740), -740, exponents)
   4016     bt = N.exp(exponents)
   4017     if type(x) == N.ndarray:
   4018       ans = N.where(
   4019           N.less(x, (a + 1) / (a + b + 2.0)), bt * abetacf(a, b, x, verbose) /
   4020           float(a), 1.0 - bt * abetacf(b, a, 1.0 - x, verbose) / float(b))
   4021     else:
   4022       if x < (a + 1) / (a + b + 2.0):
   4023         ans = bt * abetacf(a, b, x, verbose) / float(a)
   4024       else:
   4025         ans = 1.0 - bt * abetacf(b, a, 1.0 - x, verbose) / float(b)
   4026     return ans
   4027 
   4028 #####################################
   4029 #######  AANOVA CALCULATIONS  #######
   4030 #####################################
   4031 
   4032   import numpy.linalg, operator
   4033   LA = numpy.linalg
   4034 
   4035   def aglm(data, para):
   4036     """
   4037 Calculates a linear model fit ... anova/ancova/lin-regress/t-test/etc. Taken
   4038 from:
   4039     Peterson et al. Statistical limitations in functional neuroimaging
   4040     I. Non-inferential methods and statistical models.  Phil Trans Royal Soc
   4041     Lond B 354: 1239-1260.
   4042 
   4043 Usage:   aglm(data,para)
   4044 Returns: statistic, p-value ???
   4045 """
   4046     if len(para) <> len(data):
   4047       print 'data and para must be same length in aglm'
   4048       return
   4049     n = len(para)
   4050     p = pstat.aunique(para)
   4051     x = N.zeros((n, len(p)))  # design matrix
   4052     for l in range(len(p)):
   4053       x[:, l] = N.equal(para, p[l])
   4054     b = N.dot(
   4055         N.dot(
   4056             LA.inv(N.dot(
   4057                 N.transpose(x), x)),  # i.e., b=inv(X'X)X'Y
   4058             N.transpose(x)),
   4059         data)
   4060     diffs = (data - N.dot(x, b))
   4061     s_sq = 1. / (n - len(p)) * N.dot(N.transpose(diffs), diffs)
   4062 
   4063     if len(p) == 2:  # ttest_ind
   4064       c = N.array([1, -1])
   4065       df = n - 2
   4066       fact = asum(1.0 / asum(x, 0))  # i.e., 1/n1 + 1/n2 + 1/n3 ...
   4067       t = N.dot(c, b) / N.sqrt(s_sq * fact)
   4068       probs = abetai(0.5 * df, 0.5, float(df) / (df + t * t))
   4069       return t, probs
   4070 
   4071   def aF_oneway(*args):
   4072     """
   4073 Performs a 1-way ANOVA, returning an F-value and probability given
   4074 any number of groups.  From Heiman, pp.394-7.
   4075 
   4076 Usage:   aF_oneway (*args)    where *args is 2 or more arrays, one per
   4077                                   treatment group
   4078 Returns: f-value, probability
   4079 """
   4080     na = len(args)  # ANOVA on 'na' groups, each in it's own array
   4081     means = [0] * na
   4082     vars = [0] * na
   4083     ns = [0] * na
   4084     alldata = []
   4085     tmp = map(N.array, args)
   4086     means = map(amean, tmp)
   4087     vars = map(avar, tmp)
   4088     ns = map(len, args)
   4089     alldata = N.concatenate(args)
   4090     bign = len(alldata)
   4091     sstot = ass(alldata) - (asquare_of_sums(alldata) / float(bign))
   4092     ssbn = 0
   4093     for a in args:
   4094       ssbn = ssbn + asquare_of_sums(N.array(a)) / float(len(a))
   4095     ssbn = ssbn - (asquare_of_sums(alldata) / float(bign))
   4096     sswn = sstot - ssbn
   4097     dfbn = na - 1
   4098     dfwn = bign - na
   4099     msb = ssbn / float(dfbn)
   4100     msw = sswn / float(dfwn)
   4101     f = msb / msw
   4102     prob = fprob(dfbn, dfwn, f)
   4103     return f, prob
   4104 
   4105   def aF_value(ER, EF, dfR, dfF):
   4106     """
   4107 Returns an F-statistic given the following:
   4108         ER  = error associated with the null hypothesis (the Restricted model)
   4109         EF  = error associated with the alternate hypothesis (the Full model)
   4110         dfR = degrees of freedom the Restricted model
   4111         dfF = degrees of freedom associated with the Restricted model
   4112 """
   4113     return ((ER - EF) / float(dfR - dfF) / (EF / float(dfF)))
   4114 
   4115   def outputfstats(Enum, Eden, dfnum, dfden, f, prob):
   4116     Enum = round(Enum, 3)
   4117     Eden = round(Eden, 3)
   4118     dfnum = round(Enum, 3)
   4119     dfden = round(dfden, 3)
   4120     f = round(f, 3)
   4121     prob = round(prob, 3)
   4122     suffix = ''  # for *s after the p-value
   4123     if prob < 0.001:
   4124       suffix = '  ***'
   4125     elif prob < 0.01:
   4126       suffix = '  **'
   4127     elif prob < 0.05:
   4128       suffix = '  *'
   4129     title = [['EF/ER', 'DF', 'Mean Square', 'F-value', 'prob', '']]
   4130     lofl = title + [[Enum, dfnum, round(Enum / float(dfnum), 3), f, prob, suffix
   4131                     ], [Eden, dfden, round(Eden / float(dfden), 3), '', '', '']]
   4132     pstat.printcc(lofl)
   4133     return
   4134 
   4135   def F_value_multivariate(ER, EF, dfnum, dfden):
   4136     """
   4137 Returns an F-statistic given the following:
   4138         ER  = error associated with the null hypothesis (the Restricted model)
   4139         EF  = error associated with the alternate hypothesis (the Full model)
   4140         dfR = degrees of freedom the Restricted model
   4141         dfF = degrees of freedom associated with the Restricted model
   4142 where ER and EF are matrices from a multivariate F calculation.
   4143 """
   4144     if type(ER) in [IntType, FloatType]:
   4145       ER = N.array([[ER]])
   4146     if type(EF) in [IntType, FloatType]:
   4147       EF = N.array([[EF]])
   4148     n_um = (LA.det(ER) - LA.det(EF)) / float(dfnum)
   4149     d_en = LA.det(EF) / float(dfden)
   4150     return n_um / d_en
   4151 
   4152 #####################################
   4153 #######  ASUPPORT FUNCTIONS  ########
   4154 #####################################
   4155 
   4156   def asign(a):
   4157     """
   4158 Usage:   asign(a)
   4159 Returns: array shape of a, with -1 where a<0 and +1 where a>=0
   4160 """
   4161     a = N.asarray(a)
   4162     if ((type(a) == type(1.4)) or (type(a) == type(1))):
   4163       return a - a - N.less(a, 0) + N.greater(a, 0)
   4164     else:
   4165       return N.zeros(N.shape(a)) - N.less(a, 0) + N.greater(a, 0)
   4166 
   4167   def asum(a, dimension=None, keepdims=0):
   4168     """
   4169 An alternative to the Numeric.add.reduce function, which allows one to
   4170 (1) collapse over multiple dimensions at once, and/or (2) to retain
   4171 all dimensions in the original array (squashing one down to size.
   4172 Dimension can equal None (ravel array first), an integer (the
   4173 dimension over which to operate), or a sequence (operate over multiple
   4174 dimensions).  If keepdims=1, the resulting array will have as many
   4175 dimensions as the input array.
   4176 
   4177 Usage:   asum(a, dimension=None, keepdims=0)
   4178 Returns: array summed along 'dimension'(s), same _number_ of dims if keepdims=1
   4179 """
   4180     if type(a) == N.ndarray and a.dtype in [N.int_, N.short, N.ubyte]:
   4181       a = a.astype(N.float_)
   4182     if dimension == None:
   4183       s = N.sum(N.ravel(a))
   4184     elif type(dimension) in [IntType, FloatType]:
   4185       s = N.add.reduce(a, dimension)
   4186       if keepdims == 1:
   4187         shp = list(a.shape)
   4188         shp[dimension] = 1
   4189         s = N.reshape(s, shp)
   4190     else:  # must be a SEQUENCE of dims to sum over
   4191       dims = list(dimension)
   4192       dims.sort()
   4193       dims.reverse()
   4194       s = a * 1.0
   4195       for dim in dims:
   4196         s = N.add.reduce(s, dim)
   4197       if keepdims == 1:
   4198         shp = list(a.shape)
   4199         for dim in dims:
   4200           shp[dim] = 1
   4201         s = N.reshape(s, shp)
   4202     return s
   4203 
   4204   def acumsum(a, dimension=None):
   4205     """
   4206 Returns an array consisting of the cumulative sum of the items in the
   4207 passed array.  Dimension can equal None (ravel array first), an
   4208 integer (the dimension over which to operate), or a sequence (operate
   4209 over multiple dimensions, but this last one just barely makes sense).
   4210 
   4211 Usage:   acumsum(a,dimension=None)
   4212 """
   4213     if dimension == None:
   4214       a = N.ravel(a)
   4215       dimension = 0
   4216     if type(dimension) in [ListType, TupleType, N.ndarray]:
   4217       dimension = list(dimension)
   4218       dimension.sort()
   4219       dimension.reverse()
   4220       for d in dimension:
   4221         a = N.add.accumulate(a, d)
   4222       return a
   4223     else:
   4224       return N.add.accumulate(a, dimension)
   4225 
   4226   def ass(inarray, dimension=None, keepdims=0):
   4227     """
   4228 Squares each value in the passed array, adds these squares & returns
   4229 the result.  Unfortunate function name. :-) Defaults to ALL values in
   4230 the array.  Dimension can equal None (ravel array first), an integer
   4231 (the dimension over which to operate), or a sequence (operate over
   4232 multiple dimensions).  Set keepdims=1 to maintain the original number
   4233 of dimensions.
   4234 
   4235 Usage:   ass(inarray, dimension=None, keepdims=0)
   4236 Returns: sum-along-'dimension' for (inarray*inarray)
   4237 """
   4238     if dimension == None:
   4239       inarray = N.ravel(inarray)
   4240       dimension = 0
   4241     return asum(inarray * inarray, dimension, keepdims)
   4242 
   4243   def asummult(array1, array2, dimension=None, keepdims=0):
   4244     """
   4245 Multiplies elements in array1 and array2, element by element, and
   4246 returns the sum (along 'dimension') of all resulting multiplications.
   4247 Dimension can equal None (ravel array first), an integer (the
   4248 dimension over which to operate), or a sequence (operate over multiple
   4249 dimensions).  A trivial function, but included for completeness.
   4250 
   4251 Usage:   asummult(array1,array2,dimension=None,keepdims=0)
   4252 """
   4253     if dimension == None:
   4254       array1 = N.ravel(array1)
   4255       array2 = N.ravel(array2)
   4256       dimension = 0
   4257     return asum(array1 * array2, dimension, keepdims)
   4258 
   4259   def asquare_of_sums(inarray, dimension=None, keepdims=0):
   4260     """
   4261 Adds the values in the passed array, squares that sum, and returns the
   4262 result.  Dimension can equal None (ravel array first), an integer (the
   4263 dimension over which to operate), or a sequence (operate over multiple
   4264 dimensions).  If keepdims=1, the returned array will have the same
   4265 NUMBER of dimensions as the original.
   4266 
   4267 Usage:   asquare_of_sums(inarray, dimension=None, keepdims=0)
   4268 Returns: the square of the sum over dim(s) in dimension
   4269 """
   4270     if dimension == None:
   4271       inarray = N.ravel(inarray)
   4272       dimension = 0
   4273     s = asum(inarray, dimension, keepdims)
   4274     if type(s) == N.ndarray:
   4275       return s.astype(N.float_) * s
   4276     else:
   4277       return float(s) * s
   4278 
   4279   def asumdiffsquared(a, b, dimension=None, keepdims=0):
   4280     """
   4281 Takes pairwise differences of the values in arrays a and b, squares
   4282 these differences, and returns the sum of these squares.  Dimension
   4283 can equal None (ravel array first), an integer (the dimension over
   4284 which to operate), or a sequence (operate over multiple dimensions).
   4285 keepdims=1 means the return shape = len(a.shape) = len(b.shape)
   4286 
   4287 Usage:   asumdiffsquared(a,b)
   4288 Returns: sum[ravel(a-b)**2]
   4289 """
   4290     if dimension == None:
   4291       inarray = N.ravel(a)
   4292       dimension = 0
   4293     return asum((a - b)**2, dimension, keepdims)
   4294 
   4295   def ashellsort(inarray):
   4296     """
   4297 Shellsort algorithm.  Sorts a 1D-array.
   4298 
   4299 Usage:   ashellsort(inarray)
   4300 Returns: sorted-inarray, sorting-index-vector (for original array)
   4301 """
   4302     n = len(inarray)
   4303     svec = inarray * 1.0
   4304     ivec = range(n)
   4305     gap = n / 2  # integer division needed
   4306     while gap > 0:
   4307       for i in range(gap, n):
   4308         for j in range(i - gap, -1, -gap):
   4309           while j >= 0 and svec[j] > svec[j + gap]:
   4310             temp = svec[j]
   4311             svec[j] = svec[j + gap]
   4312             svec[j + gap] = temp
   4313             itemp = ivec[j]
   4314             ivec[j] = ivec[j + gap]
   4315             ivec[j + gap] = itemp
   4316       gap = gap / 2  # integer division needed
   4317 #    svec is now sorted input vector, ivec has the order svec[i] = vec[ivec[i]]
   4318     return svec, ivec
   4319 
   4320   def arankdata(inarray):
   4321     """
   4322 Ranks the data in inarray, dealing with ties appropritely.  Assumes
   4323 a 1D inarray.  Adapted from Gary Perlman's |Stat ranksort.
   4324 
   4325 Usage:   arankdata(inarray)
   4326 Returns: array of length equal to inarray, containing rank scores
   4327 """
   4328     n = len(inarray)
   4329     svec, ivec = ashellsort(inarray)
   4330     sumranks = 0
   4331     dupcount = 0
   4332     newarray = N.zeros(n, N.float_)
   4333     for i in range(n):
   4334       sumranks = sumranks + i
   4335       dupcount = dupcount + 1
   4336       if i == n - 1 or svec[i] <> svec[i + 1]:
   4337         averank = sumranks / float(dupcount) + 1
   4338         for j in range(i - dupcount + 1, i + 1):
   4339           newarray[ivec[j]] = averank
   4340         sumranks = 0
   4341         dupcount = 0
   4342     return newarray
   4343 
   4344   def afindwithin(data):
   4345     """
   4346 Returns a binary vector, 1=within-subject factor, 0=between.  Input
   4347 equals the entire data array (i.e., column 1=random factor, last
   4348 column = measured values.
   4349 
   4350 Usage:   afindwithin(data)     data in |Stat format
   4351 """
   4352     numfact = len(data[0]) - 2
   4353     withinvec = [0] * numfact
   4354     for col in range(1, numfact + 1):
   4355       rows = pstat.linexand(data, col, pstat.unique(pstat.colex(data, 1))[0]
   4356                            )  # get 1 level of this factor
   4357       if len(pstat.unique(pstat.colex(rows, 0))) < len(
   4358           rows):  # if fewer subjects than scores on this factor
   4359         withinvec[col - 1] = 1
   4360     return withinvec
   4361 
   4362   #########################################################
   4363   #########################################################
   4364   ######  RE-DEFINE DISPATCHES TO INCLUDE ARRAYS  #########
   4365   #########################################################
   4366   #########################################################
   4367 
   4368   ## CENTRAL TENDENCY:
   4369   geometricmean = Dispatch(
   4370       (lgeometricmean, (ListType, TupleType)), (ageometricmean, (N.ndarray,)))
   4371   harmonicmean = Dispatch(
   4372       (lharmonicmean, (ListType, TupleType)), (aharmonicmean, (N.ndarray,)))
   4373   mean = Dispatch((lmean, (ListType, TupleType)), (amean, (N.ndarray,)))
   4374   median = Dispatch((lmedian, (ListType, TupleType)), (amedian, (N.ndarray,)))
   4375   medianscore = Dispatch(
   4376       (lmedianscore, (ListType, TupleType)), (amedianscore, (N.ndarray,)))
   4377   mode = Dispatch((lmode, (ListType, TupleType)), (amode, (N.ndarray,)))
   4378   tmean = Dispatch((atmean, (N.ndarray,)))
   4379   tvar = Dispatch((atvar, (N.ndarray,)))
   4380   tstdev = Dispatch((atstdev, (N.ndarray,)))
   4381   tsem = Dispatch((atsem, (N.ndarray,)))
   4382 
   4383   ## VARIATION:
   4384   moment = Dispatch((lmoment, (ListType, TupleType)), (amoment, (N.ndarray,)))
   4385   variation = Dispatch(
   4386       (lvariation, (ListType, TupleType)), (avariation, (N.ndarray,)))
   4387   skew = Dispatch((lskew, (ListType, TupleType)), (askew, (N.ndarray,)))
   4388   kurtosis = Dispatch(
   4389       (lkurtosis, (ListType, TupleType)), (akurtosis, (N.ndarray,)))
   4390   describe = Dispatch(
   4391       (ldescribe, (ListType, TupleType)), (adescribe, (N.ndarray,)))
   4392 
   4393   ## DISTRIBUTION TESTS
   4394 
   4395   skewtest = Dispatch(
   4396       (askewtest, (ListType, TupleType)), (askewtest, (N.ndarray,)))
   4397   kurtosistest = Dispatch(
   4398       (akurtosistest, (ListType, TupleType)), (akurtosistest, (N.ndarray,)))
   4399   normaltest = Dispatch(
   4400       (anormaltest, (ListType, TupleType)), (anormaltest, (N.ndarray,)))
   4401 
   4402   ## FREQUENCY STATS:
   4403   itemfreq = Dispatch(
   4404       (litemfreq, (ListType, TupleType)), (aitemfreq, (N.ndarray,)))
   4405   scoreatpercentile = Dispatch(
   4406       (lscoreatpercentile, (ListType, TupleType)), (ascoreatpercentile,
   4407                                                     (N.ndarray,)))
   4408   percentileofscore = Dispatch(
   4409       (lpercentileofscore, (ListType, TupleType)), (apercentileofscore,
   4410                                                     (N.ndarray,)))
   4411   histogram = Dispatch(
   4412       (lhistogram, (ListType, TupleType)), (ahistogram, (N.ndarray,)))
   4413   cumfreq = Dispatch(
   4414       (lcumfreq, (ListType, TupleType)), (acumfreq, (N.ndarray,)))
   4415   relfreq = Dispatch(
   4416       (lrelfreq, (ListType, TupleType)), (arelfreq, (N.ndarray,)))
   4417 
   4418   ## VARIABILITY:
   4419   obrientransform = Dispatch(
   4420       (lobrientransform, (ListType, TupleType)), (aobrientransform,
   4421                                                   (N.ndarray,)))
   4422   samplevar = Dispatch(
   4423       (lsamplevar, (ListType, TupleType)), (asamplevar, (N.ndarray,)))
   4424   samplestdev = Dispatch(
   4425       (lsamplestdev, (ListType, TupleType)), (asamplestdev, (N.ndarray,)))
   4426   signaltonoise = Dispatch((asignaltonoise, (N.ndarray,)),)
   4427   var = Dispatch((lvar, (ListType, TupleType)), (avar, (N.ndarray,)))
   4428   stdev = Dispatch((lstdev, (ListType, TupleType)), (astdev, (N.ndarray,)))
   4429   sterr = Dispatch((lsterr, (ListType, TupleType)), (asterr, (N.ndarray,)))
   4430   sem = Dispatch((lsem, (ListType, TupleType)), (asem, (N.ndarray,)))
   4431   z = Dispatch((lz, (ListType, TupleType)), (az, (N.ndarray,)))
   4432   zs = Dispatch((lzs, (ListType, TupleType)), (azs, (N.ndarray,)))
   4433 
   4434   ## TRIMMING FCNS:
   4435   threshold = Dispatch((athreshold, (N.ndarray,)),)
   4436   trimboth = Dispatch(
   4437       (ltrimboth, (ListType, TupleType)), (atrimboth, (N.ndarray,)))
   4438   trim1 = Dispatch((ltrim1, (ListType, TupleType)), (atrim1, (N.ndarray,)))
   4439 
   4440   ## CORRELATION FCNS:
   4441   paired = Dispatch((lpaired, (ListType, TupleType)), (apaired, (N.ndarray,)))
   4442   lincc = Dispatch((llincc, (ListType, TupleType)), (alincc, (N.ndarray,)))
   4443   pearsonr = Dispatch(
   4444       (lpearsonr, (ListType, TupleType)), (apearsonr, (N.ndarray,)))
   4445   spearmanr = Dispatch(
   4446       (lspearmanr, (ListType, TupleType)), (aspearmanr, (N.ndarray,)))
   4447   pointbiserialr = Dispatch(
   4448       (lpointbiserialr, (ListType, TupleType)), (apointbiserialr, (N.ndarray,)))
   4449   kendalltau = Dispatch(
   4450       (lkendalltau, (ListType, TupleType)), (akendalltau, (N.ndarray,)))
   4451   linregress = Dispatch(
   4452       (llinregress, (ListType, TupleType)), (alinregress, (N.ndarray,)))
   4453 
   4454   ## INFERENTIAL STATS:
   4455   ttest_1samp = Dispatch(
   4456       (lttest_1samp, (ListType, TupleType)), (attest_1samp, (N.ndarray,)))
   4457   ttest_ind = Dispatch(
   4458       (lttest_ind, (ListType, TupleType)), (attest_ind, (N.ndarray,)))
   4459   ttest_rel = Dispatch(
   4460       (lttest_rel, (ListType, TupleType)), (attest_rel, (N.ndarray,)))
   4461   chisquare = Dispatch(
   4462       (lchisquare, (ListType, TupleType)), (achisquare, (N.ndarray,)))
   4463   ks_2samp = Dispatch(
   4464       (lks_2samp, (ListType, TupleType)), (aks_2samp, (N.ndarray,)))
   4465   mannwhitneyu = Dispatch(
   4466       (lmannwhitneyu, (ListType, TupleType)), (amannwhitneyu, (N.ndarray,)))
   4467   tiecorrect = Dispatch(
   4468       (ltiecorrect, (ListType, TupleType)), (atiecorrect, (N.ndarray,)))
   4469   ranksums = Dispatch(
   4470       (lranksums, (ListType, TupleType)), (aranksums, (N.ndarray,)))
   4471   wilcoxont = Dispatch(
   4472       (lwilcoxont, (ListType, TupleType)), (awilcoxont, (N.ndarray,)))
   4473   kruskalwallish = Dispatch(
   4474       (lkruskalwallish, (ListType, TupleType)), (akruskalwallish, (N.ndarray,)))
   4475   friedmanchisquare = Dispatch(
   4476       (lfriedmanchisquare, (ListType, TupleType)), (afriedmanchisquare,
   4477                                                     (N.ndarray,)))
   4478 
   4479   ## PROBABILITY CALCS:
   4480   chisqprob = Dispatch(
   4481       (lchisqprob, (IntType, FloatType)), (achisqprob, (N.ndarray,)))
   4482   zprob = Dispatch((lzprob, (IntType, FloatType)), (azprob, (N.ndarray,)))
   4483   ksprob = Dispatch((lksprob, (IntType, FloatType)), (aksprob, (N.ndarray,)))
   4484   fprob = Dispatch((lfprob, (IntType, FloatType)), (afprob, (N.ndarray,)))
   4485   betacf = Dispatch((lbetacf, (IntType, FloatType)), (abetacf, (N.ndarray,)))
   4486   betai = Dispatch((lbetai, (IntType, FloatType)), (abetai, (N.ndarray,)))
   4487   erfcc = Dispatch((lerfcc, (IntType, FloatType)), (aerfcc, (N.ndarray,)))
   4488   gammln = Dispatch((lgammln, (IntType, FloatType)), (agammln, (N.ndarray,)))
   4489 
   4490   ## ANOVA FUNCTIONS:
   4491   F_oneway = Dispatch(
   4492       (lF_oneway, (ListType, TupleType)), (aF_oneway, (N.ndarray,)))
   4493   F_value = Dispatch(
   4494       (lF_value, (ListType, TupleType)), (aF_value, (N.ndarray,)))
   4495 
   4496   ## SUPPORT FUNCTIONS:
   4497   incr = Dispatch((lincr, (ListType, TupleType, N.ndarray)),)
   4498   sum = Dispatch((lsum, (ListType, TupleType)), (asum, (N.ndarray,)))
   4499   cumsum = Dispatch((lcumsum, (ListType, TupleType)), (acumsum, (N.ndarray,)))
   4500   ss = Dispatch((lss, (ListType, TupleType)), (ass, (N.ndarray,)))
   4501   summult = Dispatch(
   4502       (lsummult, (ListType, TupleType)), (asummult, (N.ndarray,)))
   4503   square_of_sums = Dispatch(
   4504       (lsquare_of_sums, (ListType, TupleType)), (asquare_of_sums, (N.ndarray,)))
   4505   sumdiffsquared = Dispatch(
   4506       (lsumdiffsquared, (ListType, TupleType)), (asumdiffsquared, (N.ndarray,)))
   4507   shellsort = Dispatch(
   4508       (lshellsort, (ListType, TupleType)), (ashellsort, (N.ndarray,)))
   4509   rankdata = Dispatch(
   4510       (lrankdata, (ListType, TupleType)), (arankdata, (N.ndarray,)))
   4511   findwithin = Dispatch(
   4512       (lfindwithin, (ListType, TupleType)), (afindwithin, (N.ndarray,)))
   4513 
   4514 ######################  END OF NUMERIC FUNCTION BLOCK  #####################
   4515 
   4516 ######################  END OF STATISTICAL FUNCTIONS  ######################
   4517 
   4518 except ImportError:
   4519   pass
   4520