Home | History | Annotate | Download | only in library
      1 :mod:`random` --- Generate pseudo-random numbers
      2 ================================================
      3 
      4 .. module:: random
      5    :synopsis: Generate pseudo-random numbers with various common distributions.
      6 
      7 **Source code:** :source:`Lib/random.py`
      8 
      9 --------------
     10 
     11 This module implements pseudo-random number generators for various
     12 distributions.
     13 
     14 For integers, there is uniform selection from a range. For sequences, there is
     15 uniform selection of a random element, a function to generate a random
     16 permutation of a list in-place, and a function for random sampling without
     17 replacement.
     18 
     19 On the real line, there are functions to compute uniform, normal (Gaussian),
     20 lognormal, negative exponential, gamma, and beta distributions. For generating
     21 distributions of angles, the von Mises distribution is available.
     22 
     23 Almost all module functions depend on the basic function :func:`.random`, which
     24 generates a random float uniformly in the semi-open range [0.0, 1.0).  Python
     25 uses the Mersenne Twister as the core generator.  It produces 53-bit precision
     26 floats and has a period of 2\*\*19937-1.  The underlying implementation in C is
     27 both fast and threadsafe.  The Mersenne Twister is one of the most extensively
     28 tested random number generators in existence.  However, being completely
     29 deterministic, it is not suitable for all purposes, and is completely unsuitable
     30 for cryptographic purposes.
     31 
     32 The functions supplied by this module are actually bound methods of a hidden
     33 instance of the :class:`random.Random` class.  You can instantiate your own
     34 instances of :class:`Random` to get generators that don't share state.
     35 
     36 Class :class:`Random` can also be subclassed if you want to use a different
     37 basic generator of your own devising: in that case, override the :meth:`~Random.random`,
     38 :meth:`~Random.seed`, :meth:`~Random.getstate`, and :meth:`~Random.setstate` methods.
     39 Optionally, a new generator can supply a :meth:`~Random.getrandbits` method --- this
     40 allows :meth:`randrange` to produce selections over an arbitrarily large range.
     41 
     42 The :mod:`random` module also provides the :class:`SystemRandom` class which
     43 uses the system function :func:`os.urandom` to generate random numbers
     44 from sources provided by the operating system.
     45 
     46 .. warning::
     47 
     48    The pseudo-random generators of this module should not be used for
     49    security purposes.  For security or cryptographic uses, see the
     50    :mod:`secrets` module.
     51 
     52 .. seealso::
     53 
     54    M. Matsumoto and T. Nishimura, "Mersenne Twister: A 623-dimensionally
     55    equidistributed uniform pseudorandom number generator", ACM Transactions on
     56    Modeling and Computer Simulation Vol. 8, No. 1, January pp.3--30 1998.
     57 
     58 
     59    `Complementary-Multiply-with-Carry recipe
     60    <https://code.activestate.com/recipes/576707/>`_ for a compatible alternative
     61    random number generator with a long period and comparatively simple update
     62    operations.
     63 
     64 
     65 Bookkeeping functions
     66 ---------------------
     67 
     68 .. function:: seed(a=None, version=2)
     69 
     70    Initialize the random number generator.
     71 
     72    If *a* is omitted or ``None``, the current system time is used.  If
     73    randomness sources are provided by the operating system, they are used
     74    instead of the system time (see the :func:`os.urandom` function for details
     75    on availability).
     76 
     77    If *a* is an int, it is used directly.
     78 
     79    With version 2 (the default), a :class:`str`, :class:`bytes`, or :class:`bytearray`
     80    object gets converted to an :class:`int` and all of its bits are used.
     81 
     82    With version 1 (provided for reproducing random sequences from older versions
     83    of Python), the algorithm for :class:`str` and :class:`bytes` generates a
     84    narrower range of seeds.
     85 
     86    .. versionchanged:: 3.2
     87       Moved to the version 2 scheme which uses all of the bits in a string seed.
     88 
     89 .. function:: getstate()
     90 
     91    Return an object capturing the current internal state of the generator.  This
     92    object can be passed to :func:`setstate` to restore the state.
     93 
     94 
     95 .. function:: setstate(state)
     96 
     97    *state* should have been obtained from a previous call to :func:`getstate`, and
     98    :func:`setstate` restores the internal state of the generator to what it was at
     99    the time :func:`getstate` was called.
    100 
    101 
    102 .. function:: getrandbits(k)
    103 
    104    Returns a Python integer with *k* random bits. This method is supplied with
    105    the MersenneTwister generator and some other generators may also provide it
    106    as an optional part of the API. When available, :meth:`getrandbits` enables
    107    :meth:`randrange` to handle arbitrarily large ranges.
    108 
    109 
    110 Functions for integers
    111 ----------------------
    112 
    113 .. function:: randrange(stop)
    114               randrange(start, stop[, step])
    115 
    116    Return a randomly selected element from ``range(start, stop, step)``.  This is
    117    equivalent to ``choice(range(start, stop, step))``, but doesn't actually build a
    118    range object.
    119 
    120    The positional argument pattern matches that of :func:`range`.  Keyword arguments
    121    should not be used because the function may use them in unexpected ways.
    122 
    123    .. versionchanged:: 3.2
    124       :meth:`randrange` is more sophisticated about producing equally distributed
    125       values.  Formerly it used a style like ``int(random()*n)`` which could produce
    126       slightly uneven distributions.
    127 
    128 .. function:: randint(a, b)
    129 
    130    Return a random integer *N* such that ``a <= N <= b``.  Alias for
    131    ``randrange(a, b+1)``.
    132 
    133 
    134 Functions for sequences
    135 -----------------------
    136 
    137 .. function:: choice(seq)
    138 
    139    Return a random element from the non-empty sequence *seq*. If *seq* is empty,
    140    raises :exc:`IndexError`.
    141 
    142 .. function:: choices(population, weights=None, *, cum_weights=None, k=1)
    143 
    144    Return a *k* sized list of elements chosen from the *population* with replacement.
    145    If the *population* is empty, raises :exc:`IndexError`.
    146 
    147    If a *weights* sequence is specified, selections are made according to the
    148    relative weights.  Alternatively, if a *cum_weights* sequence is given, the
    149    selections are made according to the cumulative weights (perhaps computed
    150    using :func:`itertools.accumulate`).  For example, the relative weights
    151    ``[10, 5, 30, 5]`` are equivalent to the cumulative weights
    152    ``[10, 15, 45, 50]``.  Internally, the relative weights are converted to
    153    cumulative weights before making selections, so supplying the cumulative
    154    weights saves work.
    155 
    156    If neither *weights* nor *cum_weights* are specified, selections are made
    157    with equal probability.  If a weights sequence is supplied, it must be
    158    the same length as the *population* sequence.  It is a :exc:`TypeError`
    159    to specify both *weights* and *cum_weights*.
    160 
    161    The *weights* or *cum_weights* can use any numeric type that interoperates
    162    with the :class:`float` values returned by :func:`random` (that includes
    163    integers, floats, and fractions but excludes decimals).
    164 
    165    .. versionadded:: 3.6
    166 
    167 
    168 .. function:: shuffle(x[, random])
    169 
    170    Shuffle the sequence *x* in place.
    171 
    172    The optional argument *random* is a 0-argument function returning a random
    173    float in [0.0, 1.0); by default, this is the function :func:`.random`.
    174 
    175    To shuffle an immutable sequence and return a new shuffled list, use
    176    ``sample(x, k=len(x))`` instead.
    177 
    178    Note that even for small ``len(x)``, the total number of permutations of *x*
    179    can quickly grow larger than the period of most random number generators.
    180    This implies that most permutations of a long sequence can never be
    181    generated.  For example, a sequence of length 2080 is the largest that
    182    can fit within the period of the Mersenne Twister random number generator.
    183 
    184 
    185 .. function:: sample(population, k)
    186 
    187    Return a *k* length list of unique elements chosen from the population sequence
    188    or set. Used for random sampling without replacement.
    189 
    190    Returns a new list containing elements from the population while leaving the
    191    original population unchanged.  The resulting list is in selection order so that
    192    all sub-slices will also be valid random samples.  This allows raffle winners
    193    (the sample) to be partitioned into grand prize and second place winners (the
    194    subslices).
    195 
    196    Members of the population need not be :term:`hashable` or unique.  If the population
    197    contains repeats, then each occurrence is a possible selection in the sample.
    198 
    199    To choose a sample from a range of integers, use a :func:`range` object as an
    200    argument.  This is especially fast and space efficient for sampling from a large
    201    population:  ``sample(range(10000000), k=60)``.
    202 
    203    If the sample size is larger than the population size, a :exc:`ValueError`
    204    is raised.
    205 
    206 Real-valued distributions
    207 -------------------------
    208 
    209 The following functions generate specific real-valued distributions. Function
    210 parameters are named after the corresponding variables in the distribution's
    211 equation, as used in common mathematical practice; most of these equations can
    212 be found in any statistics text.
    213 
    214 
    215 .. function:: random()
    216 
    217    Return the next random floating point number in the range [0.0, 1.0).
    218 
    219 
    220 .. function:: uniform(a, b)
    221 
    222    Return a random floating point number *N* such that ``a <= N <= b`` for
    223    ``a <= b`` and ``b <= N <= a`` for ``b < a``.
    224 
    225    The end-point value ``b`` may or may not be included in the range
    226    depending on floating-point rounding in the equation ``a + (b-a) * random()``.
    227 
    228 
    229 .. function:: triangular(low, high, mode)
    230 
    231    Return a random floating point number *N* such that ``low <= N <= high`` and
    232    with the specified *mode* between those bounds.  The *low* and *high* bounds
    233    default to zero and one.  The *mode* argument defaults to the midpoint
    234    between the bounds, giving a symmetric distribution.
    235 
    236 
    237 .. function:: betavariate(alpha, beta)
    238 
    239    Beta distribution.  Conditions on the parameters are ``alpha > 0`` and
    240    ``beta > 0``. Returned values range between 0 and 1.
    241 
    242 
    243 .. function:: expovariate(lambd)
    244 
    245    Exponential distribution.  *lambd* is 1.0 divided by the desired
    246    mean.  It should be nonzero.  (The parameter would be called
    247    "lambda", but that is a reserved word in Python.)  Returned values
    248    range from 0 to positive infinity if *lambd* is positive, and from
    249    negative infinity to 0 if *lambd* is negative.
    250 
    251 
    252 .. function:: gammavariate(alpha, beta)
    253 
    254    Gamma distribution.  (*Not* the gamma function!)  Conditions on the
    255    parameters are ``alpha > 0`` and ``beta > 0``.
    256 
    257    The probability distribution function is::
    258 
    259                  x ** (alpha - 1) * math.exp(-x / beta)
    260        pdf(x) =  --------------------------------------
    261                    math.gamma(alpha) * beta ** alpha
    262 
    263 
    264 .. function:: gauss(mu, sigma)
    265 
    266    Gaussian distribution.  *mu* is the mean, and *sigma* is the standard
    267    deviation.  This is slightly faster than the :func:`normalvariate` function
    268    defined below.
    269 
    270 
    271 .. function:: lognormvariate(mu, sigma)
    272 
    273    Log normal distribution.  If you take the natural logarithm of this
    274    distribution, you'll get a normal distribution with mean *mu* and standard
    275    deviation *sigma*.  *mu* can have any value, and *sigma* must be greater than
    276    zero.
    277 
    278 
    279 .. function:: normalvariate(mu, sigma)
    280 
    281    Normal distribution.  *mu* is the mean, and *sigma* is the standard deviation.
    282 
    283 
    284 .. function:: vonmisesvariate(mu, kappa)
    285 
    286    *mu* is the mean angle, expressed in radians between 0 and 2\*\ *pi*, and *kappa*
    287    is the concentration parameter, which must be greater than or equal to zero.  If
    288    *kappa* is equal to zero, this distribution reduces to a uniform random angle
    289    over the range 0 to 2\*\ *pi*.
    290 
    291 
    292 .. function:: paretovariate(alpha)
    293 
    294    Pareto distribution.  *alpha* is the shape parameter.
    295 
    296 
    297 .. function:: weibullvariate(alpha, beta)
    298 
    299    Weibull distribution.  *alpha* is the scale parameter and *beta* is the shape
    300    parameter.
    301 
    302 
    303 Alternative Generator
    304 ---------------------
    305 
    306 .. class:: SystemRandom([seed])
    307 
    308    Class that uses the :func:`os.urandom` function for generating random numbers
    309    from sources provided by the operating system. Not available on all systems.
    310    Does not rely on software state, and sequences are not reproducible. Accordingly,
    311    the :meth:`seed` method has no effect and is ignored.
    312    The :meth:`getstate` and :meth:`setstate` methods raise
    313    :exc:`NotImplementedError` if called.
    314 
    315 
    316 Notes on Reproducibility
    317 ------------------------
    318 
    319 Sometimes it is useful to be able to reproduce the sequences given by a pseudo
    320 random number generator.  By re-using a seed value, the same sequence should be
    321 reproducible from run to run as long as multiple threads are not running.
    322 
    323 Most of the random module's algorithms and seeding functions are subject to
    324 change across Python versions, but two aspects are guaranteed not to change:
    325 
    326 * If a new seeding method is added, then a backward compatible seeder will be
    327   offered.
    328 
    329 * The generator's :meth:`~Random.random` method will continue to produce the same
    330   sequence when the compatible seeder is given the same seed.
    331 
    332 .. _random-examples:
    333 
    334 Examples and Recipes
    335 --------------------
    336 
    337 Basic examples::
    338 
    339    >>> random()                             # Random float:  0.0 <= x < 1.0
    340    0.37444887175646646
    341 
    342    >>> uniform(2.5, 10.0)                   # Random float:  2.5 <= x < 10.0
    343    3.1800146073117523
    344 
    345    >>> expovariate(1 / 5)                   # Interval between arrivals averaging 5 seconds
    346    5.148957571865031
    347 
    348    >>> randrange(10)                        # Integer from 0 to 9 inclusive
    349    7
    350 
    351    >>> randrange(0, 101, 2)                 # Even integer from 0 to 100 inclusive
    352    26
    353 
    354    >>> choice(['win', 'lose', 'draw'])      # Single random element from a sequence
    355    'draw'
    356 
    357    >>> deck = 'ace two three four'.split()
    358    >>> shuffle(deck)                        # Shuffle a list
    359    >>> deck
    360    ['four', 'two', 'ace', 'three']
    361 
    362    >>> sample([10, 20, 30, 40, 50], k=4)    # Four samples without replacement
    363    [40, 10, 50, 30]
    364 
    365 Simulations::
    366 
    367    >>> # Six roulette wheel spins (weighted sampling with replacement)
    368    >>> choices(['red', 'black', 'green'], [18, 18, 2], k=6)
    369    ['red', 'green', 'black', 'black', 'red', 'black']
    370 
    371    >>> # Deal 20 cards without replacement from a deck of 52 playing cards
    372    >>> # and determine the proportion of cards with a ten-value
    373    >>> # (a ten, jack, queen, or king).
    374    >>> deck = collections.Counter(tens=16, low_cards=36)
    375    >>> seen = sample(list(deck.elements()), k=20)
    376    >>> seen.count('tens') / 20
    377    0.15
    378 
    379    >>> # Estimate the probability of getting 5 or more heads from 7 spins
    380    >>> # of a biased coin that settles on heads 60% of the time.
    381    >>> trial = lambda: choices('HT', cum_weights=(0.60, 1.00), k=7).count('H') >= 5
    382    >>> sum(trial() for i in range(10000)) / 10000
    383    0.4169
    384 
    385    >>> # Probability of the median of 5 samples being in middle two quartiles
    386    >>> trial = lambda : 2500 <= sorted(choices(range(10000), k=5))[2]  < 7500
    387    >>> sum(trial() for i in range(10000)) / 10000
    388    0.7958
    389 
    390 Example of `statistical bootstrapping
    391 <https://en.wikipedia.org/wiki/Bootstrapping_(statistics)>`_ using resampling
    392 with replacement to estimate a confidence interval for the mean of a sample of
    393 size five::
    394 
    395    # http://statistics.about.com/od/Applications/a/Example-Of-Bootstrapping.htm
    396    from statistics import mean
    397    from random import choices
    398 
    399    data = 1, 2, 4, 4, 10
    400    means = sorted(mean(choices(data, k=5)) for i in range(20))
    401    print(f'The sample mean of {mean(data):.1f} has a 90% confidence '
    402          f'interval from {means[1]:.1f} to {means[-2]:.1f}')
    403 
    404 Example of a `resampling permutation test
    405 <https://en.wikipedia.org/wiki/Resampling_(statistics)#Permutation_tests>`_
    406 to determine the statistical significance or `p-value
    407 <https://en.wikipedia.org/wiki/P-value>`_ of an observed difference
    408 between the effects of a drug versus a placebo::
    409 
    410     # Example from "Statistics is Easy" by Dennis Shasha and Manda Wilson
    411     from statistics import mean
    412     from random import shuffle
    413 
    414     drug = [54, 73, 53, 70, 73, 68, 52, 65, 65]
    415     placebo = [54, 51, 58, 44, 55, 52, 42, 47, 58, 46]
    416     observed_diff = mean(drug) - mean(placebo)
    417 
    418     n = 10000
    419     count = 0
    420     combined = drug + placebo
    421     for i in range(n):
    422         shuffle(combined)
    423         new_diff = mean(combined[:len(drug)]) - mean(combined[len(drug):])
    424         count += (new_diff >= observed_diff)
    425 
    426     print(f'{n} label reshufflings produced only {count} instances with a difference')
    427     print(f'at least as extreme as the observed difference of {observed_diff:.1f}.')
    428     print(f'The one-sided p-value of {count / n:.4f} leads us to reject the null')
    429     print(f'hypothesis that there is no difference between the drug and the placebo.')
    430 
    431 Simulation of arrival times and service deliveries in a single server queue::
    432 
    433     from random import expovariate, gauss
    434     from statistics import mean, median, stdev
    435 
    436     average_arrival_interval = 5.6
    437     average_service_time = 5.0
    438     stdev_service_time = 0.5
    439 
    440     num_waiting = 0
    441     arrivals = []
    442     starts = []
    443     arrival = service_end = 0.0
    444     for i in range(20000):
    445         if arrival <= service_end:
    446             num_waiting += 1
    447             arrival += expovariate(1.0 / average_arrival_interval)
    448             arrivals.append(arrival)
    449         else:
    450             num_waiting -= 1
    451             service_start = service_end if num_waiting else arrival
    452             service_time = gauss(average_service_time, stdev_service_time)
    453             service_end = service_start + service_time
    454             starts.append(service_start)
    455 
    456     waits = [start - arrival for arrival, start in zip(arrivals, starts)]
    457     print(f'Mean wait: {mean(waits):.1f}.  Stdev wait: {stdev(waits):.1f}.')
    458     print(f'Median wait: {median(waits):.1f}.  Max wait: {max(waits):.1f}.')
    459 
    460 .. seealso::
    461 
    462    `Statistics for Hackers <https://www.youtube.com/watch?v=Iq9DzN6mvYA>`_
    463    a video tutorial by
    464    `Jake Vanderplas <https://us.pycon.org/2016/speaker/profile/295/>`_
    465    on statistical analysis using just a few fundamental concepts
    466    including simulation, sampling, shuffling, and cross-validation.
    467 
    468    `Economics Simulation
    469    <http://nbviewer.jupyter.org/url/norvig.com/ipython/Economics.ipynb>`_
    470    a simulation of a marketplace by
    471    `Peter Norvig <http://norvig.com/bio.html>`_ that shows effective
    472    use of many of the tools and distributions provided by this module
    473    (gauss, uniform, sample, betavariate, choice, triangular, and randrange).
    474 
    475    `A Concrete Introduction to Probability (using Python)
    476    <http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability.ipynb>`_
    477    a tutorial by `Peter Norvig <http://norvig.com/bio.html>`_ covering
    478    the basics of probability theory, how to write simulations, and
    479    how to perform data analysis using Python.
    480