Home | History | Annotate | Download | only in library
      1 :mod:`random` --- Generate pseudo-random numbers
      2 ================================================
      3 
      4 .. module:: random
      5    :synopsis: Generate pseudo-random numbers with various common distributions.
      6 
      7 **Source code:** :source:`Lib/random.py`
      8 
      9 --------------
     10 
     11 This module implements pseudo-random number generators for various
     12 distributions.
     13 
     14 For integers, there is uniform selection from a range. For sequences, there is
     15 uniform selection of a random element, a function to generate a random
     16 permutation of a list in-place, and a function for random sampling without
     17 replacement.
     18 
     19 On the real line, there are functions to compute uniform, normal (Gaussian),
     20 lognormal, negative exponential, gamma, and beta distributions. For generating
     21 distributions of angles, the von Mises distribution is available.
     22 
     23 Almost all module functions depend on the basic function :func:`.random`, which
     24 generates a random float uniformly in the semi-open range [0.0, 1.0).  Python
     25 uses the Mersenne Twister as the core generator.  It produces 53-bit precision
     26 floats and has a period of 2\*\*19937-1.  The underlying implementation in C is
     27 both fast and threadsafe.  The Mersenne Twister is one of the most extensively
     28 tested random number generators in existence.  However, being completely
     29 deterministic, it is not suitable for all purposes, and is completely unsuitable
     30 for cryptographic purposes.
     31 
     32 The functions supplied by this module are actually bound methods of a hidden
     33 instance of the :class:`random.Random` class.  You can instantiate your own
     34 instances of :class:`Random` to get generators that don't share state.
     35 
     36 Class :class:`Random` can also be subclassed if you want to use a different
     37 basic generator of your own devising: in that case, override the :meth:`~Random.random`,
     38 :meth:`~Random.seed`, :meth:`~Random.getstate`, and :meth:`~Random.setstate` methods.
     39 Optionally, a new generator can supply a :meth:`~Random.getrandbits` method --- this
     40 allows :meth:`randrange` to produce selections over an arbitrarily large range.
     41 
     42 The :mod:`random` module also provides the :class:`SystemRandom` class which
     43 uses the system function :func:`os.urandom` to generate random numbers
     44 from sources provided by the operating system.
     45 
     46 .. warning::
     47 
     48    The pseudo-random generators of this module should not be used for
     49    security purposes.  For security or cryptographic uses, see the
     50    :mod:`secrets` module.
     51 
     52 .. seealso::
     53 
     54    M. Matsumoto and T. Nishimura, "Mersenne Twister: A 623-dimensionally
     55    equidistributed uniform pseudorandom number generator", ACM Transactions on
     56    Modeling and Computer Simulation Vol. 8, No. 1, January pp.3--30 1998.
     57 
     58 
     59    `Complementary-Multiply-with-Carry recipe
     60    <https://code.activestate.com/recipes/576707/>`_ for a compatible alternative
     61    random number generator with a long period and comparatively simple update
     62    operations.
     63 
     64 
     65 Bookkeeping functions
     66 ---------------------
     67 
     68 .. function:: seed(a=None, version=2)
     69 
     70    Initialize the random number generator.
     71 
     72    If *a* is omitted or ``None``, the current system time is used.  If
     73    randomness sources are provided by the operating system, they are used
     74    instead of the system time (see the :func:`os.urandom` function for details
     75    on availability).
     76 
     77    If *a* is an int, it is used directly.
     78 
     79    With version 2 (the default), a :class:`str`, :class:`bytes`, or :class:`bytearray`
     80    object gets converted to an :class:`int` and all of its bits are used.
     81 
     82    With version 1 (provided for reproducing random sequences from older versions
     83    of Python), the algorithm for :class:`str` and :class:`bytes` generates a
     84    narrower range of seeds.
     85 
     86    .. versionchanged:: 3.2
     87       Moved to the version 2 scheme which uses all of the bits in a string seed.
     88 
     89 .. function:: getstate()
     90 
     91    Return an object capturing the current internal state of the generator.  This
     92    object can be passed to :func:`setstate` to restore the state.
     93 
     94 
     95 .. function:: setstate(state)
     96 
     97    *state* should have been obtained from a previous call to :func:`getstate`, and
     98    :func:`setstate` restores the internal state of the generator to what it was at
     99    the time :func:`getstate` was called.
    100 
    101 
    102 .. function:: getrandbits(k)
    103 
    104    Returns a Python integer with *k* random bits. This method is supplied with
    105    the MersenneTwister generator and some other generators may also provide it
    106    as an optional part of the API. When available, :meth:`getrandbits` enables
    107    :meth:`randrange` to handle arbitrarily large ranges.
    108 
    109 
    110 Functions for integers
    111 ----------------------
    112 
    113 .. function:: randrange(stop)
    114               randrange(start, stop[, step])
    115 
    116    Return a randomly selected element from ``range(start, stop, step)``.  This is
    117    equivalent to ``choice(range(start, stop, step))``, but doesn't actually build a
    118    range object.
    119 
    120    The positional argument pattern matches that of :func:`range`.  Keyword arguments
    121    should not be used because the function may use them in unexpected ways.
    122 
    123    .. versionchanged:: 3.2
    124       :meth:`randrange` is more sophisticated about producing equally distributed
    125       values.  Formerly it used a style like ``int(random()*n)`` which could produce
    126       slightly uneven distributions.
    127 
    128 .. function:: randint(a, b)
    129 
    130    Return a random integer *N* such that ``a <= N <= b``.  Alias for
    131    ``randrange(a, b+1)``.
    132 
    133 
    134 Functions for sequences
    135 -----------------------
    136 
    137 .. function:: choice(seq)
    138 
    139    Return a random element from the non-empty sequence *seq*. If *seq* is empty,
    140    raises :exc:`IndexError`.
    141 
    142 .. function:: choices(population, weights=None, *, cum_weights=None, k=1)
    143 
    144    Return a *k* sized list of elements chosen from the *population* with replacement.
    145    If the *population* is empty, raises :exc:`IndexError`.
    146 
    147    If a *weights* sequence is specified, selections are made according to the
    148    relative weights.  Alternatively, if a *cum_weights* sequence is given, the
    149    selections are made according to the cumulative weights (perhaps computed
    150    using :func:`itertools.accumulate`).  For example, the relative weights
    151    ``[10, 5, 30, 5]`` are equivalent to the cumulative weights
    152    ``[10, 15, 45, 50]``.  Internally, the relative weights are converted to
    153    cumulative weights before making selections, so supplying the cumulative
    154    weights saves work.
    155 
    156    If neither *weights* nor *cum_weights* are specified, selections are made
    157    with equal probability.  If a weights sequence is supplied, it must be
    158    the same length as the *population* sequence.  It is a :exc:`TypeError`
    159    to specify both *weights* and *cum_weights*.
    160 
    161    The *weights* or *cum_weights* can use any numeric type that interoperates
    162    with the :class:`float` values returned by :func:`random` (that includes
    163    integers, floats, and fractions but excludes decimals).
    164 
    165    For a given seed, the :func:`choices` function with equal weighting
    166    typically produces a different sequence than repeated calls to
    167    :func:`choice`.  The algorithm used by :func:`choices` uses floating
    168    point arithmetic for internal consistency and speed.  The algorithm used
    169    by :func:`choice` defaults to integer arithmetic with repeated selections
    170    to avoid small biases from round-off error.
    171 
    172    .. versionadded:: 3.6
    173 
    174 
    175 .. function:: shuffle(x[, random])
    176 
    177    Shuffle the sequence *x* in place.
    178 
    179    The optional argument *random* is a 0-argument function returning a random
    180    float in [0.0, 1.0); by default, this is the function :func:`.random`.
    181 
    182    To shuffle an immutable sequence and return a new shuffled list, use
    183    ``sample(x, k=len(x))`` instead.
    184 
    185    Note that even for small ``len(x)``, the total number of permutations of *x*
    186    can quickly grow larger than the period of most random number generators.
    187    This implies that most permutations of a long sequence can never be
    188    generated.  For example, a sequence of length 2080 is the largest that
    189    can fit within the period of the Mersenne Twister random number generator.
    190 
    191 
    192 .. function:: sample(population, k)
    193 
    194    Return a *k* length list of unique elements chosen from the population sequence
    195    or set. Used for random sampling without replacement.
    196 
    197    Returns a new list containing elements from the population while leaving the
    198    original population unchanged.  The resulting list is in selection order so that
    199    all sub-slices will also be valid random samples.  This allows raffle winners
    200    (the sample) to be partitioned into grand prize and second place winners (the
    201    subslices).
    202 
    203    Members of the population need not be :term:`hashable` or unique.  If the population
    204    contains repeats, then each occurrence is a possible selection in the sample.
    205 
    206    To choose a sample from a range of integers, use a :func:`range` object as an
    207    argument.  This is especially fast and space efficient for sampling from a large
    208    population:  ``sample(range(10000000), k=60)``.
    209 
    210    If the sample size is larger than the population size, a :exc:`ValueError`
    211    is raised.
    212 
    213 Real-valued distributions
    214 -------------------------
    215 
    216 The following functions generate specific real-valued distributions. Function
    217 parameters are named after the corresponding variables in the distribution's
    218 equation, as used in common mathematical practice; most of these equations can
    219 be found in any statistics text.
    220 
    221 
    222 .. function:: random()
    223 
    224    Return the next random floating point number in the range [0.0, 1.0).
    225 
    226 
    227 .. function:: uniform(a, b)
    228 
    229    Return a random floating point number *N* such that ``a <= N <= b`` for
    230    ``a <= b`` and ``b <= N <= a`` for ``b < a``.
    231 
    232    The end-point value ``b`` may or may not be included in the range
    233    depending on floating-point rounding in the equation ``a + (b-a) * random()``.
    234 
    235 
    236 .. function:: triangular(low, high, mode)
    237 
    238    Return a random floating point number *N* such that ``low <= N <= high`` and
    239    with the specified *mode* between those bounds.  The *low* and *high* bounds
    240    default to zero and one.  The *mode* argument defaults to the midpoint
    241    between the bounds, giving a symmetric distribution.
    242 
    243 
    244 .. function:: betavariate(alpha, beta)
    245 
    246    Beta distribution.  Conditions on the parameters are ``alpha > 0`` and
    247    ``beta > 0``. Returned values range between 0 and 1.
    248 
    249 
    250 .. function:: expovariate(lambd)
    251 
    252    Exponential distribution.  *lambd* is 1.0 divided by the desired
    253    mean.  It should be nonzero.  (The parameter would be called
    254    "lambda", but that is a reserved word in Python.)  Returned values
    255    range from 0 to positive infinity if *lambd* is positive, and from
    256    negative infinity to 0 if *lambd* is negative.
    257 
    258 
    259 .. function:: gammavariate(alpha, beta)
    260 
    261    Gamma distribution.  (*Not* the gamma function!)  Conditions on the
    262    parameters are ``alpha > 0`` and ``beta > 0``.
    263 
    264    The probability distribution function is::
    265 
    266                  x ** (alpha - 1) * math.exp(-x / beta)
    267        pdf(x) =  --------------------------------------
    268                    math.gamma(alpha) * beta ** alpha
    269 
    270 
    271 .. function:: gauss(mu, sigma)
    272 
    273    Gaussian distribution.  *mu* is the mean, and *sigma* is the standard
    274    deviation.  This is slightly faster than the :func:`normalvariate` function
    275    defined below.
    276 
    277 
    278 .. function:: lognormvariate(mu, sigma)
    279 
    280    Log normal distribution.  If you take the natural logarithm of this
    281    distribution, you'll get a normal distribution with mean *mu* and standard
    282    deviation *sigma*.  *mu* can have any value, and *sigma* must be greater than
    283    zero.
    284 
    285 
    286 .. function:: normalvariate(mu, sigma)
    287 
    288    Normal distribution.  *mu* is the mean, and *sigma* is the standard deviation.
    289 
    290 
    291 .. function:: vonmisesvariate(mu, kappa)
    292 
    293    *mu* is the mean angle, expressed in radians between 0 and 2\*\ *pi*, and *kappa*
    294    is the concentration parameter, which must be greater than or equal to zero.  If
    295    *kappa* is equal to zero, this distribution reduces to a uniform random angle
    296    over the range 0 to 2\*\ *pi*.
    297 
    298 
    299 .. function:: paretovariate(alpha)
    300 
    301    Pareto distribution.  *alpha* is the shape parameter.
    302 
    303 
    304 .. function:: weibullvariate(alpha, beta)
    305 
    306    Weibull distribution.  *alpha* is the scale parameter and *beta* is the shape
    307    parameter.
    308 
    309 
    310 Alternative Generator
    311 ---------------------
    312 
    313 .. class:: SystemRandom([seed])
    314 
    315    Class that uses the :func:`os.urandom` function for generating random numbers
    316    from sources provided by the operating system. Not available on all systems.
    317    Does not rely on software state, and sequences are not reproducible. Accordingly,
    318    the :meth:`seed` method has no effect and is ignored.
    319    The :meth:`getstate` and :meth:`setstate` methods raise
    320    :exc:`NotImplementedError` if called.
    321 
    322 
    323 Notes on Reproducibility
    324 ------------------------
    325 
    326 Sometimes it is useful to be able to reproduce the sequences given by a pseudo
    327 random number generator.  By re-using a seed value, the same sequence should be
    328 reproducible from run to run as long as multiple threads are not running.
    329 
    330 Most of the random module's algorithms and seeding functions are subject to
    331 change across Python versions, but two aspects are guaranteed not to change:
    332 
    333 * If a new seeding method is added, then a backward compatible seeder will be
    334   offered.
    335 
    336 * The generator's :meth:`~Random.random` method will continue to produce the same
    337   sequence when the compatible seeder is given the same seed.
    338 
    339 .. _random-examples:
    340 
    341 Examples and Recipes
    342 --------------------
    343 
    344 Basic examples::
    345 
    346    >>> random()                             # Random float:  0.0 <= x < 1.0
    347    0.37444887175646646
    348 
    349    >>> uniform(2.5, 10.0)                   # Random float:  2.5 <= x < 10.0
    350    3.1800146073117523
    351 
    352    >>> expovariate(1 / 5)                   # Interval between arrivals averaging 5 seconds
    353    5.148957571865031
    354 
    355    >>> randrange(10)                        # Integer from 0 to 9 inclusive
    356    7
    357 
    358    >>> randrange(0, 101, 2)                 # Even integer from 0 to 100 inclusive
    359    26
    360 
    361    >>> choice(['win', 'lose', 'draw'])      # Single random element from a sequence
    362    'draw'
    363 
    364    >>> deck = 'ace two three four'.split()
    365    >>> shuffle(deck)                        # Shuffle a list
    366    >>> deck
    367    ['four', 'two', 'ace', 'three']
    368 
    369    >>> sample([10, 20, 30, 40, 50], k=4)    # Four samples without replacement
    370    [40, 10, 50, 30]
    371 
    372 Simulations::
    373 
    374    >>> # Six roulette wheel spins (weighted sampling with replacement)
    375    >>> choices(['red', 'black', 'green'], [18, 18, 2], k=6)
    376    ['red', 'green', 'black', 'black', 'red', 'black']
    377 
    378    >>> # Deal 20 cards without replacement from a deck of 52 playing cards
    379    >>> # and determine the proportion of cards with a ten-value
    380    >>> # (a ten, jack, queen, or king).
    381    >>> deck = collections.Counter(tens=16, low_cards=36)
    382    >>> seen = sample(list(deck.elements()), k=20)
    383    >>> seen.count('tens') / 20
    384    0.15
    385 
    386    >>> # Estimate the probability of getting 5 or more heads from 7 spins
    387    >>> # of a biased coin that settles on heads 60% of the time.
    388    >>> def trial():
    389    ...     return choices('HT', cum_weights=(0.60, 1.00), k=7).count('H') >= 5
    390    ...
    391    >>> sum(trial() for i in range(10000)) / 10000
    392    0.4169
    393 
    394    >>> # Probability of the median of 5 samples being in middle two quartiles
    395    >>> def trial():
    396    ...     return 2500 <= sorted(choices(range(10000), k=5))[2] < 7500
    397    ...
    398    >>> sum(trial() for i in range(10000)) / 10000
    399    0.7958
    400 
    401 Example of `statistical bootstrapping
    402 <https://en.wikipedia.org/wiki/Bootstrapping_(statistics)>`_ using resampling
    403 with replacement to estimate a confidence interval for the mean of a sample of
    404 size five::
    405 
    406    # http://statistics.about.com/od/Applications/a/Example-Of-Bootstrapping.htm
    407    from statistics import mean
    408    from random import choices
    409 
    410    data = 1, 2, 4, 4, 10
    411    means = sorted(mean(choices(data, k=5)) for i in range(20))
    412    print(f'The sample mean of {mean(data):.1f} has a 90% confidence '
    413          f'interval from {means[1]:.1f} to {means[-2]:.1f}')
    414 
    415 Example of a `resampling permutation test
    416 <https://en.wikipedia.org/wiki/Resampling_(statistics)#Permutation_tests>`_
    417 to determine the statistical significance or `p-value
    418 <https://en.wikipedia.org/wiki/P-value>`_ of an observed difference
    419 between the effects of a drug versus a placebo::
    420 
    421     # Example from "Statistics is Easy" by Dennis Shasha and Manda Wilson
    422     from statistics import mean
    423     from random import shuffle
    424 
    425     drug = [54, 73, 53, 70, 73, 68, 52, 65, 65]
    426     placebo = [54, 51, 58, 44, 55, 52, 42, 47, 58, 46]
    427     observed_diff = mean(drug) - mean(placebo)
    428 
    429     n = 10000
    430     count = 0
    431     combined = drug + placebo
    432     for i in range(n):
    433         shuffle(combined)
    434         new_diff = mean(combined[:len(drug)]) - mean(combined[len(drug):])
    435         count += (new_diff >= observed_diff)
    436 
    437     print(f'{n} label reshufflings produced only {count} instances with a difference')
    438     print(f'at least as extreme as the observed difference of {observed_diff:.1f}.')
    439     print(f'The one-sided p-value of {count / n:.4f} leads us to reject the null')
    440     print(f'hypothesis that there is no difference between the drug and the placebo.')
    441 
    442 Simulation of arrival times and service deliveries in a single server queue::
    443 
    444     from random import expovariate, gauss
    445     from statistics import mean, median, stdev
    446 
    447     average_arrival_interval = 5.6
    448     average_service_time = 5.0
    449     stdev_service_time = 0.5
    450 
    451     num_waiting = 0
    452     arrivals = []
    453     starts = []
    454     arrival = service_end = 0.0
    455     for i in range(20000):
    456         if arrival <= service_end:
    457             num_waiting += 1
    458             arrival += expovariate(1.0 / average_arrival_interval)
    459             arrivals.append(arrival)
    460         else:
    461             num_waiting -= 1
    462             service_start = service_end if num_waiting else arrival
    463             service_time = gauss(average_service_time, stdev_service_time)
    464             service_end = service_start + service_time
    465             starts.append(service_start)
    466 
    467     waits = [start - arrival for arrival, start in zip(arrivals, starts)]
    468     print(f'Mean wait: {mean(waits):.1f}.  Stdev wait: {stdev(waits):.1f}.')
    469     print(f'Median wait: {median(waits):.1f}.  Max wait: {max(waits):.1f}.')
    470 
    471 .. seealso::
    472 
    473    `Statistics for Hackers <https://www.youtube.com/watch?v=Iq9DzN6mvYA>`_
    474    a video tutorial by
    475    `Jake Vanderplas <https://us.pycon.org/2016/speaker/profile/295/>`_
    476    on statistical analysis using just a few fundamental concepts
    477    including simulation, sampling, shuffling, and cross-validation.
    478 
    479    `Economics Simulation
    480    <http://nbviewer.jupyter.org/url/norvig.com/ipython/Economics.ipynb>`_
    481    a simulation of a marketplace by
    482    `Peter Norvig <http://norvig.com/bio.html>`_ that shows effective
    483    use of many of the tools and distributions provided by this module
    484    (gauss, uniform, sample, betavariate, choice, triangular, and randrange).
    485 
    486    `A Concrete Introduction to Probability (using Python)
    487    <http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability.ipynb>`_
    488    a tutorial by `Peter Norvig <http://norvig.com/bio.html>`_ covering
    489    the basics of probability theory, how to write simulations, and
    490    how to perform data analysis using Python.
    491