This chapter describes functions for generating random variates and
computing their probability distributions. Samples from the
distributions described in this chapter can be obtained using any of the
random number generators in the library as an underlying source of
randomness. In the simplest cases a non-uniform distribution can be
obtained analytically from the uniform distribution of a random number
generator by applying an appropriate transformation. This method uses
one call to the random number generator.
More complicated distributions are created by the
acceptance-rejection method, which compares the desired
distribution against a distribution which is similar and known
analytically. This usually requires several samples from the generator.
The functions described in this section are declared in
`gsl_randist.h'.
This function returns a Gaussian random variate, with mean zero and
standard deviation sigma. The probability distribution for
Gaussian random variates is,
p(x) dx = {1 \over \sqrt{2 \pi \sigma^2}} \exp (-x^2 / 2\sigma^2) dx
for x in the range -\infty to +\infty. Use the
transformation z = \mu + x on the numbers returned by
gsl_ran_gaussian to obtain a Gaussian distribution with mean
\mu. This function uses the Box-Muller algorithm, which requires two
calls to the random number generator r.
These functions compute results for the unit Gaussian distribution. They
are equivalent to the functions above with a standard deviation of one,
sigma = 1.
Random: double gsl_ran_gaussian_tail(const gsl_rng * r, double a, double sigma)
This function provides random variates from the upper tail of a Gaussian
distribution with standard deviation sigma. The values returned
are larger than the lower limit a, which must be positive. The
method is based on Marsaglia's famous rectangle-wedge-tail algorithm (Ann
Math Stat 32, 894-899 (1961)), with this aspect explained in Knuth, v2,
3rd ed, p139,586 (exercise 11).
The probability distribution for Gaussian tail random variates is,
p(x) dx = {1 \over N(a;\sigma) \sqrt{2 \pi \sigma^2}} \exp (-x^2 / 2\sigma^2) dx
for x > a where N(a;\sigma) is the normalization constant,
N(a;\sigma) = (1/2) erfc(a / \sqrt{2 \sigma^2}).
Function: double gsl_ran_gaussian_tail_pdf(double x, double a, double sigma)
This function computes the probability density p(x) at x
for a Gaussian tail distribution with standard deviation sigma and
lower limit a, using the formula given above.
Random: double gsl_ran_ugaussian_tail(const gsl_rng * r, double a)
Function: double gsl_ran_ugaussian_tail_pdf(double x, double a)
These functions compute results for the tail of a unit Gaussian
distribution. They are equivalent to the functions above with a standard
deviation of one, sigma = 1.
This function generates a pair of correlated Gaussian variates, with
mean zero, correlation coefficient rho and standard deviations
sigma_x and sigma_y in the x and y directions.
The probability distribution for bivariate Gaussian random variates is,
p(x,y) dx dy = {1 \over 2 \pi \sigma_x \sigma_y \sqrt{1-\rho^2}} \exp (-(x^2/\sigma_x^2 + y^2/\sigma_y^2 - 2 \rho x y/(\sigma_x\sigma_y))/2(1-\rho^2)) dx dy
for x,y in the range -\infty to +\infty. The
correlation coefficient rho should lie between -1 and
1.
This function computes the probability density p(x,y) at
(x,y) for a bivariate gaussian distribution with standard
deviations sigma_x, sigma_y and correlation coefficient
rho, using the formula given above.
Random: double gsl_ran_exppow(const gsl_rng * r, double a, double b)
This function returns a random variate from the exponential power distribution
with scale parameter a and exponent b. The distribution is,
p(x) dx = {1 \over 2 a \Gamma(1+1/b)} \exp(-|x/a|^b) dx
for x in the range -\infty to +\infty. For b = 1 this reduces to the
Laplace distribution. For b = 2 it has the same form as a Gaussian
distribution, but with
a = \sqrt{2} \sigma.
Function: double gsl_ran_exppow_pdf(double x, double a, double b)
This function computes the probability density p(x) at x
for an exponential power distribution with scale parameter a
and exponent b, using the formula given above.
Random: double gsl_ran_cauchy(const gsl_rng * r, double a)
This function returns a random variate from the Cauchy distribution with
scale parameter a. The probability distribution for Cauchy
random variates is,
p(x) dx = {1 \over a\pi (1 + (x/a)^2) } dx
for x in the range -\infty to +\infty. The Cauchy
distribution is also known as the Lorentz distribution.
Function: double gsl_ran_cauchy_pdf(double x, double a)
This function computes the probability density p(x) at x
for a Cauchy distribution with scale parameter a, using the formula
given above.
Random: double gsl_ran_rayleigh_tail(const gsl_rng * r, double a, double sigma)
This function returns a random variate from the tail of the Rayleigh
distribution with scale parameter sigma and a lower limit of
a. The distribution is,
p(x) dx = {x \over \sigma^2} \exp ((a^2 - x^2) /(2 \sigma^2)) dx
for x > a.
Function: double gsl_ran_rayleigh_tail_pdf(double x, double a, double sigma)
This function computes the probability density p(x) at x
for a Rayleigh tail distribution with scale parameter sigma and
lower limit a, using the formula given above.
This function returns a random variate from the Landau distribution. The
probability distribution for Landau random variates is defined
analytically by the complex integral,
This function returns a random variate from the Levy symmetric stable
distribution with scale c and exponent alpha. The symmetric
stable probability distribution is defined by a Fourier transform,
p(x) = {1 \over 2 \pi} \int_{-\infty}^{+\infty} dt \exp(-it x - |c t|^\alpha)
There is no explicit solution for the form of p(x) and the
library does not define a corresponding pdf function. For
\alpha = 1 the distribution reduces to the Cauchy distribution. For
\alpha = 2 it is a Gaussian distribution with
\sigma = \sqrt{2} c. For \alpha < 1 the tails of the
distribution become extremely wide.
This function returns a random variate from the Levy skew stable
distribution with scale c, exponent alpha and skewness
parameter beta. The skewness parameter must lie in the range
[-1,1]. The Levy skew stable probability distribution is defined
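by a Fourier transform,
p(x) = {1 \over 2 \pi} \int_{-\infty}^{+\infty} dt \exp(-it x - |c t|^\alpha (1-i \beta sign(t) \tan(\pi\alpha/2)))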
When \alpha = 1 the term \tan(\pi \alpha/2) is replaced by
-(2/\pi)\log|t|. There is no explicit solution for the form of
p(x) and the library does not define a corresponding pdf
function. For \alpha = 2 the distribution reduces to a Gaussian
distribution with
\sigma = \sqrt{2} c and the skewness parameter has no effect.
For \alpha < 1 the tails of the distribution become extremely
wide. The symmetric distribution corresponds to \beta =
0.
The algorithm only works for
0 < alpha <= 2.
The Levy alpha-stable distributions have the property that if N
alpha-stable variates are drawn from the distribution p(c, \alpha,
\beta) then the sum Y = X_1 + X_2 + \dots + X_N will also be
distributed as an alpha-stable variate,
p(N^(1/\alpha) c, \alpha, \beta).
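As a quick check of this scaling rule, the two closed-form members of the family can be worked out by hand. The derivation below takes \beta = 0 and uses only the relations already stated (\sigma = \sqrt{2} c for \alpha = 2, and the Cauchy distribution for \alpha = 1).

```latex
\begin{aligned}
\alpha = 2:&\quad c' = N^{1/2} c
  \;\Rightarrow\; \sigma' = \sqrt{2}\, c' = \sqrt{N}\,\sigma
  \quad\text{(variances add: } \sigma'^2 = N \sigma^2\text{)}, \\
\alpha = 1:&\quad c' = N c
  \;\Rightarrow\; \bar{X} = Y/N \text{ has scale } c'/N = c .
\end{aligned}
```

The first line is the familiar rule for sums of independent Gaussian variates. The second says that the average of N Cauchy variates has the same distribution as a single one, so averaging does not narrow a Cauchy distribution.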
The t-distribution arises in statistics. If Y_1 has a normal
distribution and Y_2 has a chi-squared distribution with
\nu degrees of freedom then the ratio,
X = { Y_1 \over \sqrt{Y_2 / \nu} }
has a t-distribution t(x;\nu) with \nu degrees of freedom.
The spherical distributions generate random vectors, located on a
spherical surface. They can be used as random directions, for example in
the steps of a random walk.
This function returns a random direction vector v =
(x,y) in two dimensions. The vector is normalized such that
|v|^2 = x^2 + y^2 = 1. The obvious way to do this is to take a
uniform random number between 0 and 2\pi and let x and
y be the sine and cosine respectively. Two trig functions would
have been expensive in the old days, but with modern hardware
implementations, this is sometimes the fastest way to go. This is the
case for my home Pentium (but not the case for my Sun Sparcstation 20 at
work). One can avoid the trig evaluations by choosing x and
y in the interior of a unit circle (choose them at random from the
interior of the enclosing square, and then reject those that are outside
the unit circle), and then dividing by
\sqrt{x^2 + y^2}.
A much cleverer approach, attributed to von Neumann (See Knuth, v2, 3rd
ed, p140, exercise 23), requires neither trig nor a square root. In
this approach, u and v are chosen at random from the
interior of a unit circle, and then x=(u^2-v^2)/(u^2+v^2) and
y=2uv/(u^2+v^2).
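The approach can be sketched in a few lines of C. The version below uses rand() as a stand-in for a proper generator, and the helper names uniform_pm1 and random_dir_2d are hypothetical; it illustrates the technique and is not taken from the library source.

```c
#include <stdlib.h>

/* Hypothetical helper: uniform variate in [-1,1] built on rand(). */
static double
uniform_pm1 (void)
{
  return 2.0 * rand () / (double) RAND_MAX - 1.0;
}

/* Random unit vector (x, y) without trig functions or square roots.
   (u, v) is drawn from the interior of the unit circle by rejection;
   if (u, v) lies at angle theta, the double-angle identities give
   the point at angle 2 theta on the circle. */
void
random_dir_2d (double *x, double *y)
{
  double u, v, s;

  do
    {
      u = uniform_pm1 ();
      v = uniform_pm1 ();
      s = u * u + v * v;
    }
  while (s > 1.0 || s == 0.0);   /* reject points outside the circle */

  *x = (u * u - v * v) / s;      /* cos(2 theta) */
  *y = 2.0 * u * v / s;          /* sin(2 theta) */
}
```

The result is normalized by construction, since (u^2-v^2)^2 + (2uv)^2 = (u^2+v^2)^2.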
This function returns a random direction vector v =
(x,y,z) in three dimensions. The vector is normalized
such that |v|^2 = x^2 + y^2 + z^2 = 1. The method employed is
due to Robert E. Knop (CACM 13, 326 (1970)), and explained in Knuth, v2,
3rd ed, p136. It uses the surprising fact that the distribution
projected along any axis is actually uniform (this is only true for 3d).
Random: void gsl_ran_dir_nd(const gsl_rng * r, int n, double *x)
This function returns a random direction vector
v = (x_1,x_2,...,x_n) in n dimensions. The vector is normalized
such that
|v|^2 = x_1^2 + x_2^2 + ... + x_n^2 = 1. The method
uses the fact that a multivariate gaussian distribution is spherically
symmetric. Each component is generated to have a gaussian distribution,
and then the components are normalized. The method is described by
Knuth, v2, 3rd ed, p135-136, and attributed to G. W. Brown, Modern
Mathematics for the Engineer (1956).
Given K discrete events with different probabilities P[k],
produce a random value k consistent with its probability.
The obvious way to do this is to preprocess the probability list by
generating a cumulative probability array with K+1 elements:
C[0] = 0
C[k+1] = C[k]+P[k].
Note that this construction produces C[K]=1. Now choose a
uniform deviate u between 0 and 1, and find the value of k
such that
C[k] <= u < C[k+1].
Although this in principle requires of order \log K steps per
random number generation, they are fast steps, and if you use something
like \lfloor uK \rfloor as a starting point, you can often do
pretty well.
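The cumulative-table method described above can be sketched as follows, using a plain binary search rather than the \lfloor uK \rfloor starting guess. The function names build_cumulative and lookup_event are hypothetical, and the uniform deviate u is assumed to be supplied by the caller.

```c
#include <stddef.h>

/* Fill C[0..K] from the probabilities P[0..K-1]:
   C[0] = 0 and C[k+1] = C[k] + P[k]. */
void
build_cumulative (const double *P, size_t K, double *C)
{
  size_t k;

  C[0] = 0.0;
  for (k = 0; k < K; k++)
    C[k + 1] = C[k] + P[k];
}

/* Binary search for the k satisfying C[k] <= u < C[k+1],
   assuming 0 <= u < C[K].  Takes of order log K steps. */
size_t
lookup_event (const double *C, size_t K, double u)
{
  size_t lo = 0, hi = K;        /* invariant: C[lo] <= u < C[hi] */

  while (hi - lo > 1)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (u < C[mid])
        hi = mid;
      else
        lo = mid;
    }
  return lo;
}
```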
But faster methods have been devised. Again, the idea is to preprocess
the probability list, and save the result in some form of lookup table;
then the individual calls for a random discrete event can go rapidly.
An approach invented by G. Marsaglia (Generating discrete random numbers
in a computer, Comm ACM 6, 37-38 (1963)) is very clever, and readers
interested in examples of good algorithm design are directed to this
short and well-written paper. Unfortunately, for large K,
Marsaglia's lookup table can be quite large.
A much better approach is due to Alastair J. Walker (An efficient method
for generating discrete random variables with general distributions, ACM
Trans on Mathematical Software 3, 253-256 (1977); see also Knuth, v2,
3rd ed, p120-121,139). This requires two lookup tables, one floating
point and one integer, but both only of size K. After
preprocessing, the random numbers are generated in O(1) time, even for
large K. The preprocessing suggested by Walker requires
O(K^2) effort, but that is not actually necessary, and the
implementation provided here only takes O(K) effort. In general,
more preprocessing leads to faster generation of the individual random
numbers, but a diminishing return is reached pretty early. Knuth points
out that the optimal preprocessing is combinatorially difficult for
large K.
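To make the two-table idea concrete, here is a minimal sketch of an alias-table construction that runs in O(K) time, together with the O(1) lookup. It is an illustration written for this section and should not be read as the library's code; the names alias_build and alias_draw are hypothetical, and the probabilities are assumed to be already normalized.

```c
#include <stddef.h>
#include <stdlib.h>

/* Build alias tables for K events with probabilities P[] summing
   to one.  F[k] is the probability of keeping column k and A[k] is
   the alias used otherwise.  Returns 0 on success, -1 if working
   memory cannot be allocated. */
int
alias_build (const double *P, size_t K, double *F, size_t *A)
{
  size_t *small = malloc (K * sizeof *small);
  size_t *large = malloc (K * sizeof *large);
  size_t ns = 0, nl = 0, k;

  if (small == NULL || large == NULL)
    {
      free (small);
      free (large);
      return -1;
    }

  for (k = 0; k < K; k++)
    {
      F[k] = K * P[k];          /* scale so the average height is 1 */
      A[k] = k;
      if (F[k] < 1.0)
        small[ns++] = k;
      else
        large[nl++] = k;
    }

  /* Top up each under-full column from an over-full donor. */
  while (ns > 0 && nl > 0)
    {
      size_t s = small[--ns];
      size_t l = large[nl - 1];

      A[s] = l;
      F[l] -= 1.0 - F[s];
      if (F[l] < 1.0)           /* donor has itself become under-full */
        {
          nl--;
          small[ns++] = l;
        }
    }

  /* Whatever remains is a full column, up to rounding error. */
  while (ns > 0)
    F[small[--ns]] = 1.0;
  while (nl > 0)
    F[large[--nl]] = 1.0;

  free (small);
  free (large);
  return 0;
}

/* Draw an event in O(1) from two uniform variates u1, u2 in [0,1):
   u1 picks a column, u2 decides between the column and its alias. */
size_t
alias_draw (const double *F, const size_t *A, size_t K,
            double u1, double u2)
{
  size_t k = (size_t) (u1 * K);
  return (u2 < F[k]) ? k : A[k];
}
```

Each event k is returned with probability (F[k] + the sum of 1 - F[j] over all columns j whose alias is k) / K, which reproduces P[k].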
This method can be used to speed up some of the discrete random number
generators below, such as the binomial distribution. To use it for
something like the Poisson distribution, a modification would have to
be made, since it only takes a finite set of K outcomes.
This function returns a pointer to a structure that contains the lookup
table for the discrete random number generator. The array P[] contains
the probabilities of the discrete events; these array elements must all be
positive, but they needn't add up to one (so you can think of them more
generally as "weights") -- the preprocessor will normalize appropriately.
This return value is used
as an argument for the gsl_ran_discrete function below.
Returns the probability P[k] of observing the variable k.
Since P[k] is not stored as part of the lookup table, it must be
recomputed; this computation takes O(K), so if K is large
and you care about the original array P[k] used to create the
lookup table, then you should just keep this original array P[k]
around.
Random: unsigned int gsl_ran_binomial(const gsl_rng * r, double p, unsigned int n)
This function returns a random integer from the binomial distribution,
the number of successes in n independent trials with probability
p. The probability distribution for binomial variates is,
p(k) = {n! \over k! (n-k)! } p^k (1-p)^{n-k}
for
0 <= k <= n.
Function: double gsl_ran_binomial_pdf(unsigned int k, double p, unsigned int n)
This function computes the probability p(k) of obtaining k
from a binomial distribution with parameters p and n, using
the formula given above.
This function returns a random integer from the negative binomial
distribution, the number of failures occurring before n successes
in independent trials with probability p of success. The
probability distribution for negative binomial variates is,
p(k) = {\Gamma(n + k) \over \Gamma(k+1) \Gamma(n) } p^n (1-p)^k
Function: double gsl_ran_negative_binomial_pdf(unsigned int k, double p, double n)
This function computes the probability p(k) of obtaining k
from a negative binomial distribution with parameters p and
n, using the formula given above.
Random: unsigned int gsl_ran_pascal(const gsl_rng * r, double p, unsigned int n)
This function returns a random integer from the Pascal distribution. The
Pascal distribution is simply a negative binomial distribution with an
integer value of n.
Random: unsigned int gsl_ran_geometric(const gsl_rng * r, double p)
This function returns a random integer from the geometric distribution,
the number of independent trials with probability p until the
first success. The probability distribution for geometric variates
is,
p(k) = p (1-p)^(k-1)
for
k >= 1.
Function: double gsl_ran_geometric_pdf(unsigned int k, double p)
This function computes the probability p(k) of obtaining k
from a geometric distribution with probability parameter p, using
the formula given above.
Random: unsigned int gsl_ran_hypergeometric(const gsl_rng * r, unsigned int n1, unsigned int n2, unsigned int t)
This function returns a random integer from the hypergeometric
distribution. The probability distribution for hypergeometric
random variates is,
p(k) = C(n_1,k) C(n_2, t-k) / C(n_1 + n_2,t)
where C(a,b) = a!/(b!(a-b)!). The domain of k is
max(0,t-n_2), ..., min(t,n_1).
Function: double gsl_ran_hypergeometric_pdf(unsigned int k, unsigned int n1, unsigned int n2, unsigned int t)
This function computes the probability p(k) of obtaining k
from a hypergeometric distribution with parameters n1, n2,
t, using the formula given above.
Random: unsigned int gsl_ran_logarithmic(const gsl_rng * r, double p)
This function returns a random integer from the logarithmic
distribution. The probability distribution for logarithmic random variates
is,
p(k) = {-1 \over \log(1-p)} {p^k \over k}
for
k >= 1.
Function: double gsl_ran_logarithmic_pdf(unsigned int k, double p)
This function computes the probability p(k) of obtaining k
from a logarithmic distribution with probability parameter p,
using the formula given above.
The following functions allow the shuffling and sampling of a set of
objects. The algorithms rely on a random number generator as source of
randomness and a poor quality generator can lead to correlations in the
output. In particular it is important to avoid generators with a short
period. For more information see Knuth, v2, 3rd ed, Section 3.4.2,
"Random Sampling and Shuffling".
This function randomly shuffles the order of n objects, each of
size size, stored in the array base[0..n-1]. The
output of the random number generator r is used to produce the
permutation. The algorithm generates all possible n!
permutations with equal probability, assuming a perfect source of random
numbers.
The following code shows how to shuffle the numbers from 0 to 51,
int i, a[52];
for (i = 0; i < 52; i++)
  {
    a[i] = i;
  }
gsl_ran_shuffle (r, a, 52, sizeof (int));
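For readers curious how a shuffle with this property can be written, the following is a minimal sketch of the standard Fisher-Yates procedure for objects of arbitrary size. It uses rand() as a stand-in for a proper generator, so the modulo step is slightly biased, and the names swap_bytes and shuffle_sketch are hypothetical; it is not the library's source.

```c
#include <stdlib.h>

/* Exchange two objects of the given size, one byte at a time. */
static void
swap_bytes (char *a, char *b, size_t size)
{
  while (size-- > 0)
    {
      char tmp = *a;
      *a++ = *b;
      *b++ = tmp;
    }
}

/* Fisher-Yates shuffle of n objects stored in base[0..n-1].
   Working from the end, each position i-1 receives an element
   chosen at random from the i elements not yet placed. */
void
shuffle_sketch (void *base, size_t n, size_t size)
{
  char *p = base;
  size_t i;

  for (i = n; i > 1; i--)
    {
      size_t j = (size_t) rand () % i;   /* 0 <= j < i, slightly biased */
      swap_bytes (p + (i - 1) * size, p + j * size, size);
    }
}
```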
Random: int gsl_ran_choose(const gsl_rng * r, void * dest, size_t k, void * src, size_t n, size_t size)
This function fills the array dest[k] with k objects taken
randomly from the n elements of the array
src[0..n-1]. The objects are each of size size. The
output of the random number generator r is used to make the
selection. The algorithm ensures all possible samples are equally
likely, assuming a perfect source of randomness.
The objects are sampled without replacement, thus each object can
only appear once in dest[k]. It is required that k be less
than or equal to n. The objects in dest will be in the
same relative order as those in src. You will need to call
gsl_ran_shuffle(r, dest, k, size) if you want to randomize the
order.
The following code shows how to select a random sample of three unique
numbers from the set 0 to 99,
int i;
double a[3], b[100];
for (i = 0; i < 100; i++)
  {
    b[i] = (double) i;
  }
gsl_ran_choose (r, a, 3, b, 100, sizeof (double));
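Sampling without replacement while preserving the order of the source can be done in a single pass, as in the sketch below of the classical selection sampling technique. It uses rand() as a stand-in for a proper generator, and choose_sketch is a hypothetical name; the code illustrates the idea rather than reproducing the library's implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Copy k of the n objects in src to dest without replacement,
   keeping their relative order.  Element i is accepted with
   probability (number still needed) / (number still available),
   which makes every subset of size k equally likely.
   Returns the number of objects copied (k when k <= n). */
size_t
choose_sketch (void *dest, size_t k, const void *src,
               size_t n, size_t size)
{
  const char *s = src;
  char *d = dest;
  size_t i, taken = 0;

  for (i = 0; i < n && taken < k; i++)
    {
      double u = rand () / (RAND_MAX + 1.0);    /* uniform in [0,1) */

      if ((n - i) * u < (double) (k - taken))
        {
          memcpy (d + taken * size, s + i * size, size);
          taken++;
        }
    }
  return taken;
}
```

When the number of remaining elements equals the number still needed, the test always succeeds, so the output is guaranteed to be filled.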
This function is like gsl_ran_choose but samples k items
from the original array of n items src with replacement, so
the same object can appear more than once in the output sequence
dest. There is no requirement that k be less than n
in this case.
The following program demonstrates the use of a random number generator
to produce variates from a distribution. It prints 10 samples from the
Poisson distribution with a mean of 3.
#include <stdio.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>

int
main (void)
{
  const gsl_rng_type * T;
  gsl_rng * r;

  int i, n = 10;
  double mu = 3.0;

  /* create a generator chosen by the
     environment variable GSL_RNG_TYPE */

  gsl_rng_env_setup();

  T = gsl_rng_default;
  r = gsl_rng_alloc (T);

  /* print n random variates chosen from
     the poisson distribution with mean
     parameter mu */

  for (i = 0; i < n; i++)
    {
      unsigned int k = gsl_ran_poisson (r, mu);
      printf(" %u", k);
    }

  printf("\n");

  /* release the memory held by the generator */
  gsl_rng_free (r);
  return 0;
}
If the library and header files are installed under `/usr/local'
(the default location) then the program can be compiled with these
options,
gcc demo.c -lgsl -lgslcblas -lm
Here is the output of the program,
$ ./a.out
4 2 3 3 1 3 4 1 3 5
The variates depend on the seed used by the generator. The seed for the
default generator type gsl_rng_default can be changed with the
GSL_RNG_SEED environment variable to produce a different stream
of variates, for example by running the program as
GSL_RNG_SEED=123 ./a.out.
For an encyclopaedic coverage of the subject readers are advised to
consult the book Non-Uniform Random Variate Generation by Luc
Devroye. It covers every imaginable distribution and provides hundreds
of algorithms.
Luc Devroye, Non-Uniform Random Variate Generation,
Springer-Verlag, ISBN 0-387-96305-7.
The subject of random variate generation is also reviewed by Knuth, who
describes algorithms for all the major distributions.
Donald E. Knuth, The Art of Computer Programming: Seminumerical
Algorithms (Vol 2, 3rd Ed, 1997), Addison-Wesley, ISBN 0201896842.
The Particle Data Group provides a short review of techniques for
generating distributions of random numbers in the "Monte Carlo"
section of its Annual Review of Particle Physics.
Review of Particle Properties
R.M. Barnett et al., Physical Review D54, 1 (1996)
http://pdg.lbl.gov/.
The Review of Particle Physics is available online in postscript and pdf
format.