This chapter describes functions for sorting data, both directly and
indirectly (using an index). All the functions use the heapsort
algorithm. Heapsort is an O(N \log N) algorithm which operates
in-place. It does not require any additional storage and provides
consistent performance. The running time for its worst-case (ordered
data) is not significantly longer than the average and best
cases. Note that the heapsort algorithm does not preserve the relative
ordering of equal elements -- it is an unstable sort. However
the resulting order of equal elements will be consistent across
different platforms when using these functions.
The following function provides a simple alternative to the standard
library function qsort. It is intended for systems lacking
qsort, not as a replacement for it. The function qsort
should be used whenever possible, as it will be faster and can provide
stable ordering of equal elements. Documentation for qsort is
available in the GNU C Library Reference Manual.
The functions described in this section are defined in the header file
`gsl_heapsort.h'.
This function sorts the count elements of the array array,
each of size size, into ascending order using the comparison
function compare. The type of the comparison function is defined by,
int (*gsl_comparison_fn_t) (const void * a,
const void * b)
A comparison function should return a negative integer if the first
argument is less than the second argument, 0 if the two arguments
are equal and a positive integer if the first argument is greater than
the second argument.
For example, the following function can be used to sort doubles into
ascending numerical order.
int
compare_doubles (const double * a,
const double * b)
{
if (*a > *b)
return 1;
else if (*a < *b)
return -1;
else
return 0;
}
The appropriate function call to perform the sort is,
Note that unlike qsort the heapsort algorithm cannot be made into
a stable sort by pointer arithmetic. The trick of comparing pointers for
equal elements in the comparison function does not work for the heapsort
algorithm. The heapsort algorithm performs an internal rearrangement of
the data which destroys its initial ordering.
This function indirectly sorts the count elements of the array
array, each of size size, into ascending order using the
comparison function compare. The resulting permutation is stored
in p, an array of length n. The elements of p give the
index of the array element which would have been stored in that position
if the array had been sorted in place. The first element of p
gives the index of the least element in array, and the last
element of p gives the index of the greatest element in
array. The array itself is not changed.
The following functions will sort the elements of an array or vector,
either directly or indirectly. They are defined for all real and integer
types using the normal suffix rules. For example, the float
versions of the array functions are gsl_sort_float and
gsl_sort_float_index. The corresponding vector functions are
gsl_sort_vector_float and gsl_sort_vector_float_index. The
prototypes are available in the header files `gsl_sort_float.h'`gsl_sort_vector_float.h'. The complete set of prototypes can be
included using the header files `gsl_sort.h' and
`gsl_sort_vector.h'.
There are no functions for sorting complex arrays or vectors, since the
ordering of complex numbers is not uniquely defined. To sort a complex
vector by magnitude compute a real vector containing the the magnitudes
of the complex elements, and sort this vector indirectly. The resulting
index gives the appropriate ordering of the original complex vector.
This function indirectly sorts the n elements of the array
data with stride stride into ascending order, storing the
resulting permutation in p. The array p must be allocated to
a sufficient length to store the n elements of the permutation.
The elements of p give the index of the array element which would
have been stored in that position if the array had been sorted in place.
The array data is not changed.
Function: int gsl_sort_vector_index(gsl_permutation * p, const gsl_vector * v)
This function indirectly sorts the elements of the vector v into
ascending order, storing the resulting permutation in p. The
elements of p give the index of the vector element which would
have been stored in that position if the vector had been sorted in
place. The first element of p gives the index of the least element
in v, and the last element of p gives the index of the
greatest element in v. The vector v is not changed.
The functions described in this section select the k-th smallest
or largest elements of a data set of size N. The routines use an
O(kN) direct insertion algorithm which is suited to subsets that
are small compared with the total size of the dataset. For example, the
routines are useful for selecting the 10 largest values from one million
data points, but not for selecting the largest 100,000 values. If the
subset is a significant part of the total dataset it may be faster
to sort all the elements of the dataset directly with an O(N \log
N) algorithm and obtain the smallest or largest values that way.
This function copies the k-th smallest elements of the array
src, of size n and stride stride, in ascending
numerical order in dest. The size of the subset k must be
less than or equal to n. The data src is not modified by
this operation.
This function copies the k-th largest elements of the array
src, of size n and stride stride, in descending
numerical order in dest. The size of the subset k must be
less than or equal to n. The data src is not modified by
this operation.
These functions copy the k-th smallest or largest elements of the
vector v into the array dest. The size of the subset k
must be less than or equal to the length of the vector v.
The following functions find the indices of the k-th smallest or
largest elements of a dataset,
This function stores the indices of the k-th smallest elements of
the array src, of size n and stride stride, in the
array p. The indices are chosen so that the corresponding data is
in ascending numerical order. The size of the subset k must be
less than or equal to n. The data src is not modified by
this operation.
This function stores the indices of the k-th largest elements of
the array src, of size n and stride stride, in the
array p. The indices are chosen so that the corresponding data is
in descending numerical order. The size of the subset k must be
less than or equal to n. The data src is not modified by
this operation.
These functions store the indices of k-th smallest or largest
elements of the vector v in the array p. The size of the
subset k must be less than or equal to the length of the vector
v.
The rank of an element is its order in the sorted data. The rank
is the inverse of the index permutation, p. It can be computed
using the following algorithm,
for (i = 0; i < p->size; i++)
{
size_t pi = p->data[i];
rank->data[pi] = i;
}
This can be computed directly from the function
gsl_permutation_invert(rank,p).
The following function will print the rank of each element of the vector
v,
void
print_rank (gsl_vector * v)
{
size_t i;
size_t n = v->size;
gsl_permutation * perm = gsl_permutation_alloc(n);
gsl_permutation * rank = gsl_permutation_alloc(n);
gsl_sort_vector_index (perm, v);
gsl_permutation_invert (rank, perm);
for (i = 0; i < n; i++)
{
double vi = gsl_vector_get(v, i);
printf("element = %d, value = %g, rank = %d\n",
i, vi, rank->data[i]);
}
gsl_permutation_free (perm);
gsl_permutation_free (rank);
}