Allegro can manipulate and display text using any character values from 0
right up to 2^32-1 (although the current implementation of the grabber can
only create fonts using characters up to 2^16-1). You can choose between
several text encoding formats; the choice controls how strings are stored
and how Allegro interprets strings that you pass to it. This setting
affects all aspects of the system: whenever you see a function that returns
a char * type, or that takes a char * as an argument, that text will be in
whatever format you have told Allegro to use.
By default, Allegro uses UTF-8 encoded text (U_UTF8). This is a
variable-width format, where characters can occupy anywhere from one to six
bytes. The nice thing about it is that characters ranging from 0-127 are
encoded directly as themselves, so UTF-8 is upwardly compatible with 7 bit
ASCII ("Hello, World!" means the same thing regardless of whether you
interpret it as ASCII or UTF-8 data). Any character value of 128 or above,
such as accented vowels, the UK currency symbol, and Arabic or Chinese
characters, will be encoded as a sequence of two or more bytes, each in the
range 128-255. This means you will never get what looks like a 7 bit ASCII
character as part of the encoding of a different character value, which
makes it very easy to manipulate UTF-8 strings.
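To make the byte layout concrete, here is a minimal standalone sketch of a UTF-8 encoder. It is not Allegro's own code: the name sketch_utf8_encode() is hypothetical, and for brevity it stops at four-byte sequences (values up to 0x10FFFF) rather than the six-byte maximum mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Allegro: encodes one character value
 * as UTF-8 into out (which must hold at least 4 bytes) and returns the
 * number of bytes written. Limited to values up to 0x10FFFF for brevity. */
int sketch_utf8_encode(unsigned long c, unsigned char *out)
{
   assert(out != NULL);
   assert(c <= 0x10FFFFUL);

   if (c < 0x80) {               /* 0-127: stored directly as itself */
      out[0] = (unsigned char)c;
      return 1;
   }
   if (c < 0x800) {              /* two bytes: 110xxxxx 10xxxxxx */
      out[0] = (unsigned char)(0xC0 | (c >> 6));
      out[1] = (unsigned char)(0x80 | (c & 0x3F));
      return 2;
   }
   if (c < 0x10000UL) {          /* three bytes: 1110xxxx + 2 continuations */
      out[0] = (unsigned char)(0xE0 | (c >> 12));
      out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
      out[2] = (unsigned char)(0x80 | (c & 0x3F));
      return 3;
   }
   /* four bytes: 11110xxx + 3 continuations */
   out[0] = (unsigned char)(0xF0 | (c >> 18));
   out[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
   out[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
   out[3] = (unsigned char)(0x80 | (c & 0x3F));
   return 4;
}
```

Encoding 'A' yields the single byte 0x41, while the UK currency symbol (0xA3) becomes the two bytes 0xC2 0xA3, both of which lie in the range 128-255.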
There are a few editing programs that understand UTF-8 format text files.
Alternatively, you can write your strings in plain ASCII or 16 bit Unicode
formats, and then use the Allegro textconv program to convert them into
UTF-8.
If you prefer to use some other text format, you can set Allegro to work
with normal 8 bit ASCII (U_ASCII), or 16 bit Unicode (U_UNICODE) instead, or
you can provide some handler functions to make it support whatever other
text encoding you like (for example it would be easy to add support for 32
bit UCS-4 characters, or the Chinese GB-code format).
There is some limited support for alternative 8 bit codepages, via the
U_ASCII_CP mode. This is very slow, so you shouldn't use it for serious
work, but it can be handy as an easy way to convert text between different
codepages. By default the U_ASCII_CP mode is set up to reduce text to a
clean 7 bit ASCII format, trying to replace any accented vowels with their
simpler equivalents (this is used by the allegro_message() function when it
needs to print an error report onto a text mode DOS screen). If you want to
work with other codepages, you can do this by passing a character mapping
table to the set_ucodepage() function.
Note that you can use the Unicode routines before you call install_allegro()
or allegro_init(). If you want to work in a text mode other than UTF-8, it
is best to set it with set_uformat() just before you call these.
void set_uformat(int type);
Sets the current text encoding format. This will affect all parts of
Allegro, wherever you see a function that returns a char *, or takes a
char * as a parameter. The type should be one of the values:
U_ASCII - fixed size, 8 bit ASCII characters
U_ASCII_CP - alternative 8 bit codepage (see set_ucodepage())
U_UNICODE - fixed size, 16 bit Unicode characters
U_UTF8 - variable size, UTF-8 format Unicode characters
Although you can change the text format on the fly, this is not a good
idea. Many strings, for example the names of your hardware drivers and
any language translations, are loaded when you call allegro_init(), so if
you change the encoding format after this, they will be in the wrong
format, and things will not work properly. Generally you should only call
set_uformat() once, before allegro_init(), and then leave it on the same
setting for the duration of your program.
int get_uformat(void);
Returns the currently selected text encoding format.
void register_uformat(int type,
int (*u_getc)(const char *s),
int (*u_getx)(char **s),
int (*u_setc)(char *s, int c),
int (*u_width)(const char *s),
int (*u_cwidth)(int c),
int (*u_isok)(int c));
Installs a set of custom handler functions for a new text encoding
format. The type is the ID code for your new format, which should be a
4-character string as produced by the AL_ID() macro, and which can later
be passed to functions like set_uformat() and uconvert(). The function
parameters are handlers that implement the character access for your new
type: see below for details of these.
void set_ucodepage(const unsigned short *table,
const unsigned short *extras);
When you select the U_ASCII_CP encoding mode, a set of tables is used to
convert between 8 bit characters and their Unicode equivalents. You can
use this function to specify a custom set of mapping tables, which allows
you to support different 8 bit codepages. The table parameter points to
an array of 256 shorts, which contain the Unicode value for each
character in your codepage. The extras parameter, if not NULL, points to
a list of mapping pairs, which will be used when reducing Unicode data to
your codepage. Each pair consists of a Unicode value, followed by the way
it should be represented in your codepage. The table is terminated by a
zero Unicode value. This allows you to create a many->one mapping, where
many different Unicode characters can be represented by a single codepage
value (eg. for reducing accented vowels to 7 bit ASCII).
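The following standalone sketch illustrates how a 256-entry table and a zero-terminated extras list can work together when reducing Unicode to a codepage. It is only an illustration: the function names are hypothetical, and both the lookup order (table first, then extras) and the '?' placeholder for unmappable values are assumptions rather than documented Allegro behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not Allegro's internal code: reduces a Unicode
 * value to an 8 bit codepage value using a 256-entry table plus an
 * optional zero-terminated list of (Unicode, codepage) pairs. */
int sketch_to_codepage(int c, const unsigned short *table,
                       const unsigned short *extras)
{
   int i;

   assert(table != NULL);

   /* direct hit: some codepage slot holds exactly this Unicode value */
   for (i = 0; i < 256; i++) {
      if (table[i] == c)
         return i;
   }

   /* fallback: many->one pairs, terminated by a zero Unicode value */
   if (extras != NULL) {
      for (i = 0; extras[i] != 0; i += 2) {
         if (extras[i] == c)
            return extras[i + 1];
      }
   }

   return '?';   /* assumption: placeholder for unmappable characters */
}

/* Fills table with a plain 7 bit ASCII layout: slots 0-127 map to
 * themselves, slots 128-255 are left unused (zero). */
void sketch_fill_ascii_table(unsigned short table[256])
{
   int i;

   for (i = 0; i < 256; i++)
      table[i] = (unsigned short)((i < 128) ? i : 0);
}
```

With an extras list such as { 0xE9, 'e', 0xE8, 'e', 0 }, both accented forms of e collapse to the plain letter, which is the many->one reduction described above.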
int need_uconvert(const char *s, int type, int newtype);
Given a pointer to a string, a description of the type of the string, and
the type that you would like this string to be converted into, this
function tells you whether any conversion is required. No conversion will
be needed if type and newtype are the same, or if one type is ASCII, the
other is UTF-8, and the string contains only character values less than
128. As a convenience shortcut, you can pass the value U_CURRENT as
either of the type parameters, to represent whatever text format is
currently selected.
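The ASCII/UTF-8 shortcut rests on a simple byte test, sketched below in standalone C. The helper name sketch_is_plain_7bit() is hypothetical and this is not how Allegro necessarily implements the check; it only shows why such strings need no conversion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Allegro: returns nonzero if every byte
 * of the zero-terminated string s is below 128, in which case the bytes
 * mean the same thing as 8 bit ASCII and as UTF-8. */
int sketch_is_plain_7bit(const char *s)
{
   const unsigned char *p = (const unsigned char *)s;

   assert(s != NULL);

   while (*p) {
      if (*p >= 128)
         return 0;      /* found a byte that differs between the formats */
      p++;
   }
   return 1;
}
```

"Hello, World!" passes this test, whereas a string containing an accented letter does not, because that letter is stored differently in the two formats.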
int uconvert_size(const char *s, int type, int newtype);
Returns the number of bytes that will be required to store the specified
string after a conversion from type to newtype, including the zero
terminator. The type parameters can use the value U_CURRENT as a shortcut
to represent the currently selected encoding format.
void do_uconvert(const char *s, int type,
char *buf, int newtype, int size);
Converts the specified string from type to newtype, storing at most size
bytes into the output buf. The type parameters can use the value
U_CURRENT as a shortcut to represent the currently selected encoding
format.
char *uconvert(const char *s, int type,
char *buf, int newtype, int size);
Higher level function running on top of do_uconvert(). This function
converts the specified string from type to newtype, storing at most size
bytes into the output buf, but it checks before doing the conversion, and
doesn't bother if the string formats are already the same (either both
types are equal, or one is ASCII, the other is UTF-8, and the string
contains only 7 bit ASCII characters). If a conversion was performed it
returns a pointer to buf, otherwise it returns a copy of s, so you must
use the return value rather than assuming that the string will always be
moved to buf. As a convenience, if buf is NULL it will convert the string
into an internal static buffer. You should be wary of using this feature,
though, because that buffer will be overwritten the next time this
routine is called, so don't expect the data to persist across any other
library calls.
char *uconvert_ascii(const char *s, char buf[]);
Helper macro for converting strings from ASCII into the current encoding
format. Expands to uconvert(s, U_ASCII, buf, U_CURRENT, sizeof(buf)).
char *uconvert_toascii(const char *s, char buf[]);
Helper macro for converting strings from the current encoding format into
ASCII. Expands to uconvert(s, U_CURRENT, buf, U_ASCII, sizeof(buf)).
extern char empty_string[];
You can't just rely on "" to be a valid empty string in any encoding
format. This global buffer contains a number of consecutive zeros, so it
will be a valid empty string no matter whether the program is running in
ASCII, Unicode, or UTF-8 mode.
int ugetc(const char *s);
Low level helper function for reading Unicode text data. Given a pointer
to a string in the current encoding format, it returns the next character
from the string.
int ugetx(char **s);
int ugetxc(const char **s);
Low level helper function for reading Unicode text data. Given the
address of a pointer to a string in the current encoding format, it
returns the next character from the string, and advances the pointer to
the character after the one just read.
ugetxc is provided for working with pointer-to-pointer-to-const char
data.
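As an illustration of the read-and-advance pattern, here is a standalone sketch that does the same job for UTF-8 data only. The name sketch_utf8_getx() is hypothetical, the code is not taken from Allegro, and it assumes well formed input of at most four bytes per character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not Allegro's code: decodes the UTF-8 character at
 * *s, advances *s past it, and returns the character value. Assumes well
 * formed input of at most four bytes per character. */
int sketch_utf8_getx(const char **s)
{
   const unsigned char *p;
   int c, extra;

   assert(s != NULL && *s != NULL);
   p = (const unsigned char *)*s;

   if (*p < 0x80) {          /* single byte, 7 bit ASCII */
      c = *p;
      extra = 0;
   }
   else if (*p < 0xE0) {     /* lead byte 110xxxxx */
      c = *p & 0x1F;
      extra = 1;
   }
   else if (*p < 0xF0) {     /* lead byte 1110xxxx */
      c = *p & 0x0F;
      extra = 2;
   }
   else {                    /* lead byte 11110xxx */
      c = *p & 0x07;
      extra = 3;
   }
   p++;

   /* fold in the continuation bytes, stopping early on truncated data */
   while (extra-- > 0 && (*p & 0xC0) == 0x80) {
      c = (c << 6) | (*p & 0x3F);
      p++;
   }

   *s = (const char *)p;
   return c;
}
```

Calling it repeatedly on the same pointer walks through a string one character at a time, however many bytes each character occupies.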
int usetc(char *s, int c);
Low level helper function for writing Unicode text data. It writes the
specified character to the given address in the current encoding format,
and returns the number of bytes written.
int uwidth(const char *s);
Low level helper function for testing Unicode text data. It returns the
number of bytes occupied by the first character of the specified string,
in the current encoding format.
int ucwidth(int c);
Low level helper function for testing Unicode text data. It returns the
number of bytes that would be occupied by the specified character value,
when encoded in the current format.
int uisok(int c);
Low level helper function for testing Unicode text data. Tests whether
the specified value can be correctly encoded in the current format.
int uoffset(const char *s, int index);
Returns the offset in bytes from the start of the string to the character
at the specified index. If the index is negative, it counts backward from
the end of the string, so an index of -1 will return an offset to the last
character.
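The difference between a character index and a byte offset can be seen in this standalone UTF-8 sketch. The names are hypothetical and the code is not Allegro's; in particular, clamping an out of range index to the string bounds is an assumption made here for safety.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: nonzero if byte b starts a UTF-8 character
 * (anything that is not a 10xxxxxx continuation byte). */
static int sketch_is_lead(unsigned char b)
{
   return (b & 0xC0) != 0x80;
}

/* Hypothetical helper, not Allegro's code: byte offset of the character
 * at the given index in the UTF-8 string s. A negative index counts back
 * from the end; an out of range index is clamped to the string bounds. */
int sketch_utf8_offset(const char *s, int index)
{
   const unsigned char *p = (const unsigned char *)s;
   int count = 0, offset = 0;

   assert(s != NULL);

   if (index < 0) {
      int i;
      for (i = 0; p[i]; i++)       /* count characters, not bytes */
         count += sketch_is_lead(p[i]);
      index += count;
      if (index < 0)
         index = 0;
   }

   /* walk forward over index whole characters */
   while (index > 0 && p[offset]) {
      offset++;
      while (p[offset] && !sketch_is_lead(p[offset]))
         offset++;
      index--;
   }
   return offset;
}
```

For the three-character string "a", pound sign, "z" (four bytes in UTF-8), index 2 and index -1 both give offset 3, because the pound sign occupies bytes 1 and 2.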
int ugetat(const char *s, int index);
Returns the character value at the specified index within the string. A
zero index parameter will return the first character of the string. If
the index is negative, it counts backward from the end of the string, so
an index of -1 will return the last character of the string.
int usetat(char *s, int index, int c);
Replaces the character at the specified index within the string with
value c, handling any adjustments for variable width data (ie. if c
encodes to a different width than the previous value at that location).
Returns the number of bytes by which the trailing part of the string was
moved. If the index is negative, it counts backward from the end of the
string.
int uinsert(char *s, int index, int c);
Inserts the character c at the specified index within the string, sliding
the rest of the data along to make room. Returns the number of bytes by
which the trailing part of the string was moved. If the index is
negative, it counts backward from the end of the string.
int uremove(char *s, int index);
Removes the character at the specified index within the string, sliding
the rest of the data back to fill the gap. Returns the number of bytes by
which the trailing part of the string was moved. If the index is
negative, it counts backward from the end of the string.
int ustrsize(const char *s);
Returns the size of the specified string in bytes, not including the
trailing zero.
int ustrsizez(const char *s);
Returns the size of the specified string in bytes, including the trailing
zero.
int uwidth_max(int type);
Low level helper function for working with Unicode text data. Returns the
largest number of bytes that one character can occupy in the given
encoding format. Pass U_CURRENT to represent the current format.
int utolower(int c);
This function returns c, converting it to lower case if it is upper case.
int utoupper(int c);
This function returns c, converting it to upper case if it is lower case.
int uisspace(int c);
Returns nonzero if c is whitespace, that is, carriage return, newline,
form feed, tab, vertical tab, or space.
int uisdigit(int c);
Returns nonzero if c is a digit.
char *ustrdup(const char *src);
This function copies the NULL-terminated string src into a newly
allocated area of memory. The memory returned by this call must be freed
by the caller. Returns NULL if it cannot allocate space for the duplicated
string.
char *_ustrdup(const char *src, void* (*malloc_func) (size_t));
Does the same as ustrdup(), but lets you specify your own memory
allocator function.
char *ustrcpy(char *dest, const char *src);
This function copies src (including the terminating NULL character) into
dest. The return value is the value of dest.
char *ustrzcpy(char *dest, int size, const char *src);
This function copies src (including the terminating NULL character) into
dest, whose length in bytes is specified by size and which is guaranteed
to be NULL-terminated. The return value is the value of dest.
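A size-limited copy of variable-width text has to stop on a character boundary as well as leave room for the terminator. The standalone UTF-8 sketch below shows one way to do that; sketch_utf8_zcpy() is a hypothetical name, and whether Allegro truncates in exactly this manner is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not Allegro's code: copies the UTF-8 string src
 * into dest, a buffer of size bytes. The result is always zero
 * terminated, and truncation never splits a multi-byte character. */
char *sketch_utf8_zcpy(char *dest, int size, const char *src)
{
   int pos = 0;

   assert(dest != NULL && src != NULL && size > 0);

   while (src[pos]) {
      int w = 1;

      /* width of this character: lead byte plus continuation bytes */
      while (((unsigned char)src[pos + w] & 0xC0) == 0x80)
         w++;

      if (pos + w > size - 1)    /* keep one byte for the terminator */
         break;

      while (w-- > 0) {
         dest[pos] = src[pos];
         pos++;
      }
   }

   dest[pos] = 0;
   return dest;
}
```

Copying "ab" followed by a two-byte character into a four-byte buffer gives just "ab": the last character does not fit alongside the terminator, so it is dropped whole rather than cut in half.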
char *ustrcat(char *dest, const char *src);
This function concatenates src to the end of dest. The return value is the
value of dest.
char *ustrzcat(char *dest, int size, const char *src);
This function concatenates src to the end of dest, whose length in bytes
is specified by size and which is guaranteed to be NULL-terminated. The
return value is the value of dest.
int ustrlen(const char *s);
This function returns the number of characters in s. Note that this
doesn't have to equal the string's size in bytes.
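The gap between character count and byte size is easy to demonstrate for UTF-8 in standalone C. The helpers below are hypothetical illustrations, not Allegro's implementation, and the character count assumes well formed input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not Allegro's code: counts the characters in a
 * well formed UTF-8 string by counting every byte that is not a
 * 10xxxxxx continuation byte. */
int sketch_utf8_strlen(const char *s)
{
   int count = 0;

   assert(s != NULL);

   for (; *s; s++) {
      if (((unsigned char)*s & 0xC0) != 0x80)
         count++;
   }
   return count;
}

/* Hypothetical helper: size in bytes, excluding the terminator. */
int sketch_byte_size(const char *s)
{
   assert(s != NULL);
   return (int)strlen(s);
}
```

For the word "hello" with an accented e stored as two bytes, the first function reports 5 characters while the second reports 6 bytes.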
int ustrcmp(const char *s1, const char *s2);
This function compares s1 and s2. Returns zero if the strings are equal,
a positive number if s1 comes after s2 in the ASCII collating sequence,
else a negative number.
char *ustrncpy(char *dest, const char *src, int n);
This function is like ustrcpy() except that no more than n characters
from src are copied into dest. If src is shorter than n characters, NULL
characters are appended to dest as padding until n characters have been
written. Note that if src is longer than n characters, dest will not be
NULL-terminated. The return value is the value of dest.
char *ustrzncpy(char *dest, int size, const char *src, int n);
This function is like ustrzcpy() except that no more than n characters
from src are copied into dest. If src is shorter than n characters, NULL
characters are appended to dest as padding until n characters have been
written. Note that dest is guaranteed to be NULL-terminated. The return
value is the value of dest.
char *ustrncat(char *dest, const char *src, int n);
This function is like ustrcat() except that no more than n characters
from src are appended to the end of dest. If the terminating NULL
character in src is reached before n characters have been written, the
NULL character is copied, but no other characters are written. If n
characters are written before a terminating NULL is encountered, the
function appends its own NULL character to dest, so that n+1 characters
are written. The return value is the value of dest.
char *ustrzncat(char *dest, int size, const char *src, int n);
This function is like ustrzcat() except that no more than n characters
from src are appended to the end of dest. If the terminating NULL
character in src is reached before n characters have been written, the
NULL character is copied, but no other characters are written. Note that
dest is guaranteed to be NULL-terminated. The return value is the value
of dest.
int ustrncmp(const char *s1, const char *s2, int n);
This function compares up to n characters of s1 and s2. Returns zero if
the substrings are equal, a positive number if s1 comes after s2 in the
ASCII collating sequence, else a negative number.
int ustricmp(const char *s1, const char *s2);
This function compares s1 and s2, ignoring case.
char *ustrlwr(char *s);
This function replaces all upper case letters in s with lower case
letters.
char *ustrupr(char *s);
This function replaces all lower case letters in s with upper case
letters.
char *ustrchr(const char *s, int c);
This function returns a pointer to the first occurrence of c in s, or
NULL if no match was found. Note that if c is zero, this will return a
pointer to the end of the string.
char *ustrrchr(const char *s, int c);
This function returns a pointer to the last occurrence of c in s, or NULL
if no match was found.
char *ustrstr(const char *s1, const char *s2);
This function finds the first occurrence of s2 in s1. Returns a pointer
within s1, or NULL if s2 wasn't found.
char *ustrpbrk(const char *s, const char *set);
This function finds the first character in s that matches any character in
set. Returns a pointer to the first match, or NULL if none are found.
char *ustrtok(char *s, const char *set);
This function retrieves tokens from s which are delimited by characters
from set. To initiate the search, pass the string to be searched as s.
For the remaining tokens, pass NULL instead. Returns a pointer to the
token, or NULL if no more are found. Warning: Since ustrtok alters the
string it is parsing, you should always copy the string to a temporary
buffer before parsing it. Also, this function is not reentrant (ie. you
cannot parse two strings at the same time).
char *ustrtok_r(char *s, const char *set, char **last);
Reentrant version of ustrtok. The last parameter is used to keep track
of where the parsing is up to and must be a pointer to a char * variable
allocated by the user that remains the same while parsing the same
string.
double uatof(const char *s);
Convert as much of the string as possible to an equivalent double
precision real number. This function is almost like `ustrtod(s, NULL)'.
Returns the equivalent value, or zero if the string does not represent a
number.
long ustrtol(const char *s, char **endp, int base);
This function converts the initial part of s to a signed integer, which
is returned as a value of type `long int', setting *endp to point to the
first unused character, if endp is not a NULL pointer. The base argument
indicates what base the digits (or letters) should be treated as. If base
is zero, the function looks for `0x', `0X', or `0' at the start of the
string and sets the base to 16, 16, or 8 respectively. If none of those
prefixes is found, the base defaults to 10.
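The prefix rules are the same convention that the standard C strtol() uses, so they can be tried out without Allegro. The sketch below is written on that assumption: sketch_parse_auto() is a hypothetical wrapper around strtol(), which works on plain char strings rather than on the current Allegro encoding format.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: parses an integer with automatic base detection
 * using the standard C strtol(). If rest is not NULL, *rest receives a
 * pointer to the first unused character. */
long sketch_parse_auto(const char *s, const char **rest)
{
   char *end;
   long value;

   assert(s != NULL);

   value = strtol(s, &end, 0);   /* base 0: detect 0x / 0X / 0 prefixes */
   if (rest != NULL)
      *rest = end;
   return value;
}
```

With base detection, "0x1F" is read as hexadecimal, "017" as octal, and "42 apples" as decimal with the rest pointer left on the space after the digits.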
double ustrtod(const char *s, char **endp);
This function converts the initial part of s that looks like a floating
point number into a double, and sets *endp to point to the first unused
character, if endp is not a NULL pointer.
const char *ustrerror(int err);
This function returns a string that describes the error code `err', which
normally comes from the variable `errno'. Returns a pointer to a static
string that should not be modified or free'd. If you make subsequent
calls to ustrerror, the string might be overwritten.
int usprintf(char *buf, const char *format, ...);
This function writes formatted data into the output buffer. A NULL
character is written to mark the end of the string. Returns the number of
characters written, not including the terminating NULL character.
int uszprintf(char *buf, int size, const char *format, ...);
This function writes formatted data into the output buffer, whose length
in bytes is specified by size and which is guaranteed to be NULL
terminated. Returns the number of characters that would have been written
had no truncation occurred (as usprintf() would report), not including the
terminating NULL character.
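A return value of this kind lets you detect truncation by comparing it with the buffer size. The standalone sketch below shows the idea with the standard C99 snprintf(), which reports its result the same way but counts bytes; sketch_format_label() and its format string are hypothetical, and this is not Allegro code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats a label into buf (size bytes) with the
 * standard C99 snprintf(). Returns nonzero if the whole text fitted,
 * zero if it was truncated. */
int sketch_format_label(char *buf, int size, const char *name, int score)
{
   int needed;

   assert(buf != NULL && name != NULL && size > 0);

   /* needed = length the full text would have, excluding the terminator */
   needed = snprintf(buf, (size_t)size, "%s: %d", name, score);

   return (needed >= 0 && needed < size);
}
```

Formatting "player: 42" into an 8 byte buffer stores the terminated prefix "player:" and reports truncation, because the full text needs 10 bytes plus the terminator.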
int uvsprintf(char *buf, const char *format, va_list args);
This is like usprintf(), but you pass the variable argument list directly,
instead of the arguments themselves.
int uvszprintf(char *buf, int size, const char *format, va_list args);
This is like uszprintf(), but you pass the variable argument list
directly, instead of the arguments themselves.