Html2Wml converts HTML pages to WML decks, suitable for being viewed on a
Wap device. The program can be launched from a shell to statically convert
a set of pages, or as a CGI to convert a particular (potentially dynamic)
HTML resource.
Although the result is not guaranteed to be valid WML, it should be for
most pages. Good HTML pages will most probably produce valid WML
decks. To check and correct your pages, you can use W3C's software: the HTML Validator, available online at http://validator.w3.org, and HTML Tidy, written by Dave Raggett.
Html2Wml provides the following features:
translation of the links
limitation of the cards size by splitting the result into several cards
inclusion of files (similar to the SSI)
compilation of the result (using the WML Tools, see LINKS)
a debug mode to check the result using validation functions
Please note that most of these options are also available when calling
Html2Wml as a CGI. In this case, boolean options are given the value ``1''
or ``0'', and other options simply receive the value they expect. For
example, --ascii becomes ?ascii=1 or ?a=1. See the file t/form.html for an example of how to call Html2Wml as a CGI.
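The mapping from command-line options to CGI parameters can be sketched as follows. This is a minimal illustration in Python, not Html2Wml's own code; the helper name options_to_query is hypothetical, and it assumes that the CGI parameter names match the long option names.

```python
from urllib.parse import urlencode


def options_to_query(options):
    """Turn a dict of command-line style options into a CGI query string.

    Hypothetical helper: booleans become "1" or "0", every other value
    is passed through as text, as described in the paragraph above.
    """
    params = {}
    for name, value in options.items():
        if isinstance(value, bool):
            params[name] = "1" if value else "0"
        else:
            params[name] = str(value)
    return "?" + urlencode(params)


print(options_to_query({"ascii": True, "max-card-size": 1200}))
```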
This option tells Html2Wml to collapse redundant whitespace, tabs,
carriage returns, line feeds and empty paragraphs. The aim is to reduce
the size of the WML document as much as possible. Collapsing empty
paragraphs is necessary for two reasons. First, this avoids empty screens
(and on a device with only 4 lines of display, an empty screen can be quite
annoying). Second, Html2Wml creates many empty paragraphs when converting,
because of the way the syntax reconstructor is programmed. Deleting these
empty paragraphs is as necessary as cleaning the kitchen :-)
If this really bothers you, you can deactivate this behaviour with the
--nocollapse option.
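The idea behind collapsing can be sketched with two regular expressions. This is an assumption-laden Python illustration, not the way Html2Wml actually does it: the function name collapse is hypothetical, and only plain <p> elements without attributes are handled.

```python
import re


def collapse(wml):
    """Illustrative only: drop empty paragraphs, then squeeze whitespace."""
    # Remove paragraphs that contain nothing but whitespace.
    wml = re.sub(r"<p>\s*</p>", "", wml)
    # Replace every run of spaces, tabs, carriage returns and line feeds
    # with a single space.
    wml = re.sub(r"[ \t\r\n]+", " ", wml)
    return wml.strip()


print(collapse("<p>Hello   \n\t world</p>\n<p>  </p>\n<p></p>"))
```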
Setting this option tells Html2Wml to use the compiler from WML Tools to
compile the WML deck. If you want to create a real Wap site, you should
seriously consider using this option in order to reduce the size of the WML decks.
Remember that Wap devices have very little memory. If this is not
enough, use the splitting options.
This option tells Html2Wml to replace the image tags with their
corresponding alternative text (as with a text mode web browser). This
option is on by default.
This option is on by default. It makes Html2Wml flatten the HTML tables
(they are linearized), as Lynx does. I think this is better than trying to
use the native WML tables. First, they have extremely limited features and
possibilities compared to HTML tables. In particular, they can't be nested.
In fact this is normal because Wap devices are not supposed to have a big
CPU running at some zillions-hertz, and the calculations needed to render
the tables are the most complicated and CPU-hungry part of HTML.
Second, as they can't be nested, and as typical HTML pages heavily use
nested tables to create their layout, it's impossible to decide which
one could be kept. So the best thing is to keep none of them.
[Note] Although you can deactivate this behaviour, and although there is internal
support for tables, the unlinearized mode has not been heavily tested with
nested tables, and it may produce unexpected results.
This option allows you to limit the size (in bytes) of the generated cards.
Default is 1,500 bytes, which should be small enough to be loaded on most
Wap devices. See DECK SPLITTING for more information.
This option sets the threshold of the split event, which can occur when the
size of the current card is between max-card-size -
card-split-threshold and max-card-size. Default value is 50. See DECK SPLITTING for more information.
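The window in which a split may occur can be written as a one-line test. This is a minimal Python sketch; the function name in_split_window is hypothetical, and treating both bounds as inclusive is an assumption, since the text only says "between".

```python
def in_split_window(card_size, max_card_size=1500, card_split_threshold=50):
    """Return True when a split event may occur for the current card size.

    Defaults mirror the documented defaults (1,500 and 50); inclusive
    bounds are an assumption made for this sketch.
    """
    lower_bound = max_card_size - card_split_threshold
    return lower_bound <= card_size <= max_card_size


print(in_split_window(1449), in_split_window(1450), in_split_window(1501))
```

With the default values, the window covers card sizes from 1,450 to 1,500 bytes.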
This option activates the debug mode. This prints the output result with
line numbering and with the result of the XML check. If the WML compiler
was called, the result is also printed in hexadecimal and ASCII forms. When
called as a CGI, all of this is printed as HTML, so that you can use any web
browser for that purpose.
When this option is on, Html2Wml sends the WML output to XML::Parser to check its
well-formedness.
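A well-formedness check of this kind can be sketched in Python with the standard expat binding (XML::Parser is itself built on expat). The function name check_well_formed is hypothetical and this is only an illustration of the principle, not Html2Wml's own check.

```python
import xml.parsers.expat


def check_well_formed(document):
    """Return (True, "") if the document is well-formed XML,
    otherwise (False, error message)."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this chunk as the final one.
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError as error:
        return False, str(error)
    return True, ""


print(check_well_formed("<wml><card><p>Hi</p></card></wml>"))
print(check_well_formed("<wml><card><p>Hi</card></wml>")[0])
```

Note that this only checks well-formedness; it does not validate the deck against the WML DTD.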
Deck slicing is a feature that Html2Wml provides to cope with the low memory
capacity of most Wap devices. Many can't handle cards larger than 2,000
bytes, therefore the cards must be sufficiently small to be viewed by all
Wap devices. To achieve this, you should compile your WML deck, which
reduces the size of the deck by 50%, but even then your cards may be too
big. This is where Html2Wml's deck slicing feature comes in. It
allows you to limit the size of the cards, currently only
before the compilation stage.
On some Wap phones, slicing the deck is not sufficient: the WML browser
still tries to download the whole deck instead of just picking one card at
a time. A solution is to slice the WML document by decks. See the figure
below.
What this means is that Html2Wml generates several WML documents. In CGI
mode, only the appropriate deck is sent, selected by the id given as a
parameter. If no id was given, the first deck is sent.
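The selection rule can be sketched as below. This is a hypothetical Python illustration: the function name select_deck, the id values and the use of an ordered mapping from id to deck are all assumptions, as the text does not describe how ids are formed or what happens with an unknown id.

```python
def select_deck(decks, deck_id=None):
    """Pick the deck to send in CGI mode.

    `decks` is assumed to be a dict mapping ids to WML documents, in
    generation order. Without an id, the first deck is returned.
    """
    if deck_id is None:
        return next(iter(decks.values()))
    # An unknown id raises KeyError here; the real behaviour is not documented.
    return decks[deck_id]


decks = {"d1": "<wml>first</wml>", "d2": "<wml>second</wml>"}
print(select_deck(decks))
print(select_deck(decks, "d2"))
```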
Currently, Html2Wml estimates the size of the card on the fly, by summing
the length of the strings that compose the WML output, texts and tags. I
say ``estimates'' and not ``calculates'' because computing the exact size
would require many more calculations than the way it is done now. One may
object that there are only additions, which is correct, but knowing the exact size is not necessary. Indeed, if you compile the WML, most of the strings
of the tags will be removed, but not all.
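An on-the-fly estimate of this kind amounts to a running sum of string lengths. The Python sketch below is illustrative only; the class name CardSizeEstimator is hypothetical, and it counts characters rather than encoded bytes, which is one more reason the figure is an estimate.

```python
class CardSizeEstimator:
    """Keep a running estimate of the current card size."""

    def __init__(self):
        self.size = 0

    def add(self, fragment):
        # Each emitted piece of output (tag or text) adds its length.
        self.size += len(fragment)
        return self.size

    def reset(self):
        # Called when a new card is started.
        self.size = 0


estimator = CardSizeEstimator()
for fragment in ["<p>", "Hello", "</p>"]:
    estimator.add(fragment)
print(estimator.size)
```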
For example, take an image tag:
<img src="images/dog.jpg" alt="Photo of a dog">. When compiled, the string "img" will be replaced by a one-byte value. The same goes for the strings "src" and "alt", and the spaces, double quotes and equal signs will be stripped. Only the
text between double quotes will be preserved... but not in every case.
Indeed, in order to go a step further, the compiler can also encode parts
of the arguments as binary. For example, the string "http://www."
can be encoded as a single byte (8F in this case). Or, if the attribute is href, the string href="http:// can become the byte 4B.
As you can see, knowing the exact size of the textual form of the WML
doesn't matter, as it will always be far larger than the size of the compiled
form. That's why I don't count all the characters that may actually be
written.
This is the basic, classic way to code a hyperlink. It takes 42 bytes to
code this, because it is presented in a human-readable form.
The WAP Forum has defined a compact binary representation of WML in its
specification, which is called ``compiled WML''. It's a binary format,
therefore you, a mere human, can't read that, but your computer can. And
it's much faster for it to read a binary format than to read a textual
format.
The previous example would be, once compiled (and printed here as
hexadecimal):
1C 4A 8F 03 y a h o o 00 85 03 Y a h o o ! 00 01
This only takes 20 bytes, half the size of the human-readable form. For a
Wap device, this means both less to download and a form that is easier to read.
Therefore the document can be processed in a short time
compared to the textual version of the same document.
There is one last argument, and not the least important: many Wap devices only
read binary WML.
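The size comparison can be checked with a few lines of Python. The compiled bytes are copied from the hexadecimal listing above; the textual link is an assumption, chosen because it is 42 bytes long and matches the strings visible in the listing.

```python
# Assumed textual form of the link (not given explicitly in the text).
textual = '<a href="http://www.yahoo.com/">Yahoo!</a>'

# Bytes copied from the hexadecimal listing, with the readable
# characters written as ASCII strings.
compiled = (
    bytes([0x1C, 0x4A, 0x8F, 0x03])
    + b"yahoo"
    + bytes([0x00, 0x85, 0x03])
    + b"Yahoo!"
    + bytes([0x00, 0x01])
)

print(len(textual), len(compiled))  # 42 20
print(" ".join(f"{byte:02X}" for byte in compiled))
```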
Actions are a feature similar to, but with far less functionality than, the
SSI (Server Side Includes) available on good servers like Apache. In order
not to interfere with the real SSI, but to keep the syntax easy to learn,
the syntax differs in only a few points.
Includes a file in the document at the current point. Please note that
Html2Wml neither checks nor parses the file, and if the file cannot be found,
it will silently die (this is the same behavior as SSI).
The links reconstruction engine is IMHO the most important part of
Html2Wml, because it's this engine that allows you to reconstruct the links
of the HTML document being converted. It has two modes, depending upon
whether Html2Wml was launched from the shell or as a CGI.
When used as a CGI, this engine reconstructs the links of the HTML
document so that all the URLs will be passed to Html2Wml in order to
convert the pointed files (pages or images). This is completely automatic
and can't be customized for now (but I don't think it would be really
useful).
When used from the shell, this engine reconstructs the links with the given
templates. Note that absolute URLs will be left untouched. The templates
can be customized using the following syntax.
This template controls the reconstruction of the href attribute of the A tag. Its value can be changed using the --hreftmpl option. Default value is
"{FILEPATH}{FILENAME}{$FILETYPE =~ s/s?html?/wml/o; $FILETYPE}".
This template controls the reconstruction of the src attribute of the IMG tag. Its value can be changed using the --srctmpl option. Default value is
"{FILEPATH}{FILENAME}{$FILETYPE =~ s/gif|png|jpe?g/wbmp/o; $FILETYPE}"
The template is a string that contains the new URL. More precisely, it's a
Text::Template template. Parameters can be interpolated as a constant or as
a variable. The template is enclosed in curly brackets, and can
contain any valid Perl code.
The simplest form of a template is {PARAM} which just returns the value of PARAM. If you want to do something more complex, you can use the corresponding
variable; for example {"foo $PARAM bar"}, or
{join "_", split " ", $PARAM}.
See the Text::Template documentation for more information on what is possible within a template.
If the original URL contained a query part or a fragment part, then they
will be appended to the result of the template.
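The effect of the default href template in shell mode can be approximated as follows. This is a Python sketch, not the Perl template engine itself; the function name rewrite_href is hypothetical, and it assumes that FILEPATH is the directory part with its trailing slash, FILENAME the base name, and FILETYPE the extension including its dot.

```python
import posixpath
import re
from urllib.parse import urlsplit


def rewrite_href(url):
    """Approximate the default href template for a relative link."""
    parts = urlsplit(url)
    if parts.scheme or parts.netloc:
        return url  # absolute URLs are left untouched

    # Assumed decomposition: FILEPATH + FILENAME + FILETYPE.
    head, slash, name = parts.path.rpartition("/")
    filepath = head + slash
    filename, filetype = posixpath.splitext(name)

    # Same substitution as the default template: s/s?html?/wml/
    filetype = re.sub(r"s?html?", "wml", filetype, count=1)

    result = filepath + filename + filetype
    # The query and fragment parts are appended to the template result.
    if parts.query:
        result += "?" + parts.query
    if parts.fragment:
        result += "#" + parts.fragment
    return result


print(rewrite_href("docs/index.html#top"))
print(rewrite_href("page.shtml?x=1"))
```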
This is the official site of the WAP Forum. You can find some technical
information there, such as the specifications of all the technologies associated with
WAP.
Although not directly related to Wap, you may find it useful to read
the XML specifications (WML is an XML application), and the
specifications of the different stylesheet languages (CSS and XSL), which
include support for low-resolution devices.
This web site is dedicated to Mobile UniX systems. It leads you to a lot of
useful hands-on information about installing and running Linux and BSD on
laptops, PDAs and other mobile computer devices.
wApua is an open source WML browser written in Perl/Tk. It's easy to install
and to use. Its support for WML is incomplete, but sufficient for testing
purposes.
Tofoa is an open source Wap emulator written in Python. Its installation is
quite difficult, and its incomplete WML support makes it produce strange
results, even with valid WML documents.
EzWAP, from EZOS, is a commercial WML browser freely available for Windows
9x, NT, 2000 and CE. Compared to other Windows WML browsers, it requires
very few resources, and is quite stable. Its support for the WML specs
seems quite complete. A very good piece of software.
Deck-It is a commercial Wap phone emulator, available for Windows and
Linux/Intel only. It's a very good piece of software which really shows how
WML pages are rendered on a Wap phone, but one of its major flaws is that
it cannot read local files.
This is a Wap emulator written in Java. It uses the Java native GUI, and
claims to support binary WML, but it doesn't seem to work at all at this
time.