History of w3m

1999/2/18
1999/3/8 revised
1999/6/11 translated into English
Akinori Ito
aito@fw.ipsj.or.jp

Introduction

W3m is a text-based pager and WWW browser. It is similar to the well-known text-based browser Lynx, but it has several advantages over Lynx. For example,

W3m can render tables.
W3m can render frames (by converting them into tables).
As w3m is a pager, it can read a document from standard input. (I heard that Lynx can also display a document given on standard input, like this:
```
   lynx /dev/fd/0 < file
```
Hmm, it works on Linux.)
W3m is small. Its stripped binary for Sparc (compiled with gcc -O2, version beta-990217) is only 260 kbytes, while the Lynx binary is over 1.8 Mbytes. (Actually, Lynx is 800K on my i386 system, and w3m is 200K + libgc.)

It is true that Lynx is an excellent browser, with many features that w3m doesn't have. For example,

Lynx can handle cookies.
Lynx has many options.
Lynx is multilingual. (W3m is bilingual: Japanese and English.)

and so on. It is also a great advantage that Lynx has a lot of documentation.

I don't intend w3m to be a substitute for any other browser, including Netscape and Lynx. Why did I write w3m? Because I found conventional browsers inconvenient for `taking a look' at web pages. I browse web pages in a LAN environment. When I want to glance at a web page, I don't want to wait for Netscape to start up. Lynx also takes a few seconds to start up (you can get Lynx's startup time down to almost zero if you rm /etc/mailcap). On the other hand, w3m starts immediately with little load on the host machine. After looking at the information using w3m, I use another browser if I want to read the page in detail. For me, however, w3m is enough to read most web pages.

The birth of w3m

w3m was derived from a pager named `fm'. Fm was written before 1991 (I don't remember the exact date), when the WWW was not popular. At that time, the word `browser' meant a file browser like `more' or `less'.

I wrote fm to debug a program for my research. To trace the program's state, it dumped megabytes of variable values into a file, and I debugged it by checking the dumped file. The program wrote the information for each point in time on one line, which made each dumped line several hundred characters long. When I looked at the file using `more' or `less', each line was folded into several lines and was very hard to read. Therefore, I wrote fm, which didn't fold lines. Fm displayed one logical line as one physical line. To show the hidden part of a line, fm shifted the entire screen. As I used an 80x24 terminal at that time, fm was very useful for debugging.

Several years later, I got to know the WWW and began to use it. I used XMosaic and Chimera. I liked Chimera because it was light. As I was interested in the mechanism of the WWW, I learned HTML and HTTP, and I found them simpler than I had expected. The earlier version of HTTP was very similar to the Gopher protocol. HTML 2.0 was simple enough to render. All I had to do, it seemed, was line folding and itemized display. So I made a small modification to fm and turned it into a web browser. That was the first version of w3m. The name `w3m' is an abbreviation of the Japanese phrase `WWW wo miru', which means `see WWW'. It was inherited from `fm', which was an abbreviation of `File wo miru'. The first version of w3m was released at the beginning of 1995.

Death and rebirth of w3m

I used w3m as a pager to read files, e-mail and online manuals. It was a substitute for less. Sometimes I used w3m as a web browser, but there were many pages w3m couldn't display correctly, most of which used tables for page layout. Once I tried to implement a table renderer, but I gave up because it seemed too difficult for me.

It was in 1998 that I tried to modify w3m again. There were two reasons. The first is that I had some time to do it: I was staying at Boston University as a visiting researcher at that time. The second is that I wanted to use tables in my personal web page. I had been writing my research log in HTML, and I wanted to put a table in it. At first I used <pre>..</pre> to lay out the table, but it was not cool at all. One day I used the <table> tag, which forced me to use Netscape to read the research log. Then I decided to implement a table renderer in w3m.

I didn't intend to write a perfect table renderer because the tables I used were not very complicated. However, incomplete table rendering made the display of table-layout pages horrible. I realized that an almost-perfect table renderer was required to do well at both `rendering (real) tables' and `fine display of table-layout pages.' It was a thorny path.

After several months, I finished a `fair' table renderer. Then I implemented forms in w3m. Finally, w3m was reborn as a practical web browser.

Table rendering algorithm in w3m

HTML table rendering is difficult. The tabular environment of LaTeX is not very difficult: it makes the width of a column either a specified value or the maximum width needed to fit its items. On the other hand, an HTML table renderer has to decide the width of each column so that the entire table fits into the display appropriately, and fold the contents of the table according to the column widths. A poor choice of column widths makes the table ugly. Moreover, tables can be nested, which makes the algorithm more complicated.

First, calculate the maximum and minimum width of each column. The maximum width is the width required to display the column without folding its contents. Generally, it is the length of the longest paragraph delimited by <BR> or <P>. The minimum width is the lower limit needed to display the contents. If the column contains the word `internationalization', the minimum width will be 20. If the column contains <pre>..</pre>, the maximum width of the preformatted text will be the minimum width of the column.
If the width of the column is specified by the WIDTH attribute, fix the column width to that value. If the specified width is smaller than the minimum width of the column, fix the column width to the minimum width.
Calculate the sum of the maximum widths (or fixed widths) of the columns and check whether the sum exceeds the screen width. If it is smaller than the screen width, these values are used as the column widths.
If the sum is larger than the screen width, determine the width of each column according to the following steps.
1. Let W be the screen width minus the sum of the widths of the fixed-width columns.
2. Distribute W among the columns whose widths are not yet decided, in proportion to the logarithm of the maximum width of each column.
3. If the distributed width of a column is smaller than its minimum width, fix the width of that column to the minimum width, and do the distribution again.
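
The steps above can be sketched roughly as follows. This is only an illustration in Python, not w3m's actual C code: the names `cell_bounds` and `distribute_widths`, the dictionary layout of a column, and the clamping of the logarithm's argument to at least 2 (so that no column gets a zero weight) are all assumptions made for the sketch. It also fixes every too-narrow column in the same pass, which is one possible reading of step 3.

```python
import math

def cell_bounds(paragraphs):
    """(min, max) width of a cell given its plain-text paragraphs,
    i.e. the pieces between <BR> or <P>. Hypothetical helper."""
    words = [w for p in paragraphs for w in p.split()]
    min_w = max((len(w) for w in words), default=0)       # longest word
    max_w = max((len(p) for p in paragraphs), default=0)  # longest paragraph
    return min_w, max_w

def distribute_widths(columns, screen_width, weight=math.log):
    """columns: list of dicts with 'min', 'max' and optionally 'fixed'
    (the WIDTH attribute). Returns one width per column."""
    widths = [None] * len(columns)
    # A WIDTH attribute fixes the column, but never below its minimum.
    for i, c in enumerate(columns):
        if c.get("fixed") is not None:
            widths[i] = max(c["fixed"], c["min"])
    # If everything fits at maximum (or fixed) width, we are done.
    natural = [w if w is not None else c["max"]
               for w, c in zip(widths, columns)]
    if sum(natural) <= screen_width:
        return natural
    # Otherwise distribute the remaining width W by weight(max width).
    while True:
        free = [i for i, w in enumerate(widths) if w is None]
        if not free:
            break
        W = screen_width - sum(w for w in widths if w is not None)
        # Assumption: clamp to 2 so that log(1) = 0 cannot occur.
        weights = [weight(max(columns[i]["max"], 2)) for i in free]
        total = sum(weights)
        shares = {i: W * (wt / total) for i, wt in zip(free, weights)}
        too_narrow = [i for i in free if shares[i] < columns[i]["min"]]
        if not too_narrow:
            for i in free:
                widths[i] = int(shares[i])
            break
        # Fix these columns at their minimum and distribute again.
        for i in too_narrow:
            widths[i] = columns[i]["min"]
    return widths
```

For example, `cell_bounds(["an internationalization effort", "short"])` gives a minimum width of 20, matching the `internationalization' case above. Note that the sketch does not cap a distributed width at the column's maximum width, since the steps above do not mention such a cap.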

In this process, the distributed width is proportional to the logarithm of the maximum width, but I am not sure that this heuristic is the best. It could be, for example, the square root of the maximum width.
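
To get a feeling for the difference between the two heuristics, here is a small self-contained Python comparison. The function name `proportional_shares` and the sample numbers (two columns with maximum widths 10 and 1000 sharing 60 characters) are made up for illustration and do not come from w3m.

```python
import math

def proportional_shares(max_widths, total, weight):
    """Split `total` among columns in proportion to weight(max width)."""
    weights = [weight(m) for m in max_widths]
    s = sum(weights)
    return [total * (w / s) for w in weights]

max_widths = [10, 1000]  # hypothetical maximum widths of two columns
log_shares = proportional_shares(max_widths, 60, math.log)
sqrt_shares = proportional_shares(max_widths, 60, math.sqrt)
print("log :", [round(x, 1) for x in log_shares])
print("sqrt:", [round(x, 1) for x in sqrt_shares])
```

With these numbers the logarithm gives the narrow column about a quarter of the width (roughly 15 of 60), while the square root gives it about one eleventh (roughly 5.5 of 60). In other words, the logarithm is gentler to narrow columns when another column is very wide.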

The algorithm above assumes that the screen width is known, but this is not true for a nested table. According to the algorithm above, the column widths of the outer table have to be known to render the inner table, while the total width of the inner table has to be known to determine the column widths of the outer table. If the WIDTH attribute exists, there is no problem. Otherwise, w3m assumes that the inner table is 0.8 times as wide as the outer table. It works fine, but if there are two tables side by side in an outer table, the width of the outer table always exceeds the screen width. To render this kind of table correctly, one has to render the table once, check the width of the outermost table, and then render the entire table again. Netscape might employ this kind of algorithm.
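
The side-by-side problem is simple arithmetic, shown here as a tiny Python sketch. Only the 0.8 ratio comes from the text; the function names and the 80-column screen are assumptions for the example, and cell borders and padding are ignored.

```python
def assumed_inner_width(outer_width):
    """Width assumed for an inner table without a WIDTH attribute:
    0.8 times the outer width (integer arithmetic, 4/5)."""
    return outer_width * 4 // 5

def width_needed(outer_width, tables_side_by_side):
    """Width the outer table needs to hold that many inner tables
    in one row, each at the assumed width."""
    return tables_side_by_side * assumed_inner_width(outer_width)

screen = 80  # hypothetical screen width
one = width_needed(screen, 1)  # 64: fits on the screen
two = width_needed(screen, 2)  # 128: wider than the screen
```

Two inner tables need 1.6 times the outer width, which is more than the screen whatever the screen width is.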

Libraries

w3m uses the Boehm GC library, which was written by H. Boehm and A. Demers. I could distribute w3m without this library because one can get the library separately, but I decided to include it in the w3m distribution for the convenience of installers. W3m doesn't use libwww.

Boehm GC is a garbage collector for C and C++. I began to use this library when I implemented tables, and it was great. I couldn't have implemented tables and forms without it.

Versions older than beta-990304 used LIBFTP because I was tired of writing code to handle the FTP protocol. But I rewrote the FTP code myself to make w3m completely free. It made w3m slightly smaller.

By the way, w3m doesn't use the standard UNIX regexp library or the curses library. This is because I wanted to handle Japanese. When I wrote fm, there were no free regexp/curses libraries that could handle Japanese. Now both kinds of libraries are available, and they look faster than the w3m code.

Future work

...Nothing. As w3m's virtues are its small size and rendering speed, adding more features might cost it these advantages. On the other hand, w3m is still known to have many bugs, and I will continue fixing them.