Application Note

Taming HTML Tables with AWK

Introduction

Translating tabular data into HTML-formatted tables by hand can be a vexing and error-prone process: a 2-dimensional dataset must be converted into a linear, one-dimensional string of HTML tags and data values, with all the table, record and cell tags lining up exactly. In other words, this is a job for a machine.

Here is a fast and easy method of making HTML tables that needs neither megabytes of memory nor kilobuck tools, yet is quite adequate for the small, occasional, spur-of-the-moment table. It may require a little "hands-on" adjusting of the table controls (which you often end up doing one way or another) plus some simple cut and paste. (You can even do everything from the DOS prompt.)

The AWK language lends itself very well to this transformation, doing it simply and accurately. The result is an HTML-formatted sequence ready to be pasted into a web page, or tweaked for such details as cell formatting. As a side benefit for the site concerned about page loading time, this technique can produce very compact HTML, since the output contains little beyond the table tags and the data.

The following describes a simple AWK program that reads a flat ASCII file of table data and writes the same data into an HTML-formatted file. A powerful feature of the AWK language is that its programs adapt to any number of columns (fields) per line, so counting columns is automatic. The program is described in enough detail to permit it to be modified for other circumstances, but without being an AWK tutorial. An example of using the program follows.
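To illustrate the automatic column counting, here is a minimal sketch (not part of the original program). The helper name list_fields is made up for this note; NF and $i are standard AWK. The shell function is only a convenient wrapper, and the text between the quotes is the AWK program itself.

```shell
# list_fields: hypothetical helper; numbers every field on each input line
# and reports the field count that AWK keeps in NF.
list_fields() {
    awk '{
        for (i = 1; i <= NF; i++)      # NF is set by AWK for every line
            printf "%d:%s ", i, $i
        print "(NF=" NF ")"
    }'
}

# Lines with different numbers of columns need no change to the program.
printf 'a b c\nKane_Gulch_TH 0.0 9.5 22.8\n' | list_fields
```

The loop never mentions a column count, so the same program handles a three-column line and a four-column line alike.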

Program description:

The NR==1 address begins the code block that is executed when the first input line has been read. AWK counts the fields on the line and makes the count available in NF. The opening table tags and the column titles are then written to the output file.

The NR>1 address begins the code block that handles all following data. Each line is read and the NF fields in every line are transferred to appropriate data tags.

The END address specifies the code block that is executed after the last input line has been processed. This code writes the end-of-table tag plus a summary of how many lines of HTML were generated.

The simplest approach is to have the program write to standard output and to use command-line redirection to capture the output in a file.

See Listing 1 of the table.awk program.
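Listing 1 is not reproduced in this copy of the note. As a stand-in, the following is a minimal sketch reconstructed only from the description above; it should not be taken as the author's listing. The names make_table, emit and cells are hypothetical, the counter name numout is borrowed from the printf statement quoted later in this note, and the blank line between rows seen in the sample output is omitted.

```shell
# make_table: hypothetical stand-in for table.awk, built from the prose description.
make_table() {
    awk '
    # emit(text): print one output line followed by its sequence number
    # as an HTML comment (assumed format, tab-separated).
    function emit(text) {
        numout++
        printf("%s\t<!--%d-->\n", text, numout)
    }
    # cells(tag): wrap every field of the current line in <tag align=right>.
    function cells(tag,    i) {
        for (i = 1; i <= NF; i++)
            emit(" <" tag " align=right>" $i "</" tag ">")
    }
    NR == 1 { emit("<table>"); emit("<tr>"); cells("th"); emit("</tr>") }  # heading row
    NR > 1  { emit("<tr>"); cells("td"); emit("</tr>") }                   # data rows
    END     { emit("</table>"); printf("<!-- %d HTML records -->\n", numout) }
    '
}

# Usage, with output captured by redirection:
#   make_table < inpfile > outfile
```

Stored in a file instead of a shell function, the same program text would typically be run as awk -f table.awk inpfile > outfile.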

Data preparation:

Prepare the source table data (assumed in tabular form in a flat ASCII file with no tabs) by filling the displayed spaces within each field with a unique character, such as underline. The same number of columns must be maintained throughout the file.
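When the only multi-word cells sit in a fixed-width first column, the filling can be partly automated. The sketch below is an assumption-laden convenience, not part of the original method: the helper name fill_first_column and the idea of passing the column width as an argument are made up here, and multi-word cells in other columns still need filling by hand.

```shell
# fill_first_column WIDTH: hypothetical helper; replaces spaces with
# underscores in the first WIDTH characters of every line.
fill_first_column() {
    awk -v w="$1" '{
        label = substr($0, 1, w)     # fixed-width first column
        rest  = substr($0, w + 1)    # remaining columns, left untouched
        gsub(/ /, "_", label)        # fill the spaces inside the label
        print label rest
    }'
}

# Example: treat the first 6 characters as the label column.
printf 'ab cd   1 2\n' | fill_first_column 6
```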

Then ensure the source table has spaces separating the cells. By default AWK splits each line into fields at runs of spaces (or tabs). The separator can be changed if needed, but that makes this program messier.
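For the curious, a different separator might be selected as sketched below. This is only an illustration of the standard -F option, using a comma-separated line invented for the purpose; the helper name split_on_commas is hypothetical.

```shell
# split_on_commas: hypothetical helper; splits each line at commas
# instead of at runs of blanks, so spaces inside a cell survive.
split_on_commas() {
    awk -F, '{
        for (i = 1; i <= NF; i++)
            print i ": " $i
    }'
}

printf 'Kane Gulch TH,0.0,9.5\n' | split_on_commas
```

With a comma separator the fill character is no longer needed, at the price of having to prepare the data in that form.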

Pass this source file through the table.awk program which formats each cell and writes an HTML skeleton file with the appropriate table tags.

Program operation:

The first line is read to define the headings (and the number of columns), which go into <th> cells, one per column.

All following lines to the end of the file are assumed to be data; each line becomes a <tr> row whose fields go into <td> cells.

The output will be the following:

<table>                    <!-- linenum -->
<tr>                       <!-- linenum -->
<th align=right>Title</th> <!-- linenum -->
 ...                       <!-- linenum -->
</tr>                      <!-- linenum -->
                           <!-- linenum -->
<tr>                       <!-- linenum -->
 <td align=right>Data</td> <!-- linenum -->
 ...                       <!-- linenum -->
</tr>                      <!-- linenum -->
...                        <!-- linenum -->
</table>                   <!-- linenum -->

Each line in the output file is given a sequential line number to facilitate managing any subsequent changes. If file upload time is a consideration, this feature can be disabled by commenting out or deleting all the 'printf ("%s%g%s", "\t<!--", numout, "-->\n")' lines.
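An alternative to editing the program is to strip the line-number comments from the finished file. The following sketch assumes the comments have the form shown in the sample output (a tab or spaces, then <!--N--> at the end of the line); the helper name strip_linenums is hypothetical.

```shell
# strip_linenums: hypothetical helper; removes a trailing <!--N--> comment
# (and the blanks before it) from every line, leaving other comments alone.
strip_linenums() {
    awk '{
        sub(/[ \t]*<!--[0-9]+-->$/, "")
        print
    }'
}

printf '<tr>\t<!--2-->\n' | strip_linenums
```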

A quirk to keep in mind is that the interpreter used here puts the command line into the first line of the output file when redirection is used, so this line must be deleted in the paste operation.

Detailed Example:

As an example, suppose we wish to put the following data into a web page table:

  Trail Point        Kane Todie Bullet Gov  Collin SJ Rvr
Kane Gulch TH         0.0   9.5  22.8  29.1  38.0  51.7
Junction Ruin         4.0   5.5  18.8  25.1  34.0  47.7

The first line holds the column titles, and successive lines hold the cell data.

First, fill the spaces within each cell's data with our unique character (in this case, underline) and put the result into the file inpfile:

__Trail_Point_______ Kane Todie Bullet Gov  Collin SJ_Rvr
Kane_Gulch_TH_______  0.0   9.5  22.8  29.1  38.0  51.7
Junction_Ruin_______  4.0   5.5  18.8  25.1  34.0  47.7

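Invoke AWK(*) to translate inpfile:

        C:/AWK/AWK table.awk inpfile > outfile

(Most AWK implementations expect the -f option before the program file name, for example awk -f table.awk inpfile > outfile.)

This produces the following outfile: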

  c:\awk\awk inpfile > outfile
  <table> <!--1-->
  <tr>    <!--2-->
   <th align=right>__Trail_Point_______</th>      <!--3-->
   <th align=right>Kane</th>      <!--4-->
   <th align=right>Todie</th>     <!--5-->
   <th align=right>Bullet</th>    <!--6-->
   <th align=right>Gov</th>       <!--7-->
   <th align=right>Collin</th>    <!--8-->
   <th align=right>SJ_Rvr</th>    <!--9-->
  </tr>   <!--10-->

  <tr>    <!--12-->
   <td align=right>Kane_Gulch_TH_______</td>      <!--12-->
   <td align=right>0.0</td>       <!--13-->
   <td align=right>9.5</td>       <!--14-->
   <td align=right>22.8</td>      <!--15-->
   <td align=right>29.1</td>      <!--16-->
   <td align=right>38.0</td>      <!--17-->
   <td align=right>51.7</td>      <!--18-->
  </tr>   <!--20-->

  <tr>    <!--22-->
   <td align=right>Junction_Ruin_______</td>      <!--22-->
   <td align=right>4.0</td>       <!--23-->
   <td align=right>5.5</td>       <!--24-->
   <td align=right>18.8</td>      <!--25-->
   <td align=right>25.1</td>      <!--26-->
   <td align=right>34.0</td>      <!--27-->
   <td align=right>47.7</td>      <!--28-->
  </tr>   <!--30-->
  </table>        <!--32-->
  <!-- 32 HTML records -->
  

Replace the fill character (in this example, the underlines) with spaces, delete the first line, and you should have the data in HTML format, ready to be pasted into your web page skeleton. A possible distraction is that the output contains tab characters rather than spaces.
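These clean-up steps can themselves be handed to AWK. The sketch below assumes that the first line of the file is the echoed command line (true for the interpreter used here, but not for most others) and that underline appears nowhere except as the fill character; the helper name clean_html is hypothetical.

```shell
# clean_html: hypothetical post-processing step for the generated file.
clean_html() {
    awk '
    NR == 1 { next }        # assumption: line 1 is the echoed command line
    {
        gsub(/_/, " ")      # fill character back to spaces
        gsub(/\t/, " ")     # each tab becomes a single space
        print
    }'
}

# Usage:
#   clean_html < outfile > table.htm
```

If your interpreter does not echo the command line, remove the NR == 1 rule so that the <table> line is not lost.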

Check out the result with your web browser, adjust any necessary alignment, border, cellpadding, cellspacing, or width values, and you are ready for the web!

References:

Book:
The AWK Programming Language, 1988
by: Aho, Kernighan and Weinberger
Publ: Addison-Wesley Publishing Co.
ISBN: 0-201-07981-X

Book:
Effective AWK Programming
by: Arnold Robbins
Publ: SSC (+1 206-FOR-UNIX)
ISBN: 0-916151-88-3
Web: http://www.ssc.com
Email: sales@ssc.com
Also published by the Free Software Foundation as "The GNU AWK User's Guide"

Availability:

A compiler version (tawk) is available from:
Thompson Automation
5616 SW Jefferson
Portland, OR 97221
USA
North America: 800/944-0139
Tel: +1 503 224 1639
Fax: +1 503 224 3230
Web: http://www.tasoft.com/
Web: http://www.teleport.com/~thompson/

A compiler version (awk) that can generate standalone interpreted .exe programs is available from:
Mortice Kern Systems
185 Columbia Street W
Waterloo, ON
N2L 5Z5
Canada
North America: 800/265-2797
Tel: +1 519 884 2251
Fax: +1 519 884 8861
Web: http://www.mks.ca/solution/tk/

(*)The author used a DOS-based AWK interpreter written by Rob Duff that is apparently no longer available.

More information about the current state of AWK is available from -

Web sites where more information may be found (extracted largely from the above FAQ):

E. Stiltner


This application note has been brought to you by Skunk Creek Computing Services.

Address comments or otherwise to:
stiltner[AT]sccs.com

Legalities - -
The above information is based on the author's experience and observations. This information is supplied with no warranty nor any guarantee. It is the user's sole responsibility to validate it for his or her intended application.


URL of this page: http://www.sccs.com/sccsadoc.htm
Revised '5-Feb-2003,10:55:18'
Copyright © 1999, 2000 SCCS.