Subject: Elements of Electronic Text Style

Elements of E-Text Style
Version 1.0
9 August 1993

This file should be named ESTYLE10.TXT or estyle10.txt.


Copyright (c) 1993 by John E. Goodwin.  All Rights Reserved.

You may make and distribute verbatim copies of this work for non-
commercial purposes using any means, provided this copyright notice is 
included in all such copies.

Contact:  John E. Goodwin
          P.O. Box 6022
          St. Charles, IL  60174
          jegoodwin@delphi.com

[John Goodwin is available to consult, write, and teach courses on E-
text issues and Internetworking]


Abstract:  This manual discusses how to use electronic text (E-text) as 
a communications medium distinct from the print media.  The manual is 
written in a non-technical style, such as a humanist-of-little-brain 
might enjoy reading.  

  o  You can learn how to write effective E-text for personal, business, 
and scholarly communication.  

  o  It includes sections on preparing forms and texts for electronic 
response and on writing effective and business-like E-mail letters.  

  o  There is a brief section on Standard Generalized Markup Language, a 
coding standard of interest to humanists.


Just to prove how non-technical it all is, here is an exceptional lapse 
into technical jargon, in case you know what the Internet and FTP 
archives are:

    This work is a companion volume to _E-Mail 101_, available free as
    ftp://mrcnext.cso.uiuc.edu:/etext/etext93/email025.txt.


<title>  Elements of E-Text Style

  =Preface=     An Apology for E-Text

  =Part I=      Writing for an E-Text Audience

  =Part II=     Specific Differences of Style and Mechanics

  =Part III=    A Very Brief Style Manual

  =Appendix A=  Technical Details:  Relationship to SGML and TEI

  =Table I=     Full Table of Contents  (go to very end of this file)


<Preface>

    This work grew out of my earlier course notes published under the 
title _EMAIL 101_.  It was originally projected to be a three chapter 
section concerning the special needs of writers who wished their works 
to be transportable by the electronic networks.  The chapters were not 
included in the original release as they existed only in outline form.

Over the course of Summer 1993 I gradually came to realize that E-text 
was a communication medium in its own right, with its own needs and 
conventions, its own strengths and weaknesses, and not merely the 
bastard child of the print medium.  Consequently, many questions of 
style, long ago settled for print media and fixed into rules in style 
manuals, needed to be re-examined in light of the new medium.

Since it seemed to me that no one had set out to treat the stylistic 
considerations of writing E-text, at least at any length, I decided to 
expand my three chapters into the present work.  I set out to write down 
systematically some observations I had made concerning the differences 
between E-text and "ordinary" writing.  I treat E-text as a legitimate 
medium of expression, one that must be addressed on its own terms and 
without unnecessary reference to how the words might look on paper or 
how the work might be useful if printed out.

For reasons that I will discuss at length in the first part, only a 
small fraction of E-text will ever see the light of print.  While paper 
may offer a better resolution image and a more perspicuous whole, E-text 
excels at ease of production and portability.  It can be copied simply, 
transported great distances in seconds by electronic networks, and 
stored on magnetic or optical media--floppy disks, hard drives, and 
CD-ROMs--that are less bulky and cheaper than paper.  

The extraordinary growth of E-mail in the past few years, from a medium 
used by a few scientists and government officials to one accessible to 
millions, often in a humanistic or business setting, demands that we 
give the writing of E-text the attention it deserves.  If you wish to 
communicate effectively, you will have to master this new medium.  It is 
a necessary part of education--if only we knew what to teach!

Good writing is, in many respects, the same for any medium.  And the 
first thing any writer learns is that their** writing must fit both the 
audience and the medium being used.  We cannot pretend any longer that 
we are writing for print or that our audience will be looking at 
anything other than a computer screen.

  **  I deliberately use "their" as an ambiguous pronoun throughout.

Just as the print media differ among themselves depending on the 
intended audience, expected lifetime of the text, and peculiarities of 
the medium, so E-text differs from print.  

This work is organized as follows:

  In the first part we delineate the major differences between the print 
media and E-text.  

  In the second part, we discuss specific issues such as techniques
for designing a visually appealing layout, or representing characters.

  The third and final part is a brief style manual for writing E-text.  
It is not offered as a set of prescriptions, but as an example of how 
the principles in the second part can be realized in practice.

                       +     +     +

In this introductory section, I would like to make a brief apology for 
E-text.  It is not usual, in discussing the print media, to begin a 
manual on style with a defense of the worth of the medium; however, E-
text is so new that many persons will say "Why bother with it?".  They 
deserve an answer.

The most insidious objection to E-text is the claim that it is just 
printed text before it has been printed out.  In effect, this asserts 
either that

  (1)  there is no difference between the needs of E-text and the needs 
of print; or

  (2)  all text is printed out before being read.

The second premise is demonstrably false--most E-mail correspondence and 
anything longer than about 25 pages obtained over a computer network 
suffice as examples.  

The first premise requires a more extended answer, since it is the 
source of a great deal of confusion.  In fact, the entire first part of 
this work is devoted to refuting it.  In this brief apology I will 
answer two simpler objections:  that E-text is so esoteric that it is of 
no interest to ordinary persons; or that it is so commonplace as to be 
beneath our consideration.  I call these two objections the "Ham Radio" 
and "Telephone" objections, respectively.


    Not every communications medium is of interest to a large number of 
persons.  Take, for example, Amateur Radio.  Using short-wave radio to 
communicate requires a fair technical knowledge and special equipment.  
Because of these two investments, neither the medium nor the skills 
required to master it are common.  This situation is very similar to 
that of computers in the late '70s.  Computers were not commonplace, 
being owned mostly by hobbyists.  Communication and distribution of 
information were primitive, often by floppy disk passed hand to hand.  
And the special programs required to create and read E-text--word 
processors--were uncommon and required special skills.

On the other hand, some will object that E-text is now so commonplace 
that it needs no consideration.  You don't read style manuals about how 
to talk on the telephone, do you?  Although some scholars may discuss how 
telephone conversation differs from the ordinary face-to-face variety, 
most of us use telephones un-self-consciously.  E-text is like typing a 
letter.  Who cares?

Although the *mechanics* of talking on a telephone are trivial, the 
social implications are not.  One can point out, for example, that to 
most people, their parents have become persons that they talk to on the 
telephone and not persons that they work with every day and see face-to-
face.  The social implications of this are enormous; the technology 
trivial.  

Similarly with E-text:  while the mechanics are easily mastered and 
perhaps of little interest, E-text together with global computer 
networks make possible a form of community that didn't exist prior to 
the medium.  The sort of community that will form around E-text is 
different from the kind of communities that are centered on the 
telephone.  Rather than family or casual friends, it is likely to be a 
community that cares about a single issue or agenda.  

These communities can range from complex communities like companies or 
groups of scholars, to persons sharing a single, simple interest.  
Already, in our society, we find that technology has allowed us to adopt 
a pattern of individualism never seen in the world before.  Most face-
to-face communication is with your immediate family, your co-workers, 
and perhaps a few friends.  These friends are not as likely to live next 
to you as in a small town, and you see them less often.  

E-text both carries this atomization to its extreme and simultaneously 
offers a way out from its worst effects.  It is possible, using the 
medium, to form important relationships with persons you have never seen 
or talked to--this is individual atomism in the extreme.  At the same 
time, E-text provides a communications medium that can go beyond.  It 
solves the problem, inherent in much of our society, of shallow 
relationships with other humans.

These new, deep relationships can be business or scholarly, or just 
old-fashioned friendship.  Thus communicating well carries social 
implications that go far deeper than talking well on the telephone.  How 
you write E-text may affect how you *appear* to potential friends, 
clients, and one day perhaps even family.**

  ** It is only a matter of time before parents of college children 
realize they can have a much closer relationship with their children for 
the 10 dollars a month it costs to open an E-mail account.

Despite the unnaturalness compared to talking, in many ways E-text is 
superior to the telephone as a way to "keep in touch".  The telephone 
requires that both persons be available simultaneously.  Most 
conversations are short and business-like, with marathon sessions being 
reserved for close family and a few friends.  

But it is not for writing the occasional personal note that one needs a 
style manual.  Unlike the telephone, E-mail has more serious uses--the 
same uses that print media have.  It is used for business, persuasion, 
publication, and scholarship.  E-mail may become as commonplace as 
telephone, but it will not be approached with the same casualness.  


    Over the course of the past year or so I have seen collaborations of 
individuals in many fields spring up.  These collaborations at first 
were of course among computer scientists.  Then, in the last couple of 
years the Scientists have caught on.  There are signs that all academic 
disciplines will soon have such collaborations.  The cost in equipment 
is low and the advantages great.  Software for business "working groups" 
is already in the marketplace.  

Collaboration by E-mail--and a consequent reliance on E-text--may become 
the dominant social model for certain kinds of collaboration:  E.g., 
within a company or scholarly community--wherever the persons cannot 
meet face to face.

There are many who say that E-text as we now know it--the typewriter-
like production of character-oriented terminals--will soon give way to a 
new medium, multimedia.  In this view, newer computers will spawn newer 
media and the old ones will be forgotten.  In five years, ten at the 
most, E-text will be a thing of the past.  Surely, the argument goes, we 
should not invest time in perfecting a medium that is little better than 
a fad.

Multimedia indeed shows great promise.  I have no doubt that soon it 
will be possible to mail graphic images, audio, and video clips along 
with text.  Printers will print not only black and white but color.  And 
visual formatting information like font, point size, and so on will be 
sent alongside the basic text.  Not only that, but these capabilities 
will become part of every household, every phone system, cable system, 
and cellular communications network.  Personal computers will replace 
telephones as the "communications center" of the household.

The vision of multimedia is one of old media--color magazines, 
television, telephone, radio--being reborn in the new guise of 
electronics.  But what do you think will be a large component of each 
and every multimedia message?  Could it be that most of it will be E-
text?  I think multimedia will turn out to be a lot like a letter to 
home.  We may send an occasional picture, or even an audio cassette, but 
most of the communication will be in our writing.  

Ultimately, writing is easier than taking photographs or editing video 
clips--though not as easy as talking.  It takes less time, less capital, 
and less effort.  Multimedia may be good for advertising, for writing 
textbooks, and for fun; but for just plain communicating?  If it 
requires more thought or needs to reach more persons than a short 
telephone call, it will be E-text.  Multimedia will fill the niche of 
four color magazines, coffee table art books, the biology textbooks, and 
advertising.

Look around you, at your bookshelves, and notice how many books have no 
pictures.  Think how many typed letters your office sends out compared 
to the number of four-color brochures it creates.  Most information is 
disseminated by the cheapest possible means.  Right now, electronic text 
is that cheapest means.  As more and more persons learn how to get it, 
it will become the dominant medium.  E-text is the black and white print 
of the electronic age.

The uses of E-text are as diverse as the uses of print.  The chief 
innovation of the new medium is the fact that it places the capability 
to publish in the hands of *anyone*.  The capital required to spread 
information or ideas has been reduced to a level any person, or at least 
any community of persons, can afford.

The E-text revolution is that individuals are no longer dependent on 
institutions or even businesses to create, share, and gather 
information.**  Every interest and splinter group, every church or 
synagogue, every would-be author, student, or scholar can collaborate 
with others, write, and share texts.  

  ** They are still dependent on hardware, software, and 
telecommunications.

As E-text becomes more and more acceptable, it will become the medium of 
expression used by the masses.  If you wish to reach them, you will have 
to learn to write it effectively.  Education--real education--has always 
been a rather solitary effort.  The right conditions seem to involve 
access to a good library, a chance to talk with collaborators, write new 
material, and have it discussed by the community of interested persons.  
E-text can bring these necessary conditions for education out of the 
university to the simplest home.


    E-text is at the stage the European vernaculars were at the time of 
the Renaissance.  There were many doubters who pointed to the 
established Latin tongue as the medium of communication.  But, in time, 
reality forced even the scholars to yield.  A revolution was 
accomplished in which masses of ordinary people could own books and even 
on occasion produce them.  The implications for society and learning 
were staggering.

Like that earlier time, when print was new, there is now much innovation 
and experimentation, and the wise practitioner will sift carefully the 
techniques and suggestions offered both here and by others.  In time we 
shall have our Dantes, our Bacons, and our Shakespeares; the persons who 
will show us how to make this new medium not only a utilitarian one but 
a sublime one.  For now, let us take those first hesitant steps down 
that path.


<Part I>  Writing for an E-Text Audience:  Basic Problems

Writing for an E-text audience is very much like writing for a print 
audience, but there are subtle differences.  Nowadays, both works 
destined for print and works aimed at the global networks are likely to 
be created on a personal computer.  The advantage of being able to make 
incremental changes to a manuscript, and to create near print-quality 
works with a laser printer--not to mention the advantages of spell-
checkers, automatic footnotes and the like--means that both kinds of 
author will be using a computer.  But one will be aiming for an 
effective and attractive *printed* manuscript and the other will be 
aiming to accomplish the same end on a computer screen.

The difference between E-text and print comes down to two factors:  

  (1) it is not currently possible to create a file that simultaneously 
looks good in print and on the screen, yet is universally accepted by 
all computer programs; and 

  (2) the least common denominator computer screen has a lower 
resolution, a smaller viewing window, and a more limited repertoire of 
visual effects than even a typewriter.

This Chapter will address three questions:

  o  Why write for an E-text audience at all?

  o  Is it possible to write for both audiences at the same time?

  o  How does writing for an E-text audience differ from writing for a 
Print Audience?

In Part II we will explore the extent to which you can have it both 
ways--strategies for getting as close as possible to the Holy Grail of 
Electronic Communications, a file everyone can read that looks just as 
good on the screen and in print.

Part III presents the mechanics of creating E-text in the form of a very 
brief style manual.  The issues of the previous two chapters are 
summarized in a series of suggestions for creating your own effective 
style.

The last Chapter of this part of the course discusses copyright issues 
that affect the distribution of E-text.  These issues, important as they 
are for print media, become paramount concerns when copying your text is 
as easy as pressing a button.

  =Section 1.1=  Why Write for an E-text Audience?

  =Section 1.2=  Is it Possible to Write E-Text and Print at the Same 
Time?

  =Section 1.3=  Differences between E-Text and Print Media

  =Section 1.4=  Version Control


<Section 1.1>  Why Write for an E-text Audience?

The conventional position is that computers are basically machines for 
creating printed text.  This position, contrary to the one taken here, 
has a number of advantages:

  (1) The resolution (appearance) of the final product is superior to 
anything that can be created on the screen;

  (2) Print media are easier to handle and browse;

  (3) Paper is universally accepted and readable--no special hardware or 
programs are required; 

  (4) The product is compatible with all the information-handling 
systems that have been developed for paper (files, libraries, 
catalogues, ... ); and

  (5) The author's copyright is easier to maintain because the final 
text is harder to copy.

These are overwhelming advantages.  I call the resulting situation--
paper is the medium of storage and standard of communication while 
computers and printers are just tools for creating paper--the "cellulose 
interchange standard".  It is well established; it works; and it is hard 
to beat.  It is the norm even in the computing world.  The paper bias, 
e.g. of word processors, is obvious.

Against this view is a reality of modern life:  it is becoming very much 
cheaper to store information in electronic form and comparatively more 
expensive to store it as paper.  Let's consider some facts:

  o  A 300 page book takes up a Megabyte of memory--around one 50 cent 
floppy.

  o  CD-ROM storage lowers the cost to a few pennies for a book.  A 
single CD-ROM can store several hundred books.

  o  An 8 mm video tape can store several *thousand* books.  This means 
that the information in the Harvard University library system, one of 
the world's largest (6 million volumes), would take up a few thousand 
cassette tapes presently costing less than $100,000.
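
The arithmetic behind the last item can be checked with a short 
sketch.  The book size, the number of volumes, and the total cost come 
from the figures above; the number of books per tape is my own 
assumption, since "several thousand" is not an exact figure.

```python
import math

# Figures taken from the text above.
BOOK_MB = 1                  # a 300 page book is about one Megabyte
LIBRARY_VOLUMES = 6_000_000  # Harvard University library system
BUDGET_DOLLARS = 100_000     # stated upper bound on total tape cost

# Assumption (not from the text): "several thousand" books per tape
# is taken to be 3,000 for the sake of a round number.
BOOKS_PER_TAPE = 3_000


def tapes_needed(volumes: int, books_per_tape: int) -> int:
    """Number of tapes required to hold the given number of volumes."""
    return math.ceil(volumes / books_per_tape)


tapes = tapes_needed(LIBRARY_VOLUMES, BOOKS_PER_TAPE)
library_gigabytes = LIBRARY_VOLUMES * BOOK_MB / 1000
max_price_per_tape = BUDGET_DOLLARS / tapes

print(f"{tapes} tapes, about {library_gigabytes:.0f} gigabytes in all")
print(f"tapes must cost under ${max_price_per_tape:.2f} each")
```

Changing BOOKS_PER_TAPE shows how sensitive the tape count is to that 
one guess; the other figures are the ones quoted in the list.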

Given that, in addition, you can 

  revise electronic text easily, 

  make copies faster, 

  send it further in less time and at less expense, 

  store it more cheaply, 

  print it out, 

  send it as a fax, 

  and convert it to other formats, 

it will soon be a commonplace that *even for documents that are designed 
to be printed out and looked at on paper* the principal means of storing 
and exchanging information will be in electronic format.

So let's get this straight:  we are not discussing whether it is better 
to store and exchange information on paper or electronically.  The bulk 
of information will soon be stored on magnetic media and exchanged 
electronically.  What we are discussing is whether it makes more sense 
to prepare *electronic* documents that look good printed out or ones 
that look good on a screen.

Paper will become (in fact already is) a luxury reserved for the cream 
of the information crop--just as four color printing is reserved for Art 
books, glossy magazines, and advertising, while most other printed 
information is black and white.  You will want the 1% of your 
information you use most often in print form.  You won't be *able* to 
get or afford print versions of most information, any more than you can 
afford to buy everything in hard cover or print every brochure in four 
colors.

And is this so bad?  Why not have a good print library *and* good 
electronic text?  My public library, a very good one, has around 100,000 
volumes and cost several millions to build.  If I want something they 
don't have, I have to wait a week for interlibrary loan, or a xerox 
copy, or a fax.  In most cases I would be happier with an electronic 
version I can look at *today*.  

The point is that paper doesn't compete with E-text; E-text is probably 
information you *would not have* in any other form.  It's books you 
wouldn't have because you can't afford 1000 books right now (but you can 
afford a couple of CD ROMs); it's text you can view (and download) at 
your public library that the library couldn't afford in paper; and it's 
free stuff that can't be distributed for free any other way, because 
paper just costs too much.

Do you have to write E-text?  You do if you want your audience to 
include the 25 million people with E-mail access (projected to be 75 
million in three years).  You do if you want your message to travel as 
far as possible--even if it is intended to be printed at the other end.

So you will write electronic text that will never be printed because you 
*have to*.  That means you need to learn to write effective E-text, 
because there really is no alternative.  Fortunately, if you can write 
well in the print medium, you can write well in E-text.  We'll give you 
a few tips in a moment, but first we need to dispel the notion that it 
is possible to write for both media at the same time.


<Section 1.2>  Is it Possible to Write for E-text and Print at the Same 
Time?

Here we come to the claim I made above, that it is impossible to satisfy 
all three of these criteria with a single file:

  (1) the file can be read by any computer

  (2) the file creates good looking print

  (3) the file looks good on the screen

You can pick two out of three, but you can't get all three.  This is 
unfortunate, but it is also true.  

The only kind of file that is *universally* accepted is the plain text 
file, also called the common =ASCII= file.  Actually, even this is an 
overstatement.  ASCII, the American Standard Code for Information 
Interchange, is a very specific code for representing text.  A 
*fraction* of that code can be translated without difficulty to 
virtually all computers, including the fifty-two letters of the English 
language (both upper and lower case), the ten digits, and a handful of 
punctuation marks.**  But the rest of ASCII--some punctuation marks and 
special "control" characters used by computers (including the common 
tab!)--are off-limits if you want your message to have a truly global 
reach.

  **  It is important to remember that some files that need the full 
ASCII repertoire, e.g. the source code of computer programs, may not 
travel well.
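
One way to see which characters in a file fall outside the portable 
fraction is to scan for them.  The following is only a sketch:  the 
letters and digits are as described above, but the list of "safe" 
punctuation marks is a deliberately conservative guess of mine, and the 
names SAFE_PUNCTUATION and risky_characters are invented for 
illustration.  Widen the set to suit the systems you need to reach.

```python
import string

# Assumption: the text does not list the "handful of punctuation
# marks" that travel safely, so this set is only a conservative guess.
SAFE_PUNCTUATION = " .,;:?!'\"()-"
SAFE_CHARS = set(string.ascii_letters + string.digits + SAFE_PUNCTUATION)


def risky_characters(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, character) for each character outside SAFE_CHARS."""
    found = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch not in SAFE_CHARS:
                found.append((line_no, col_no, ch))
    return found


sample = "Plain text travels well.\n\tA tab does not; nor does a ~ or a {."
for line_no, col_no, ch in risky_characters(sample):
    print(f"line {line_no}, column {col_no}: {ch!r}")
```

The tab is reported along with the other doubtful characters, which is 
the point of the warning above:  a character you cannot even see on the 
screen may be the one that fails to arrive.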

Anyway, if you want a file that looks good in print (criterion 2) and is 
also a plain text file (criterion 1), you *have* to give up criterion 3 
--the file will not look good on the screen.  This is because creating 
"laser quality" output with a "book quality" appearance requires printed 
commands, called =markup=, to be interspersed with text.  This can be 
minimized, but the cost is a text without even the visual effects that 
are possible with a typewriter--underlining, superscripts and 
subscripts, and diacritic marks to name a few.  If you want these 
effects and others typical of book-quality printing--multiple fonts, 
automatic footnotes, and so on--then the markup burden makes the file 
unpleasant to read, i.e. not effective as E-text.

Finally, the question of how to get as close as possible to satisfying 
all three criteria, along with a discussion of formats and markup, is 
left to the next chapter.  There has been some success at creating files that look 
good on the screen and in print--using so called WYSIWYG ("what you see 
is what you get") word processors or using SGML ("Standard Generalized 
Markup Language")--but these are not in universal use.  I.e., you have 
to give up criterion 1 to get numbers 2 and 3.

So right now the plain truth is you can have any two out of three but 
not all three.  Sorry.


<Section 1.3>  Differences Between E-text and Print Media

Creating a manuscript on a computer is quite a different process from 
the old fashioned method of revising a manuscript by (literally) cutting 
and pasting typescript, of maintaining bibliographies, and of checking 
spelling against a dictionary.  Most of the peculiarities of using a 
computer are true whether or not the output is meant to be the printed 
page.  Nevertheless, it helps to enumerate a few, since these 
considerations apply in spades to producing E-text:


  o  Computer aided Research and Organization Methods

Note taking, creating bibliographies and databases, and gathering 
information now involve all the techniques discussed in my course 
notes, _EMAIL 101_.**

  **  Available free from:

          ftp://mrcnext.cso.uiuc.edu:/etext/etext93/email025.txt.  

  o  Rigidity of Format and Outlining

Word Processing programs enforce a formatting and outlining discipline 
to a degree that would be unusual in the old style.  Outlining 
encourages a strict hierarchical style and the automated formatting 
features make for a more rigorous observance of whatever conventions are 
built into the program used.  

Rigorous formatting is a virtual requirement for E-text, since it is 
otherwise impossible for programs to tell where a chapter starts, say, 
or what portions of the text are italicized.

Spellchecking programs are another example of (welcome?) rigidity of 
style imposed by the new methods.  Rigorous spelling is what enables the 
SEARCH command to find all references to a given subject.

  o  Incomplete drafts are more likely to be circulated

The ease of making changes leads to a more collaborative style of 
working in which draft after draft (not uncommonly 10 or 20) is 
circulated to a large group for comments.  Often documents are *never* 
final, but are instead continuously revised.  It is useful to compare 
this process to the way computer programs are written:

  First a trial version, or "alpha" version is circulated to a few 
select individuals.

  Next a beta version, mostly complete and supposedly correct, is given 
wide circulation as a trial balloon.

  Finally there is a succession of ever more refined upgrades ranging 
from minor changes to major "releases".

It is a good guess that most working documents will be produced in a 
similar way.  In a way, this is similar to the print industry's 
"editions" and "printings", except, like the manuscripts themselves, 
there is more consciousness of the structure of the process when 
computers are involved.  Also, the cost of producing minor revisions is 
less, so there is less fanfare for a new edition--and more trouble with 
version control!

  o  Collaborative efforts are easier.

When the drafts we have been discussing are circulated by E-mail, the 
working style discussed above becomes even more natural.  

  o  Backup copies are necessary

Although one might take the precaution of xeroxing an important 
manuscript, failing to make backups of works stored in magnetic media is 
sheer folly for anything that takes longer than 15 minutes to write.
There is a whole new discipline of saving work frequently to disk, 
copying it to backup floppies or tape, and so on.


<Section 1.4>  Version Control

The problem of multiple versions is a big one any time the revision 
process is easy or frequent.  Most computer systems keep track of the 
date a file was last modified--so you can tell which of seven files.**  
But even time stamps won't help if some files are exact copies of 
others--as they should be if you are doing proper backups.  It helps to 
use version numbers like "3.1.5a" to distinguish the multiple copies.  

  ** (three on various floppies, one in a directory "/project/old" and 
two in directory "/project/new", is the most current)

As with any tree structure,** it is often good to use =dotted decimal= 
notation:  Version 5.18.2 means release 2 of minor revision 18 of major 
revision 5.  Version 0.1 is probably a rough draft.  

  ** This concept is discussed in part III of my course, _EMAIL 101_.

You have to be careful:  this notation can either represent successive 
versions or divergent versions.  For example, 1.4.3 can mean the third 
minor change to version 1.4, which was the fourth major change to 
version 1.  This is the most common scheme.  It provides an odometer-
like method of numbering the versions.  It differs from an odometer in 
that you are not forced to increment the next place when you get to the 
tenth revision.  As long as the revision path is a straight line, with 
each version being derived from the version before it, this scheme will 
work.
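
The successive scheme is easy to handle by program, which is part of 
its appeal.  Here is a minimal sketch, assuming labels made only of 
whole numbers separated by dots; a label with a letter suffix, such as 
"3.1.5a", would need extra handling.  The function names are my own.

```python
def parse_version(label: str) -> tuple[int, ...]:
    """Turn a dotted decimal label such as "5.18.2" into (5, 18, 2)."""
    return tuple(int(part) for part in label.split("."))


def newest(labels: list[str]) -> str:
    """Return the latest label, assuming a straight-line revision path."""
    return max(labels, key=parse_version)


# Each place is a whole number, so 1.10 follows 1.9 without forcing
# the next place to increment, unlike an odometer.
print(parse_version("5.18.2"))
print(newest(["1.9", "1.10", "1.4.3"]))
```

Comparing the labels as numbers, place by place, is what matters here.  
Sorting them as ordinary text would put "1.10" before "1.9".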

It gets into trouble if there are any branches in the revision path.  
Suppose two versions, [a] and [b], are both derived from 1.2.  Does 
1.2.1 refer to [a] and 1.2.2 to [b]?  This is a natural way to describe 
*branching* versions, i.e. with a tree notation, but you can't use both 
schemes simultaneously.

It's a good bet that Version Control software--programs that keep track 
of multiple versions, store them as "deltas", or difference files, to 
save space, and allow you to recover *any* past version or display 
differences between versions--will become more common and integrated in 
word processing software.
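
The idea of a "delta" can be illustrated with the difflib module that 
comes with Python.  This is a sketch of the principle only, not of how 
any particular Version Control program works:  the delta produced here 
keeps every line of both versions, so it shows how a past version is 
recovered but does not itself save space.  The two sample versions are 
invented.

```python
import difflib

version_1 = ["E-text is new.\n", "Paper is old.\n"]
version_2 = ["E-text is new.\n", "Paper is familiar.\n", "Both have uses.\n"]

# The delta records every line together with a marker saying which
# version it belongs to ("- " old, "+ " new, "  " both).
delta = list(difflib.ndiff(version_1, version_2))

# Either version can be recovered from the delta alone.
recovered_old = list(difflib.restore(delta, 1))
recovered_new = list(difflib.restore(delta, 2))

print("".join(delta), end="")
```

Printing the delta is also a quick way to display the differences 
between two versions, the other service mentioned above.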


<Part II>  Specific Differences of Style and Mechanics

This part enumerates some of the differences between E-text and print 
media and discusses them in a general way.  Actual recommended practice 
is deferred until Part III, which takes the form of a conventional style 
manual.  

In the long run, the reader will find the material in this part more 
valuable than the style manual.  The manual, after all, is only one 
possible concrete realization of the principles discussed here.  It is 
better to give thought to these principles in the context of your own 
writing than to slavishly follow the manual.

  =Section 2.1=   Differences Traceable to Physical Media

  =Section 2.2=   Differences in Style

  =Section 2.3=   Differences in Process

  =Section 2.4=   Differences in Repertoire

  =Section 2.5=   Differences in Layout

  =Section 2.6=   Searching and Hypertext

  =Section 2.7=   Copyright Issues

  =Section 2.8=   The Parts of a Book

  =Section 2.9=   The General Theory of Markup (SGML)

  =Section 2.10=  Summary:  Basic Tricks of the Trade


<Section 2.1>  Differences Traceable to Physical Media

The basic differences between E-text and print can be traced to the 
physical differences of the media, and to the fact that the needs of the 
human reader and computer must coexist.**  

  (1)  The human's need for visual relief within a 24 line frame.

  (2)  The computer's need for a rigid hierarchy and consistent 
spelling.

  (3)  The limitations of the =character set= available

  (4)  The limited possibilities of different renderings of the 
characters, e.g. by font and placement on the page; and 

  (5)  a consequent dependence on =delimiters= and structure for 
rendering.

Taken together, these factors account, in the first instance, for most 
of the differences that there are between E-text and print.  In this 
Part we will primarily be working out the implications for representing 
text and developing techniques for dealing with the limitations.

  **  I thank Michael Hart for pointing out this second requirement to 
me.

The small viewing window of E-text--commonly 24 lines and often less--
has a number of consequences.  Combined with the fact that moving around 
within a document requires one of:

  scrolling (moving a scroll bar with a mouse);

  paging (hitting a single key, such as "return", repeatedly); or

  searching (using special commands to find sequences of characters);

we can see why E-text is hard to navigate, or, as I say, E-text is less 
perspicuous than print.  I think this limitation of the medium is a 
greater bar to its widespread acceptance than visual resolution.  

This limited window and lack of perspicuity have a number of immediate 
consequences for writing style:

  (1)  paragraphs must be short enough to present at least one break in 
any given 24 line window.  

For practical purposes paragraphs much over 10 lines are anathema.  This 
means that the flow of thought must be broken up on a finer scale than 
is common in print--though not, perhaps, as radically as in newspapers.
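
The ten-line rule of thumb is easy to check mechanically.  The 
following Python sketch is one possible checker, assuming that 
paragraphs are separated by blank lines.  The function name and the 
default limit of 10 are my own choices, not part of any standard tool.

```python
def long_paragraphs(text, limit=10):
    """Return (first_line_number, line_count) for each paragraph over `limit` lines.

    A paragraph is taken to be a run of non-blank lines.
    """
    found = []
    start, count = None, 0
    for number, line in enumerate(text.splitlines(), start=1):
        if line.strip():
            if count == 0:
                start = number        # first line of a new paragraph
            count += 1
        else:
            if count > limit:
                found.append((start, count))
            count = 0                 # a blank line ends the paragraph
    if count > limit:                 # the text may end without a blank line
        found.append((start, count))
    return found


sample = "short paragraph\n\n" + "\n".join(["a long line"] * 12) + "\n\nanother\n"
print(long_paragraphs(sample))
```

Run against a draft, it lists the paragraphs that deserve a second look.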

  (2)  E-text is much more linear than print.

Signposts, such as enumeration and other cues, organization, and 
arrangement in sequence are much more critical.  The trick is to 
structure your argument so that the mainline reader can read it in 
sequence.  Side-trips have a much higher penalty for the reader than in 
print.

This statement, that E-text is more linear than print, seems to go 
against the promise of "hypertext", i.e. documents in which you can skip 
around to your heart's content.  In fact, it is precisely because E-text 
is so linear that hypertext is important.  It makes navigating E-text 
manageable.  

The high penalty for skipping around (or passing through long sections) 
has a number of other implications:  

  (1) tables of contents should be distributed throughout the text, as a 
sort of preview of the following section.

In effect, these tables become "hypertext menus", allowing the reader to 
locate the appropriate section with a SEARCH command.  This gives as 
much aid as possible to the reader.  However, if the text is long and 
there are many logical levels, then the full table of contents should be 
provided at the *end* of the document (Not the beginning!  We don't want 
the reader to have to scroll past a very long table to begin reading.)  
The Table of Contents is discussed at length in =Section =.

  (2) footnotes should be located immediately after the paragraph to 
which they refer.

An E-text is logically a scroll.  There is no such thing as a "page", 
except as an arbitrary marker added to synchronize the E-text version 
with a print version.  Because of the small viewing window, the only 
place you can put a footnote is after the paragraph.  In effect, it 
becomes a "small print" section with added detail.

  (3) bulleted lists should be relatively short and should not turn into 
full-fledged tables.  

Instead, they should be broken up into sub-lists if possible, with no 
more than ten items in one run.  Tables and long lists should be placed 
in appendices or separate files unless they are exceptionally compact or 
unless viewing them is necessary to the flow of ideas.


<Section 2.2>  Differences in Style

The most marked characteristic of E-text style is brevity.  We have 
already commented on brevity of paragraphs.  The same can be said for 
the overall work.  200k, or about 70 printed pages, is already quite 
long.  A larger work should probably be broken up into 100-150k 
segments.  

Two other stylistic characteristics are =hierarchy= and =rigidity=.  
Given that computers, in popular culture, are often associated with 
mindless authority or fascism, these are not promising characteristics 
for the would-be writer of E-text.  The words "hierarchy" and "rigidity" 
are just convenient labels, however.  We could use more complimentary 
terms, such as "logical organization" and "consistency of style".  

In any event, the hierarchy and rigidity apply to the formatting and not 
the ideas expressed.  Qualities such as brevity, an organizational 
structure that helps the reader, and consistency in spelling, grammar, 
punctuation, and layout, are generally accounted hallmarks of good 
style.  In fact, there is a trend in print media towards these 
qualities, as well as towards shorter paragraphs, perhaps occasioned by 
the widespread use of computers for preparing printed text.

It might be said that the style advocated is essentially that of 
journalism and the classic pyramid scheme for writing newspaper 
articles.  This is true to a point.  E-text, however, is much more 
linear than a newspaper article.  Above the article level the typical 
newspaper is a jumble of many articles bundled together in a very large 
package.  The E-text equivalent of a newspaper will almost certainly be 
a large number of separate files, indexed and arranged in a directory 
hierarchy.**  Long files purporting to be E-journals are very tedious to 
read, precisely because they violate the brevity maxim.

  **In fact this is the case with Usenet Newsgroups.


    Another stylistic difference is repetition.  Saying the same thing 
in different contexts, even verbatim, is more acceptable in E-text than 
in print.  Since it is harder to navigate E-text, repetition** saves the 
reader the time of looking up references.  Material that is repeated in 
several places is a good candidate for a footnote or "small print" 
section. 

  ** Repetition is a technique widely used in computer programming to 
save the time needed to follow up a reference.  In this context it is 
called "in-line coding".

On top of the major stylistic differences, there are numerous minor 
points of grammar and markup (punctuation) that are covered in Part III.  
These are almost at the quirk level, and have little effect on style 
=per se=, so we don't consider them here.


<Section 2.3>  Differences in Process

Electronic text and printed text created on computers are prepared in a 
different fashion from print.  E-text typically passes through more 
stages and is in a rougher form than print.  This does not prove that 
print is a superior medium because the product is more polished; rather, 
the capital investment required to produce *any* edition is so high that 
intermediate drafts are too expensive to circulate.  E-text creation is 
more collaborative and not punctuated by such monumental milestones as 
"first draft to printer" or "second edition".  The stages tend to be so 
incremental as to blend into each other.

Thus, "publishing" an incomplete or rough draft is appropriate for E-
text.  The medium seems to invite statements that "this section is under 
construction".  I call this the =cathedral model= of text production.  A 
premium is placed on the execution of one's art, collaboration among 
successive "generations", and grand design, but the product itself is 
never really finished.  The stages of the E-text production process are 
discussed at greater length in =Section 2.3=.

The pervasive sense of hierarchy in E-text affects the writing process.  
You might think that the rigid hierarchy leads to a top-down process in 
which each section is outlined in excruciating detail and the writing 
fills in the gaps.  In fact, the actual process is a combination of this 
and a bottom-up one in which sections are created piecemeal and tacked 
together as ideas emerge.  The ideal working style, like that of 
building a cathedral, works from both ends.  There is both a grand 
design (far more ambitious than what the author can produce at the 
moment) and whole sections that are created of a piece.  Unlike 
cathedrals, the parts can be re-organized with ease after construction.


    There is another respect in which the E-text production process 
differs from its print analogue.  In E-text, self-publishing is the 
norm.  The low capital investment, in both equipment and training, 
required to create the text points to self-publishing as the most 
economical distribution method.  The traditional segmentation into 
author, publisher, printer, distributor, follows the logic of the print 
production process.  E-text needs only an author and a distributor--the 
distributor being a friendly archive site or bulletin board.  

Print media can use this same simplified distribution scheme *if* it is 
in electronic format.  It is important to differentiate between 
distributing E-text and distributing files that are intended to be 
printed.  The latter are likely to have special markup, commands, or 
formatting codes.  Often they are binary (i.e. not text) files.


    Since E-text is easily copied, far more so than text locked up in 
proprietary formats, it presents a problem for compensating the author.  
There are four suggested compensation schemes:

  (freeware model)  no compensation--the text is either in the public 
domain or copyrighted but with a license for free distribution

The advantage of this model is that the work gains the widest possible 
distribution.  Without fee, license, or undue copyright restriction, the 
work travels wherever it is wanted.

  (shareware model)  distribution is unrestricted but there is a 
licensing fee for use.

This is an elegant solution to the compensation problem.  Its reliance 
on the honor system has drawbacks, however.

  (proprietary model)  distribution is restricted by licensing and 
copyright.

This is the common method for distributing commercial software.  In 
effect it assimilates E-text to print media by artificially taking away 
the natural ease of copying E-text.

  (patron model)  the work is commissioned and paid for by a patron--a 
university, government, or other buyer.  Since the work is paid for by 
the patron, distribution can be free or by any of the other methods.

In fact, the patron model is the common, since royalties.  Thus cries 
that free distribution of E-text will destroy intellectual property 
have little merit.  In fact, except in the commercial world, 
intellectual property has little market value and is almost always a 
public, not a private, good.


<Section 2.4>  Differences in Repertoire

In addition to physical, stylistic, and process differences, E-text has 
a different repertoire of visual techniques--and consequently different 
problems.  The major problem is the limited number of characters.  
Unlike even the typewriter, E-text is limited to letters, numbers, and a 
few punctuation characters *in a single font*.  Print-oriented word 
processors eliminate these restrictions, of course, but they remain for 
E-text.

In addition, the visual effects are more limited even than the 
typewriter's.  Super- and subscripting are not possible, and certain 
layouts involving lots of vertical space are ill-advised.**  Finally, 
graphic images are presently hard to include with text--at best they are 
separate files distributed with the text and viewed with difficulty--and 
such visual effects as parallel columns, and tabular layout do not work 
well.  They are not very robust in the E-text environment.

  ** more on this below, in =Section 2.5=.

The solution to the character repertoire problem is to extend the 
character set by a number of techniques:

  o  escape characters, 

  o  delimiters, and

  o  tags.

An =escape character= is a rarely used character, such as the ampersand 
or percent sign, that indicates the next character or characters are not 
to be interpreted literally but as a symbol for some other character.  
In effect, it acts as a sort of shift key to shift the character set.  
Thus, "&e" might represent a Greek epsilon instead of an English [e].  
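
Here is a small Python sketch of how a program might interpret such an 
escape character.  Only the "&e" example comes from the text above; the 
rest of the table, and the function name, are hypothetical.  An 
ampersand that is not followed by a character in the table is left 
alone.

```python
# Hypothetical escape table: the letter after "&" selects a Greek letter.
GREEK = {"a": "\u03b1", "b": "\u03b2", "e": "\u03b5"}


def expand_escapes(text, escape="&", table=GREEK):
    """Replace each escape sequence (escape + one character) by its symbol."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == escape and i + 1 < len(text) and text[i + 1] in table:
            out.append(table[text[i + 1]])   # shifted character
            i += 2
        else:
            out.append(ch)                   # ordinary character
            i += 1
    return "".join(out)


print(expand_escapes("for every &e > 0"))
```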

=Delimiters= are pairs of characters used to mark off text.  The equals 
signs I have been using in place of italics are delimiters.  So are the 
asterisks I use if I *really* want to emphasize something.  Delimiters 
are so-called because they serve to "delimit" the text they enclose.  
This strategy, widely used in E-text, replaces *rendering* by 
*delimiting*.
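
Because delimiters come in matched pairs, a program can find them with a 
simple pattern.  The Python sketch below looks for the two delimiters 
used in this manual, equals signs and asterisks.  The labels "italic" 
and "emphasis" and the function name are my own; the pattern is a rough 
one and, as written, will not find a delimited phrase that is broken 
across two lines.

```python
import re

# What each delimiter stands for in this manual (labels are my own).
KINDS = {"=": "italic", "*": "emphasis"}

# A delimiter, text that neither starts nor ends with a space,
# and then the same delimiter again.
PATTERN = re.compile(r"([=*])(\S(?:.*?\S)?)\1")


def find_delimited(text):
    """Return (kind, enclosed_text) for each delimited span in text."""
    return [(KINDS[mark], body) for mark, body in PATTERN.findall(text)]


print(find_delimited("The =cathedral model= is *really* important."))
```

A rendering program could use the same pattern to turn the delimited 
text back into italics or boldface where the display allows it.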

A final technique is =tagging=.  Tagging is discussed at length in 
=Section 2.9= on markup.  It extends the repertoire of delimiters by 
combining delimiters and escape characters in a construct called a 
<tag>.  The tag is a logical unit that indicates an entity or logical 
unit ("element") in the text.

The character repertoire problem becomes most acute when different fonts 
or formulae are needed.  Fonts are effectively handled by the techniques 
discussed above, but formulas are a very sticky problem.  Probably the 
only solution is to realize that the notation we use for formulas grew 
up in the handwritten environment.  It has been brilliantly adapted to 
print, but its adaptation to E-text is new and awkward.  All we can do 
is let notation for formulas occurring in E-text evolve *without 
reference to their print analogues*.  The solution is not to

  (a) give up and wait for multimedia; or

  (b) to use print-oriented markup as an interim solution.

Programming languages have of necessity experimented with representing 
mathematical formulas.  As E-text communication becomes more common, 
conventions *will* evolve that are elegant and empower, rather than 
hinder, communication.  Some suggestions (and they are only that) for 
mathematical notation are contained in Part III.


<Section 2.5>  Differences in Layout

Layout of text on the page is one of the major differences between E-
text and print media.  Naturally, this consideration is dominated by the 
small viewing window of E-text.  In E-text, the paragraph, not the page, 
is the fundamental frame of reference for the reader.  

  o  footnotes, as mentioned above, should be placed at the foot of 
their *paragraphs*;

  o  manifestations of hierarchy at the chapter level or above do not 
need the differentiated rendering (special indentation, typefaces, 
capitalization, and the like) that they have in print media.  

Instead high-level headings are optimized for searching, using a 
consistent numbering scheme such as dotted decimal (e.g. 3.5.2)--or else 
replaced by breaking the document into separate files.


    Vertical and horizontal space is less important visually, because 
the reader is conceptually "closer" to the text and unable to appreciate 
such effects as indentation and vertical spacing.  In particular:

  o  paragraphs should not be indented except to mark structural 
features such as:

    list items;

    sub-paragraphs;

    "small print"; and

    minor section breaks.

Minor section breaks should have a larger indent than list items, to 
distinguish the two.

  o  vertical spaces beyond five blank lines or so are an annoyance.

  o  lines printed with deep indentation in print media, e.g. letter 
signatures, date and place of writing, and run-on lines in poetry, 
should use some other device to set them off.

  o  unlike print, the visual effect of a block of text carries less
     weight in E-text.  Consequently you should not go to great
     efforts to block text by hand like this paragraph--your
     efforts will be wasted in a proportional font anyway.

Just as pushed margins are to be avoided in E-text, so attempting to 
line up blocks of text in list items should be avoided.  While this 
visual effect works well in print, it is actually harder to read in E-
text.

On the whole, vertical and horizontal spacing merges with formal markup 
in E-text, so that it becomes just one more way of delimiting text.  Its 
role in creating visually pleasing forms is very muted in E-text.  Since 
the reader is so close to the "painting", the effect, which depends on a 
certain distance, is lost.  E-text is not a medium that lends itself to 
impressionism.  

Combining the "white space" role with the delimiting role is very much 
an art.  Functionality and minimalism are the main virtues of this art.  
Mostly, it is a matter of being sensitive to the different needs of E-
text and print, and avoiding elaborate markup that mimics print 
techniques that have little meaning for E-text.


    Tables, multiple columns, and the like do not adapt well to E-text.  
Although you might think that E-text is the medium =par excellence= for 
tabular material, tables--perspicuous as they are in print--are very 
difficult to navigate in E-text.  They tend to be long *and* wide.  This 
is especially true of double-spaced tables common in typewritten text.  
Also, E-text tables are difficult to transport and maintain, since 
whitespace is the most unstable part of E-text.  Various programs may 
trim, condense, and reinterpret spaces, tabs, and returns.

A far better solution to viewing tables is to treat them as 
spreadsheets.  Spreadsheet programs, unlike word processors, are 
optimized for viewing tables.  I would rather have a table in Comma 
Separated Value format that I can cut and paste into a spreadsheet 
program than one formatted with spaces.**

  ** Admittedly, some, but not all, spreadsheet programs can handle 
space formatting.

If you are tempted to include a long table in E-text, try to observe the 
following:

  o  Put tabular material in appendices or in a separate file so the 
reader is not forced to traverse it. 

Last ditch:  tell the reader how to jump over it, if it absolutely must 
interrupt the flow of text.

  o  Redesign the data structure of the table so that it is as narrow as 
possible, e.g. by breaking it into several logical units--sub-tables--
that can be related by an =index= or =key= column.

  o  Tabular material should have field delimiters other than spaces.  
Commas are something of a standard, as are tabs, if portability is not 
an issue.
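
As an illustration of comma-delimited tables, the following Python 
sketch writes a small table in Comma Separated Value format and reads it 
back, using the standard =csv= module.  The table itself is made up for 
the example (the last row is there only to show a field that contains 
commas).

```python
import csv
import io

rows = [
    ["section", "title"],
    ["2.1", "Differences Traceable to Physical Media"],
    ["2.7", "Copyright Issues"],
    ["x", "Escape characters, delimiters, and tags"],   # made-up row
]

# Write the table as comma-delimited text.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
table_text = buffer.getvalue()
print(table_text)

# Read it back; the fields survive no matter what happens to the spacing.
recovered = list(csv.reader(io.StringIO(table_text)))
```

Note that the field containing commas is put in quotation marks by the 
writer, so the commas inside it are not mistaken for field delimiters.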

Very often, a table that looks good in print has to be redesigned 
altogether for E-text.  You should constantly ask yourself *why* the 
table is effective.  

  Does it have to be a table at all or is it really a list in disguise;

  Does the tabular arrangement make the right comparison;

  What is the main relationship a user will look for in the table?

One example of an organizing principle useful in print but less so in E-
text is alphabetical ordering.  An alphabetical list is very effective 
in print because it aids searching.  It is also effective in E-text that 
has to be *modified by hand*.  But it is not effective in E-text that is
meant to be searched, because it gives up the chance for an alternate 
organization of the material.

Besides tables, layout effects such as =parallel columns= should be 
avoided altogether in E-text.  The likeliest result is that the text 
will be corrupted and rendered unreadable by a program somewhere along 
the line.

E-text has very different visual needs from print.  These are strongly 
reflected in layout design.  Writing visually appealing E-text requires 
a conscious effort to meet the needs of the E-text medium on its own 
terms.  Whitespace, markup, and structure are all handled differently in 
this new medium.


<Section 2.6>  Searching and Hypertext

We have already discussed the SEARCH capability of E-text on a number of 
occasions.  In the present section we tie some of these strands 
together, the most important of which is that

     The author must be constantly aware of the need of the reader to 
navigate their text by searching.

This imperative leads to a pervasive tendency in E-text:  all manner of 
references, cross-references, and indexing are replaced by a single 
concept, the =pointer=.  The pointer is a sequence of characters that 
allows the reader to find the reference.  This reference may be in the 
present file, somewhere else in the same computer system, or in print.  
In print, pointers take the following forms:

  o  cross-references (e.g., See page 37.  See also "Dinosaurs").

  o  glossary and index references

  o  bibliographic citations

  o  mailing and telephony addresses

  o  subject classifications and shelf locations

To these the electronic medium (including E-text) adds:

  o  network and other information retrieval references

  o  hypertext links and menus

In E-text, the mechanism of pointing is the same for all these 
categories, and =consequently the syntax should be the same also=.  
Merely mimicking print forms of expression, with its elaborate 
formatting rules for footnotes and bibliographies, obscures the 
underlying unity of the "pointer" notion.  In print, the visual 
differentiation cues the reader in to the process required to resolve 
(look up) the reference.  

In E-text, great efforts should be expended to make the lookup process 
the same for all manner of references.  The main practical distinction 
is between internal references and external ones.  Part III discusses 
this topic in greater depth.


<Section 2.7>  Copyright Issues

The most prominent characteristic of E-text, the ease with which it is 
copied, leads to endless copyright headaches.  Even the simplest E-text 
is likely to sport a copyright, even if the author wishes to distribute 
it for free, since otherwise who will know that it's for free?

Here we just present a few basic copyright concepts:

  o  Everything you "fix in a medium" (e.g., type into E-text) is 
copyrighted, whether or not you have a notice, which merely announces 
the fact that you have a copyright; or whether you have a registration, 
which is legal evidence of your rights.

  o  *WHO* has the copyright is complicated.  Usually the author; but it 
could be their employer.

  o  Copyrights include the right to (1) copy, (2) distribute, (3) 
display publicly, and (4) create derivative works.  For other rights, 
such as the right to sell these rights to other people and so on, 
consult a legal manual.

  o  Copyrights, claimed or otherwise, remain in effect for a *long* 
time.  The "public domain" ends around 1917, with rare exceptions.  If 
you *place* your work in the public domain, that's another matter.

  o  A compromise between retaining all your copyrights and placing 
your work in the public domain is a "freelore copyright"++ like this 
manual's.  You retain a copyright but let others copy and distribute 
your work for free.  

This is the preferred approach unless you think your work has commercial 
value--or if you want to restrict distribution.  It lets the work 
circulate widely and, most importantly, gives permission to do so 
without losing the work to the public domain.

You cannot use this method if you want others to be able to produce 
"derivative works", however.  For that, the Public Domain is your only 
choice.**

  ** You could try to write an elaborate general public license, but 
with few exceptions it is not worth it.  Software source code and 
educational curricula are likely exceptions to this rule.

  ++ The term "freelore copyright" is not a legal term.  In programming 
circles you will sometimes hear it called a GNU-like copyright, after 
the GNU project, the first programming project to make extensive use of 
a non-restrictive copyright for copyrighting software.


<Section 2.8>  The Parts of a Book

In this section we take a brief tour of the typical book and make a few 
observations along the way.

The front matter of an E-text differs somewhat from its print cousin, 
the main virtue being brevity.  No reader wants to scroll through page 
after page of apparatus to get to text.  In a book, it makes a great 
deal of sense to put tables and reference material at both ends of the 
book, these being the places one can find most easily.  They are also 
easy to reach in E-text, but most likely the reader wants to begin 
reading quickly, so the front of the work, at least, is forbidden 
territory.

An E-text should have the following frontmatter:

  o  cataloguing information (the title, author's name, preferred name 
for the text file, subject classification, and how to get an electronic 
copy, since the reader may, after all, be looking at a printout);

  o  an advertisement, abstract, or teaser to entice the reader;

  o  copyright information or terms of use (if too lengthy these should 
be placed at the end with a pointer to them after the copyright 
statement itself);

and THAT'S ALL.  Do not ask your reader to scroll through more than 
this.  Tables of Contents and the like belong in appendices at the end 
or in another file.

Whether or not you have an official Table of Contents or other indexing 
material in an appendix, you should, at the beginning of each major 
division, have a list of the contents of that section.  You can think of 
these as "menus".  A merged version of all these local menus is needed 
so the reader does not have to search through the entire document to get 
an overview; neither should the reader have to scroll back and forth to 
the beginning or end for help navigating.  E-texts thus always have 
*two* tables of contents.
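
The merged table need not be typed by hand.  If the headings are marked 
consistently, a short program can collect them.  The Python sketch 
below assumes headings tagged the way they are in this manual, with the 
tag at the start of a line; the function name and the list of heading 
words are my own assumptions.

```python
import re

# A heading is a tag at the start of a line, followed by the title.
HEADING = re.compile(
    r"^<((?:Preface|Part|Section|Appendix)[^>]*)>[ \t]*(.*)$",
    re.MULTILINE,
)


def full_contents(text):
    """Return (label, title) for every heading in text, in document order."""
    return [(label, title.strip()) for label, title in HEADING.findall(text)]


sample = (
    "<Part II>  Specific Differences of Style and Mechanics\n"
    "\n"
    "Some text that mentions a <tag> in passing.\n"
    "\n"
    "<Section 2.1>  Differences Traceable to Physical Media\n"
)

for label, title in full_contents(sample):
    print(label, "", title)
```

The same pattern, typed into a SEARCH command, takes the reader from an 
entry in the table to the matching heading.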

The body of E-text is much like that of a print work, except for 
comments pertaining to length, the pervasive sense of hierarchy (no more 
than three local levels!), and the placement of footnotes after their 
paragraphs.

The endmatter is likely to contain tables and bibliographies and 
multiple indices, the *very last* of which is the Table of Contents.  
The end of an E-text file is a very special place, because it is an easy 
place to find; yet, unlike the front, few readers start there.  It 
should thus be the location of the most important navigation aid for the 
document.  Normally this is a full hierarchical list of the document's 
contents with pointers back to the text.

With E-text, the notion of a Table of Contents and an Index is blurred.  
In a book, the index really serves two purposes.  It takes the place of 
the SEARCH command in E-text, except that not every word is indexed 
(barring the existence of a concordance, of course--a luxury in print).  

It also serves as a schematic and *alternate* representation of how the 
text might be organized.  Most works could have been organized 
profitably in more than one way.  One way is fixed by the linear 
organization of the text.  The Index provides an alternate organization.

E-texts do not really need the first form of index.  Computer programs 
make their own search indices with lightning speed.  In effect, you 
have a concordance for every document.  Alphabetic indices are of little 
use.  Not even hypertext programs can navigate them well.  Either they 
present a menu with 26 entries (too long!) or else you have to go 
through two levels to get to your entry ("select A-G").  Even glossaries 
are best arranged by topic and not alphabetically, since the 
alphabetical order is irrelevant to the SEARCH command.**

  **Not quite true: very long indices profit from "clustering", or 
physical arrangement in search order.

Unless there is some reason that browsing topics in alphabetical order 
might be interesting in itself, you shouldn't bother.  Notice that this 
is very different from electronic print, where the computer should 
always be used to create an elaborate print index in the final print-
out.

The best way to think of an E-text index is as an alternate topical 
organization of your work.  It is especially useful if there are two (or 
more) *hierarchical* ways to approach your subject.  Your layout can 
only show one way--echoed in your Table of Contents.  The others have to 
be represented by an index.


<Section 2.9>  The General Theory of Markup (SGML)

The International Organization for Standardization (ISO) has developed a 
very flexible standard for marking text, Standard Generalized Markup 
Language (SGML, or ISO 8879).  SGML has a very flexible syntax for 
describing the logical structure of documents.  Its drawback is that, 
like markup languages that are intended for print media, the burden of 
the markup makes the text unreadable.  SGML goes a long way towards 
creating a text that can look good on paper, on the screen, or to a 
program.  The problem is that SGML software is not widely available, so 
although SGML files are portable and *potentially* useful, there is 
little use for them as yet.  A widely available 

SGML Tags are extraneous material used to mark a section of text.  Along 
with delimiters, they comprise the markup added to a text.  A program 
that uses the marked up text has to recognize delimiters and find the 
tags.  Since we are more or less following SGML, the tags themselves are 
delimited by angle brackets, like this:  

     <outline>

The word "outline" is the =generic identifier= (GI) of the tag.  The 
left angle bracket is the Start-Tag-Open delimiter (STAGO); the right 
angle bracket is the Tag-Close delimiter (TAGC).  Ending tags look like 

     </outline>

The sequence "</" is the End-Tag-Open delimiter (ETAGO).  The end tag 
ends with TAGC, just like the opening tag.  Thus paired tags themselves 
become a sort of delimiter, albeit at a higher level than the 
delimiters they are built out of.  They serve as a sort of named 
parentheses to represent the "nesting" structure of the document.

Here is the hierarchical structure of the document represented as an 
outline:

     outline

       chapter 1

         section 1.1

         footnote 1

       chapter 2

         section 2.1

         footnote 2

         section 2.2

In parenthesis notation (a common mathematical device), the same 
structure looks like this:

   (outline (chapter 1 (section 1.1) (footnote 1) ) (chapter 2 (section 
2.1) (footnote 2) (section 2.2) ) ).

Maybe this is a bit more clear:

   (outline 
      (chapter 1 
         (section 1.1) 
         (footnote 1) 
      ) 
      (chapter 2 
         (section 2.1) 
         (footnote 2) 
         (section 2.2) 
      ) 
   )

The parenthesis notation allows the tree structure of the outline, which 
used to be represented only by the indentations, to be faithfully 
represented even when the indentation is lost.  I.e., we have a flexible 
method of representing tree-structures in *running text*.

The parentheses are delimiters whose purpose is to make clear the 
nesting structure of the textual =elements=.  If we think of SGML as 
having "named parentheses" with <tag> being a left (opening) parenthesis 
and </tag> being the matching closing parenthesis, we have:

   <outline>
      <chapter 1>
         <section 1.1> Section 1.1 text ... </section>
         <footnote 1> Footnote 1 text ... </footnote>
      </chapter>
      <chapter 2>
         <section 2.1> Section 2.1 text ... </section>
         <footnote 2> Footnote 2 text ... </footnote>
         <section 2.2> Section 2.2 text ... </section>
      </chapter>
   </outline>

Notice the exact match between parentheses above and tags.  Each text 
element is clearly delimited.  The section and footnote numbers only 
appear in the opening delimiter as =attributes=.  They would be 
redundant in the closing delimiters.  
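
The correspondence can be demonstrated with a few lines of Python.  The 
sketch below holds the outline as a nested structure and prints it in 
both notations.  The representation (a name, a number, and a list of 
children) and the function names are my own; the tags follow the 
simplified notation of this section, not the full SGML attribute syntax.

```python
def to_parens(node):
    """Render a (name, number, children) tree in parenthesis notation."""
    name, number, children = node
    head = name if number is None else f"{name} {number}"
    inner = "".join(" " + to_parens(child) for child in children)
    return f"({head}{inner})"


def to_tags(node):
    """Render the same tree with named opening and closing tags."""
    name, number, children = node
    head = name if number is None else f"{name} {number}"
    inner = "".join(to_tags(child) for child in children)
    return f"<{head}>{inner}</{name}>"   # the number is omitted from the end tag


outline = ("outline", None, [
    ("chapter", "1", [("section", "1.1", []), ("footnote", "1", [])]),
    ("chapter", "2", [("section", "2.1", []),
                      ("footnote", "2", []),
                      ("section", "2.2", [])]),
])

print(to_parens(outline))
print(to_tags(outline))
```

Both functions walk the tree in the same way; only the delimiters 
differ.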

The reason for the funny names, STAGO, ETAGO, TAGC, and GI, is that SGML 
actually has an =abstract syntax=.  The delimiters "<", ">", and "</" 
could be any symbols at all (within reason).  The choice shown here is 
called the =Reference Concrete Syntax=.  It is a particular choice for 
the abstract syntax of SGML.  In practice, you will almost always see 
the standard choice.

In addition to the tagging of elements, SGML has a very general facility 
for including text and making the sort of references we discussed in 
=Section 2.6=.  An =entity reference= is meant to be replaced either 
with a character or with the contents of a file.  It starts with an 
ampersand (and-sign) and ends with a semicolon.  Thus &file1; means 
include file1 here.  And if you can't type an "e" with an acute accent 
on your keyboard you can use &eacute; to get the same effect.  Of course 
your entities have to be defined as part of your document's =entity 
set=.  SGML provides a way to do this.
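
The substitution idea can be sketched in a few lines of Python.  This 
is only an illustration of the principle, not how a real SGML parser 
works:  the entity set is a made-up table, and "file1" is replaced by a 
placeholder string rather than by the contents of an actual file.

```python
import re

# Hypothetical entity set: names map to replacement text.  In real SGML
# the entities are declared as part of the document's entity set.
ENTITY_SET = {
    "eacute": "\u00e9",              # e with an acute accent
    "file1": "[contents of file1]",  # stands in for an included file
}

# An entity reference: ampersand, a name, then a semicolon.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_entities(text, entities=ENTITY_SET):
    """Replace each &name; with its definition; leave unknown names alone."""
    def lookup(match):
        return entities.get(match.group(1), match.group(0))
    return ENTITY_REF.sub(lookup, text)
```

Leaving undefined references untouched is a choice I made for the 
sketch; a stricter program would complain about them.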

SUMMARY:  We have introduced the basic ideas of SGML:  representing the 
=logical structure= of textual =elements= using =tags= as delimiters; 
the various parts of an opening and closing tag; entities for external 
references and character substitution; and the notion of abstract vs. 
concrete syntax.  These ideas are useful in developing notations and 
markup conventions.


<Section 2.10>  Summary:  Basic Tricks of the Trade

This part has covered a lot of ground.  Creating E-text that is visually 
pleasing and communicates effectively is an art.  Some themes, driven by 
the nature of the medium, recur over and over.  I have summarized these 
as a series of Tricks of the Trade:

  TRICK 1:  Replace visual rendering with delimiters and other markup, 
but be sparing.  The minimalist wins this game.

  TRICK 2:  Use a tree structure no more than three levels deep for the 
basic hierarchy.

This trick goes hand in hand with the next:

  TRICK 3:  For more levels of hierarchy use data hiding techniques.

The point here is to remember that the reader has an unnaturally narrow 
window on a very wide world.  To avoid giving the reader the sense of 
being helplessly lost, you *must* make an effort to keep the relevant 
portion of reality small and easy to navigate.

  TRICK 4:  Use pointers to fill the roles of notes, cross-references, 
bibliographic citations, hypertext links, etc.

Pointers are a recurring theme in computer science; they serve to unify 
a whole series of concepts that are visually distinct in the print 
media.  They are used to implement hierarchy and to allow 
"nonlinearity" in the text.

  TRICK 5:  Think less in terms of traditional categories like "Table of 
Contents" or "Index" and more in terms of data structure.

This trick follows naturally from the observation that logical structure 
and not its rendering in a particular system should be primary.  This is 
a prerequisite for communicating with readers using *you know not what* 
software or device.

  TRICK 6:  Use escape characters and tags to extend character set and 
delimiter repertoire respectively.

  TRICK 7:  Formatting, rigorous markup that looks like visual layout, 
can meet the needs of humans and computers.

The trick is to rigorously use sequences of characters (especially 
"white space" like carriage returns and spaces) to create what appears 
to be visual formatting.  This simultaneously satisfies the human and 
the computer.  This is a nice trick, but hard to carry too far.

In all things use moderation.  Being too clever or too idiosyncratic 
usually mars the effect for little gain.  As always, the main trick is 
to hide the effort that goes into the art, making the difficult look 
easy.


<Part III>  A Very Brief E-Text Style Manual

  =Section 3.1=  Backups and Saving Work

  =Section 3.2=  Compressed Files

  =Section 3.3=  Version Control

  =Section 3.4=  Use of Word Processing Features

  =Section 3.5=  Character Sets and Fonts

  =Section 3.6=  Outlining and Hierarchies

  =Section 3.7=  Text Inclusions

  =Section 3.8=  Esoterica

This chapter is meant as a concrete example of the suggestions in the 
previous two chapters, in the form of a "style manual".  You should take 
these guidelines as suggestions you may want to adopt, not as rigid 
rules.


<Section 3.1>  Backups and Saving Work

RULE 1.1  You should always keep two copies of any electronic text you 
would mind not having one day, one on your hard disk and one on a 
floppy.  The floppy is far more likely to fail, so you should consider 
keeping two floppies.

A common scheme if you don't work at home is to keep two backups, one at 
home and one at work.  Alternate which one you revise so that you will 
always have the most recent one at home and the next most recent at 
work.  This is so that if a fire or other disaster destroys your work 
records you still have the most recent copy.

If you work at home, make sure your two sets of backup disks are in 
different places.  That way an accident with a strong magnetic field 
(found near motors, in telephones, in TV monitors, etc.)--or a spilled 
cup of coffee--will not wipe out both copies.

RULE 1.2  (Archive copies) You should have *both* an archive copy of 
each important "milestone" version *and* a set of backups.  Backups are 
usually snapshots of your system.  If you delete a file from your hard 
disk and then revise your backup, you will no longer have the file on 
your backup disk!  Even if you put your backup set aside from time to 
time as an archive of "My System, December 1992", one day you will 
decide to recycle those disks and lose your copy.

The Moral:  you need both an archive copy of each important project and 
a revolving set of backup disks.  (I call mine "A" for archives and "B" 
for Backups).

Checklist:

  o  original working copy on hard disk

  o  second copy on hard disk for really important files

  o  daily archive of important work, organized by project

  o  most recent revolving backup set at another location (weekly or 
monthly; more often for critical files).

  o  second most recent backup set on site.

Remember the basics:  *at least* one backup and don't put your eggs all 
in one basket.  If you think I sound paranoid about this backup stuff, 
trust me.  Do this or you will get burned one day.  I know what I am 
talking about.

RULE 1.3  (Exception to Backups) An exception can be made for E-text 
that can be easily obtained over the network if you don't modify it and 
*if* getting a replacement would not be burdensome.  In effect, the 
network is your backup copy.  But beware that what is on the network 
today may not be there forever.

You can also "forget" about backups (but not archive copies) if your 
computer is on a local area network and you know for a fact that backups 
are made over the network on a regular basis.  Many businesses, 
recognizing that most persons would rather risk losing a month's work 
than spend five minutes backing it up, make systematic backups, often 
using automatic systems that work at night, when the network is quiet.
That is nice, but remember that you can still lose nearly an entire 
day's work if disaster strikes just before you go home for the day.

RULE 1.4  (Saving Your Work) Unless your word processor has an autosave 
and recover feature, you should develop the habit of saving your work at 
least every fifteen minutes and whenever you get up to leave your 
workstation.


<Section 3.2>  Compressed Files

It is possible to compress text files to around half their original 
size.  Of course, you have to uncompress them before reading, but in 
effect you can double your hard disk size with *software*.  File 
compression is becoming a standard feature of many programs and systems.  

Compression works because text has very regular patterns that can be 
encoded more compactly than the standard encoding.  Files that have more 
random bit patterns--binary files like programs or graphic images--
seldom compress more than a few percent.  
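
If you want to see the effect for yourself, here is a small sketch in 
Python using the zlib library (one compression method among many, and 
not necessarily the one your system uses).  The sample text is 
artificially repetitive, so it compresses far better than real prose 
would; the pseudo-random bytes stand in for a file with no regular 
patterns at all.

```python
import random
import zlib

# Artificially repetitive sample text; real prose compresses less well.
text = ("Creating E-text that communicates effectively is an art.  "
        * 200).encode("ascii")

# Pseudo-random bytes stand in for a file with no regular patterns.
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(len(text)))


def ratio(data):
    """Compressed size divided by original size."""
    return len(zlib.compress(data)) / len(data)
```

Comparing ratio(text) with ratio(noise) shows the point:  the patterned 
text shrinks a great deal, while the patternless bytes do not shrink at 
all.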

RULE 2.1  Never compress any file except a text file.

I would be a bit leery of compression.  It trades memory, which is 
fairly cheap, for your time, which is expensive.  Also, it complicates 
the strategies your software has to use--what happens if your system 
goes down in the middle of uncompressing an important file?  

Compression is here to stay, but I recommend you follow this rule:

RULE 2.2  Only compress things you keep around for archival purposes--
old reports and projects, things you want at your finger tips but don't 
use day-to-day.  

To give some further guidelines, compression makes a lot of sense in 
these cases:

  o  compressed files make good archive copies, at least if you are 
keeping the file to feel safe and not because you need it regularly.

  o  compressed files are good for network transfers because, for text 
files at least, they cut the time in half.

  o  more subtly, there is a limit to the amount of hard disk you can 
safely use--you shouldn't use more than you can back up in 10 minutes a 
week or half an hour a month.  

If you make your own backups on floppies, that means that 80 Megabytes 
is about tops.  More than that and you have 100 backup disks (times two 
sets!) to deal with.  Probably you get sloppy.  File compression means 
you can get twice as much stuff on your disk without increasing the 
backup burden, so you save both time and space.  

SUMMARY  The most important thing to remember about file compression is 
that there is a trade off between time and disk space.  The fact that 
you can get twice as much on your hard disk is traded against the fact 
that it takes time to compress and uncompress files.  Given that memory 
is very cheap this is not always a good trade.  The most likely outcome 
is that by keeping too much useless stuff on disk you're setting 
yourself up to waste a lot of time.  


<Section 3.3>  Version Control

Version control is important.  It is easy to keep on top of *if* you 
bother.  If you don't, one day you will modify the second most recent 
version of a long manuscript and then have to figure out the differences 
between two variant documents and "merge" them into your next draft.  
Then again, you could give up all the work you did and go back to the 
old version.  Or maybe you would like to follow this rule:

RULE 3.1  Versions should be numbered consecutively in "dotted decimal" 
notation.  

E.g., 2.4.1 means version 1 of sub-version 2.4 of main version 2.0.  You 
can add the version number to the heading of the file or make it part of 
the file name.  This is hardest to do in DOS, where filenames look like 
DRAFT241.TXT meaning version 2.4.1.  

Versions 0.1 up to, but not including, 1.0 are reserved for "drafts".  
Version 1.0 is the first public release and Version 1.0.1 its first 
minor revision.
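
Here is a minimal sketch in Python of how a program might handle 
dotted decimal versions.  The function names are my own invention, and 
the DOS file name helper assumes each part of the version is a single 
digit (otherwise DRAFT241 would be ambiguous).

```python
def parse_version(text):
    """Turn "2.4.1" into the tuple (2, 4, 1) so versions sort correctly."""
    return tuple(int(part) for part in text.split("."))


def is_draft(text):
    """Versions before 1.0 are drafts under the convention in the text."""
    return parse_version(text) < (1, 0)


def dos_name(stem, version, ext="TXT"):
    """Hypothetical helper: "DRAFT" and "2.4.1" give "DRAFT241.TXT"."""
    digits = "".join(version.split("."))
    name = (stem + digits).upper()
    if len(name) > 8:
        raise ValueError("DOS base names are limited to 8 characters")
    return f"{name}.{ext}"
```

Comparing the versions as tuples of numbers, not as strings, is what 
keeps 1.10 after 1.9 when you sort a list of them.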

RULE 3.2  In general, the primary version of your work should be in the 
format your word-processing program considers to be "native".  Plain 
text files should be derived from this master copy.

Creating a plain text version and then re-importing it to the word 
processor will often result in problems.  The word processor's native 
format (the one that understands all the nifty features) is proprietary; 
i.e., it is not directly portable to other systems like plain text.  
Something is lost translating proprietary format to plain text and back 
again.  The most common problems are:

  o  A "hard" return is located at the end of every line, making editing 
difficult because you constantly have to adjust the length of each line 
by hand, or else use the "fill" command on each paragraph;

  o  Unusual "line wraps" result from incompatible line lengths;

  o  Lists of items that are supposed to be on separate lines are 
compressed into paragraphs;

  o  Visual formatting like the spaces or tabs before an indented block 
quote, vertical bars alongside paragraphs, and similar things are 
scrambled.

  o  Structures requiring elaborate spacing or tabbing like outlines, 
tables, or section headings are confused;

  o  Double and single spacing is mixed up.

  o  Special symbols and codes are no longer readable.

These problems cause severe version control headaches unless you follow 
the "master copy" strategy.

SUMMARY  Detailed strategies for avoiding these problems are given in 
the next section.  But in general you can avoid them if you follow this 
basic strategy:

  (1)  Always consider native format the "primary" version and plain 
text the "derived" version.

  (2)  Never use any feature of your word processor that can't be easily 
translated into the plain text version.

The next two sections concentrate on just which features you can use.  


<Section 3.4>  Use of Word Processing Features

From the standpoint of creating effective E-text it is extremely 
important to understand the following concepts:

  hard return :  a control character that signals the end of a line of 
text.  The actual code, an ASCII character, varies from computer to 
computer.  This is a source of many formatting problems.

  filling : many word processors are capable of adjusting the length of 
lines automatically in a process called paragraph filling.  This can 
either be automatic or on command.  In the older method, the line 
"wraps" when you reach the end, but if you make editing changes you have 
to select the "fill" command.  Newer word processors constantly refill 
the paragraph as you make changes, adjust margins, etc.

  formatting codes and markup : in order to represent all the effects 
you can create on paper using a text file, it is necessary to add 
additional characters that control the formatting of the document--
italics and underlining, fonts, superscripts, and the like.  These codes 
can be typed letters like ".cl" or "</p>"; or they can be "invisible" on 
the screen but nevertheless present in the underlying file.

  bit-mapped vs. character-oriented screens : The screen is represented 
in the computer memory as a series of black and white dots, called 
pixels ("picture elements").  There are two kinds of screens, those that 
can only represent characters and those that can draw any graphic shape 
(including any screen font ever devised, any line or shapes, patterns, 
and complex artistic images like photographs and computer drawings).  
Character-oriented terminals only have

  WYSIWYG : "What You See Is What You Get" is a strategy adopted by many 
word processing systems that run on bit-mapped terminals.  A single 
underlying file can create 

  font size and rulers : fonts in a traditional character-oriented 
screen are all the same size--usually 80 or 132 characters per line.  In 
a bit-mapped system the fonts can have any size.  Some fonts are fixed 
width, meaning that any character takes up the same width (and hence 
there are the same number in every line); in proportional fonts the 
characters have different widths.  The line expands and contracts when 
you change letters.  


    With these concepts in mind, we can discuss how to create E-text 
that is meant to be read as E-text.  The main problem is that not every 
word processor can read files from every other word processor.  The 
least common denominator is the plain text file, or ASCII file.  ASCII 
means "American Standard Code for Information Interchange".  The ASCII 
code includes the characters commonly found on an American typewriter 
keyboard plus some "control characters" representing actions like 
"carriage return" or "horizontal tab".  The issue of which characters 
you can use is discussed in the next section.

When using a word processor you have to be careful because it is not 
always obvious which features will come out well in plain ASCII.  Word 
Processors compete on the basis of their wonderful features.  Often, 
however, the fancy features you paid for cannot be used in the real E-
text world.  They are oriented towards producing pretty paper, but will 
*confuse* other computers unless they are running identical software.  

You will not be able to represent 

  o  bolding

  o  italics

  o  underscores

  o  superscripts

  o  subscripts

  o  indenting and margins (except by spacing--not tabbing)

  o  soft returns

  o  multiple proportional fonts

  o  double columns

  o  special symbols or formulas

  o  included graphics and spreadsheets

and so on and so on.  This means that you must forgo essentially 
anything that needs a formatting code.  In newer WYSIWYG word 
processors, it may be hard to tell what is formatting and what isn't.  
In general, you have to think like a typewriter.  *sigh*

RULE 4.1  Change to a non-proportional font, preferably 10 point (elite) 
or 12 point (pica) and 6 inch distance between margins.  This works out 
to 72 characters for elite or 60 for pica.

To be safe, lines should not exceed 72 characters; but in no event 
should there be more than 80 characters without a hard return.  

Actually, 60 characters per line (12-point non-proportional font with a 
six inch ruler) is more portable, because it can be read both on 
standard 80 character screens and with the default settings of most word 
processors.  If you use the 72 character line, some users may have to 
select the whole text and convert it to Courier-10 to read it in a 
WYSIWYG word processor without funny line wrapping.  Not all users are 
that sophisticated, so you are better off using a 60 character line 
unless you have a special reason to go with 72.  Also, short lines are 
easier to read, as you will learn in any speed reading course.
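
A line length rule is easy to check by machine.  The following is a 
minimal sketch in Python; the function name is made up, and the default 
limit of 72 is simply the "safe" figure given above.

```python
def long_lines(text, limit=72):
    """Return (line number, length) for every line longer than limit."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]
```

Call it with limit=60 if you have settled on the shorter line.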

RULE 4.2  Don't justify the text, but keep all text "ragged right", like 
typescript.  

RULE 4.3  Don't hyphenate words.  Let the right column look uneven.

If someone is using a different screen width, your hyphenated word could 
end up looking like "this in the mid-dle of the text".  Also, SEARCH 
commands choke on hyphenation.  There is probably nothing you can do to 
prevent your word processor from breaking words that have "real" hyphens 
in them and happen to fall at the end of a line (remember this when 
*you* have to search).

RULE 4.4  Start text flush with the left margin and don't add spaces to 
create an indented effect.  Do not use indentation, tab stops, or 
spreadsheet like tables to format your text.

If you want to include spreadsheet data, use Comma Separated Value (CSV) 
text like this:

     "January Actual","January Budget" <hard return>
     23201.45,20000.00 <hard return>

You can cut and paste this into any Spreadsheet program.  
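
Programs can read it just as easily.  As a sketch, here is how the two 
lines above might be read with the csv module of Python's standard 
library; the variable names are mine, and I have assumed the values are 
meant to be read as numbers.

```python
import csv
import io

# The two lines from the example, each ended by a hard return.
sample = '"January Actual","January Budget"\n23201.45,20000.00\n'

rows = list(csv.reader(io.StringIO(sample)))
header, values = rows[0], rows[1]

# Pair each column title with its value, converted to a number.
budget = dict(zip(header, (float(v) for v in values)))
```

The quotation marks around the titles are delimiters; the reader strips 
them and hands back the bare text.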

RULE 4.5  If you use the autofill feature to avoid having to type 
return, make sure your word processing program has a feature that will 
insert "hard" returns at the end of each line when you create your plain 
text output file.

In Microsoft Word, this is the "Save as Text with Linebreaks" command.  
If you use "Save as Text" you get returns at the end of every 
*paragraph*, not every *line*.  Someone with an old-fashioned text 
editor--one that likes hard returns after every line--will see *very* 
long lines (and probably truncate them to boot).  You may have to 
experiment to find the equivalent command in your system.
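
If your system has no such command, the same effect is not hard to 
program.  Here is a minimal sketch in Python using the standard 
textwrap module.  It assumes paragraphs are separated by blank lines, 
and the function name and the width of 60 are my own choices.  Note 
that it collapses runs of spaces within a paragraph to a single space.

```python
import textwrap


def with_linebreaks(text, width=60):
    """Re-fill each paragraph to width, with one hard return per line
    and a blank line (two hard returns) between paragraphs."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    filled = [
        textwrap.fill(" ".join(p.split()), width=width)
        for p in paragraphs
    ]
    return "\n\n".join(filled) + "\n"
```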

RULE 4.6  Don't use special characters like non-breaking spaces or 
optional hyphens to dictate where line breaks occur.  These features are 
not portable.

RULE 4.7  Try to prevent your word processor from hyphenating words on 
its own.  

It's OK to break a word that has a "hard" hyphen at the hyphen.  That 
is,  if a hyphen is a normal part of the word's spelling and the word 
processor decides to break the word at the hyphen, don't worry.  But you 
should try to avoid breaking sentences that have dashes--like this--at 
the double dash.  Sometimes the word processor will break a double dash 
in half.

RULE 4.8  Use single spacing with two hard returns between paragraphs.

Many WYSIWYG word processors allow single, double, or triple spacing 
between lines.  In the text file, however, there are not necessarily two 
returns between each paragraph.  Double spaced text *is* much easier to 
read on a screen, but it is hard to re-paragraph.  The two returns 
between each line tend to make word processors think that each line is a 
paragraph.  

In general, it is easier to 

RULE 4.9  Keep paragraphs short, say around 10 lines.

Paragraph breaks form a visual guide for the eye.  A book that has 
paragraphs spanning whole pages is hard to read.  Similarly, on a 24 
line screen it may be difficult to read paragraphs longer than 20 lines.  
Even if you naturally express yourself in paragraphs of 7 to 10 
sentences, you should break your progression of thought into shorter 
segments after writing it down, if you want to reach your audience.  If 
I see a paragraph that fills the whole screen, I tend to want to scroll 
down and skip ahead.  


<Section 3.5>  Character Sets and Fonts

In order to be portable, a document must be coded as a text file.  The 
American Standard Code for Information Interchange (ASCII) represents 
each character as a seven bit number.  There are variant dialects of 
ASCII, especially for languages other than English, but the variations 
do not affect the subset we will be discussing.  In particular, there 
are many extensions of ASCII to eight bits, of which Latin-1 is the most 
popular.  These extensions are *not* portable, and hence not discussed 
further.  

In order to be as portable as possible, ASCII text, or plain text, must 
observe a number of conventions:

RULE 5.1  Use only the 84 character subset of ASCII consisting of the 
twenty-six letters of the English alphabet (both upper and lower case), 
the ten digits, and twenty-two punctuation marks in Table 1 below.  Do 
*not* use the ten bad characters:

  o  dollar sign

  o  pound sign (number sign)

  o  at-sign

  o  caret (circumflex)

  o  tilde

  o  back quote 

  o  backslash

  o  vertical bar, and 

  o  curly brackets 

These symbols do not translate well into character sets in other 
countries.

               TABLE 1.  The 22 Legal Punctuation Marks
  o  comma

  o  period

  o  colon

  o  semicolon

  o  exclamation point

  o  percent sign

  o  ampersand

  o  asterisk

  o  parentheses
 
  o  hyphen

  o  underscore

  o  plus sign

  o  equals sign

  o  square brackets

  o  apostrophe

  o  double quote

  o  angle brackets (less-than and greater-than signs)

  o  question mark

  o  and (forward) slash.  

More briefly:

     !%&*()-_=+[];:'",.<>?/

The reason why these characters are fine and others aren't is obscure.**

  **  There is an international standard, ISO-646 that adapts ASCII to 
non-English languages.  Part of its character set, the "invariant 
subset",  is the same on all keyboards.  There are also obscure problems 
translating ASCII to its IBM mainframe equivalent, EBCDIC.  Even the 22 
legal characters are too many in some circles.

RULE 5.2  The only "white space" characters allowed are spaces and line 
endings (carriage returns).  Horizontal tabs and other "control 
characters" are not portable.

Actually you can use tabs for text that is not going to pass through 
unusual or difficult conversions.  For example, if you are sharing a 
Spreadsheet by E-mail you can probably exchange a tab-formatted file 
rather than using the (safer) Comma Separated Value format.  

Sticklers will point out that Rule 5.2 means we allow an 87 character 
subset of ASCII.
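
Rules 5.1 and 5.2 lend themselves to an automatic check.  Here is a 
minimal sketch in Python.  The names are made up, and I have assumed 
that the three white space characters the sticklers have in mind are 
the space, the carriage return, and the line feed.

```python
import string

PUNCTUATION = "!%&*()-_=+[];:'\",.<>?/"   # the 22 marks of Table 1
PORTABLE = set(string.ascii_letters + string.digits + PUNCTUATION)
WHITE_SPACE = {" ", "\n", "\r"}            # assumed reading of Rule 5.2


def unportable(text):
    """Return the sorted characters that break Rule 5.1 or Rule 5.2."""
    return sorted(set(text) - PORTABLE - WHITE_SPACE)
```

An empty result means the text stays within the portable subset.  A tab 
or a dollar sign will show up in the list.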


<Section 3.6>  Outlining and Hierarchies

RULE 6.1  Impose a relatively rigid outline (hierarchy) on your 
manuscript and reflect that hierarchy in a rigid formatting scheme for 
the section and chapter headers.

E.g., this manual uses angle brackets *plus* two spaces *plus* a section 
title.  Each section is preceded by two blank lines, each part by three.  
Sections are numbered in dotted-decimal form.  

Conventions like these allow casual searching, or "navigation" of the 
document.  Unless you have a "hypertext" document that lets you skip 
around easily, such guideposts are necessary.
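
A rigid convention pays off as soon as a program does the searching.  
As a sketch, here is a Python pattern that finds headers written the 
way this manual writes them:  an angle-bracketed label at the left 
margin, two spaces, and a title.  The function name is hypothetical, 
and the pattern is only as good as the writer's discipline.

```python
import re

# A header: "<" at the left margin, a label, ">", two spaces, a title.
# Closing tags such as </chapter> and indented tags are not matched.
HEADER = re.compile(r"^<([^<>/\n][^<>\n]*)>  (\S.*)$", re.MULTILINE)


def table_of_contents(text):
    """Return (label, title) pairs for every header line in text."""
    return [(m.group(1), m.group(2).rstrip())
            for m in HEADER.finditer(text)]
```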


    In designing markup conventions, you should keep in mind that it is 
more valuable to represent *logical* structure than to try to mimic the 
*physical* appearance of a printed page.  Thus, 

  o  it is wasteful to use vertical space to try to mimic vertical 
layout of a printed page, because the resulting effect looks 
disconcerting on the screen.  Use the number of blank lines to represent 
the logical structure of the document instead.

  o  use flush left headers for the top levels, indent a couple of 
spaces for lower level.  At the very lowest logical level, just skip an 
extra line between paragraphs and don't bother with a separate title for 
the header.

  o  try "tagging" important breaks with special characters like angle 
brackets, a row of hyphens "----------", or a decorative break like 
this:

                    +     +     +     

  o  don't overuse all caps in titles, especially for Section breaks.  
They scream too loudly.  You can't really mimic the print-based effect 
of small caps on the screen.

  o  don't bother developing elaborately different formats for headers 
that are seen infrequently (e.g. chapters, parts, or "books").  
Concentrate instead on the sections, sub-sections and minor breaks.  The 
"wide-area" structure is more simply represented by dotted-decimal and 
the low level structure by visual formatting.  

Remember that the global appearance of the document is much less 
important than it is for a book, since the user never sees the document 
as a whole, only small local sections.  In fact, the highest level 
logical divisions are probably not visual at all--they are the breaks 
between computer *files*, or even a directory hierarchy.  And--ever more 
commonly--hierarchies of *computers*!  This leads us to the next rule.

In preparing your outline, remember the following rule:

RULE 6.2  Never nest a hierarchy or outline more than three levels deep 
without hiding some of the structure.

There is a great deal of structure in the computer world.  

Countries contain 

  domains contain 

    networks contain 

      companies contain 

        individual computers (nodes) contain 

          directories contain 

            subdirectories contain 

              documents contain 

                chapters contain 

                  sections . . . (*whew* that was ten levels).  

You *must* try to hide some of this structure from your reader.  The 
easy way to do this is to narrow in on the local focus and pretend that 
what we're looking at right now is the *only* thing in the world.  It is 
impossible to read a document and simultaneously think of its place in 
the wide world.  Forget the tree structure of the whole network or 
computer system; let the reader focus on the local tree-structure.

And, whatever you do, don't let the reader know they are more than three 
levels deep.


<Section 3.7>  Text Inclusions

We have already discussed basic formatting issues like paragraphing, 
line length, and basic layout.  This section concentrates on the myriad 
details that bedevil the typist.  We save most of the *really* technical 
stuff like tables, foreign languages, and formulas for =Section 3.9=.  
In this section we discuss very common inclusions in text--


<Section 3.7.1>  Alternate Fonts

RULE 7.1  Use markup to represent *logical* emphasis rather than 
particular font effects.  

Here are some typical reasons and traditional *print* renderings:

  o  emphasis (italics or underlining)

  o  strong emphasis (bolding, all caps)

  o  interior dialog (italics)

  o  editor's emphasis (italics)

  o  foreign language phrase (italics)

  o  book title (italics, underline)

  o  article title (quotation marks)

  o  new term, index term, glossary item (italics, quotes, underline)

As you see, italics are overused and the choices are not always 
consistent.  In order to make your meaning PERFECTLY CLEAR, it is best 
to observe this rule:

RULE 7.2  Prefer delimiters for marking inclusions.  Use different 
delimiters for different purposes.

A delimiter is a character or pair of characters that is used fore and 
aft to set off text.  

It is not a bad idea to develop a set of guidelines for how to render 
each sort of inclusion.  Here is what I use:

  o  now *this* is emphasis (and *strong* emphasis)

  o  as I said, markup is part of =la vie=.

  o  or we can introduce a new "term" like this (See "taxonomy").

  o  book titles, like _Elements of Style_, are a snap.

I also have a series of conventions I use for special situations that 
arise in scholarly text, such as multiple languages or included math 
text.

By the way, avoid the effect that results when you try to 
_mimic_print_media_by_underlining_in_this_fashion_.  The result is 
tedious and leads to long words that don't wrap well.  In E-text, a pair 
of underlines is just another delimiter, nothing more.

RULE 7.3  In E-text, always place punctuation *outside* delimiters.

     Otherwise, the E-text looks "silly."  Better:  "silly".

In print, you put the punctuation after a quotation on the "inside."  
This looks good in print but terrible on the screen.  If your E-text is 
destined for computer screens (and automated search programs) it is 
better to put the punctuation on the "outside".  If this disturbs you, 
remember that in the last century the printers' rule (as I have seen in 
many books,) was to put *commas* inside parentheses as well as inside 
quotation marks.  We are allowed to change these conventions from time 
to time.
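
If you inherit text punctuated the print way, a program can do most of 
the conversion.  Here is a minimal sketch in Python that handles only 
commas and periods inside closing double quotes.  The function name is 
made up, and you should check the result by eye.

```python
import re

# A comma or period sitting just inside a closing double quote.
INSIDE = re.compile(r'([.,])"(?=\s|$)')


def punctuation_outside(text):
    """Move a trailing comma or period to the outside of the quotes."""
    return INSIDE.sub(r'"\1', text)
```

The pattern insists on white space (or the end of the text) after the 
quotation mark, so it leaves opening quotes alone.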


<Section 3.7.2>  Quotations and Included Blocks of Text

     There are a number of ways to include quoted or included
     materials.  One, favored in print, is to push the margin
     of included text inwards, like this.

You should use this technique *very* sparingly.  It requires a hard 
return and hand spacing for each line.  Reformatting (to shorten the
passage, say) is very difficult.  WYSIWYG word processors let you shift 
margins on a per-paragraph basis.  This feature is not transportable so 
you can't use it for E-text.

In E-mail correspondence you often see the convention that a right angle 
bracket in the left column sets off correspondence.  Often, this 
continues to the point of inanity:

     >>> I said I don't like the President's new policy
     >> O yeah?
     > Yeah.
     O yeah?
     >  Not only that, you're an idiott
     Well so are you.  And you don't spell good either.

Our ability to reconstruct the whole train of correspondence is a poor 
trade for legibility.  

Another device to avoid is the frequent use of vertical bars alongside 
text to indicate changes.  Although most computer keyboards have a 
"vbar" (usually a shift-backslash) this character does not travel well 
and the visual effect is lost in some fonts or if the line length 
changes.

An alternative to the vertical bar is to mark changed sections with 
double brackets:

[[Our new improved widget has 
a longer lifetime and
higher customer satisfaction
rating.]]

More elaborate schemes for marking changes are discussed in the section 
on Editing and Marked Sections.

In summary, we have this rule:

RULE 7.4  Avoid block quotes and text with vertical lines to represent 
additions or changes.  Just use conventional quotation marks or a 
special "delimiter" like double square brackets.


<Section 3.7.3>  Lists

There are two basic kinds of list, ordered** and unordered.  Unordered 
lists often have "bullets" in front of the items.

  **  Also called enumerations.

RULE 7.5  Indent list items at least two spaces and make sure list items 
are in separate "paragraphs", i.e. with a blank line between each item.

This prevents formatting problems that occur when the word processor 
decides that a list is actually a paragraph and pours it, bullets and 
all, into a rectangular shape.

RULE 7.6  Do not use a "hanging indent" for list items.  Let subsequent 
lines run to the left margin.

  o  This is an example of a list item
     that looks good in print but
     is hard to re-format in E-text.

  o  This second list item is more typical
of E-text.  You can reformat it without deleting
lots of spaces at the beginning of each line.

Also, as mentioned in Part II, the visual effect of straight line 
margins is less important in E-text.  You don't gain all that much 
visually by going for the pretty-but-hard-to-format look.


<Section 3.7.4>  Cross References, Hypertext, and Embedding

References to other parts of the text should be set off so they can be 
found.  Cross-references are of several sorts, all related:

  o  Cross-References to other parts of the document:  See Section 3.4,  
See "UNIX" in glossary, Page 43.

These cross references are essentially pointers that urge you to leap 
over the intervening text.  This is easy in print media, where you have 
all the pages in your hand.  With a computer program you have to use the 
comparatively clumsy method of manipulating the keyboard or mouse to 
move around.  With plain text, the only rational approach is to use the 
"search" or "find" command of the word processor to locate the passage.  
The art comes in guessing good "strings" (sequences of letters) to 
"search" for.

  o  Hypertext references (outline overview, hypertext menu and 
references)

Many word processors allow you to "navigate" a document by traversing an 
outline overview.  In what amounts to the same thing, "hypertext" 
programs often implement the natural tree-structure of a document by a 
series of menus representing the possible "branches" available at each 
"node".  This is the computer equivalent of the dime-store "interactive 
adventure book" in which you get to choose the plot developments by 
making choices like "If you want to rescue the damsel go to page 43; if 
you want to kill the ogre, go to page 136."

  o  External File References

Here the point is that we can name other files and directories--and even 
other computers.  For example, "rtfm.mit.edu:/pub/usenet" means 
subdirectory "usenet" of directory "pub" on computer node "rtfm" at 
M.I.T.

  o  "Bibliographic Citations" of print media

The bibliographic citation, either as a hypertext link in the text 
(footnote) or as a list of references (menu), is a subject of great 
attention in print media, with all sorts of elaborate formatting rules.

  o  Embedded Figures and Included Files

Very often, word processors (and long computer programs) have a master 
file that looks something like this:

     include <Front.Matter>
     include <Chapter.1>
     include <Chapter.2>
     include <Chapter.3>
     include <Chapter.4>
     include <Chapter.5>
     include <Appendix.A>
     include <Index>

This master document sews together a bunch of smaller files.  In 
advanced programs, you may be unaware that this process, called file 
inclusion, or embedding, is taking place.  

File inclusion is especially common as a solution to the following 
problem:  how do you include material that is "foreign" to textual 
matter, say a graphic image or drawing?  If you just cut and paste the 
text, the program will mistake it for part of running text, often with 
dire consequences.  The solution is to keep the offending material in a 
separate file and have only the file reference in the text itself.  Then 
all the word processing program has to do is follow the reference.

You can immediately see that all these are applications of a single 
idea, the idea of a "pointer", or "reference".  One part of the document 
points to another.  We are supposed to imagine--and the program is 
supposed to make us think--that there is a bridge from one place to 
another, or that the reference can be expanded so that we can enter into 
the other file or location and get back again.  Thus, the "See 
Reference", hypertext link, or external file reference really amount to 
the same thing.

The point is that in all cases we need some way to represent the 
starting point (reference or pointer) and ending point (anchor point) of 
the arrow.  The World Wide Web uses Hypertext Markup Language (HTML).
An HTML cross-reference looks like this:

     <A HREF="#index">See Index</A>

The corresponding target, or anchor, is marked this way:

     <A NAME="index">Index</A>

One soon tires of making up unique names to allow each cross-reference 
to mate with its anchor.  It is more natural to use the document's 
natural tree structure (perhaps represented by dotted decimal) for 
anchor identification.  Admittedly, this lends itself to dangling 
references like "See page 25" when page 37 is the correct page for 
version 2.3.  Correcting these references is probably less work than 
typing the ungainly syntax of an HTML cross-reference.

If we are not creating a source document to connect to the World Wide 
Web, a simpler method is to delimit the reference with equals signs, 
=See Index=, and the anchor point with angle brackets.  This has an 
added advantage if you are using equals signs to delimit italic text, 
since glossary entries are often rendered in italics.  You can see how 
natural this is given the section marking scheme adopted here (See 
=Section 26.6.4= below).

  <=Index>  This is the anchor point for the index reference made in the 
above paragraph.  The equals sign is optional.  It just serves to mark 
the tag <index> as an anchor point.  

RULE 7.7  Delimit glossary entries, index entries, See references, and 
so on with equals signs.  Use a consistent notation, such as angle 
brackets, to mark the anchor points.

The World Wide Web attempts to link documents with cross-references 
(hypertext links) on a global scale.  The notation developed for this 
project is called a Uniform Resource Locator (URL) and is very 
similar to 

     protocol://node:/directory/file:port

E.g.:

     ftp://ftp.ncsa.uiuc.edu:/pub/education/README:80.

     news://comp.sys.mac
     

The "protocol" part has to do with the method of getting the document 
(and thus implicitly with the classification scheme).  The examples here 
are File Transport Protocol and Usenet News, two common document 
retrieval systems.  "ftp.ncsa.uiuc.edu" is a computer, "comp.sys.mac" is 
a "newsgroup".  "/pub/education/README" is a file in a directory called 
"/pub/education"; and "80" is a "port number".  These details only 
concern the retriever, who may be just a computer program.
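
For readers who want to see a retriever's view of these parts, here is 
a small Python sketch using the standard `urllib.parse` module.  One 
caution:  the standard syntax places the port right after the node name 
rather than after the file name, so the sketch writes the example that 
way.  The port value 80 is simply carried over from the example above.

```python
from urllib.parse import urlsplit

# Standard order: protocol://node:port/directory/file
parts = urlsplit("ftp://ftp.ncsa.uiuc.edu:80/pub/education/README")
print(parts.scheme)    # the "protocol" part
print(parts.hostname)  # the computer (node)
print(parts.port)      # the port number, as an integer
print(parts.path)      # directory and file

news = urlsplit("news://comp.sys.mac")
print(news.scheme, news.hostname)
```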

The URL notation is easily adapted to other hierarchical schemes used 
outside the computing world, especially if the syntax rules are relaxed 
a bit.  Here are some ideas:

For Books:

  dewey://stcharles.pub.lib:270.23.07:gilson:4  (St. Charles Public 
Library, Dewey Decimal, Author Etienne Gilson, copy 4).

  LoC://QA.22.4:  (a Library of Congress citation)

  ISBN://123-24-55

For a Journal Article:

  journal://Time:1990.23.56-69

For the Phone System:

  voice://1.708.840.8069  (A voice number)

  fax://1.708.840.8069  (A FAX number)

  internet://jgoodwin:adcalc.fnal.gov  (E-mail address)

  postal://Box.6022:St.Charles:IL:60174  (Surface mail)

Or something like that.  

RULE 7.8  Use Uniform Resource Locators (URLs) for worldwide computer 
file references.  Campaign for their extension to other obvious (paper 
and telephonic) information sources.


<Section 3.7.5>  Editing and Marked Sections

RULE 7.9  Indicate short deletions [and additions] with square brackets.  
If you need to tell them apart add a plus or minus sign in front.  
Indicate the version of the change by a version number (single number or 
dotted decimal) after the sign.

     This regulation shall apply to each +1[and every] taxpayer -2[, 
except members of this legislature].

We can thus reconstruct the history of this text:

     Version 0:  This regulation shall apply to each taxpayer, except 
members of this legislature.

     Revision 1:  This regulation shall apply to each and every 
taxpayer, except members of this legislature.

     Revision 2:  This regulation shall apply to each and every 
taxpayer.
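
The reconstruction above can be done mechanically.  Here is a Python 
sketch under some stated assumptions:  the names `text_at` and 
`as_tuple` are hypothetical, marks are not nested, and the marked text 
contains no "innocent" brackets.

```python
import re

# A mark is a sign, a version number (single or dotted decimal),
# and the affected text in square brackets.
MARK = re.compile(r"([+-])(\d+(?:\.\d+)*)\[([^\]]*)\]")

def as_tuple(version):
    """Turn "2" or "2.3" into a comparable tuple of integers."""
    return tuple(int(n) for n in str(version).split("."))

def text_at(marked, version):
    """Reconstruct the text as it stood at the given version (a sketch)."""
    wanted = as_tuple(version)

    def resolve(match):
        sign, number, body = match.groups()
        changed = as_tuple(number) <= wanted
        if sign == "+":
            return body if changed else ""   # addition made yet?
        return "" if changed else body       # deletion made yet?

    text = MARK.sub(resolve, marked)
    text = re.sub(r"\s+", " ", text)            # collapse leftover spaces
    return re.sub(r" ([,.;:])", r"\1", text)    # no space before punctuation

marked = ("This regulation shall apply to each +1[and every] taxpayer "
          "-2[, except members of this legislature].")
for v in (0, 1, 2):
    print(v, text_at(marked, v))
```

Note that the sketch collapses runs of spaces, so it does not preserve 
the two spaces after a period that this manual recommends.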

This principle can be extended to whole sections of text except that it 
is better to use double square brackets since the text itself may 
contain "innocent" brackets.

-2.3[[  . . . ]]  means that this section is omitted in Version 2.3.

This notation soon becomes wearisome after multiple and intricate 
revisions.  Jim Warren has devised a visual format that makes collating 
multiple versions easier, in tabular or outline form:

012
    This regulation shall apply to each 
      and every
    taxpayer
       , except members of this legislature
    .

RULE 7.10  For complicated additions and deletions, such as those found 
in legal matter, use Warren format.

Here are three examples of the formats we have been discussing:


[[include example here]]


One final rule:

RULE 7.11  Don't put spaces between the dots of an ellipsis.  Instead, 
leave one blank space before and after:  ( ... ).  

Word processors do not necessarily recognize ellipses as a single 
"thing".  The gracious effect of spacing created by a typewriter seems 
lost on a computer screen.
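
If you inherit text typed the old way, a one-line substitution will 
tidy it.  This is only a Python sketch; the name `normalize_ellipsis` 
is hypothetical, and it treats any run of three or more dots (spaced or 
not) as an ellipsis, which also swallows a four-dot sentence ending.

```python
import re

def normalize_ellipsis(text):
    """Close up spaced dots and leave one blank on either side."""
    return re.sub(r"\s*\.(?:\s*\.){2,}\s*", " ... ", text)

print(normalize_ellipsis("as discussed . . . in the paper"))
```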


<Section 3.7.6>  General Style and Conventions

This section is about rules that are conventional to almost all typing.  
A brief list is included here for completeness:

RULE 7.12  Add two spaces after each major break (period, question 
mark, colon, etc.) and one space after minor pauses (comma, semicolon).  

An exception is made for periods that are part of an abbreviation or 
initials of a name, where the rule is:

RULE 7.13  Allow one space after each initial in a name but not between 
initials of an abbreviation:  J. E. Goodwin, St. Charles, Ill., U.S.A.

RULE 7.14  Represent a dash with two hyphens and do not allow spaces on 
either side of the dash--like this.

RULE 7.15  Certain Latin abbreviations do not have internal spaces, nor 
are they in italics:  i.e., e.g., etc.


<Section 3.8>  Esoterica

99% of all ordinary E-text written in English does not need this 
Section.  But the issues discussed here greatly affect certain kinds of 
text:

  1. Texts requiring traditional scholarly adjuncts such as citations, 
cross-references, indexing, bibliographies, glossaries, critical 
apparatus, and figures;

  2. Scientific and mathematical texts that use formulas extensively;

  3. Statistical text with frequent use of numbers, uncertainties (plus 
or minus), scientific notation, and tabular material.  Such text occurs 
commonly in the physical and social sciences, e.g. reports of 
experiments.

  4. Texts in one language that discuss another (language textbooks, 
grammars, dictionaries, commentaries, many works in the humanities).


<Section 3.8.1>  Inclusions in Languages Other than English

In English, where diacritical marks are rare, foreign languages are.  It 
is important to distinguish between =transcription= and 
=transliteration=.  In transcription, an attempt is made to render the 
word as nearly as possible using the English alphabet, with or without 
diacritics.  Precision .  Transliteration is an attempt to represent the 
*spelling* of the word in the non-English alphabet.  Great effort is 
made, in designing the transliteration system, to make the 
transliteration reversible, so that the exact original text can be 
recovered by a knowledgeable human or program.

These two possible approaches to including non-English text lead to two 
different rules, depending on intent:

RULE 8.1  Set off foreign phrases with the same delimiters used in place 
of italics (usually equals signs).

RULE 8.2  Use special delimiters (for example plus signs or asterisks) 
to signal special notations used for *transliteration*.

No attempt is made to distinguish three uses of "equals-italics"--
foreign language italics, cross-reference signal, and miscellaneous 
italics.  As in print, these can usually be distinguished by context.


    Beyond representing foreign phrases exactly, one might want an 
informal notation for representing the diacritic marks that do 
occasionally occur in English.  Using these is probably pedantic in 
ordinary E-text, but from time to time they may be useful, e.g. in 
grammatical discussions:

RULE 8.3  In ordinary English texts it is not usual to use diacritical 
marks, even when the English word technically has them, such as:  
fac?ade, ro=le, coo%rdinate, blesse!d.  

If absolutely necessary, we recommend: 

  acute accent: ne/e

  grave accent:  blesse!d

  circumflex accent, tilde, or macron:  ro=le, nolo= contendere

  diaeresis or umlaut:  coo%rdinate

  cedilla:  fac?ade

The choice of symbols is based on portability (which excludes, for 
example, a tilde or circumflex).  Also, the notation is just a little 
ugly to discourage its overuse.
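
Because each symbol follows the letter it modifies, the notation can be 
turned back into accented characters by program.  Here is a Python 
sketch using Unicode combining marks.  The names `MARKS` and 
`decode_word` are hypothetical.  Since "=" stands for three different 
accents, the sketch simply assumes a circumflex.  It should be fed only 
words you know to be marked, or it will mangle things like "and/or".

```python
import re
import unicodedata

# Combining marks for each ASCII stand-in.  "=" is ambiguous in the
# notation (circumflex, tilde, or macron); this sketch assumes circumflex.
MARKS = {
    "/": "\u0301",  # acute
    "!": "\u0300",  # grave
    "=": "\u0302",  # circumflex (assumed)
    "%": "\u0308",  # diaeresis or umlaut
    "?": "\u0327",  # cedilla
}

def decode_word(word):
    """Turn a marked word such as "fac?ade" into accented text (a sketch)."""
    # A mark counts only when it sits between two letters, so a mark
    # at the end of a word (as in "nolo=") is left alone.
    combined = re.sub(
        r"([A-Za-z])([/!=%?])(?=[A-Za-z])",
        lambda m: m.group(1) + MARKS[m.group(2)],
        word,
    )
    # Compose letter + combining mark into a single character.
    return unicodedata.normalize("NFC", combined)

for marked in ("ne/e", "blesse!d", "ro=le", "coo%rdinate", "fac?ade"):
    print(marked, "->", decode_word(marked))
```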


    E-texts that discuss foreign languages present special problems.  
Here are some suggestions:  

  1.  The basic convention is that the primary language is unmarked, and 
the secondary language delimited by asterisks: *E pluribus unum*, or by 
equals signs =E pluribus unum=.  

The choice of delimiter used requires some thought.  In Latin, asterisks 
should be used so that equals signs can be used to represent macrons:

     Ve=ni=, vi=di=, vi=ci=.

Unless there are considerations like these, the asterisk is chosen for 
the most frequent use in the text (usually italics-for-emphasis) because 
it is less obtrusive and most conventional.

Since such texts do not usually contain quotations, double quotes may be 
used to represent translations or definitions:

     =E pluribus unum= means "from many, one."

In printing, both the foreign text and the translations are often 
rendered in a different style.  If italics are needed for other 
purposes, they should be delimited by asterisks:

     =E pluribus unum= is *so* Eighteenth Century.

  2.  If the text contains a selection of many different languages, 
special delimiters are used to segregate languages that use the Latin 
alphabet from others.  In this case no effort is made to choose one 
secondary language as "the" secondary language.  Instead, the delimiters 
are used to mark alphabets that differ visually from the Latin alphabet.

=  =   Languages using the Latin Alphabet, other than the primary 
language (in effect "language italics").

*  *   Greek

+  +   Hebrew

/  /   International Phonetic Alphabet

Other delimiters can be constructed =ad hoc=, such as  &&[ ... ]&& or
+/ ... /, (* ... *) and so on.

Just a reminder:  the recommendations here are strictly for informal use 
in the context of "flat" ASCII files, e.g. for casual communication, or 
as character-oriented output from a program that uses a proprietary 
format or SGML for internal use.  Any substantial work with multiple 
languages is probably worth the effort to use something other than E-
text for the *underlying* representation.  In particular, scholars 
should consider the Text Encoding Initiative's recommendation.  

Even with an elaborate underlying markup system, however, the problem 
remains of how to render the foreign language text, perhaps a text that 
does not even use the Latin alphabet, on a character-oriented screen.  


<Section 3.8.2>  Footnotes, Cross-References, and Bibliographic 
Citations

There are two issues here:  how to write the citation and where to put 
it.  As to the first issue, citation schemes that work well in print are 
often cumbersome in E-text.  The answer to the second issue is 

RULE 8.4  Place footnotes at the foot of the paragraph, or else gather 
them in an appendix at the end of the work.

Another common place to put notes, at the end of a Chapter, should be 
avoided since it is a relatively hard place to find, compared to the end 
of the file.  

The inclusion of footnotes in the body of the text with special 
delimiters, as is done by many word processors, is a concession to 
print-oriented production of text.  It places the footnote where the 
*program* wants it.  From the standpoint of the reader, there may as 
well not be a footnote at all!

RULE 8.5  The footnote mark should be as unobtrusive and short as 
possible:  usually ** or ++, [34], or [Wells85].

     . . . as discussed in the paper by Wells.[Wells85]  Another . . .

     . . . again makes this point in Ref.[36], where the bias . . .

     . . . See the Nicomachean Ethics+[NE,1150a]. . . .


Footnotes with a single asterisk could be confused with an "emphasis" 
delimiter.  Putting asterisks in brackets, [*], seems long-winded.

RULE 8.6  Footnote sequencing should not continue across physical files.  
Use dotted decimal notation to refer to "long-range" footnotes:  [2.15] 
means footnote 15 in chapter 2.


    Designing a good bibliographic citation scheme for E-text means 
breaking away from print models. Long dashes and hanging indents are 
useless in E-text.  Also, most readers, if they read notes at all, will 
synchronize two windows so that notes can be read in one and the text in 
another.  *Therefore* it is better to make your annotated bibliography 
follow chapter organization than to make it alphabetical or 
chronological.  

In general, it is a good idea to gather bibliographic references in one 
place and *not* put them in footnotes, as is common in print.  This is 
because many of the citations will be URL's (see =Section 3.7.4=), which 
mar the appearance of the text.**

  ** This assumes the E-text is not being prepared for linkage to the 
World Wide Web!  In this context, our discussion applies more to the 
output of a WWW server than to its input.


<Section 3.8.3>  Formulas and Statistical Text

There is a great deal of scope for developing new mathematical notations 
that work well with E-text.  I can only make a few recommendations and 
observations here.

RULE 8.7  Use square brackets to set off "math italics", especially 
variable names embedded in ordinary text.  Omit the brackets for 
displayed equations.

This rule is necessary to make variables stand out.  Human eyes that are 
used to picking out subtle font differences find it hard to read text 
that refers to variables like a where a is the unknown.  To repeat, [a], 
where [a] is the unknown.

RULE 8.8  Separate displayed material by one blank line before and 
after, and indent consistently (five spaces recommended).

Here is a well-known example:

     E = m c[2]


     E[2] = p[2]c[2] + m[2]c[4]

where [E] is the total energy, [p] is the momentum, [m] is the rest 
mass, and [c] is the speed of light in a vacuum.  


    Scientific notation is a travesty in type.  One commonly sees such 
attempts as 1e12, 2.005+/-.01, or 2 x 10 5.  We recommend quoting 
numbers in the following fashion:

     1.0E+12, 2.005(10), and 2.E+5.

To my eye, at least, the following rules are useful:

RULE 8.9A. Always use a sign after the "E" in exponential notation;

RULE 8.9B. Always express the decimal in floating point numbers and 
precede a decimal point by a zero, i.e. 0.05, not .05.

RULE 8.9C  Represent symmetric tolerances in parentheses after the base 
number.

A little care here is considerate of the reader and helpful for 
subsequent typesetting.
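
Most programming languages will not print numbers this way unasked, so 
here is a Python sketch of Rules 8.9A through 8.9C.  The function names 
`sci` and `with_tolerance` are hypothetical.  The sketch always prints 
at least one digit after the decimal point, so it gives 2.0E+5 where 
the text above writes 2.E+5.

```python
def sci(x, digits=1):
    """Format x in exponential notation with an explicit exponent sign."""
    mantissa, exponent = f"{x:.{digits}E}".split("E")
    # int() drops the zero padding; "+d" restores an explicit sign.
    return f"{mantissa}E{int(exponent):+d}"

def with_tolerance(value, tolerance, decimals):
    """Write a symmetric tolerance in parentheses, in units of the
    last quoted digit."""
    last_digits = round(tolerance * 10 ** decimals)
    return f"{value:.{decimals}f}({last_digits})"

print(sci(1e12))
print(sci(2e5))
print(with_tolerance(2.005, 0.01, 3))
```

Here the tolerance is quoted in units of the last digit shown, so a 
tolerance of 0.01 on a number given to three decimals appears as (10).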

RULE 8.10  In running text, superscripts and subscripts could be 
represented the same way as footnotes in the main guidelines, viz.  

     2+[20] = 4+[10], 

although the FORTRAN notation 2**20 = 4**10 is more perspicuous.  
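
Converting from the footnote-style notation to the FORTRAN style is a 
simple substitution.  Here is a Python sketch; the name 
`to_fortran_powers` is hypothetical, and it handles only whole-number 
exponents.

```python
import re

def to_fortran_powers(expr):
    """Rewrite superscripts like 2+[20] in the FORTRAN style 2**20."""
    return re.sub(r"\+\[(-?\d+)\]", r"**\1", expr)

print(to_fortran_powers("2+[20] = 4+[10]"))

# Python happens to use the same ** operator, so the equation itself
# can be checked directly:
print(2**20 == 4**10)
```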

RULE 8.11  Subscripts and superscripts that do not represent powers but 
represent labels, are conveniently handled like array subscripts:

    a(1,3) = b(2,4) instead of a+[1]-[3] = b+[2]-[4].

The array indices might use square brackets instead of parentheses.

RULE 8.12  For the mixed case of subscripts for labeling and 
superscripts for powers, we recommend:

    a1[2] = a2[2] or a1**2 = a2**2 or a(1)[2] = a(2)[2].

The first approach is better suited for long formulas with many powers:

    (x+y)[3] := x[3]+3x[2]y+3xy[2]+y[3]
 
    (x+y)**3 := x**3 + 3*x**2*y + 3*x*y**2 + y**3.

RULE 8.13  Complex expressions like summations and integrals can be 
handled informally as follows:

     (1/n)*sum(i=0,n; x(i)[2]) or int(x=0,infty;x[-2]).

RULE 8.14  Matrices, tables, and outlines are handled in a consistent 
fashion.   

       7,  18,     19
     -43,  72,  930.1
    -1.1,  18,    100
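
A layout like the one above is tedious to align by hand.  Here is a 
Python sketch that right-aligns each column and separates entries with 
a comma and two spaces.  The name `format_matrix` and the five-space 
default indent are assumptions of mine.

```python
def format_matrix(rows, indent=5):
    """Lay out rows in right-aligned, comma-separated columns (a sketch)."""
    cells = [[str(value) for value in row] for row in rows]
    # Each column is as wide as its longest entry.
    widths = [max(len(cell) for cell in column) for column in zip(*cells)]
    lines = []
    for row in cells:
        padded = [cell.rjust(width) for cell, width in zip(row, widths)]
        lines.append(" " * indent + ",  ".join(padded))
    return "\n".join(lines)

print(format_matrix([[7, 18, 19], [-43, 72, 930.1], [-1.1, 18, 100]]))
```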

Whereas in print vectors and matrices are represented by boldface 
letters, in E-text it is probably best to adopt Paul Dirac's bra-ket** 
notation, first developed for Quantum Mechanics.  Here, the vector "v" 
is represented as [v>.  This notation is well-developed and *can* be 
typed in E-text.  

  ** The name comes from the following construction:  <bra] c [ket>.  
The vector is called a "ket", the dual vector a "bra", and [c] is the 
operator matrix.


<Section 3.8.4>  Verse, Drama, and Liturgy

RULE 8.15  Each line is a separate paragraph.  There should be two hard 
returns between lines and three between stanzas.  

Alternatively, two returns may mark stanzas, with lines beyond the first 
indented by white space (one space recommended).  Three returns can mark 
longer segments.  Only one of these two methods should be used in any 
one work.
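
The first method (Rule 8.15 as stated) is simple enough to show in a 
couple of lines.  This is a Python sketch; the name `format_verse` is 
hypothetical, and the input is assumed to be a list of stanzas, each a 
list of lines.

```python
def format_verse(stanzas):
    """Join lines with two hard returns and stanzas with three."""
    return "\n\n\n".join("\n\n".join(lines) for lines in stanzas)

print(format_verse([["First line", "Second line"], ["Third line"]]))
```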

RULE 8.16  Do not try to mimic vertical or horizontal spacing of a 
printed source, unless the visual effect of the poem is the main 
concern.

RULE 8.17  Run on lines (say past 80 characters) can be represented by a 
slash (/) at the beginning of the line.  

RULE 8.18  An asterisk, *, is used to mark caesura, pause, or breathing 
mark.  

This should be preceded and followed by a space (or return) to prevent 
its confusion with a footnote or emphasis delimiter.

RULE 8.19  Use asterisks to delimit stage directions or rubrics.

RULE 8.20  Use special delimiters to mark speakers, roles, or questions 
and answers.  Follow these with two spaces.

This helps the reader skip from part to part.  Ampersands and periods 
make unobtrusive delimiters.  Brackets are visually more striking:

     &Ham.  To be or not to be.

     &Pol.  That is the question.

*or this*

     &V.  The Lord be with you.

     &R.  And with thy spirit.

*or this*

     &Q.1.5  What is LINUX?

     &A.  LINUX is a small, free UNIX-like operating system for 386 
computers.
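
The same delimiters that help the reader skip from part to part also 
help a program.  Here is a Python sketch that splits a line into 
speaker and text.  The names `SPEAKER` and `split_speaker` are 
hypothetical, and the sketch assumes the ampersand form shown above, 
followed by at least two spaces.

```python
import re

# An ampersand, the speaker's tag, then two or more spaces.
SPEAKER = re.compile(r"^\s*&(\S+)\s{2,}(.*)$")

def split_speaker(line):
    """Return (speaker, text), or (None, line) if no delimiter is found."""
    match = SPEAKER.match(line)
    if match is None:
        return None, line
    # Drop the closing period of the delimiter, if there is one.
    return match.group(1).rstrip("."), match.group(2)

print(split_speaker("     &Ham.  To be or not to be."))
print(split_speaker("     &Q.1.5  What is LINUX?"))
```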


<Section 3.9>  Electronic Forms and Tests

E-text is often used as a medium for distributing forms, tests, and 
other items to be filled out and returned.  Often, these forms mimic 
paper counterparts at the expense of their purpose--to be easy to fill 
out and return.  Here are some rules:

RULE 9.1  Avoid the multiple column format common on paper forms.

As soon as you start to fill out the form, the columns don't line up.

RULE 9.2  Skip a line between questions.

This avoids the dread re-formatting problem.

RULE 9.3  Place a left open bracket wherever an answer is required, but 
not a right closing one at the end.

In order to fill in a checkbox, you have to position the cursor exactly 
in the middle of the box, delete a character and type an "x".  It is 
easier to position a cursor at the end of the line and start typing 
right away.

RULE 9.4  Avoid checkboxes.  Ask for a one-character typed answer 
instead.

RULE 9.5  Leave four hard returns (three blank lines) between "short 
answer" questions.

The responder begins typing at the beginning of the second blank line.

RULE 9.6  Do not use spaces or underscores to show blanks; use periods 
or hyphens instead.  Put them on the line *below* the response area (so 
the responder doesn't have to erase them and lose count!).

     Your state or province:  [
                               --

     Your zip or postal code:  [
                                -----

This cues the responder as to desired length of the response.  Blanks 
are invisible, except in certain word processors, and underscores are 
often run together, so you can't count them easily.  

This sort of form is easy to fill out:

Your city of residence (20 characters max): [Chicago, Illinois
                                             --------------------.
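
Forms built this way are easy to generate and easy to read back by 
program.  Here is a Python sketch of Rules 9.3 and 9.6; the names 
`form_field` and `read_answer` are hypothetical, and the five-space 
indent follows the examples above.

```python
def form_field(label, length, indent=5):
    """Build a fill-in field: open bracket above, hyphen ruler below."""
    prompt = " " * indent + label + "  ["
    # The ruler starts directly under the first character of the answer.
    ruler = " " * len(prompt) + "-" * length
    return prompt + "\n" + ruler

def read_answer(filled_line):
    """Return whatever the responder typed after the open bracket."""
    return filled_line.split("[", 1)[1].strip()

print(form_field("Your state or province:", 2))
print(read_answer("Your city of residence (20 characters max): [Chicago, Illinois"))
```

Because there is no closing bracket, everything after the "[" on the 
line is the answer.  The sketch assumes the line does contain a bracket.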


<Section 3.10>  The E-Mail Business Letter

The paramount rule in writing an effective E-mail business letter is 
brevity.  

RULE 10.1  In general, you should omit as much of the traditional 
apparatus of the business letter as you can,

since the mailing system may well add lots of unwanted detail.  An 
effective letter can be as short as:

     From: jegoodwin
     To: anotheruser
     Subject: E-mail
     <blank line>
     This is what I have to say.  =John=

RULE 10.2  Always begin E-mail with a single blank line.

This is to allow some visual separation from the mail header.

RULE 10.3  For short (one paragraph) messages, use only the paragraph 
and your name, in-line with the last sentence.

Since brevity is the rule, anything beyond a one-paragraph note should 
be carefully trimmed.  The model below is about the *maximum* you can do 
and still have a brief effective letter.  Feel free to omit anything 
unnecessary.

At most, an E-mail letter will have the following parts:

  1. Mail Header

Do not add a letterhead** or mailing address.  The mail system will add 
enough garbage as it is.  Your info goes at the end of the letter.

  ** An exception is in resumes and advertisements, where catching the 
reader's attention is of paramount importance.  There, lots of whitespace 
and visually arresting designs are welcome.  The effect wears off 
quickly, however, so think twice before adding eye-catching effects to 
all your E-Mail.

  2. Greeting

This is optional.  "<skip one line>Dear Sir or Madam<Skip another line>" 
(if you don't know the sex of the person you are writing--very 
frequently the case, with E-mail), or "Dear George" or simply "George--"

  3. Body

This follows the principles in the rest of this manual.  Remember:  
flush left.

  4. Closing and Signature

The closing is optional.  "<skip line>Your Name<skip line>" is fine.
If you want one, don't indent it a half page, as is customary in print.  

Suggested formal closings are "Sincerely", "[Best] Regards", and 
"Thanks".  I generally avoid "Thanks in advance", since it implies that 
either you aren't thankful if the person doesn't respond (which is 
ungracious); or you don't plan to thank them if they do (which is 
churlish).

You may use special delimiters to mark your signature, but keep these 
light and tasteful.  I sign =John Goodwin=.  Other persons use two 
slashes before their name or add a plus (for clergy), etc., etc.  This 
is more distinctive than a signature file.

  5. Contact information

Since the reader is most likely to contact you just after reading the 
letter, put the info here.  

RULE 10.4  Keep contact information short, probably only your E-mail 
address and phone number (two of each, at most).

RULE 10.5  Use the international style for phone numbers:  e.g.,
+1 708 840 8069 (work).  

  Note:  "+1" is the Country Code for the U.S.A.

RULE 10.6  Never, NEVER, include a character-drawing or funny quote in a 
signature file.

     ////
     [oo]    <-- This is me!!!   "Remember O Man that Dust Thou Art"
     ----

Many persons use a "dot-signature" file that is automatically appended 
to all their E-mail.  The effect is almost invariably puerile and 
tasteless.  If you include it twice you can add "incompetent" to the 
list.

Here is how it looks all together:

     To:    blah blah
     From:  blah blah
     Subj:  blah blah
                                    <--blank line required
     Dear Sir or Madam:             <-- or Dear George, or Dear Ms. Smith

     [Body of text]

     [Body of text]

     [Last paragraph] 

     Sincerely,  <--optional close

     =Your name=  <--use signature delimiters for visual effect

     [Your Contact information]
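
If you send a lot of mail by program, the layout shown above can be 
assembled automatically.  Here is a Python sketch; the name 
`compose_letter` and its parameters are hypothetical, and the mail 
header is assumed to be added by the mail system.

```python
def compose_letter(paragraphs, name, contact="", greeting="", closing=""):
    """Assemble the body of an E-mail letter in the layout shown above."""
    parts = []
    if greeting:
        parts.append(greeting)            # optional
    parts.extend(paragraphs)              # flush left, one blank line apart
    if closing:
        parts.append(closing)             # optional
    parts.append(f"={name}=")             # signature delimiters
    if contact:
        parts.append(contact)             # contact info goes last
    # Rule 10.2: begin with a single blank line after the mail header.
    return "\n" + "\n\n".join(parts) + "\n"

print(compose_letter(["This is what I have to say."], "John",
                     contact="+1 708 840 8069 (work)"))
```

Leaving `greeting` and `closing` empty gives the brief form; filling 
them in gives the maximum form.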


<Section 3.11>  The Final Rule

And lest the reader forget, 

RULE B.  All Rules are Made to Be Broken.

Rules summarize experience and judgment.  In this manual I have tried to 
reflect my own judgment as to what is appropriate, functional, and 
aesthetically pleasing.  I have not always succeeded.  If I have spurred 
the reader to consider their own style and refine it for their own 
purposes, I will have achieved all my end in writing this manual.  Above 
all, remember, dear reader, 

          Question Authority.  It's wrong.


                    +     +     +

<Appendix A>  Technical Details:  Relationship to SGML and TEI

Many of the concerns addressed in this manual are common to participants 
in the Text Encoding Initiative (TEI) and other users of the ISO 
standard, Standard Generalized Markup Language (SGML, see =Section 
2.9=).  I would like to emphasize, for their benefit, that this manual 
describes a *presentation format* and not an encoding format.  It is 
perfectly possible to create an SGML- or TEI-compliant file that uses 
the format discussed in this manual as a visual output format.  

There are very distinct advantages to having a visually appealing, 
informal, character-oriented format, like the one advocated here, in 
which the logical structure (i.e. markup) is still present, but not 
visually intrusive.  SGML compliant systems may well produce such a flat 
file at the request of a user, or the screen output may be cut from the 
program's display window and pasted into such a file.  This style manual 
has tried to describe design principles that will make the resulting 
flat file useful and appealing to read.

Naturally, there are many uses for such a format outside SGML systems as 
well; and a certain uniformity, or at least attention to design 
principles, can only help make the texts created more useful.  

The advantages of SGML or TEI encoding will only come about if word 
processors that hide the markup process from the casual user become 
commonplace and interoperable.  Probably, a low-end freeware editing 
system will have to be created.**  Until that time, welcome or not, flat 
ASCII is not only a visual format, but an interim interchange standard 
as well.  

  **  Such a system is being created for the LINUX operating system.

Once again:  this is not a new encoding or input format, nor is it 
primarily intended as an interchange standard; it is a suggested format 
for visual *output* that happens to be maximally transportable at the 
present moment.

                    +     +     +

<Table I>  Table of Contents

=Part I=    Writing for an E-text Audience

    =Section 1.1=  Why Write for an E-text Audience?

    =Section 1.2=  Is it Possible to Write E-Text and Print at the Same 
Time?

    =Section 1.3=  Differences between E-Text and Print Media

    =Section 1.4=  Version Control


=Part II=   Specific Differences of Style and Mechanics

    =Section 2.1=   Differences Traceable to Physical Media

    =Section 2.2=   Differences in Style

    =Section 2.3=   Differences in Process

    =Section 2.4=   Differences in Repertoire

    =Section 2.5=   Differences in Layout

    =Section 2.6=   Searching and Hypertext

    =Section 2.7=   Copyright Issues

    =Section 2.8=   The Parts of a Book

    =Section 2.9=   The General Theory of Markup (SGML)

    =Section 2.10=  Summary:  Basic Tricks of the Trade


=Part III=  A Very Brief E-Text Style Manual

    =Section 3.1=  Backups and Saving Work

    =Section 3.2=  Compressed Files

    =Section 3.3=  Version Control

    =Section 3.4=  Use of Word Processing Features

    =Section 3.5=  Character Set and Font

    =Section 3.6=  Outlining and Hierarchies

    =Section 3.7=  Text Inclusions

        =Section 3.7.1=  Alternate Fonts

        =Section 3.7.2=  Quotations and Included Blocks of Text

        =Section 3.7.3=  Lists

        =Section 3.7.4=  Cross-References, Hypertext, and Embedding

        =Section 3.7.5=  Editing and Marked Sections

        =Section 3.7.6=  General Style and Conventions

    =Section 3.8=  Esoterica

        =Section 3.8.1=  Inclusions in Languages Other than English

        =Section 3.8.2=  Footnotes, Cross-References, and Bibliographic 
Citations

        =Section 3.8.3=  Formulas and Statistical Text

        =Section 3.8.4=  Verse, Drama, and Liturgy

    =Section 3.9=   Electronic Forms and Tests

    =Section 3.10=  The E-Mail Business Letter

    =Section 3.11=  The Final Rule


                    +     +     +

(end of _Elements of E-Text Style_)