Index: ossp-pkg/sugar/sugar.txt
RCS File: /v/ossp/cvs/ossp-pkg/sugar/Attic/sugar.txt,v
rcsdiff -q -kk '-r1.3' '-r1.4' -u '/v/ossp/cvs/ossp-pkg/sugar/Attic/sugar.txt,v' 2>/dev/null
--- sugar.txt	2000/11/08 22:04:04	1.3
+++ sugar.txt	2000/11/09 10:07:46	1.4
@@ -6,19 +6,28 @@
   Christian Reiber <chrei@en.muc.de>
   
   ++ | Genesis:     | 12-Mar-1999 |
-     | Last Update: | 26-Sep-2000 |
+     | Last Update: | 08-Nov-2000 |
 
   Introduction
   ------------
 
-  Sugar is a markup language and corresponding translator tool for technical
-  documentations that uses mostly invisible markup tags (the so-called
-  //syntactic sugar// in compiler construction folk terminology). The
-  general idea is that the markup text looks already like the textual output
-  of the translator phase, that is the Sugar source can be already treated
-  as its text format version. Additionally the Sugar markup language is
-  considered intuitive enough to be recognized easily, so writing technical
-  documentation is mainly just a matter of performing a brain dump.
+  Sugar is a markup language and corresponding translator tool for
+  writing technical documentation that uses mostly invisible markup
+  tags (the so-called //syntactic sugar// in compiler construction folk
+  terminology).
+  
+  The general idea is that the markup text already looks like the
+  textual output of the translator phase, that is, the Sugar source
+  can already be treated as its text output format ("ASCII WYSIWYG").
+  Additionally, the Sugar markup language is considered intuitive enough
+  to be recognized easily, so writing technical documentation is mainly
+  just a matter of performing a brain dump.
+
+  So Sugar's syntactic principle is "keep it simple'n'stupid" (KISS)
+  but still powerful enough to allow one to produce high-quality
+  output. Sugar's goal is not to provide all features of a full-featured
+  documentation system. Instead it provides only a few markup concepts,
+  but those are stretched to a maximum.
 
   Sugar Grammar
   -------------
@@ -34,9 +43,10 @@
      | <1d-tag>       | ::= | "!!##!!" \| "!!\|\!!|" \| "!!``!!" \| "!!''!!" \| ... |
      | <2d-tag>       | ::= | "!!**!!" \| ...                    |
 
-  where <regular-block> is defined visually as a rectangular block of
-  continued text inside the document, that is a paragraph of text (without
-  any blank lines) where each line starts at the same indentation position.
+  where <regular-block> is defined visually as a rectangular block
+  of continued text inside the document, that is, a paragraph of
+  text (without any blank lines) where each line starts at the same
+  indentation position.
 
   Markup Language
   ---------------
@@ -46,10 +56,10 @@
   
   o ..Visual Formatting:..
 
-    For visual formatting of text the following <1d-tag> exists. They can be
-    used either inlined in a paragraph by using them twice (to delimit begin
-    and end) or for marking up a whole block by using them in marched-out
-    way (to delimit indented block).
+    For visual formatting of text the following <1d-tag> forms exist. They
+    can be used either inline in a paragraph by using them twice (to
+    delimit begin and end) or for marking up a whole block by placing them
+    in an outdented position (to delimit the indented block).
 
     ++ | tag      | formatting       | application     |
        | !!__!!   | underline        | inline, block   |
@@ -73,10 +83,10 @@
 
   o ..Links and References:..
 
-    For %%referencing%% textual locations (both document internal and to
+    For referencing textual locations (both document internal and to
     external documents), links can be specified.
 
-    ++ | construct  
+    ++ | construct | description |
        | !!->!![//text//]!!(!!//scheme//!!:!!//path//!!)!! |
          external hyperlink via URL |
        | !!->!!//text//!!(!!//ref//!!)!! |
@@ -100,9 +110,9 @@
 
   o ..Headers:..
   
-    Up to four levels of headlines can be marked up by placing the following
-    character sequences (or any number of concatenated repetitions of them)
-    at the end of a text block:
+    Up to four levels of headlines can be marked up by placing the
+    following character sequences (or any number of concatenated
+    repetitions of them) at the end of a text block:
 
     ++ | sequence | level |
        | !!==!!   | I.    |
@@ -117,10 +127,10 @@
 
   o ..List Environments:..
   
-    Three types of list environments can be used. They are identified by the
-    first non-blank word in the first line of each list item. For ordered
-    lists the start position is selectable by specifying an explicit digit
-    instead of the generic item character.
+    Three types of list environments can be used. They are identified
+    by the first non-blank word in the first line of each list item.
+    For ordered lists the start position is selectable by specifying an
+    explicit digit instead of the generic item character.
     
     ++ | construct      | alternatives                 | type       |
        | !!-!!          | !!o!!, !!*!!                 | unordered  |
@@ -138,22 +148,32 @@
 
   o ..Table Environment:..
 
-    A generic table environment can be used for any type of data which has
-    to be rendered in a tabular layout. A table is a <2d-block> starting
-    with a !!++!! tag. The contents of the <2d-block> consists of a
-    2-dimensional table specified by cells. The table cells are indicated by
-    ''|'' characters. The number of columns is indicated by the first table
-    row. This first row can be either a complete (all cells are specified)
-    and regular row, or (in case the first regular row is not a complete
-    one, that is has multi-column cells) it can be an empty row (all cells
-    are specified for indication but are left blank).
+    A generic table environment is provided for any type of data which
+    has to be rendered in a tabular layout. A table is a <2d-block>
+    starting with a !!++!! tag. 
+    
+    The contents of the <2d-block> consist of a 2-dimensional table
+    specified by cells. The table cells are separated by ''|''
+    characters. Every row has to start with a ''|'' at the same
+    horizontal position. The number of columns is indicated by the
+    first table row. This first row can be either a complete (all cells
+    are specified) and regular (its contents are used) row, or (in case
+    the first //regular// row is not a complete one, that is, it has
+    multi-column cell spans) the first row can be an empty row (all
+    cells are specified for indication but are left blank).
+
+    The ''|'' marks can be placed arbitrarily in each row, but if
+    multi-column spans exist, the surrounding ''|'' marks have to be
+    placed at exactly the same horizontal character positions as in
+    the first table row (otherwise the multi-column cells are ambiguous).
+    Empty rows can be indicated by using just the starting ''|'' mark.
 
     Example:
 
     !! 
        ++ |     |        |      |
           | foo | !!bar  | quux |
-          | 
+          |
           | foo | baz 
                   bar    | quux |
           | bazfoo
@@ -162,14 +182,16 @@
                 | dsdjks
                          | foo  |
           | dsdsds              |
+
+____
     
-  o Special Formatting:
+  o ..Special Formatting:..
 
     ''  quotemeta
     ##  command         (charblock until EOL)
     ``  shell command   (charblock until EOL)
 
-  o Escaping and Special Characters: 
+  o ..Escaping and Special Characters:..
 
     --   em-dash      (ger. "Gedankenstrich")
     \_   strong blank (prevents line break as in HTML's &nbsp;)
@@ -191,197 +213,26 @@
     ##img ...
     ##<formatierung> [range]
 
-_______________________________________________________________________________
-
  
-1. Scanner erkennt die Intentation, strippt sie
-   weg, berechnet aber durch sie die "schliessenden
-   Klammern" zu den 2d-tags.
-
-2. Scanner erkennt auch die Unterschiede zwischen
-   1d und 2d tags, da der Parser ja keinerlei
-   Unterscheidung treffen kann (spaces/indent nicht mehr da)
-
-3. Scanner hat einen Look-Ahead von 1 Zeile plus
-   ihrem Indent
-
-4. Das Parsen von Headern "(====)" geht einfach:
-   Der Scanner erkennt nur das "^========" und
-   ein Baumtransformator haengt spaeter 
-   die Sohn-Sequenz "<paragraph> x ..... y <header>"
-   um in "<parapraph> <header> x ... y", d.h.
-   der transformator geht bis zum letzten Paraphraph
-   Knoten zurueck.
-
-    If the "text" on hyperlinks is missing in links, the reference is printed
-    instead. For internal links the text is chapter and pagenumber
-    (except for HTML, there exists real hyperlinks).
-
-Stichworte:
-
-Whatever   | Irgendwas
----------- | -----------------------------
-Brain Dump | VHIT (Vom Hirn ins Terminal) 
-Blabla     | ASCII WYSIWYG
-
-Design-Grundsaetze
-------------------
-1. KISS bei der Sprache (Beschreibung geht auf eine Seite und ist ISO-Latin-1!)
-2. KISS bei der Implementierung (Code-Groesse <= 80KB)
-3. Wir implementieren nur das, was wir _WIRKLICH_ brauchen.
-4. Sugar ist wie Unix: Wenige Konzepte existieren und 
-   werden konsequent durchgezogen
-5. Sugar hat *keine* GUI, sondern ist ein Filter!
-   Beispielaufruf:
-   $ cat test.txt | sugar --html -otest.html
-6. Sugar ist stand-alone (bis auf Postscript),
-   man braucht also nicht 1001 Tools bei der Installation
-7. Release early, release often (Eric S. Raymond)
-8. Jedes Markup kann immer eindeutig formuliert werden (=non-magic),
-   nur sieht es dann eventuell nicht so schoen aus.
-   Wenn man sich an bestimmte Regeln haelt, kann man
-   im Magic Mode ASCII-Aesthetik pur nutzen.
-   Non-Magic ist immer nutzbar und aktiviert, Magic-Mode per default an, aber
-   kann abgeschalten werden (per -xx und/oder inline tag) Idee: -xx im
-   Dokument direkt eingeben ala vi/less
-
-Was Sugar nicht ist
--------------------
-1. Sugar ist _keine_ Textverarbeitung oder ein DTP-Tool
-2. Sugar ist keine Markup-Sprache (der Text ist bereits das Endprodukt)
-3. Sugars Brother is more/less and not nroff (i.e. Sugar is fast!)
-
-Anwendungsfeld
---------------
-1. Technische Dokumentation fuer mehrere Darstellungsplatformen:
-   Plain ASCII (= Sugar Quelle), roff/-man (Unix), HTML (= Online), PS (= Print)
-2. Brain Dump!
-
-Optionale Zusatzfeatures
-------------------------
-- ToC: Automatische Generierung 
-- Numerierung von Headern
-- Index
-- Aufrufen von Makroprozessor: m4
-
-Tabellen:
----------
-   o Tabellen sind Bloecke und werden mit ++ eingeleitet wie
-     andere Bloecke auch, d.h. Ende ist bei Ausrueckung oder
-     selber Level.
-   o Jede Tabellenzeile faengt mit einem | an und immer in der selben Spalte.
-   o Die |'s der ersten Zeile geben die Gesamtanzahl und die Normposition
-     der Spalten an.
-   o Besteht die erste Zeile nur aus |'s (und keinem Inhalt), dann
-     ist sie eine _reine_ Normungszeile und erzeugt auch keine Leerzeile.
-     Ansonsten (Zeile 2, ...) kann man so selbstverstanelich eine
-     Leerreihe erzeigen.
-   o Spaltentrennungs-| koennen an belieber Stelle stehen, wenn
-     genuegend da sind. 
-   o Folgespalten sind dadurch gekennzeichnet, dasz ihr | eingerueckt
-     erscheint.
-   o Multicolums liegen vor, wenn weniger |'s auftreten, als die
-     Normungszeile vorgibt.  Die Erkennung der Span's erfolgt dabei ueber die
-     Position der |'s, d.h. sie muessen die |'s der Normunszeile matchen.
-     Zusaetzlich kann die Normungszeile beliebig oft wiederhlt werden.
-     Aber dabei darf sich nur die Position der |'s aendern, aber
-     nicht die Anzahl (klar!).
-   o Leerzeilen bestehen aus nur einem |' am Anfang und sonst nichts.
-   o Normungszielen haben mind.(!) 2 |'s.
-   o Leerzeilen erzeugen im Output soviele |'s wie die Normungszeile
-     vorgibt. Fuer andere Layouting-Dinge muss man z.B. ``| \_'' schreiben.
-   o In einer Tabelle koennen alle Zeichenformatierungen genutzt werden.
-   o In der Normungszeile kann mit den Zeichenformatierung-Tags
-     die Formatierung der Tabellenspalten angegeben werden!
-   
-3. Block-Konzept
-
-   Es gibt zwei Blockkonzepte: 
-     - character block (eindimensional) und 
-     - line block (zweidimensional).
-
-   Der //character block// wird durch das Tag eingeleitet und wieder beendet. Das
-   Paragraph Ende beendet in jedem Fall den character block.
-
-   Der //line block// beginnt mit dem Tag __ausgerückt__, wobei davor keine
-   Leerzeile stehen muß (ein \n und ggf. \s davor reicht). Er enthält ganze
-   Zeilen und zwar solange, wie Text in der Zeile mindestens zwei Leerzeichen
-   weiter rechts beginnt als das einleitende Tag. Achtung: Tags stellen selbst
-   __nicht__ den Zeilenanfang dar! Damit kann ich also line blocks schachteln.
-   (Anders gesagt: Es geht nicht um den linken Rand der Textdatein, sondern um
-   den linken Rand des übergeordneten Line Blocks.)
-
-   Automatischer reflow durch den Editor ist bei character blocks **kein**
-   Problem, da das Tag keine positionsabhängige Bedeutung hat (daher wurde
-   auch verworfen, daß ein Tag am Zeilenanfang, aber nicht ausgerückt, am
-   Zeilenende beendet wird). Das Start-Tag beim line block wird vom Editor
-   nicht versetzt (wenn er was taugt).
-
-   Möglicherweise kann für bestimmte Tags das Ende des char blocks auch das
-   Zeilenende (nicht das Para. Ende) sein. Gedacht ist an Kommandos:
-
-   Gehen sich nach http://laber.lall 
-   Das ist ## eine blöde Zeile und ich will daß ##das## unterstrichen ist
-   ''##das ist ungut##
-     ##das ist intuitiver, bedeutet aber Kommandoende=Zeilenende
-   
-   Das wäre dann eine Eigenschaft des Tags, d.h. es verhält sich dann
-   //immer// so (und nicht mal so und mal anders).
-
-   Beispiele:
-
-   1. ''Dies ist ein Beispiel für einen Text, __in dem der zweite
-        Halbsatz unterstrichen wird__, obwohl er sich über eine
-        Zeilengrenze erstreckt.
-
-   o. ''__In diesem Fall wird der line block unterstrichen. Das
-          geht solange, bis der Text wieder ausgerückt wird.
-
-          Auch Leerzeilen stellen da kein Hindernis dar.
-
-        Diese Zeile beendet den Line Block.
-
-   o. ''Ein Sonderfall: __Dieser Text hat kein Ende-1d-Tag.
-        Er wird dann durch das Paragraph-Ende beendet.
-
-        Ab hier also keine Unterstreichung mehr.
-
-   Das haben wir gemacht, weil sonst bei vergessenen Endetags
-   das Restdokument fehlformatiert wird.
-        
-o  Native-Output-Stuff
-   xxxx
-
-   ``jdjlasdjajlad``
-   skd asdk s
-   dsö ksaölkdaös##
-
-   dfkdjsdal
-     html
-
-   xxxx
-
-   ##endif
-   
-   xxxx
-
-o  Comments
-   ##//
-   ##/*
-   ##*/
-
-5. Inline-Images
-   - Source ist immer Bitmap-Grafik im GIF Format!
-     (Fuer ASCII: gifscii, Fuer HTML: Direkt, Fuer PS: gif2ps)
-
-     ##img xx.gif size=jsjs s=xx
-
-- UNBEDINGT Unicode und UTF-8 unterstutezen von anfang an!
-
-Idea for homogenous tags:
-- any XX tags can be repeated multiple times, ie XXXXX is valid also
-- any begin XX tag at the end of a paragraph wraps around its scope, ie
-  it is applied to the whole paragraph as it would stand at the start
-  of the block (marged out?).
-Results:
-- headlines are marked equally with blocks
+  Sugar Output Formatting
+  -----------------------
+
+  The Sugar transformation tool parses a Sugar source text, transforms
+  it into an internal abstract syntax tree and finally applies a
+  particular output formatting module to it in order to transform the
+  abstract syntax tree into the target markup language. The result is
+  then either already an end-user document (HTML, Text, etc.) or intended
+  for post-processing by external programs (LaTeX, PDF, etc.).
+
+  The following outputs are supported:
+
+  ++ | sugar output | post-processor(s) | final output         |
+     | Text         | -                 | Text                 |
+     | HTML         | -                 | HTML                 |
+     | Roff         | nroff             | Text                 |
+     | Lout         | lout              | PostScript           |
+     | PDF          | pdflib            | PDF                  |
+     | LaTeX        | latex, dvips      | DVI, PostScript, PDF |
+     | XML          | docbook           | ...                  |
+     | POD          | pod2xxx           | ...                  |
+
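
Note on the table rules added in this revision: the column count comes from the first row, rows with fewer ''|'' marks are resolved into spans by position, and a lone ''|'' marks an empty row. The Python sketch below illustrates one way these rules could be applied. It is not taken from the Sugar implementation; the function names (bar_positions, parse_row, parse_table) are hypothetical, and it assumes every row fits on a single line that is closed by a ''|'', with no escaped ''\|'' marks and no continuation lines.

```python
from typing import List, Tuple

Cell = Tuple[str, int]  # (cell text, number of columns spanned)


def bar_positions(row: str) -> List[int]:
    """Return the character positions of every '|' mark in a row."""
    return [i for i, ch in enumerate(row) if ch == "|"]


def parse_row(row: str, norm: List[int]) -> List[Cell]:
    """Split one single-line table row into (text, colspan) cells.

    `norm` holds the '|' positions of the first table row.
    """
    bars = bar_positions(row)
    if not bars or bars[0] != norm[0]:
        raise ValueError("row must start with '|' at the table's left edge")
    ncols = len(norm) - 1
    if len(bars) == 1:
        # Only the starting '|' mark: an empty row.
        return [("", 1)] * ncols
    cells = [row[a + 1:b].strip() for a, b in zip(bars, bars[1:])]
    if len(bars) == len(norm):
        # Complete row: marks may sit anywhere, cells map one to one.
        return [(text, 1) for text in cells]
    # Fewer marks than the first row: spans are resolved by position,
    # so every mark has to line up with a mark of the first row.
    try:
        idx = [norm.index(p) for p in bars]
    except ValueError:
        raise ValueError("ambiguous multi-column span: " + row) from None
    return [(text, j - i) for text, i, j in zip(cells, idx, idx[1:])]


def parse_table(rows: List[str]) -> List[List[Cell]]:
    """Parse the rows of a table block (the text after the '++' tag)."""
    norm = bar_positions(rows[0])
    if len(norm) < 2:
        raise ValueError("first row must specify all cells")
    first = parse_row(rows[0], norm)
    # A first row with only blank cells just fixes the column positions
    # and is treated here as producing no output row.
    blank_first = all(text == "" for text, _ in first)
    body = rows[1:] if blank_first else rows
    return [parse_row(r, norm) for r in body]
```

Calling parse_table with the row strings of a table block returns one list of (text, colspan) pairs per output row. Multi-line cells, as used in the example table of the specification, would need an additional pass that merges continuation lines before this step.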