HTML Dialect Rule Reference
Author: Henrik Mikael Kristensen
Date: 11-Aug-2010
Copyright: 2010 - HMK Design
Version: 0.0.8
Introduction
The following rule sets describe how the dialect parser works. The entire parser is built from the rule blocks below. Actions in the code (the parenthesized expressions) are left out for clarity. We start with the datatype rules and work our way up from the low-level rules toward the main parser.
Data Types
block-types
This describes all block types except for path!
block-types: [block! | hash! | list!]
value-types
This rule defines the datatypes that describe values directly, such as numbers, strings and urls. Tags are purposely disallowed.
value-types: [ money! | binary! | number! | date! | time! | tuple! | url! | email! | file! | any-string! | char! | pair! ]
cell-types
This rule defines the allowed datatypes for input parameters for most tags. It should be the same types as allowed in html-gen.
cell-types: [ ['do block-types] | get-word! | [ value-types | block-types | datatype! | word! | lit-word! | path! | lit-path! | refinement! | logic! ] ]
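As a minimal sketch of how cell-types accepts a single parameter, the snippet below runs the rule against a few inputs in a REBOL console. The block-types and value-types definitions here are narrowed stand-ins (an assumption made so the example stays short and portable), and the input values are purely illustrative.

```rebol
REBOL [Title: "cell-types sketch"]

; Stand-ins, narrowed for brevity (assumption, not the full rules)
block-types: [block!]
value-types: [integer! | string! | url!]

cell-types: [
    ['do block-types]
    | get-word!
    | [
        value-types | block-types | datatype! | word! | lit-word!
        | path! | lit-path! | refinement! | logic!
    ]
]

print parse [do [now/date]] cell-types   ; true: the word DO followed by a block
print parse [:total] cell-types          ; true: a get-word
print parse ["plain"] cell-types         ; true: a direct value
print parse [1 2] cell-types             ; false: the rule matches exactly one parameter
```

Because the 'do alternative is listed first, the word do followed by a block is consumed as one parameter instead of being matched as a plain word!.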
href-types
These rules define URL inputs for use in page-rules.
href-types: [ ['do block-types] | get-word! | [word! | url! | string! | path! | refinement!] ]
event-types
These rules describe events allowed when submitting a form for use in form-rules.
event-types: [string!]
doc-types
This rule is used as a word-rule.
doc-types: [ html-2.0-dtd | html-3.2-dtd | html-4.01-strict | html-4.01-transitional | html-4.01-frameset | xhtml-1.0-strict | xhtml-1.0-transitional | xhtml-1.0-frameset | xhtml-1.0-dtd | xhtml-basic-1.0-dtd | xhtml-basic-1.1-dtd | mathml-1.01-dtd | xhtml-mathml-svg-dtd | svg-1.0-dtd | svg-1.1-full-dtd | svg-1.1-basic-dtd | svg-1.1-tiny-dtd ]
Rules
Small Rules
Some base rules that are used as elements in larger rules below.
set-val: [get-word!]
set-class: [some refinement!]
set-opt-class: [set-class | ()]
set-id: [issue!]
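The class and id rules are easiest to see in combination. The sketch below is an illustration only: attrs is a hypothetical name for the 0 2 [set-class | set-id] group that table-rules and tag-rule use, and the refinements and issues are made-up inputs.

```rebol
REBOL [Title: "Small rules sketch"]

set-class: [some refinement!]
set-id:    [issue!]

; Hypothetical helper: up to two attribute groups, in either order
attrs: [0 2 [set-class | set-id]]

print parse [/wide /striped #main] attrs   ; true: one class group, then an id
print parse [#main /wide] attrs            ; true: id first, then a class group
print parse [#a #b #c] attrs               ; false: a third group is left over
```

Consecutive refinements count as a single class group, since set-class uses some.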
verbatim-rules
Values matching these rules are output exactly as they are input (verbatim). They are the first rules in the HTML dialect, as the input is not processed at all.
verbatim-rules: [ value-types | tag! | lit-word! | path! | lit-path! | refinement! | datatype! | logic! ]
eval-rules
These rules are used for the TAG command.
eval-rules: [any ['do block-types | any-type!]]
base-rules
These rules produce various types of common tags and links and are used as base for higher levels of rules.
base-rules: [ '=== word! opt ['opts block! cell-types] cell-types | 'tag into eval-rules | 'end-tag | block-types | 'do block-types | ['newline | 'crlf] ]
link-rules
These are the rules used with the at command to produce links using various input formats.
link-rules: [ 'at [ 2 cell-types any [ 'vars [block! | object! | get-word!] | 'words block! ] ] ]
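A short sketch of inputs that link-rules accepts, assuming a narrowed stand-in for cell-types. The URL, file name and variable block are invented for illustration, and the example only checks syntax; what the dialect does with each argument is not shown here.

```rebol
REBOL [Title: "link-rules sketch"]

; Stand-in for the full cell-types rule (assumption)
cell-types: [string! | url! | file! | block!]

link-rules: [
    'at [
        2 cell-types
        any [
            'vars [block! | object! | get-word!]
            | 'words block!
        ]
    ]
]

print parse [at http://www.rebol.com "REBOL"] link-rules    ; true: two arguments
print parse [at %page.html "Next" vars [id 5]] link-rules   ; true: two arguments plus VARS
print parse [at "only one argument"] link-rules             ; false: the second argument is missing
```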
image-rules
These are the rules for producing image references.
image-rules: [ 'image cell-types ]
table-rules
These are the main rules for building an HTML table. They use several sub-rules, which are described below.
table-rules: [ 'table opt 'debug [0 2 [set-class | set-id]] any [ [ 'format any [row-format-rule block-types] any [ 'rows [ set-val | into table-format-rules | table-format-rules ] ] ] | [ 'rows [ get-word! | into table-row-rules | table-row-rules | into table-block-rules ] ] ] ]
row-format-rule
This rule is used to determine the type of row used for a particular format block.
row-format-rule: [ opt [ 'first | 'even-last | 'odd-last | 'last | 'odd | 'even | 'any ] ]
table-row-rules
These rules define how a single row in a table can be shaped.
table-row-rules: [ some [ 'row any [ ['cell | 'header] any [ 'colspan integer! | 'align word! | 'width integer! opt 'percent | set-class | set-id ] [none! | cell-types] ] ] ]
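To make the row syntax concrete, here is a sketch that checks a small hand-written row block against table-row-rules. The cell-types rule is a narrowed stand-in (an assumption), and the row contents are invented.

```rebol
REBOL [Title: "table-row-rules sketch"]

; Stand-in for the full cell-types rule (assumption)
cell-types: [string! | integer! | block!]
set-class:  [some refinement!]
set-id:     [issue!]

table-row-rules: [
    some [
        'row any [
            ['cell | 'header]
            any [
                'colspan integer! | 'align word!
                | 'width integer! opt 'percent
                | set-class | set-id
            ]
            [none! | cell-types]
        ]
    ]
]

rows: [
    row header "Item" header "Qty"
    row cell "Apples" cell align right 3
    row cell colspan 2 /note "Prices exclude VAT"
]

print parse rows table-row-rules               ; true
print parse [cell "orphan"] table-row-rules    ; false: a cell must follow ROW
```

Each cell or header keyword takes any number of options before its single value.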
table-cell-rule
This table rule is used to generate a table cell, where there are multiple columns per row in the input data, or the input data consists of objects.
table-cell-rule: cell-types
table-row-rule
This table rule is used to generate a table cell, where there is only one column per row in the input data and the input data does not consist of objects.
table-row-rule: cell-types
table-format-rules
These rules are used after the format command and are identical in structure to table-block-rules. However, when blocks of blocks or plain blocks are used as input, the formatting is ignored.
table-format-rules: [ any [object! | into [any table-cell-rule] | table-row-rule] ]
table-block-rules
These rules are used after the rows command without using format first. This means objects are just output cell by cell. The data row is parsed the same way as the formatting row.
table-block-rules: [ any [object! | into [any table-cell-rule] | table-row-rule] ]
tag-rule
This rule is used where a normal HTML tag is wanted. It can be used recursively.
tag-rule: [ [ 'html | 'head | 'title | 'body | 'p | 'strong | 'em | 'b | 'i | 'u | 'tt | 'big | 'small | 'strike | 'del | 'pre | 'ul | 'ol | 'li | 'sup | 'sub | 'samp | 'code | 'blockquote | 'q | 'kbd | 'var | 'cite | 'tr | 'th | 'td | 'table | 'a | 'div | 'span | 'dl | 'dt | 'dd | 'h1 | 'h2 | 'h3 | 'h4 | 'h5 | 'h6 ] 0 2 [set-class | set-id] opt ['id cell-types] [tag-rule | get-word! | cell-types | ()] ]
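The recursion in tag-rule can be sketched with a shortened tag list. In the snippet below, the five-tag list and the narrowed cell-types rule are assumptions made to keep the example small; the structure of the rule is otherwise the one given above.

```rebol
REBOL [Title: "tag-rule sketch"]

; Stand-in for the full cell-types rule (assumption)
cell-types: [string! | integer! | block!]
set-class:  [some refinement!]
set-id:     [issue!]

tag-rule: [
    ['p | 'strong | 'em | 'div | 'span]    ; shortened tag list (assumption)
    0 2 [set-class | set-id]
    opt ['id cell-types]
    [tag-rule | get-word! | cell-types | ()]
]

print parse [p strong "Hello"] tag-rule         ; true: STRONG is nested inside P
print parse [div /box #main "text"] tag-rule    ; true: class and id precede the content
print parse [p em] tag-rule                     ; true: EM ends with the empty () alternative
print parse [blink "x"] tag-rule                ; false: BLINK is not in the tag list
```

The last alternative, (), lets a tag appear without content, which is why the third input is accepted.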
tag-rules
It looks redundant here, but in the source code, this rule collects all tags properly from recursive runs of tag-rule and generates the required HTML code.
tag-rules: tag-rule
loop-rules
These rules produce loops, and allow traversing data blocks either wholly or partially.
loop-rules: [ 'loop integer! block-types opt ['alternate block-types] | 'traverse [block-types | 'do block-types | get-word!] opt ['using [lit-word! | word! | block-types | get-word!]] block-types opt ['alternate block-types] ]
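The following sketch shows a few inputs that satisfy loop-rules. It assumes a narrowed block-types stand-in, and the words items and item are hypothetical names. Only the syntax is checked; the snippet does not run the dialect itself.

```rebol
REBOL [Title: "loop-rules sketch"]

; Stand-in for the full block-types rule (assumption)
block-types: [block!]

loop-rules: [
    'loop integer! block-types opt ['alternate block-types]
    | 'traverse [block-types | 'do block-types | get-word!]
      opt ['using [lit-word! | word! | block-types | get-word!]]
      block-types
      opt ['alternate block-types]
]

print parse [loop 3 [p "again"]] loop-rules                               ; true
print parse [traverse :items using 'item [li item]] loop-rules            ; true
print parse [traverse [1 2 3] [li "x"] alternate [li "y"]] loop-rules     ; true
print parse [loop [p "no count"]] loop-rules                              ; false: LOOP needs an integer
```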
text-format-rules
These rules allow special formatting parsers for text. The rules are meant to be extensible later, and are not really useful now.
text-format-rules: [ 'format [word! function! | 'type word! cell-types | cell-types] ]
form-rules
These rules produce form tags and are considered a higher level of rules. They also manage the form content, either from words or a specific form object.
form-rules: [ 'form cell-types opt [get-word! | ['vars | object!]] opt ['onsubmit event-types] cell-types | 'textarea word! | ['text | 'hidden | 'password] word! | 'checkbox word! | 'radio word! cell-types | 'select word! ['values | 'key-values | ()] cell-types | 'button word! string! | ['submit | 'reset | 'button] string! ]
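As an illustration of the form syntax, the sketch below checks three inputs against form-rules. The cell-types rule is a narrowed stand-in (an assumption), and the field names and strings are invented. Each call matches one alternative of the rule.

```rebol
REBOL [Title: "form-rules sketch"]

; Stand-in for the full cell-types rule (assumption)
cell-types:  [string! | block!]
event-types: [string!]

form-rules: [
    'form cell-types opt [get-word! | ['vars | object!]]
        opt ['onsubmit event-types] cell-types
    | 'textarea word!
    | ['text | 'hidden | 'password] word!
    | 'checkbox word!
    | 'radio word! cell-types
    | 'select word! ['values | 'key-values | ()] cell-types
    | 'button word! string!
    | ['submit | 'reset | 'button] string!
]

print parse [form "signup" onsubmit "return validate()" [text email]] form-rules   ; true
print parse [select country values ["DK" "SE"]] form-rules                         ; true
print parse [checkbox] form-rules                                                  ; false: the field word is missing
```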
page-rules
These rules produce the outer skeleton of the web page by providing the HEAD and BODY sections.
page-rules: [ 'page cell-types any [ ['redirect | 'refresh] href-types integer! | 'favicon href-types | 'charset [string! | word!] | 'description string! | 'robots into [ some [ 'noindex | 'index | 'nofollow | 'follow | 'noarchive | 'nosnippet | 'noodp | 'noydir ] ] | 'css href-types | 'rss href-types string! | 'atom href-types string! | 'script href-types | 'style string! | 'meta ['name | 'http-equiv] 2 cell-types ] block-types ]
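A sketch of a page header that page-rules accepts, assuming narrowed stand-ins for block-types and cell-types. The title, stylesheet name and meta values are invented for illustration.

```rebol
REBOL [Title: "page-rules sketch"]

; Stand-ins for the full rules (assumption)
block-types: [block!]
cell-types:  [string! | block!]

href-types: [
    ['do block-types] | get-word!
    | [word! | url! | string! | path! | refinement!]
]

page-rules: [
    'page cell-types
    any [
        ['redirect | 'refresh] href-types integer!
        | 'favicon href-types
        | 'charset [string! | word!]
        | 'description string!
        | 'robots into [
            some [
                'noindex | 'index | 'nofollow | 'follow
                | 'noarchive | 'nosnippet | 'noodp | 'noydir
            ]
        ]
        | 'css href-types
        | 'rss href-types string!
        | 'atom href-types string!
        | 'script href-types
        | 'style string!
        | 'meta ['name | 'http-equiv] 2 cell-types
    ]
    block-types
]

page: [
    page "Demo"
    charset "utf-8"
    css "style.css"
    robots [noindex nofollow]
    meta name "author" "HMK Design"
    [p "Hello"]
]

print parse page page-rules             ; true
print parse [page "Demo"] page-rules    ; false: the closing body block is missing
```

The options may appear in any order and any number of times, but the body block must come last.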
error-rules
These rules (actually only one rule for now) are used for handling and printing errors generated by the parser during HTML creation. They will be extended later to become more useful.
error-rules: 'errors
word-rules
The word rules handle lists of words that are either dynamic to the parser (lists of words built during parsing) or built into the HTML dialect, such as the word list for doc-types.
word-rules: [ doc-types | set-word! [word! | value-types | block-types] | get-word! | word! ]
all-rules
This rule collects all of the rules above. It is used directly by the parser, and the order of the alternatives shows the order in which the rules are evaluated.
all-rules: [ any [ verbatim-rules | base-rules | link-rules | image-rules | table-rules | tag-rules | loop-rules | text-format-rules | form-rules | page-rules | error-rules | word-rules ] ]