HTML Dialect

Author: Henrik Mikael Kristensen
Date: #date
Copyright: 2008 - HMK Design
Version: 0.0.5

This is alpha software under development. Features may change drastically between versions.

=options no-nums

===Introduction

The purpose of the HTML dialect is to produce HTML code using a REBOL dialect. There are several reasons for this:

* Dialect code takes up much less space than HTML and is simpler and easier to write.

* It fits both static and dynamic HTML content generation.

* It makes dynamic content easier to create.

* It provides standards-compliant HTML, no matter the doctype, using the same dialect code.

* It is REBOL code all the way. There is no need to intermix REBOL code with HTML.

The HTML dialect is currently at version 0.0.5 and is released under the BSD license.

===Installation

To use the HTML dialect, include the html.r file in your code (for example with do %html.r) and you're ready to go. You will know it is loaded when the ctx-html context exists in memory.

===Usage

The dialect is primarily meant for use with a webserver, such as Cheyenne, so an output buffer called out-buffer (a plain text string) is available. When you want to output its content, you can do so in an entirely normal fashion: print it, save it, or whatever else you need.

You generate HTML with the html-gen or the output-html function. This is similar to the layout function of VID in REBOL/View, if you've tried that.

The html-gen function parses the dialect and fills out-buffer with HTML code. Example:

    >> html-gen [=== test ["my code"]]
    == true
    >> out-buffer
    == "<test>my code</test>"

Every time you add a word, string, element or other piece of dialect code, it is run through html-gen and appended to the end of out-buffer without spacing:

    >> html-gen ["more code"]
    == true
    >> out-buffer
    == "<test>my code</test>more code"

html-gen accepts a variety of datatypes and can therefore be used to generate small bits of HTML code, output a single char, or even output nothing if the input is none!.
html-gen calls itself extensively inside its own parser for exactly this purpose, which keeps the code small and allows dialect recursion.

Example:

    html-gen "Text"

Will append the following to out-buffer:

    Text

This code:

    html-gen 'a

Will append the following to out-buffer:

    a

This code:

    html-gen [tag [shout till noon]]

Will append the following to out-buffer:

    <shout till noon>

The output-html function wraps html-gen. It first clears out-buffer, then generates the HTML, copies out-buffer, clears out-buffer again and finally returns the copy. This function is mostly useful if you want to see the output returned directly to the console, or if you generate entire pages without appending content to out-buffer manually. Example:

    >> output-html [=== test ["my code"]]
    == "<test>my code</test>"
    >> output-html ["more code"]
    == "more code"

Both html-gen and output-html accept none!, string!, tag!, file!, url!, number!, time!, date!, get-word!, word! and block! as input.

If you want to generate multiple pages in sequence with html-gen, use clear out-buffer between pages, or use the output-html function directly. For the examples below we will use output-html for simplicity.

---Full Page

Example for generating one full page:

    output-html [page "My Page Title" ["This is my Webpage"]]

That line produces the following HTML code (indentation is used here for clarity; it is not present in the actual output):

    My Page Title
    This is my Webpage

If you want to include a style sheet in the basic page, add it as a parameter to the page command:

    output-html [page "My Page Title" css style.css ["This is my Webpage"]]

Produces:

    My Page Title
    This is my Webpage

---Enclosing Tags and Recursion

The HTML dialect supports nesting tags to as many levels as you want. You can, for example, create enclosing tags with the === command, with tag and end-tag, or, for some tags, directly with a word:

    output-html [=== p [=== pre [=== tt [=== a [Hello] ]]]]
    == "

Hello

"

    output-html [
        tag [p] tag [pre] tag [tt] tag [a] "Hello"
        end-tag end-tag end-tag end-tag
    ]
    == "

Hello

"

    output-html [p [pre [tt [a Hello]]]]
    == "

Hello

"

The HTML dialect tracks tags inserted with the tag command, but not tags created with the === command or directly with words. When you use tag, any tag that does not exist in the single-tags block inside the dialect context is tracked internally. Whenever you use the end-tag command, the closing tag for the most recently opened tracked tag is appended to out-buffer.

For the following examples, assume that the br, hr and img tags are in the single-tags block:

    output-html [tag [p] tag [hr] end-tag]
    == "


"

If you use end-tag when there are no more tracked tags left to close, an error is raised:

    output-html [tag [p] tag [hr] end-tag end-tag]
    ** Script Error: Out of range or past end
    ** Where: html-gen
    ** Near: out close-tag either block? last

Tags are tracked across multiple uses of html-gen, so if you don't close a tag correctly in one use of html-gen, subsequent uses will also contain errors. Furthermore, the HTML dialect can't detect whether your complete page contains too few end-tags. The only way to find out is to check whether ctx-html/end-tags is empty at the end of page generation.

---HTML Output

Some things to note about the output:

* html-gen never adds spaces between uses, so any spaces that need to be there must be added by you.

* The output is always a string.

* The output may not be very readable, as html-gen does not add newlines or indentation to the HTML code.

---CSS Styles

Any tag can be followed by an optional issue!, which names the CSS class used for that tag. If the issue! is omitted, no class is set for the tag. Example:

    output-html [div #headline "Hello"]
    == {
Hello
}

---HTML Doctypes

The HTML dialect supports most available versions of the HTML specifications, though it does not yet adhere to them 100%. When you include a doctype in your dialect code, the matching !DOCTYPE tag is automatically output at the top of the webpage. Single (empty) tags in XHTML 1.0 and later are always closed with a trailing /.

The following doctypes are supported:

    html-2.0-dtd
    html-3.2-dtd
    html-4.01-strict
    html-4.01-transitional
    html-4.01-frameset
    xhtml-1.0-strict
    xhtml-1.0-transitional
    xhtml-1.0-frameset
    xhtml-1.0-dtd
    xhtml-basic-1.0-dtd
    xhtml-basic-1.1-dtd
    mathml-1.01-dtd
    xhtml-mathml-svg-dtd
    svg-1.0-dtd
    svg-1.1-full-dtd
    svg-1.1-basic-dtd
    svg-1.1-tiny-dtd

They are all stored in the doc-types block, which is used by the HTML dialect. To switch the HTML version, put the doctype word before any other code:

    output-html [html-4.01-strict tag [br]]
    == {
}

    output-html [xhtml-1.0-strict tag [br]]
    == {
}

===Dynamic Content

A get-word! in dialect code is automatically replaced by the value of its word, looked up in the global context or in whatever context the dialect code block is bound to at HTML generation time. This rule applies everywhere in the dialect.

    a: "my string"
    output-html [p [:a]]
    == "

my string

"

===Forms

Creating forms with the HTML dialect is straightforward: you add form elements and a submit button, and when the form is submitted, the server receives the field values via POST.

The HTML dialect imposes some intentional limitations on forms to keep the form system simple:

* All HTML dialect forms are sent via the POST method.

* There are no per-field settings yet, such as maximum field size.

* There is no scheme yet for form validation, neither server-side nor client-side.

A form is created by stating the form action, name and optional default input values (through an object), followed by the form code enclosed in a block. Each field element states its associated name as a word! value.

Using plain words, as in the example below, is fine if you don't want default data in the form and don't expect to revisit the form in a validation process, or if you are creating a form for a completely static HTML page. Example:

    form submit.rsp [
        div #label "Name" [field name]
        div #label "Address" [field address]
        div [submit "Submit Form"]
    ]

Produces:
Name
Address
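The plain-word method above can be pictured with a minimal sketch in plain REBOL. This is not the html.r implementation: form-tag and field-tag are hypothetical helpers, and the markup html.r actually emits (labels, div wrappers, attribute order) may differ. The sketch only illustrates two rules stated above: forms always use POST, and each field word becomes the name of its input element.

```rebol
REBOL [Title: "Sketch: form fields named after dialect words"]

; Hypothetical helper (not part of html.r): a text input whose
; name attribute is simply the dialect word.
field-tag: func [field [word!]] [
    rejoin [{<input type="text" name="} field {">}]
]

; Hypothetical helper (not part of html.r): a form that always posts.
form-tag: func [action [file! url!] body [string!]] [
    rejoin [{<form action="} action {" method="post">} body "</form>"]
]

html: form-tag %submit.rsp rejoin [field-tag 'name field-tag 'address]
print html
```

Because the name attributes come straight from the words, a standard browser submission would post the values under the keys name and address.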
The names used for each field in the form are the same as those used in the dialect.

---Using the Form Object

If you use a get-word! in the form specification, you can attach an external object to the form. The object must already exist and hold the required content. This is a better method than the one above if you want to recreate the form with its existing data, or if you want default data in the form.

The HTML dialect's job ends when the HTML code is rendered, but this approach lets you tie form data to a fixed object whose values are simply updated when form data is submitted to the server. You must write that update step yourself. The added benefit is that the HTML dialect can refill the form automatically with the values stored in the object when the page needs to be rendered again.

Internally, the form object is stored in the ctx-html context as form-object, which is none by default.

Example, showing a pre-existing form object that has the same words as those used in the form:

    form-data: make object! [
        name: "Luke Lakeswimmer"
        address: "Tatooine Rebol Base"
    ]

In the form dialect code, we then include :form-data:

    form submit.rsp :form-data [
        div #label "Name" [field name]
        div #label "Address" [field address]
        div [submit "Submit Form"]
    ]

Produces:
Name
Address
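The refill behaviour can be sketched the same way. Again, refill-field is a hypothetical helper, not code from html.r; it assumes a plain text field and does not HTML-escape values, which real code should do. It looks each field word up in the object with in and get, and falls back to an empty value when there is no object or no matching word.

```rebol
REBOL [Title: "Sketch: refilling fields from a form object"]

form-data: make object! [
    name: "Luke Lakeswimmer"
    address: "Tatooine Rebol Base"
]

; Hypothetical helper (not part of html.r). If obj has a word named
; after the field, its value becomes the value attribute; otherwise
; the field is rendered empty. Values are NOT escaped in this sketch.
refill-field: func [obj [object! none!] field [word!] /local value] [
    value: either all [obj in obj field] [get in obj field] [""]
    rejoin [{<input type="text" name="} field {" value="} value {">}]
]

print refill-field form-data 'name
print refill-field none 'address
```

Passing none instead of an object corresponds to the plain-word method, where the fields start out empty.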
You can also create the form-data object inline in the dialect code or as a block of key/value pairs.

;===Highlighting

;It's possible to highlight words automatically through a small sub-dialect. This dialect wraps specific words or sentences that are input in their entirety to html-gen. This is useful if you, for example, have developer documentation (like this document) and want to highlight all commands in the text to make them stand out from the remaining text. The highlighting method encloses the given text in a span tag that refers to a class called highlight. Example:

;    output-html [highlight add "the" "The brown fox jumps over the dog."]
;    == "The brown fox jumps over the dog."

;The highlighting continues, even across multiple uses of html-gen or output-html, until the highlighting setup list is changed or cleared.

;The default class is highlight, but you can define your own:

;    output-html [highlight add "the" using hl "The brown fox jumps over the dog."]
;    == "The brown fox jumps over the dog."

;Remember that this is a single-pass operation, so if the words requested for highlighting are spread over several calls to html-gen, the highlighting will not occur. Also, when multiple highlights are nested inside each other, only the outer highlight is used.

;You can define multiple highlights with the add subcommand, since all highlight setups are stored in a list. There are therefore also remove and clear subcommands to remove entries from and clear the highlight list.

;Examples:

;    highlight add "output-html"
;    highlight add "html-gen"
;    ... display text here ...
;    highlight remove
;    highlight add
;    ... display text here ...
;    highlight clear
;    highlight add

===Dialect Rule Reference

The following rule sets describe how the dialect parser works. The entire parser is built from the rule blocks below. Actions in the code (the parenthesized parts) are left out for clarity.
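To show what such omitted actions look like, here is a small, self-contained parse sketch. It is not taken from html.r: mini-out, mini-open, mini-single and mini-rules are hypothetical names. The parenthesized blocks are the actions; they mimic the tag/end-tag tracking described earlier by keeping the names of unclosed tags on a block used as a stack, while tags listed in mini-single are never pushed.

```rebol
REBOL [Title: "Sketch: parse rules with actions"]

mini-out: copy ""            ; output buffer, playing the role of out-buffer
mini-open: copy []           ; stack of tracked tags that still need closing
mini-single: [br hr img]     ; single tags are never tracked

mini-rules: [
    any [
        'tag set tag-name word! (
            append mini-out rejoin ["<" tag-name ">"]
            ; push only tags that need an end tag
            if not find mini-single tag-name [append mini-open tag-name]
        )
        | 'end-tag (
            ; close the most recently opened tracked tag, then pop it
            append mini-out rejoin ["</" last mini-open ">"]
            remove back tail mini-open
        )
        | set str string! (append mini-out str)
    ]
]

parse [tag p "Hello" tag hr end-tag] mini-rules
print mini-out
```

Here end-tag closes p rather than hr, because hr was never pushed. Under REBOL 2, calling last on an empty block should raise an "Out of range or past end" error, which is consistent with the message shown earlier for an unbalanced end-tag.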
We start with the types and work our way up from the low-level rules toward the main parser.

---block-types

This rule matches all block types except path!.

    block-types: [block! | hash! | list!]

---value-types

This rule defines the datatypes that describe values directly, such as numbers, strings and URLs. Tags are purposely disallowed.

    value-types: [
        money! | binary! | number! | date! | time! | tuple! |
        url! | email! | file! | any-string! | char! | pair!
    ]

---cell-types

This rule defines the allowed datatypes for input parameters of most tags. It should match the types accepted by html-gen.

    cell-types: [
        ['do block-types] |
        [
            value-types | block-types | datatype! | word! | get-word! |
            lit-word! | path! | lit-path! | refinement! | logic!
        ]
    ]

---href-types

These rules define URL inputs for use in page-rules.

    href-types: [['do block-types] | [word! | url! | string! | path! | refinement!]]

---doc-types

This rule is used by word-rules.

    doc-types: [
        html-2.0-dtd | html-3.2-dtd | html-4.01-strict |
        html-4.01-transitional | html-4.01-frameset |
        xhtml-1.0-strict | xhtml-1.0-transitional | xhtml-1.0-frameset |
        xhtml-1.0-dtd | xhtml-basic-1.0-dtd | xhtml-basic-1.1-dtd |
        mathml-1.01-dtd | xhtml-mathml-svg-dtd | svg-1.0-dtd |
        svg-1.1-full-dtd | svg-1.1-basic-dtd | svg-1.1-tiny-dtd
    ]

---verbatim-rules

Values matching these rules are output exactly as they are input (verbatim). They are tried first in the HTML dialect, since such input is not processed at all.

    verbatim-rules: [
        value-types | tag! | lit-word! | path! | lit-path! |
        refinement! | datatype! | logic!
    ]

---eval-rules

These rules are used for the tag command.

    eval-rules: [any ['do block-types | any-type!]]

---base-rules

These rules produce various types of common tags and links and serve as the base for higher-level rules.

    base-rules: [
        '=== word!
            opt ['opts block-types] cell-types |
        'tag into eval-rules |
        'end-tag |
        'do block-types |
        block-types
    ]

---link-rules

These are the rules used with the at command to produce links from various input formats.

    link-rules: [
        'at [
            'page word! |
            2 cell-types any [
                'vars [block! | object! | get-word!] |
                'words block!
            ]
        ]
    ]

---image-rules

These are the rules for producing image links.

    image-rules: [
        'image cell-types
    ]

---table-rules

These are the main rules for building an HTML table. They use several sub-rules, which are described below.

    table-rules: [
        'table opt issue! any [
            ['rows [get-word! | into table-row-rules | into table-block-rules]] |
            ['format block! 'rows [get-word! | into table-format-rules]]
        ]
    ]

---table-row-rules

These rules define how a single row in a table can be shaped.

    table-row-rules: [
        any [
            'row any [
                ['cell | 'header] any [
                    'colspan integer! |
                    'align word! |
                    'width integer! opt 'percent |
                    'class word!
                ]
                [none! | cell-types]
            ]
        ]
    ]

---table-cell-rule

This table rule is used to generate a table cell where the input data has multiple columns per row, or where the input data consists of objects.

    table-cell-rule: cell-types

---table-row-rule

This table rule is used to generate a table cell where the input data has only one column per row and does not consist of objects.

    table-row-rule: cell-types

---table-format-rules

These rules are used after the format command and are identical in structure to table-block-rules. However, when blocks of blocks or plain blocks are used as input, the formatting is ignored.

    table-format-rules: [
        any [object! | into [any table-cell-rule] | table-row-rule]
    ]

---table-block-rules

These rules are used after the rows command when format has not been used first. This means objects are simply output cell by cell. The data row is parsed the same way as the formatting row.

    table-block-rules: [
        any [object!
            | into [any table-cell-rule] | table-row-rule]
    ]

---tag-rule

This rule is used wherever a normal HTML tag is wanted. It can be used recursively.

    tag-rule: [
        [
            'html | 'head | 'title | 'body | 'p | 'strong | 'em | 'b | 'i |
            'u | 'tt | 'big | 'small | 'strike | 'del | 'pre | 'ul | 'il |
            'li | 'sup | 'sub | 'samp | 'code | 'blockquote | 'q | 'kbd |
            'var | 'cite | 'tr | 'th | 'td | 'table | 'a | 'div | 'span |
            'dl | 'dt | 'dd | 'h1 | 'h2 | 'h3 | 'h4 | 'h5 | 'h6
        ]
        opt issue! opt 'id word!
        [tag-rule | get-word! | cell-types | ()]
    ]

---tag-rules

This looks redundant here, but in the source code this rule collects all tags from recursive runs of tag-rule and generates the required HTML code.

    tag-rules: tag-rule

;---style-rules
;
;These rules manage style representations of larger blocks of code.
;
;    style-rules: [
;        'style word! block-types
;    ]

---loop-rules

These rules produce loops and allow traversing data blocks either wholly or partially.

    loop-rules: [
        'loop integer! block-types opt ['alternate block-types] |
        'traverse [block-types | get-word!]
            opt ['using [word! | 'lit-word | get-word! | block-types]]
            block-types opt ['alternate block-types]
    ]

---format-rules

These rules allow special formatting parsers for text. They are meant to become extensible later and are not very useful yet.

    format-rules: ['format word! cell-types]

---form-rules

These rules produce form tags and are considered higher-level rules. They also manage the form content, taken either from words or from a specific form object.

    form-rules: [
        'form cell-types opt [get-word! | ['vars | object!]] cell-types |
        'textarea word! |
        ['field | 'button | 'hidden | 'password] word! |
        'checkbox word! |
        'radio word! |
        'select word! opt ['values | 'key-values] cell-types |
        ['submit | 'reset] string!
    ]

---page-rules

These rules produce the outer skeleton of the webpage by providing the HEAD and BODY sections.

    page-rules: [
        'page cell-types
        any [
            'redirect href-types integer!
            | 'favicon href-types
            | 'charset [string! | word!]
            | 'css href-types
            | 'description string!
            | 'robots some [
                'noindex | 'index | 'nofollow | 'follow |
                'noarchive | 'nosnippet | 'noodp | 'noydir
            ]
            | 'script href-types
            | 'meta ['name | 'http-equiv] 2 cell-types
        ]
        block-types
    ]

---error-rules

These rules (currently only one rule) are used for handling and printing errors generated by the parser during HTML creation. They will be extended later to become more useful.

    error-rules: 'errors

---word-rules

The word rules match lists of words that are either dynamic to the parser, i.e. built during parsing, or built into the HTML dialect, such as the word list for doc-types.

    word-rules: [doc-types | get-word! | word!]

---all-rules

These rules collect all of the rules above. They are used directly by the parser, and the order shown here is the order in which the rules are tried.

    all-rules: [
        any [
            verbatim-rules | base-rules | link-rules | image-rules |
            table-rules | tag-rules | loop-rules | format-rules |
            form-rules | page-rules | error-rules | word-rules
        ]
    ]

===Dialect Command Reference

---page

Produces the outer skeleton of a webpage with the correct HTML tags and DOCTYPE.

Parsed as:

    'page cell-types block-types

Example:

    page "My Page" []

Produces:

    My Page

page supports a range of subcommands, which can be used as many times as you want before the block that holds the main page content.

+++ redirect

This sets a timed redirect for the page: after the given number of seconds, the browser is sent to the given URL.

Parsed as:

    'redirect href-types integer!

Example:

    page "Wrong page" redirect http://foo.com 5 ["Redirecting to foo.com in 5 seconds"]

Produces:

    Wrong page
    Redirecting to foo.com in 5 seconds

+++ refresh

This is an alias for redirect and does exactly the same thing.

Parsed as:

    'refresh href-types integer!

+++ favicon

This sets the favicon for the webpage.
It does not generate a favicon; it only links to an .ico file stored on the server.

Parsed as:

    'favicon href-types

Example:

    page "My page" favicon icon.ico ["My web page"]

Produces:

    My page
    My web page

+++ charset

This sets the charset for the webpage. A full list of charsets is given here.

Note that REBOL 2 does not support Unicode, so text is output as plain ASCII. You should therefore be careful when selecting UTF-8 encoding. No charset is selected by default. Unicode will be supported with REBOL 3.

Parsed as:

    'charset [string! | word!]

Example:

    page "My Page" charset utf-8 ["My Unicode Webpage."]

Produces:

    My Page
    My Unicode Webpage.

+++ description

This sets the description of the webpage for use by search engines.

Parsed as:

    'description string!

Example:

    page "My Page" description "A great page" ["My page"]

Produces:

    My Page
    My page

+++ robots

This controls how webcrawlers and search engines like Google and Yahoo treat the webpage. Several settings are available; they are given together as a block of words. Beware that not all webcrawlers respect your robots settings, and malware is certain to ignore them. This subcommand does not generate a robots.txt file.

Parsed as:

    'robots into [
        some [
            'noindex | 'index | 'nofollow | 'follow |
            'noarchive | 'nosnippet | 'noodp | 'noydir
        ]
    ]

:noindex - Tells the search engine not to index this page.

:index - Tells the search engine to specifically index this page. This is the default.

:nofollow - Tells the search engine not to follow links on this page for indexing.

:follow - Tells the search engine to specifically follow links on this page for indexing. This is the default.

:noarchive - Tells Google not to store a cached copy of this page.

:nosnippet - Tells Google not to display a text snippet under the listing.

:noodp - Tells Google, MSN and Yahoo not to use search information gathered from the Open Directory Project.
:noydir - Tells Yahoo not to use search information from the Yahoo Directory.

Example:

    page "My Page" robots [index nofollow] ["My page"]

Produces:

    My Page
    My page

+++ css

This includes a CSS stylesheet file in the webpage. It neither produces CSS styles within the webpage nor creates a CSS file; the CSS file must already exist. You can include as many CSS stylesheet files as you want.

Parsed as:

    'css href-types

Example:

    page "My Page" css style.css css menu.css ["My Webpage"]

Produces:

    My Page
    My Webpage

+++ script

This includes a JavaScript file in the webpage. It neither creates JavaScript code inside the webpage nor produces a separate JavaScript file; the JavaScript file must already exist.

Parsed as:

    'script href-types

Example:

    page "My Page" script menu.js ["My Webpage"]

Produces: