Comments and processing instructions

 

There are still a number of flaws in ISO_XML.ttp, which will now be removed in the transition to XML.ttp.

 

The superfluous productions and tokens are removed.
Cryptic abbreviations such as "PI" are replaced by more meaningful names, e.g. ProcInstr (= processing instruction).
Comments and processing instructions are treated as part of the ignorable characters or as inclusions.

 

The last point is for didactic purposes. The technique is of greater use in other grammars than in XML, where such inclusions may occur only in precisely specified places.

 

Comments and processing instructions have a special role: they can be included in many places in the document without changing the data that the XML document transports to an application; they carry additional information.

The comments are meant for the human reader and can be ignored by the application.

 

The regular expression for comments can be combined with the other ignorable characters into a common expression:

 

(\s|<!--([^-]|-+[^->]|->)*-+->)+
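The behaviour of this expression can be tried out outside of TextTransformer. The following is a minimal sketch in Python, assuming that Python's re module interprets the expression the same way for these inputs; the helper name skip_ignorable and the sample text are made up for illustration.

```python
import re

# The combined expression from the text: whitespace or a comment, repeated.
IGNORABLE = re.compile(r"(\s|<!--([^-]|-+[^->]|->)*-+->)+")

def skip_ignorable(text, pos=0):
    """Return the offset of the first character after an ignorable run at pos.

    Hypothetical helper: returns pos unchanged if nothing ignorable starts there.
    """
    m = IGNORABLE.match(text, pos)
    return m.end() if m else pos

# Two comments and some whitespace in front of the first tag.
sample = "  <!-- first -->\n<!-- a - b -> c -->  <root/>"
print(sample[skip_ignorable(sample):])  # prints <root/>
```

Single hyphens and "->" inside a comment are consumed by the inner alternatives, while an unterminated comment is not matched at all, so nothing is skipped in that case.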

 

Processing instructions contain information for external applications (complete PHP scripts, for example, can be embedded here) and can be handled by an inclusion production.

 

The new expression and the inclusion production can be set in the global project options. (The ProcInstr inclusion must then be removed from the local options of the ProcInstr production itself.) However, the parser then also tolerates XML documents in which, for example, a comment occurs inside a tag.
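The tolerance described above can be modelled with a small scanner that skips ignorable text before every token lookup. This is only a sketch under assumptions, not TextTransformer's scanner: the names WS, WS_OR_COMMENT, TOKEN and tokens are hypothetical, and the token set is reduced to what a simple start tag needs.

```python
import re

# Hypothetical stand-ins for two settings of the ignorable expression.
WS = re.compile(r"\s+")                                          # whitespace only
WS_OR_COMMENT = re.compile(r"(\s|<!--([^-]|-+[^->]|->)*-+->)+")  # expression from the text

# Simplified token set for a start tag; an assumption, not the XML.ttp tokens.
TOKEN = re.compile(r'/>|<|>|=|"[^"]*"|[A-Za-z_][\w.-]*')

def tokens(text, ignorable):
    """Yield the tokens of text, skipping `ignorable` before each token lookup."""
    pos = 0
    while True:
        skipped = ignorable.match(text, pos)
        if skipped:
            pos = skipped.end()
        if pos == len(text):
            return
        found = TOKEN.match(text, pos)
        if found is None:
            raise ValueError(f"no token at offset {pos}: {text[pos:pos + 10]!r}")
        yield found.group()
        pos = found.end()

tag = '<item <!-- note --> id="1"/>'

# With the comment expression active everywhere, the misplaced comment vanishes.
print(list(tokens(tag, WS_OR_COMMENT)))

# With whitespace as the only ignorable text, the same input is rejected.
try:
    list(tokens(tag, WS))
except ValueError as err:
    print("rejected:", err)
```

In the first call the comment is skipped like whitespace, so the tag is scanned as if it were well-formed. In the second call the comment characters reach the token lookup and raise an error, which is the behaviour wanted for a comment inside a tag.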

If a misplaced comment or processing instruction is to cause an error, the productions must be restructured so that their local options can be set to accept comments and processing instructions in exactly the permissible places. Whether ignorable characters are to be skipped or an inclusion follows is always checked when the next token is determined. The local options of a production therefore take effect as soon as a new token is looked up within that production. Since, for example, content can start with a comment, the last token that precedes a comment in the XML syntax must be the first token of a production that checks for comments.

Therefore the additional production element_content is defined and, analogously, the additional production doctypedecl_core. The local options of the following productions are adapted so that comments and processing instructions are recognized in them:

 

 

content ::= ( element | CharData | "]]>" EXIT | Reference | CDSect )*

element_content ::=  content ETag

element_end ::=  "/>" | ">" element_content

 

doctypedecl_core ::= "[" ( markupdecl | PEReference )*

 

prolog ::= XMLDecl? doctypedecl?

 

Please note that comments and processing instructions are also recognized within and after the prolog production, since the successors of production calls are checked explicitly too.

The element production is also changed slightly. However, no local options are set in it.

 

element ::= "<" Name Attribute* element_end

 


This page belongs to the TextTransformer Documentation
