Token
 In the project options, all ignorable characters are deactivated. The set of tokens must therefore recognize every part of a text, including linefeeds and spaces.
 So a text consists of:

   WORD            words
   NUMBER          numbers
   ABBREVIATION    abbreviations
   CONTINUATION    sequences of dots like "..."
   LINEFEED        linefeeds
   SENTENCE_END    ends of sentences (dot, exclamation mark and question mark)
   SPECIAL_CHAR    the rest of the characters
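 To illustrate how such a token set can cover a text without gaps, here is a minimal sketch in standard C++ using std::regex. It is not TextTransformer code. The struct Token, the function tokenize, and every pattern except the ABBREVIATION expression (\w+)\. are assumptions made for this example. At each position the longest match wins, the earlier rule wins a tie, and any character that no rule matches (a space, for instance) becomes a SPECIAL_CHAR.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// One recognized piece of text: the token name and the matched characters.
struct Token {
    std::string name;
    std::string text;
};

// Splits `text` so that every character belongs to exactly one token.
std::vector<Token> tokenize(const std::string& text) {
    // Assumed patterns for illustration; only ABBREVIATION's expression
    // is taken from the documentation page.
    static const std::vector<std::pair<std::string, std::regex>> rules = {
        {"ABBREVIATION", std::regex(R"((\w+)\.)")},
        {"NUMBER",       std::regex(R"(\d+)")},
        {"WORD",         std::regex(R"(\w+)")},
        {"CONTINUATION", std::regex(R"(\.{2,})")},
        {"LINEFEED",     std::regex(R"(\r?\n)")},
        {"SENTENCE_END", std::regex(R"([.!?])")},
    };

    std::vector<Token> tokens;
    auto pos = text.cbegin();
    while (pos != text.cend()) {
        std::string bestName = "SPECIAL_CHAR";
        std::size_t bestLen = 0;
        for (const auto& rule : rules) {
            std::smatch m;
            // match_continuous: the match must start exactly at `pos`.
            if (std::regex_search(pos, text.cend(), m, rule.second,
                                  std::regex_constants::match_continuous)) {
                const auto len = static_cast<std::size_t>(m.length(0));
                if (len > bestLen) {  // longest match wins, first rule on ties
                    bestLen = len;
                    bestName = rule.first;
                }
            }
        }
        if (bestLen == 0) bestLen = 1;  // fallback: one SPECIAL_CHAR character
        const auto next = pos + static_cast<std::ptrdiff_t>(bestLen);
        tokens.push_back({bestName, std::string(pos, next)});
        pos = next;
    }
    return tokens;
}
```

 Because the fallback always consumes one character, concatenating the texts of all returned tokens reproduces the input, which is the property needed once ignorable characters are switched off.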
 The counters are updated in the actions of the tokens. For example, the WORD action:
   m_iWords++;
   m_iChars += xState.length();
 Here the word counter is incremented by one, and the character counter is increased by the number of characters in the word.
 The action of the token ABBREVIATION, which is defined by the expression (\w+)\. is a little more complicated:
   if (xState.length() > 2 && !m_mAbbr.findKey(xState.str(1)))
     m_iSentences++;
   m_iWords++;
   m_iChars += xState.length();
 If the recognized text consists of a single letter followed by a dot or if the text preceding the dot is found in the list of abbreviations, the recognized text is interpreted as an abbreviation. Otherwise the dot marks the end of a sentence and the sentence counter is incremented. 
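 The decision described above can be tried outside TextTransformer with the following sketch in standard C++. The struct TextStats, its member names, and the std::set used in place of m_mAbbr are hypothetical stand-ins, not part of the TextTransformer API. The argument matched plays the role of the recognized text, including the trailing dot.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Plain C++ stand-in for the counters updated in the token actions.
struct TextStats {
    int words = 0;
    std::size_t chars = 0;
    int sentences = 0;
    std::set<std::string> abbreviations;  // hypothetical stand-in for m_mAbbr

    // Mirrors the WORD action: one more word, plus its characters.
    void onWord(const std::string& matched) {
        ++words;
        chars += matched.size();
    }

    // Mirrors the ABBREVIATION action; `matched` is a word plus its dot.
    void onAbbreviation(const std::string& matched) {
        if (matched.empty()) return;  // nothing was recognized
        // Text before the dot, corresponding to the (\w+) sub-expression.
        const std::string stem = matched.substr(0, matched.size() - 1);
        const bool singleLetter = matched.size() <= 2;
        const bool known = abbreviations.count(stem) > 0;
        // Neither a single letter nor a listed abbreviation:
        // the dot ends a sentence.
        if (!singleLetter && !known)
            ++sentences;
        ++words;
        chars += matched.size();
    }
};
```

 With "etc" in the abbreviation set, the calls onAbbreviation("etc.") and onAbbreviation("J.") leave the sentence counter unchanged, while onAbbreviation("end.") increments it. In all three cases the word and character counters are updated.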
 
 This page belongs to the TextTransformer Documentation