Character classes
Character classes are denoted using the syntax "[:classname:]" within a set declaration. For example, "[[:space:]]" is the set of all white space characters, and "[[:digit:],]" is the set of all digits plus the comma.
The available character classes are:
alnum | Any alphanumeric character; alpha and digit (*)
alpha | Any alphabetical character: a-z and A-Z, umlauts etc. (*)
blank | Any blank character: a space, a non-breaking space (decimal 160) or a tab
cntrl | Any control character
digit | Any digit 0-9
graph | Any graphical character; all others except cntrl
lower | Any lower case character a-z (*)
print | Any printable character; graph and blank
punct | Any punctuation character
space | Any white space character (space, tab, carriage return, line feed, ...)
upper | Any upper case character A-Z (*)
xdigit | Any hexadecimal digit character: 0-9, a-f and A-F
word | Any word character: all alphanumeric characters plus the underscore (*)
(*) Depending on the locale settings of your computer, other characters may be recognized too. Try it in the dialog for the calculation of character classes.
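To make the set notation concrete, here is a minimal sketch in Python. Python's built-in re module does not understand "[:classname:]", so the sketch expands a few class names into plain ASCII ranges first. The names POSIX_CLASSES and expand_classes are hypothetical helpers for illustration only, they cover only some of the classes listed above, and they ignore the locale-dependent extras marked with (*).

```python
import re

# ASCII-only stand-ins for some of the named classes.
# Assumption: no locale-dependent extras such as umlauts.
POSIX_CLASSES = {
    "alnum": r"a-zA-Z0-9",
    "alpha": r"a-zA-Z",
    "blank": r" \t\xa0",        # space, tab, non-breaking space (decimal 160)
    "digit": r"0-9",
    "lower": r"a-z",
    "upper": r"A-Z",
    "space": r" \t\r\n\f\v",
    "xdigit": r"0-9a-fA-F",
    "word": r"a-zA-Z0-9_",
}


def expand_classes(pattern: str) -> str:
    """Replace each [:name:] inside a set with its ASCII range."""
    return re.sub(r"\[:([a-z]+):\]", lambda m: POSIX_CLASSES[m.group(1)], pattern)


# "[[:digit:],]" becomes an ordinary set of digits plus the comma.
print(expand_classes("[[:digit:],]"))  # prints [0-9,]

# One or more digits or commas matches "1,234,567" but not "1.234".
number_set = re.compile(expand_classes("[[:digit:],]+"))
print(bool(number_set.fullmatch("1,234,567")))  # prints True
print(bool(number_set.fullmatch("1.234")))      # prints False
```

The outer brackets belong to the set declaration and the inner "[:digit:]" only names the class, which is why further characters such as the comma can be added next to it.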
There are some shortcuts that can be used in place of the character classes:
\w | [:word:]
\W | ^[:word:]
\s | [:space:]
\S | ^[:space:]
\d | [:digit:]
\D | ^[:digit:]
\l | [:lower:]
\L | ^[:lower:]
\u | [:upper:]
\U | ^[:upper:]
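The pairing of each shortcut with its negated form can be sketched in Python as follows. Python's re module has \w, \s and \d built in but gives \l and \u no such meaning, so the sketch maps every shortcut to an explicit set. The SHORTCUTS table, the find helper and the sample string are hypothetical, and the sets are ASCII-only assumptions that ignore locale settings.

```python
import re

# Each shortcut mapped to an explicit ASCII set (assumed definitions).
SHORTCUTS = {
    r"\w": r"[a-zA-Z0-9_]",
    r"\W": r"[^a-zA-Z0-9_]",
    r"\s": r"[ \t\r\n\f\v]",
    r"\S": r"[^ \t\r\n\f\v]",
    r"\d": r"[0-9]",
    r"\D": r"[^0-9]",
    r"\l": r"[a-z]",
    r"\L": r"[^a-z]",
    r"\u": r"[A-Z]",
    r"\U": r"[^A-Z]",
}

sample = "Item_42 costs 7 EUR"


def find(shortcut: str) -> str:
    """Return all characters of the sample matched by the shortcut's set."""
    return "".join(re.findall(SHORTCUTS[shortcut], sample))


print(find(r"\d"))  # prints 427
print(find(r"\u"))  # prints IEUR
print(find(r"\l"))  # prints temcosts

# A shortcut and its upper case counterpart split the text between them.
print(len(find(r"\d")) + len(find(r"\D")) == len(sample))  # prints True
```

Every character of the input is matched by exactly one member of each pair, which is what the "^" in the second column of the table expresses.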