Matching Text Patterns
Like locators, patterns are a type of parameter frequently required by Selenese commands. Examples of commands that require patterns are verifyTextPresent, verifyTitle, verifyAlert, assertConfirmation, verifyText, and verifyPrompt. As mentioned above, link locators can also use a pattern. Patterns let you describe, by means of special characters, what text is expected rather than having to specify that text exactly.
There are three types of patterns: globbing, regular expressions, and exact.
Globbing Patterns
Most people are familiar with globbing as it is used in filename expansion at a DOS or Unix/Linux command line, such as ls *.c. In this case, globbing is used to display all the files ending with a .c extension that exist in the current directory. Globbing is fairly limited. Only two special characters are supported in the Selenium implementation:
* which translates to “match anything,” i.e., nothing, a single character, or many characters.
[ ] (character class) which translates to “match any single character found inside the square brackets.” A dash (hyphen) can be used as a shorthand to specify a range of characters (which are contiguous in the ASCII character set). A few examples will make the functionality of a character class clear:
[aeiou] matches any lowercase vowel
[0-9] matches any digit
[a-zA-Z0-9] matches any alphanumeric character
In most other contexts, globbing includes a third special character, the ?. However, Selenium globbing patterns only support the asterisk and character class.
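As a rough illustration of the rules above, the Python sketch below translates a glob that uses only * and [ ] into a regular expression. The helper names glob_to_regex and glob_match are hypothetical, and matching against the whole text is an assumption made for the example; this is not Selenium's own implementation.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob that supports only * and [ ] into a regex string."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            # * means "anything or nothing"
            parts.append(".*")
        elif ch == "[":
            end = pattern.find("]", i + 1)
            if end <= i + 1:
                # No closing bracket, or an empty class: treat "[" literally
                parts.append(re.escape(ch))
            else:
                # Copy the character class through unchanged, e.g. [0-9]
                parts.append(pattern[i:end + 1])
                i = end
        else:
            # Everything else, including ?, is an ordinary character
            parts.append(re.escape(ch))
        i += 1
    return "".join(parts)


def glob_match(pattern, text):
    """Return True if the whole text matches the glob (an assumption)."""
    return re.fullmatch(glob_to_regex(pattern), text, re.DOTALL) is not None


print(glob_match("[aeiou]", "e"))     # a lowercase vowel
print(glob_match("[0-9]", "7"))       # a digit
print(glob_match("what?", "whats"))   # ? is not a wildcard here
```

Because ? is passed through re.escape, it only ever matches a literal question mark, which mirrors the two-character limitation described above.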
To specify a globbing pattern parameter for a Selenese command, you can prefix the pattern with a glob: label. However, because globbing patterns are the default, you can also omit the label and specify just the pattern itself.
Below is an example of two commands that use globbing patterns. The actual link text on the page being tested was “Film/Television Department”; by using a pattern rather than the exact text, the click command will work even if the link text is changed to “Film & Television Department” or “Film and Television Department”. The glob pattern’s asterisk will match “anything or nothing” between the word “Film” and the word “Television”.
Command | Target | Value |
---|---|---|
click | link=glob:Film*Television Department | |
verifyTitle | glob:*Film*Television* | |
The actual title of the page reached by clicking on the link was “De Anza Film And Television Department – Menu”. By using a pattern rather than the exact text, the verifyTitle will pass as long as the two words “Film” and “Television” appear (in that order) anywhere in the page’s title. For example, if the page’s owner should shorten the title to just “Film & Television Department,” the test would still pass. Using a pattern for both a link and a simple test that the link worked (such as the verifyTitle above does) can greatly reduce the maintenance for such test cases.
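To see why these two globs tolerate the wording changes, the sketch below checks them against the link texts and titles quoted above using Python's fnmatch module. This is only an approximation: fnmatch also understands ? and [!...], which Selenium globbing is described as not supporting, but the patterns in this example use only the asterisk.

```python
from fnmatch import fnmatchcase

LINK_GLOB = "Film*Television Department"
TITLE_GLOB = "*Film*Television*"

# Link texts and page titles quoted in the example above
link_texts = [
    "Film/Television Department",
    "Film & Television Department",
    "Film and Television Department",
]
titles = [
    "De Anza Film And Television Department – Menu",
    "Film & Television Department",
]

for text in link_texts:
    print(f"link  {text!r}: {fnmatchcase(text, LINK_GLOB)}")

for title in titles:
    print(f"title {title!r}: {fnmatchcase(title, TITLE_GLOB)}")
```

A title with the two words in the opposite order, such as "Television and Film Department", would not match *Film*Television*, because the asterisks cannot reorder the literal text.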
Regular Expression Patterns
Regular expression patterns are the most powerful of the three types of patterns that Selenese supports. Regular expressions are also supported by most high-level programming languages, many text editors, and a host of tools, including the Linux/Unix command-line utilities grep, sed, and awk. In Selenese, regular expression patterns allow a user to perform many tasks that would be very difficult otherwise. For example, suppose your test needed to ensure that a particular table cell contained nothing but a number. regexp: [0-9]+ is a simple pattern that will match a decimal number of any length.
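The sketch below tries [0-9]+ in Python to show the difference between a pattern that is found somewhere in the text and one that must account for the whole text. Whether a given Selenese command anchors the pattern is not stated here, so treat the comparison as an assumption worth checking; if the match turns out to be unanchored, writing ^[0-9]+$ makes the "nothing but a number" intent explicit. The function names are hypothetical.

```python
import re

NUMBER = re.compile(r"[0-9]+")


def contains_number(cell):
    """True if a run of digits appears anywhere in the cell text."""
    return NUMBER.search(cell) is not None


def is_only_number(cell):
    """True if the cell text consists of digits and nothing else."""
    return NUMBER.fullmatch(cell) is not None


for cell in ["12345", "12 apples", ""]:
    print(f"{cell!r}: contains={contains_number(cell)}, only={is_only_number(cell)}")
```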
Whereas Selenese globbing patterns support only the * and [ ] (character class) features, Selenese regular expression patterns offer the same wide array of special characters that exist in JavaScript. Below is a subset of those special characters:
PATTERN | MATCH |
---|---|
. | any single character |
[ ] | character class: any single character that appears inside the brackets |
* | quantifier: 0 or more of the preceding character (or group) |
+ | quantifier: 1 or more of the preceding character (or group) |
? | quantifier: 0 or 1 of the preceding character (or group) |
{1,5} | quantifier: 1 through 5 of the preceding character (or group) |
\| | alternation: the character/group on the left or the character/group on the right |
( ) | grouping: often used with alternation and/or quantifier |
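Each row of the table can be tried out with a small script. The sketch below uses Python's re module rather than JavaScript; the constructs in this table behave the same way in both, but the sample strings are made up for illustration and the helper whole_match is hypothetical.

```python
import re

# (pattern, text that should match, text that should not)
EXAMPLES = [
    (r"c.t", "cat", "ct"),               # . any single character
    (r"[bc]at", "bat", "rat"),           # [ ] character class
    (r"ab*", "a", "b"),                  # * zero or more
    (r"ab+", "abb", "a"),                # + one or more
    (r"colou?r", "color", "colouur"),    # ? zero or one
    (r"[0-9]{1,5}", "123", "123456"),    # {1,5} one through five
    (r"cat|dog", "dog", "bird"),         # | alternation
    (r"(ab)+", "abab", "aba"),           # ( ) grouping with a quantifier
]


def whole_match(pattern, text):
    """True if the pattern accounts for the entire text."""
    return re.fullmatch(pattern, text) is not None


for pattern, good, bad in EXAMPLES:
    print(f"{pattern}: {good!r} -> {whole_match(pattern, good)}, "
          f"{bad!r} -> {whole_match(pattern, bad)}")
```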
Regular expression patterns in Selenese need to be prefixed with either regexp: or regexpi:. The former is case-sensitive; the latter is case-insensitive.
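One way to picture the two prefixes is as a switch for the case-insensitive flag. The sketch below is a hypothetical helper, compile_selenese_regex, written in Python; it only illustrates the distinction and is not how Selenium itself parses patterns.

```python
import re


def compile_selenese_regex(pattern):
    """Compile a 'regexp:' or 'regexpi:' pattern string (hypothetical helper)."""
    if pattern.startswith("regexpi:"):
        # regexpi: ignores the difference between upper and lower case
        return re.compile(pattern[len("regexpi:"):], re.IGNORECASE)
    if pattern.startswith("regexp:"):
        # regexp: is case-sensitive
        return re.compile(pattern[len("regexp:"):])
    raise ValueError("expected a regexp: or regexpi: prefix")


title = "Film And Television Department"
print(bool(compile_selenese_regex("regexp:film").search(title)))
print(bool(compile_selenese_regex("regexpi:film").search(title)))
```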
A few examples will help clarify how regular expression patterns can be used with Selenese commands. The first one uses what is probably the most commonly used regular expression pattern: .* (“dot star”). This two-character sequence can be translated as “0 or more occurrences of any character” or more simply, “anything or nothing.” It is the equivalent of the one-character globbing pattern * (a single asterisk).
Command | Target | Value |
---|---|---|
click | link=regexp:Film.*Television Department | |
verifyTitle | regexp:.*Film.*Television.* | |
The example above is functionally equivalent to the earlier example that used globbing patterns for this same test. The only differences are the prefix (regexp: instead of glob:) and the “anything or nothing” pattern (.* instead of just *).
The more complex example below tests that the Yahoo! Weather page for Anchorage, Alaska contains info on the sunrise time:
Command | Target | Value |
---|---|---|
open | http://weather.yahoo.com/forecast/USAK0012.html | |
verifyTextPresent | regexp:Sunrise: *[0-9]{1,2}:[0-9]{2} [ap]m | |
Let’s examine the regular expression above one part at a time:
PATTERN | MATCH |
---|---|
Sunrise: * | The string Sunrise: followed by 0 or more spaces |
[0-9]{1,2} | 1 or 2 digits (for the hour of the day) |
: | The character : (no special characters involved) |
[0-9]{2} | 2 digits (for the minutes) followed by a space |
[ap]m | “a” or “p” followed by “m” (am or pm) |
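The pieces can be checked together by running the full pattern against a few strings. The sketch below uses Python's re.search, which looks for the pattern anywhere in the text; the sample page snippets are invented for the example and are not taken from the real page.

```python
import re

SUNRISE = re.compile(r"Sunrise: *[0-9]{1,2}:[0-9]{2} [ap]m")

# Invented page snippets, for illustration only
samples = [
    "Today. Sunrise: 5:42 am Sunset: 10:07 pm",  # one-digit hour
    "Sunrise:10:15 am",                          # zero spaces after the colon
    "Sunrise: 5:4 am",                           # only one minute digit
    "Sunrise: 5:42 AM",                          # upper case does not match [ap]m
]

for text in samples:
    found = SUNRISE.search(text)
    print(f"{text!r}: {found.group(0) if found else None}")
```

The last sample fails because regexp: patterns are case-sensitive; with regexpi: (or re.IGNORECASE in this sketch) it would pass.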
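The third type, the exact pattern (prefixed with exact:), uses no special characters at all, so it is mainly useful when the expected text itself contains a character that is special in the other two types. It’s rather unlikely that most testers will ever need to look for a literal asterisk or a set of square brackets with characters inside them (which a globbing pattern would treat as a character class). Thus, globbing patterns and regular expression patterns are sufficient for the vast majority of us.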
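The situation where the exact type matters can be sketched in a few lines: when the expected text contains a literal asterisk, a glob may match more than intended, while a plain string comparison does not. The option labels below are made up, and Python's fnmatch stands in for Selenium globbing as an approximation.

```python
from fnmatch import fnmatchcase

expected = "Real *"
options = ["Real *", "Real numbers", "Real"]  # invented labels

# Exact comparison: the asterisk is just a character
exact_hits = [o for o in options if o == expected]

# Glob comparison: the asterisk means "anything or nothing"
glob_hits = [o for o in options if fnmatchcase(o, expected)]

print("exact:", exact_hits)
print("glob: ", glob_hits)
```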