Let's put together a more interesting program. This time we test whether a string fits a description, encoded into a concise pattern. There are some characters and character combinations that have special meaning in these patterns, including:
The common term for patterns that use this strange vocabulary is regular expressions. In Ruby, as in Perl, they are generally surrounded by forward slashes rather than double quotes. If you have never worked with regular expressions before, they probably look anything but regular, but you would be wise to spend some time getting familiar with them. They have an efficient expressive power that will save you headaches (and many lines of code) whenever you need to do pattern matching, searching, or other manipulations on text strings.
For example, suppose we want to test whether a string fits this description: "Starts with lower case f, which is immediately followed by exactly one upper case letter, and optionally more junk after that, as long as there are no more lower case characters." If you're an experienced C programmer, you've probably already written about a dozen lines of code in your head, right? Admit it; you can hardly help yourself. But in Ruby you need only request that your string be tested against a single regular expression.
How about "Contains a hexadecimal number enclosed in angle brackets"? No problem.
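As a rough sketch, here is one possible way to encode the two descriptions above as Ruby patterns. The method names are hypothetical, and the hexadecimal pattern assumes the number carries a `0x` or `0X` prefix, which the description does not actually specify.

```ruby
# Illustrative sketch; the method names are made up for this example.

# "Starts with lower case f, immediately followed by exactly one upper
# case letter, and optionally more junk, with no more lower case letters."
def starts_with_f_pattern?(str)
  # \A and \z anchor the pattern to the start and end of the whole string.
  !!(str =~ /\Af[A-Z][^a-z]*\z/)
end

# "Contains a hexadecimal number enclosed in angle brackets."
# Assumption: the number is written with a 0x or 0X prefix.
def contains_bracketed_hex?(str)
  !!(str =~ /<0[xX][0-9a-fA-F]+>/)
end

puts starts_with_f_pattern?("fXYZ123")          # true
puts starts_with_f_pattern?("fXyz")             # false: lower case after the X
puts contains_bracketed_hex?("code <0x1F> ok")  # true
puts contains_bracketed_hex?("code <0xZZ> bad") # false: Z is not a hex digit
```

Each test is a single line, where the equivalent hand-written character loop in C would run considerably longer.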
Though regular expressions can be puzzling at first glance, you will quickly gain satisfaction in being able to express yourself so economically. Here is a little program to help you experiment with regular
expressions. Store it as
The program asks for input twice, once for a string and once for a regular expression. The string is tested against the regular expression, then displayed with all the matching parts highlighted in reverse video. Don't worry about the details for now; an analysis of this code will come soon.
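The core of such a program might look like the following minimal sketch. It is not the original listing: the `highlight` method name is hypothetical, the inputs are hard-coded instead of being read from the keyboard, and it assumes a terminal that understands ANSI escape sequences.

```ruby
# Sketch only. Wraps every match of the pattern in ANSI reverse-video codes.
def highlight(str, pattern_source)
  reverse_on  = "\e[7m"  # ANSI escape: switch to reverse video
  reverse_off = "\e[m"   # ANSI escape: reset display attributes
  re = Regexp.new(pattern_source)  # build a regular expression from a string
  # gsub with a block replaces each match with the block's return value.
  str.gsub(re) { |match| "#{reverse_on}#{match}#{reverse_off}" }
end

# An interactive version would read these two values with gets instead.
puts highlight("foobar", "o+")
puts highlight("abc012dbcd555", "[0-9]+")
```

Building the pattern with `Regexp.new` is what lets the expression come from user input at run time, rather than being written between slashes in the source.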
What you see above as red text will appear as reverse video in the program output. The "~~~" lines are for the benefit of those using text-based browsers. Let's try several more inputs.
If that surprised you, refer to the table at the top of this page:
What if there is more than one way to correctly match the pattern?
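A small sketch may help here. Ruby's ordinary quantifiers are greedy, so `.*` grabs as much as it can while still letting the rest of the pattern succeed. The sample string and the `first_match` helper are made up for illustration, and the non-greedy `*?` form is shown only for contrast.

```ruby
# Hypothetical helper: returns the first substring matching the pattern,
# or nil when nothing matches.
def first_match(str, pattern)
  str[pattern]
end

puts first_match("foozboozer", /f.*z/)   # greedy: runs to the last z
puts first_match("foozboozer", /f.*?z/)  # non-greedy: stops at the first z
```

The first call prints `foozbooz` and the second prints `fooz`.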
Here is a pattern to isolate a colon-delimited time field.
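One way such a pattern could be written is sketched below. Both the pattern and the `time_field` method name are assumptions for illustration: the pattern accepts hours and minutes with an optional seconds part, and does not check that the values are valid clock times.

```ruby
# Hypothetical helper: pulls the first colon-delimited time field
# (h:mm or h:mm:ss) out of a string, or returns nil if there is none.
def time_field(str)
  # One or more digits, a colon, more digits, then optionally ":digits".
  str[/[0-9]+:[0-9]+(:[0-9]+)?/]
end

puts time_field("Wed Feb  7 08:58:04 JST 1996")  # 08:58:04
puts time_field("lunch at 12:30 sharp")          # 12:30
```

The parenthesized group followed by `?` is what makes the seconds optional.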