In the last section we learned about a few meta characters such as ".", "+", "?", and "*". Each of these has a special meaning in regEx syntax, so how do we match the characters themselves? Suppose you want to match a string that contains a digit, then the plus (+) sign, followed by another digit. The regEx for that would be...
/\d\+\d/
The "\d" matches a single digit, the \ before the + tells it to look for the actual + sign, then another \d matches a digit. The basic idea: to match any meta character literally, put a backslash (\) before it.
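To try the escaping rule out, here is a minimal sketch in Python using its standard re module (the choice of Python and the sample strings are ours, not part of the original slides). It compares the escaped pattern with an unescaped one, and shows re.escape, which adds the backslashes for you.

```python
import re

# Pattern from the text: a digit, a literal plus sign, then another digit.
digit_plus_digit = re.compile(r"\d\+\d")

# Sample strings are made up for illustration.
has_literal_plus = digit_plus_digit.search("2+2=4") is not None   # True
no_plus_present = digit_plus_digit.search("22=4") is not None     # False

# Without the backslash, + is a quantifier: "one or more digits, then a digit".
unescaped = re.compile(r"\d+\d")
matches_without_plus = unescaped.search("22=4") is not None       # True

# re.escape inserts the backslashes when the literal text comes from elsewhere.
escaped_text = re.escape("1+1")                                    # 1\+1

print(has_literal_plus, no_plus_present, matches_without_plus, escaped_text)
```

The unescaped version happily matches "22", which is why forgetting the backslash tends to produce a pattern that still "works" but means something else.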
So far we've learned about grouping characters and meta characters in regular expression syntax, but how do we group strings of characters? For example, what if we wanted to check if a string contained either "hello" or "hi"? How would we do that?
Consider the following:
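/hello\s(Joe|Bob)/
This will match the strings "hello Joe" and "hello Bob". The | (pipe) is the "or" operator, much like || is the or operator in many programming languages. The parentheses limit the choice to the two names; without them, /hello\sJoe|Bob/ would mean either "hello Joe" or just "Bob". Matching is also case sensitive by default, so "Hello Joe" would not match. What do you think the following will match: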
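A quick way to see how far the | reaches is to run a pattern with and without parentheses around the alternatives. The sketch below uses Python's re module (our choice for illustration); fullmatch is used so that a pattern only counts as a hit when it covers the whole sample string.

```python
import re

# Without parentheses the | splits the WHOLE pattern in two:
# either "hello Joe" or just "Bob".
ungrouped = re.compile(r"hello\sJoe|Bob")

# With parentheses the choice is limited to the two names.
grouped = re.compile(r"hello\s(Joe|Bob)")

samples = ["hello Joe", "hello Bob", "Bob"]

ungrouped_hits = [s for s in samples if ungrouped.fullmatch(s)]
grouped_hits = [s for s in samples if grouped.fullmatch(s)]

print(ungrouped_hits)  # ['hello Joe', 'Bob']
print(grouped_hits)    # ['hello Joe', 'hello Bob']
```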
/\.(net|com|edu|mil|gov)/
OK, so above we have a simple regEx that looks for a top-level domain such as .com in a string. But it will also match inside "joe.company", because ".com" appears there too. For these situations, we have a few more meta characters. They are:
\b represents a word boundary
^(regEx) requires the pattern to match at the beginning of the string
(regEx)$ requires the pattern to match at the end of the string
So to match "joe.com" and not "joe.company", we could use either of the following:
/\.(net|com|edu|mil|gov)\b/
/\.(net|com|edu|mil|gov)$/
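The two fixes are not interchangeable, and a small experiment makes the difference visible. This is a sketch using Python's re module; the sample strings are made up for illustration.

```python
import re

tld = r"\.(net|com|edu|mil|gov)"

loose = re.compile(tld)            # no boundary at all
bounded = re.compile(tld + r"\b")  # must be followed by a word boundary
at_end = re.compile(tld + r"$")    # must sit at the end of the string

# The unanchored pattern finds ".com" inside "joe.company".
loose_hits_company = loose.search("joe.company") is not None              # True

# \b rejects it, because "m" is followed by another word character.
bounded_hits_company = bounded.search("joe.company") is not None          # False

# \b accepts a domain in the middle of a sentence; $ does not.
bounded_hits_sentence = bounded.search("visit joe.com today") is not None  # True
at_end_hits_sentence = at_end.search("visit joe.com today") is not None    # False

# Both accept the plain domain.
at_end_hits_plain = at_end.search("joe.com") is not None                   # True
```

So \b is the better fit when the domain can appear anywhere in the text, and $ when the string should end with it.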
Back referencing is a feature of regular expressions that allows you to grab the actual text that was matched. This can be useful for printing out the matched text or for replacing it.
To back reference a match, you need to put ( )s around the part of the expression you want to grab. For example:
/target=\"(.+)\"/
This will match...
target="_blank"
target="_self"
target="someFrame"
And the back referenced text will be...
_blank
_self
someFrame
Respectively.
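Here is how grabbing and replacing the matched text might look in practice. This is a sketch in Python's re module, where the parenthesized part is read with group(1) and referred to as \1 in a replacement string; other languages spell this differently, and the name= replacement is just a made-up example.

```python
import re

target_attr = re.compile(r'target="(.+)"')

# Each sample is matched on its own, as in the list above.
lines = ['target="_blank"', 'target="_self"', 'target="someFrame"']

# group(1) is the text matched by the first pair of parentheses.
captured = [target_attr.search(line).group(1) for line in lines]
print(captured)   # ['_blank', '_self', 'someFrame']

# In a replacement string, \1 stands for that same captured text.
rewritten = target_attr.sub(r'name="\1"', 'target="_self"')
print(rewritten)  # name="_self"
```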
We've looked at some of the meta characters such as "*" and "+", which try to match zero or more, or one or more, occurrences of the previous pattern in the string. So the question is, how much do they match?
By default, regular expressions are greedy, that is they match as much as they can. For example, consider the following regEx:
/target=\"(.+)\"/
This will match the following:
target="_blank"
target="_blank"andmore"
target="_blank" and some more "text"
And the back referenced matches are:
_blank
_blank"andmore
_blank" and some more "text
This is because the "+" and "*" operators are greedy by default, meaning that they will grab as much as possible in a match. In the previous example, the parentheses capture any character one or more times, so we grab everything after the first " all the way to the last occurrence of a " in the string.
To override this, we append a ? to the + (or *) inside our back reference, like this...
/target=\"(.+?)\"/
This tells the regEx to match all the text after the first " until you come to the next ".
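The difference is easy to check side by side. The sketch below uses Python's re module on the longest sample string from above. The third pattern, which uses a negated character class, is our addition rather than something from the slides; it is a common alternative that gives the same result here.

```python
import re

text = 'target="_blank" and some more "text"'

greedy = re.compile(r'target="(.+)"')       # grabs up to the LAST quote
lazy = re.compile(r'target="(.+?)"')        # stops at the NEXT quote
negated = re.compile(r'target="([^"]+)"')   # alternative: anything but a quote

greedy_value = greedy.search(text).group(1)
lazy_value = lazy.search(text).group(1)
negated_value = negated.search(text).group(1)

print(greedy_value)   # _blank" and some more "text
print(lazy_value)     # _blank
print(negated_value)  # _blank
```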
To see back referencing in action, go back to my earlier slide.