Regex

Regular expression ("regex"):
A regex is a description of a pattern of text.
 >> It can test whether a string matches the expression's pattern.
 >> It can be used to search for or replace characters in a string.

Regex is supported in many languages, including Java, PHP, and JavaScript.

In PHP, a pattern starts with a / and ends with a / (the delimiters).
/abc/ matches any string containing "abc", e.g. "abc", "abcdef", "defabc", ".=.abc.=.", ...
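As a quick check of the "string containing" behaviour, here is a sketch in Python's re module, used as a stand-in for PHP (in PHP the same test would be preg_match('/abc/', $s)). Python patterns are plain strings with no / delimiters. The helper name contains_abc is illustrative only.

```python
import re

# re.search scans the whole subject and succeeds if "abc" appears anywhere,
# which is what /abc/ means in these notes.
def contains_abc(s):
    return re.search(r"abc", s) is not None

for s in ["abc", "abcdef", "defabc", ".=.abc.=.", "ab-c"]:
    print(s, contains_abc(s))
```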

Wildcards:
A dot (.) matches any single character (by default, any character except a newline)
e.g. /t.m/ matches "tom", "tim", "t_m"

Adding i after the closing /, as in /t.m/i, makes the match case-insensitive, so it also matches "TOM", "T_M", "tIm"
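The wildcard and the i modifier can be sketched with Python's re module, used here in place of PHP. In Python the trailing i becomes the re.IGNORECASE flag; the sample words are made up for illustration.

```python
import re

# "." stands for exactly one character, so "tm" (nothing between) fails.
dot = re.compile(r"t.m")
# re.IGNORECASE plays the role of the trailing i in /t.m/i.
dot_i = re.compile(r"t.m", re.IGNORECASE)

words = ["tom", "tim", "t_m", "TOM", "tm", "team"]
print([w for w in words if dot.search(w)])
print([w for w in words if dot_i.search(w)])
```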

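A price example (the symbols are explained below): /^\$[1-9][0-9]*\.[0-9]{2}/ matches prices of $1.00 or more written with two decimal places, such as "$1.00", "$10.00", or "$250.99"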
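Assuming the goal is to match prices of $1.00 or more, the sketch below (Python's re, standing in for PHP) compares the pattern ^\$[1-9]([0-9]{2,}?)?\.[0-9]{2,} with a variant that uses [0-9]* for the remaining dollar digits. Because the optional group in the first form needs at least two digits, it rejects two-digit amounts such as "$10.00". The sample prices are illustrative.

```python
import re

# Optional group of 2+ digits: allows "$1.00" and "$100.00" but not "$10.00".
as_written = re.compile(r"^\$[1-9]([0-9]{2,}?)?\.[0-9]{2,}")
# Variant: a nonzero digit, then any number of digits, then exactly 2 cent digits.
variant = re.compile(r"^\$[1-9][0-9]*\.[0-9]{2}")

for price in ["$1.00", "$10.00", "$100.00", "$0.99"]:
    print(price, bool(as_written.search(price)), bool(variant.search(price)))
```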

Special characters: |, (), \
| means OR
/abc|def|g/ matches "abc", "def", or "g"
There's no AND symbol. Why not?
() are for grouping
/(Homer|Marge) Simpson/ matches "Homer Simpson" or "Marge Simpson"
\ starts an escape sequence
many characters must be escaped to match them literally: / \ $ . [ ] ( ) ^ * + ? | { }
/<br \/>/ matches lines containing <br /> tags
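The three special characters can be tried out in Python's re module, used here as a stand-in for PHP. Since Python patterns have no / delimiter, "<br />" needs no backslash there; re.escape is Python's helper for escaping a literal string. The sample strings are made up.

```python
import re

# | : any one of the alternatives
alternatives = re.findall(r"abc|def|g", "xxdefyyg")

# () : group the alternatives so " Simpson" follows either name
m = re.search(r"(Homer|Marge) Simpson", "Hi, Marge Simpson!")
full, first_name = m.group(0), m.group(1)

# \ : in PHP, /<br \/>/ escapes the delimiter; Python has no delimiter to escape
has_br = re.search(r"<br />", "line one<br />line two") is not None

# re.escape backslash-escapes every special character in a literal string
price = re.compile(re.escape("$1.50"))

print(alternatives, full, first_name, has_br, bool(price.fullmatch("$1.50")))
```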

Quantifiers: *, +, ?
* means 0 or more occurrences
/abc*/ matches "ab", "abc", "abcc", "abccc", ...
/a(bc)*/ matches "a", "abc", "abcbc", "abcbcbc", ...
/a.*a/ matches "aa", "aba", "a8qa", "a!?xyz__9a", ...
+ means 1 or more occurrences
/a(bc)+/ matches "abc", "abcbc", "abcbcbc", ...
/Goo+gle/ matches "Google", "Gooogle", "Goooogle", ...
? means 0 or 1 occurrences
/a(bc)?/ matches "a" or "abc"
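To see exactly which strings each quantifier accepts, the sketch below uses Python's re.fullmatch, which requires the whole string to fit the pattern. This is stricter than the "contains" reading used in these notes: with a plain search, /a(bc)?/ would match any string containing "a". The helper accepts is hypothetical.

```python
import re

# Keep only the words that the pattern matches from start to end.
def accepts(pattern, words):
    return [w for w in words if re.fullmatch(pattern, w)]

print(accepts(r"abc*", ["ab", "abc", "abccc", "abcabc"]))   # * : 0 or more c
print(accepts(r"a(bc)+", ["a", "abc", "abcbc", "abcb"]))     # + : 1 or more bc
print(accepts(r"a(bc)?", ["a", "abc", "abcbc"]))             # ? : 0 or 1 bc
print(accepts(r"Goo+gle", ["Gogle", "Google", "Goooogle"]))
```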

More quantifiers: {min,max}
{min,max} means between min and max occurrences (inclusive)
/a(bc){2,4}/ matches "abcbc", "abcbcbc", or "abcbcbcbc"
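max may be omitted to mean "no upper limit", and min may be omitted to mean 0
{2,} means 2 or more
{,6} means up to 6 (not every regex flavor accepts this form; {0,6} is the portable spelling)
{3} means exactly 3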
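A sketch of the {min,max} forms using Python's re.fullmatch, so that the count of "bc" repetitions must fit exactly. Python happens to accept {,n}, but the example uses {0,2}, which works in the other flavors too. The helper accepts is hypothetical.

```python
import re

def accepts(pattern, words):
    # fullmatch: the whole word must fit the pattern
    return [w for w in words if re.fullmatch(pattern, w)]

words = ["abc", "abcbc", "abcbcbc", "abcbcbcbc", "abcbcbcbcbc"]
print(accepts(r"a(bc){2,4}", words))          # between 2 and 4 repetitions
print(accepts(r"a(bc){2,}", words))           # 2 or more
print(accepts(r"a(bc){0,2}", ["a"] + words))  # up to 2
print(accepts(r"a(bc){3}", words))            # exactly 3
```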

Anchors: ^ and $
^ represents the beginning of the string (or of each line, in multiline mode /m);
$ represents the end of the string (or of each line, in multiline mode /m)
/Jess/ matches all strings that contain Jess;
/^Jess/ matches all strings that start with Jess;
/Jess$/ matches all strings that end with Jess;
/^Jess$/ matches the exact string "Jess" only
/^Mart.*Stepp$/ matches "MartStepp", "Marty Stepp", "Martin D Stepp", ...
but NOT "Marty Stepp stinks" or "I H8 Martin Stepp"
(elsewhere in these notes, when we say /PATTERN/ matches "text", we mean that it matches any string that contains that text)
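The anchors can be checked with Python's re.search, which, like the notes, looks for the pattern anywhere unless ^ or $ pins it down. In Python, the multiline mode that PHP enables with /m is the re.MULTILINE flag. The sample strings are illustrative.

```python
import re

samples = ["Jess", "Jessica", "Hi Jess", "Hi Jessica"]
results = {p: [s for s in samples if re.search(p, s)]
           for p in [r"Jess", r"^Jess", r"Jess$", r"^Jess$"]}
for p, hits in results.items():
    print(p, hits)

text = "Marty Stepp\nMartin D Stepp\nI H8 Martin Stepp"
# With re.MULTILINE, ^ and $ match at each line boundary.
per_line = re.findall(r"^Mart.*Stepp$", text, re.MULTILINE)
# Without it, ^ and $ refer to the whole string, and . cannot cross "\n".
whole = re.findall(r"^Mart.*Stepp$", text)
print(per_line, whole)
```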

Character sets: []
[] groups characters into a character set, which matches any single character from the set
/[bcd]art/ matches strings containing "bart", "cart", or "dart"
equivalent to /(b|c|d)art/ but shorter
inside [], most special characters (such as . * + ?) act as normal characters
/what[!*?]*/ matches "what", "what!", "what?**!", "what??!", ...
What regular expression matches DNA (strings of A, C, G, or T)?
/[ACGT]+/
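Character sets in Python's re, used here in place of PHP. The DNA check uses re.fullmatch so that every character must be A, C, G, or T; with a plain search, as in the /[ACGT]+/ reading above, any string containing one of those letters would match. The helper is_dna is hypothetical.

```python
import re

hits = re.findall(r"[bcd]art", "bart cart dart mart")
# Inside [], * and ? lose their special meaning and are matched literally.
what = re.fullmatch(r"what[!*?]*", "what?**!") is not None

def is_dna(s):
    # the whole string must be one or more of A, C, G, T
    return re.fullmatch(r"[ACGT]+", s) is not None

print(hits, what, is_dna("GATTACA"), is_dna("GATXACA"))
```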

Character ranges: [start-end]
inside a character set, specify a range of characters with -
/[a-z]/ matches any lowercase letter
/[a-zA-Z0-9]/ matches any lower- or uppercase letter or digit
an initial ^ inside a character set negates it
/[^abcd]/ matches any character other than a, b, c, or d
inside a character set, - must be escaped to be matched literally (unless it is the first or last character in the set)
/[+\-]?[0-9]+/ matches an optional + or -, followed by at least one digit
What regular expression matches letter grades such as A, B+, or D- ?
/[ABCDF][+\-]?/
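Ranges, negation, and the escaped minus sign, sketched with Python's re (standing in for PHP). re.fullmatch is used for the signed-number and grade checks so that strings like "C++" are rejected rather than partially matched. The sample inputs are made up.

```python
import re

lower = re.findall(r"[a-z]", "aB3z")          # lowercase letters only
not_abcd = re.findall(r"[^abcd]", "abcxyd")   # ^ at the start negates the set
# \- inside the set is a literal minus sign
signed = [s for s in ["42", "+7", "-13", "--1", "+"]
          if re.fullmatch(r"[+\-]?[0-9]+", s)]
grades = [g for g in ["A", "B+", "D-", "E", "C++"]
          if re.fullmatch(r"[ABCDF][+\-]?", g)]
print(lower, not_abcd, signed, grades)
```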

Escape sequences
special escape sequence character sets:
\d matches any digit (same as [0-9]); \D any non-digit ([^0-9])
\w matches any word character (same as [a-zA-Z_0-9]); \W any non-word char
\s matches any whitespace character (space, \t, \n, etc.); \S any non-whitespace
What regular expression matches dollar amounts of at least $100.00 ?
/\$\d{3,}\.\d{2}/
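The shorthand classes in Python's re, used here in place of PHP. One difference worth knowing: for str patterns, Python's \d, \w and \s also match Unicode digits, letters and spaces unless the re.ASCII flag is given, which restricts them to the bracket equivalents listed above. The amounts and strings are illustrative.

```python
import re

# re.ASCII makes \d mean exactly [0-9], \w mean [a-zA-Z_0-9], and so on.
at_least_100 = re.compile(r"\$\d{3,}\.\d{2}", re.ASCII)
amounts = ["$100.00", "$2500.75", "$99.99", "100.00"]
big = [a for a in amounts if at_least_100.search(a)]

words = re.findall(r"\w+", "snake_case x2!", flags=re.ASCII)
parts = re.split(r"\s+", "a \tb\nc")   # any run of whitespace separates
print(big, words, parts)
```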

Reference:
1. Maharishi University of Management, Web Application Programming, Lecture Notes
