PHP regular expression summary
A regular expression describes a string-matching pattern. It can be used to check whether a string contains a certain substring, to replace matching substrings, or to extract substrings that meet some condition. When listing a directory, the *.txt in "dir *.txt" or "ls *.txt" is not a regular expression, because * there means something different from * in a regular expression.
A regular expression is a text pattern made up of ordinary characters (for example, the letters a to z) and special characters (called metacharacters). The regular expression acts as a template that is matched against the string being searched.
3.1 Ordinary characters
Ordinary characters are all printable and non-printing characters that are not explicitly designated as metacharacters. They include all uppercase and lowercase letters, all digits, all punctuation marks, and some other symbols.
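As a quick illustration that ordinary characters match themselves literally, here is a minimal sketch. It uses Python's re module as a stand-in; PHP's preg_* functions use PCRE, which treats ordinary characters the same way.

```python
import re

# Ordinary characters such as letters and digits match themselves literally.
match = re.search("cat", "concatenate")
print(match.group(), match.start())  # cat 3

# A colon, and a hyphen outside square brackets, are ordinary characters too.
time_match = re.fullmatch("10:30-11:00", "10:30-11:00")
print(time_match is not None)  # True
```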
3.2 Non-printing characters
Character  Meaning
\cx  Matches the control character indicated by x. For example, \cM matches Control-M, that is, a carriage return. The value of x must be in the range A-Z or a-z; otherwise, c is treated as a literal 'c' character.
\f  Matches a form feed. Equivalent to \x0c and \cL.
\n  Matches a newline. Equivalent to \x0a and \cJ.
\r  Matches a carriage return. Equivalent to \x0d and \cM.
\s  Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v].
\S  Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t  Matches a tab. Equivalent to \x09 and \cI.
\v  Matches a vertical tab. Equivalent to \x0b and \cK. (In PHP's PCRE, \v is broader and matches any vertical whitespace character.)
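A minimal sketch of the whitespace escapes, assuming Python's re module as a stand-in for PHP's PCRE. Python's re does not support the \cx escape, so it is omitted here.

```python
import re

text = "name\tage\r\nAlice\t30\n"

# \t matches a tab character.
tabs = re.findall(r"\t", text)

# \r?\n matches a Windows (\r\n) or Unix (\n) line ending.
lines = re.split(r"\r?\n", text.strip())

# \S+ matches runs of non-whitespace, so any whitespace separates the words.
words = re.findall(r"\S+", text)

print(len(tabs))  # 2
print(lines)      # ['name\tage', 'Alice\t30']
print(words)      # ['name', 'age', 'Alice', '30']
```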
3.3 Special characters
Special characters are characters that carry a special meaning, such as the * in "*.txt" mentioned above, which simply means "any string". To find files whose names contain a *, you must escape the * by putting a \ in front of it: ls \*.txt. Regular expressions have the following special characters:
Character  Description
$  Matches the end of the input string. If multiline mode is set (for example, the Multiline property of a RegExp object), $ also matches the position before '\n' or '\r'. To match the $ character itself, use \$.
( )  Mark the start and end of a subexpression. The subexpression can be captured for later use. To match these characters, use \( and \).
*  Matches the preceding subexpression zero or more times. To match the * character, use \*.
+  Matches the preceding subexpression one or more times. To match the + character, use \+.
.  Matches any single character except the newline \n. To match ., use \.
[  Marks the start of a bracket expression. To match [, use \[.
?  Matches the preceding subexpression zero or one time, or indicates a non-greedy quantifier. To match the ? character, use \?.
\  Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character 'n', and '\n' matches a newline. The sequence '\\' matches "\", and '\(' matches "(".
^  Matches the start of the input string, except when used as the first character inside a bracket expression, where it negates the character set. To match the ^ character itself, use \^.
{  Marks the start of a quantifier expression. To match {, use \{.
|  Indicates a choice between two items. To match |, use \|.
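The following minimal sketch shows why special characters need a backslash to match literally. It uses Python's re module as a stand-in for PHP's PCRE; re.escape() is Python's helper for escaping an arbitrary literal string.

```python
import re

price_text = "Total: $4.50 (incl. tax)"

# $, ., ( and ) are special, so each is preceded by a backslash to match literally.
literal = re.search(r"\$4\.50 \(incl\. tax\)", price_text)
print(literal.group())  # $4.50 (incl. tax)

# re.escape() inserts the backslashes automatically.
pattern = re.escape("(incl. tax)")
print(re.search(pattern, price_text) is not None)  # True

# Unescaped, "." matches any single character, including "X".
print(re.fullmatch(r"4.50", "4X50") is not None)   # True
print(re.fullmatch(r"4\.50", "4X50") is not None)  # False
```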
Regular expressions are built the same way as arithmetic expressions: small expressions are combined with metacharacters and operators to form larger ones. The components of a regular expression can be single characters, character sets, character ranges, alternatives between characters, or any combination of these.
3.4 Quantifiers
A quantifier specifies how many times a given component of a regular expression must occur for a match. There are six: *, +, ?, {n}, {n,}, and {n,m}.
The *, +, and ? quantifiers are greedy: they match as much text as possible. Placing a ? after them makes them non-greedy, that is, they perform a minimal match.
The quantifiers of regular expressions are:
Character  Description
*  Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+  Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?  Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" or "does". ? is equivalent to {0,1}.
{n}  n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}  n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'.
{n,m}  m and n are non-negative integers, with n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no spaces between the comma and the two numbers.
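The examples from the table, plus greedy versus non-greedy matching, can be checked with this minimal sketch. It uses Python's re module as a stand-in for PHP's PCRE; the helper full() is hypothetical and only tests whether the whole string matches.

```python
import re

def full(pattern, text):
    """Hypothetical helper: True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# * (zero or more), + (one or more), ? (zero or one)
print(full(r"zo*", "z"), full(r"zo*", "zoo"))            # True True
print(full(r"zo+", "z"), full(r"zo+", "zoo"))            # False True
print(full(r"do(es)?", "do"), full(r"do(es)?", "does"))  # True True

# {n}, {n,}, {n,m}
print(re.search(r"o{2}", "Bob"))                 # None
print(re.search(r"o{2}", "food").group())        # oo
print(re.search(r"o{2,}", "foooood").group())    # ooooo
print(re.search(r"o{1,3}", "fooooood").group())  # ooo

# Greedy vs. non-greedy: a trailing ? makes * match as little as possible.
html = "<b>bold</b>"
print(re.search(r"<.*>", html).group())   # <b>bold</b>
print(re.search(r"<.*?>", html).group())  # <b>
```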
3.5 Anchors
Anchors describe the boundaries of a string or word: ^ and $ refer to the beginning and end of a string, respectively, \b matches the boundary before or after a word, and \B matches a position that is not a word boundary. A quantifier cannot be applied to an anchor.
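A minimal sketch of the anchors, assuming Python's re module as a stand-in for PHP's PCRE. Python's re.MULTILINE flag corresponds to PHP's m pattern modifier.

```python
import re

text = "cat concatenate cat"

# ^ and $ tie the match to the start and end of the string.
print(re.search(r"^cat", text) is not None)  # True
print(re.search(r"cat$", text) is not None)  # True

# \b requires a word boundary on both sides, so only the standalone "cat"s match.
standalone = re.findall(r"\bcat\b", text)
print(len(standalone))  # 2

# \B requires a non-boundary, so only the "cat" inside "concatenate" matches.
inner = re.findall(r"\Bcat\B", text)
print(inner)  # ['cat']

# In multiline mode, ^ and $ also match at the start and end of each line.
per_line = re.findall(r"^\w+$", "one\ntwo\nthree", re.MULTILINE)
print(per_line)  # ['one', 'two', 'three']
```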
3.6 Alternation
Enclose all the alternatives in parentheses and separate adjacent alternatives with |. Using parentheses has a side effect, however: the related match is cached. To eliminate this side effect, put ?: in front of the first alternative.
?: is one of the non-capturing metacharacters; the other two are ?= and ?!, which carry additional meaning. ?= is a positive lookahead: it matches the search string at any position where the parenthesized pattern begins to match. ?! is a negative lookahead: it matches the search string at any position where the parenthesized pattern does not begin to match. Neither consumes the characters it checks.
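The following minimal sketch contrasts a capturing group, a non-capturing group, and the two lookaheads. It uses Python's re module as a stand-in; the (?:...), (?=...), and (?!...) syntax is the same in PHP's PCRE.

```python
import re

text = "Windows3.1 Windows2000"

# Capturing group: the chosen alternative is stored as group 1.
capturing = re.search(r"Windows(3\.1|2000)", text)
print(capturing.groups())  # ('3.1',)

# (?:...) groups the alternatives without storing the match.
non_capturing = re.search(r"Windows(?:3\.1|2000)", text)
print(non_capturing.groups())  # ()

# Positive lookahead: "Windows" only where it is followed by 95, 98, NT or 2000.
# The version itself is checked but not consumed.
pos = re.search(r"Windows(?=95|98|NT|2000)", text)
print(pos.start(), pos.group())  # 11 Windows

# Negative lookahead: "Windows" only where it is NOT followed by those versions.
neg = re.search(r"Windows(?!95|98|NT|2000)", text)
print(neg.start(), neg.group())  # 0 Windows
```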
3.7 Backreferences
Putting parentheses around a regular expression pattern, or part of one, causes the corresponding match to be stored in a temporary buffer. Each captured submatch is stored in the order in which it is encountered from left to right in the pattern. The buffers are numbered starting at 1, up to a maximum of 99 captured subexpressions. Each buffer can be accessed with '\n', where n is a one- or two-digit decimal number that identifies a specific buffer.
You can use the non-capturing metacharacters '?:', '?=', or '?!' to avoid saving the related match.
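A classic use of a backreference is finding doubled words. This minimal sketch uses Python's re module as a stand-in for PHP's PCRE; \1 refers to buffer 1 in both the pattern and the replacement string.

```python
import re

text = "the cost of of gasoline going up up"

# (\w+) stores a word in buffer 1; \1 must then match exactly the same text again.
doubled = re.findall(r"\b(\w+)\s+\1\b", text)
print(doubled)  # ['of', 'up']

# Using \1 in the replacement keeps one copy of each doubled word.
fixed = re.sub(r"\b(\w+)\s+\1\b", r"\1", text)
print(fixed)  # the cost of gasoline going up

# (?:...) creates no buffer, so the first numbered group here is (c).
numbered = re.match(r"(?:ab)+(c)", "ababc")
print(numbered.group(1))  # c
```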
4. Operator precedence
Operators of the same precedence are evaluated from left to right; operators of different precedence are evaluated from higher to lower. The precedence of the operators, from highest to lowest, is:
Operator  Description
\  Escape character
(), (?:), (?=), []  Parentheses and square brackets
*, +, ?, {n}, {n,}, {n,m}  Quantifiers
^, $, \metacharacter, any character  Anchors and sequence (position and order)
|  Alternation ("or" operation)
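The effect of this precedence order can be seen in the following minimal sketch, which uses Python's re module as a stand-in for PHP's PCRE. Because a sequence binds more tightly than |, "m|food" means "m" or "food"; parentheses are needed to get "mood" or "food".

```python
import re

# Sequence binds more tightly than |, so this means "m" OR "food".
alternation = re.findall(r"m|food", "mood food")
print(alternation)  # ['m', 'food']

# Parentheses group first, so "(m|f)ood" means "mood" OR "food".
grouped = [m.group() for m in re.finditer(r"(m|f)ood", "mood food")]
print(grouped)  # ['mood', 'food']

# Quantifiers bind more tightly than sequence: "ab*" repeats only "b".
print(re.fullmatch(r"ab*", "abab") is not None)    # False
print(re.fullmatch(r"(ab)*", "abab") is not None)  # True
```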