JavaScript: The Definitive Guide, Sixth Editio javaScript权威指南(第6版) pdf 文字版-文字版, javascript电子书, 和javascript 有关的电子书:

10.1.4 Alternation, Grouping, and References

10.1.4 Alternation, Grouping, and References


The regular-expression grammar includes special characters for specifying alternatives, grouping subexpressions, and referring to previous subexpressions. The | character separates alternatives. For example, /ab|cd|ef/ matches the string “ab” or the string “cd” or the string “ef”. And /\d{3}|[a-z]{4}/ matches either three digits or four lowercase letters.

Note that alternatives are considered left to right until a match is found. If the left alternative matches, the right alternative is ignored, even if it would have produced a “better” match. Thus, when the pattern /a|ab/ is applied to the string “ab”, it matches only the first letter.

Parentheses have several purposes in regular expressions. One purpose is to group separate items into a single subexpression so that the items can be treated as a single unit by |, *, +, ?, and so on. For example, /java(script)?/ matches “java” followed by the optional “script”. And /(ab|cd)+|ef/ matches either the string “ef” or one or more repetitions of either of the strings “ab” or “cd”.

Another purpose of parentheses in regular expressions is to define subpatterns within the complete pattern. When a regular expression is successfully matched against a target string, it is possible to extract the portions of the target string that matched any particular parenthesized subpattern. (You’ll see how these matching substrings are obtained later in the chapter.) For example, suppose you are looking for one or more lowercase letters followed by one or more digits. You might use the pattern /[a-z]+\d +/. But suppose you only really care about the digits at the end of each match. If you put that part of the pattern in parentheses (/[a-z]+(\d+)/), you can extract the digits from any matches you find, as explained later.

A related use of parenthesized subexpressions is to allow you to refer back to a subexpression later in the same regular expression. This is done by following a \ character by a digit or digits. The digits refer to the position of the parenthesized subexpression within the regular expression. For example, \1 refers back to the first subexpression, and \3 refers to the third. Note that, because subexpressions can be nested within others, it is the position of the left parenthesis that is counted. In the following regular expression, for example, the nested subexpression ([Ss]cript) is referred to as \2:

/([Jj]ava([Ss]cript)?)\sis\s(fun\w*)/

A reference to a previous subexpression of a regular expression does not refer to the pattern for that subexpression but rather to the text that matched the pattern. Thus, references can be used to enforce a constraint that separate portions of a string contain exactly the same characters. For example, the following regular expression matches zero or more characters within single or double quotes. However, it does not require the opening and closing quotes to match (i.e., both single quotes or both double quotes):

/['"][^'"]*['"]/

256 | Chapter 10: Pattern Matching with Regular Expressions

To require the quotes to match, use a reference:

/(['"])[^'"]*\1/

The \1 matches whatever the first parenthesized subexpression matched. In this example, it enforces the constraint that the closing quote match the opening quote. This regular expression does not allow single quotes within double-quoted strings or vice versa. It is not legal to use a reference within a character class, so you cannot write:

/(['"])[^\1]*\1/

Later in this chapter, you’ll see that this kind of reference to a parenthesized subexpression is a powerful feature of regular-expression search-and-replace operations.

It is also possible to group items in a regular expression without creating a numbered reference to those items. Instead of simply grouping the items within ( and ), begin the group with (?: and end it with ). Consider the following pattern, for example:

/([Jj]ava(?:[Ss]cript)?)\sis\s(fun\w*)/

Here, the subexpression (?:[Ss]cript) is used simply for grouping, so the ? repetition character can be applied to the group. These modified parentheses do not produce a reference, so in this regular expression, \2 refers to the text matched by (fun\w*).

Table 10-4 summarizes the regular-expression alternation, grouping, and referencing operators.

Table 10-4. Regular expression alternation, grouping, and reference characters

Character Meaning

| Alternation. Match either the subexpression to the left or the subexpression to the right.

(...) Grouping. Group items into a single unit that can be used with *, +, ?, |, and so on. Also remember the characters that match this group for use with later references.

(?:...) Grouping only. Group items into a single unit, but do not remember the characters that match this group.

\n Match the same characters that were matched when group number nwas first matched. Groups are subexpressions within (possibly nested) parentheses. Group numbers are assigned by counting left parentheses from left to right. Groups formed with (?:are not numbered.

欢迎转载,转载请注明来自一手册:http://yishouce.com/book/1/31395.html
友情链接It题库(ittiku.com)| 版权归yishouce.com所有| 友链等可联系 admin#yishouce.com|粤ICP备16001685号-1