What is a regular expression

A regular expression is a sequence of letters and symbols that describes a pattern, used to find the parts of a text that fit a format you specify.

A regular expression is a pattern that is matched against a subject string from left to right. "Regular expression" is a mouthful, so the abbreviated terms "regex" and "regexp" are common. Regular expressions are used to replace strings in text, validate form input, extract substrings from a larger string, and so on.

Imagine you are writing an application and want to set rules for usernames: a username may contain only lowercase letters, digits, underscores, and hyphens, and its length is limited so that the name does not look ugly. We use the following regular expression to validate a username:

[Image: the username-validation regular expression]

The above regular expression accepts john_doe, jo-hn_doe, and john12_as. It does not match Jo, because that string contains an uppercase letter and is too short.
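The pattern itself is not reproduced in the text here, so the following is only a minimal sketch. It assumes a pattern of the form `^[a-z0-9_-]{3,15}$`, which fits the description (lowercase letters, digits, underscores, hyphens, limited length). The length limits of 3 and 15 and the helper name `is_valid_username` are assumptions made for illustration.

```python
import re

# Assumed pattern: 3 to 15 characters drawn from lowercase letters,
# digits, underscore and hyphen. The length limits are an assumption.
USERNAME_RE = re.compile(r"^[a-z0-9_-]{3,15}$")


def is_valid_username(name: str) -> bool:
    """Return True when the whole string fits the assumed username rule."""
    return USERNAME_RE.fullmatch(name) is not None


for candidate in ["john_doe", "jo-hn_doe", "john12_as", "Jo"]:
    print(candidate, is_valid_username(candidate))
```

Under these assumptions the first three names are accepted and Jo is rejected, which agrees with the behaviour described above.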

1. Basic matching

A regular expression is simply a pattern of characters used to perform a search in text. For example, the regular expression the means: the letter t, followed by the letter h, followed by the letter e.

"the" => The fat cat sat on the mat.


The regular expression 123 matches the string 123. The expression is compared with the input string character by character.

Regular expressions are case-sensitive, so The will not match the.

"The" => The fat cat sat on the mat.
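The two examples above can be reproduced with a short sketch. Python's standard `re` module is used here purely for illustration; the variable names are not part of the original text.

```python
import re

text = "The fat cat sat on the mat."

# A literal pattern matches the same characters in the same order.
lower_hits = [m.start() for m in re.finditer(r"the", text)]

# Matching is case-sensitive by default, so "The" only finds the
# capitalised word at the start of the sentence.
upper_hits = [m.start() for m in re.finditer(r"The", text)]

print(lower_hits)  # [19]
print(upper_hits)  # [0]
```

Each pattern finds exactly one occurrence, and neither finds the other's.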


2. Metacharacters

Metacharacters are the building blocks of regular expressions. A metacharacter does not stand for itself; it is interpreted in a special way. Some metacharacters take on a different meaning when written inside square brackets. The metacharacters are as follows:

Metacharacter	Description
.	Matches any single character except a line break.
[ ]	Character class. Matches any character contained between the square brackets.
[^ ]	Negated character class. Matches any character not contained between the square brackets.
*	Matches 0 or more repetitions of the preceding symbol.
+	Matches 1 or more repetitions of the preceding symbol.
?	Makes the preceding symbol optional.
{n,m}	Braces. Matches at least n but not more than m repetitions of the preceding symbol.
(xyz)	Character group. Matches the characters xyz in that exact order.
|	Alternation. Matches either the characters before or the characters after the symbol.
\	Escapes the next character, which lets you match reserved characters such as `[ ] ( ) { } . * + ? ^ $ \ |`.
^	Matches the beginning of the input (of each line in multi-line mode).
$	Matches the end of the input (of each line in multi-line mode).
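A few rows of the table can be checked with a small sketch. The sample patterns and the helper `matches` are chosen for illustration only and do not come from the original text; Python's `re` module is assumed.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


checks = {
    "star allows zero b's": matches(r"ab*c", "ac"),
    "plus needs at least one b": matches(r"ab+c", "ac"),
    "question mark makes u optional": matches(r"colou?r", "color"),
    "braces limit to 2-3 digits": matches(r"[0-9]{2,3}", "1234"),
    "alternation picks either side": matches(r"cat|dog", "dog"),
    "escaped dot is a literal dot": matches(r"3\.14", "3x14"),
}

for label, result in checks.items():
    print(f"{label}: {result}")
```

The entries that print False are the informative ones: `+` rejects zero repetitions, `{2,3}` rejects four digits, and an escaped `.` no longer matches an arbitrary character.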

2.1 The dot operator `.`

`.` is the simplest example of a metacharacter. It matches any single character except a line break. For example, the expression `.ar` matches an arbitrary character followed by `a` and `r`.

".ar" => The car parked in the garage.
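As a quick check of the example above, here is a minimal sketch using Python's `re` module (an assumption; the original does not name a language):

```python
import re

text = "The car parked in the garage."

# "." stands for any one character, so ".ar" finds every three-character
# run that ends in "ar".
three_letter_hits = re.findall(r".ar", text)
print(three_letter_hits)  # ['car', 'par', 'gar']

# By default the dot does not match a line break.
newline_hits = re.findall(r".ar", "\nar")
print(newline_hits)  # []
```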


2.2 Character sets

Character sets are also called character classes. Square brackets are used to specify a character set, and a hyphen inside the brackets specifies a range of characters. The order of the characters inside the square brackets does not matter. For example, the expression [Tt]he matches both the and The.

"[Tt]he" => The car parked in the garage.
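The following sketch illustrates the three points made about character sets: membership, order independence, and hyphen ranges. The range `[a-h]` is an illustrative choice, not taken from the original, and Python's `re` module is assumed.

```python
import re

text = "The car parked in the garage."

# Either "T" or "t" may appear in the first position.
the_hits = re.findall(r"[Tt]he", text)
print(the_hits)  # ['The', 'the']

# The order inside the brackets makes no difference.
swapped_hits = re.findall(r"[tT]he", text)
print(swapped_hits == the_hits)  # True

# A hyphen denotes a range: any letter from a to h, then "ar".
range_hits = re.findall(r"[a-h]ar", text)
print(range_hits)  # ['car', 'gar']
```

The word parked produces no hit for the last pattern because p lies outside the range a to h.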

Shorthand	Description
.	Matches any character except a line break.
\w	Matches alphanumeric characters and the underscore, equivalent to `[a-zA-Z0-9_]`.
\W	Matches non-alphanumeric characters such as symbols, equivalent to `[^\w]`.
\d	Matches digits, equivalent to `[0-9]`.
\D	Matches non-digits, equivalent to `[^\d]`.
\s	Matches whitespace characters, equivalent to `[\t\n\f\r\p{Z}]`.
\S	Matches non-whitespace characters, equivalent to `[^\s]`.
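The shorthands can be tried out with a short sketch. The sample strings are invented for illustration. One caveat when using Python's `re` module, as assumed here: on `str` patterns `\w` and `\d` are Unicode-aware by default, so the ASCII equivalences in the table hold exactly only when the `re.ASCII` flag is set.

```python
import re

sample = "Room 42, floor 7"

digits = re.findall(r"\d+", sample)       # runs of digits
words = re.findall(r"\w+", sample)        # runs of word characters
non_digits = re.findall(r"\D+", "a1b22c")  # runs of anything but digits
fields = re.split(r"\s+", "a  b\tc\nd")    # split on any whitespace run

print(digits)      # ['42', '7']
print(words)       # ['Room', '42', 'floor', '7']
print(non_digits)  # ['a', 'b', 'c']
print(fields)      # ['a', 'b', 'c', 'd']

# With re.ASCII, \w is restricted to [a-zA-Z0-9_].
unicode_word = re.fullmatch(r"\w", "é") is not None
ascii_word = re.fullmatch(r"\w", "é", re.ASCII) is not None
print(unicode_word, ascii_word)  # True False
```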

Symbol	Description
?=	Positive lookahead
?!	Negative lookahead
?<=	Positive lookbehind
?<!	Negative lookbehind
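A minimal sketch of the four lookaround forms follows, reusing the sample sentence from the basic matching section. The patterns are illustrative. Python's `re` module is assumed, and it requires lookbehind patterns to have a fixed width, which the ones below do.

```python
import re

text = "The fat cat sat on the mat."

# Positive lookahead: "The" or "the" only when followed by " fat".
followed_by_fat = re.findall(r"[Tt]he(?=\sfat)", text)

# Negative lookahead: "The" or "the" only when NOT followed by " fat".
not_followed_by_fat = re.findall(r"[Tt]he(?!\sfat)", text)

# Positive lookbehind: "fat" or "mat" only when preceded by "The " or "the ".
after_the = re.findall(r"(?<=[Tt]he\s)(?:fat|mat)", text)

# Negative lookbehind: words ending in "at" NOT preceded by "The " or "the ".
not_after_the = re.findall(r"(?<![Tt]he\s)\wat", text)

print(followed_by_fat)      # ['The']
print(not_followed_by_fat)  # ['the']
print(after_the)            # ['fat', 'mat']
print(not_after_the)        # ['cat', 'sat']
```

In every case the text inside the lookaround is checked but not included in the match.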

Flag	Description
i	Case insensitive: matching ignores case.
g	Global search: finds all matches instead of stopping at the first.
m	Multi-line: the anchors `^` and `$` match at the beginning and end of each line.
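How flags are written differs between languages. As a sketch under the assumption that Python's `re` module is used: `i` corresponds to `re.IGNORECASE` and `m` to `re.MULTILINE`, while there is no `g` flag at all; `re.search` returns the first match and `re.findall` returns every match. The sample strings are illustrative.

```python
import re

text = "The fat cat sat on the mat."
lines = "The fat\ncat sat\non the mat."

# i: ignore case.
any_case = re.findall(r"the", text, re.IGNORECASE)

# g: no flag in Python; search() stops at the first match,
# findall() collects all of them.
first_only = re.search(r".at", text).group()
all_hits = re.findall(r".at", text)

# m: without MULTILINE, ^ anchors only at the start of the string;
# with it, ^ also anchors at the start of every line.
single_line = re.findall(r"^\w+", lines)
multi_line = re.findall(r"^\w+", lines, re.MULTILINE)

print(any_case)     # ['The', 'the']
print(first_only)   # fat
print(all_hits)     # ['fat', 'cat', 'sat', 'mat']
print(single_line)  # ['The']
print(multi_line)   # ['The', 'cat', 'on']
```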

1. Basic matching

2. Metacharacters

2.1 The dot operator `.`

2.2 Character sets

2.2.1 Negated character sets

2.3 Repetitions

2.3.1 The `*` sign

2.3.2 The `+` sign

2.3.3 The `?` sign

2.4 Braces `{}`

2.5 Capturing groups `(...)`

2.6 The `|` or operator

2.7 Escaping special characters

2.8 Anchors

2.8.1 The `^` sign

2.8.2 The `$` sign

3. Shorthand character sets

4. Lookarounds (lookahead and lookbehind)

4.1 `?=...` Positive lookahead

4.2 `?!...` Negative lookahead

4.3 `?<=...` Positive lookbehind

4.4 `?<!...` Negative lookbehind

5. Flags

5.1 Case insensitive

5.2 Global search

5.3 Multiline

Contributions

License
