PCRE Documentation and Change Log


As a convenience to PCRE users, with the permission of Philip Hazel, I aim to provide a mirror of the latest PCRE documentation whenever it is released. To download the latest PCRE, see pcre.org.

Apart from links to various versions of the PCRE documentation, this page presents a curated list of new feature introductions in PCRE's pattern syntax, as well as links to other PCRE-related material on RexEgg.

Jumping Points
For easy navigation, here are some jumping points to various sections of the page:

Change Log
Documentation
Feature Additions to the PCRE Pattern Syntax
Links to other PCRE-related Material on RexEgg



Change Log

✽ For the latest official PCRE2 revision history (ChangeLog), follow the link, which should remain the same when new versions are released. For the official "PCRE 1" revision history (ChangeLog), follow the link, which shows all changes up to the latest version of PCRE1.

✽ For a brief, curated history of additions to the syntax, see Additions to PCRE further down.



Documentation

✽ Versions 10.0 and higher are called PCRE2. PCRE2 contains a new API, which includes a replacement function: pcre2_substitute(). The latest PCRE2 documentation should always be available on this link. If you are mostly interested in PCRE's regex syntax, the most important file in the PCRE2 documentation is the pcre2pattern man page. The pcre2api file has the replacement syntax.

✽ Versions below 10.0, sometimes known as "PCRE 1", are the original PCRE library, which is still widely used but is now in bug-fix mode only (no new features will be introduced). The latest "PCRE 1" documentation should always live on this link. If you are mostly interested in PCRE's regex syntax, the most important file in the "PCRE 1" documentation is the pcrepattern man page.

10.30, 10.23, 10.22, 10.21, 10.20, 10.10, 10.0
8.41, 8.40, 8.39, 8.38, 8.37, 8.36, 8.35, 8.34, 8.33, 8.32, 8.31, 8.30
8.21, 8.13, 8.02
7.90, 6.70, 5.00, 4.50, 3.90



Feature Additions to the PCRE Pattern Syntax

This section is not the full PCRE change log. Instead, it presents the version and date when new features were added to the pattern syntax. This is a curated collection that does not claim to be exhaustive. For the full story, see the change log for PCRE and the change log for PCRE2.

Version  Date         Change
10.23    14 Feb 2017  Allowed back-references in lookbehind, so long as group names or numbers are unambiguous
10.23    14 Feb 2017  Added forward relative back-reference syntax: \g{+2} (mirroring the existing \g{-2})
10.21    12 Jan 2016  Added the PCRE2_SUBSTITUTE_EXTENDED option to enhance replacement syntax
10.21    12 Jan 2016  Added the ${*MARK} facility to pcre2_substitute()
10.21    12 Jan 2016  Added the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option to tweak what happens during replacements when the output buffer is too small
10.21    12 Jan 2016  Added the PCRE2_SUBSTITUTE_UNKNOWN_UNSET and PCRE2_SUBSTITUTE_UNSET_EMPTY options to fine-tune how unknown and unset capture groups are treated in replacements
10.21    12 Jan 2016  Added the PCRE2_ALT_VERBNAMES option to subtly modify marked names that can be used with backtracking control verbs
10.20    30 Jun 2015  Added the PCRE2_ALT_CIRCUMFLEX option to allow ^ to assert position after any newline including a terminating newline
10.20    30 Jun 2015  Added the PCRE2_NEVER_BACKSLASH_C option to disable \C
10.20    30 Jun 2015  Added pcre2_callout_enumerate to the API
10.10    6 Mar 2015   Added serialization functions to the API
10.0     5 Jan 2015   Version check available via patterns such as (?(VERSION>=x)…)
10.0     5 Jan 2015   PCRE2_NO_DOTSTAR_ANCHOR tells the engine not to automatically anchor patterns that start with .*
10.0     5 Jan 2015   (*NOTEMPTY) and (*NOTEMPTY_ATSTART) tell the engine not to return empty matches
10.0     5 Jan 2015   By default, PCRE2 builds with Unicode support
10.0     5 Jan 2015   Name changed to PCRE2, with a new API that includes a replacement function: pcre2_substitute()
8.41     5 Jul 2017   Inline comments can now be inserted between ++ and +? quantifiers, as in a+(?# make it possessive)+ or a+(?# up to b)?b
8.34     15 Dec 2014  Added support for the POSIX [[:<:]] and [[:>:]] (left- and right-of-word boundaries), which are converted to \b(?=\w) and \b(?<=\w) internally
8.34     15 Dec 2014  Added \o{…} to specify code points in octal
8.33     28 May 2014  Added \p{Xuc} (PCRE-specific) to match characters that can be expressed using Universal Character Names
8.10     25 Jun 2010  Added PCRE-specific Unicode properties: \p{Xan} (alphanumeric), \p{Xsp} (Perl space), \p{Xps} (POSIX space) and \p{Xwd} (word)
8.10     25 Jun 2010  Added support for (*MARK:ARG) and for ARG additions to PRUNE, SKIP, and THEN
8.10     25 Jun 2010  Added \N (any character that is not a line break)
8.10     25 Jun 2010  Added the (*UCP) start-of-pattern modifier, which affects \b, \d, \s and \w
7.90     11 Apr 2009  Added the (*UTF8) start-of-pattern modifier
7.70     7 May 2008   Added Ruby-style subroutine call syntax: \g<2>, \g'name', \g'2'
7.30     28 Aug 2007  Added backtracking control verbs (*SKIP), (*FAIL), (*F), (*PRUNE), (*THEN), (*COMMIT), (*ACCEPT)
7.30     28 Aug 2007  Added the (*CR) start-of-pattern modifier
7.20     19 Jun 2007  Added (?-2) and (?+2) syntax for relative subroutine calls
7.20     19 Jun 2007  Added (?(-2)…) and (?(+2)…) conditional syntax to check if a relative capture group has been set
7.20     19 Jun 2007  Added \K to drop what has been matched so far from the match to be returned
7.20     19 Jun 2007  Added named back-reference synonyms: \k{foo} and \g{foo}
7.20     19 Jun 2007  Added branch reset syntax (?|…)
7.20     19 Jun 2007  Added \h and \v (and their counterclasses \H and \V) to match horizontal and vertical whitespace
7.00     19 Dec 2006  Added \R to match any Unicode newline sequence
7.00     19 Dec 2006  Added named group synonyms (?<foo>…) and (?'foo'…)
7.00     19 Dec 2006  Added named subroutine call synonym (?&foo)
7.00     19 Dec 2006  Added named back-reference synonyms \k<foo> and \k'foo'
7.00     19 Dec 2006  Added named conditional synonyms (?(<foo>)…), (?('foo')…) and (?(foo)…)
7.00     19 Dec 2006  Added pre-defined subroutines (?(DEFINE)…)
7.00     19 Dec 2006  Added conditional syntax to check if a subroutine or recursion level has been reached: (?(R2)…), (?(R&foo)…) and (?(R)…)
7.00     19 Dec 2006  Added \g2 (absolute) and \g{-2} (relative) back-reference syntax
6.70     4 Jul 2006   Added named groups in conditionals: (?(foo)…)
6.50     1 Feb 2006   Added support for Unicode script names via \p{Arabic}
5.00     13 Sep 2004  Added support for Unicode categories such as \p{L} and negated Unicode categories such as \P{Nd}
5.00     13 Sep 2004  Added \X Unicode grapheme token
4.00     17 Feb 2003  Added [:blank:] to match ASCII space character and tab
4.00     17 Feb 2003  Added \Q…\E escape sequence
4.00     17 Feb 2003  Added possessive quantifiers: ?+, *+, ++ and {…,…}+
4.00     17 Feb 2003  Added \C to match a single byte, even in UTF-8 mode
4.00     17 Feb 2003  Added the \G continuation anchor
4.00     17 Feb 2003  Added callouts (?C), (?C2) etc. which can be used in C but not PHP
4.00     17 Feb 2003  Added named groups (?P<foo>…), back-references (?P=foo) and subroutine calls (?P>foo)
3.00     1 Feb 2000   Added recursion (?R)
3.00     1 Feb 2000   Added POSIX classes such as [:alpha:]
1.08     27 Mar 1998  Added the inline modifier (?U) to turn on ungreedy mode
1.08     27 Mar 1998  Added the inline modifier (?X) to turn on extras mode
0.99     27 Oct 1997  Added atomic groups (?>…)
0.96     16 Oct 1997  Added DOTALL mode, including inline modifier (?s)
0.92     11 Sep 1997  Added multiline mode, including inline modifier (?m)
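
To make the 8.34 entry more concrete, here is a small sketch of the two expansions it mentions, \b(?=\w) and \b(?<=\w). It is only an illustration: it uses Python's re module, which is not PCRE and does not accept [[:<:]] or [[:>:]] themselves, and the variable names are my own. The idea is simply to see which positions the expanded forms assert.

```python
import re

# Expanded forms quoted in the 8.34 entry (names are illustrative only).
START_OF_WORD = r"\b(?=\w)"   # what PCRE turns [[:<:]] into
END_OF_WORD = r"\b(?<=\w)"    # what PCRE turns [[:>:]] into

text = "cat, dog!"

# Both patterns are zero-width, so we collect the positions where they succeed.
starts = [m.start() for m in re.finditer(START_OF_WORD, text)]
ends = [m.start() for m in re.finditer(END_OF_WORD, text)]

print(starts)  # [0, 5]: just before "cat" and "dog"
print(ends)    # [3, 8]: just after "cat" and "dog"
```

In an engine that supports the POSIX forms, [[:<:]]dog[[:>:]] should behave like \b(?=\w)dog\b(?<=\w).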




When PCRE precedes Perl

For the most part, PCRE tries to stay in step with Perl regex syntax, but the two engines' behaviors are not always identical. As is bound to happen in communities with many active users, an idea sometimes makes it into the PCRE engine before it is adopted by Perl. This kind of friendly exchange is a good thing for all regexers. Parochial "not invented here" postures wouldn't serve us: we just want the best regex engines.

Here are examples of features where PCRE preceded Perl:

✽ Recursion was first implemented in PCRE by a contributor and appeared in version 3.0 (February 2000). Perl introduced recursion in version 5.10 (officially released in December 2007), which explains why certain details function differently in the two engines.

✽ PCRE implemented Python's named group syntax (?P<foo>…) in version 4.0 (February 2003). Perl started supporting named groups in version 5.10 (officially released in December 2007).
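
Since the (?P<foo>…) syntax comes from Python, it can be tried directly in Python's re module. The sketch below is only an illustration of the syntax (the pattern and the sample string are made up), and it runs on Python's engine, not on PCRE. It uses a named group together with the matching (?P=foo) back-reference to find a doubled word.

```python
import re

# (?P<word>...) names the group; (?P=word) refers back to what it captured.
doubled_word = re.compile(r"(?P<word>\w+) (?P=word)")

m = doubled_word.search("it is is fine")

print(m.group(0))       # is is
print(m.group("word"))  # is
```

The same pattern should also compile in PCRE 4.0 and later, which accept both the named group and the named back-reference.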



Links to other PCRE-related Material on RexEgg

PCRE-related material is peppered throughout the site. Below, I try to maintain a list of the most significant "PCRE pockets" on the site.

✽ Reducing (?…) Syntax Confusion explains all the (?…) syntax. Other points of PCRE syntax can be found on the pages about anchors, boundaries, capture groups and others (see the "Black Belt Program" in the left-side menu at the top of the page).

✽ The page on flags and modifiers has a section about PCRE's Special Start-of-Pattern Modifiers.

✽ I've implemented an infinite lookbehind demo for PCRE.

✽ The pcregrep and pcretest page presents two PCRE-specific tools and includes the latest Windows binaries.

✽ My page on backtracking control verbs shows useful constructs such as (*SKIP)(*FAIL).

✽ The PHP regex page shows the PHP interface to the PCRE engine.

✽ The trick about matching line numbers shows an interesting example of self-referencing groups and of recursion.

✽ The trick about matching numbers in plain English shows a full-scale example of how (?(DEFINE)…) can be used to produce modular, maintainable patterns.




