What does this regex mean?
This page presents explanations of various regexes found throughout the site.
The Meaning of Life
Here is an explanation for the "Meaning of Life" regex found on the Regex Humor page. In the regex demo, note that the match is "42", while the first digit "4" is captured to Group 1.
The regex uses subroutines, with a syntax supported by Perl and PCRE (C, PHP, R, …).
Ruby also supports subroutines (with a slightly different syntax), but it doesn't support lookahead within lookbehind, so in a Ruby version the regex would have to be simplified.
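Other engines call for similar adjustments. As a rough sketch only, here is how the regex might be ported to Python's standard `re` module, which has no subroutine calls: the `(?-1)` call is replaced by a literal repeat of the character class it refers to. The names `PATTERN` and `matching_two_digit_strings` are hypothetical, chosen for this illustration. The sketch tries every string from "00" to "99" to see which ones the pattern accepts.

```python
import re

# Hypothetical Python port (an assumption, not the original regex):
# the (?-1) subroutine call is written out as a second copy of the
# character class, because the stdlib re module has no subroutines.
PATTERN = re.compile(
    r"^(?=(?!(.)\1)([^\DO:105-93+30])[^\DO:105-93+30]"
    r"(?<!\d(?<=(?![5-90-3])\d)))"
    r".[^\WHY?]$"
)


def matching_two_digit_strings():
    """Return every string from "00" to "99" that the pattern matches."""
    candidates = (f"{n:02d}" for n in range(100))
    return [s for s in candidates if PATTERN.search(s)]


if __name__ == "__main__":
    print(matching_two_digit_strings())  # ['42']
```

Only the two-digit inputs are checked here, so this is a sanity test rather than a proof that nothing else can match.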
And now… The explanation.
(?x)                   # Free-spacing mode
^                      # Beginning of line
(?=                    # Start lookahead. This actually checks that what
                       # follows is 42, so if you take that out of the
                       # parentheses and delete the rest, you have enough.
  (?!(.)\1)            # Negative lookahead: what follows is not the same
                       # two characters. Avoids 22 and 44.
  (                    # Start Group 1. This matches a 2 or a 4.
                       # The next line is a negated character class.
    [^\DO:105-93+30]   # The key to the class is [^\D, which means
                       # not a non-digit. Therefore, it matches a digit.
                       # The digits 0,1,3, and 5-through-9 are removed.
                       # Therefore, the only allowable chars are 2 and 4.
                       # The additional characters O,:,+ and the extra 0
                       # are also removed, without influence on the class.
  )                    # End Group 1
  (?-1)                # Match the previous subroutine,
                       # i.e. another 2 or 4.
  (?<!                 # Start negative lookbehind. The purpose is to check
                       # that the preceding character is not a 4.
                       # Since we match two characters that are either
                       # 2 or 4, but not identical and not ending in 4,
                       # it has to be 42.
                       # Assert that what precedes is not...
    \d                 # A digit
    (?<=               # Preceded by...
                       # What follows is a convoluted way of saying 4.
      (?!              # Negative lookahead:
                       # After the negative lookahead, we will match
                       # one digit. The negative lookahead excludes
                       # all digits that are not four, therefore
                       # the digit \d will have to be four.
        [5-90-3]       # Any digit in the ranges 0-through-3 and
                       # 5-through-9.
      )                # End negative lookahead
      \d               # One digit
    )                  # End lookbehind
  )                    # End negative lookbehind
)                      # End lookahead: we now know that what follows is 42
# At this stage, we just need to match any two characters, and
# we know they will be 42. We'll do it in a roundabout way.
.                      # Match one character
                       # The next line is a negated character class.
[^\WHY?]               # The key to the character class is [^\W which means
                       # not a non-word character, therefore it must be a
                       # word character. This includes digits, which suits
                       # us fine. The additional characters H,Y,? are
                       # also removed, but we don't care.
$                      # Assert that we have reached the end of the string
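The claims about the individual character classes are easy to check in isolation. The following is a minimal sketch, assuming Python's standard `re` module (whose class syntax agrees with PCRE for these three fragments). The helper name `accepted_digits` is hypothetical. It lists which of the ten digits each fragment accepts.

```python
import re

DIGITS = "0123456789"


def accepted_digits(pattern):
    """Return the digits that the single-character pattern matches in full."""
    return [d for d in DIGITS if re.fullmatch(pattern, d)]


if __name__ == "__main__":
    # Class inside the capture group: only 2 and 4 survive.
    print(accepted_digits(r"[^\DO:105-93+30]"))  # ['2', '4']
    # The "convoluted way of saying 4".
    print(accepted_digits(r"(?![5-90-3])\d"))    # ['4']
    # Final class: every digit is a word character, so all ten pass.
    print(accepted_digits(r"[^\WHY?]"))
```

The last class accepts any digit, which is why the lookahead has to do all the real work of pinning the match to 42.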