Regex to Match Numbers in Plain English
This page presents a regular expression to match numbers in plain English, such as:
one trillion seven hundred twenty two zero point nine five nine hundred ninety nine thousand two hundred thirteen
This pattern was an excuse to build a beautiful regex using the defined subroutine syntax (?(DEFINE) … ). As such, it only works in engine that support that syntax—currently Perl, PCRE (PHP, Delphi, R…) and Python's alternate regex engine.
The regex built in a modular way—like lego. We define some named subroutines—one_to_9, ten_to_19—then more named subroutines that build on the earlier ones: one_to_99, one_to_999…
At the bottom, after defining all the groups, is where the real matching begins. There you can decide to match a big number by calling the big_number subroutine with (?&bignumber), or to match smaller numbers using subroutines such as (?&one_to_99), and so on.
There are plenty of comments in the regex, so if you read the defined subroutine page, you shouldn't need further explanations.
(direct link)
(?x) # free-spacing mode (?(DEFINE) # Within this DEFINE block, we'll define many subroutines # They build on each other like lego until we can define # a "big number" (?<one_to_9> # The basic regex: # one|two|three|four|five|six|seven|eight|nine # We'll use an optimized version: # Option 1: four|eight|(?:fiv|(?:ni|o)n)e|t(?:wo|hree)| # s(?:ix|even) # Option 2: (?:f(?:ive|our)|s(?:even|ix)|t(?:hree|wo)|(?:ni|o)ne|eight) ) # end one_to_9 definition (?<ten_to_19> # The basic regex: # ten|eleven|twelve|thirteen|fourteen|fifteen|sixteen|seventeen| # eighteen|nineteen # We'll use an optimized version: # Option 1: twelve|(?:(?:elev|t)e|(?:fif|eigh|nine|(?:thi|fou)r| # s(?:ix|even))tee)n # Option 2: (?:(?:(?:s(?:even|ix)|f(?:our|if)|nine)te|e(?:ighte|lev))en| t(?:(?:hirte)?en|welve)) ) # end ten_to_19 definition (?<two_digit_prefix> # The basic regex: # twenty|thirty|forty|fifty|sixty|seventy|eighty|ninety # We'll use an optimized version: # Option 1: (?:fif|six|eigh|nine|(?:tw|sev)en|(?:thi|fo)r)ty # Option 2: (?:s(?:even|ix)|t(?:hir|wen)|f(?:if|or)|eigh|nine)ty ) # end two_digit_prefix definition (?<one_to_99> (?&two_digit_prefix)(?:[- ](?&one_to_9))?|(?&ten_to_19)| (?&one_to_9) ) # end one_to_99 definition (?<one_to_999> (?&one_to_9)[ ]hundred(?:[ ](?:and[ ])?(?&one_to_99))?| (?&one_to_99) ) # end one_to_999 definition (?<one_to_999_999> (?&one_to_999)[ ]thousand(?:[ ](?&one_to_999))?| (?&one_to_999) ) # end one_to_999_999 definition (?<one_to_999_999_999> (?&one_to_999)[ ]million(?:[ ](?&one_to_999_999))?| (?&one_to_999_999) ) # end one_to_999_999_999 definition (?<one_to_999_999_999_999> (?&one_to_999)[ ]billion(?:[ ](?&one_to_999_999_999))?| (?&one_to_999_999_999) ) # end one_to_999_999_999_999 definition (?<one_to_999_999_999_999_999> (?&one_to_999)[ ]trillion(?:[ ](?&one_to_999_999_999_999))?| (?&one_to_999_999_999_999) ) # end one_to_999_999_999_999_999 definition (?<bignumber> zero|(?&one_to_999_999_999_999_999) ) # end bignumber definition (?<zero_to_9> (?&one_to_9)|zero ) # end zero to 9 definition (?<decimals> point(?:[ ](?&zero_to_9))+ ) # end decimals definition ) # End DEFINE ####### The Regex Matching Starts Here ######## (?&bignumber)(?:[ ](?&decimals))? ### Other examples of groups we could match ### #(?&bignumber) # (?&one_to_99) # (?&one_to_999) # (?&one_to_999_999) # (?&one_to_999_999_999) # (?&one_to_999_999_999_999) # (?&one_to_999_999_999_999_999)
You can play with the regex and sample text in this live regex demo.