Regex to Match Numbers in Plain English


This page presents a regular expression to match numbers in plain English, such as:

one trillion
seven hundred twenty two
zero point nine five
nine hundred ninety nine thousand two hundred thirteen

This pattern was an excuse to build a beautiful regex using the defined subroutine syntax (?(DEFINE) … ). As such, it only works in engines that support that syntax: currently Perl, PCRE (PHP, Delphi, R…) and Python's alternate regex engine (the third-party regex module, not the built-in re module).

The regex is built in a modular way, like Lego. We first define some named subroutines (one_to_9, ten_to_19), then more named subroutines that build on the earlier ones: one_to_99, one_to_999, and so on up to the trillions.
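For engines without (?(DEFINE) … ), the same Lego idea can be approximated by gluing pattern strings together. The sketch below is only an illustration, assuming Python's built-in re module; the constant names are hypothetical stand-ins for the subroutines, and the leaf alternations are the plain, unoptimized versions.

```python
import re

# Leaf fragments: plain alternations, kept unoptimized for readability.
ONE_TO_9 = r"(?:one|two|three|four|five|six|seven|eight|nine)"
TEN_TO_19 = (r"(?:ten|eleven|twelve|thirteen|fourteen|fifteen|"
             r"sixteen|seventeen|eighteen|nineteen)")
TWO_DIGIT_PREFIX = r"(?:twenty|thirty|forty|fifty|sixty|seventy|eighty|ninety)"

# A larger fragment assembled from the leaves, mirroring one_to_99.
# Alternation order matters: trying the teens before the single digits
# keeps "seventeen" from being matched as just "seven".
ONE_TO_99 = (rf"(?:{TWO_DIGIT_PREFIX}(?:[- ]{ONE_TO_9})?"
             rf"|{TEN_TO_19}|{ONE_TO_9})")

one_to_99 = re.compile(ONE_TO_99)

for text in ("forty-two", "seventeen", "nine", "one hundred"):
    print(text, "->", bool(one_to_99.fullmatch(text)))
```

String interpolation copies each fragment into every place it is used, so the final pattern grows quickly. Defined subroutines avoid that duplication, which is part of their appeal.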

At the bottom, after defining all the groups, is where the real matching begins. There you can decide to match a big number by calling the bignumber subroutine with (?&bignumber), or to match smaller numbers using subroutines such as (?&one_to_99), and so on.

There are plenty of comments in the regex, so if you read the defined subroutine page, you shouldn't need further explanations.

(?x)           # free-spacing mode
(?(DEFINE)
  # Within this DEFINE block, we'll define many subroutines
  # They build on each other like lego until we can define
  # a "big number"

  (?<one_to_9>  
  # The basic regex:
  # one|two|three|four|five|six|seven|eight|nine
  # We'll use an optimized version:
  # Option 1: four|eight|(?:fiv|(?:ni|o)n)e|t(?:wo|hree)|
  #                                          s(?:ix|even)
  # Option 2:
  (?:f(?:ive|our)|s(?:even|ix)|t(?:hree|wo)|(?:ni|o)ne|eight)
  ) # end one_to_9 definition

  (?<ten_to_19>  
  # The basic regex:
  # ten|eleven|twelve|thirteen|fourteen|fifteen|sixteen|seventeen|
  #                                              eighteen|nineteen
  # We'll use an optimized version:
  # Option 1: twelve|(?:(?:elev|t)e|(?:fif|eigh|nine|(?:thi|fou)r|
  #                                             s(?:ix|even))tee)n
  # Option 2:
  (?:(?:(?:s(?:even|ix)|f(?:our|if)|nine)te|e(?:ighte|lev))en|
                                          t(?:(?:hirte)?en|welve)) 
  ) # end ten_to_19 definition

  (?<two_digit_prefix>
  # The basic regex:
  # twenty|thirty|forty|fifty|sixty|seventy|eighty|ninety
  # We'll use an optimized version:
  # Option 1: (?:fif|six|eigh|nine|(?:tw|sev)en|(?:thi|fo)r)ty
  # Option 2:
  (?:s(?:even|ix)|t(?:hir|wen)|f(?:if|or)|eigh|nine)ty
  ) # end two_digit_prefix definition

  (?<one_to_99>
  (?&two_digit_prefix)(?:[- ](?&one_to_9))?|(?&ten_to_19)|
                                              (?&one_to_9)
  ) # end one_to_99 definition

  (?<one_to_999>
  (?&one_to_9)[ ]hundred(?:[ ](?:and[ ])?(?&one_to_99))?|
                                            (?&one_to_99)
  ) # end one_to_999 definition

  (?<one_to_999_999>
  (?&one_to_999)[ ]thousand(?:[ ](?&one_to_999))?|
                                    (?&one_to_999)
  ) # end one_to_999_999 definition

  (?<one_to_999_999_999>
  (?&one_to_999)[ ]million(?:[ ](?&one_to_999_999))?|
                                   (?&one_to_999_999)
  ) # end one_to_999_999_999 definition

  (?<one_to_999_999_999_999>
  (?&one_to_999)[ ]billion(?:[ ](?&one_to_999_999_999))?|
                                   (?&one_to_999_999_999)
  ) # end one_to_999_999_999_999 definition

  (?<one_to_999_999_999_999_999>
  (?&one_to_999)[ ]trillion(?:[ ](?&one_to_999_999_999_999))?|
                                    (?&one_to_999_999_999_999)
  ) # end one_to_999_999_999_999_999 definition

  (?<bignumber>
  zero|(?&one_to_999_999_999_999_999)
  ) # end bignumber definition

  (?<zero_to_9>
  (?&one_to_9)|zero
  ) # end zero_to_9 definition

  (?<decimals>
  point(?:[ ](?&zero_to_9))+
  ) # end decimals definition
  
) # End DEFINE


####### The Regex Matching Starts Here ########
(?&bignumber)(?:[ ](?&decimals))?

### Other examples of groups we could match ###
# (?&bignumber)
# (?&one_to_99)
# (?&one_to_999)
# (?&one_to_999_999)
# (?&one_to_999_999_999)
# (?&one_to_999_999_999_999)
# (?&one_to_999_999_999_999_999)
    

You can play with the regex and sample text in this live regex demo.
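To experiment with the choice of entry point outside a DEFINE-capable engine, the whole ladder can be rebuilt by string composition. This is a rough sketch under assumptions, not the pattern above: it uses Python's built-in re module, plain alternations instead of the optimized ones, and hypothetical names (fragments, entry, number).

```python
import re

one_to_9 = r"(?:one|two|three|four|five|six|seven|eight|nine)"
ten_to_19 = (r"(?:ten|eleven|twelve|thirteen|fourteen|fifteen|"
             r"sixteen|seventeen|eighteen|nineteen)")
two_digit_prefix = r"(?:twenty|thirty|forty|fifty|sixty|seventy|eighty|ninety)"
one_to_99 = (rf"(?:{two_digit_prefix}(?:[- ]{one_to_9})?"
             rf"|{ten_to_19}|{one_to_9})")
one_to_999 = (rf"(?:{one_to_9}[ ]hundred(?:[ ](?:and[ ])?{one_to_99})?"
              rf"|{one_to_99})")

# Each scale word wraps the level below it, as in the DEFINE block.
fragments = {"one_to_99": one_to_99, "one_to_999": one_to_999}
below = one_to_999
for scale in ("thousand", "million", "billion", "trillion"):
    below = rf"(?:{one_to_999}[ ]{scale}(?:[ ]{below})?|{below})"
    fragments[f"up_to_{scale}s"] = below

fragments["bignumber"] = rf"(?:zero|{below})"
decimals = rf"point(?:[ ](?:{one_to_9}|zero))+"

def entry(name):
    """Compile one named fragment, like calling (?&name) at the bottom."""
    return re.compile(fragments[name])

# Default entry point: a big number with optional decimals.
number = re.compile(rf"{fragments['bignumber']}(?:[ ]{decimals})?")

samples = [
    "one trillion",
    "seven hundred twenty two",
    "zero point nine five",
    "nine hundred ninety nine thousand two hundred thirteen",
]
for s in samples:
    print(s, "->", bool(number.fullmatch(s)))

# A smaller entry point rejects numbers beyond its range.
print(bool(entry("one_to_99").fullmatch("seven hundred twenty two")))
```

Using fullmatch sidesteps the question of word boundaries. When searching inside longer text, neither this sketch nor the pattern above stops "one" from matching inside a word such as "someone", so wrapping the entry point in \b may be worth considering.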





