Using Regular Expressions with JavaScript


This page focuses on regular expressions in JavaScript. Before we start, I feel that a word is in order about what makes JavaScript regex special.


JavaScript has the Worst Regex Engine among the Majors

I know you don't want to hear bad news this early on the page, but you should know that among all the major regex engines, JavaScript is the worst, and by a long margin. Python's default re engine—which occupies the second-worst position—is far less crippled than JavaScript, and its shortcomings don't really matter to serious regexers, who will be using the alternate regex module anyway.

Since JavaScript is implemented differently in each browser, "JavaScript regex" is not one single engine. If in doubt about a feature, you'll want to test that your regex works with the Chrome implementation, which may perhaps be called the "most standard".
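One way to test a feature on a given engine is to try compiling a tiny pattern that uses it and see whether the constructor throws. Here is a minimal sketch that relies only on the standard RegExp constructor. The helper name supportsPattern and the probe patterns are made up for illustration. Engines differ, and newer ones (ES2018 and later) accept constructs that older ones reject, so a probe like this is safer than assuming either way.

```javascript
// Hypothetical helper: returns true if this engine can compile the pattern.
function supportsPattern(source, flags) {
    try {
        new RegExp(source, flags);
        return true;
    } catch (e) {
        // A SyntaxError here means the engine rejects the construct or the flag
        return false;
    }
}

// Illustrative probes; the results depend on the engine running the code
var hasLookbehind = supportsPattern("(?<=a)b", "");
var hasNamedGroups = supportsPattern("(?<word>a)", "");
var hasDotAllFlag = supportsPattern(".", "s");
```

You could run the probes once at startup and pick a fallback pattern when one of them comes back false.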

But the main issue that makes JavaScript regex so obnoxious is its lack of features. For instance, all major regex flavors support these features—except JavaScript:

✽ Dot-matches-line-breaks mode (a.k.a. DOTALL or single-line mode)
✽ Lookbehind
✽ Inline modifiers such as (?i)
✽ Named capture groups
✽ Free-spacing mode
✽ \A and \Z anchors
✽ Ability of $ to match before any line breaks at the end of the string.

Unicode properties, atomic groups and \G are also absent. This "distinction" is shared with Python.

Needless to say, other advanced features that regex heads frequently use (such as subroutines, named subroutines, recursion, conditionals, and so on) are nowhere in sight.
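Two of the gaps listed above have simple stand-ins built from syntax that any JavaScript engine accepts. The sketch below is only an illustration with made-up sample strings: [\s\S] plays the role of a dot that matches line breaks, and a lookahead lets the end anchor tolerate one final line break.

```javascript
var text = "first line\nsecond line";

// The dot stops at line breaks, so this cannot span the two lines
var dotResult = /first.*second/.test(text);            // false

// [\s\S] matches any character, line breaks included
var anyCharResult = /first[\s\S]*second/.test(text);   // true

// Without the m flag, $ only matches at the very end of the string
var withBreak = "second line\n";
var strictEnd = /second line$/.test(withBreak);        // false

// (?=\n?$) accepts the end of the string or one line break before it
var lenientEnd = /second line(?=\n?$)/.test(withBreak); // true
```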

In short, JavaScript regex is a horrible little engine. The lack of lookbehind means that you'll need to work a lot more with capture groups. On the other hand, scarcity can be the mother of invention, so the lack of features will sometimes inspire you to find alternate ways to reach your goals. One such example is the well-known hack to mimic an atomic group.
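To make those two ideas concrete, here is a small sketch with made-up sample strings. The first part reads a capture group where other flavors would use a lookbehind such as (?<=\$)\d+. The second part shows the usual way to mimic an atomic group such as (?>a+): capture inside a lookahead, then consume the capture with a backreference. The engine does not backtrack into a lookahead once it has succeeded, so the captured text is locked in.

```javascript
// 1. Capture group in place of lookbehind: match the "$" too, but read only Group 1
var priceMatch = /\$(\d+)/.exec("cost: $45, qty: 12");
var amount = priceMatch ? priceMatch[1] : null;   // "45"

// 2. Atomic group mimic: (?=(a+))\1 behaves like (?>a+)
// Plain a+ can give back one "a" so that "ab" still matches
var plainResult = /a+ab/.test("aaab");            // true

// The mimic keeps every "a" it grabbed, so "ab" can never follow
var atomicResult = /(?=(a+))\1ab/.test("aaab");   // false
```

Note that the lookahead adds a capture group, which shifts the numbering of any groups that come after it.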



Better JavaScript regex: the XRegExp library

If you are stuck working in JavaScript and really cannot stand the default engine, consider using XRegExp, an alternate library written by Steven Levithan, a co-author of the Regular Expressions Cookbook.

Here are some features found in the XRegExp library but not in standard JavaScript implementations:
✽ Dot-matches-line-breaks mode, either inline with (?s) or with the "s" option
✽ Inline modifiers such as (?ism)
✽ Free-spacing mode, either inline with (?x) or with the "x" option
✽ Named capture with (?<foo>…), backreference \k<foo> and replacement insertion ${foo}
✽ Unicode properties such as \p{L}

Amazingly, XRegExp does not support lookbehind. Steven Levithan has provided a code workaround. Apart from that, you're back to using capture groups.


Even Better JavaScript regex: PCRE port

You can also port PCRE to JavaScript using Emscripten, as Firas seems to have done on regex 101. But getting it to work just how you like it will be a lot of work.

About this Page

At the moment, I am not planning a fully fleshed-out guided tour of JavaScript regex, although I certainly intend to add plenty of tasty material to this page over time. My pages are always in motion.

In the meantime, I don't want to leave you JavaScript coders high and dry, so I have something special to get you started.


A JavaScript program that shows how to perform common regex tasks

Whenever I start playing with the regex features of a new language, the thing I always miss the most is a complete working program that performs the most common regex tasks—and some not-so-common ones as well.

This is what I have for you in the following complete JavaScript regex program. It's taken from my page about the best regex trick ever, and it performs the six most common regex tasks. The first four tasks answer the most common questions we use regex for:

✽ Does the string match?
✽ How many matches are there?
✽ What is the first match?
✽ What are all the matches?

The last two tasks perform two other common regex tasks:

✽ Replace all matches
✽ Split the string

If you study this code, you'll have a terrific starting point for tweaking and testing your own expressions in JavaScript. Bear in mind that the code inspects values captured in Group 1, so you'll have to tweak… but you'll have a solid base to understand how to do basic things, and fairly advanced ones as well.

As you can imagine, I am not fluent in all of the ten or so languages showcased on the site. This means that although the sample code works, a JavaScript pro might look at the code and see a more idiomatic way of testing an empty value or iterating a structure. If some idiomatic improvements jump out at you, please shoot me a comment.

Please note that usually you will choose to perform only one of the six tasks in the code, so your own code will be much shorter.
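To give an idea of how short a single task can be, here is a sketch that only counts the matches that set Group 1. It reuses the subject and regex from the full program. The function name countGroup1Matches is hypothetical, and the function assumes the regex carries the g flag (without it, exec would never advance and the loop would not end).

```javascript
// Hypothetical helper: counts the matches in which Group 1 took part.
// Assumes regex was created with the g flag.
function countGroup1Matches(regex, subject) {
    var count = 0;
    var match;
    regex.lastIndex = 0; // always start from the beginning of the subject
    while ((match = regex.exec(subject)) !== null) {
        if (match[1] != null) count++;
        // Step past zero-length matches so the loop cannot stall
        if (match[0] === "") regex.lastIndex++;
    }
    return count;
}

var subject = 'Jane" "Tarzan12" Tarzan11@Tarzan22 {4 Tarzan34} ';
var regex = /{[^}]+}|"Tarzan\d+"|(Tarzan\d+)/g;
var total = countGroup1Matches(regex, subject); // 2 (Tarzan11 and Tarzan22)
```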


<script>
var subject = 'Jane" "Tarzan12" Tarzan11@Tarzan22 {4 Tarzan34} ';
var regex = /{[^}]+}|"Tarzan\d+"|(Tarzan\d+)/g;
var group1Caps = [];
var match = regex.exec(subject);
// document.write(match.toString());

// put Group 1 captures in an array
while (match != null) {
    if (match[1] != null) group1Caps.push(match[1]);
    match = regex.exec(subject);
}

///////// The six main tasks we're likely to have ////////

// Task 1: Is there a match?
document.write("*** Is there a Match? ***<br>");
if(group1Caps.length > 0) document.write("Yes<br>");
else document.write("No<br>");

// Task 2: How many matches are there?
document.write("<br>*** Number of Matches ***<br>");
document.write(group1Caps.length);

// Task 3: What is the first match?
document.write("<br><br>*** First Match ***<br>");
if(group1Caps.length > 0) document.write(group1Caps[0],"<br>");

// Task 4: What are all the matches?
document.write("<br>*** Matches ***<br>");
if (group1Caps.length > 0) {
    // Index loop: avoids an undeclared global and for...in on an array
    for (var i = 0; i < group1Caps.length; i++) document.write(group1Caps[i],"<br>");
}

// Task 5: Replace the matches
// see callback parameters http://tinyurl.com/ocddsuk
var replaced = subject.replace(regex, function(m, group1) {
    // Group 1 is undefined when it took no part in the match
    if (group1 === undefined || group1 === "") return m;
    else return "Superman";
});
document.write("<br>*** Replacements ***<br>");
document.write(replaced);

// Task 6: Split
// Start by replacing with something distinctive,
// as in Task 5. Then split.
var splits = replaced.split("Superman");
document.write("<br><br>*** Splits ***<br>");
for (var i = 0; i < splits.length; i++) document.write(splits[i],"<br>");
</script>
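Task 6 splits by first swapping in a placeholder word. If the subject could already contain that word, another option is to cut the string at the positions of the Group 1 matches. This is a sketch of that alternative, not the method used in the program above. The helper name splitOnGroup1 is hypothetical, and it assumes the regex carries the g flag.

```javascript
// Hypothetical helper: splits subject wherever a match sets Group 1.
// Assumes regex was created with the g flag.
function splitOnGroup1(regex, subject) {
    var pieces = [];
    var start = 0;
    var match;
    regex.lastIndex = 0; // always start from the beginning of the subject
    while ((match = regex.exec(subject)) !== null) {
        if (match[1] != null) {
            // Keep the text between the previous cut and this match
            pieces.push(subject.slice(start, match.index));
            start = match.index + match[0].length;
        }
        // Step past zero-length matches so the loop cannot stall
        if (match[0] === "") regex.lastIndex++;
    }
    pieces.push(subject.slice(start)); // whatever follows the last cut
    return pieces;
}

var subject = 'Jane" "Tarzan12" Tarzan11@Tarzan22 {4 Tarzan34} ';
var regex = /{[^}]+}|"Tarzan\d+"|(Tarzan\d+)/g;
var pieces = splitOnGroup1(regex, subject);
// pieces: ['Jane" "Tarzan12" ', '@', ' {4 Tarzan34} ']
```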

Read the explanation or jump to the article's Table of Contents





Smiles,

Rex


