Using Regular Expressions with Java


Java regex is an interesting beast. On the one hand, it has a number of "premium" features, such as:

✽ Character Class Intersection, Subtraction and Union
✽ Lookbehind that allows a variable width within a specified range
✽ Methods that return the starting and ending point of a match in a string.
✽ Support for \R to match any kind of line break, including CRLF pairs.
✽ Support for the \G anchor (which asserts that the current position is the beginning of the string or the position immediately following the previous match)
✽ Support for the \Q … \E (block escape)
✽ Possessive quantifiers.
✽ The (?d) modifier (also accessible via the UNIX_LINES option). When this is on, the line feed character \n is the only one that affects the dot . (which doesn't match line breaks unless DOTALL is on) and the anchors ^ and $ (which match line beginnings and endings in multiline mode.)
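A few of the features listed above can be sketched in a short program. This is only an illustration under my own assumptions: the class name FeatureDemo, the helper methods span, found and count, and the sample strings are all made up for the demo. The \R lines assume Java 8 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FeatureDemo {

    // Returns {start, end} of the first match, or null when nothing matches
    static int[] span(String regex, String subject) {
        Matcher m = Pattern.compile(regex).matcher(subject);
        return m.find() ? new int[] { m.start(), m.end() } : null;
    }

    // True when the regex matches somewhere in the subject
    static boolean found(String regex, String subject) {
        return Pattern.compile(regex).matcher(subject).find();
    }

    // Counts successive matches
    static int count(String regex, String subject) {
        Matcher m = Pattern.compile(regex).matcher(subject);
        int n = 0;
        while (m.find()) n++;
        return n;
    }

    public static void main(String[] args) {
        // start() and end(): "42" sits at indexes 3 and 4, so end() is 5
        int[] s = span("\\d+", "abc42def");
        System.out.println(s[0] + "-" + s[1]);                        // 3-5

        // \R treats a CRLF pair as one line break
        System.out.println("a\r\nb\nc".split("\\R").length);          // 3

        // \G: each match must start where the previous one ended,
        // so the digits after the 'a' are never reached
        System.out.println(count("\\G\\d", "123a45"));                // 3

        // \Q...\E: the dot is a literal dot inside the block
        System.out.println(found("\\Q1.5\\E", "1.5"));                // true
        System.out.println(found("\\Q1.5\\E", "1x5"));                // false

        // Possessive \d++ never gives back a digit, so the final \d fails
        System.out.println("12345".matches("\\d++\\d"));              // false
        System.out.println("12345".matches("\\d+\\d"));               // true

        // Lookbehind whose width varies within a stated range (0 to 3 spaces)
        System.out.println(found("(?<=id:\\s{0,3})\\d+", "id:  42")); // true

        // (?d): only \n counts as a line break, so the dot matches \r
        System.out.println(found("(?d)a.b", "a\rb"));                 // true
        System.out.println(found("a.b", "a\rb"));                     // false
    }
}
```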

On the other hand, Java regex has several unpleasant aspects, such as:

✽ Absence of other important premium features found in .NET, Perl or PCRE, such as \K, (*SKIP)(*F), subroutines and recursion.
✽ The absence of raw strings, forcing us to double escape backslashes in regex patterns
✽ A buggy lookbehind which has a number of undocumented effects.
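To make the double-escaping point concrete, here is a minimal sketch. The class name EscapeDemo, the two constants and the sample strings are my own inventions for illustration. Each backslash in the regex has to be typed twice in the Java string literal, and a regex that matches a single literal backslash ends up as four backslashes in the source.

```java
import java.util.regex.Pattern;

class EscapeDemo {

    // The regex \d+\.\d+ typed as a Java string: every backslash is doubled
    static final Pattern DECIMAL = Pattern.compile("\\d+\\.\\d+");

    // One literal backslash: the regex is \\ and the Java literal is "\\\\"
    static final Pattern BACKSLASH = Pattern.compile("\\\\");

    public static void main(String[] args) {
        System.out.println(DECIMAL.matcher("3.14").matches());        // true
        System.out.println(DECIMAL.matcher("3x14").matches());        // false
        System.out.println(BACKSLASH.matcher("C:\\temp").find());     // true

        // Pattern.quote builds a literal-text pattern with no hand escaping
        System.out.println("a.b.c".split(Pattern.quote(".")).length); // 3
    }
}
```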


About this Page

At the moment, I am not planning a fully fleshed-out guided tour of Java regex, although I certainly intend to add plenty of tasty material to this page over time. My pages are always in motion.

In the meantime, I don't want to leave you Java coders high and dry, so I have something special to get you started.


A Java program that shows how to perform common regex tasks

Whenever I start playing with the regex features of a new language, the thing I always miss the most is a complete working program that performs the most common regex tasks—and some not-so-common ones as well.

This is what I have for you in the following complete Java regex program. It's taken from my page about the best regex trick ever, and it performs the six most common regex tasks. The first four tasks answer the most common questions we use regex for:

✽ Does the string match?
✽ How many matches are there?
✽ What is the first match?
✽ What are all the matches?

The last two tasks cover two other common operations:

✽ Replace all matches
✽ Split the string

If you study this code, you'll have a terrific starting point for tweaking and testing your own expressions in Java. Bear in mind that the code inspects values captured in Group 1, so you'll have to tweak… but you'll have a solid base to understand how to do basic things, and fairly advanced ones as well.

As you can imagine, I am not fluent in all of the ten or so languages showcased on the site. This means that although the sample code works, a Java pro might look at the code and see a more idiomatic way of testing an empty value or iterating a structure. If some idiomatic improvements jump out at you, please shoot me a comment.

Please note that usually you will choose to perform only one of the six tasks in the code, so your own code will be much shorter.


import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Program {
    public static void main(String[] args) {

        String subject = "Jane\" \"Tarzan12\" Tarzan11@Tarzan22 {4 Tarzan34}";
        Pattern regex = Pattern.compile("\\{[^}]+\\}|\"Tarzan\\d+\"|(Tarzan\\d+)");
        Matcher regexMatcher = regex.matcher(subject);
        List<String> group1Caps = new ArrayList<>();

        // put Group 1 captures in a list
        while (regexMatcher.find()) {
            if (regexMatcher.group(1) != null) {
                group1Caps.add(regexMatcher.group(1));
            }
        } // end of building the list

        ///////// The six main tasks we're likely to have ////////

        // Task 1: Is there a match?
        System.out.println("*** Is there a Match? ***");
        if (!group1Caps.isEmpty()) System.out.println("Yes");
        else System.out.println("No");

        // Task 2: How many matches are there?
        System.out.println("\n" + "*** Number of Matches ***");
        System.out.println(group1Caps.size());

        // Task 3: What is the first match?
        System.out.println("\n" + "*** First Match ***");
        if (!group1Caps.isEmpty()) System.out.println(group1Caps.get(0));

        // Task 4: What are all the matches?
        // (the loop simply prints nothing when the list is empty)
        System.out.println("\n" + "*** Matches ***");
        for (String match : group1Caps) System.out.println(match);

        // Task 5: Replace the matches
        // if only replacing, delete the line with the first matcher
        // also delete the section that creates the list of captures
        Matcher m = regex.matcher(subject);
        StringBuffer b = new StringBuffer();
        while (m.find()) {
            if (m.group(1) != null) m.appendReplacement(b, "Superman");
            // quoteReplacement keeps any $ or \ in the matched text literal
            else m.appendReplacement(b, Matcher.quoteReplacement(m.group(0)));
        }
        m.appendTail(b);
        String replaced = b.toString();
        System.out.println("\n" + "*** Replacements ***");
        System.out.println(replaced);

        // Task 6: Split
        // Start by replacing by something distinctive,
        // as in Task 5. Then split.
        String[] splits = replaced.split("Superman");
        System.out.println("\n" + "*** Splits ***");
        for (String split : splits) System.out.println(split);

    } // end main
} // end Program
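As the comments in Task 5 suggest, a program that only replaces can be much shorter. Here is one way it might look, as a sketch rather than the definitive version: the class name ReplaceOnly and the method replaceTarzan are hypothetical, and wrapping the untouched matches in Matcher.quoteReplacement is my own precaution so that a $ or backslash in the matched text is copied literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceOnly {

    // Same pattern as the full program: only Group 1 captures get replaced
    static final Pattern REGEX =
        Pattern.compile("\\{[^}]+\\}|\"Tarzan\\d+\"|(Tarzan\\d+)");

    static String replaceTarzan(String subject) {
        Matcher m = REGEX.matcher(subject);
        StringBuffer b = new StringBuffer();
        while (m.find()) {
            // Group 1 set: a Tarzan we want. Otherwise copy the match back unchanged.
            String replacement = (m.group(1) != null)
                ? "Superman"
                : Matcher.quoteReplacement(m.group(0));
            m.appendReplacement(b, replacement);
        }
        m.appendTail(b);
        return b.toString();
    }

    public static void main(String[] args) {
        String subject = "Jane\" \"Tarzan12\" Tarzan11@Tarzan22 {4 Tarzan34}";
        // prints: Jane" "Tarzan12" Superman@Superman {4 Tarzan34}
        System.out.println(replaceTarzan(subject));
    }
}
```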



Character Class Intersection, Subtraction and Union

The syntax […&&[…]] allows you to use a logical AND on several character classes to ensure that a character is present in all of them. Intersecting with a negated character class, as in […&&[^…]], allows you to subtract that class from the original class. For details, see character class intersection and character class subtraction in Java and Ruby on the page about character class operations.
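Here is a small sketch of both forms. The class name IntersectionDemo, the constants and the sample words are made up for the demo.

```java
import java.util.regex.Pattern;

class IntersectionDemo {

    // Intersection: letters that are in a-m AND in h-z, that is, h through m
    static final Pattern H_TO_M = Pattern.compile("[a-m&&[h-z]]");

    // Subtraction: lowercase letters that are NOT vowels
    static final Pattern CONSONANT = Pattern.compile("[a-z&&[^aeiou]]");

    public static void main(String[] args) {
        // h, l and l fall in h-m; e is below the range and o is above it
        System.out.println(H_TO_M.matcher("hello").replaceAll("*"));    // *e**o

        // r, g and x are consonants; the two e's survive
        System.out.println(CONSONANT.matcher("regex").replaceAll("-")); // -e-e-
    }
}
```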

Similarly, the syntax […[…]] allows you to use a logical OR on several character classes to ensure that a character is present in at least one of them. For details, see character class union in Java and Ruby on the page about character class operations.
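A union can be sketched the same way. Again, the class name UnionDemo, the constants and the test strings are only illustrative.

```java
import java.util.regex.Pattern;

class UnionDemo {

    // Union: a character that is in a-c OR in x-z
    static final Pattern EDGES = Pattern.compile("[a-c[x-z]]");

    // Union of digits and a-f: one or more lowercase hex digits
    static final Pattern HEX = Pattern.compile("[0-9[a-f]]+");

    public static void main(String[] args) {
        // a, b, c, x, y and z are removed; '-' and 'm' are in neither class
        System.out.println(EDGES.matcher("abcxyz-m").replaceAll("")); // -m

        System.out.println(HEX.matcher("c0ffee").matches());          // true
        System.out.println(HEX.matcher("coffee").matches());          // false ('o' is not a hex digit)
    }
}
```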




Smiles,

Rex


