A regular expression is used to determine whether a string matches a pattern and, if it does, to extract or transform the parts that match.
Usage
This class delegates to the java.util.regex package of the Java Platform. See the documentation for java.util.regex.Pattern for details about the regular expression syntax for pattern strings.
An instance of Regex represents a compiled regular expression pattern. Since compilation is expensive, frequently used Regexes should be constructed once, outside of loops and perhaps in a companion object.
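As a minimal sketch of the "construct once" advice, the following keeps a single compiled pattern in a companion object so that every instance shares it. The class name `LogLine` and its method are hypothetical, chosen only for illustration; the pattern is built with the `r` method on strings.

```scala
import scala.util.matching.Regex

// Hypothetical class, used only to illustrate where the Regex lives.
final class LogLine(val text: String) {
  // Reuses the Regex held by the companion; nothing is compiled per call.
  def firstDate: Option[String] = LogLine.Date.findFirstIn(text)
}

object LogLine {
  // Compiled once, when the companion object is first initialized.
  private val Date: Regex = raw"(\d{4})-(\d{2})-(\d{2})".r
}
```

Defining the pattern inside `firstDate` instead would recompile it on every call, which is the cost the paragraph above warns about.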
The canonical way to create a Regex is by using the method r, provided implicitly for strings:
val date = raw"(\d{4})-(\d{2})-(\d{2})".r
Since escapes are not processed in multi-line string literals, using triple quotes avoids having to escape the backslash character, so that "\\d" can be written """\d""". The same result is achieved with certain interpolators, such as raw"\d".r or a custom interpolator r"\d" that also compiles the Regex.
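The equivalence of these spellings can be checked directly. The sketch below also shows one possible way to write the custom `r` interpolator mentioned above; the names `EscapingDemo` and `RegexInterpolator` are assumptions made for this example and are not part of the standard library.

```scala
import scala.util.matching.Regex

object EscapingDemo {
  // Hypothetical interpolator: r"..." yields a compiled Regex with no escape processing.
  implicit class RegexInterpolator(private val sc: StringContext) extends AnyVal {
    def r(args: Any*): Regex = sc.raw(args: _*).r
  }

  val escaped: Regex   = "\\d+".r     // ordinary literal: the backslash must be doubled
  val tripled: Regex   = """\d+""".r  // triple-quoted literal: no escape processing
  val viaRaw: Regex    = raw"\d+".r   // raw interpolator: no escape processing
  val viaCustom: Regex = r"\d+"       // custom interpolator defined above

  // The pattern string each Regex was compiled from.
  val patterns: List[String] = List(escaped, tripled, viaRaw, viaCustom).map(_.regex)
}
```

All four values are compiled from the same pattern string, a backslash followed by `d+`.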
Extraction
To extract the capturing groups when a Regex is matched, use it as an extractor in a pattern match:
"2004-01-20" match {
case date(year, month, day) => s"$year was a good year for PLs."
}
To check only whether the Regex matches, ignoring any groups, use a sequence wildcard:
"2004-01-20" match {
case date(_*) => "It's a date!"
}
That works because a Regex extractor produces a sequence of strings. Extracting only the year from a date could also be expressed with a sequence wildcard:
"2004-01-20" match {
case date(year, _*) => s"$year was a good year for PLs."
}
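The matches above have no fallback case, so an input that is not a date would raise a `MatchError`. A minimal sketch of a total variant, assuming a hypothetical helper object `DateParser`, adds a default case and returns an `Option`:

```scala
object DateParser {
  private val date = raw"(\d{4})-(\d{2})-(\d{2})".r

  // Returns the three groups as Ints, or None when the whole input is not a date.
  def parse(s: String): Option[(Int, Int, Int)] = s match {
    case date(year, month, day) => Some((year.toInt, month.toInt, day.toInt))
    case _                      => None
  }
}
```

The `toInt` calls are safe here because each group can only contain digits.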
In a pattern match, Regex normally matches the entire input. However, an unanchored Regex finds the pattern anywhere in the input.
val embeddedDate = date.unanchored
"Date: 2004-01-20 17:25:18 GMT (10 years, 28 weeks, 5 days, 17 hours and 51 minutes ago)" match {
case embeddedDate("2004", "01", "20") => "A Scala is born."
}
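To make the difference concrete, the following sketch applies the anchored and the unanchored form of the same pattern to one input. The object and method names are hypothetical.

```scala
object UnanchoredDemo {
  private val date = raw"(\d{4})-(\d{2})-(\d{2})".r
  private val embeddedDate = date.unanchored

  // Succeeds only when the entire input is a date.
  def yearOf(s: String): Option[String] = s match {
    case date(year, _*) => Some(year)
    case _              => None
  }

  // Succeeds when a date occurs anywhere in the input; uses the first occurrence.
  def yearIn(s: String): Option[String] = s match {
    case embeddedDate(year, _*) => Some(year)
    case _                      => None
  }
}
```

For the input `"Date: 2004-01-20 17:25:18 GMT"`, `yearOf` yields `None` while `yearIn` yields `Some("2004")`.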
Find Matches
To find or replace matches of the pattern, use the various find and replace methods. For each method, there is a version for working with matched strings and another for working with Match objects.
For example, pattern matching with an unanchored Regex, as in the previous example, can also be accomplished using findFirstMatchIn. The findFirst methods return an Option which is non-empty if a match is found, or None for no match:
val dates = "Important dates in history: 2004-01-20, 1958-09-05, 2010-10-06, 2011-07-15"
val firstDate = date.findFirstIn(dates).getOrElse("No date found.")
val firstYear = for (m <- date.findFirstMatchIn(dates)) yield m.group(1)
To find all matches:
val allYears = for (m <- date.findAllMatchIn(dates)) yield m.group(1)
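Because `findAllMatchIn` returns an `Iterator`, the `for` expression above also yields an iterator, which can be traversed only once. A short sketch, assuming the result is needed more than once, materializes it into a `List`; the object name is hypothetical.

```scala
object AllYears {
  private val date = raw"(\d{4})-(\d{2})-(\d{2})".r
  val dates = "Important dates in history: 2004-01-20, 1958-09-05, 2010-10-06, 2011-07-15"

  // Collect group 1 of every match into a reusable List.
  val allYears: List[String] = date.findAllMatchIn(dates).map(_.group(1)).toList
  // allYears == List("2004", "1958", "2010", "2011")
}
```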
To check whether input is matched by the regex:
date.matches("2018-03-01") // true
date.matches("Today is 2018-03-01") // false
date.unanchored.matches("Today is 2018-03-01") // true
To iterate over the matched strings, use findAllIn, which returns a special iterator that can be queried for the MatchData of the last match:
val mi = date.findAllIn(dates)
while (mi.hasNext) {
val d = mi.next()
if (mi.group(1).toInt < 1960) println(s"$d: An oldie but goodie.")
}
Although the MatchIterator returned by findAllIn is used like any Iterator, with alternating calls to hasNext and next, hasNext has the additional side effect of advancing the underlying matcher to the next unconsumed match. This effect is visible in the MatchData representing the "current match".
val r = "(ab+c)".r
val s = "xxxabcyyyabbczzz"
r.findAllIn(s).start // 3
val mi = r.findAllIn(s)
mi.hasNext // true
mi.start // 3
mi.next() // "abc"
mi.start // 3
mi.hasNext // true
mi.start // 9
mi.next() // "abbc"
The example shows that methods on MatchData such as start will advance to the first match, if necessary. It also shows that hasNext will advance to the next unconsumed match, if next has already returned the current match.
The current MatchData can be captured using the matchData method. Alternatively, findAllMatchIn returns an Iterator[Match], where there is no interaction between the iterator and Match objects it has already produced.
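The two alternatives can be sketched as follows, reusing the pattern and input from the example above. Both produce independent `Match` values whose positions do not change as iteration continues; the object name is hypothetical.

```scala
object MatchPositions {
  private val r = "(ab+c)".r
  private val s = "xxxabcyyyabbczzz"

  // findAllMatchIn: each Match carries its own matched text and offsets.
  val spans: List[(String, Int, Int)] =
    r.findAllMatchIn(s).map(m => (m.matched, m.start, m.end)).toList
  // spans == List(("abc", 3, 6), ("abbc", 9, 13))

  // matchData: converts the MatchIterator into an Iterator[Match].
  val starts: List[Int] = r.findAllIn(s).matchData.map(_.start).toList
  // starts == List(3, 9)
}
```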
Note that findAllIn finds matches that don't overlap. (See findAllIn for more examples.)
val num = raw"(\d+)".r
val all = num.findAllIn("123").toList // List("123"), not List("123", "23", "3")
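If overlapping matches are actually wanted, one common workaround is to wrap the pattern in a lookahead, so that each match consumes no input and the matcher advances one character at a time. This is a sketch of that general technique, not something the class provides directly; the object name is hypothetical.

```scala
object OverlappingDemo {
  // Zero-width lookahead: group 1 captures the digits without consuming them.
  private val overlappingNum = raw"(?=(\d+))".r

  val all: List[String] = overlappingNum.findAllMatchIn("123").map(_.group(1)).toList
  // all == List("123", "23", "3")
}
```

The matched string itself is empty for every match, so the captured group has to be read through `findAllMatchIn` rather than `findAllIn`.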
Replace Text
Text replacement can be performed unconditionally or as a function of the current match:
import java.util.Calendar

val redacted = date.replaceAllIn(dates, "XXXX-XX-XX")
val yearsOnly = date.replaceAllIn(dates, m => m.group(1))
// Abbreviated month names for the default locale, indexed from 0.
val months = (0 to 11).map { i => val c = Calendar.getInstance; c.set(2014, i, 1); f"$c%tb" }
val reformatted = date.replaceAllIn(dates, _ match { case date(y, m, d) => f"${months(m.toInt - 1)} $d, $y" })
Pattern matching the Match against the Regex that created it does not reapply the Regex. In the expression for reformatted, each date match is computed once. But it is possible to apply a Regex to a Match resulting from a different pattern:
val docSpree = """2011(?:-\d{2}){2}""".r
val docView = date.replaceAllIn(dates, _ match {
case docSpree() => "Historic doc spree!"
case _ => "Something else happened"
})
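Two further details of replacement are sketched below. First, in a replacement string a dollar sign followed by a group number refers to that capturing group, so a literal dollar sign has to be protected with `Regex.quoteReplacement`. Second, `replaceSomeIn` replaces only those matches for which the function returns `Some`. The object name and the replacement texts are made up for illustration.

```scala
import scala.util.matching.Regex

object ReplaceDemo {
  private val date = raw"(\d{4})-(\d{2})-(\d{2})".r
  val dates = "Important dates in history: 2004-01-20, 1958-09-05, 2010-10-06, 2011-07-15"

  // Group references in the replacement string reorder the parts of the date.
  val dayFirst: String = date.replaceAllIn("2004-01-20", "$3/$2/$1")
  // dayFirst == "20/01/2004"

  // quoteReplacement makes the dollar sign literal.
  val literalDollar: String =
    date.replaceAllIn("2004-01-20", Regex.quoteReplacement("$1"))
  // literalDollar == "$1"

  // Replace only dates before 1960; other matches are left unchanged.
  val oldOnly: String =
    date.replaceSomeIn(dates, m => if (m.group(1).toInt < 1960) Some("(long ago)") else None)
}
```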