Although this page starts with the regex word boundary \b, it aims to go far beyond: it will also introduce lesser-known boundaries, as well as explain how to make your own: DIY Boundaries.

For easy navigation, here are some jumping points to various sections of the page:
✽ DIY Boundary Workshop: "real word boundary"
✽ DIY Boundary: between a letter and a digit
✽ Double Negative Delimiter: Character, or Beginning of String

Why are ^ and $ called anchors while \b is called a boundary?

These tokens have one thing in common: they are assertions about the engine's current position in the string. Therefore, none of them consume characters.

Anchors assert that the current position in the string matches a certain position: the beginning, the end, or in the case of \G the position immediately following the last match. In contrast, boundaries make assertions about what can be matched to the left and right of the current position. Typically, you would translate ^ as something like "assert that the current position is the beginning of the string". But if you were in a mood to play with logic, you could say: Imagine that a string is a space between two walls, one to the left and one to the right. All the positions in the string are within that space. Then we could translate the ^ anchor as:

Assert that immediately to the left of the current position, we can find the left wall, while to the right of the current position we cannot find the left wall.

Yep, in that light, our anchor is a boundary: we look left and right. We'll keep anchors and boundaries on separate pages because there's a lot of ground to cover, but just keep that in mind.

The word boundary \b matches positions where one side is a word character (usually a letter, digit or underscore, but see below for variations across engines) and the other side is not a word character (for instance, it may be the beginning of the string or a space character). The regex \bcat\b would therefore match cat in a black cat, but it wouldn't match it in catatonic, tomcat or certificate. Removing one of the boundaries, \bcat would match cat in catfish, and cat\b would match cat in tomcat, but not vice versa. Both, of course, would match cat on its own. Word boundaries are useful when you want to match a sequence of letters (or digits) on their own, or to ensure that they occur at the beginning or the end of a sequence of characters.

Be aware, though, that \bcat\b will not match cat in _cat or in cat25 because there is no boundary between an underscore and a letter, nor between a letter and a digit: these all belong to what regex defines as word characters. If you want to create a "real word boundary" (where a word is only allowed to have letters), see the recipe below in the section on DIY boundaries.

As you can see on the regex cheat sheet, \b behaves differently depending on the engine:
✽ In PCRE (PHP, R…) with the Unicode mode turned off, JavaScript and Python 2.7, it matches where only one side is an ASCII letter, digit or underscore.
✽ In PCRE (PHP, R…) with the Unicode mode turned on, .NET, Java, Perl, Python 3 and Ruby, it matches a position where only one side is a Unicode letter, digit or underscore.
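
The \bcat\b examples discussed on this page can be checked with a short Python 3 sketch. It is only an illustration: the helper name `matches` and the test string "écat" are made up for this example, and the `re.ASCII` flag is used here as a stand-in for an engine whose \b only knows ASCII word characters.

```python
import re

def matches(pattern, text, flags=0):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text, flags) is not None

# \bcat\b needs a boundary on both sides of "cat".
samples = ["a black cat", "catatonic", "tomcat", "certificate", "_cat", "cat25"]
both_sides = {s: matches(r"\bcat\b", s) for s in samples}

# Dropping one boundary relaxes the corresponding side.
left_only = (matches(r"\bcat", "catfish"), matches(r"\bcat", "tomcat"))
right_only = (matches(r"cat\b", "tomcat"), matches(r"cat\b", "catfish"))

# In Python 3, "é" is a word character by default (Unicode mode),
# so there is no boundary between "é" and "c".
# With re.ASCII, "é" is not a word character, so the boundary exists.
unicode_mode = matches(r"\bcat\b", "écat")
ascii_mode = matches(r"\bcat\b", "écat", re.ASCII)

for text, found in both_sides.items():
    print(f"{text!r}: {found}")
print("left only:", left_only)
print("right only:", right_only)
print("unicode:", unicode_mode, "ascii:", ascii_mode)
```

Only "a black cat" matches with both boundaries in place. The underscore and digit cases fail because those neighbors are word characters too. The last two lines show how the same pattern gives different results depending on whether the engine treats non-ASCII letters as word characters.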