
A somewhat clearer way (imho) is /(\d)\D*$/ since it anchors to the end of the string.


According to the debugger at http://regex101.com/:

  /.*(\d)/
- Searches right-to-left, backtracking until it finds the match.

  /(\d)\D*$/
- Searches left-to-right: it steps forward, backtracks, steps forward, backtracks, and so on until it finds a match.

If you're looking for a match toward the end of a string, the .* version will be faster.


Wow, regex101.com is very nice, but I never would have noticed that debugger pane if I hadn't gone looking for it after your comment. Incredibly useful tool.


This is not the same as finding the last match though. The parent's example will match '2' in '1 of 2 steps.'


On the contrary, it does give the same result.

$ anchors to the end of the string, \D clears the non-digits from the end to allow \d to match the digit '2'.
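
A quick way to check this, sketched with Python's re module (other engines may differ in details, but the capture should be the same). The sample string is the one from the comment above:

```python
import re

text = "1 of 2 steps."

# Greedy prefix: .* runs to the end, then backs off until \d can match.
greedy = re.search(r".*(\d)", text)

# End-anchored: \D*$ only succeeds if nothing but non-digits follows the digit.
anchored = re.search(r"(\d)\D*$", text)

print(greedy.group(1), anchored.group(1))  # prints: 2 2
```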


Thanks, I see where I was wrong now.

In this case when finding the last match from the end, would the lazy quantifier reduce backtracking? e.g. /(\d)\D*?$/


No, that would work very similarly to the greedy version. The backtracking happens because the \d gets matched to the '1' and the whole thing has to be rolled back when the $ attempts to match and instead finds '2' (this would happen again if there were more digits for \d to speculatively match on). So the backtracking is not caused by the laziness or greediness of the \D* ; we really do want to gobble up all of the non-digits.
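
As a small sanity check (a sketch using Python's re module, not a statement about step counts in any particular engine), the lazy and greedy forms end up with the same match on the example string:

```python
import re

text = "1 of 2 steps."

greedy = re.search(r"(\d)\D*$", text)
lazy = re.search(r"(\d)\D*?$", text)

# Both attempts starting at '1' fail once '2' blocks the path to $,
# so both searches only succeed from the '2' onward.
print(repr(greedy.group(0)), repr(lazy.group(0)))  # '2 steps.' '2 steps.'
```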

On the two options generally:

    /(\d)\D*$/
is problematic if you have a lot of digits, while

    /.*(\d)/ 
is problematic if you have a lot of text after the last digit. Both could potentially be optimized by the engine to run right-to-left (the former because it's anchored to the end and the latter because it greedily matches to the beginning), and then both would do well. I'm not sure if that happens in practice.

Overall, I prefer the latter, both because I think it's clearer and because its perf characteristics hold up under a wider variety of inputs.
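
For anyone who wants to measure rather than guess, here is a rough timing harness, assuming Python's re and timeit modules. The input sizes and the helper name last_digit are arbitrary choices for illustration, and the numbers will vary by engine and machine, so no expected timings are shown:

```python
import re
import timeit

GREEDY = re.compile(r".*(\d)")
ANCHORED = re.compile(r"(\d)\D*$")

def last_digit(pattern, text):
    """Return the captured digit, or None if the text has no digit."""
    m = pattern.search(text)
    return m.group(1) if m else None

# Two input shapes from the discussion (sizes are arbitrary):
many_digits = "7" * 20_000 + "x"   # lots of digits before the end
long_tail = "7" + "x" * 20_000     # lots of text after the last digit

for name, text in [("many digits", many_digits), ("long tail", long_tail)]:
    for label, pat in [("greedy", GREEDY), ("anchored", ANCHORED)]:
        secs = timeit.timeit(lambda: last_digit(pat, text), number=50)
        print(f"{name:12} {label:9} {secs:.4f}s")
```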

Edit: how do you make literal asterisks on HN without having a space after them?


In addition to the other explanation, the lazy quantifier is redundant here anyway since there should only be one $ in any given expression.


As others before me have said, this pattern works as expected. Putting it through regexper, you can visually see this.

http://www.regexper.com/#%2F(%5Cd)%5CD*%24%2F


Technically, while they both capture the same digit, the match itself is different, including either everything before that digit or everything after it. But I tend to liberally use lookaround to keep the actual match clean myself; maybe others go more often for a capturing group. (Well, and not being able to use arbitrary-length lookaround in most engines might be a reason too.)
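
To make the difference concrete, a sketch in Python's re module showing the full match (group 0) for each variant, plus a lookahead version. Python allows variable-length lookahead but requires fixed-width lookbehind, so only the lookahead form is shown:

```python
import re

text = "1 of 2 steps."

# Full match includes everything before the digit.
before = re.search(r".*(\d)", text).group(0)

# Full match includes everything after the digit.
after = re.search(r"(\d)\D*$", text).group(0)

# Lookahead asserts "only non-digits until the end" without consuming them.
clean = re.search(r"\d(?=\D*$)", text).group(0)

print(repr(before), repr(after), repr(clean))  # '1 of 2' '2 steps.' '2'
```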


Actually, both will match the '2' in '1 of 2 steps.'


Yes, but that's harder to do for more complicated regexes, because you need to negate a regex (here \d => \D) for this trick.

If you have a complicated regex $r, you can only negate it with (?:(?!$r).)* , and in that case, .*$r is much easier to read :-)
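
A sketch of the two forms side by side in Python's re module. The sub-pattern r and the sample text are made up for illustration and stand in for the $r above. The two forms agree on this input, though they are not guaranteed to agree for every sub-pattern (for example, when occurrences can overlap):

```python
import re

r = r"v\d"                       # hypothetical stand-in for a complicated $r
text = "v1, v2, then v3 shipped"

# Greedy prefix form: .*$r
prefix = re.search(rf".*({r})", text)

# Negated form: $r followed only by characters that do not start another $r.
tempered = re.search(rf"({r})(?:(?!{r}).)*$", text)

print(prefix.group(1), tempered.group(1))  # prints: v3 v3
```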



