Python Regex – How to Match the First Repetition of a Digit

pythonregex

Examples:

  1. For 0123123123, 1 should be matched since the 2nd 1 appears before the repetition of any other digit.
  2. For 01234554321, 5 should be matched since the 2nd 5 appears before the repetition of any other digit.

Some regexes that I have tried:

  1. The below works for the 1st but not the 2nd example. It matches 1 instead because 1 is the first digit that appears in the string which is subsequently repeated.
import re
m = re.search(r"(\d).*?\1", string)
print(m.group(1))
  1. The below works for the 2nd but not the 1st example. It matches 3 instead – in particular the 2nd and 3rd occurrence of the digit. I do not know why it behaves that way.
import re
m = re.search(r"(\d)(?!(\d).*?\2).*?\1", string)
print(m.group(1))

Best Answer

Regex may not be the right tool for the task. While the regex in @CasimiretHippolyte's answer works, it is rather inefficient having to scan the entire rest of the string for each character in the string until a matching character is found, costing an average time complexity of O(n ^ 2).

A more efficient approach with a linear time complexity would be to use a set to keep track of characters that have been encountered, and return the first character already added to the set:

def first_repeating_digit(string):
    seen = set()
    for digit in filter(str.isdigit, string):
        if digit in seen:
            return digit
        seen.add(digit)
    raise ValueError('No repeating digit found.')

so that:

for s in '0123123123', '01234554321':
    print(s, first_repeating_digit(s))

outputs:

0123123123 1 
01234554321 5

Demo here

Benchmark test result:

blhsing 0123123123 1.2911038296297193
blhsing 01234554321 1.3835312821902335
CasimiretHippolyte 0123123123 3.6279739402234554
CasimiretHippolyte 01234554321 4.1985282939858735