Python Regex – Different Behavior Between re.finditer and re.findall

pythonregex

I am using the following code:

CARRIS_REGEX=r'<th>(\d+)</th><th>([\s\w\.\-]+)</th><th>(\d+:\d+)</th><th>(\d+m)</th>'
pattern = re.compile(CARRIS_REGEX, re.UNICODE)
matches = pattern.finditer(mailbody)
findall = pattern.findall(mailbody)

But finditer and findall are finding different things. Findall indeed finds all the matches in the given string. But finditer only finds the first one, returning an iterator with only one element.

How can I make finditer and findall behave the same way?

Thanks

Best Answer

I can't reproduce this here. Have tried it with both Python 2.7 and 3.1.

One difference between finditer and findall is that the former returns regex match objects whereas the other returns a tuple of the matched capturing groups (or the entire match if there are no capturing groups).

So

import re
CARRIS_REGEX=r'<th>(\d+)</th><th>([\s\w\.\-]+)</th><th>(\d+:\d+)</th><th>(\d+m)</th>'
pattern = re.compile(CARRIS_REGEX, re.UNICODE)
mailbody = open("test.txt").read()
for match in pattern.finditer(mailbody):
    print(match)
print()
for match in pattern.findall(mailbody):
    print(match)

prints

<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>

('790', 'PR. REAL', '21:06', '04m')
('758', 'PORTAS BENFICA', '21:10', '09m')
('790', 'PR. REAL', '21:14', '13m')
('758', 'PORTAS BENFICA', '21:21', '19m')
('790', 'PR. REAL', '21:29', '28m')
('758', 'PORTAS BENFICA', '21:38', '36m')
('758', 'SETE RIOS', '21:49', '47m')
('758', 'SETE RIOS', '22:09', '68m')

If you want the same output from finditer as you're getting from findall, you need

for match in pattern.finditer(mailbody):
    print(tuple(match.groups()))
Related Question