The is operator compares the memory addresses of two objects, and returns True if they're the same. Why, then, does it not work reliably with strings?
Code #1
>>> a = "poi"
>>> b = "poi"
>>> a is b
True
Code #2
>>> ktr = "today is a fine day"
>>> ptr = "today is a fine day"
>>> ktr is ptr
False
In both cases I created two strings with the same content, yet in the second case they live at different memory addresses. Why is the output of the is operator not consistent?
Best Answer
I believe it has to do with string interning. In essence, the idea is to store only a single copy of each distinct string, to increase performance on some operations.
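To make the idea concrete, here is a minimal sketch using sys.intern from the standard library. The variable names are only for illustration, and the strings are built at runtime with str.join on the assumption that this keeps the compiler from sharing them; whether the un-interned pair is the same object is a CPython implementation detail, so only the interned result should be relied on.

```python
import sys

# Build two equal strings at runtime so they are not compile-time literals.
words = ["today", "is", "a", "fine", "day"]
first = " ".join(words)
second = " ".join(words)

# The contents match regardless of how the objects are stored.
equal_by_value = first == second

# Identity before interning: typically two separate objects in CPython.
same_object_before = first is second

# sys.intern returns the single canonical copy kept in the intern table,
# so two equal strings passed through it come back as the same object.
first_interned = sys.intern(first)
second_interned = sys.intern(second)
same_object_after = first_interned is second_interned

print(equal_by_value, same_object_before, same_object_after)
```

After interning, an identity check and an equality check agree for these two strings, which is the performance trick interning is meant to enable.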
Basically, the reason why a is b works is that (as you may have guessed) Python references a single immutable string object in both cases. When a string is large (and most likely subject to some other factors that I don't understand), this isn't done, which is why your second example returns False.
EDIT: In fact, the odd behavior seems to be a side effect of the interactive environment. If you take the same code and place it in a Python script, both a is b and ktr is ptr return True. This makes sense, since it'd be easy for Python to parse a source file and look for duplicate string literals within it. If you create the strings dynamically, then it behaves differently even in a script.
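The script-versus-prompt difference can be approximated from a single file with the built-in compile and exec. This is only a sketch: it assumes that compiling both assignments together resembles how a script is handled, and that compiling them one at a time in "single" mode roughly resembles the interactive prompt. The names source, namespace and separate are made up for the example.

```python
# Two assignments held as source text, identical to Code #2 above.
source = 'ktr = "today is a fine day"\nptr = "today is a fine day"\n'

# Compile both statements as one unit, the way a script file is compiled.
# CPython stores equal constants once per code object, so both names
# end up bound to the same string object.
namespace = {}
exec(compile(source, "<example>", "exec"), namespace)
literals_shared = namespace["ktr"] is namespace["ptr"]

# Compile each statement on its own, roughly like typing it at the prompt.
# Each statement gets its own code object and its own constant.
separate = {}
for line in source.splitlines():
    exec(compile(line, "<example>", "single"), separate)
separately_shared = separate["ktr"] is separate["ptr"]

print(literals_shared, separately_shared)
```

Either way the two strings compare equal with ==, which is the comparison to use when the content is what matters.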
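As for why a is b still results in True at the interactive prompt, I believe it's because CPython automatically interns string constants that look like identifiers (only letters, digits and underscores). "poi" qualifies, whereas a string containing spaces does not. This is an implementation detail, so it shouldn't be relied on.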