Python – Why True is Returned When Checking if an Empty String is in Another


My limited brain cannot understand why this happens:

>>> print '' in 'lolsome'
True

In PHP, an equivalent comparison returns false (and a warning):

var_dump(strpos('lolsome', ''));

Best Answer

From the documentation:

For the Unicode and string types, x in y is true if and only if x is a substring of y. An equivalent test is y.find(x) != -1. Note, x and y need not be the same type; consequently, u'ab' in 'abc' will return True. Empty strings are always considered to be a substring of any other string, so "" in "abc" will return True.

From looking at your print call, you're using 2.x.
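
The equivalence stated in the documentation can be checked directly. The following is a minimal sketch (the variable names and the extra needles are illustrative, not from the documentation) that compares in against find() for a few needles, including the empty string:

```python
# Compare `x in y` with `y.find(x) != -1`, as the documentation describes.
haystack = 'lolsome'          # value taken from the question
needles = ['', 'lol', 'xyz']  # illustrative needles

results = {}
for needle in needles:
    via_in = needle in haystack
    via_find = haystack.find(needle) != -1
    # Store both verdicts plus the raw index returned by find().
    results[needle] = (via_in, via_find, haystack.find(needle))

for needle in needles:
    print("%r -> %r" % (needle, results[needle]))
```

The empty string is found at index 0, so find() does not return -1 and both tests agree.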

To go deeper, look at the bytecode:

>>> import dis
>>> def answer():
...     '' in 'lolsome'
...
>>> dis.dis(answer)
  2           0 LOAD_CONST               1 ('')
              3 LOAD_CONST               2 ('lolsome')
              6 COMPARE_OP               6 (in)
              9 POP_TOP
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE

COMPARE_OP is where the boolean operation is carried out, and the interpreter's source code for that opcode shows where the comparison happens:

        w = POP();
        v = TOP();
        if (PyInt_CheckExact(w) && PyInt_CheckExact(v)) {
            /* INLINE: cmp(int, int) */
            register long a, b;
            register int res;
            a = PyInt_AS_LONG(v);
            b = PyInt_AS_LONG(w);
            switch (oparg) {
            case PyCmp_LT: res = a <  b; break;
            case PyCmp_LE: res = a <= b; break;
            case PyCmp_EQ: res = a == b; break;
            case PyCmp_NE: res = a != b; break;
            case PyCmp_GT: res = a >  b; break;
            case PyCmp_GE: res = a >= b; break;
            case PyCmp_IS: res = v == w; break;
            case PyCmp_IS_NOT: res = v != w; break;
            default: goto slow_compare;
            }
            x = res ? Py_True : Py_False;
        }
        else {
          slow_compare:
            x = cmp_outcome(oparg, v, w);
        }
        if (x == NULL) break;

cmp_outcome is defined in the same file, and the next clue is easy to find there:

res = PySequence_Contains(w, v);

PySequence_Contains is defined in abstract.c:

int
PySequence_Contains(PyObject *seq, PyObject *ob)
{
    Py_ssize_t result;
    if (PyType_HasFeature(seq->ob_type, Py_TPFLAGS_HAVE_SEQUENCE_IN)) {
        PySequenceMethods *sqm = seq->ob_type->tp_as_sequence;
        if (sqm != NULL && sqm->sq_contains != NULL)
            return (*sqm->sq_contains)(seq, ob);
    }
    result = _PySequence_IterSearch(seq, ob, PY_ITERSEARCH_CONTAINS);
    return Py_SAFE_DOWNCAST(result, Py_ssize_t, int);
}
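
At the Python level, much the same call is available as operator.contains, which takes the container first and the item second, mirroring the swapped (w, v) argument order seen above. A small sketch using only the standard library:

```python
import operator

# operator.contains(container, item) corresponds to `item in container`.
via_operator = operator.contains('lolsome', '')
via_keyword = '' in 'lolsome'

print(via_operator)
print(via_keyword)
```

Both spellings give the same answer for the empty string.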

Coming up for air from the source, we find the sq_contains slot in the documentation:

objobjproc PySequenceMethods.sq_contains

This function may be used by PySequence_Contains() and has the same signature. This slot may be left to NULL, in this case PySequence_Contains() simply traverses the sequence until it finds a match.

Further down in the same documentation:

int PySequence_Contains(PyObject *o, PyObject *value)

Determine if o contains value. If an item in o is equal to value, return 1, otherwise return 0. On error, return -1. This is equivalent to the Python expression value in o.
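
The difference between the sq_contains slot and the traversal fallback can be imitated in pure Python. This is only a sketch: WithContains and IterOnly are hypothetical classes invented for illustration, with __contains__ standing in for the slot and __iter__ for the fallback traversal.

```python
class WithContains(object):
    """Hypothetical class: defines __contains__, so `in` calls it directly."""

    def __init__(self, text):
        self.text = text

    def __contains__(self, item):
        # Substring search, like the string type's own containment check.
        return self.text.find(item) != -1


class IterOnly(object):
    """Hypothetical class: no __contains__, so `in` falls back to iteration."""

    def __init__(self, text):
        self.text = text

    def __iter__(self):
        # Yields one character at a time; `in` compares each against the item.
        return iter(self.text)


slot_result = '' in WithContains('lolsome')
fallback_result = '' in IterOnly('lolsome')

print(slot_result)
print(fallback_result)
```

The fallback gives False because no single character of 'lolsome' equals ''. The True result for a real string therefore comes from the string type's own containment slot doing a substring search, not from element-by-element traversal.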

Since '' is a real (non-NULL) string object and the string type's containment check is a substring search, the sequence 'lolsome' can be thought of as containing it.