Python中str的+=性能分析

str.__iadd__

s = ''
for i in range(10000):
    s += str(i)

在一般的语言中,以上代码是不行的。 由于str是不可变对象,这个操作会产生N个过程对象。 而且越往后,str的长度越大,内存开销越大,对内存和性能都会有较大降低。

一般推荐用一些替代方案。

list''.join

l = []
for i in range(10000):
    l.append(str(i))
s = ''.join(l)

推导式和''.join

s = ''.join(str(i) for i in range(10000))

StringIO

from io import StringIO

buffer = StringIO()
for i in range(10000):
    buffer.write(str(i))
s = buffer.getvalue()

小结

以上通过3种方案,分别实现了+=的功能,构造了一个复杂的结果字符串s

一般是推荐后三种,但是第一种真的不能用吗?

性能比较

以下用timeit来比较各方案性能:

python3 -m timeit -u msec "
s = ''
for i in range(10000):
    s += str(i)
"
python3 -m timeit -u msec "
l = [str(i) for i in range(10000)]
s = ''.join(l)
"
python3 -m timeit -u msec "
s = ''.join(str(i) for i in range(10000))
"
python3 -m timeit -s 'from io import StringIO' "
buffer = StringIO()
for i in range(10000):
    buffer.write(str(i))
s = buffer.getvalue()
"

结果:

200 loops, best of 5: 1.26 msec per loop
500 loops, best of 5: 0.796 msec per loop
500 loops, best of 5: 0.878 msec per loop
200 loops, best of 5: 1.16 msec per loop

虽然第一个确实是最慢的,但是却和其它三个方案拉不开差距。 如果实际运行情况真的如前所述,会产生N个额外的str,而且后半段会非常大,那么+=的运行性能不可能跟上其它三个方案。

源码分析

当前版本(Python 3.10+)的+=相关源码在PyUnicode_Append(见附录)。 其中,关键部分是:

    if (unicode_modifiable(left)
        && PyUnicode_CheckExact(right)
        && PyUnicode_KIND(right) <= PyUnicode_KIND(left)
        /* Don't resize for ascii += latin1. Convert ascii to latin1 requires
           to change the structure size, but characters are stored just after
           the structure, and so it requires to move all characters which is
           not so different than duplicating the string. */
        && !(PyUnicode_IS_ASCII(left) && !PyUnicode_IS_ASCII(right)))
    {
        /* append inplace */
        if (unicode_resize(p_left, new_len) != 0)
            goto error;

        /* copy 'right' into the newly allocated area of 'left' */
        _PyUnicode_FastCopyCharacters(*p_left, left_len, right, 0, right_len);
    }

如果左值(即这里示例的s)是可修改的,则对其进行原地调整。 增加其大小,再把右值(即示例中的str(i))直接复制到新扩张的内存去。

简单来说,s+=的过程中,没有被取右值,因此被当做buffer来用。 这样性能就能跟上其它方案了。

这个修改,最早是在CPython 2.4.2引入。 目前在最新的CPython上仍然有效。

String concatenations in statements of the form s = s + “abc” and s += “abc” are now performed more efficiently in certain circumstances. This optimization won’t be present in other Python implementations such as Jython, so you shouldn’t rely on it; using the join() method of strings is still recommended when you want to efficiently glue a large number of strings together. (Contributed by Armin Rigo.)

结论

如果有条件用join,那么优先建议用join。 如果逻辑过于复杂,需要一个Buffer类的解决方案,那么直接使用+=,是简洁有效的。

注意:以上结论仅对CPython有效。如果预期运行时不确定,那么建议不要使用。

附录

以下是CPython源码(2022年main分支),Objects/unicodeobject.c中的PyUnicode_Append函数。 它是对+=的完整实现。

void
PyUnicode_Append(PyObject **p_left, PyObject *right)
{
    PyObject *left, *res;
    Py_UCS4 maxchar, maxchar2;
    Py_ssize_t left_len, right_len, new_len;

    if (p_left == NULL) {
        if (!PyErr_Occurred())
            PyErr_BadInternalCall();
        return;
    }
    left = *p_left;
    if (right == NULL || left == NULL
        || !PyUnicode_Check(left) || !PyUnicode_Check(right)) {
        if (!PyErr_Occurred())
            PyErr_BadInternalCall();
        goto error;
    }

    /* Shortcuts */
    PyObject *empty = unicode_get_empty();  // Borrowed reference
    if (left == empty) {
        Py_DECREF(left);
        Py_INCREF(right);
        *p_left = right;
        return;
    }
    if (right == empty) {
        return;
    }

    left_len = PyUnicode_GET_LENGTH(left);
    right_len = PyUnicode_GET_LENGTH(right);
    if (left_len > PY_SSIZE_T_MAX - right_len) {
        PyErr_SetString(PyExc_OverflowError,
                        "strings are too large to concat");
        goto error;
    }
    new_len = left_len + right_len;

    if (unicode_modifiable(left)
        && PyUnicode_CheckExact(right)
        && PyUnicode_KIND(right) <= PyUnicode_KIND(left)
        /* Don't resize for ascii += latin1. Convert ascii to latin1 requires
           to change the structure size, but characters are stored just after
           the structure, and so it requires to move all characters which is
           not so different than duplicating the string. */
        && !(PyUnicode_IS_ASCII(left) && !PyUnicode_IS_ASCII(right)))
    {
        /* append inplace */
        if (unicode_resize(p_left, new_len) != 0)
            goto error;

        /* copy 'right' into the newly allocated area of 'left' */
        _PyUnicode_FastCopyCharacters(*p_left, left_len, right, 0, right_len);
    }
    else {
        maxchar = PyUnicode_MAX_CHAR_VALUE(left);
        maxchar2 = PyUnicode_MAX_CHAR_VALUE(right);
        maxchar = Py_MAX(maxchar, maxchar2);

        /* Concat the two Unicode strings */
        res = PyUnicode_New(new_len, maxchar);
        if (res == NULL)
            goto error;
        _PyUnicode_FastCopyCharacters(res, 0, left, 0, left_len);
        _PyUnicode_FastCopyCharacters(res, left_len, right, 0, right_len);
        Py_DECREF(left);
        *p_left = res;
    }
    assert(_PyUnicode_CheckConsistency(*p_left, 1));
    return;

error:
    Py_CLEAR(*p_left);
}

相关笔记