Concatenation of strings in Python

Most Python books and blog posts teach us that concatenating strings using the + sign is a bad idea. In a loop this is true: building a string piece by piece with + or += is a really bad idea. In Python, strings are immutable. This means that every time you assign a new value or want to increase the size of an existing string, Python has to allocate a new memory space large enough to hold the new string, copy the contents into it, then deallocate the old space. The following example is really bad Python programming. In the worst case, Python will allocate, copy and deallocate the memory for the variable s 1000 times.

s = ""
for x in range(1000):
  s += str(x) + ', '
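
For a loop like this one, the usual advice is to collect the pieces first and join them once at the end. A minimal sketch, which should build the same string as the loop above (the name pieces is only illustrative):

```python
# Collect every piece first, then join them in a single call so the
# result is built in one pass instead of growing step by step.
pieces = [str(x) + ', ' for x in range(1000)]
s = ''.join(pieces)
```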

The result is that developers everywhere write lines like this:

filename = '.'.join([name, extension])

or:

filename = "%s.%s" % (name, extension)

For a simple concatenation like this, it doesn't make any sense to use join or string formatting. A plain + is faster and easier to read.

The variable filename storing the result has to be allocated no matter what method you use. The variables name and extension are already allocated and will not be reallocated. In this case simply writing var3 = var1 + var2 makes total sense.
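
As a quick check, the three spellings build exactly the same string. The values given to name and extension below are made up for illustration:

```python
name = 'database'   # hypothetical values, for illustration only
extension = 'dat'

with_plus = name + '.' + extension
with_join = '.'.join([name, extension])
with_format = "%s.%s" % (name, extension)

# All three produce the same filename.
assert with_plus == with_join == with_format == 'database.dat'
```

Since the result is identical, the only things left to compare are speed and readability.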

Here are the execution times for each solution:

# Quick benchmark run on Python 2.7.8, after: from timeit import timeit
In [1]: timeit("a = fname + ext",
               setup="fname='database'; ext='.dat'")
Out[1]: 0.06346487998962402

In [2]: timeit("a = ''.join((fname, ext))",
               setup="fname='database'; ext='.dat'")
Out[2]: 0.1665630340576172

In [3]: timeit("a = '%s%s' % (fname, ext)",
               setup="fname='database'; ext='.dat'")
Out[3]: 0.19054698944091797

In [4]: timeit("a = '%(fname)s%(ext)s' % (x)",
               setup="x=dict(fname='database', ext='.dat')")
Out[4]: 0.2898139953613281

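As you can see in these benchmarks, the form var3 = var1 + var2 is the fastest: about 2.6 times faster than join, 3 times faster than positional formatting, and 4.6 times faster than formatting from a dictionary. It is also easier to read.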
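
To repeat the measurement on another interpreter, a small script along these lines could be used. It is only a sketch: the four statements are copied from the session above, while the results dictionary and the output format are additions for illustration. Absolute timings, and possibly the ratios, will differ between machines and Python versions.

```python
from timeit import timeit

# Shared setup so that every statement sees the same variables.
setup = "fname='database'; ext='.dat'; x=dict(fname='database', ext='.dat')"

# Same statements as in the session above.
statements = [
    "a = fname + ext",
    "a = ''.join((fname, ext))",
    "a = '%s%s' % (fname, ext)",
    "a = '%(fname)s%(ext)s' % (x)",
]

results = {}
for stmt in statements:
    # number=1000000 is timeit's default, spelled out here for clarity.
    results[stmt] = timeit(stmt, setup=setup, number=1000000)

# Print the statements from fastest to slowest.
for stmt, seconds in sorted(results.items(), key=lambda item: item[1]):
    print("%-32s %.4f s" % (stmt, seconds))
```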
