How to debug with logging

Many developers often use print statements for debugging their code. Print statements are a good way to trace and troubleshoot potential problems, but they are not great when working on collaborative projects. Prints are also problematic in multi-threaded programs. You have to remove your print statements before shipping your code.

A better way to print debug information is to use the python standard logging module. Here are some of the advantages of using logging:

  • The messages are timestamped with the severity level.
  • The logging module logs useful information such as module name, function name, line number, process id, thread id, etc, etc.
  • Messages can be filtered to only log what you are interested in without changing the program.
  • The messages can be sent to standard out, standard err, files, mails, sockets, and more. You can run you program in production without having debugging messages cluttering the user output.

The logging module

The following program illustrates how I usually use the python logging module for debugging.

#!/usr/bin/env python
import os
import timeit
import logging

# Configure the logging module to print interesting information
loglevel = logging.getLevelName(os.getenv('LEVEL', 'WARN'))
logging.basicConfig(
  format="%(asctime)s %(filename)s.%(funcName)s:%(lineno)d %(message)s",
  level=loglevel
)

def bigloop():
  logging.info('Start')
  for counter in range(100):
    val = counter
    logging.debug('The value is: %d', val)
  logging.info('End')

if __name__ == '__main__':
  logging.info('Main starts')
  bigloop()

Running this program will produce the following output, with all the information I need to debug it: time to a thousandth of a second, name of the program, name of the function, line number and message.

[~]$ export LEVEL=DEBUG
[~]$ python debug-tip.py
2015-09-26 17:13:12,082 debug-tip.py.<module>:20 Main starts
2015-09-26 17:13:12,082 debug-tip.py.bigloop:13 Start
2015-09-26 17:13:12,082 debug-tip.py.bigloop:16 The value is: 0
2015-09-26 17:13:12,083 debug-tip.py.bigloop:16 The value is: 1
2015-09-26 17:13:12,083 debug-tip.py.bigloop:16 The value is: 2
2015-09-26 17:13:12,083 debug-tip.py.bigloop:16 The value is: 3
2015-09-26 17:13:12,083 debug-tip.py.bigloop:16 The value is: 4
2015-09-26 17:13:12,083 debug-tip.py.bigloop:16 The value is: 5
2015-09-26 17:13:12,083 debug-tip.py.bigloop:16 The value is: 6
2015-09-26 17:13:12,083 debug-tip.py.bigloop:16 The value is: 7
2015-09-26 17:13:12,083 debug-tip.py.bigloop:16 The value is: 8
2015-09-26 17:13:12,083 debug-tip.py.bigloop:16 The value is: 9
. . .
2015-09-26 17:13:12,087 debug-tip.py.bigloop:16 The value is: 98
2015-09-26 17:13:12,087 debug-tip.py.bigloop:16 The value is: 99
2015-09-26 17:13:12,087 debug-tip.py.bigloop:17 End

When you want to turn off the debugging messages, just unset the environment variable LEVEL and the messages will disappear.

[~]$ unset LEVEL
[~]$ python debug-tip.py

I can set the environment variable to a different level so only the messages at that level will be printed out.

[~]$ export LEVEL=INFO
[~]$ python debug-tip.py
2015-09-26 17:17:42,026 debug-tip.py.<module>:20 Main starts
2015-09-26 17:17:42,026 debug-tip.py.bigloop:13 Start
2015-09-26 17:17:42,026 debug-tip.py.bigloop:17 End

Everything has a price

The logging module is so great that I use it all over my code to print code states, variable information, to help me to debug. That awesomeness comes at a price, especially for the line logging.debug(...). A call to logging.debug is made for every run of the loop. In a big program or an event based program, these little calls can be very expensive and make your program run slower.

To solve that problem some people just call that line if the logging LEVEL is set to DEBUG. This new version of the program includes an if to test if the line logging.debug has to be called.

#!/usr/bin/env python
import os
import timeit
import logging

# Configure the logging module to print interesting information
loglevel = logging.getLevelName(os.getenv('LEVEL', 'WARN'))
logging.basicConfig(
  format="%(asctime)s %(filename)s.%(funcName)s:%(lineno)d %(message)s",
  level=loglevel
)

def bigloop():
  logging.info('Start')
  for counter in range(100):
    val = counter
    if loglevel == logging.getLevelName('DEBUG'):
      logging.debug('The value is: %d', val)
  logging.info('End')

if __name__ == '__main__':
  logging.info('Main starts')
  timer = timeit.Timer('bigloop()', 'from __main__ import bigloop')
  print timer.timeit(5000)

Timing the execution of this program shows us that the program runs 3 times faster without the call to logging.debug.

# without the if
[~]$ python debug-tip.py
0.623032093048

# with the if
[~]$ python debug-tip.py
0.243802070618

A better way to do it

This is a huge improvement in terms of execution time, but not really an improvement in terms of code readability. Having these 2 lines all over your code might clutter your program. This is why some people create a debug function.

#!/usr/bin/env python
import os
import timeit
import logging

# Configure the logging module to print interesting information
loglevel = logging.getLevelName(os.getenv('LEVEL', 'WARN'))
logging.basicConfig(
  format="%(asctime)s %(filename)s.%(funcName)s:%(lineno)d %(message)s",
  level=loglevel
)

def debug(msg):
  if loglevel == logging.getLevelName('DEBUG'):
    logging.debug(msg)

def bigloop():
  logging.info('Start')
  for counter in range(100):
    val = counter
    debug('The value is: %d' % val)
  logging.info('End')

if __name__ == '__main__':
  logging.info('Main starts')
  timer = timeit.Timer('bigloop()', 'from __main__ import bigloop')
  print timer.timeit(5000)

This new version is better in terms of readability but once again slower. A call to the debug function is made every time.

[~]$ python debug-tip.py
0.584649085999

Assertions

To solve that problem I use assertions. An assertion is a sanity-check that you can turn on or turn off when you are done testing the program.

When you ship a program to production or when you create a package, run python with the option -O for optimize or set the environment variable PYTHONOPTIMIZE. When you do this, python removes all the assertions.

Here is the final version of the program using assertions:

#!/usr/bin/env python
import os
import timeit
import logging

# Configure the logging module to print interesting information
loglevel = logging.getLevelName(os.getenv('LEVEL', 'WARN'))
logging.basicConfig(
  format="%(asctime)s %(filename)s.%(funcName)s:%(lineno)d %(message)s",
  level=loglevel
)

def debug(msg):
  logging.debug(msg)
  return True

def bigloop():
  logging.info('Start')
  for counter in range(100):
    val = counter
    assert debug('The value is: %d' % val)
  logging.info('End')

if __name__ == '__main__':
  logging.info('Main starts')
  timer = timeit.Timer('bigloop()', 'from __main__ import bigloop')
  print timer.timeit(5000)

As you can see below, the program now runs 34 times faster. All the debug information is still present in the source code and can be turned on or off at anytime. Just set the environment variable LEVEL and run the program without the optimization flag.

[~]$ python debug-tip.py
0.927535057068
[~]$ python -O debug-tip.py
0.027086019516


Comments !