Managing records in Python

A simple way to access data from a file or from a database is to read each line or row, then store its values in a list or a tuple. Each value can then be accessed by its position in that sequence.

Using lists or tuples

In the following example we read the file /etc/passwd, split each line on the ':' character, and print the login (field 0) and the home directory (field 5).

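with open('/etc/passwd') as fd:
    for line in fd:
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith('#'):
            continue
        # Fields are colon-separated: 0 is the login, 5 is the home directory.
        record = line.split(':')
        print('login:', record[0], 'home:', record[5])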
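
Positional access depends on remembering the field order. A small step toward readability is to unpack the split line into named local variables. The sketch below uses an invented sample line (the `sample` value is an assumption for illustration, not data from a real system) so that it runs without reading /etc/passwd.

```python
# Hypothetical sample line in /etc/passwd format (illustrative values only).
sample = 'alice:x:1000:1000:Alice Example:/home/alice:/bin/bash'

record = sample.split(':')

# Positional access: the reader has to know that 0 is the login and 5 the home.
login_by_index = record[0]
home_by_index = record[5]

# Unpacking gives each position a local name in one step.
login, passwd, uid, gid, gcos, home, shell = record

print('login:', login, 'home:', home)
```

Unpacking raises a ValueError when the line does not contain exactly seven fields, which may or may not be what you want for malformed input.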

Using dictionaries

This works well, but once your program grows beyond a few lines and records are passed between objects and modules, it is more convenient to make the code self-documenting by naming each field. You can do that with dictionaries, as in the following example.

fields = ['login', 'passwd', 'id', 'gid', 'gcos', 'home', 'shell']

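with open('/etc/passwd') as fd:
    for line in fd:
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith('#'):
            continue
        # Pair each field name with the value at the same position.
        record = dict(zip(fields, line.split(':')))
        print('login:', record['login'], 'home:', record['home'])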
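
One caveat with dict(zip(...)) is that zip stops at the shortest input, so a line with missing fields quietly produces a record with missing keys. Here is a minimal sketch of a helper that checks the field count first. The function name `parse_passwd_line` and the sample lines are assumptions made for this illustration.

```python
FIELDS = ['login', 'passwd', 'id', 'gid', 'gcos', 'home', 'shell']


def parse_passwd_line(line):
    """Return a dict for one passwd-style line, or raise ValueError."""
    values = line.strip().split(':')
    if len(values) != len(FIELDS):
        raise ValueError('expected %d fields, got %d'
                         % (len(FIELDS), len(values)))
    return dict(zip(FIELDS, values))


# Hypothetical sample line (illustrative values only).
record = parse_passwd_line(
    'alice:x:1000:1000:Alice Example:/home/alice:/bin/bash\n')
print('login:', record['login'], 'home:', record['home'])

# A truncated line is rejected instead of silently losing keys.
try:
    parse_passwd_line('bob:x:1001')
    truncated_rejected = False
except ValueError:
    truncated_rejected = True
```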

Using namedtuple

namedtuple is a lightweight factory function found in the collections module. It creates a subclass of tuple whose fields have names. Each field can then be accessed by name as an attribute (for example record.login), while access by position still works.

The first argument of namedtuple is the name of the new type. The second argument lists the field names, either as a single string separated by spaces or commas, or as a sequence of strings.

from collections import namedtuple

Password = namedtuple('Password', 'login passwd id gid gcos home shell')

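with open('/etc/passwd') as fd:
    for line in fd:
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith('#'):
            continue
        # Each value becomes the field at the same position in Password.
        record = Password(*line.split(':'))
        print('login:', record.login, 'home:', record.home)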
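
Because the generated class is a tuple subclass, a record keeps all of the tuple behaviour while gaining names. The sketch below uses an invented sample line. It passes the field names as a list instead of a string, and uses the _make, _replace and _asdict helpers that namedtuple classes provide.

```python
from collections import namedtuple

# Field names may also be passed as a sequence of strings.
Password = namedtuple('Password',
                      ['login', 'passwd', 'id', 'gid', 'gcos', 'home', 'shell'])

# Hypothetical sample line (illustrative values only).
sample = 'alice:x:1000:1000:Alice Example:/home/alice:/bin/bash'

# _make builds an instance from any iterable, like Password(*values).
record = Password._make(sample.split(':'))

# Access by name and by position both work.
print(record.login, record[5])

# Instances are immutable; _replace returns a modified copy.
moved = record._replace(home='/srv/alice')

# _asdict converts the record to a dictionary keyed by field name.
as_dict = record._asdict()
print(as_dict['shell'])
```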

Now your code is self-documenting: you don't have to read the documentation to find out the position of each field. Namedtuples are also more memory efficient than dictionaries, since the field names are stored once on the class instead of being carried as keys by each instance.
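
A rough way to see the difference is sys.getsizeof, which reports the size of the container object itself and not of the strings it references. The exact numbers depend on the Python version and platform, so this sketch only prints and compares them. The sample values are invented, and the comparison assumes CPython.

```python
import sys
from collections import namedtuple

fields = ['login', 'passwd', 'id', 'gid', 'gcos', 'home', 'shell']
Password = namedtuple('Password', fields)

# Hypothetical sample values (illustrative only).
values = 'alice:x:1000:1000:Alice Example:/home/alice:/bin/bash'.split(':')

as_tuple = Password(*values)
as_dict = dict(zip(fields, values))

# getsizeof measures only the container, not the strings it points to.
tuple_size = sys.getsizeof(as_tuple)
dict_size = sys.getsizeof(as_dict)

print('namedtuple:', tuple_size, 'bytes')
print('dict:      ', dict_size, 'bytes')
```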

Dictionaries are also not hashable, so a dictionary record cannot be used as a dictionary key or stored in a set. A namedtuple, like any tuple, is hashable as long as all of its fields are.
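
In practice, hashing matters when records are used as set members or as dictionary keys, for example to remove duplicates. A small sketch with invented sample values:

```python
from collections import namedtuple

Password = namedtuple('Password', 'login passwd id gid gcos home shell')

# Hypothetical sample values (illustrative only).
values = 'alice:x:1000:1000:Alice Example:/home/alice:/bin/bash'.split(':')
record = Password(*values)

# A namedtuple of strings is hashable, so it can be a set member or dict key.
seen = {record}
duplicate = Password(*values)
print(duplicate in seen)

# The dictionary version of the same record cannot be hashed.
as_dict = record._asdict()
try:
    hash(as_dict)
    dict_is_hashable = True
except TypeError:
    dict_is_hashable = False
print(dict_is_hashable)
```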

