Python descriptors

Python descriptors have been around for a long time but, probably for lack of good documentation, they are still neither widely used nor well understood.

Here is what the Python documentation says about descriptors:

In general, a descriptor is an object attribute with "binding behavior", one whose attribute access has been overridden by methods in the descriptor protocol. Those methods are __get__(), __set__(), and __delete__(). If any of those methods are defined for an object, it is said to be a descriptor.

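This simply means that an object whose class defines any of the methods __get__, __set__ or __delete__ can be assigned to an attribute of another class. From then on, reading, writing or deleting that attribute on an instance calls the descriptor's methods instead of the normal attribute lookup.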

OK, this is still confusing. Let's write some code to explain all this. To show how it works, we are going to write a "validator" class, similar to the ones you are used to seeing in web frameworks like Django or Flask.

We have a class describing a product at a hardware store. For now this class has the name of the product and the quantity in stock. We need to ensure that the quantity is an integer. One way to do that would be to use a property setter.

class Product(object):

    @property
    def quantity(self):
        return self._quantity

    @quantity.setter
    def quantity(self, value):
        if not isinstance(value, int):
            raise ValueError('Only integer is allowed')
        self._quantity = value

The problem with getters and setters is that you need to write a pair of them for every validated attribute in every class in your project.
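To make the repetition concrete, here is a sketch of what the property approach looks like once a second validated attribute is added. The weight attribute is a hypothetical addition for illustration only; it is not part of the original example.

```python
class Product(object):
    """Two integer attributes, each needing its own getter/setter pair."""

    @property
    def quantity(self):
        return self._quantity

    @quantity.setter
    def quantity(self, value):
        if not isinstance(value, int):
            raise ValueError('Only integer is allowed')
        self._quantity = value

    # 'weight' is a hypothetical second attribute, added for illustration.
    @property
    def weight(self):
        return self._weight

    @weight.setter
    def weight(self, value):
        # Same check as in the quantity setter, copied verbatim.
        if not isinstance(value, int):
            raise ValueError('Only integer is allowed')
        self._weight = value
```

Every new integer attribute costs another ten lines or so, and the validation logic is duplicated each time.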

This is how you solve that problem using descriptors.

class IntValidator(object):

    def __get__(self, instance, otype):
        return self.value

    def __set__(self, instance, value):
        if not isinstance(value, int):
            raise ValueError('Only integer is allowed')
        self.value = value

class Product(object):
    quantity = IntValidator()

instock = Product()
instock.name = 'Nails'
instock.quantity = 'twelve'

--------------------------------------------------------------------------
ValueError                               Traceback (most recent call last)
<ipython-input-14-36dea9a2eacc> in <module>()
     20 instock = Product()
     21 instock.name = 'Nails'
---> 22 instock.quantity = 'twelve'

<ipython-input-14-36dea9a2eacc> in __set__(self, instance, value)
     11         print self.__class__, 'set'
     12         if not isinstance(value, int):
---> 13             raise ValueError('Only integer is allowed')
     14         self.value = value
     15

ValueError: Only integer is allowed

The attribute quantity only accepts integers.

instock.quantity = 12
print(instock.quantity)
12
print(type(instock.quantity))
<class 'int'>

This looks fantastic, but this version doesn't really work. There are a few gotchas. For Python to invoke the __get__ and __set__ methods automatically, descriptors must be defined at the class level. As a result, all instances of Product share the same instance of IntValidator, and because the value is stored on that descriptor (as self.value), every product reads and writes the same number. This leads to the following kind of behavior.

class Product(object):
    quantity = IntValidator()

instock = Product()
instock.quantity = 12
ordered = Product()
ordered.quantity = 42

print('instock:', instock.quantity)
print('ordered:', ordered.quantity)

instock: 42
ordered: 42
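One way to see where the value actually lives is to look inside the objects involved. The sketch below repeats the naive IntValidator from above and inspects both products; the variable name descriptor is only there for illustration.

```python
class IntValidator(object):
    """Naive version from above: the value is stored on the descriptor."""

    def __get__(self, instance, otype):
        return self.value

    def __set__(self, instance, value):
        if not isinstance(value, int):
            raise ValueError('Only integer is allowed')
        self.value = value


class Product(object):
    quantity = IntValidator()


instock = Product()
ordered = Product()
instock.quantity = 12
ordered.quantity = 42

# Neither product stores anything in its own __dict__.
print(vars(instock), vars(ordered))   # {} {}

# Fetching the descriptor from the class __dict__ bypasses __get__.
descriptor = Product.__dict__['quantity']
print(descriptor.value)               # 42
```

Both instances are empty, and the single descriptor object holds the last value written through either of them.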

For this to work we need to do some bookkeeping and track the data separately for each instance of the class that uses the descriptor.

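After self, the first argument of the descriptor's methods is the instance the attribute was accessed on. We can use a dictionary to store the data for each instance, using that argument as the key. A WeakKeyDictionary is a good fit because it does not keep an instance alive once nothing else refers to it: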

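from weakref import WeakKeyDictionary


class IntValidator(object):

    def __init__(self, default=None):
        # Value returned for instances that have not been assigned yet.
        self.default = default
        # One entry per instance; an entry vanishes when its instance is collected.
        self.values = WeakKeyDictionary()

    def __get__(self, instance, otype):
        if instance is None:
            # Accessed on the class (Product.quantity): return the descriptor.
            return self
        return self.values.get(instance, self.default)

    def __set__(self, instance, value):
        if not isinstance(value, int):
            raise ValueError('Only integer is allowed')
        self.values[instance] = value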

class Product(object):
    quantity = IntValidator()

instock = Product()
instock.name = 'Bolts'
instock.quantity = 12

ordered = Product()
ordered.quantity = 42

print('instock', instock.quantity)
instock 12

print('ordered', ordered.quantity)
ordered 42
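The choice of WeakKeyDictionary over a plain dict matters for memory. A plain dict would hold a strong reference to every Product that was ever assigned a quantity, so none of them could be garbage collected. Here is a small sketch of the cleanup, using a condensed copy of the descriptor above. The names tracked_before and tracked_after are illustrative, and the gc.collect() call is only there to make the cleanup deterministic on interpreters that do not use reference counting.

```python
import gc
from weakref import WeakKeyDictionary


class IntValidator(object):

    def __init__(self):
        self.values = WeakKeyDictionary()

    def __get__(self, instance, otype):
        return self.values[instance]

    def __set__(self, instance, value):
        if not isinstance(value, int):
            raise ValueError('Only integer is allowed')
        self.values[instance] = value


class Product(object):
    quantity = IntValidator()


instock = Product()
instock.quantity = 12

# Fetch the descriptor itself, bypassing __get__.
descriptor = Product.__dict__['quantity']
tracked_before = len(descriptor.values)   # one entry, for instock

del instock     # drop the only strong reference to the product
gc.collect()    # force a collection pass

tracked_after = len(descriptor.values)    # the entry has been dropped
print(tracked_before, tracked_after)      # 1 0
```

The descriptor forgets an instance as soon as the rest of the program does, so the bookkeeping does not leak memory.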

This works as long as the instances can be hashed. For example, you cannot use the IntValidator described here in a class that subclasses list, dict or set.
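A short sketch of that failure, using a hypothetical Basket class that subclasses list (the class name and the failure variable are made up for this illustration):

```python
from weakref import WeakKeyDictionary


class IntValidator(object):

    def __init__(self):
        self.values = WeakKeyDictionary()

    def __get__(self, instance, otype):
        return self.values[instance]

    def __set__(self, instance, value):
        if not isinstance(value, int):
            raise ValueError('Only integer is allowed')
        self.values[instance] = value


class Basket(list):
    """Hypothetical list subclass; like list itself, it is not hashable."""
    quantity = IntValidator()


basket = Basket()
failure = None
try:
    # The value passes validation, but the instance cannot be a dict key.
    basket.quantity = 3
except TypeError as error:
    failure = error

print(type(failure).__name__, failure)
```

The assignment fails with a TypeError even though 3 is a perfectly valid integer, because the dictionary needs to hash the instance before it can store the value.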

To work around this problem you can use metaclasses to label each descriptor, so that the values no longer have to be keyed by instance. This solves the problem of non-hashable instances, but at the cost of all the complexity of metaclasses. I won't cover metaclasses in this article because they are a subject for an entire blog post of their own. The WeakKeyDictionary approach shown above covers the vast majority of the use cases you will encounter in your projects.
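As an aside, on Python 3.6 and later there is a lighter way to label a descriptor than a metaclass: when a class is created, the interpreter calls __set_name__ on each descriptor in its body and passes in the attribute name. The sketch below is an alternative to the metaclass approach, not that approach itself. It stores each value in the instance's own __dict__ under the attribute name, so the instance no longer needs to be hashable. Basket is again a hypothetical class.

```python
class IntValidator(object):

    def __set_name__(self, owner, name):
        # Called once when the owning class is created, e.g. name='quantity'.
        self.name = name

    def __get__(self, instance, otype=None):
        if instance is None:
            # Accessed on the class itself: return the descriptor.
            return self
        # Raises KeyError if the attribute has not been assigned yet.
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if not isinstance(value, int):
            raise ValueError('Only integer is allowed')
        instance.__dict__[self.name] = value


class Basket(list):
    """Hypothetical list subclass: unhashable, but it still has a __dict__."""
    quantity = IntValidator()


basket = Basket()
basket.quantity = 3
print(basket.quantity)   # 3
```

Because IntValidator defines __set__, it takes precedence over the entry of the same name in the instance __dict__, so every read and write still goes through the descriptor. The trade-off is that this variant does not work for classes that define __slots__ without a __dict__.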

