This article discusses a common problem with the __init__() method of Python classes.
The Dilemma of Arguments in super().__init__()
Let’s revisit the toy example from my previous article about diamond inheritance in Python:
```python
class A:
    def __init__(self, a):
        self.a = a

class B(A):
    def __init__(self, a, b):
        super().__init__(a)
        self.b = b

class C(A):
    def __init__(self, a, c):
        super().__init__(a)
        self.c = c

class D(B, C):
    def __init__(self, a, b, c, d):
        # How to call its parent classes' __init__ methods?
        self.d = d
```
In the previous article, we figured out which parent class super() refers to. The conclusion is: if we call super().__init__(...) in class D, the parent classes’ __init__() methods are called in MRO order: B.__init__(...) → C.__init__(...) → A.__init__(...), provided that each of them also calls super().__init__(...), as B and C do here. So if we want every parent class’s __init__() method to run exactly once, we simply call super().__init__(...) in class D, and Python elegantly handles the rest.
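To see this order concretely, here is a minimal sketch of the same diamond with the arguments stripped out. Each __init__() records its class name in a list before delegating; the calls list exists only for illustration and is not part of the original example.

```python
# Argument-free version of the diamond, used only to show the call order.
calls = []

class A:
    def __init__(self):
        calls.append("A")
        super().__init__()  # next in the MRO is object

class B(A):
    def __init__(self):
        calls.append("B")
        super().__init__()  # for a D instance, this resolves to C, not A

class C(A):
    def __init__(self):
        calls.append("C")
        super().__init__()

class D(B, C):
    def __init__(self):
        calls.append("D")
        super().__init__()  # resolves to B

print([cls.__name__ for cls in D.__mro__])  # ['D', 'B', 'C', 'A', 'object']
D()
print(calls)  # ['D', 'B', 'C', 'A']
```

Note that B’s super() call reaches C only because the instance is a D; for a plain B instance, the same line would go straight to A.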
Since super() is resolved dynamically along the MRO of the instance’s class, the same super().__init__(...) line can effectively become B.__init__(...), C.__init__(...), or A.__init__(...). This raises another question: what should the argument list ... be?
A natural approach might be super().__init__(a=a, b=b, c=c), because it covers all the arguments needed by B, C, and A. However, this is problematic, because B.__init__() only accepts the arguments a and b, so the call raises a TypeError saying that c is an unexpected argument (passing them positionally as super().__init__(a, b, c) fails the same way, with too many positional arguments). What if we call super().__init__(a, b) instead? This also leads to an error: inside B.__init__(a, b), the call super().__init__(a) resolves to the next class in D’s MRO, which is C, not A. It is therefore equivalent to calling C.__init__(a), but C.__init__() requires both a and c.
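Both failures can be reproduced with a short sketch. The classes A, B, and C are the ones from the toy example; DAll and DTwo are hypothetical variants of D that try the two argument lists discussed above. The exact wording of the error messages varies slightly between Python versions.

```python
class A:
    def __init__(self, a):
        self.a = a

class B(A):
    def __init__(self, a, b):
        super().__init__(a)
        self.b = b

class C(A):
    def __init__(self, a, c):
        super().__init__(a)
        self.c = c

class DAll(B, C):
    def __init__(self, a, b, c, d):
        super().__init__(a=a, b=b, c=c)  # B.__init__() has no parameter c
        self.d = d

class DTwo(B, C):
    def __init__(self, a, b, c, d):
        super().__init__(a, b)  # B's super().__init__(a) reaches C, which needs c
        self.d = d

# Collect the TypeError message raised by each variant.
errors = {}
for cls in (DAll, DTwo):
    try:
        cls(1, 2, 3, 4)
    except TypeError as exc:
        errors[cls.__name__] = str(exc)

for name, message in errors.items():
    print(name, "->", message)
```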
I believe this was one of the main reasons that kept me from using super() in my project, and it led me into a series of pitfalls. If you read my previous code pitfall articles, you will see how much trouble this problem caused me.
It’s Multiple Inheritance’s Fault!
This problem only occurs with multiple inheritance, when different direct parents have conflicting argument requirements.
In single inheritance, super() always calls the direct parent class’s __init__() method, so each class knows exactly which signature it is calling, and the argument lists simply shrink as we move up the chain. For example, in the following code:
```python
class A:
    def __init__(self, a):
        self.a = a
        super().__init__()

class B(A):
    def __init__(self, a, b):
        self.b = b
        super().__init__(a)

class C(B):
    def __init__(self, a, b, c):
        # How to call its parent classes' __init__ methods?
        self.c = c
```
We can safely use super().__init__(a, b) in C’s __init__() method, because it calls B.__init__(a, b), which in turn calls A.__init__(a), and everything works.
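Here is a minimal sketch of the completed single-inheritance chain, with C filled in as described. It is the same hierarchy as above; only the super().__init__(a, b) call and a usage line are added.

```python
class A:
    def __init__(self, a):
        self.a = a
        super().__init__()  # object.__init__() takes no extra arguments

class B(A):
    def __init__(self, a, b):
        self.b = b
        super().__init__(a)  # B only needs to know A's signature

class C(B):
    def __init__(self, a, b, c):
        self.c = c
        super().__init__(a, b)  # C only needs to know B's signature

obj = C(1, 2, 3)
print(obj.a, obj.b, obj.c)  # 1 2 3
```

Because the MRO here is just C → B → A → object, each super() call always lands on the same class, so hard-coding the parent’s signature is safe.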
Solution: Arbitrary Keyword Arguments
There is an inelegant solution to this problem that I mentioned in my previous article: abandon super() and call the parent classes’ __init__() methods explicitly. This way, the __init__() of each direct parent is called separately and statically, so there is no ambiguity about which arguments should be passed. But again, as we discussed in the previous article, that approach leads to lots of other problems and pitfalls. Therefore, we have to use super() to handle the inheritance and initialization calls. But how can we do this?
The solution is to use arbitrary arguments. Remember that when we call super().__init__(a=a, b=b, c=c), the arguments are accepted neither by B.__init__(a, b) (because of c) nor by C.__init__(a, c) (because of b). If B and C could tolerate these extra arguments, the problem would be solved. In Python, *args and **kwargs let a function accept any additional positional and keyword arguments without raising an error. So we can modify the code as follows:
```python
class A:
    def __init__(self, a, **kwargs):
        self.a = a

class B(A):
    def __init__(self, a, b, **kwargs):
        super().__init__(a, **kwargs)
        self.b = b

class C(A):
    def __init__(self, a, c, **kwargs):
        super().__init__(a, **kwargs)
        self.c = c

class D(B, C):
    def __init__(self, a, b, c, d):
        super().__init__(a=a, b=b, c=c)
        self.d = d
```
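Here, when we call super().__init__(a=a, b=b, c=c), it first calls B.__init__(a=a, b=b, c=c), where c=c goes into kwargs. B consumes b and forwards the rest with super().__init__(a, **kwargs), which resolves to C.__init__(a, c=c); now c binds to C’s own parameter, so kwargs is empty. Finally, C calls A.__init__(a), again with an empty kwargs. Each class picks out the arguments it needs by name and passes everything else further along the MRO.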
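To check which keyword arguments each __init__() actually receives, here is an instrumented copy of the **kwargs version above. The received list is a hypothetical addition for illustration; it records the leftover kwargs seen by each class.

```python
received = []  # (class name, leftover kwargs) for each __init__ call

class A:
    def __init__(self, a, **kwargs):
        received.append(("A", dict(kwargs)))
        self.a = a

class B(A):
    def __init__(self, a, b, **kwargs):
        received.append(("B", dict(kwargs)))
        super().__init__(a, **kwargs)  # b is consumed here, not forwarded
        self.b = b

class C(A):
    def __init__(self, a, c, **kwargs):
        received.append(("C", dict(kwargs)))
        super().__init__(a, **kwargs)  # c is consumed here, not forwarded
        self.c = c

class D(B, C):
    def __init__(self, a, b, c, d):
        super().__init__(a=a, b=b, c=c)
        self.d = d

obj = D(1, 2, 3, 4)
print(received)  # [('B', {'c': 3}), ('C', {}), ('A', {})]
print(obj.a, obj.b, obj.c, obj.d)  # 1 2 3 4
```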
There is one important point to note here:
- We only use **kwargs in the parent classes’ __init__() methods, not *args. Forwarding positional arguments through super() is fragile and confusing, because which class receives which position depends on the MRO, and the MRO is decided by the subclass, not by the parent classes. For example, suppose every class accepted *args and D called super().__init__(a, b, c). With class D(B, C), B.__init__() binds a and b by position and forwards c, so C.__init__() receives (a, c), which happens to be correct. But if D were declared as class D(C, B), C.__init__() would run first and bind the value of b to its parameter c, silently mixing up the values. Keyword arguments are matched by name, so they do not have this problem.
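The positional pitfall can be demonstrated with a small sketch. This is a hypothetical *args variant of the toy example (the article’s recommended version uses **kwargs instead); DBC and DCB are illustrative names for D declared with the two possible base orders.

```python
class A:
    def __init__(self, a, *args):
        self.a = a

class B(A):
    def __init__(self, a, b, *args):
        super().__init__(a, *args)  # forward whatever positions B did not use
        self.b = b

class C(A):
    def __init__(self, a, c, *args):
        super().__init__(a, *args)  # forward whatever positions C did not use
        self.c = c

class DBC(B, C):  # MRO: DBC, B, C, A, object
    def __init__(self, a, b, c):
        super().__init__(a, b, c)

class DCB(C, B):  # MRO: DCB, C, B, A, object
    def __init__(self, a, b, c):
        super().__init__(a, b, c)

x = DBC(1, 2, 3)
y = DCB(1, 2, 3)
print(x.b, x.c)  # 2 3 (as intended)
print(y.b, y.c)  # 3 2 (b and c silently swapped, no error raised)
```

Neither variant raises an error, which is exactly why positional forwarding is dangerous: the bug only shows up as wrong attribute values.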
The Pros and Cons of Arbitrary Arguments
As we can see, once a class uses multiple inheritance, all the classes above it in the hierarchy should accept **kwargs in their __init__() methods. For example, because D inherits from both B and C, class A should also accept **kwargs in its __init__() method, even though it does not seem directly involved in the multiple inheritance. This means a large number of parent classes’ __init__() methods in the system must accept arbitrary keyword arguments.
Another question is: when should we use **kwargs in practice? We could analyze which classes are involved in multiple inheritance and choose accordingly, but this approach is tedious and breaks encapsulation, since each class would need to know how its future subclasses will combine it with others. On the other hand, using **kwargs in all parent classes’ __init__() methods also has drawbacks: since any keyword argument is accepted, it becomes difficult to detect unwanted arguments, such as misspelled parameter names. The interpreter won’t raise an error, which makes debugging more difficult.
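The sketch below reproduces this drawback. It assumes a hypothetical variant of C whose c parameter has a default value, so a misspelled keyword is swallowed without any error. It also shows one commonly used partial mitigation, which is an assumption on my part rather than part of the approach above: the root class forwards its leftover kwargs to object.__init__(), which raises a TypeError when any keyword arguments remain unconsumed.

```python
class A:
    def __init__(self, a, **kwargs):
        self.a = a  # any leftover kwargs are silently discarded here

class C(A):
    def __init__(self, a, c=0, **kwargs):  # hypothetical default for c
        super().__init__(a, **kwargs)
        self.c = c

obj = C(a=1, cc=5)  # typo: "cc" instead of "c"
print(obj.c)  # 0, and no error was raised

class StrictA:
    def __init__(self, a, **kwargs):
        # Forward leftovers to object.__init__(), which raises TypeError
        # if any keyword arguments are still unconsumed at this point.
        super().__init__(**kwargs)
        self.a = a

class StrictC(StrictA):
    def __init__(self, a, c=0, **kwargs):
        super().__init__(a, **kwargs)
        self.c = c

try:
    StrictC(a=1, cc=5)  # same typo as above
except TypeError as exc:
    caught = exc
else:
    caught = None
print(type(caught).__name__)  # TypeError
```

This only helps when the leftover argument actually reaches the root class; a class that swallows kwargs without forwarding them still hides the typo.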
What do you think about this? Please let me know if you are experienced in programming and have any suggestions about this.