My takeaway series follows a Q&A format to explain AI concepts at three levels:
- For anyone with general knowledge who wants to understand the concept.
- For anyone who wants to dive into the code implementation details of the concept.
- For anyone who wants to understand the mathematics behind the technique.
Multi-Task Learning (MTL) is a machine learning paradigm in which a single model learns multiple tasks together.
Not to be confused with multi-label learning, which is a single-task classification problem where a single input can have multiple labels. Some might use “multi-task” to describe multi-label learning, such as this. That is because multi-task learning is sometimes treated as a broader concept comprising SIMO (Single Input Multiple Output) and MIMO (Multiple Input Multiple Output) learning, where SIMO corresponds to multi-label learning and MIMO to the multi-task learning discussed here. In this article, I will focus on the latter.
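To make the SIMO/MIMO distinction concrete, here is a minimal PyTorch sketch (the layer sizes, label counts, and class counts are illustrative assumptions, not from the article): a multi-label classifier has a single head whose logits are read independently per label, while a multi-task classifier has one head per task.

```python
import torch
import torch.nn as nn

# Multi-label learning (SIMO): one task, one head; each of the 10
# logits is an independent yes/no decision for one label (apply a
# sigmoid per logit at inference).
multi_label_model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

# Multi-task learning (MIMO): a shared trunk plus one head per task,
# each head producing the logits of its own classification problem.
class MultiTaskModel(nn.Module):
    def __init__(self, in_dim=128, hidden=64, task_classes=(5, 3, 8)):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, c) for c in task_classes)

    def forward(self, x, task_id):
        # Logits for the requested task only.
        return self.heads[task_id](self.trunk(x))
```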
The objective is to improve average performance across all tasks.
Moreover, multi-task learning is more about knowledge sharing, i.e. improving over single-task learning in which each task is learned independently (I refer to this as independent learning), and promoting greater generalisation ability.
This is thoroughly discussed in my article about continual learning.
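Concretely, this objective is often formalised (the notation here is mine, consistent with the specification below) as minimising the average of the per-task losses over $N$ tasks with training sets $\mathcal{D}_i$:

$$
\theta^{*} = \arg\min_{\theta} \; \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}_i\!\left(f_\theta; \mathcal{D}_i\right),
$$

where $\mathcal{L}_i$ is the loss of task $i$, e.g. cross-entropy for classification, and $f_\theta$ is the shared model.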
We limit our scope to classification problems. Multi-task learning is a machine learning paradigm with the following inputs and outputs:
Inputs:
- Training data of $N$ tasks, $\{\mathcal{D}_i\}_{i=1}^{N}$
- An initialized machine learning model $f_\theta$ that can produce the logits of task $i$ given an input from any task $i \in \{1, \dots, N\}$

Outputs:
- Trained model $f_{\theta^*}$ that gives good performance on the test data of all $N$ tasks.
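The following PyTorch sketch instantiates this specification on synthetic data (the dimensions, task count, class counts, and the simple joint-training loop are my assumptions; the article does not prescribe a training procedure): a shared trunk with one classification head per task, trained by minimising the average per-task cross-entropy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

N_TASKS, IN_DIM, HIDDEN = 3, 32, 64
TASK_CLASSES = [4, 2, 5]  # classes per task (assumed)

# Synthetic training data: one (inputs, labels) pair per task.
data = [
    (torch.randn(256, IN_DIM), torch.randint(0, c, (256,)))
    for c in TASK_CLASSES
]

# Shared trunk + one logit head per task: given an input from any task
# and the task index, the model produces that task's logits.
trunk = nn.Sequential(nn.Linear(IN_DIM, HIDDEN), nn.ReLU())
heads = nn.ModuleList(nn.Linear(HIDDEN, c) for c in TASK_CLASSES)
opt = torch.optim.Adam(
    list(trunk.parameters()) + list(heads.parameters()), lr=1e-3
)

for step in range(200):
    opt.zero_grad()
    # Average of the per-task losses: the objective described above.
    loss = sum(
        F.cross_entropy(heads[i](trunk(x)), y)
        for i, (x, y) in enumerate(data)
    ) / N_TASKS
    loss.backward()
    opt.step()

print(f"final average training loss: {loss.item():.3f}")
```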