TY - JOUR
T1 - Leveraging label hierarchy using transfer and multi-task learning
T2 - A case study on patent classification
AU - Aroyehun, Segun Taofeek
AU - Angel, Jason
AU - Majumder, Navonil
AU - Gelbukh, Alexander
AU - Hussain, Amir
N1 - Publisher Copyright:
© 2021 Elsevier B.V.
PY - 2021/11/13
Y1 - 2021/11/13
N2 - When labels are organized into a meaningful taxonomy, the parent-child relationship between labels at different levels can give the classifier additional information not deducible from the data alone, especially with limited training data. As a case study, we illustrate this effect on the task of patent classification—the task of categorizing patent documents based on their technical content. Existing approaches do not take this additional information into consideration. Experiments on two patent classification datasets, WIPO-alpha and USPTO-2M, show that our regularized Gated Recurrent Unit (GRU) architecture already gives a performance improvement, with a micro-averaged precision score using the top prediction of 0.5191 and 0.5740 on the two datasets, respectively. However, knowledge transfer along the label hierarchy gives a further significant improvement on WIPO-alpha, raising the score to 0.5376, and a small improvement on USPTO-2M, to 0.5743. Our analyses reveal that incorporating label information improves performance on classes with fewer examples and makes the model robust to errors that result from predicting closely related labels.
AB - When labels are organized into a meaningful taxonomy, the parent-child relationship between labels at different levels can give the classifier additional information not deducible from the data alone, especially with limited training data. As a case study, we illustrate this effect on the task of patent classification—the task of categorizing patent documents based on their technical content. Existing approaches do not take this additional information into consideration. Experiments on two patent classification datasets, WIPO-alpha and USPTO-2M, show that our regularized Gated Recurrent Unit (GRU) architecture already gives a performance improvement, with a micro-averaged precision score using the top prediction of 0.5191 and 0.5740 on the two datasets, respectively. However, knowledge transfer along the label hierarchy gives a further significant improvement on WIPO-alpha, raising the score to 0.5376, and a small improvement on USPTO-2M, to 0.5743. Our analyses reveal that incorporating label information improves performance on classes with fewer examples and makes the model robust to errors that result from predicting closely related labels.
KW - Machine learning
KW - Multi-task learning
KW - Natural language processing
KW - Neural networks
KW - Patent classification
KW - Transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85115652725&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2021.07.057
DO - 10.1016/j.neucom.2021.07.057
M3 - Article
AN - SCOPUS:85115652725
SN - 0925-2312
VL - 464
SP - 421
EP - 431
JO - Neurocomputing
JF - Neurocomputing
ER -