Abstract
Despite the significant success of transformer models and their successors in various natural language processing (NLP) applications, their internal workings are still not fully understood. Much current interpretability research has focused on numerical components and often misses the complex semantic layers within these models. To fill this gap, this study explores the interpretability of the transformer model, a cornerstone of modern NLP, by addressing the semantic complexities of its multi-layer architecture. We identify three key questions: (i) how the multi-layer structure influences semantic processing, (ii) what each layer uniquely contributes to model performance, and (iii) how to determine the optimal number of layers for the encoder and decoder. To tackle these questions, we introduce the semantic interpreter for transformer hierarchy, an innovative framework that employs convex hull metrics to visualize and assess semantic quality and quantity. Our contributions include novel methods for semantic assessment, a dual analytical framework that integrates cumulative and layer-to-layer perspectives, and insights into the dynamics of encoding and decoding. This approach aims to enhance the understanding of transformer models, ultimately guiding their refinement for improved interpretability and effectiveness in natural language tasks.
| Original language | English |
|---|---|
| Pages (from-to) | 1237-1251 |
| Number of pages | 15 |
| Journal | Computer Journal |
| Volume | 68 |
| Issue number | 9 |
| Publication status | Published - 1 Sept 2025 |
| Externally published | Yes |
| Title | Unravelling the semantic mysteries of transformers layer by layer |