TY - JOUR
T1 - Prompt Tuning In a Compact Attribute Space
AU - Hou, Shiyu
AU - Zhou, Tianfei
AU - Zhang, Shuai
AU - Yuan, Ye
AU - Wang, Guoren
N1 - Publisher Copyright:
Copyright © 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2025/4/11
Y1 - 2025/4/11
N2 - Prompt tuning (PT) has emerged as a key to unlocking the power of visual-language models like CLIP for various downstream tasks. Predominant approaches learn a small set of task-relevant soft prompts by solving an image-class matching problem. Nevertheless, by optimizing merely with respect to class names, they struggle to learn high-performing prompts capable of capturing the fine-grained, diverse characteristics of each class, and tend to overfit the potentially biased distribution of base classes. In this work, we propose PTinCAS to tackle prompt tuning in a compact attribute space, driven by the premise that attributes offer detailed class interpretations and can facilitate transfer across related categories. In particular, PTinCAS is grounded in two innovative designs. First, we create a compact attribute space by properly prompting large language models to generate factual descriptions of categories, which are subsequently clustered to form a concise attribute vocabulary. Second, we leverage attributes as a source of supervision in PT to transfer the common-sense knowledge inherent in attributes to soft prompts. An object-aware visual prompting mechanism is developed to effortlessly highlight intended regions in the original image, which guides the model towards learning visual attributes associated with object regions rather than the background. We show that PTinCAS not only improves few-shot generalizability compared to existing PT methods, but also provides some level of inherent explainability that helps us understand why a class name is determined based on the attributes activated in an image.
UR - http://www.scopus.com/inward/record.url?scp=105003928489&partnerID=8YFLogxK
U2 - 10.1609/aaai.v39i4.32365
DO - 10.1609/aaai.v39i4.32365
M3 - Conference article
AN - SCOPUS:105003928489
SN - 2159-5399
VL - 39
SP - 3518
EP - 3526
JO - Proceedings of the AAAI Conference on Artificial Intelligence
JF - Proceedings of the AAAI Conference on Artificial Intelligence
IS - 4
T2 - 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025
Y2 - 25 February 2025 through 4 March 2025
ER -