TY - GEN
T1 - Making The Best of Both Worlds: A Domain-Oriented Transformer for Unsupervised Domain Adaptation
T2 - 30th ACM International Conference on Multimedia, MM 2022
AU - Ma, Wenxuan
AU - Zhang, Jinming
AU - Li, Shuang
AU - Liu, Chi Harold
AU - Wang, Yulin
AU - Li, Wei
N1 - Publisher Copyright:
© 2022 ACM.
PY - 2022/10/10
Y1 - 2022/10/10
AB - Extensive studies on Unsupervised Domain Adaptation (UDA) have propelled the deployment of deep learning from limited experimental datasets into real-world unconstrained domains. Most UDA approaches align features within a common embedding space and apply a shared classifier for target prediction. However, since a perfectly aligned feature space may not exist when the domain discrepancy is large, these methods suffer from two limitations. First, the coercive domain alignment deteriorates the discriminability of the target domain because target label supervision is lacking. Second, the source-supervised classifier is inevitably biased toward source data and may therefore underperform on the target domain. To alleviate these issues, we propose to simultaneously conduct feature alignment in two individual spaces, each focusing on a different domain, and to create for each space a domain-oriented classifier tailored specifically to that domain. Specifically, we design a Domain-Oriented Transformer (DOT) that has two individual classification tokens to learn different domain-oriented representations, and two classifiers to preserve domain-wise discriminability. A theoretically guaranteed contrastive alignment and a source-guided pseudo-label refinement strategy are utilized to exploit both domain-invariant and domain-specific information. Comprehensive experiments validate that our method achieves state-of-the-art performance on several benchmarks. Code is released at https://github.com/BIT-DA/Domain-Oriented-Transformer.
KW - contrastive learning
KW - mutual information
KW - unsupervised domain adaptation
KW - vision transformer
UR - http://www.scopus.com/inward/record.url?scp=85150982657&partnerID=8YFLogxK
U2 - 10.1145/3503161.3548229
DO - 10.1145/3503161.3548229
M3 - Conference contribution
AN - SCOPUS:85150982657
T3 - MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia
SP - 5620
EP - 5629
BT - MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia
PB - Association for Computing Machinery, Inc
Y2 - 10 October 2022 through 14 October 2022
ER -