Image classification using vision transformers with complex positional embeddings

Han-Young Kim; Yeong-Jun Cho

Korea Information Processing Society: Conference Proceedings (Annual Conference of KIPS)

Korea Information Processing Society 2024 Fall Conference / pp. 619-621 / 2024 / pISSN 2005-0011 / eISSN 2671-7298

Publisher: Korea Information Processing Society

Han-Young Kim (Dept. of Artificial Intelligence Convergence, Chonnam National University);
Yeong-Jun Cho (Dept. of Artificial Intelligence Convergence, Chonnam National University)

Published: 2024.10.31

Abstract

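In this study, we apply Complex Order Position Embedding (COPE) to the Vision Transformer (ViT) and evaluate its effectiveness on a computer vision task. COPE encodes positional information using complex-valued operations and has previously been applied successfully in natural language processing. In experiments on the ImageNet-Tiny dataset, a ViT-Tiny model with COPE achieved an accuracy of 34.0%, which is 1.8 percentage points higher than the baseline model. This improvement was obtained with only a marginal increase in the number of parameters (about 37,000). The results demonstrate that COPE is also effective in computer vision and suggest potential performance gains on more demanding vision tasks where positional information is important, such as object detection and semantic segmentation. The work is significant in that it extends the range of applications of complex positional embeddings and suggests a new direction for improving transformer-based vision models.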
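The abstract does not state the embedding formula, so the following is only a minimal sketch of the general idea of a complex-valued position embedding, assuming the commonly cited form from Wang et al., "Encoding word order in complex embeddings" (2019), in which each component is amplitude * exp(i * (frequency * pos + phase)). The function names, the random initialization ranges, the embedding width of 4, and the 16-patch raster ordering are all hypothetical choices for illustration and are not taken from the paper.

```python
import cmath
import math
import random


def init_params(dim, seed=0):
    """Draw hypothetical per-dimension amplitude, frequency and phase values.

    In a trained model these would be learned; random values are used here
    only so the sketch runs on its own.
    """
    rng = random.Random(seed)
    amplitude = [rng.uniform(0.5, 1.5) for _ in range(dim)]
    frequency = [rng.uniform(0.0, 1.0) for _ in range(dim)]
    phase = [rng.uniform(0.0, 2 * math.pi) for _ in range(dim)]
    return amplitude, frequency, phase


def complex_position_embedding(pos, amplitude, frequency, phase):
    """Return one complex vector for patch index `pos`.

    Each component is amplitude * exp(i * (frequency * pos + phase)),
    so the position rotates the phase and leaves the magnitude unchanged.
    """
    return [
        r * cmath.exp(1j * (w * pos + t))
        for r, w, t in zip(amplitude, frequency, phase)
    ]


amplitude, frequency, phase = init_params(dim=4)
num_patches = 16  # assumed: a 4x4 grid of patches flattened in raster order
embeddings = [
    complex_position_embedding(p, amplitude, frequency, phase)
    for p in range(num_patches)
]
```

Under this assumed form, moving from patch index p to p + 1 multiplies component j by exp(i * frequency[j]) regardless of p, which is one way such an embedding can expose relative position to the attention layers.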

Funding

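This research was carried out as a result of the Artificial Intelligence Convergence Innovation Human Resources Development program (IITP-2023-RS-2023-00256629) and the University ICT Research Center (ITRC) program (IITP-2024-RS-2024-00437718) of the Ministry of Science and ICT and the Institute of Information & Communications Technology Planning & Evaluation (IITP).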

