Sound Event Classification Based on Concatenated Residual Network Applicable to Closed Captioning Services for the Hearing Impaired

Kim, Nam Kyun; Park, Dong Keun; Kim, Jun Ho; Kim, Hong Kook; Ahn, Chung Hyun

Proceedings of the Korean Society of Broadcast Engineers Conference

The Korean Institute of Broadcast and Media Engineers

Kim, Nam Kyun (Gwangju Institute of Science and Technology);
Park, Dong Keun (Gwangju Institute of Science and Technology);
Kim, Jun Ho (Gwangju Institute of Science and Technology);
Kim, Hong Kook (Gwangju Institute of Science and Technology);
Ahn, Chung Hyun (Electronics and Telecommunications Research Institute)

Published: 2020.07.13

Abstract

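This paper proposes a technique for classifying the sound events that occur in audio content, in order to provide closed captioning services for the hearing impaired. The proposed technique uses a concatenated residual network architecture that connects multiple residual networks (ResNets). Two input features are extracted for the network: a two-dimensional image formed by combining the mel-frequency cepstral coefficient (MFCC) vectors of the audio signal over multiple frames, and a one-dimensional statistical feature vector computed from the MFCC vectors of all frames. The 2-D image is modeled by a 2-D residual network and the statistical vector by a 1-D residual network, and the two residual networks are joined by concatenation to form the concatenated residual network. In a 6-fold cross-validation on a collected dataset, the proposed technique achieved a classification accuracy of 85.48%.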
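To make the data flow described in the abstract concrete, the following is a minimal, dependency-free Python sketch. It is not the authors' implementation. The window length and hop used to build the 2-D patches, the use of per-coefficient mean and standard deviation as the "statistical" features, the single-channel 3x3 convolutions, the averaging of the 2-D branch over frames, the layer sizes, and the strided fold assignment in `kfold_splits` are all assumptions, and every function name is hypothetical. The weights are random and untrained, so the sketch only shows how the two residual branches are concatenated in front of a softmax classifier, not how well the model classifies.

```python
import math
import random


def stack_frames(mfcc, context, hop):
    """2-D input (assumed windowing): stack `context` consecutive MFCC frames into one patch."""
    return [mfcc[i:i + context] for i in range(0, len(mfcc) - context + 1, hop)]


def stats_vector(mfcc):
    """1-D input: per-coefficient mean and std over all frames (the statistics are an assumption)."""
    n, dims = len(mfcc), len(mfcc[0])
    means = [sum(f[d] for f in mfcc) / n for d in range(dims)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in mfcc) / n) for d in range(dims)]
    return means + stds


def conv3x3(img, k):
    """Single-channel 3x3 cross-correlation with zero padding; output keeps the input size."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if 0 <= i + di < h and 0 <= j + dj < w:
                        out[i][j] += k[di + 1][dj + 1] * img[i + di][j + dj]
    return out


def res_block_2d(img, k1, k2):
    """2-D residual block: relu(x + conv(relu(conv(x))))."""
    hidden = [[max(0.0, v) for v in row] for row in conv3x3(img, k1)]
    branch = conv3x3(hidden, k2)
    return [[max(0.0, a + b) for a, b in zip(r1, r2)] for r1, r2 in zip(img, branch)]


def dense(v, W):
    """Matrix-vector product W @ v, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def res_block_1d(v, W1, W2):
    """1-D (fully connected) residual block: relu(v + W2 relu(W1 v))."""
    hidden = [max(0.0, x) for x in dense(v, W1)]
    return [max(0.0, a + b) for a, b in zip(v, dense(hidden, W2))]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def init_params(n_coeff, n_classes, seed=0):
    """Random, untrained weights for the toy model."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    d1 = 2 * n_coeff  # length of the statistics vector (means + stds)
    return {"k1": mat(3, 3), "k2": mat(3, 3),
            "W1": mat(d1, d1), "W2": mat(d1, d1),
            "Wout": mat(n_classes, n_coeff + d1)}


def concat_resnet_forward(patch, stats, p):
    """Run both residual branches, concatenate their outputs, then apply a softmax classifier."""
    y2 = res_block_2d(patch, p["k1"], p["k2"])
    emb2 = [sum(row[d] for row in y2) / len(y2) for d in range(len(y2[0]))]  # average over frames
    emb1 = res_block_1d(stats, p["W1"], p["W2"])
    joint = emb2 + emb1  # concatenation of the 2-D and 1-D branch outputs
    return softmax(dense(joint, p["Wout"]))


def kfold_splits(n_items, k=6):
    """Yield (train, test) index lists; fold f holds every index i with i % k == f."""
    for f in range(k):
        test = [i for i in range(n_items) if i % k == f]
        train = [i for i in range(n_items) if i % k != f]
        yield train, test


# Usage on synthetic data: 100 frames of 13 MFCCs and 5 hypothetical classes.
rng = random.Random(1)
mfcc = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(100)]
patches = stack_frames(mfcc, context=10, hop=10)
stats = stats_vector(mfcc)
probs = concat_resnet_forward(patches[0], stats, init_params(13, n_classes=5))
print(len(patches), len(stats), [round(q, 3) for q in probs])
```

A practical model would stack several multi-channel residual blocks per branch and be trained in a deep learning framework, with accuracy averaged over the six held-out folds. The list-based arithmetic here only keeps the example self-contained.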
