Search | Korea Science

N-Best Reranking for Improving Automatic Speech Recognition of Korean (N-Best Re-ranking에 기반한 한국어 음성 인식 성능 개선)

Joung Lee;Mintaek Seo;Seung-Hoon Na;Minsoo Na;Maengsik Choi;Chunghee Lee
- Annual Conference on Human and Language Technology / 2022.10a / pp.442-446 / 2022
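Automatic speech recognition (ASR), or speech-to-text (STT), refers to the set of processes and technologies by which a computer converts human spoken language into text data. As speech recognition is applied across a wide range of industries, the need for technology that is both highly accurate and applicable to diverse domains is steadily growing. Compared with existing prior research, however, Korean speech recognition has difficulty distinguishing plain from honorific speech and recognizing endings and particles, so improving performance by post-processing the recognition results is important. This paper therefore proposes a model that improves Korean speech recognition performance by re-ranking the N-best recognition results once they have been generated.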
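The N-best re-ranking idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' model: the interpolation formula, the `weight` parameter, the `toy_rescorer` function, and the sample hypotheses are all assumptions made for illustration. Each hypothesis keeps its ASR score, a second scorer rates the text, and the list is re-sorted by the combined score.

```python
from typing import Callable, List, Tuple

# A hypothesis is (text, ASR log-score); higher scores are better.
Hypothesis = Tuple[str, float]


def rerank_nbest(
    nbest: List[Hypothesis],
    rescorer: Callable[[str], float],
    weight: float = 0.5,
) -> List[Hypothesis]:
    """Sort hypotheses by an interpolation of ASR score and rescorer score."""

    def combined(hyp: Hypothesis) -> float:
        text, asr_score = hyp
        return (1.0 - weight) * asr_score + weight * rescorer(text)

    return sorted(nbest, key=combined, reverse=True)


# Hypothetical stand-in for a trained re-ranking model: it only knows
# one well-formed polite form and penalizes everything else.
KNOWN_FORMS = {"감사합니다"}


def toy_rescorer(text: str) -> float:
    return 0.0 if text in KNOWN_FORMS else -2.0


# Made-up N-best list in which the top ASR hypothesis has a wrong ending.
nbest = [("감사함니다", -1.0), ("감사합니다", -1.2), ("감사합니다만", -1.5)]

print(rerank_nbest(nbest, toy_rescorer)[0][0])
```

With `weight=0.0` the original ASR ordering is returned unchanged, which makes it easy to compare the recognizer's own ranking against the re-ranked one.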

A Development of the Power Distribution Map Auto Input & Positioning System for NDIS(New Distribution Information System) DB Construction (신배전정보시스템 DB구축을 위한 도면자동입력 및 위치보정 롱합시스템 개발)

Yi, Bong-Jae
- Proceedings of the KIEE Conference / 2003.07d / pp.2585-2587 / 2003
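To manage its power facilities efficiently, KEPCO adopted GIS early in the distribution field, built the New Distribution Information System (NDIS), completed a pilot operation, and is now expanding it nationwide in stages. Using GIS in day-to-day work requires that facility drawings be entered first, but manual entry costs considerable time and money, and data accuracy varies with the operator's skill. Resolving these problems at the root calls for research on techniques, and on their application, that minimize manual work by having a computer automatically recognize and enter facility locations, symbols, network connectivity, and attribute data. In particular, the relative-error correction problem that arises from using the national base map as the base map must also be solved. This development covers a method for recognizing utility poles and wires, the main facilities in distribution facility drawings, which depict the power supply equipment from the substation to the customer. Specifically, it applies recognition techniques to scanned images of distribution facility drawings hand-drawn on translucent film, automatically generates the information used by the geographic information system (facility details, installation locations, and installation status by wire type) as digital data, and also handles relative-error correction against the national base map.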

Automatic Recognition System for Wild edible greens using Leaf External Form (잎의 외형을 이용한 산나물 자동인식 시스템)

Kim, Seong-Jung;Ju, Hwi-Yoon;Kim, Hyun-Jung;Won, Il-Yong
- Proceedings of the Korea Information Processing Society Conference / 2014.11a / pp.980-983 / 2014
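This paper proposes an automatic recognition system for wild edible greens based on the external form of the leaf. To improve the system, accuracy was increased by using direction vectors in addition to extracting the leaf outline. To this end, the BP and HMM algorithms were used for improvement, and outline feature points were extracted and represented. The performance of the proposed system was verified experimentally, and reasonably meaningful results were obtained.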
https://doi.org/10.3745/PKIPS.y2014m11a.980

A Development of Cloud Based Auto Video Enhancement Service (클라우드 기반의 영상 자동 향상 서비스개발)

Park, Sang-oh;Choi, Seung-ho;Park, Sang-il
- Proceedings of the Korean Society of Broadcast Engineers Conference / 2018.06a / pp.130-132 / 2018
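With the recent expansion of one-person media, video editing by individuals is becoming more common. Popular video tutorials make it easy for beginners to get started, yet many people still find video production difficult, particularly brightness, contrast, and color correction. Professional video editing tools offer automatic correction, but each has drawbacks: Final Cut requires an Apple Mac environment; Adobe's programs lack a fully automatic function, involve heavy computation, and are less accessible because they are paid; and other programs are hard to install. This study therefore presents a cloud-based automatic video correction service that is easy, fast, and more accessible. We concluded that the final-stage cloud service should provide enhancement functions for stabilization, color correction, contrast correction, and light-dark (tone) correction, along with cut-level, scene-level, and object-level recognition services. As a first step in this research, this paper builds the cloud service and implements a per-frame video enhancement algorithm using OpenCV.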

Effective Process Improvement of Plating Line Based on Rack Auto-Identification (피도금걸이-랙 자동인식을 통한 도금라인 공정프로세스 효율화 방안)

Hwang, Jeseong;Song, Oksu;Jang, Minki;Lim, Wontae;Moon, Mikyeong
- Proceedings of the Korea Information Processing Society Conference / 2013.11a / pp.409-412 / 2013
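To address the problems of manual data management for plating racks in today's surface treatment industry, this paper proposes a system that uses a mobile RFID (Radio Frequency Identification) reader and smart mobile devices. To this end, we analyze the state transitions over a rack's life cycle and apply automatic identification technology based on an RFID system. The system is built so that each automatically identified rack performs the activities that trigger a transition from its current state, and it improves the efficiency of the plating line process by identifying various types of racks appropriately for each situation.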
https://doi.org/10.3745/PKIPS.y2013m11a.409
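The life-cycle state transitions described in the abstract above can be pictured as a small state machine keyed by RFID tag. The sketch below is only an illustration: the abstract does not list the actual states or activities, so the state names, the activity names, the `RackRegistry` class, and the tag ID are all hypothetical.

```python
from enum import Enum
from typing import Dict, List, Tuple


class RackState(Enum):
    # Hypothetical life-cycle states; the paper's real states are not given.
    IDLE = "idle"
    LOADED = "loaded"
    IN_PLATING = "in_plating"
    MAINTENANCE = "maintenance"


# (current state, activity) -> next state. Assumed for illustration only.
TRANSITIONS: Dict[Tuple[RackState, str], RackState] = {
    (RackState.IDLE, "load"): RackState.LOADED,
    (RackState.LOADED, "start_plating"): RackState.IN_PLATING,
    (RackState.IN_PLATING, "unload"): RackState.IDLE,
    (RackState.IDLE, "send_to_repair"): RackState.MAINTENANCE,
    (RackState.MAINTENANCE, "return"): RackState.IDLE,
}


class RackRegistry:
    """Tracks the current state of each rack by its RFID tag ID."""

    def __init__(self) -> None:
        self._states: Dict[str, RackState] = {}

    def register(self, tag_id: str) -> None:
        self._states[tag_id] = RackState.IDLE

    def allowed_activities(self, tag_id: str) -> List[str]:
        """Activities that are valid for the rack's current state."""
        state = self._states[tag_id]
        return sorted(a for (s, a) in TRANSITIONS if s is state)

    def on_tag_read(self, tag_id: str, activity: str) -> RackState:
        """Apply an activity to a scanned rack and return its new state."""
        state = self._states[tag_id]
        key = (state, activity)
        if key not in TRANSITIONS:
            raise ValueError(f"{activity!r} is not allowed in state {state.value}")
        self._states[tag_id] = TRANSITIONS[key]
        return self._states[tag_id]


registry = RackRegistry()
registry.register("RACK-001")  # made-up tag ID
print(registry.allowed_activities("RACK-001"))
```

Keeping the transitions in a single table means a mobile client only needs the scanned tag ID to look up which activities to offer for that rack.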

Cyber Threats Analysis of AI Voice Recognition-based Services with Automatic Speaker Verification (화자식별 기반의 AI 음성인식 서비스에 대한 사이버 위협 분석)

Hong, Chunho;Cho, Youngho
- Journal of Internet Computing and Services / v.22 no.6 / pp.33-40 / 2021
Automatic Speech Recognition (ASR) is a technology that analyzes human speech as speech signals and automatically converts them into character strings that humans can understand. Speech recognition technology has evolved from the basic level of recognizing a single word to the advanced level of recognizing sentences consisting of multiple words. In real-time voice conversation, a high recognition rate makes natural information delivery more convenient and expands the scope of voice-based applications. On the other hand, as speech recognition technology is actively applied, concerns about related cyber attacks and threats are also increasing. According to existing studies, research on the technology itself, such as the design of Automatic Speaker Verification (ASV) techniques and the improvement of accuracy, is being actively conducted. However, few studies analyze attacks and threats in depth and variety. In this study, we propose a cyber attack model that bypasses voice authentication by simply manipulating voice frequency and voice speed against AI voice recognition services equipped with automated identification technology, and we analyze cyber threats by conducting extensive experiments on the automated identification systems of commercial smartphones. Through this, we intend to convey the seriousness of the related cyber threats and raise interest in research on effective countermeasures.
https://doi.org/10.7472/jksii.2021.22.6.33

Deep learning-based speech recognition for Korean elderly speech data including dementia patients (치매 환자를 포함한 한국 노인 음성 데이터 딥러닝 기반 음성인식)

Jeonghyeon Mun;Joonseo Kang;Kiwoong Kim;Jongbin Bae;Hyeonjun Lee;Changwon Lim
- The Korean Journal of Applied Statistics / v.36 no.1 / pp.33-48 / 2023
In this paper we consider automatic speech recognition (ASR) for Korean speech data in which elderly persons randomly speak a sequence of words, such as animals and vegetables, for one minute. Most of the speakers are over 60 years old and some of them are dementia patients. The goal is to compare deep-learning based ASR models for such data and to find models with good performance. ASR is a technology by which computers recognize spoken words and convert them into written text. Recently, many deep-learning models with good performance have been developed for ASR. Training data for such models mostly consist of sentences. Furthermore, the speakers in such data are expected to pronounce words accurately in most cases. However, in our data, most of the speakers are over the age of 60 and often have incorrect pronunciation. Also, it is Korean speech data in which speakers randomly say a series of words, not sentences, for one minute. Therefore, pre-trained models based on typical training data may not be suitable for our data, and hence we train deep-learning based ASR models from scratch using our data. We also apply some data augmentation methods because the data set is small.
https://doi.org/10.5351/KJAS.2023.36.1.033

Design and Implementation of a User Activity Auto-recognition System based on Multimodal Sensor in Ubiquitous Computing Environment (유비쿼터스 컴퓨팅환경에서의 Multimodal Sensor 기반의 Health care를 위한 사용자 행동 자동인식 시스템 - Multi-Sensor를 이용한 ADL(activities of daily living) 지수 자동 측정 시스템)

Byun, Sung-Ho;Jung, Yu-Suk;Kim, Tae-Su;Kim, Hyun-Woo;Lee, Seung-Hwan;Cho, We-Duke
- HCI Society of Korea Conference Proceedings (한국HCI학회:학술대회논문집) / 2009.02a / pp.21-26 / 2009
A sensor system capable of automatically recognizing activities would enable many potential ubiquitous computing applications. This paper presents a new system for recognizing activities of daily living (ADL) such as walking, running, standing, sitting, and lying. The system is based on state-dependent motion analysis using a Tri-Accelerometer and a Zigbee tag. Two accelerometers are used to classify body and hand activities. Environmental and instrumental activities are classified based on hand interaction with objects, using object IDs.

A Study on the Automatic Fuel-Filling-Recognition system for a city bus (자동인식 주유량 처리 시스템에 관한 연구)

김현수;안병원;박중순;박영산;배철오;김철홍
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2001.05a / pp.414-417 / 2001
In this paper, the fuel filling system for a city bus was investigated in order to improve it. The proposed fuel filling system was designed to identify a bus's arrival time and to measure the volume of fuel filled. The system consisted of four parts (bus identification, an IBM PC, an interface card, and a fuel filling control system) together with a program for integrating all parts. It is believed that the information obtained by this system can be used to analyze drivers' driving habits and bus engine performance, and that the prime cost can be reduced accordingly.

Automatic Word-Spacing of Syllable Bi-gram Information for Korean OCR Postprocessing (음절 Bi-gram정보를 이용한 한국어 OCR 후처리용 자동 띄어쓰기)

전남열;박혁로
- Proceedings of the Korean Society for Cognitive Science Conference / 2000.06a / pp.95-100 / 2000
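The output of a character recognizer applied to scanned document images is processed through morphological analysis and eojeol (word-phrase) analysis to build large volumes of document information into a database and to enable full-text retrieval. However, data containing misrecognized characters or spacing errors cannot be used as-is for morphological or eojeol analysis. For Korean character recognition, the character-level recognition rate is about 90.5%, but the eojeol-level recognition rate, which reflects both character recognition errors and spacing errors, is markedly lower. To address this, we took the syllable characteristics of Korean into account and performed automatic word spacing without a dictionary, using a well-trained corpus and syllable-level bi-gram information. In experiments, spacing accuracy was about 86.2%, although it varies with the size of the training corpus and with information on the positions of spacing errors. Passing this result through character recognition post-processing that uses morphological analysis and language evaluation should greatly improve the recognition rate of character recognition systems.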
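A dictionary-free spacing step based on syllable bi-grams, as described in the abstract above, might look roughly like the following sketch. It is a simplified assumption rather than the paper's method: for each adjacent syllable pair it counts how often the training corpus puts a space between them, then inserts a space when that relative frequency exceeds a `threshold`. The function names, the threshold, and the two-sentence corpus are made up for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple

Pair = Tuple[str, str]


def train_spacing_model(corpus: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count, per adjacent syllable pair, occurrences with and without a space."""
    spaced: Counter = Counter()  # pair -> times a space separated the pair
    total: Counter = Counter()   # pair -> times the pair occurred at all
    for sentence in corpus:
        prev = None
        gap = False
        for ch in sentence:
            if ch == " ":
                gap = True
                continue
            if prev is not None:
                total[(prev, ch)] += 1
                if gap:
                    spaced[(prev, ch)] += 1
            prev = ch
            gap = False
    return spaced, total


def respace(text: str, spaced: Counter, total: Counter,
            threshold: float = 0.5) -> str:
    """Drop existing spaces, then re-insert them from bi-gram statistics."""
    syllables = text.replace(" ", "")
    if not syllables:
        return ""
    out = [syllables[0]]
    for prev, ch in zip(syllables, syllables[1:]):
        seen = total[(prev, ch)]
        # Unseen pairs get no space; seen pairs use the observed space ratio.
        if seen and spaced[(prev, ch)] / seen > threshold:
            out.append(" ")
        out.append(ch)
    return "".join(out)


# Tiny made-up training corpus with correct spacing.
corpus = ["나는 학교에 간다", "나는 집에 간다"]
spaced, total = train_spacing_model(corpus)
print(respace("나는학교에간다", spaced, total))
```

Because the model only looks at neighboring syllables, it needs no word list, which matches the dictionary-free setting the abstract describes; the cost is that pairs never seen in training are left unspaced.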

Search Result 2,017, Processing Time 0.037 seconds
