• Title/Summary/Keyword: Speech Recognition Technology

Noise Reduction for Korean Connected Digit Recognition through Telephone Channel (전화망 환경에서 한국어 숫자음 인식을 위한 잡음처리)

  • Kim Kyuhong;Kim Hoirin
    • Proceedings of the KSPS conference / 2003.05a / pp.211-214 / 2003
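  • In general, speech recognition performance degrades under the influence of noise. Korean connected digit recognition over the telephone network is one of the harder areas of speech recognition, because coarticulation lowers the recognition rate and because the telephone channel distorts the spectral envelope and limits the bandwidth of the speech signal. To reduce the influence of noise, this paper uses 2WF (2-stage Wiener Filter), SWP (SNR-dependent Waveform Processing), and CMN (Cepstrum Mean Normalization). 2WF reduces not only overall additive noise but also dynamic additive noise while causing little distortion to the formant structure of the speech signal. SWP can improve the overall SNR by emphasizing the portions of the speech waveform where the SNR is relatively high. CMN improves recognition performance by normalizing the effect of channel noise in the feature vectors. Experiments with these methods on a Korean connected-digit telephone speech DB showed that they reduce the influence of noise while minimizing distortion of the speech signal, improving digit recognition performance over the telephone network.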
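
The CMN step named in the abstract above can be illustrated with a minimal sketch. It assumes the cepstral features have already been extracted as a list of frames; the function name `cepstral_mean_normalize` and the toy values are hypothetical and are not taken from the paper. A fixed channel filter shows up as a roughly constant offset on each cepstral coefficient, so subtracting the per-utterance mean removes it.

```python
from typing import List


def cepstral_mean_normalize(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-utterance mean of each cepstral coefficient."""
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of every coefficient across all frames of the utterance.
    means = [sum(f[i] for f in frames) / n_frames for i in range(n_coeffs)]
    return [[f[i] - means[i] for i in range(n_coeffs)] for f in frames]


# Toy utterance (hypothetical values): 3 frames x 2 cepstral coefficients.
clean = [[1.0, -2.0], [3.0, 0.0], [2.0, 2.0]]
# A stationary channel is modeled here as a constant cepstral offset.
offset = [5.0, -1.0]
noisy = [[c + o for c, o in zip(frame, offset)] for frame in clean]

normalized_clean = cepstral_mean_normalize(clean)
normalized_noisy = cepstral_mean_normalize(noisy)
```

After normalization the two feature sequences coincide, which is the sense in which CMN cancels a fixed channel effect. It does not address additive noise, which the abstract assigns to 2WF and SWP.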

Improvement of Confidence Measure Performance using Background Model Set Algorithm (BMS 알고리즘을 이용한 거절기능 성능 향상)

  • Kim ByoungDon;Lee KyongRok;Kim JinYoung;Choi SeungHo
    • Proceedings of the KSPS conference / 2003.05a / pp.79-82 / 2003
  • In this paper, we propose applying the Background Model Set (BMS) algorithm, known from speaker verification, to overcome shortcomings in how the conventional confidence measure (CM) is calculated. The CM expresses the relative likelihood between recognized models and unrecognized models. The unrecognized models are known as antiphone models. When an antiphone model is composed, the probability and standard deviation are calculated using all phonemes. As a result, the antiphone CM performs poorly and recognition time increases. To address this problem, we study a method that uses the BMS algorithm to recompute the mean and standard deviation from only those antiphonemes that are near the phoneme for which the CM is calculated.
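
The idea of restricting the background statistics to nearby antiphonemes can be sketched as a cohort-normalized log-likelihood score. This is only an illustration under stated assumptions: the abstract does not give the CM formula, so the z-score form, the distance table, the log-likelihood values, and the names `nearest_background` and `background_normalized_score` are all hypothetical.

```python
import math
from typing import Dict, List


def nearest_background(distances: Dict[str, float], k: int) -> List[str]:
    """Pick the k competing phonemes closest to the recognized one."""
    return sorted(distances, key=distances.get)[:k]


def background_normalized_score(target_ll: float, background_lls: List[float]) -> float:
    """Z-score of the target log-likelihood against a background set."""
    if len(background_lls) < 2:
        raise ValueError("need at least two background scores")
    mean = sum(background_lls) / len(background_lls)
    var = sum((x - mean) ** 2 for x in background_lls) / len(background_lls)
    std = math.sqrt(var)
    if std == 0.0:
        raise ValueError("background scores have zero spread")
    return (target_ll - mean) / std


# Hypothetical log-likelihoods for one segment recognized as "a".
log_likelihood = {"a": -10.0, "e": -12.0, "o": -14.0, "k": -40.0, "s": -45.0}
# Hypothetical acoustic distances from "a" to every other phoneme.
distance_from_a = {"e": 1.0, "o": 2.0, "k": 8.0, "s": 9.0}

all_antiphones = [log_likelihood[p] for p in distance_from_a]
near_antiphones = [log_likelihood[p] for p in nearest_background(distance_from_a, k=2)]

score_all = background_normalized_score(log_likelihood["a"], all_antiphones)
score_near = background_normalized_score(log_likelihood["a"], near_antiphones)
```

Using only the nearest competitors also means fewer models are scored per segment, which is consistent with the recognition-time concern raised in the abstract.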

Artificial Intelligence Applications on Mobile Telecommunication Systems (AI의 이동통신시스템 적용)

  • Yeh, C.I.;Chang, K.S.;Ko, Y.J.
    • Electronics and Telecommunications Trends / v.37 no.4 / pp.60-69 / 2022
  • So far, artificial intelligence (AI)/machine learning (ML) has produced impressive results in speech recognition, computer vision, and natural language processing. AI/ML has recently begun to show promise as a viable means of improving the performance of 5G mobile telecommunication systems. This paper investigates standardization activities in 3GPP and the O-RAN Alliance regarding AI/ML applications in mobile telecommunication systems. Future trends in AI/ML technologies are also summarized. There appears to be no doubt that AI/ML, as an overarching technology in 6G, could contribute to every part of mobile systems, including the core, RAN, and air interface, in terms of performance enhancement, automation, cost reduction, and reduced energy consumption.

Development and Utilization of Speech Recognition Service for Ship Radio Communication (선박무선통신 음성인식 서비스 개발 및 활용)

  • Kwang-Il Kim;Sang-Lok Yoo
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2023.11a / pp.236-237 / 2023
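  • Ship radio communication equipment is essential for exchanging the safety information needed for navigation, vessel traffic monitoring and control information, and port entry and departure information, so ship navigators must always listen carefully to radio communications. In this study, we collected 500 hours of actual ship voice communication data for training, and used the Wav2Vec and Whisper models to develop and deploy Korean and English (maritime English) speech recognition models. The performance of the speech recognition model is 94.5% on a CER (Character Error Rate) basis, and the model is expected to be applicable to various fields related to ship operation.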
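
For reference, CER is conventionally computed as the character-level edit distance between the recognizer output and the reference transcript, divided by the reference length. The sketch below shows that standard definition. It is not the authors' evaluation code, and the function names are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

Under this definition a lower value is better, and the value can exceed 1.0 when the hypothesis contains many insertions.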

A Study on the Development of Language Education Service Platform for Teaching Assistance Robots (교사도우미 로봇을 활용한 어학교육 서비스 플랫폼 구축방안 연구)

  • Yoo, Gab-Sang;Choi, Jong-Chon
    • Journal of Digital Convergence / v.14 no.8 / pp.223-232 / 2016
  • This study focuses on a new teaching assistance robot platform and a cloud-based education service model that supports it on the server side. On the client side, the teacher assistant robot is intended to be used in elementary school classrooms to deliver the language education service platform. Emerging IoT technology will be adopted to provide a comfortable classroom environment and various media interfaces. An extensive review of precedents and case studies was conducted to identify the basic requirements of the proposed service platform. Embedded systems and technologies for image recognition, speech recognition, autonomous movement, display, touch screen, IR sensor, GPS, and temperature-humidity sensor were extensively investigated to complete the service. The key findings of this paper are an optimized service platform with a cloud server system and the potential for a smart classroom with an intelligent robot through the adoption of IoT and BIM technology.

A Study on the Implementation of RFID-Based Autonomous Navigation System for Robotic Cellular Phone (RCP) (RFID를 이용한 RCP 자율 네비게이션 시스템 구현을 위한 연구)

  • Choe Jae-Il;Choi Jung-Wook;Oh Dong-Ik;Kim Seung-Woo
    • Journal of Institute of Control, Robotics and Systems / v.12 no.5 / pp.480-488 / 2006
  • The industrial and economic importance of the CP (Cellular Phone) is growing rapidly. Combined with IT technology, the CP is one of the most attractive technologies of today. However, unless a new breakthrough is found, its growth may soon slow. RT (Robot Technology) is considered one of the most promising next-generation technologies. Unlike the industrial robots of the past, today's robots require advanced features such as soft computing, human-friendly interfaces, interaction techniques, speech recognition, and object recognition, among many others. In this paper, we present a new technological concept named RCP (Robotic Cellular Phone), which integrates RT and CP with the aim of opening a combined advancement of CP, IT, and RT. RCP consists of three sub-modules: $RCP^{Mobility}$ (RCP Mobility System), $RCP^{Interaction}$, and $RCP^{Integration}$. The main focus of this paper is $RCP^{Mobility}$, which combines the autonomous navigation system of RT mobility with the CP. Through $RCP^{Mobility}$, we can provide the CP with robotic functions such as auto-charging and real-world robotic entertainment. Ultimately, the CP may become a robotic pet for human beings. $RCP^{Mobility}$ consists of various controllers, two of the main ones being the trajectory controller and the self-localization controller. While the former is responsible for the wheel-based navigation of the RCP, the latter provides localization information for the moving RCP. With the coordinates acquired from the RFID-based self-localization controller, the trajectory controller refines the RCP's movement to achieve better navigation. In this paper, a prototype of $RCP^{Mobility}$ is presented. We describe the overall structure of the system and provide experimental results on RCP navigation.

Enhancing Korean Alphabet Unit Speech Recognition with Neural Network-Based Alphabet Merging Methodology (한국어 자모단위 음성인식 결과 후보정을 위한 신경망 기반 자모 병합 방법론)

  • Solee Im;Wonjun Lee;Gary Geunbae Lee;Yunsu Kim
    • Annual Conference on Human and Language Technology / 2023.10a / pp.659-663 / 2023
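  • To improve Korean speech recognition performance, this paper restructures the conventional speech recognition process into two stages: a jamo-level speech recognition model and a neural network-based jamo merging model. Because Korean writing combines letters into syllable blocks, speech recognition requires about 2,900 syllable units. This degrades recognition performance for syllables that rarely appear in the training dataset and raises the training cost. To address this, recognition can be performed over 51 jamo units (ㄱ-ㅎ, ㅏ-ㅞ) instead of syllables, after which the jamo-level results are merged into syllable-level Hangul [1]. Jamo-level recognition results can be merged by rule if the initial, medial, and final positions are taken into account. However, if the recognition output contains misrecognized jamo, the final merged result will contain errors. To solve this, we present a neural network-based jamo merging model. The model converts separated jamo-level input into complete Hangul sentences, and in doing so it can also correct errors, turning misrecognized jamo into correct Hangul sentences. Experiments were conducted on the Korean speech recognition corpus KsponSpeech, with Wav2Vec2.0 as the speech recognition model. Compared with the conventional rule-based jamo merging method, the proposed jamo merging model achieved relative improvements of 17.2% in character error rate (CER) and 13.1% in word error rate (WER).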
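
The rule-based merging baseline described above can be sketched with the Unicode Hangul syllable composition formula. The index tables follow the Unicode ordering of initial, medial, and final jamo; the greedy rule for deciding whether a consonant is a final or the next initial is a simplifying assumption for illustration, and `merge_jamo` is a hypothetical name, not the method of the paper or of reference [1].

```python
# Initial (19), medial (21), and final (28, including "none") jamo in Unicode order.
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def merge_jamo(jamo: str) -> str:
    """Greedily merge a flat jamo sequence into Hangul syllables."""
    out = []
    i, n = 0, len(jamo)
    while i < n:
        ch = jamo[i]
        # A syllable needs an initial consonant followed by a vowel.
        if ch in CHO and i + 1 < n and jamo[i + 1] in JUNG:
            cho = CHO.index(ch)
            jung = JUNG.index(jamo[i + 1])
            jong = 0
            i += 2
            # Take a final consonant unless it starts the next syllable.
            if i < n and jamo[i] in JONG and not (i + 1 < n and jamo[i + 1] in JUNG):
                jong = JONG.index(jamo[i])
                i += 1
            out.append(chr(0xAC00 + (cho * 21 + jung) * 28 + jong))
        else:
            # Anything that does not fit the pattern is passed through unchanged.
            out.append(ch)
            i += 1
    return "".join(out)


merged_ok = merge_jamo("ㅎㅏㄴㄱㅡㄹ")
merged_with_error = merge_jamo("ㅎㅏㄴㄱㄹ")  # medial vowel dropped by the recognizer
```

With a single dropped vowel the rule-based merge leaves stray jamo in the output, which is the failure mode the abstract says the neural merging model is meant to repair.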

Development of a Voice-activated Map Information Retrieval System based on MFC (MFC 기반 음성구동 수치지도정보 검색시스템의 구현)

  • Kim, Nag-Cheol;Kim, Tae-Soo;Jo, Myung-Hee;Chung, Hyun-Yeol
    • Journal of the Korean Association of Geographic Information Studies / v.3 no.1 / pp.69-77 / 2000
  • When digital map information is retrieved and analyzed with a mouse or keystrokes, the mouse operation for designating the range of the study area must be repeated several times. In this study, we propose a voice-activated map information retrieval system that eliminates such repetition, and we implement the system on a personal computer. For practical use, the system was built in two ways to control the window display: the traditional OLE (Object Linking and Embedding) method and the MFC (Microsoft Foundation Class) method. In the performance evaluation, the retrieval data for the digital map consisted of 68 words uttered by 3 male speakers, including attribute words and control words for the Susung-gu area of Taegu city on a 1:5,000 map. In online tests in an office environment, we obtained an average recognition rate of 98.02% and operating speeds of 5.39 seconds with OLE and 10.38 seconds with MFC. These results show that an information retrieval system using speech recognition is feasible for practical use with digital maps.

A Smart Refrigerator System based on Internet of Things (IoT 기반 스마트 냉장고 시스템)

  • Kim, Hanjin;Lee, Seunggi;Kim, Won-Tae
    • Journal of IKEEE / v.22 no.1 / pp.156-161 / 2018
  • Recently, as the population rapidly increases, food shortages and food waste are emerging as serious problems. To address them, various countries and enterprises are pursuing research and product development, such as studies of consumers' food purchasing patterns and the development of smart refrigerators using IoT technology. However, the smart refrigerators currently on sale are expensive and create additional waste through malfunction and breakage caused by their complicated configurations. In this paper, we propose a low-cost IoT-based smart refrigerator system that addresses these problems and manages ingredients efficiently. The system recognizes and registers ingredients through QR codes, image recognition, and speech recognition, and can provide various smart refrigerator services. To improve the accuracy of image recognition, we used a model based on a deep learning algorithm and showed that ingredients can be registered accurately.

A Computer Access System for the Physically Disabled Using Eye-Tracking and Speech Recognition (아이트래킹 및 음성인식 기술을 활용한 지체장애인 컴퓨터 접근 시스템)

  • Kwak, Seongeun;Kim, Isaac;Sim, Debora;Lee, Seung Hwan;Hwang, Sung Soo
    • Journal of the HCI Society of Korea / v.12 no.4 / pp.5-15 / 2017
  • Alternative computer access devices are one way for people with physical disabilities to fulfill their desire to participate in social activities. Most of these devices provide access to computers through the user's feet or head. However, controlling a mouse with the feet or head is not easy for people with physical disabilities. In this paper, we propose a computer access system for the physically disabled. The proposed system moves the mouse pointer using only the user's gaze, through eye-tracking technology. The mouse can be clicked with an external button that is relatively easy to press, and text can be entered easily and quickly through voice recognition. The system also provides detailed functions such as right-click, double-click, drag, an on-screen keyboard, internet access, and scrolling.