• Title/Summary/Keyword: Speech Recognition Technology

Pitch Period Detection Algorithm Using Rotation Transform of AMDF (AMDF의 회전변환을 이용한 피치 주기 검출 알고리즘)

  • Seo, Hyun-Soo;Bae, Sang-Bum;Kim, Nam-Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • v.9 no.2
    • /
    • pp.1019-1022
    • /
    • 2005
  • With the rapid development of information and communication technology, much research on speech signal processing has been carried out. The pitch period is an important factor in many applications such as speech recognition, speaker identification, and speech analysis and synthesis. Many pitch detection algorithms have therefore been proposed in the time and frequency domains. The AMDF (average magnitude difference function), one of the time-domain pitch detection algorithms, takes the time interval from valley to valley as the pitch period. However, selecting the valley points used to detect the pitch period increases the complexity of the algorithm. In this paper, we propose a pitch detection algorithm using a rotation transform of the AMDF that takes the global minimum valley point as the pitch period and sets a threshold for the phoneme in the beginning portion so that it is excluded from pitch period selection. We compare the proposed method with existing methods through simulation.
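
The baseline AMDF described in this abstract can be sketched as follows. This is a minimal illustration of the plain AMDF with a global-minimum search over a lag range, not the authors' rotation-transform method; the function names, the lag bounds, and the synthetic 100 Hz test tone are assumptions made for the example.

```python
import numpy as np

def amdf(frame, max_lag):
    """Average magnitude difference function for lags 0 .. max_lag - 1."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # For each lag, average |x[t + lag] - x[t]| over the overlapping samples.
    return np.array([
        np.mean(np.abs(frame[lag:] - frame[:n - lag]))
        for lag in range(max_lag)
    ])

def estimate_pitch_period(frame, min_lag, max_lag):
    """Return the lag (in samples) of the deepest AMDF valley in [min_lag, max_lag)."""
    d = amdf(frame, max_lag)
    # Skip lags below min_lag: the AMDF is trivially zero at lag 0.
    return min_lag + int(np.argmin(d[min_lag:]))

# Synthetic voiced frame (assumed): 100 Hz sine at 8 kHz, true period 80 samples.
fs = 8000
t = np.arange(400) / fs
frame = np.sin(2 * np.pi * 100 * t)
period = estimate_pitch_period(frame, min_lag=20, max_lag=160)
print(period, fs / period)  # period in samples, pitch in Hz
```

On real speech the AMDF has several valleys of similar depth, which is the valley-selection difficulty the abstract refers to; the lag bounds here simply restrict the search to a plausible pitch range.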

Analysis of IT Technology through the Trends in Home Video Game Console (가정용 게임기 동향을 통해 본 IT 기술 분석)

  • Bae, Jung-Min;Bae, Yu-Mi;Jung, Sung-Jae;Jang, Rea-Young;Sung, Kyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2014.05a
    • /
    • pp.675-678
    • /
    • 2014
  • The penetration of home video game consoles was once comparable to that of personal computers, but growth has slowed since the advent of smartphones, tablets, and other mobile devices. Nevertheless, game consoles actively introduce new IT technologies not available in PC and mobile games, and they still hold a firm position in the relevant market. This paper reviews the history of home video game consoles, contemporary trends, and company trends, and analyzes the new IT technologies applied to gaming. The home video game console market has become an arena for new IT technologies with the introduction of gesture recognition, speech recognition, media facade, and virtual reality technologies.

Method of Automatically Generating Metadata through Audio Analysis of Video Content (영상 콘텐츠의 오디오 분석을 통한 메타데이터 자동 생성 방법)

  • Sung-Jung Young;Hyo-Gyeong Park;Yeon-Hwi You;Il-Young Moon
    • Journal of Advanced Navigation Technology
    • /
    • v.25 no.6
    • /
    • pp.557-561
    • /
    • 2021
  • Metadata has become an essential element for recommending video content to users. However, it is entered manually by video content providers. This paper studies a method for automatically generating metadata in place of the existing manual input method. In addition to the method of extracting emotion tags from a previous study, we studied a method for automatically generating genre and country-of-production metadata from movie audio. The genre was extracted from the audio spectrogram using ResNet34, an artificial neural network model, with transfer learning, and the language spoken in the movie was detected through speech recognition. The results confirm the possibility of automatically generating metadata through artificial intelligence.

Gendered innovation for algorithm through case studies (음성·영상 신호 처리 알고리즘 사례를 통해 본 젠더혁신의 필요성)

  • Lee, JiYeoun;Lee, Heisook
    • Journal of Digital Convergence
    • /
    • v.16 no.12
    • /
    • pp.459-466
    • /
    • 2018
  • Gendered innovations is a term used by policy makers and academics to refer to the process of creating better research and development (R&D) for both men and women. In this paper, we analyze the literature on image and speech signal processing that can be used in ICT and examine the importance of gendered innovations through case studies. We searched the latest domestic and foreign literature on image and speech signal processing based on gender research and selected a total of 9 papers. In terms of gender analysis, research subjects, research environment, and research design are examined separately. In particular, through case analysis of algorithms for elderly voice signal processing, machine learning, machine translation, and facial gender recognition, we found that existing algorithms exhibit gender bias, which means that gender analysis is required. We also propose a gendered innovations method that integrates sex and gender analysis into algorithm development. Gendered innovations in ICT can contribute to the creation of new markets by developing products and services that reflect the needs of both men and women.

Performance Analyzer for Embedded AI Processor (내장형 인공지능 프로세서를 위한 성능 분석기)

  • Hwang, Dong Hyun;Yoon, Young Hyun;Han, Chang Yeop;Lee, Seung Eun
    • Journal of Internet Computing and Services
    • /
    • v.21 no.5
    • /
    • pp.149-157
    • /
    • 2020
  • Recently, as interest in artificial intelligence has increased, many studies have been conducted to implement AI processors. However, an AI processor requires functional verification as well as performance verification of whether it is suitable for the target application. In this paper, we propose an AI processor performance analyzer that can verify application performance and explore the limitations of the processor. Using the performance analyzer, we explore the limitations of the AI processor and optimize the AI model to fit the processor in image recognition and speech recognition applications.

Study on Gesture and Voice-based Interaction in Perspective of a Presentation Support Tool

  • Ha, Sang-Ho;Park, So-Young;Hong, Hye-Soo;Kim, Nam-Hun
    • Journal of the Ergonomics Society of Korea
    • /
    • v.31 no.4
    • /
    • pp.593-599
    • /
    • 2012
  • Objective: This study aims to implement a non-contact gesture-based interface for presentations and to analyze its effect as an information transfer aid. Background: With rapid technological growth in the UI/UX area and the appearance of smart service products that require new human-machine interfaces, research on control devices using gesture recognition or speech recognition is being conducted. However, relatively few quantitative studies have examined the practical effects of these new interface types, while system implementation work is very popular. Method: The system presented in this study is implemented with the KINECT® sensor from Microsoft Corporation. To investigate whether the proposed system is effective as a presentation support tool, we conducted experiments by giving several lectures to 40 participants in both a traditional lecture room (keyboard-based presentation control) and a non-contact gesture-based lecture room (KINECT-based presentation control), evaluating their interest and immersion with respect to lecture content and lecturing method, and analyzing their understanding of the lecture content. Result: Using ANOVA, we examined whether the gesture-based presentation system is effective as a presentation support tool depending on the difficulty level of the content. Conclusion: A non-contact gesture-based interface is a meaningful supportive device when delivering easy and simple information. However, the effect can vary with the content and the difficulty level of the information provided. Application: The results presented in this paper may help in designing new human-machine (computer) interfaces for communication support tools.

Optimal Design of a MEMS-type Piezoelectric Microphone (MEMS 구조 압전 마이크로폰의 최적구조 설계)

  • Kwon, Min-Hyeong;Ra, Yong-Ho;Jeon, Dae-Woo;Lee, Young-Jin
    • Journal of Sensor Science and Technology
    • /
    • v.27 no.4
    • /
    • pp.269-274
    • /
    • 2018
  • Microphones with high sensitivity and a high signal-to-noise ratio (SNR) are essential for a broad range of automatic speech recognition applications. Piezoelectric microphones have several advantages over conventional capacitor microphones, including high stiffness and high SNR. In this study, we designed a new piezoelectric membrane structure using the finite element method (FEM) and an optimization technique to improve the sensitivity of the transducer, which has a high-quality AlN piezoelectric thin film. The simulation demonstrated that the sensitivity depends critically on the inner radius of the top electrode, the outer radius of the membrane, and the thickness of the piezoelectric film in the microphone. The optimized piezoelectric transducer structure showed a much higher sensitivity than the conventional piezoelectric transducer structure. This study provides a clear path toward micro-scale high-sensitivity piezoelectric microphones that have a simple manufacturing process, a wide frequency range, and a low DC bias voltage.

A clustering algorithm of statistical language model and its application on speech recognition (통계적 언어 모델의 clustering 알고리즘과 음성인식에의 적용)

  • Kim, Woo-Sung;Koo, Myoung-Wan
    • Annual Conference on Human and Language Technology
    • /
    • 1996.10a
    • /
    • pp.145-152
    • /
    • 1996
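  • Developing a continuous speech recognition system requires a language model that exploits the grammatical constraints of the language. A language model based on grammatical rules has the drawback that an expert must write every rule by hand. A statistical language model does not require grammatical information to be created manually, but all of that information must instead be learned through training, so the amount of training data required increases enormously. A class-based language model can achieve a similar effect with a small amount of data. It is also very useful for implementing real-time systems, because it reduces the search space when coupled with speech recognition. Here, we applied an algorithm that finds classes automatically to the corpus of a hotel reservation system and analyzed the results. Because the corpus itself has clear grammatical regularities, the results were similar to those obtained with heuristically assigned classes, but the algorithm should be very useful as the corpus grows. When the initial map was assigned heuristically and the algorithm was then applied, a slight performance improvement was observed. Finally, integrating the model with a speech recognition system produced similar results and showed that research is needed on reflecting acoustic characteristics in the language model as well.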
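
A class-based bigram model of the kind this abstract describes can be sketched as follows, using the common factorization P(w | w_prev) ≈ P(c | c_prev) · P(w | c). This is a minimal illustration, not the authors' clustering algorithm: the classes are assigned by hand, and the toy corpus, class names, and function name are assumptions made for the example.

```python
from collections import Counter

def train_class_bigram(sentences, word2class):
    """Return prob(prev_word, word) estimated as P(c | c_prev) * P(w | c)."""
    class_bigrams, word_counts, class_totals = Counter(), Counter(), Counter()
    for sent in sentences:
        classes = [word2class[w] for w in sent]
        word_counts.update(sent)
        class_totals.update(classes)
        class_bigrams.update(zip(classes, classes[1:]))

    def prob(prev_word, word):
        c_prev, c = word2class[prev_word], word2class[word]
        # Number of times c_prev was observed as a bigram history.
        history = sum(n for (a, _), n in class_bigrams.items() if a == c_prev)
        if history == 0:
            return 0.0
        p_class = class_bigrams[(c_prev, c)] / history
        p_word = word_counts[word] / class_totals[c]
        return p_class * p_word

    return prob

# Toy hotel-reservation corpus with a hand-made class map (both hypothetical).
corpus = [
    ["book", "single", "room"],
    ["reserve", "double", "room"],
    ["book", "double", "room"],
]
word2class = {
    "book": "VERB", "reserve": "VERB",
    "single": "ROOMTYPE", "double": "ROOMTYPE",
    "room": "NOUN",
}
prob = train_class_bigram(corpus, word2class)
print(prob("reserve", "single"))
```

The word pair "reserve single" never occurs in the toy corpus, yet it receives a nonzero probability because both words share classes with observed pairs. This is the sense in which a class-based model needs less training data than a word-level one.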

Post-Processing of Speech Recognition Using Phonological Variables and Improved Edit-distance (발음 변이와 개선된 편집 거리를 이용한 음성 인식 후처리)

  • Kim, Yejin;Park, Youngmin;Kang, Sangwoo;Jung, Sangkeon;Lee, Cheongjae;Seo, Jungyun
    • Annual Conference on Human and Language Technology
    • /
    • 2014.10a
    • /
    • pp.9-12
    • /
    • 2014
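  • This paper proposes a post-processing method for misrecognized proper nouns. Statistical methods for speech recognition post-processing have been actively studied in recent years. However, statistical methods are difficult to apply effectively to proper nouns, because collecting large amounts of data is costly. This paper therefore proposes a technique that improves the edit-distance algorithm by taking pronunciation variation into account. We evaluated the correction performance for misrecognized proper nouns, and P@3 showed a 55% improvement rate over the comparison model.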
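
One way to make edit distance sensitive to pronunciation is to charge less for substituting easily confused sounds. The sketch below illustrates that general idea only; it is not the authors' algorithm. The confusable pairs, the cost of 0.5, the romanized example strings, and the function names are all assumptions made for the example.

```python
def weighted_edit_distance(a, b, sub_cost):
    """Levenshtein distance whose substitution cost is given by sub_cost(x, y)."""
    prev = [float(j) for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        cur = [float(i)]
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1.0,                 # delete x
                cur[j - 1] + 1.0,              # insert y
                prev[j - 1] + sub_cost(x, y),  # substitute x with y, or match
            ))
        prev = cur
    return prev[-1]

# Hypothetical pairs of sounds that a recognizer tends to confuse.
CONFUSABLE = {frozenset("pb"), frozenset("td"), frozenset("kg")}

def sub_cost(x, y):
    if x == y:
        return 0.0
    return 0.5 if frozenset((x, y)) in CONFUSABLE else 1.0

# Rank candidate proper nouns for a misrecognized string; the top few
# entries are what a precision-at-k measure such as P@3 evaluates.
recognized = "dan"
candidates = ["man", "dam", "tan"]
ranked = sorted(candidates, key=lambda c: weighted_edit_distance(recognized, c, sub_cost))
print(ranked)
```

With unit substitution costs all three candidates would tie at distance 1; the reduced cost for the assumed t/d confusion is what moves "tan" to the front.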

Subword Modeling of Vocabulary Independent Speech Recognition Using Phoneme Clustering (음소 군집화 기법을 이용한 어휘독립음성인식의 음소모델링)

  • Koo Dong-Ook;Choi Joon Ki;Yun Young-Sun;Oh Yung-Hwan
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • autumn
    • /
    • pp.33-36
    • /
    • 2000
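  • Vocabulary-independent isolated word recognition recognizes a frequently changing target vocabulary using acoustic models of sub-word units trained in advance. In this paper, we built a vocabulary-independent speech recognition system using a small speech database. Recognition experiments were carried out while varying the threshold of a phoneme clustering method based on the back-off technique, which is effective for handling unseen context-dependent sub-words in a small speech database. In addition, the problem of context-dependent sub-word models being biased toward the training database because of insufficient training data was addressed by merging the context-dependent and context-independent sub-word models using deleted interpolation. As a result, speech recognition performance improved.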
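
The merging step mentioned in this abstract can be illustrated with a simplified sketch of deleted interpolation: a context-dependent score and a context-independent score are combined as lam * p_cd + (1 - lam) * p_ci, and the weight lam is estimated on data held out from training. The code below shows only the weight estimation by expectation-maximization; the score values and function names are assumptions for illustration, not the authors' setup.

```python
import math

def interpolate(p_cd, p_ci, lam):
    """Linear interpolation of context-dependent and context-independent scores."""
    return lam * p_cd + (1.0 - lam) * p_ci

def held_out_log_likelihood(cd_scores, ci_scores, lam):
    return sum(math.log(interpolate(a, b, lam)) for a, b in zip(cd_scores, ci_scores))

def estimate_lambda(cd_scores, ci_scores, iterations=50):
    """EM estimate of the interpolation weight on held-out scores."""
    lam = 0.5
    for _ in range(iterations):
        # E-step: share of each held-out item explained by the context-dependent model.
        posteriors = [lam * a / interpolate(a, b, lam)
                      for a, b in zip(cd_scores, ci_scores)]
        # M-step: the new weight is the average share.
        lam = sum(posteriors) / len(posteriors)
    return lam

# Hypothetical held-out scores. The context-dependent model is sharp on
# contexts it has seen and gives 0.0 to an unseen one; the
# context-independent model is flatter but never zero.
cd_scores = [0.9, 0.0, 0.8, 0.05]
ci_scores = [0.3, 0.2, 0.3, 0.2]
lam = estimate_lambda(cd_scores, ci_scores)
print(lam, held_out_log_likelihood(cd_scores, ci_scores, lam))
```

Because the weight is fitted on data the context-dependent model did not see, it stays below 1 and the interpolated model keeps some mass from the more robust context-independent model, which counters the bias toward the training database.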
