• Title/Summary/Keyword: voice search

In Search of Models in Speech Communication Research

  • Hiroya, Fujisaki
    • Phonetics and Speech Sciences
    • /
    • v.1 no.1
    • /
    • pp.9-22
    • /
    • 2009
  • This paper first presents the author's personal view on the importance of modeling in scientific research in general, and then describes two of his works toward modeling certain aspects of human speech communication. The first work is concerned with the physiological and physical mechanisms of controlling the voice fundamental frequency of speech, which is an important parameter for expressing information on tone, accent, and intonation. The second work is concerned with the cognitive processes involved in a discrimination test of speech stimuli, which gives rise to the phenomenon of so-called categorical perception. They are meant to illustrate the power of models based on deep understanding and precise formulation of the functions of the mechanisms/processes that underlie observed phenomena. Finally, it also presents the author's view on some models that are yet to be developed.

Algorithm for Concatenating Multiple Phonemic Units for Small Size Korean TTS Using RE-PSOLA Method

  • Bak, Il-Suh;Jo, Cheol-Woo
    • Speech Sciences
    • /
    • v.10 no.1
    • /
    • pp.85-94
    • /
    • 2003
  • This paper proposes an algorithm to reduce the size of a Text-to-Speech database. The algorithm is based on the characteristics of Korean phonemic units. From the initial database, a reduced phoneme unit set is derived according to the articulatory similarity of concatenating phonemes. The speech data consist of 1000 phonetically balanced sentences read by one female announcer. All the recorded speech was then segmented by phoneticians. The total size of the original speech data is about 640 MB, including the laryngograph signal. RE-PSOLA (Residual-Excited Pitch Synchronous Overlap and Add) was used to synthesize the waveform. The voice quality of the synthesized speech was compared with the original speech in terms of spectrographic information and objective tests. The quality of the synthesized speech was not much degraded when the size of the synthesis DB was reduced from 320 MB to 82 MB.

A Study Video using Image and Voice Search (음성과 이미지를 이용한 동영상 검색에 관한 연구)

  • Sin, In-Gyeong;Park, Sung-Hyun;Ahn, Hyo-Chang;Rhee, Sang-Burm
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2012.11a
    • /
    • pp.568-571
    • /
    • 2012
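  • The construction of high-speed information networks, the rapid spread of personal computers, and advances in multimedia technology, which together form the information infrastructure of the information society, are opening a new chapter in information services. Video data contains not only text but also a variety of meaningful multimedia information, such as image and audio information. In this paper, the audio and the images are separated from a video. For the audio, the speech stream is segmented and restored, and the speech is converted to text to build a text index file. For the images, image shots are detected using image segmentation and histograms. The two index files are then used for indexing and applied to video search.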

Glanceable and Informative WearOS User Interface for Kids and Parents

  • Kim, Siyeon;Yoon, Hyoseok
    • Journal of Multimedia Information System
    • /
    • v.8 no.1
    • /
    • pp.17-22
    • /
    • 2021
  • This paper proposes a wearable user interface intended for kids and parents using WearOS smartwatches. We first review what constitutes a kids' smartwatch and then design UI components for watchfaces to be used by kids and parents. Different UI components covering activity, education, voice search, app usage, video, location, health, and quick dial are described. These components are implemented either as complications or on watchfaces and may require on-device standalone functions, cross-device communication, and an external database. We introduce a theme-based, amusing UI for kids, whereas simple and easily accessible components are recommended for the parents' watchface. To illustrate use cases, we present 3 scenarios for enhancing communication between parents and child. To show the feasibility and potential of our approach, we implement our proof of concept using commercial smartwatches, smartphones, and an external cloud database. Furthermore, the performance of checking app usage on different devices is presented, followed by a discussion of limitations and future work.

A Development of Speech Recognition System for Mobile Card Search (모바일 명함 검색을 위한 음성인식시스템 구현)

  • Hong, In-Suk;Ko, You-Jung;Kim, Yoon-Joong
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2009.04a
    • /
    • pp.138-141
    • /
    • 2009
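  • A mobile business card management system makes it easy to register and search business cards on a mobile device. However, because the screen of a mobile device is small, users face the inconvenience of entering search terms with a pen in order to use the information. To solve this problem, the need for a VUI (Voice User Interface) that processes commands by voice has increased. In addition, the limited memory of mobile devices makes it difficult to embed a speech recognition engine. In this paper, we therefore build a speech recognition system that receives speech from a mobile terminal and returns the recognition result to the terminal, and we configure this recognition system and the mobile client system as a web service environment that supports distributed processing.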

Verification of Automatic PAR Control System using DEVS Formalism (DEVS 형식론을 이용한 공항 PAR 관제 시스템 자동화 방안 검증)

  • Sung, Chang-ho;Koo, Jung;Kim, Tag-Gon;Kim, Ki-Hyung
    • Journal of the Korea Society for Simulation
    • /
    • v.21 no.3
    • /
    • pp.1-9
    • /
    • 2012
  • This paper proposes an automatic precision approach radar (PAR) control system that uses digital signals to increase the safety of aircraft, and the discrete event systems specification (DEVS) methodology is used to verify the proposed system. Traditionally, a landing aircraft is controlled by the human voice of a final approach controller. However, the voice information can be lost during transmission, and pilots may also act improperly because of inaccurate auditory signals. The proposed system enables stable operation of the aircraft regardless of the pilot's capability. Communicating DEVS (C-DEVS) is used to analyze and verify the behavior of the proposed system. A composed C-DEVS atomic model has the overall composed discrete state sets of the models, and the state sequence acquired through a full state search is used to verify the safety and liveness of the system behavior. The C-DEVS model of the proposed system shows the same behavior as the traditional PAR control system.

Lens Position Error Compensated Fast Auto-focus Algorithm in Mobile Phone Camera Using VCM (VCM을 이용한 휴대폰 카메라에서의 렌즈 위치 오차 보상 고속 자동 초점 알고리즘)

  • Han Chan-Ho;Kim Tae-Kyu;Kwon Seong-Geun
    • Journal of Korea Multimedia Society
    • /
    • v.9 no.5
    • /
    • pp.585-594
    • /
    • 2006
  • Due to size limits, most mobile phone cameras adopt a voice coil motor (VCM) instead of a step motor to control auto-focus. An optical system using a VCM has the property that the focus values vary even when the same current is applied. This means that a lens position error occurs due to the characteristics of the VCM. In this paper, an algorithm is proposed to compensate for the lens position error using the step size and the search count of each stage. In the proposed algorithm, a -7 step middle searching stage is inserted into the conventional searching algorithm for fast auto-focus searching, and the final searching step size is set to +1 for precise focus control. In the experimental results, the proposed algorithm found the focus value faster than the conventional algorithm. Moreover, the image quality obtained with the proposed algorithm was superior to that of the conventional algorithm.

Big Data Analysis Method for Recommendations of Educational Video Contents (사용자 추천을 위한 교육용 동영상의 빅데이터 분석 기법 비교)

  • Lee, Hyoun-Sup;Kim, JinDeog
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.12
    • /
    • pp.1716-1722
    • /
    • 2021
  • Recently, the capacity of video content delivery services has been increasing significantly. Therefore, the importance of user recommendation is increasing. In addition, this content has a variety of characteristics, making it difficult to express them properly with only a few keywords (elements used in the search, such as titles, tags, topics, and words) specified by the user. Consequently, existing recommendation systems that use user-defined keywords are limited in that they do not properly reflect the characteristics of objects. In this paper, we compare the efficiency of a method using voice data-based subtitles with that of an image comparison method using keyframes in the recommendation module of educational video service systems. Furthermore, based on the experimental results, we propose the types and environments of video content in which each analysis technique can be used efficiently.

Personalized Smart Mirror using Voice Recognition (음성인식을 이용한 개인맞춤형 스마트 미러)

  • Dae-Cheol, Kang;Jong-Seok, Lim;Gil-Ho, Lee;Beom-Hee, Lee;Hyoung-Keun, Park
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.17 no.6
    • /
    • pp.1121-1128
    • /
    • 2022
  • Information about the present invention is made available for business use. You are helping to use the LCD, you can't use the LCD screen. During software configuration, Raspbian was used to provide the system environment. We made our way through the menu and made our financial through play. It provides various information such as weather, weather, apps, streamer music, and web browser search function, and it can be charged. Currently, the 'Google Assistant' will be provided through the GUI within a predetermined time.

A Basic Performance Evaluation of the Speech Recognition APP of Standard Language and Dialect using Google, Naver, and Daum KAKAO APIs (구글, 네이버, 다음 카카오 API 활용앱의 표준어 및 방언 음성인식 기초 성능평가)

  • Roh, Hee-Kyung;Lee, Kang-Hee
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology
    • /
    • v.7 no.12
    • /
    • pp.819-829
    • /
    • 2017
  • In this paper, we first describe the current state of speech recognition technology and identify the basic speech recognition techniques and algorithms, and then explain the API code flow required for speech recognition. We use the application programming interfaces (APIs) of Google, Naver, and Daum KaKao, which operate the best-known search engines among the speech recognition API providers, to create a voice recognition app in Android Studio. We then perform a speech recognition experiment on standard language and dialects according to gender, age, and region, and organize the recognition rates into a table. Experiments were conducted for Gyeongsang-do, Chungcheong-do, and Jeolla-do, provinces where the dialects are strong. Comparative experiments were also conducted on standardized dialects. The accuracy of the resulting sentences is checked in terms of word spacing, final consonants, postpositions, and words, and the number of errors of each type is counted. As a result, we aim to introduce the advantages of each API according to its speech recognition rate and to establish a basic framework for the most efficient use.