• Title/Summary/Keyword: voice extract

Search Results: 72

Isolated Word Recognition Using k-clustering Subspace Method and Discriminant Common Vector (k-clustering 부공간 기법과 판별 공통벡터를 이용한 고립단어 인식)

  • Nam, Myung-Woo
    • Journal of the Institute of Electronics Engineers of Korea TE / v.42 no.1 / pp.13-20 / 2005
  • In this paper, Korean isolated words are recognized using the common vector extraction method (CVEM) proposed by M. Bilginer et al. CVEM readily extracts the common properties of training voice signals without complex calculation and shows high recognition accuracy. However, CVEM has two problems: it cannot handle a large number of training voices, and the extracted common vectors carry no discriminant information. Because obtaining optimal common vectors for a given voice class requires training on many varied voices, CVEM cannot sustain high recognition accuracy, and the absence of discriminant information among common vectors can be a source of critical errors. To solve these problems and improve the recognition rate, a k-clustering subspace method and a discriminant common vector method (DCVEM) are proposed. Various experiments on a voice signal database built by ETRI confirm the validity of the proposed methods: the experimental results show improved performance, and the proposed methods resolve the problems of CVEM without additional computational burden.
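
The core of the common vector approach is compact enough to sketch. Below is a minimal NumPy illustration (not the authors' code, and without the k-clustering or discriminant extensions the paper proposes): the common vector of a class is the component of any training vector orthogonal to the difference subspace spanned by differences of the training vectors.

```python
import numpy as np

def common_vector(A):
    """Common vector of one class via difference-subspace projection.

    A: (m, n) array of m training feature vectors of dimension n (m <= n).
    Returns the component of the training vectors orthogonal to the
    difference subspace; it is identical for every training vector.
    """
    diffs = A[1:] - A[0]                  # difference vectors b_i = a_i - a_1
    Q, _ = np.linalg.qr(diffs.T)          # orthonormal basis of the difference subspace
    return A[0] - Q @ (Q.T @ A[0])        # remove the projection onto that subspace

# toy check: the result does not depend on which vector serves as reference
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 64))              # 5 training feature vectors, dimension 64
cv1 = common_vector(A)
cv2 = common_vector(np.roll(A, -2, axis=0))  # same class, different reference order
assert np.allclose(cv1, cv2)
```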

Implementation of User Recommendation System based on Video Contents Story Analysis and Viewing Pattern Analysis (영상 스토리 분석과 시청 패턴 분석 기반의 추천 시스템 구현)

  • Lee, Hyoun-Sup;Kim, Minyoung;Lee, Ji-Hoon;Kim, Jin-Deog
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.12 / pp.1567-1573 / 2020
  • The development of Internet technology has brought the era of one-person media. Individuals produce content on their own and upload it to online services, and many users watch that content on Internet-connected devices. Currently, most users find the content they want through the search functions these services provide, which rely on information entered by the uploader. In an environment where content must be retrieved from such limited keyword data, search results often include information the user does not want. To solve this problem, this paper presents a system that actively analyzes the videos in an online service and a way to extract and reflect the characteristics each video holds. The research extracts morphemes from the story content carried in a video's voice data and analyzes them with big data technology.
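
The pipeline the abstract describes, transcribing a video's audio and mining the transcript for morphemes, can be sketched roughly as follows. The speech_recognition and KoNLPy libraries and the file name are assumptions standing in for whatever speech-to-text engine and Korean morpheme analyzer the authors actually used.

```python
from collections import Counter

import speech_recognition as sr
from konlpy.tag import Okt

recognizer = sr.Recognizer()
with sr.AudioFile("video_audio.wav") as source:   # hypothetical extracted audio track
    audio = recognizer.record(source)
transcript = recognizer.recognize_google(audio, language="ko-KR")  # speech-to-text

morphemes = Okt().morphs(transcript)              # Korean morpheme segmentation
keywords = Counter(morphemes).most_common(20)     # candidate story keywords
print(keywords)
```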

A Study on the Spectrum Variation of Korean Speech (한국어 음성의 스펙트럼 변화에 관한 연구)

  • Lee Sou-Kil;Song Jeong-Young
    • Journal of Internet Computing and Services / v.6 no.6 / pp.179-186 / 2005
  • Using the frequency characteristics of voices, we can extract and analyze their spectra. In the voice spectrum, monophthongs are considered stable, but when a consonant meets a vowel within a syllable or word, the spectrum changes substantially, which is the biggest obstacle to phoneme-level speech recognition. In this study, using the Mel cepstrum and Mel band, which account for the frequency band and auditory information, we analyze the spectrum of every consonant and vowel, systematize the spectral changes in the voice that reflect auditory features, and finally present a basis for segmenting voices at the phoneme level.
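
A minimal sketch of this kind of Mel-domain analysis, assuming librosa and a hypothetical speech.wav: Mel band energies and Mel cepstral coefficients are computed, and a frame-to-frame spectral change measure serves as a crude cue for consonant-vowel transitions. The thresholding is illustrative, not the paper's segmentation rule.

```python
import librosa
import numpy as np

y, sr = librosa.load("speech.wav", sr=16000)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40)          # Mel band spectrum
mfcc = librosa.feature.mfcc(S=librosa.power_to_db(mel), n_mfcc=13)   # Mel cepstrum

# spectral flux between adjacent frames as a crude phoneme-boundary cue
flux = np.sqrt((np.diff(mfcc, axis=1) ** 2).sum(axis=0))
boundaries = np.where(flux > flux.mean() + 2 * flux.std())[0]
print("candidate segmentation frames:", boundaries)
```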


Emotion Recognition Based on Frequency Analysis of Speech Signal

  • Sim, Kwee-Bo;Park, Chang-Hyun;Lee, Dong-Wook;Joo, Young-Hoon
    • International Journal of Fuzzy Logic and Intelligent Systems / v.2 no.2 / pp.122-126 / 2002
  • In this study, as fundamental research in emotion recognition, we identify features of three emotions: happiness, anger, and surprise. An emotional speech signal has several elements, namely voice quality, pitch, formants, speech speed, and so on. Until now, most researchers have used changes in pitch, the short-time average power envelope, or Mel-based speech power coefficients. Pitch is a very efficient and informative feature, so we use it in this study. Because pitch is sensitive to subtle emotions, it changes whenever a speaker is in a different emotional state; accordingly, we can observe whether the pitch changes steeply, changes with a gentle slope, or does not change. This paper also extracts formant features from emotional speech. Every vowel shows formants in similar positions without large differences; based on this fact, in the happiness case we extract features of laughter and use them to separate laughing segments, and we likewise find distinguishing features for anger and surprise.
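
A hedged sketch of the pitch-slope observation above (not the paper's code): extract a pitch contour with librosa's pyin implementation, an assumed stand-in for the authors' pitch tracker, fit a linear trend, and classify the change as steep, gentle, or flat. The file name and thresholds are illustrative only.

```python
import librosa
import numpy as np

y, sr = librosa.load("utterance.wav", sr=16000)
f0, voiced, _ = librosa.pyin(y, fmin=75, fmax=400, sr=sr)  # fundamental frequency track
t = np.arange(len(f0))[voiced]
slope = np.polyfit(t, f0[voiced], 1)[0]                    # linear pitch trend, Hz/frame

if abs(slope) > 1.0:        # thresholds are illustrative, not from the paper
    print("steep pitch change")
elif abs(slope) > 0.2:
    print("gentle pitch change")
else:
    print("pitch roughly unchanged")
```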

A New Current Source Modeling of Silicon Bipolar Transistor for Wireless Transceiver Module (무선 송수신모듈용 실리콘 바이폴라 트랜지스터의 새로운 전류원 모델링)

  • Suh, Young-Suk
    • Journal of the Korean Institute of Illuminating and Electrical Installation Engineers / v.19 no.3 / pp.93-98 / 2005
  • Silicon bipolar transistors (Si-BJTs) are widely used in telecommunication systems such as short-range wireless control and wireless indoor voice communication. A new modeling method for the internal current source of the Si-BJT is proposed. The method is based on a new thermal resistance extraction technique and new analytical expressions for the current source parameters of the Si-BJT, and it extracts the model parameters directly, without the optimization procedure adopted in conventional modeling methods. Applied to a 5-finger $0.4\times20[{\mu}m^2]$ device, the model predicts the measured data within $3[\%]$ error, proving the validity of the method.
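
The paper's thermal-resistance-based expressions are not reproduced here, but the general idea of direct, optimizer-free extraction can be illustrated: fit once and read the parameters off the fit. The sketch below recovers the saturation current $I_S$ and ideality factor $N_F$ of the ideal collector-current law $I_C = I_S \exp(V_{BE}/(N_F V_T))$ from synthetic Gummel-plot data; all numbers are made up for the demonstration.

```python
import numpy as np

VT = 0.02585                                  # thermal voltage at ~300 K [V]
vbe = np.linspace(0.60, 0.75, 30)             # base-emitter voltage sweep [V]
ic = 1e-16 * np.exp(vbe / (1.02 * VT))        # synthetic "measured" collector current

slope, intercept = np.polyfit(vbe, np.log(ic), 1)  # one-shot linear fit, no optimizer
nf = 1.0 / (slope * VT)                       # extracted ideality factor
is_ = np.exp(intercept)                       # extracted saturation current [A]
print(f"NF = {nf:.3f}, IS = {is_:.2e} A")     # recovers ~1.02 and ~1e-16
```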

Monosyllable Speech Recognition through Facial Movement Analysis (안면 움직임 분석을 통한 단음절 음성인식)

  • Kang, Dong-Won;Seo, Jeong-Woo;Choi, Jin-Seung;Choi, Jae-Bong;Tack, Gye-Rae
    • The Transactions of The Korean Institute of Electrical Engineers / v.63 no.6 / pp.813-819 / 2014
  • The purpose of this study was to extract accurate parameters of facial movement features using a 3-D motion capture system for lip-reading-based speech recognition. Instead of features obtained from conventional camera images, a 3-D motion system was used to obtain quantitative data on actual facial movements and to analyze 11 variables that exhibit particular patterns, such as nose, lip, jaw, and cheek movements, during monosyllable vocalization. Fourteen subjects, all in their 20s, were asked to vocalize 11 types of Korean vowel monosyllables three times each with 36 reflective markers on their faces. The obtained facial movement data were converted into 11 parameters and represented as patterns for each monosyllable vocalization. The parameter patterns were learned and recognized for each monosyllable using a Hidden Markov Model (HMM) and the Viterbi algorithm. The recognition accuracy over the 11 monosyllables was 97.2%, which suggests the possibility of recognizing Korean speech through quantitative facial movement analysis.
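
The decoding step named in the abstract is the Viterbi algorithm. Below is a compact generic implementation (a sketch, not the authors' recognizer): given HMM parameters, it recovers the most likely hidden state sequence for an observation sequence.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """obs: observation indices; pi: initial probs (S,);
    A: transition probs (S, S); B: emission probs (S, O)."""
    S, T = len(pi), len(obs)
    logd = np.log(pi) + np.log(B[:, obs[0]])     # log-prob of best path so far
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = logd[:, None] + np.log(A)         # extend every path by one step
        back[t] = cand.argmax(axis=0)            # best predecessor for each state
        logd = cand.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(logd.argmax())]
    for t in range(T - 1, 0, -1):                # trace the best path backwards
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# toy 2-state example
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi([0, 0, 1, 1], pi, A, B))
```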

Structural live load surveys by deep learning

  • Li, Yang;Chen, Jun
    • Smart Structures and Systems / v.30 no.2 / pp.145-157 / 2022
  • The design of safe and economical structures depends on reliable live load data from load surveys. Live load surveys are traditionally conducted by randomly selecting rooms and weighing each item on site, a method with low efficiency, high cost, and long cycle time. This paper proposes a deep learning-based method combined with Internet big data for live load surveys. The proposed method utilizes multi-source heterogeneous data, such as images, voice, and product identification, to obtain the live load without weighing each item, through object detection, web crawling, and speech recognition. Indoor object and face detection models are first developed by fine-tuning the YOLOv3 algorithm to detect target objects and to count the people in a room, respectively; each detection model is evaluated on an independent testing set. Web crawler frameworks with keyword and image retrieval are then established to extract the weight information of detected objects from Internet big data. The live load in a room is derived by combining the weights and numbers of items and people. To verify the feasibility of the proposed method, a live load survey is carried out for a meeting room. The results show that, compared with the traditional sampling-and-weighing method, the proposed method performs live load surveys efficiently and conveniently and represents a new load research paradigm.
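
The final combination step can be sketched as follows; the detector and crawler outputs are mocked, and every name and number is illustrative rather than taken from the paper. The room's live load is the summed weight of detected items and occupants divided by the floor area.

```python
# hypothetical outputs of the YOLOv3 detector and the web-crawler lookup
detections = {"desk": 2, "chair": 6, "bookcase": 1}           # object -> count in room
unit_weight = {"desk": 35.0, "chair": 7.5, "bookcase": 60.0}  # kg per item, crawled
people, person_weight = 4, 60.0                # face detections, assumed weight [kg]
area_m2 = 30.0                                 # room floor area [m^2]

total_kg = sum(n * unit_weight[obj] for obj, n in detections.items())
total_kg += people * person_weight
print(f"live load = {total_kg / area_m2:.1f} kg/m^2")
```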

Prediction of Closed Quotient During Vocal Phonation using GRU-type Neural Network with Audio Signals

  • Hyeonbin Han;Keun Young Lee;Seong-Yoon Shin;Yoseup Kim;Gwanghyun Jo;Jihoon Park;Young-Min Kim
    • Journal of information and communication convergence engineering / v.22 no.2 / pp.145-152 / 2024
  • Closed quotient (CQ) represents the fraction of time for which the vocal folds remain in contact during voice production. Because CQ values serve as an important reference in vocal training for professional singers, they have been measured mechanically or electrically, by either inverse filtering of airflow captured with a circumferentially vented mask or post-processing of electroglottography waveforms. In this study, we introduce a novel algorithm that predicts CQ values from audio signals alone, eliminating the need for mechanical or electrical measurement. The algorithm is based on a gated recurrent unit (GRU)-type neural network. To enhance efficiency, the audio signal is pre-processed with a pitch feature extraction algorithm; GRU-type layers then extract features, followed by a dense layer for the final prediction. The mean squared error between the predicted and measured CQ values demonstrates the capability of the proposed algorithm.
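
A minimal PyTorch sketch of the architecture family described, with layer sizes that are assumptions rather than the authors' configuration: pitch features per frame feed a GRU, whose last hidden state passes through a dense layer to predict a scalar CQ.

```python
import torch
import torch.nn as nn

class CQPredictor(nn.Module):
    def __init__(self, n_features=1, hidden=32):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)       # dense layer for the final prediction

    def forward(self, x):                      # x: (batch, frames, n_features)
        _, h = self.gru(x)                     # h: (1, batch, hidden), last hidden state
        return self.head(h[-1]).squeeze(-1)    # (batch,) predicted CQ values

model = CQPredictor()
pitch_seq = torch.randn(8, 200, 1)             # 8 utterances, 200 pitch frames each
loss = nn.MSELoss()(model(pitch_seq), torch.rand(8))  # MSE against reference CQ
loss.backward()
```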

A Study of the Factors Influencing Adoption of Mobile VoIP: Applying the UTAUT Model (모바일 VoIP 수용에 영향을 미치는 요인 연구 : UTAUT 모형을 중심으로)

  • Kim, Su-Yeon;Lee, Sang Hoon;Hwang, Hyun-Seok
    • Journal of the Korea Academia-Industrial cooperation Society / v.14 no.7 / pp.3238-3246 / 2013
  • The progress of information technology enables people to communicate with each other over the Internet. As smartphones proliferate, many free mobile VoIP (Voice over IP) applications have been developed and adopted by smartphone users. In this study we investigate the factors affecting the acceptance of mobile VoIP applications and the structural relationships among them. We review the related literature, extract the relevant factors, and build a research model describing the causal relationships among them. We then conduct an empirical study, a survey with statistical analysis, to verify the research model: EFA (Exploratory Factor Analysis) is applied to the survey variables, and SEM (Structural Equation Modeling) is used to reveal the structural relationships among the factors. We find that two factors, the usefulness of mobile VoIP and social influence, positively affect usage intention and actual use. These findings imply that vendors should emphasize the benefits of mobile VoIP and add software functionality that enhances social influence.
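
The EFA stage can be illustrated in a few lines; the SEM stage is omitted. factor_analyzer is an assumed stand-in for the authors' statistics package, and survey.csv with its Likert-scale columns is hypothetical.

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

df = pd.read_csv("survey.csv")                 # hypothetical Likert-scale responses
fa = FactorAnalyzer(n_factors=4, rotation="varimax")
fa.fit(df)                                     # extract latent factors
loadings = pd.DataFrame(fa.loadings_, index=df.columns)
print(loadings.round(2))                       # survey items grouped by factor
```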

Interactions between AI Speaker and Children : A Field Study on the Success/Failure Cases by Types of Interactions (인공지능 스피커와 아동들의 상호작용 :유형별 성공/실패 사례 도출을 위한 현장 연구)

  • Hong, Junglim;Choi, Boreum
    • The Journal of the Korea Contents Association / v.20 no.7 / pp.19-29 / 2020
  • As the AI speaker market has grown rapidly in recent years, competition among related companies to capture children, who are both major users and prospective future customers, has become intense. However, there is a lack of empirical research on how children interact with AI speakers. This research therefore examines the interactions between children and AI speakers through a field study, identifying which functions children use and what characteristics those interactions have. For this purpose, 799 conversations were collected and analyzed from AI speaker log data recorded in real time. The results show that, compared with adults, children were more likely to use children's songs, fairy tales, emotional conversation, and personification. In addition, content analysis by interaction type yielded success/failure cases of child-AI speaker interaction and proposed improvements for each failure type. This study is meaningful in that it identifies children's AI speaker preferences, content, and major conversation patterns, and provides guidelines for developing services suited to children's level.
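
The tabulation behind such a content analysis can be sketched minimally; speaker_logs.csv and its columns are hypothetical, not the study's data.

```python
import pandas as pd

# hypothetical log export: one row per utterance with its interaction type
# and whether the speaker handled it successfully
logs = pd.read_csv("speaker_logs.csv")    # columns: utterance, type, success
summary = logs.groupby(["type", "success"]).size().unstack(fill_value=0)
print(summary)                            # success/failure counts per type
```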