• Title/Abstract/Keywords: Sound Segmentation

Search results: 29 items (processing time 0.022 seconds)

SVM을 이용한 자동 음소분할에 관한 연구 (Research about auto-segmentation via SVM)

  • 권호민;한학용;김창근;허강인
    • The Institute of Electronics Engineers of Korea (대한전자공학회): Conference Proceedings / Proceedings of the 2003 Summer Conference, Vol. Ⅳ / pp. 2220-2223 / 2003
  • In this paper, we use Support Vector Machines (SVMs), a recently proposed learning method described here as a kind of artificial neural network, to divide continuous speech into phonemes (initial, medial, and final sounds), and then perform continuous speech recognition on the result. Phoneme boundaries are determined by an algorithm based on the maximum frequency within a short interval. Recognition is performed by a Continuous Hidden Markov Model (CHMM), and the results are compared with those obtained from phonemes segmented by visual inspection. The experiments confirm that the proposed SVM-based method is more effective than Gaussian Mixture Models (GMMs) for initial sounds.

  • PDF

Support Vector Machines에 의한 음소 분할 및 인식 (Phoneme segmentation and Recognition using Support Vector Machines)

  • 이광석;김현덕
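    • The Korea Institute of Information and Communication Engineering (한국정보통신학회): Conference Proceedings / Korea Institute of Maritime Information and Communication Sciences (한국해양정보통신학회) 2010 Spring Conference / pp. 981-984 / 2010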
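  • In this study, we use SVMs, described here as a kind of artificial neural network, as the learning method for segmenting continuous speech into phoneme units (initial, medial, and final sounds), and we examine performance by applying the segmented phoneme units to continuous speech recognition. Phoneme boundaries are determined by an algorithm based on the maximum frequency within a short interval, speech recognition is performed by a CHMM, and the results are also compared with segmentation by visual inspection. The simulation results show that, for initial-sound segmentation, the proposed SVM-based approach is more effective than GMMs.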

  • PDF

청각장애 아동의 음운인식 능력과 단어확인 능력의 상관연구 (A Study of Correlation Between Phonological Awareness and Word Identification Ability of Hearing Impaired Children)

  • 김유경;김문정;안종복;석동일
    • Speech Sciences (음성과학) / Vol. 13, No. 3 / pp. 155-167 / 2006
  • Hearing-impaired children possess poor underlying perceptual knowledge of the sound system and show delayed development of the segmental organization of that system. The purpose of this study was to investigate the relationship between phonological awareness ability and word identification ability in hearing-impaired children. Fourteen children with moderately severe hearing loss participated in this study. All tasks were individually administered. Phonological awareness tests consisted of syllable blending, syllable segmentation, syllable deletion, body-coda discrimination, phoneme blending, phoneme segmentation, and phoneme deletion. Closed-set Monosyllabic Words (12 items) and lists 1 and 2 of open-set Monosyllabic Words in EARS-K were used to examine word identification. The results were as follows. First, at the task level, closed-set word identification showed a high positive correlation with coda discrimination, phoneme blending, and phoneme deletion, while open-set word identification showed a high positive correlation with phoneme blending, phoneme deletion, and phoneme segmentation. Second, at the level of phonological awareness, closed-set word identification showed a high positive correlation with body-coda awareness and phoneme awareness, while open-set word identification showed a high positive correlation only with phoneme awareness.

  • PDF

한국어 고립 단어 음성의 자음/모음/유성자음 음가 분할 및 인식에 관한 연구 (A Study on Consonant/Vowel/Unvoiced Consonant Phonetic Value Segmentation and Recognition of Korean Isolated Word Speech)

  • 이준환;이상범
    • The Transactions of the Korea Information Processing Society (한국정보처리학회논문지) / Vol. 7, No. 6 / pp. 1964-1972 / 2000
  • Acoustically, Korean speech is realized as phonetic values that differ from the underlying phonemes because of the language's own sound-change properties. Building an extended recognition system for understanding Korean therefore requires first studying a Korean rule-based system, which can then be used as post-processing for a Korean recognition system. In this paper, a text-based Korean rule-based system implementing Korean-specific sound-change rules is constructed. Based on the text-based phonetic-value output of this system, preliminary phonetic-value segmentation boundary points with non-uniform blocks are extracted from Korean isolated-word speech. By merging and recognizing the non-uniform blocks between the extracted boundary points, the feasibility of recognizing Korean speech in the form of phonetic values is investigated.

  • PDF

GMM을 이용한 프레임 단위 분류에 의한 우리말 음성의 분할과 인식 (Korean Speech Segmentation and Recognition by Frame Classification via GMM)

  • 권호민;한학용;고시영;허강인
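    • Korea Institute of Convergence Signal Processing (융합신호처리학회): Conference Proceedings / Korea Institute of Signal Processing and Systems (한국신호처리시스템학회) 2003 Summer Conference Proceedings / pp. 18-21 / 2003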
  • Dividing continuous speech into short intervals of uniform phoneme quality is generally considered a difficult problem. In this paper, we use a Gaussian Mixture Model (GMM), a probability density model, to divide speech into phonemes (initial, medial, and final sounds), and then perform continuous speech recognition on the result. Phoneme boundaries are determined by an algorithm based on the maximum frequency within a short interval. Recognition is performed by a Continuous Hidden Markov Model (CHMM), and the results are compared with those obtained from phonemes segmented by visual inspection. The experimental results confirm that the presented method performs relatively well for automatic segmentation of Korean speech.

  • PDF
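
The boundary rule in the abstract above is stated only briefly. The following is a minimal sketch of one possible reading, assuming that "maximum frequency in a short interval" means taking the most frequent per-frame class label inside a sliding window. The frame labels, the window size, and the function names `smooth_labels` and `boundaries` are all hypothetical and do not come from the paper.

```python
from collections import Counter


def smooth_labels(frame_labels, window=5):
    """Replace each frame label with the most frequent label in a centered window."""
    half = window // 2
    smoothed = []
    for i in range(len(frame_labels)):
        lo = max(0, i - half)
        hi = min(len(frame_labels), i + half + 1)
        counts = Counter(frame_labels[lo:hi])
        # most_common(1) returns [(label, count)] for the top label
        smoothed.append(counts.most_common(1)[0][0])
    return smoothed


def boundaries(labels):
    """Return the frame indices at which the label changes."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


# Hypothetical per-frame classifier output (assumption, not data from the paper):
# I = initial sound, M = medial sound, F = final sound
frames = list("IIIMIIMMMMFMMFFFF")
smoothed = smooth_labels(frames, window=5)
print("".join(smoothed))
print(boundaries(smoothed))
```

In this sketch the isolated misclassified frames are absorbed by their neighbors, so each remaining label change can be treated as a candidate phoneme boundary. A larger window removes more noise but blurs short segments.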

Desktop program production

  • Enami, Kazumasa;Fukui, Kazuo;Yagi, Nobuyuki
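    • The Korean Institute of Broadcast and Media Engineers (한국방송∙미디어공학회): Conference Proceedings / The Korean Society of Broadcast Engineers (한국방송공학회) 1996 Proceedings International Workshop on New Video Media Technology / pp. 77-81 / 1996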
  • To meet the need for effective program production in the multimedia era, we are studying a Desk Top Program Production (DTPP) system. With DTPP, users can easily produce multimedia programs that include video, sound, and ancillary data, and can freely handle video images by synthesizing video components retrieved from a video database. This paper describes the new program production system, DTPP, and its key technologies, such as cooperative program production over a multimedia network, indexing and use of image attribute information, and image segmentation and spatio-temporal editing.

  • PDF

향상된 실시간 음원방향 인지 시스템의 하드웨어 설계 (Hardware Design of Enhanced Real-Time Sound Direction Estimation System)

  • 김태완;김동훈;정연모
    • The Journal of the Acoustical Society of Korea (한국음향학회지) / Vol. 30, No. 3 / pp. 115-122 / 2011
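  • This paper presents a method for computing an accurate sound source direction in real time by measuring the time delay of arrival of speech with the generalized cross-correlation technique, using four microphones arranged in a cross shape. Existing systems have two drawbacks: they require a data acquisition device for microphone array signal processing, which makes the system hard to embed, and sound direction estimation on a DSP processor becomes harder to run in real time as the number of microphone channels grows. To overcome these limitations, this paper proposes the development of enhanced sound direction estimation hardware based on microphone array signal processing. An efficient design and verification method using a space partitioning technique is proposed, which enables more accurate direction estimation and a shorter design time. Finally, a system suitable for embedded systems that use an audio codec and an FPGA was developed. Experimental results show faster processing than PC-based or DSP-processor-based implementations.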
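
The delay-to-direction step mentioned in the abstract above can be illustrated for a single microphone pair. The following is a minimal software sketch, not the hardware design from the paper: it uses plain time-domain cross-correlation (the unweighted special case of generalized cross-correlation) and a far-field model. The sample rate, microphone spacing, test signal, and the function names `estimate_delay` and `arrival_angle` are assumptions for illustration only.

```python
import math


def estimate_delay(x, y, max_lag):
    """Return the lag in samples that maximizes the cross-correlation of y against x.

    A positive result means y is a delayed copy of x.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n in range(len(x)):
            m = n + lag
            if 0 <= m < len(y):
                score += x[n] * y[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle(delay_samples, fs, mic_distance, c=343.0):
    """Convert a delay to an arrival angle in degrees (far-field assumption).

    c is the speed of sound in m/s; the ratio is clamped to the valid asin range.
    """
    tau = delay_samples / fs
    ratio = max(-1.0, min(1.0, c * tau / mic_distance))
    return math.degrees(math.asin(ratio))


# Hypothetical test signal: a short pulse, and a copy delayed by 3 samples.
x = [0.0] * 10 + [1.0, 2.0, 1.0] + [0.0] * 10
y = [0.0] * 3 + x[:-3]

delay = estimate_delay(x, y, max_lag=8)
angle = arrival_angle(delay, fs=16000, mic_distance=0.2)
print(delay, round(angle, 1))
```

With a cross-shaped array, one such estimate per orthogonal microphone pair could in principle be combined into a full direction, though how the paper combines them is not stated in the abstract.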

청각장애아동의 음운인식능력에 대한 연구 (Phonological Awareness in Hearing Impaired Children)

  • 박상희;석동일;정옥란
    • Speech Sciences (음성과학) / Vol. 9, No. 2 / pp. 193-202 / 2002
  • The purpose of this study is to examine the phonological awareness of hearing-impaired children. A number of studies indicate that hearing-impaired children have articulation disorders due to impaired auditory feedback. However, even children who can distinguish certain phonemes sometimes misarticulate those phonemes. Phonological awareness refers to recognizing the speech-sound units and their forms in spoken language (Hong, 2001). The subjects were four hearing-impaired children (3 children with cochlear implants and 1 child with a hearing aid). Phonological awareness was evaluated with the test battery developed by Paik et al. (2001). The subtests consisted of rhyme matching, onset matching I and II, and word-initial segmentation and matching I and II. If a child asked for an item to be repeated, it was repeated up to a maximum of 4 times. Each item was worth 1 point. The results were compared with those of Paik et al. (2001). In rhyme matching, subject 1 showed superior ability, subjects 2 and 3 fair ability, and subject 4 inferior ability. In onset matching I, all subjects except subject 3 showed inferior ability. Interestingly, subject 1 showed the lowest onset matching I score. In word-initial segmentation and matching I, subjects 1 and 4 showed inferior ability and subjects 2 and 3 showed fair ability. In onset matching II, subject 2 achieved the perfect score of 10 even though she showed a very low score. In word-initial segmentation and matching II, only subjects 2 and 3 showed appropriate levels of the skill. The results show that the phonological awareness of hearing-impaired children differs from that of normal-hearing children.

  • PDF

음성정보 내용분석을 통한 골프 동영상에서의 선수별 이벤트 구간 검색 (Retrieval of Player Event in Golf Videos Using Spoken Content Analysis)

  • 김형국
    • The Journal of the Acoustical Society of Korea (한국음향학회지) / Vol. 28, No. 7 / pp. 674-679 / 2009
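  • This paper proposes a method for retrieving per-player event segments by combining event sound segments detected in the audio of golf videos with speech segments that contain golf players' names. The overall system consists of two modules. The automatic indexing module applies noise reduction, audio segmentation, and speech recognition to the audio stream separated from the video. The retrieval module converts a player name entered as text by the user into a pronunciation sequence and finds, in the indexed database, the event segments linked to speech segments that match the queried player name. For player name retrieval, this paper compares per-player event segment retrieval results obtained with phoneme-based, word-based, and hybrid (combined word and phoneme) approaches.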

악리론으로 본 정음창제와 정음소 분절 알고리즘 (Ortho-phonic Alphabet Creation by the Musical Theory and its Segmental Algorithm)

  • 진용옥;안정근
    • Speech Sciences (음성과학) / Vol. 8, No. 2 / pp. 49-59 / 2001
  • Phoneme segmentation is a very difficult problem in speech processing because a segmental algorithm has to deal with many kinds of allophones and coarticulation. As a result, systems for speech recognition and voice retrieval have a complex structure. To address this, we discuss the possibility of a new segmental algorithm based on the "subtract or add one third" tripartition (삼분손익) of the twelve-tone temperament (12 율려), first proposed by Prof. T. S. Han. It is close to both oriental and western musical theory. He has also suggested 3 consonant and 3 vowel phonemes in Hunminjungum (훈민정음), invented by King Sejong in the 15th century. In this paper, we propose naming this unit the ortho-phonic phoneme (OPP/정음소), which carries the meaning of "absoluteness and independence". OPP is also applicable to other languages (for example, IPA). Finally, we find that this algorithm is consistently applicable to languages worldwide and is very useful for building speech recognition and retrieval systems.

  • PDF